[1]
Rank by Time or by Relevance?: Revisiting Email Search
Session 2B: Retrieval Algorithms
/
Carmel, David
/
Halawi, Guy
/
Lewin-Eytan, Liane
/
Maarek, Yoelle
/
Raviv, Ariel
Proceedings of the 2015 ACM Conference on Information and Knowledge
Management
2015-10-19
p.283-292
© Copyright 2015 ACM
Summary: With Web mail services offering larger and larger storage capacity, most
users do not feel the need to systematically delete messages anymore and
inboxes keep growing. It is quite surprising that in spite of the huge progress
of relevance ranking in Web Search, mail search results are still typically
ranked by date. This can probably be explained by the fact that users demand
perfect recall in order to "re-find" a previously seen message, and would not
trust relevance ranking. Yet mail search is still considered a difficult and
frustrating task, especially when trying to locate older messages. In this
paper, we study the current search traffic of Yahoo mail, a major Web
commercial mail service, and discuss the limitations of ranking search results
by date. We argue that this sort-by-date paradigm needs to be revisited in
order to account for the specific structure and nature of mail messages, as
well as the high-recall needs of users. We describe a two-phase ranking
approach, in which the first phase is geared towards maximizing recall and the
second phase follows a learning-to-rank approach that considers a rich set of
mail-specific features to maintain precision. We present our results obtained
on real mail search query traffic, for three different datasets, via manual as
well as automatic evaluation. We demonstrate that the default time-driven
ranking can be significantly improved in terms of both recall and precision, by
taking into consideration time recency and textual similarity to the query, as
well as mail-specific signals such as users' actions.
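The two-phase idea in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Message` fields, the whitespace tokenizer, and the hand-set weights stand in for the paper's learning-to-rank model and feature set, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class Message:
    msg_id: str
    text: str
    age_days: float      # time since the message was received
    was_replied: bool    # hypothetical stand-in for a "user action" signal


def tokenize(text):
    """Lowercase whitespace tokenization; a real system would do far more."""
    return set(text.lower().split())


def recall_phase(query, messages):
    """Phase 1: keep every message sharing at least one term with the query."""
    terms = tokenize(query)
    return [m for m in messages if terms & tokenize(m.text)]


def score(query, message, weights=(1.0, 0.5, 0.3)):
    """Phase 2 score: a hand-set linear mix standing in for a learned ranker."""
    terms = tokenize(query)
    w_text, w_recency, w_action = weights
    text_sim = len(terms & tokenize(message.text)) / max(len(terms), 1)
    recency = 1.0 / (1.0 + message.age_days)
    action = 1.0 if message.was_replied else 0.0
    return w_text * text_sim + w_recency * recency + w_action * action


def two_phase_search(query, messages, k=10):
    """Run the recall phase, then re-rank the candidates by the phase 2 score."""
    candidates = recall_phase(query, messages)
    ranked = sorted(candidates, key=lambda m: score(query, m), reverse=True)
    return ranked[:k]


inbox = [
    Message("a", "flight itinerary to Boston", age_days=300, was_replied=True),
    Message("b", "Boston weather newsletter", age_days=1, was_replied=False),
    Message("c", "lunch tomorrow", age_days=0, was_replied=False),
]
results = [m.msg_id for m in two_phase_search("boston flight", inbox)]
```

In this toy inbox a pure sort-by-date would place the recent newsletter "b" ahead of the older itinerary "a", whereas the combined score ranks "a" first because it matches both query terms and carries a user action.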
[2]
You Will Get Mail! Predicting the Arrival of Future Email
TempWeb 2015
/
Gamzu, Iftah
/
Karnin, Zohar
/
Maarek, Yoelle
/
Wajc, David
Companion Proceedings of the 2015 International Conference on the World Wide
Web
2015-05-18
v.2
p.1327-1332
© Copyright 2015 ACM
Summary: The majority of Web email is known to be generated by machines even when one
excludes spam. Many machine-generated email messages such as invoices or travel
itineraries are critical to users. Recent research studies establish that
causality relations between certain types of machine-generated email messages
exist and can be mined. These relations exhibit a link between a given message
to a past message that gave rise to its creation. For example, a shipment
notification message can often be linked to a past online purchase message.
Instead of studying how an incoming message can be linked to the past, we
propose here to focus on predicting future email arrival as implied by
causality relations. Such a prediction method has several potential
applications, ranging from improved ad targeting in upsell scenarios to
reducing false positives in spam detection.
We introduce a novel approach for predicting which types of
machine-generated email messages, represented by so-called "email templates", a
user should receive in future time windows. Our prediction approach relies on
(1) statistically inferring causality relations between email templates, (2)
building a generative model that explains the inbox of each user using those
causality relations, and (3) combining those results to predict which email
templates are likely to appear in future time frames. We present preliminary
experimental results and some data insights obtained by analyzing several
million inboxes of Yahoo Mail users, who voluntarily opted in for such
research.
[3]
How Many Folders Do You Really Need?: Classifying Email into a Handful of
Categories
KM Session 10: Text Data Mining I
/
Grbovic, Mihajlo
/
Halawi, Guy
/
Karnin, Zohar
/
Maarek, Yoelle
Proceedings of the 2014 ACM Conference on Information and Knowledge
Management
2014-11-03
p.869-878
© Copyright 2014 ACM
Summary: Email classification is still a mostly manual task. Consequently, most Web
mail users never define a single folder. Recently however, automatic
classification offering the same categories to all users has started to appear
in some Web mail clients, such as AOL or Gmail. We adopt this approach, rather
than previous (unsuccessful) personalized approaches because of the change in
the nature of consumer email traffic, which is now dominated by (non-spam)
machine-generated email. We propose here a novel approach for (1) automatically
distinguishing between personal and machine-generated email and (2) classifying
messages into latent categories, without requiring users to have defined any
folder. We report how we have discovered that a set of 6 "latent" categories
(one for human- and the others for machine-generated messages) can explain a
significant portion of email traffic. We describe in detail the steps involved
in building a Web-scale email categorization system, from the collection of
ground-truth labels and the selection of features to the training of models.
Experimental evaluation was performed on more than 500 billion messages
received during a period of six months by users of Yahoo mail service, who
elected to be part of such research studies. Our system achieved precision and
recall rates close to 90% and the latent categories we discovered were shown to
cover 70% of both email traffic and email search queries. We believe that these
results pave the way for a change of approach in the Web mail industry, and
could support the invention of new large-scale email discovery paradigms that
had not been possible before.
[4]
When machines dominate humans: the challenges of mining and consuming
machine-generated web mail
WWW 2014 industry track
/
Maarek, Yoelle
Companion Proceedings of the 2014 International Conference on the World Wide
Web
2014-04-07
v.2
p.605-606
© Copyright 2014 ACM
Summary: In spite of personal communications moving more and more towards social and
mobile, especially with younger generations, email traffic continues to grow.
This growth is mostly attributed to (non-spam) machine-generated email, which,
against common perception, is often extremely valuable. Indeed, together with
monthly newsletters that can easily be ignored, inboxes contain flight
itineraries, booking confirmations, receipts or invoices that are critical to
many users. In this talk, I will discuss the new nature of consumer email,
which is dominated by machine-generated messages of highly heterogeneous forms
and value. I will show how the change has not been fully recognized yet by
most email clients (as an example, why should there still be a reply option
associated with a message coming from a "do-not-reply@" address?). I will
introduce some approaches for large-scale mail mining specifically tailored to
machine-generated email. I will conclude by discussing possible applications
and research directions.
[5]
From query to question in one click: suggesting synthetic questions to
searchers
Research papers
/
Dror, Gideon
/
Maarek, Yoelle
/
Mejer, Avihai
/
Szpektor, Idan
Proceedings of the 2013 International Conference on the World Wide Web
2013-05-13
v.1
p.391-402
© Copyright 2013 ACM
Summary: In Web search, users may remain unsatisfied for several reasons: the search
engine may not be effective enough or the query might not reflect their intent.
Years of research focused on providing the best user experience for the data
available to the search engine. However, little has been done to address the
cases in which relevant content for the specific user need has not been posted
on the Web yet. One obvious solution is to directly ask other users to generate
the missing content using Community Question Answering services such as Yahoo!
Answers or Baidu Zhidao. However, formulating a full-fledged question after
having issued a query requires some effort. Some previous work proposed to
automatically generate natural language questions from a given query, but not
for scenarios in which a searcher is presented with a list of questions to
choose from. We propose here to generate synthetic questions that can actually
be clicked by the searcher so as to be directly posted as questions on a
Community Question Answering service. This imposes new constraints, as
questions will be actually shown to searchers, who will not appreciate an
awkward style or redundancy. To this end, we introduce a learning-based
approach that improves not only the relevance of the suggested questions to the
original query, but also their grammatical correctness. In addition, since
queries are often underspecified and ambiguous, we put a special emphasis on
increasing the diversity of suggestions via a novel diversification mechanism.
We conducted several experiments to evaluate our approach by comparing it to
prior work. The experiments show that our algorithm improves question quality
by 14% over prior work and that adding diversification reduced redundancy by
55%.
[6]
When relevance is not enough: promoting diversity and freshness in
personalized question recommendation
Research papers
/
Szpektor, Idan
/
Maarek, Yoelle
/
Pelleg, Dan
Proceedings of the 2013 International Conference on the World Wide Web
2013-05-13
v.1
p.1249-1260
© Copyright 2013 ACM
Summary: What makes a good question recommendation system for community
question-answering sites? First, to maintain the health of the ecosystem, it
needs to be designed around answerers, rather than exclusively for askers.
Next, it needs to scale to many questions and users, and be fast enough to
route a newly-posted question to potential answerers within the few minutes
before the asker's patience runs out. It also needs to show each answerer
questions that are relevant to his or her interests. We have designed and built
such a system for Yahoo! Answers, but realized, when testing it with live
users, that it was not enough.
We found that those drawing-board requirements fail to capture users'
interests. The feature that they really missed was diversity. In other words,
showing them just the main topics they had previously expressed interest in was
simply too dull. Adding the spice of topics slightly outside the core of their
past activities significantly improved engagement. We conducted a large-scale
online experiment in production in Yahoo! Answers that showed that
recommendations driven by relevance alone perform worse than a control group
without question recommendations, which is the current behavior. However, an
algorithm promoting both diversity and freshness improved the number of answers
by 17%, daily session length by 10%, and had a significant positive impact on
peripheral activities such as voting.
[7]
When web search fails, searchers become askers: understanding the transition
Community QA
/
Liu, Qiaoling
/
Agichtein, Eugene
/
Dror, Gideon
/
Maarek, Yoelle
/
Szpektor, Idan
Proceedings of the 35th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2012-08-12
p.801-810
© Copyright 2012 ACM
Summary: While Web search has become increasingly effective over the last decade, for
many users' needs the required answers may be spread across many documents, or
may not exist on the Web at all. Yet, many of these needs could be addressed by
asking people via popular Community Question Answering (CQA) services, such as
Baidu Knows, Quora, or Yahoo! Answers. In this paper, we perform the first
large-scale analysis of how searchers become askers. For this, we study the
logs of a major web search engine to trace the transformation of a large number
of failed searches into questions posted on a popular CQA site. Specifically,
we analyze the characteristics of the queries, and of the patterns of search
behavior that precede posting a question; the relationship between the content
of the attempted queries and of the posted questions; and the subsequent
actions the user performs on the CQA site. Our work develops novel insights
into searcher intent and behavior that lead to asking questions to the
community, providing a foundation for more effective integration of automated
web search and social information seeking.
[8]
(Big) usage data in web search
Tutorial presentations
/
Baeza-Yates, Ricardo
/
Maarek, Yoelle
Proceedings of the 35th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2012-08-12
p.1181-1182
© Copyright 2012 ACM
[9]
Learning from the past: answering new questions with past answers
Leveraging user-generated content
/
Shtok, Anna
/
Dror, Gideon
/
Maarek, Yoelle
/
Szpektor, Idan
Proceedings of the 2012 International Conference on the World Wide Web
2012-04-16
v.1
p.759-768
© Copyright 2012 ACM
Summary: Community-based Question Answering sites, such as Yahoo! Answers or Baidu
Zhidao, allow users to get answers to complex, detailed and personal questions
from other users. However, since answering a question depends on the ability
and willingness of users to address the asker's needs, a significant fraction
of the questions remain unanswered. We measured that in Yahoo! Answers, this
fraction represents 15% of all incoming English questions. At the same time, we
discovered that around 25% of questions in certain categories are recurrent, at
least at the question-title level, over a period of one year.
We attempt to reduce the rate of unanswered questions in Yahoo! Answers by
reusing the large repository of past resolved questions, openly available on
the site. More specifically, we estimate the probability that certain new
questions can be satisfactorily answered by a best answer from the past, using
a statistical model specifically trained for this task. We leverage concepts
and methods from query-performance prediction and natural language processing
in order to extract a wide range of features for our model. The key challenge
here is to achieve a level of quality similar to the one provided by the best
human answerers.
We evaluated our algorithm on offline data extracted from Yahoo! Answers,
but more interestingly, also on online data by using three "live" answering
robots that automatically provide past answers to new questions when a certain
degree of confidence is reached. We report the success rate of these robots in
three active Yahoo! Answers categories in terms of accuracy, coverage and
askers' satisfaction. This work presents a first attempt, to the best of our
knowledge, at automatically answering questions of a social nature by
reusing past answers of high quality.
[10]
Predicting web searcher satisfaction with existing community-based answers
Communities
/
Liu, Qiaoling
/
Agichtein, Eugene
/
Dror, Gideon
/
Gabrilovich, Evgeniy
/
Maarek, Yoelle
/
Pelleg, Dan
/
Szpektor, Idan
Proceedings of the 34th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2011-07-25
p.415-424
© Copyright 2011 ACM
Summary: Community-based Question Answering (CQA) sites, such as Yahoo! Answers,
Baidu Knows, Naver, and Quora, have been rapidly growing in popularity. The
resulting archives of posted answers to questions, in Yahoo! Answers alone,
already exceed 1 billion in size, and are aggressively indexed by web search
engines. In fact, a large number of search engine users benefit from these
archives, by finding existing answers that address their own queries. This
scenario poses new challenges and opportunities for both search engines and CQA
sites. To this end, we formulate a new problem of predicting the satisfaction
of web searchers with CQA answers. We analyze a large number of web searches
that result in a visit to a popular CQA site, and identify unique
characteristics of searcher satisfaction in this setting, namely, the effects
of query clarity, query-to-question match, and answer quality. We then propose
and evaluate several approaches to predicting searcher satisfaction that
exploit these characteristics. To the best of our knowledge, this is the first
attempt to predict and validate the usefulness of CQA archives for external
searchers, rather than for the original askers. Our results suggest promising
directions for improving and exploiting community question answering services
in pursuit of satisfying even more Web search queries.
[11]
Web retrieval: the role of users
Tutorials
/
Baeza-Yates, Ricardo
/
Maarek, Yoelle
Proceedings of the 34th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2011-07-25
p.1303-1304
© Copyright 2011 ACM
Summary: Web retrieval methods have evolved through three major steps in the last
decade or so. They started from standard document-centric IR in the early days
of the Web, then made a major step forward by leveraging the structure of the
Web, using link analysis techniques in both crawling and ranking challenges. A
more recent, no less important but maybe more discreet step forward, has been
to enter the user in this equation in two ways: (1) Implicitly, through the
analysis of usage data captured by query logs, and session and click
information in general; the goal here being to improve ranking as well as to
measure users' happiness and engagement; (2) Explicitly, by offering novel
interactive features; the goal here being to better answer users' needs. This
half day tutorial covers the user-related challenges associated with the
implicit and explicit role of users in Web retrieval. More specifically, we
review and discuss challenges associated with:
- Usage data analysis and metrics -- It is critical to monitor how users take advantage of and interact with Web retrieval systems, as this implicit relevance feedback, aggregated at a large scale, can provide insights on users' underlying intent as well as approximate quite accurately the level of success of a given feature. Here we have to consider not only click statistics but also the sequences of queries, the time spent on a page, the number of actions per session, etc. This is the focus of the first part of the tutorial.
- User interaction -- Given the intrinsic problems posed by the Web, the key challenge for the user is to conceive a good query to be submitted to the search system, one that leads to a manageable and relevant answer. The retrieval system must assist users during two key stages of interaction: before the query is fully expressed and after the results are returned. After quite some stagnation on the front-end of Web retrieval, we have seen numerous novel interactive features appear in the last 3 to 4 years, as the leading commercial search engines seem to compete for users' attention. The second part of the tutorial will be dedicated to explicit user interaction. We will introduce novel material (as compared to previous versions of this tutorial that were given at SIGIR'2010, WSDM'2011 and ECIR'2011) in order to reflect recent Web search features such as Google Instant or Yahoo! Direct Search.
The goal of this tutorial is to teach the key principles and technologies
behind the activities and challenges briefly outlined above, bring new
understanding and insights to the attendees, and hopefully foster future
research. Previous versions of this tutorial were offered at ACM
SIGIR'2010, WSDM'2011 and ECIR'2011.
[12]
Improving recommendation for long-tail queries via templates
Recommendation
/
Szpektor, Idan
/
Gionis, Aristides
/
Maarek, Yoelle
Proceedings of the 2011 International Conference on the World Wide Web
2011-03-28
v.1
p.47-56
© Copyright 2011 ACM
Summary: The ability to aggregate huge volumes of queries over a large population of
users allows search engines to build precise models for a variety of
query-assistance features such as query recommendation, correction, etc. Yet,
no matter how much data is aggregated, the long-tail distribution implies that
a large fraction of queries are rare. As a result, most query assistance
services perform poorly or are not even triggered on long-tail queries. We
propose a method to extend the reach of query assistance techniques (and in
particular query recommendation) to long-tail queries by reasoning about rules
between query templates rather than individual query transitions, as currently
done in query-flow graph models. As a simple example, if we recognize that
'Montezuma' is a city in the rare query "Montezuma surf" and if the rule
'<city> surf → <city> beach' has been observed, we are able to offer "Montezuma
beach" as a recommendation, even if the two queries were never observed in the
same
session. We conducted experiments to validate our hypothesis, first via
traditional small-scale editorial assessments but more interestingly via a
novel automated large scale evaluation methodology. Our experiments show that
general coverage can be relatively increased by 24% using templates without
penalizing quality. Furthermore, for 36% of the 95M queries in our query flow
graph, which have no out edges and thus could not be served recommendations, we
can now offer at least one recommendation in 98% of the cases.
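The Montezuma example in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the entity dictionary `CITY_ENTITIES`, the rule table `TEMPLATE_RULES`, the `<city>` slot name, and both helper functions are hypothetical, and the paper's actual template mining and scoring are not described in the abstract.

```python
import re

# Hypothetical entity dictionary: strings known to be cities.
CITY_ENTITIES = {"montezuma", "santa cruz"}

# Hypothetical template rules, as if mined from query logs:
# source template -> list of target templates.
TEMPLATE_RULES = {"<city> surf": ["<city> beach"]}


def templatize(query, entities, slot="<city>"):
    """Return (template, entity) pairs for each known entity found in the query."""
    pairs = []
    lowered = query.lower()
    for entity in sorted(entities):
        pattern = r"\b" + re.escape(entity) + r"\b"
        if re.search(pattern, lowered):
            pairs.append((re.sub(pattern, slot, lowered, count=1), entity))
    return pairs


def recommend(query, entities, rules, slot="<city>"):
    """Apply template rules, then fill the slot back in with the matched entity."""
    recommendations = []
    for template, entity in templatize(query, entities, slot):
        for target in rules.get(template, []):
            recommendations.append(target.replace(slot, entity))
    return recommendations


suggestions = recommend("Montezuma surf", CITY_ENTITIES, TEMPLATE_RULES)
```

Because the rule is stored at the template level, the same entry also serves any other query of the form "<city> surf", even if that exact query was never seen in a session with its recommendation.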
[13]
Web retrieval: the role of users
Tutorials
/
Baeza-Yates, Ricardo
/
Maarek, Yoelle
Proceedings of the 33rd Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2010-07-19
p.907
Keywords: user interaction, web retrieval
© Copyright 2010 ACM
Summary: Web retrieval methods have evolved through three major steps in the last
decade or so. They started from standard document-centric IR in the early days
of the Web, then made a major step forward by leveraging the structure of the
Web, using link analysis techniques in both crawling and ranking challenges. A
more recent, no less important but maybe more discreet step forward, has been
to enter the user in this equation in two ways: (1) implicitly, through the
analysis of usage data captured by query logs, and session and click
information in general, the goal being to improve ranking as well as to measure
users' happiness and engagement; (2) explicitly, by offering novel interactive
features; the goal here being to better answer users' needs. In this tutorial,
we will cover the user-related challenges associated with the implicit and
explicit role of users in Web retrieval. We will review and discuss challenges
associated with two types of activities, namely: usage data analysis and
metrics and user interaction. The goal of this tutorial is to teach the key
principles and technologies behind the activities and challenges briefly
outlined above, bring new understanding and insights to the attendees, and
hopefully foster future research.
[14]
Do you want to take notes?: identifying research missions in Yahoo! search
pad
Full papers
/
Donato, Debora
/
Bonchi, Francesco
/
Chi, Tom
/
Maarek, Yoelle
Proceedings of the 2010 International Conference on the World Wide Web
2010-04-26
v.1
p.321-330
Keywords: persistent search, query logs, sessions, task-oriented search
© Copyright 2010 ACM
Summary: Addressing users' information needs has been one of the main goals of Web
search engines since their early days. In some cases, users cannot see their
needs immediately answered by search results, simply because these needs are
too complex and involve multiple aspects that are not covered by a single Web
or search results page. This typically happens when users investigate a certain
topic in domains such as education, travel or health, which often require
collecting facts and information from many pages. We refer to this type of
activity as "research missions". These research missions account for 10% of
users' sessions and more than 25% of all query volume, as verified by a manual
analysis that was conducted by Yahoo! editors.
We demonstrate in this paper that such missions can be automatically
identified on-the-fly, as the user interacts with the search engine, through
careful runtime analysis of query flows and query sessions.
The on-the-fly automatic identification of research missions has been
implemented in Search Pad, a novel Yahoo! application that was launched in
2009, and that we present in this paper. Search Pad helps users keep track
of results they have consulted. Its novelty, however, is that unlike previous
note-taking products, it is automatically triggered only when the system
decides, with a fair level of confidence, that the user is undertaking a
research mission and thus is in the right context for gathering notes. Beyond
the Search Pad specific application, we believe that changing the level of
granularity of query modeling, from an isolated query to a list of queries
pertaining to the same research missions, so as to better reflect a certain
type of information needs, can be beneficial in a number of other Web search
applications. Session-awareness is growing and it is likely to play, in the
near future, a fundamental role in many on-line tasks: this paper presents a
first step on this path.
[15]
Current trends in the integration of searching and browsing
Panels
/
Broder, Andrei Z.
/
Maarek, Yoelle S.
/
Bharat, Krishna
/
Dumais, Susan
/
Papa, Steve
/
Pedersen, Jan
/
Raghavan, Prabhakar
Proceedings of the 2005 International Conference on the World Wide Web
2005-05-10
v.2
p.793
© Copyright 2005 International World Wide Web Conference Committee (IW3C2)
Summary: Searching and browsing have been the two basic information discovery
paradigms since the early days of the Web. More than ten years down the road,
three schools seem to have emerged: (1) the search-centric school argues that
guided navigation is superfluous since free-form search has become so good and
the search UI so common that users can satisfy all their needs via simple
queries; (2) the taxonomy navigation school claims that users have difficulties
expressing informational needs; and (3) the meta-data centric school advocates
the use of meta-data for narrowing large sets of results, and is successful in
e-commerce where it is known as "multi faceted search". This panel brings
together experts and advocates for all three schools, who will discuss these
approaches and share their experiences in the field. We will ask the audience
to challenge our experts with real information architecture problems.
[16]
Searching XML documents via XML fragments
Structured documents
/
Carmel, David
/
Maarek, Yoelle S.
/
Mandelbrod, Matan
/
Mass, Yosi
/
Soffer, Aya
Proceedings of the 26th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2003-07-28
p.151-158
Summary: Most of the work on XML query and search has stemmed from the publishing and
database communities, mostly for the needs of business applications. Recently,
the Information Retrieval community began investigating the XML search issue to
answer information discovery needs. Following this trend, we present here an
approach where information needs can be expressed in an approximate manner as
pieces of XML documents or "XML fragments" of the same nature as the documents
that are being searched. We present an extension of the vector space model for
searching XML collections via XML fragments and ranking results by relevance.
We describe how we have extended a full-text search engine to comply with this
model. The value of the proposed method is demonstrated by the relatively high
precision of our system, which was among the top performers in the recent INEX
workshop. Our results indicate that certain queries are more appropriate than
others for the extended vector space model. Specifically, queries with
relatively specific contexts but vague information needs are best situated to
reap the benefit of this model. Finally our results show that one method may
not fit all types of queries and that it could be worthwhile to use different
solutions for different applications.
[17]
Personalized pocket directories for mobile devices
Mobility and Wireless Access
/
Cohen, Doron
/
Herscovici, Michael
/
Petruschka, Yael
/
Maarek, Yoëlle S.
/
Soffer, Aya
Proceedings of the 2002 International Conference on the World Wide Web
2002-05-07
p.627-638
Keywords: hierarchical browsers, mobile devices, mobile search, personalization
© Copyright 2002 Authors
Summary: In spite of the increase in the availability of mobile devices in the last
few years, Web information is not yet as accessible from PDAs or WAP phones as
it is from the desktop. In this paper, we propose a solution for supporting one
of the most popular information discovery mechanisms, namely Web directory
navigation, from mobile devices. Our proposed solution consists of caching
enough information on the device itself in order to conduct most of the
navigation actions locally (with subsecond response time) while intermittently
communicating with the server to receive updates and additional data requested
by the user. The cached information is captured in a "directory capsule". The
directory capsule represents only the portion of the directory that is of
interest to the user in a given context and is sufficiently rich and consistent
to support the information needs of the user in disconnected mode. We define a
novel subscription model specifically geared for Web directories and for the
special needs of PDAs. This subscription model enables users to specify the
parts of the directory that are of interest to them as well as the preferred
granularity. We describe a mechanism for keeping the directory capsule in sync
over time with the Web directory and user subscription requests. Finally, we
present the Pocket Directory Browser for Palm powered computers that we have
developed. The pocket directory can be used to define, view and manipulate the
capsules that are stored on the Palm. We provide several usage examples of our
system on the Open Directory Project, one of the largest and most popular Web
directories.
[18]
Livemaps for collection awareness
/
Cohen, Doron
/
Jacovi, Michal
/
Maarek, Yoelle S.
/
Soroka, Vladimir
International Journal of Human-Computer Studies
2002
v.56
n.1
p.7-23
© Copyright 2002 Elsevier Science Publishers
Summary: With the increasing proliferation of chat applications on the web, the old
vision of "adding people" to the web is becoming a reality. Along with
collaboration tools, more and more sites offer people awareness mechanisms to
let the site visitors know about each other. This reflects the dual nature of
the web as a place for virtual meetings as well as an information repository.
While standalone chat tools became the killer application of the Internet,
site-related awareness applications did not quite catch on. In this work, we
suggest possible reasons for this phenomenon and propose a new paradigm for
awareness and social navigation. We identify three main obstacles to the
existing site-related awareness applications: high sensitivity to the "critical
mass" requirement, inflexible meeting place granularity and poor visitor
visibility. To address these issues, we extend the well-known "document
awareness" concept to a more general one that we call "collection awareness",
which better reflects the graph structure of the web. We introduce a new tool
for high-level awareness and collaboration, called Livemaps, which projects
live information onto a web site map. We demonstrate how Livemaps addresses the
obstacles we pointed out and describe a user study conducted on a "fan" web
site for the "Friends" comedy series, so as to verify whether Livemaps actually
improves social awareness.
[19]
Knowledge encapsulation for focused search from pervasive devices
/
Aridor, Yariv
/
Carmel, David
/
Maarek, Yoelle S.
/
Soffer, Aya
/
Lempel, Ronny
ACM Transactions on Information Systems
2002
v.20
n.1
p.25-46
Keywords: Focused searches, disconnected search, knowledge agents, pervasive devices
© Copyright 2002 ACM
Summary: Mobile knowledge seekers often need access to information on the Web during
a meeting or on the road, while away from their desktop. A common practice
today is to use pervasive devices such as Personal Digital Assistants or mobile
phones. However, these devices have inherent constraints (e.g., slow
communication, form factor) which often make information discovery tasks
impractical. In this paper, we present a new focused-search approach
specifically oriented for the mode of work and the constraints dictated by
pervasive devices. It combines focused search within specific topics with
encapsulation of topic-specific information in a persistent repository. One key
characteristic of these persistent repositories is that their footprint is
small enough to fit on local devices, and yet they are rich enough to support
many information discovery tasks in disconnected mode. More specifically, we
suggest a representation for topic-specific information based on
"knowledge-agent bases" that comprise all the information necessary to access
information about a topic (in the form of key concepts and key Web pages)
and assist in the full search process from query formulation assistance to
result scanning on the device itself. The key contribution of our work is the
coupling of focused search with encapsulated knowledge representation making
information discovery from pervasive devices practical as well as efficient. We
describe our model in detail and demonstrate its aspects through sample
scenarios.
[20]
Static index pruning for information retrieval systems
/
Carmel, David
/
Cohen, Doron
/
Fagin, Ronald
/
Farchi, Eitan
/
Herscovici, Michael
/
Maarek, Yoelle S.
/
Soffer, Aya
Proceedings of the 24th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2001-09-09
p.43-50
Summary: We introduce static index pruning methods that significantly reduce the
index size in information retrieval systems.
We investigate uniform and term-based methods that each remove selected
entries from the index and yet have only a minor effect on retrieval results.
In uniform pruning, there is a fixed cutoff threshold, and all index entries
whose contribution to relevance scores is bounded above by a given threshold
are removed from the index. In term-based pruning, the cutoff threshold is
determined for each term, and thus may vary from term to term. We give
experimental evidence that for each level of compression, term-based pruning
outperforms uniform pruning, under various measures of precision. We present
theoretical and experimental evidence that under our term-based pruning scheme,
it is possible to prune the index greatly and still get retrieval results that
are almost as good as those based on the full index.
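Illustration: the two pruning styles named in the summary above can be sketched
roughly as follows. This is a minimal sketch, not the authors' implementation.
The in-memory index layout, the function names, and the parameters k and
epsilon are all hypothetical, and deriving the per-term cutoff as a fraction of
the term's k-th highest score is only one assumed way to "determine the
threshold for each term".

```python
from typing import Dict, List, Tuple

# Hypothetical index layout: term -> list of (doc_id, score contribution).
Index = Dict[str, List[Tuple[int, float]]]


def uniform_prune(index: Index, threshold: float) -> Index:
    """Drop every entry whose score contribution is at most one fixed threshold."""
    return {
        term: [(doc, score) for doc, score in postings if score > threshold]
        for term, postings in index.items()
    }


def term_based_prune(index: Index, k: int, epsilon: float) -> Index:
    """Drop entries below a cutoff computed separately for each term.

    Assumption: the cutoff is epsilon times the term's k-th highest score.
    Terms with at most k entries are left untouched.
    """
    pruned: Index = {}
    for term, postings in index.items():
        scores = sorted((score for _, score in postings), reverse=True)
        if len(scores) <= k:
            pruned[term] = list(postings)
            continue
        cutoff = epsilon * scores[k - 1]
        pruned[term] = [(doc, score) for doc, score in postings if score >= cutoff]
    return pruned


example: Index = {
    "a": [(1, 0.9), (2, 0.5), (3, 0.1)],
    "b": [(1, 0.2), (4, 0.05)],
}
uniform_result = uniform_prune(example, threshold=0.15)
term_result = term_based_prune(example, k=2, epsilon=0.5)
```

In this toy index the uniform cutoff removes the low-scoring entry of the short
list for "b", while the per-term variant leaves that list intact because it
has no more than k entries.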
[21]
Knowledge encapsulation for focused search from pervasive devices
/
Aridor, Yariv
/
Carmel, David
/
Maarek, Yoëlle S.
/
Soffer, Aya
/
Lempel, Ronny
Proceedings of the 2001 International Conference on the World Wide Web
2001-05-01
p.754-764
Keywords: disconnected search, focused search, knowledge agents, pervasive devices
© Copyright 2001 Authors
[22]
Incremental Maintenance of Semantic Links in Dynamically Changing Hypertext
Systems
Articles
/
Kaplan, Simon M.
/
Maarek, Yoelle S.
Interacting with Computers
1990
v.2
n.3
p.337-366
Keywords: Incremental maintenance, Semantic links, Hypertext systems
© Copyright 1990 Butterworth-Heinemann Ltd.
Summary: One purported advantage of hypertext systems is the ability to move between
semantically related parts of a document (or family of documents). If the
document is undergoing frequent modification (for example while an author is
writing a book or while a software design stored in the hypertext system is
evolving) the question arises as to how to incrementally maintain semantic
interconnections in the face of the modifications.
The paper presents an optimal technique for the incremental maintenance of
such interconnections as a document evolves. The technique, based on theories
of information retrieval based on lexical affinities and theories of
incremental computation, updates semantic interconnections as nodes are checked
into the hypertext system (either new or as a result of an edit). Because we
use the semantic weight of lexical affinities to determine which affinities are
meaningful in the global context of the document, introducing a new affinity or
changing the weight of an existing affinity can potentially have an effect on
any node in the system. The challenge met by our algorithm is to guarantee
that despite this potentially arbitrary impact, we still update link
information optimally.
Once established, the semantic interconnections are used to allow the user to
move from node to node based not on rigid connections but instead on
dynamically determined semantic interrelationships among the nodes.
[23]
Full Text Indexing Based on Lexical Relations An Application: Software
Libraries
Representation
/
Maarek, Yoelle S.
/
Smadja, Frank A.
Proceedings of the Twelfth Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
1989-06-25
p.198-206
Keywords: Automatic indexing, Software libraries, software reuse, Lexical relations,
Natural language processing, Co-occurrence knowledge
© Copyright 1989 Association for Computing Machinery
Summary: In contrast to other kinds of libraries, software libraries need to be
conceptually organized. When looking for a component, the main concern of
users is the functionality of the desired component; implementation details are
secondary. Software reuse would be enhanced with conceptually organized large
libraries of software components. In this paper, we present GURU, a tool that
allows automatic building of such large software libraries from documented
software components. We focus here on GURU's indexing component which extracts
conceptual attributes from natural language documentation. This indexing
method is based on words' co-occurrences. It first uses EXTRACT, a
co-occurrence knowledge compiler for extracting potential attributes from
textual documents. Conceptually relevant collocations are then selected
according to their resolving power, which scales down the noise due to context
words. This fully automated indexing tool thus goes further than keyword-based
tools in the understanding of a document without the brittleness of
knowledge-based tools. The indexing component of GURU is fully implemented,
and some results are given in the paper.