HCI Bibliography : Search Results
Database updated: 2016-05-10 Searches since 2006-12-01: 32,646,475
director@hcibib.org
Hosted by ACM SIGCHI
The HCI Bibliography was moved to a new server 2015-05-12 and again 2016-01-05, substantially degrading the environment for making updates.
There are no plans to add to the database.
Please send questions or comments to director@hcibib.org.
Query: Maarek_Y* Results: 23 Sorted by: Date
[1] Rank by Time or by Relevance?: Revisiting Email Search Session 2B: Retrieval Algorithms / Carmel, David / Halawi, Guy / Lewin-Eytan, Liane / Maarek, Yoelle / Raviv, Ariel Proceedings of the 2015 ACM Conference on Information and Knowledge Management 2015-10-19 p.283-292
ACM Digital Library Link
Summary: With Web mail services offering larger and larger storage capacity, most users do not feel the need to systematically delete messages anymore and inboxes keep growing. It is quite surprising that in spite of the huge progress of relevance ranking in Web Search, mail search results are still typically ranked by date. This can probably be explained by the fact that users demand perfect recall in order to "re-find" a previously seen message, and would not trust relevance ranking. Yet mail search is still considered a difficult and frustrating task, especially when trying to locate older messages. In this paper, we study the current search traffic of Yahoo mail, a major Web commercial mail service, and discuss the limitations of ranking search results by date. We argue that this sort-by-date paradigm needs to be revisited in order to account for the specific structure and nature of mail messages, as well as the high-recall needs of users. We describe a two-phase ranking approach, in which the first phase is geared towards maximizing recall and the second phase follows a learning-to-rank approach that considers a rich set of mail-specific features to maintain precision. We present our results obtained on real mail search query traffic, for three different datasets, via manual as well as automatic evaluation. We demonstrate that the default time-driven ranking can be significantly improved in terms of both recall and precision, by taking into consideration time recency and textual similarity to the query, as well as mail-specific signals such as users' actions.
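A minimal sketch of the two-phase idea described in the summary above: a recall-oriented matching phase followed by a re-ranking phase that combines textual similarity, recency, and a user-action signal. Everything concrete here is an assumption made for illustration: the `Message` fields, the toy inbox, and the hand-set weights, which stand in for the learning-to-rank model that the paper trains and which this sketch does not reproduce.

```python
from dataclasses import dataclass


@dataclass
class Message:
    subject: str
    age_days: float      # time since the message arrived (hypothetical field)
    was_replied: bool    # toy stand-in for a user-action signal


def phase_one(query, messages):
    """Recall-oriented matching: keep any message sharing a term with the query."""
    terms = set(query.lower().split())
    return [m for m in messages if terms & set(m.subject.lower().split())]


def score(query, m, weights=(0.5, 0.3, 0.2)):
    """Hand-set linear score; a real system would learn these weights."""
    terms = set(query.lower().split())
    words = set(m.subject.lower().split())
    similarity = len(terms & words) / len(terms)
    recency = 1.0 / (1.0 + m.age_days)
    action = 1.0 if m.was_replied else 0.0
    w_sim, w_rec, w_act = weights
    return w_sim * similarity + w_rec * recency + w_act * action


def search(query, messages):
    """Phase one selects candidates; phase two orders them by score."""
    candidates = phase_one(query, messages)
    return sorted(candidates, key=lambda m: score(query, m), reverse=True)


inbox = [
    Message("flight itinerary paris", age_days=200, was_replied=True),
    Message("paris newsletter", age_days=1, was_replied=False),
    Message("team lunch", age_days=0, was_replied=False),
]
for m in search("flight paris", inbox):
    print(m.subject)
```

In this toy inbox the older itinerary is ranked above the newer newsletter, whereas a pure sort by date would order them the other way round.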

[2] You Will Get Mail! Predicting the Arrival of Future Email TempWeb 2015 / Gamzu, Iftah / Karnin, Zohar / Maarek, Yoelle / Wajc, David Companion Proceedings of the 2015 International Conference on the World Wide Web 2015-05-18 v.2 p.1327-1332
ACM Digital Library Link
Summary: The majority of Web email is known to be generated by machines even when one excludes spam. Many machine-generated email messages such as invoices or travel itineraries are critical to users. Recent research studies establish that causality relations between certain types of machine-generated email messages exist and can be mined. These relations exhibit a link between a given message and a past message that gave rise to its creation. For example, a shipment notification message can often be linked to a past online purchase message. Instead of studying how an incoming message can be linked to the past, we propose here to focus on predicting future email arrival as implied by causality relations. Such a prediction method has several potential applications, ranging from improved ad targeting in upsell scenarios to reducing false positives in spam detection.
    We introduce a novel approach for predicting which types of machine-generated email messages, represented by so-called "email templates", a user should receive in future time windows. Our prediction approach relies on (1) statistically inferring causality relations between email templates, (2) building a generative model that explains the inbox of each user using those causality relations, and (3) combining those results to predict which email templates are likely to appear in future time frames. We present preliminary experimental results and some data insights obtained by analyzing several million inboxes of Yahoo Mail users, who voluntarily opted-in for such research.

[3] How Many Folders Do You Really Need?: Classifying Email into a Handful of Categories KM Session 10: Text Data Mining I / Grbovic, Mihajlo / Halawi, Guy / Karnin, Zohar / Maarek, Yoelle Proceedings of the 2014 ACM Conference on Information and Knowledge Management 2014-11-03 p.869-878
ACM Digital Library Link
Summary: Email classification is still a mostly manual task. Consequently, most Web mail users never define a single folder. Recently however, automatic classification offering the same categories to all users has started to appear in some Web mail clients, such as AOL or Gmail. We adopt this approach, rather than previous (unsuccessful) personalized approaches because of the change in the nature of consumer email traffic, which is now dominated by (non-spam) machine-generated email. We propose here a novel approach for (1) automatically distinguishing between personal and machine-generated email and (2) classifying messages into latent categories, without requiring users to have defined any folder. We report how we have discovered that a set of 6 "latent" categories (one for human- and the others for machine-generated messages) can explain a significant portion of email traffic. We describe in detail the steps involved in building a Web-scale email categorization system, from the collection of ground-truth labels, the selection of features to the training of models. Experimental evaluation was performed on more than 500 billion messages received during a period of six months by users of Yahoo mail service, who elected to be part of such research studies. Our system achieved precision and recall rates close to 90% and the latent categories we discovered were shown to cover 70% of both email traffic and email search queries. We believe that these results pave the way for a change of approach in the Web mail industry, and could support the invention of new large-scale email discovery paradigms that had not been possible before.

[4] When machines dominate humans: the challenges of mining and consuming machine-generated web mail WWW 2014 industry track / Maarek, Yoelle Companion Proceedings of the 2014 International Conference on the World Wide Web 2014-04-07 v.2 p.605-606
ACM Digital Library Link
Summary: In spite of personal communications moving more and more towards social and mobile, especially with younger generations, email traffic continues to grow. This growth is mostly attributed to (non-spam) machine-generated email, which, against common perception, is often extremely valuable. Indeed, together with monthly newsletters that can easily be ignored, inboxes contain flight itineraries, booking confirmations, receipts or invoices that are critical to many users. In this talk, I will discuss the new nature of consumer email, which is dominated by machine-generated messages of highly heterogeneous forms and value. I will show how the change has not been fully recognized yet by most email clients (as an example, why should there still be a reply option associated with a message coming from a "do-not-reply@" address?). I will introduce some approaches for large-scale mail mining specifically tailored to machine-generated email. I will conclude by discussing possible applications and research directions.

[5] From query to question in one click: suggesting synthetic questions to searchers Research papers / Dror, Gideon / Maarek, Yoelle / Mejer, Avihai / Szpektor, Idan Proceedings of the 2013 International Conference on the World Wide Web 2013-05-13 v.1 p.391-402
ACM Digital Library Link
Summary: In Web search, users may remain unsatisfied for several reasons: the search engine may not be effective enough or the query might not reflect their intent. Years of research focused on providing the best user experience for the data available to the search engine. However, little has been done to address the cases in which relevant content for the specific user need has not been posted on the Web yet. One obvious solution is to directly ask other users to generate the missing content using Community Question Answering services such as Yahoo! Answers or Baidu Zhidao. However, formulating a full-fledged question after having issued a query requires some effort. Some previous work proposed to automatically generate natural language questions from a given query, but not for scenarios in which a searcher is presented with a list of questions to choose from. We propose here to generate synthetic questions that can actually be clicked by the searcher so as to be directly posted as questions on a Community Question Answering service. This imposes new constraints, as questions will be actually shown to searchers, who will not appreciate an awkward style or redundancy. To this end, we introduce a learning-based approach that improves not only the relevance of the suggested questions to the original query, but also their grammatical correctness. In addition, since queries are often underspecified and ambiguous, we put a special emphasis on increasing the diversity of suggestions via a novel diversification mechanism. We conducted several experiments to evaluate our approach by comparing it to prior work. The experiments show that our algorithm improves question quality by 14% over prior work and that adding diversification reduced redundancy by 55%.

[6] When relevance is not enough: promoting diversity and freshness in personalized question recommendation Research papers / Szpektor, Idan / Maarek, Yoelle / Pelleg, Dan Proceedings of the 2013 International Conference on the World Wide Web 2013-05-13 v.1 p.1249-1260
ACM Digital Library Link
Summary: What makes a good question recommendation system for community question-answering sites? First, to maintain the health of the ecosystem, it needs to be designed around answerers, rather than exclusively for askers. Next, it needs to scale to many questions and users, and be fast enough to route a newly-posted question to potential answerers within the few minutes before the asker's patience runs out. It also needs to show each answerer questions that are relevant to his or her interests. We have designed and built such a system for Yahoo! Answers, but realized, when testing it with live users, that it was not enough.
    We found that those drawing-board requirements fail to capture users' interests. The feature that they really missed was diversity. In other words, showing them just the main topics they had previously expressed interest in was simply too dull. Adding the spice of topics slightly outside the core of their past activities significantly improved engagement. We conducted a large-scale online experiment in production in Yahoo! Answers that showed that recommendations driven by relevance alone perform worse than a control group without question recommendations, which is the current behavior. However, an algorithm promoting both diversity and freshness improved the number of answers by 17%, daily session length by 10%, and had a significant positive impact on peripheral activities such as voting.

[7] When web search fails, searchers become askers: understanding the transition Community QA / Liu, Qiaoling / Agichtein, Eugene / Dror, Gideon / Maarek, Yoelle / Szpektor, Idan Proceedings of the 35th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2012-08-12 p.801-810
ACM Digital Library Link
Summary: While Web search has become increasingly effective over the last decade, for many users' needs the required answers may be spread across many documents, or may not exist on the Web at all. Yet, many of these needs could be addressed by asking people via popular Community Question Answering (CQA) services, such as Baidu Knows, Quora, or Yahoo! Answers. In this paper, we perform the first large-scale analysis of how searchers become askers. For this, we study the logs of a major web search engine to trace the transformation of a large number of failed searches into questions posted on a popular CQA site. Specifically, we analyze the characteristics of the queries, and of the patterns of search behavior that precede posting a question; the relationship between the content of the attempted queries and of the posted questions; and the subsequent actions the user performs on the CQA site. Our work develops novel insights into searcher intent and behavior that lead to asking questions to the community, providing a foundation for more effective integration of automated web search and social information seeking.

[8] (Big) usage data in web search Tutorial presentations / Baeza-Yates, Ricardo / Maarek, Yoelle Proceedings of the 35th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2012-08-12 p.1181-1182
ACM Digital Library Link

[9] Learning from the past: answering new questions with past answers Leveraging user-generated content / Shtok, Anna / Dror, Gideon / Maarek, Yoelle / Szpektor, Idan Proceedings of the 2012 International Conference on the World Wide Web 2012-04-16 v.1 p.759-768
ACM Digital Library Link
Summary: Community-based Question Answering sites, such as Yahoo! Answers or Baidu Zhidao, allow users to get answers to complex, detailed and personal questions from other users. However, since answering a question depends on the ability and willingness of users to address the asker's needs, a significant fraction of the questions remain unanswered. We measured that in Yahoo! Answers, this fraction represents 15% of all incoming English questions. At the same time, we discovered that around 25% of questions in certain categories are recurrent, at least at the question-title level, over a period of one year.
    We attempt to reduce the rate of unanswered questions in Yahoo! Answers by reusing the large repository of past resolved questions, openly available on the site. More specifically, we estimate the probability whether certain new questions can be satisfactorily answered by a best answer from the past, using a statistical model specifically trained for this task. We leverage concepts and methods from query-performance prediction and natural language processing in order to extract a wide range of features for our model. The key challenge here is to achieve a level of quality similar to the one provided by the best human answerers.
    We evaluated our algorithm on offline data extracted from Yahoo! Answers, but more interestingly, also on online data by using three "live" answering robots that automatically provide past answers to new questions when a certain degree of confidence is reached. We report the success rate of these robots in three active Yahoo! Answers categories in terms of accuracy, coverage and askers' satisfaction. This work presents a first attempt, to the best of our knowledge, of automatic question answering to questions of social nature, by reusing past answers of high quality.

[10] Predicting web searcher satisfaction with existing community-based answers Communities / Liu, Qiaoling / Agichtein, Eugene / Dror, Gideon / Gabrilovich, Evgeniy / Maarek, Yoelle / Pelleg, Dan / Szpektor, Idan Proceedings of the 34th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2011-07-25 p.415-424
ACM Digital Library Link
Summary: Community-based Question Answering (CQA) sites, such as Yahoo! Answers, Baidu Knows, Naver, and Quora, have been rapidly growing in popularity. The resulting archives of posted answers to questions, in Yahoo! Answers alone, already exceed in size 1 billion, and are aggressively indexed by web search engines. In fact, a large number of search engine users benefit from these archives, by finding existing answers that address their own queries. This scenario poses new challenges and opportunities for both search engines and CQA sites. To this end, we formulate a new problem of predicting the satisfaction of web searchers with CQA answers. We analyze a large number of web searches that result in a visit to a popular CQA site, and identify unique characteristics of searcher satisfaction in this setting, namely, the effects of query clarity, query-to-question match, and answer quality. We then propose and evaluate several approaches to predicting searcher satisfaction that exploit these characteristics. To the best of our knowledge, this is the first attempt to predict and validate the usefulness of CQA archives for external searchers, rather than for the original askers. Our results suggest promising directions for improving and exploiting community question answering services in pursuit of satisfying even more Web search queries.

[11] Web retrieval: the role of users Tutorials / Baeza-Yates, Ricardo / Maarek, Yoelle Proceedings of the 34th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2011-07-25 p.1303-1304
ACM Digital Library Link
Summary: Web retrieval methods have evolved through three major steps in the last decade or so. They started from standard document-centric IR in the early days of the Web, then made a major step forward by leveraging the structure of the Web, using link analysis techniques in both crawling and ranking challenges. A more recent, no less important but maybe more discreet step forward, has been to enter the user in this equation in two ways: (1) Implicitly, through the analysis of usage data captured by query logs, and session and click information in general; the goal here being to improve ranking as well as to measure users' happiness and engagement; (2) Explicitly, by offering novel interactive features; the goal here being to better answer users' needs. This half-day tutorial covers the user-related challenges associated with the implicit and explicit role of users in Web retrieval. More specifically, we review and discuss challenges associated with:
  • Usage data analysis and metrics -- It is critical to monitor how users take advantage of and interact with Web retrieval systems, as this implicit relevance feedback, aggregated at a large scale, can provide insights on users' underlying intent as well as approximate quite accurately the level of success of a given feature. Here we have to consider not only click statistics but also the sequences of queries, the time spent on a page, the number of actions per session, etc. This is the focus of the first part of the tutorial.
  • User interaction -- Given the intrinsic problems posed by the Web, the key challenge for the user is to conceive a good query to be submitted to the search system, one that leads to a manageable and relevant answer. The retrieval system must assist users during two key stages of interaction: before the query is fully expressed and after the results are returned. After quite some stagnation on the front-end of Web retrieval, we have seen numerous novel interactive features appear in the last 3 to 4 years, as the leading commercial search engines seem to compete for users' attention. The second part of the tutorial will be dedicated to explicit user interaction. We will introduce novel material (as compared to previous versions of this tutorial that were given at SIGIR'2010, WSDM'2011 and ECIR'2011) in order to reflect recent Web search features such as Google Instant or Yahoo! Direct Search.
The goal of this tutorial is to teach the key principles and technologies behind the activities and challenges briefly outlined above, bring new understanding and insights to the attendees, and hopefully foster future research. A previous version of this tutorial was offered at the ACM SIGIR'2010, WSDM'2011 and ECIR'2011.

[12] Improving recommendation for long-tail queries via templates Recommendation / Szpektor, Idan / Gionis, Aristides / Maarek, Yoelle Proceedings of the 2011 International Conference on the World Wide Web 2011-03-28 v.1 p.47-56
ACM Digital Library Link
Summary: The ability to aggregate huge volumes of queries over a large population of users allows search engines to build precise models for a variety of query-assistance features such as query recommendation, correction, etc. Yet, no matter how much data is aggregated, the long-tail distribution implies that a large fraction of queries are rare. As a result, most query assistance services perform poorly or are not even triggered on long-tail queries. We propose a method to extend the reach of query assistance techniques (and in particular query recommendation) to long-tail queries by reasoning about rules between query templates rather than individual query transitions, as currently done in query-flow graph models. As a simple example, if we recognize that 'Montezuma' is a city in the rare query "Montezuma surf" and if the rule '<city> surf → <city> beach' has been observed, we are able to offer "Montezuma beach" as a recommendation, even if the two queries were never observed in the same session. We conducted experiments to validate our hypothesis, first via traditional small-scale editorial assessments but more interestingly via a novel automated large-scale evaluation methodology. Our experiments show that general coverage can be relatively increased by 24% using templates without penalizing quality. Furthermore, for 36% of the 95M queries in our query flow graph, which have no out edges and thus could not be served recommendations, we can now offer at least one recommendation in 98% of the cases.
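The "Montezuma surf" example in the summary above can be sketched in a few lines. This is only an illustration of reasoning over query templates, not the authors' system: the entity dictionary, the `<city>` placeholder notation, and the single template rule are hypothetical toy data, whereas the paper derives its rules from a query-flow graph.

```python
# Hypothetical entity dictionary mapping known tokens to an entity type.
ENTITY_TYPES = {"montezuma": "city", "malibu": "city"}

# Hypothetical template rules, standing in for rules mined from query logs.
TEMPLATE_RULES = {"<city> surf": ["<city> beach"]}


def to_templates(query):
    """Yield (template, entity, placeholder) for each typed token in the query."""
    tokens = query.lower().split()
    for i, tok in enumerate(tokens):
        etype = ENTITY_TYPES.get(tok)
        if etype is not None:
            placeholder = f"<{etype}>"
            template = " ".join(tokens[:i] + [placeholder] + tokens[i + 1:])
            yield template, tok, placeholder


def recommend(query):
    """Instantiate matching template rules with the query's own entity."""
    suggestions = []
    for template, entity, placeholder in to_templates(query):
        for target in TEMPLATE_RULES.get(template, []):
            suggestions.append(target.replace(placeholder, entity))
    return suggestions


print(recommend("Montezuma surf"))
```

Because the rule is stored at the template level, the rare query receives a suggestion even though no transition from it was ever recorded; a query with no recognized entity simply yields no suggestions.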

[13] Web retrieval: the role of users Tutorials / Baeza-Yates, Ricardo / Maarek, Yoelle Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2010-07-19 p.907
Keywords: user interaction, web retrieval
ACM Digital Library Link
Summary: Web retrieval methods have evolved through three major steps in the last decade or so. They started from standard document-centric IR in the early days of the Web, then made a major step forward by leveraging the structure of the Web, using link analysis techniques in both crawling and ranking challenges. A more recent, no less important but maybe more discreet step forward, has been to enter the user in this equation in two ways: (1) implicitly, through the analysis of usage data captured by query logs, and session and click information in general, the goal being to improve ranking as well as to measure users' happiness and engagement; (2) explicitly, by offering novel interactive features; the goal here being to better answer users' needs. In this tutorial, we will cover the user-related challenges associated with the implicit and explicit role of users in Web retrieval. We will review and discuss challenges associated with two types of activities, namely: usage data analysis and metrics and user interaction. The goal of this tutorial is to teach the key principles and technologies behind the activities and challenges briefly outlined above, bring new understanding and insights to the attendees, and hopefully foster future research.

[14] Do you want to take notes?: identifying research missions in Yahoo! search pad Full papers / Donato, Debora / Bonchi, Francesco / Chi, Tom / Maarek, Yoelle Proceedings of the 2010 International Conference on the World Wide Web 2010-04-26 v.1 p.321-330
Keywords: persistent search, query logs, sessions, task-oriented search
ACM Digital Library Link
Summary: Addressing users' information needs has been one of the main goals of Web search engines since their early days. In some cases, users cannot see their needs immediately answered by search results, simply because these needs are too complex and involve multiple aspects that are not covered by a single Web or search results page. This typically happens when users investigate a certain topic in domains such as education, travel or health, which often require collecting facts and information from many pages. We refer to this type of activity as "research missions". These research missions account for 10% of users' sessions and more than 25% of all query volume, as verified by a manual analysis that was conducted by Yahoo! editors.
    We demonstrate in this paper that such missions can be automatically identified on-the-fly, as the user interacts with the search engine, through careful runtime analysis of query flows and query sessions.
    The on-the-fly automatic identification of research missions has been implemented in Search Pad, a novel Yahoo! application that was launched in 2009, and that we present in this paper. Search Pad helps users keep track of results they have consulted. Its novelty however is that unlike previous note-taking products, it is automatically triggered only when the system decides, with a fair level of confidence, that the user is undertaking a research mission and thus is in the right context for gathering notes. Beyond the Search Pad specific application, we believe that changing the level of granularity of query modeling, from an isolated query to a list of queries pertaining to the same research missions, so as to better reflect a certain type of information needs, can be beneficial in a number of other Web search applications. Session-awareness is growing and it is likely to play, in the near future, a fundamental role in many on-line tasks: this paper presents a first step on this path.

[15] Current trends in the integration of searching and browsing Panels / Broder, Andrei Z. / Maarek, Yoelle S. / Bharat, Krishna / Dumais, Susan / Papa, Steve / Pedersen, Jan / Raghavan, Prabhakar Proceedings of the 2005 International Conference on the World Wide Web 2005-05-10 v.2 p.793
ACM Digital Library Link
Summary: Searching and browsing have been the two basic information discovery paradigms since the early days of the Web. More than ten years down the road, three schools seem to have emerged: (1) the search-centric school argues that guided navigation is superfluous, since free-form search has become so good and the search UI so common that users can satisfy all their needs via simple queries; (2) the taxonomy navigation school claims that users have difficulties expressing informational needs; and (3) the meta-data centric school advocates the use of meta-data for narrowing large sets of results, and is successful in e-commerce where it is known as "multi faceted search". This panel brings together experts and advocates for all three schools, who will discuss these approaches and share their experiences in the field. We will ask the audience to challenge our experts with real information architecture problems.

[16] Searching XML documents via XML fragments Structured documents / Carmel, David / Maarek, Yoelle S. / Mandelbrod, Matan / Mass, Yosi / Soffer, Aya Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2003-07-28 p.151-158
ACM Digital Library Link
Summary: Most of the work on XML query and search has stemmed from the publishing and database communities, mostly for the needs of business applications. Recently, the Information Retrieval community began investigating the XML search issue to answer information discovery needs. Following this trend, we present here an approach where information needs can be expressed in an approximate manner as pieces of XML documents or "XML fragments" of the same nature as the documents that are being searched. We present an extension of the vector space model for searching XML collections via XML fragments and ranking results by relevance. We describe how we have extended a full-text search engine to comply with this model. The value of the proposed method is demonstrated by the relatively high precision of our system, which was among the top performers in the recent INEX workshop. Our results indicate that certain queries are more appropriate than others for the extended vector space model. Specifically, queries with relatively specific contexts but vague information needs are best situated to reap the benefit of this model. Finally, our results show that one method may not fit all types of queries and that it could be worthwhile to use different solutions for different applications.

[17] Personalized pocket directories for mobile devices Mobility and Wireless Access / Cohen, Doron / Herscovici, Michael / Petruschka, Yael / Maarek, Yoëlle S. / Soffer, Aya Proceedings of the 2002 International Conference on the World Wide Web 2002-05-07 p.627-638
Keywords: hierarchical browsers, mobile devices, mobile search, personalization
ACM Digital Library Link
Summary: In spite of the increase in the availability of mobile devices in the last few years, Web information is not yet as accessible from PDAs or WAP phones as it is from the desktop. In this paper, we propose a solution for supporting one of the most popular information discovery mechanisms, namely Web directory navigation, from mobile devices. Our proposed solution consists of caching enough information on the device itself in order to conduct most of the navigation actions locally (with subsecond response time) while intermittently communicating with the server to receive updates and additional data requested by the user. The cached information is captured in a "directory capsule". The directory capsule represents only the portion of the directory that is of interest to the user in a given context and is sufficiently rich and consistent to support the information needs of the user in disconnected mode. We define a novel subscription model specifically geared for Web directories and for the special needs of PDAs. This subscription model enables users to specify the parts of the directory that are of interest to them as well as the preferred granularity. We describe a mechanism for keeping the directory capsule in sync over time with the Web directory and user subscription requests. Finally, we present the Pocket Directory Browser for Palm powered computers that we have developed. The pocket directory can be used to define, view and manipulate the capsules that are stored on the Palm. We provide several usage examples of our system on the Open Directory Project, one of the largest and most popular Web directories.

[18] Livemaps for collection awareness / Cohen, Doron / Jacovi, Michal / Maarek, Yoelle S. / Soroka, Vladimir International Journal of Human-Computer Studies 2002 v.56 n.1 p.7-23
Summary: With the increasing proliferation of chat applications on the web, the old vision of "adding people" to the web is becoming a reality. Along with collaboration tools, more and more sites offer people awareness mechanisms to let the site visitors know about each other. This reflects the dual nature of the web as a place for virtual meetings as well as an information repository. While standalone chat tools became the killer application of the Internet, site-related awareness applications did not quite catch on. In this work, we suggest possible reasons for this phenomenon and propose a new paradigm for awareness and social navigation. We identify three main obstacles to the existing site-related awareness applications: high sensitivity to the "critical mass" requirement, inflexible meeting place granularity and poor visitor visibility. To address these issues, we extend the well-known "document awareness" concept to a more general one that we call "collection awareness", which better reflects the graph structure of the web. We introduce a new tool for high-level awareness and collaboration, called Livemaps, which projects live information onto a web site map. We demonstrate how Livemaps addresses the obstacles we pointed out and describe a user study conducted on a "fan" web site for the "Friends" comedy series, so as to verify whether Livemaps actually improves social awareness.

[19] Knowledge encapsulation for focused search from pervasive devices / Aridor, Yariv / Carmel, David / Maarek, Yoelle S. / Soffer, Aya / Lempel, Ronny ACM Transactions on Information Systems 2002 v.20 n.1 p.25-46
Keywords: Focused searches, disconnected search, knowledge agents, pervasive devices
ACM Digital Library Link
Summary: Mobile knowledge seekers often need access to information on the Web during a meeting or on the road, while away from their desktop. A common practice today is to use pervasive devices such as Personal Digital Assistants or mobile phones. However, these devices have inherent constraints (e.g., slow communication, form factor) which often make information discovery tasks impractical. In this paper, we present a new focused-search approach specifically oriented for the mode of work and the constraints dictated by pervasive devices. It combines focused search within specific topics with encapsulation of topic-specific information in a persistent repository. One key characteristic of these persistent repositories is that their footprint is small enough to fit on local devices, and yet they are rich enough to support many information discovery tasks in disconnected mode. More specifically, we suggest a representation for topic-specific information based on "knowledge-agent bases" that comprise all the information necessary to access information about a topic (under the form of key concepts and key Web pages) and assist in the full search process from query formulation assistance to result scanning on the device itself. The key contribution of our work is the coupling of focused search with encapsulated knowledge representation making information discovery from pervasive devices practical as well as efficient. We describe our model in detail and demonstrate its aspects through sample scenarios.

[20] Static index pruning for information retrieval systems / Carmel, David / Cohen, Doron / Fagin, Ronald / Farchi, Eitan / Herscovici, Michael / Maarek, Yoelle S. / Soffer, Aya Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2001-09-09 p.43-50
ACM Digital Library Link
Summary: We introduce static index pruning methods that significantly reduce the index size in information retrieval systems.
    We investigate uniform and term-based methods that each remove selected entries from the index and yet have only a minor effect on retrieval results. In uniform pruning, there is a fixed cutoff threshold, and all index entries whose contribution to relevance scores is bounded above by a given threshold are removed from the index. In term-based pruning, the cutoff threshold is determined for each term, and thus may vary from term to term. We give experimental evidence that for each level of compression, term-based pruning outperforms uniform pruning, under various measures of precision. We present theoretical and experimental evidence that under our term-based pruning scheme, it is possible to prune the index greatly and still get retrieval results that are almost as good as those based on the full index.
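
    The difference between the two pruning styles described in this summary can be illustrated with a minimal sketch. It is not the authors' implementation: the index layout (term -> {document: score contribution}), the function names, and the per-term cutoff rule (a fraction `epsilon` of the term's k-th highest score) are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical index layout: term -> {doc_id: contribution to the relevance score}
Index = Dict[str, Dict[str, float]]


def uniform_prune(index: Index, threshold: float) -> Index:
    """Drop every entry whose score is at or below one global threshold."""
    return {
        term: {doc: s for doc, s in postings.items() if s > threshold}
        for term, postings in index.items()
    }


def term_based_prune(index: Index, k: int, epsilon: float) -> Index:
    """Drop entries below a cutoff computed separately for each term.

    Assumed rule: cutoff = epsilon * (k-th highest score of the term).
    Terms with fewer than k entries are left untouched.
    """
    pruned: Index = {}
    for term, postings in index.items():
        scores = sorted(postings.values(), reverse=True)
        if len(scores) < k:
            pruned[term] = dict(postings)
            continue
        cutoff = epsilon * scores[k - 1]
        # ">=" guarantees the top-k entries of each term always survive.
        pruned[term] = {doc: s for doc, s in postings.items() if s >= cutoff}
    return pruned
```

    With a fixed global threshold, a term whose entries all score low can lose its whole posting list, whereas a per-term cutoff adapts to each term's own score range.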

[21] Knowledge encapsulation for focused search from pervasive devices / Aridor, Yariv / Carmel, David / Maarek, Yoëlle S. / Soffer, Aya / Lempel, Ronny Proceedings of the 2001 International Conference on the World Wide Web 2001-05-01 p.754-764
Keywords: disconnected search, focused search, knowledge agents, pervasive devices
ACM Digital Library Link

[22] Incremental Maintenance of Semantic Links in Dynamically Changing Hypertext Systems Articles / Kaplan, Simon M. / Maarek, Yoelle S. Interacting with Computers 1990 v.2 n.3 p.337-366
Keywords: Incremental maintenance, Semantic links, Hypertext systems
Summary: One purported advantage of hypertext systems is the ability to move between semantically related parts of a document (or family of documents). If the document is undergoing frequent modification (for example while an author is writing a book or while a software design stored in the hypertext system is evolving) the question arises as to how to incrementally maintain semantic interconnections in the face of the modifications.
    The paper presents an optimal technique for the incremental maintenance of such interconnections as a document evolves. The technique, based on theories of information retrieval based on lexical affinities and theories of incremental computation, updates semantic interconnections as nodes are checked into the hypertext system (either new or as a result of an edit). Because we use the semantic weight of lexical affinities to determine which affinities are meaningful in the global context of the document, introducing a new affinity or changing the weight of an existing affinity can potentially have an effect on any node in the system. The challenge met by our algorithm is to guarantee that despite this potentially arbitrary impact, we still update link information optimally.
    Once established, the semantic interconnections are used to allow the user to move from node to node based not on rigid connections but instead on dynamically determined semantic interrelationships among the nodes.

[23] Full Text Indexing Based on Lexical Relations An Application: Software Libraries Representation / Maarek, Yoelle S. / Smadja, Frank A. Proceedings of the Twelfth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1989-06-25 p.198-206
Keywords: Automatic indexing, Software libraries, software reuse, Lexical relations, Natural language processing, Co-occurrence knowledge
Summary: In contrast to other kinds of libraries, software libraries need to be conceptually organized. When looking for a component, the main concern of users is the functionality of the desired component; implementation details are secondary. Software reuse would be enhanced with conceptually organized large libraries of software components. In this paper, we present GURU, a tool that allows automatic building of such large software libraries from documented software components. We focus here on GURU's indexing component which extracts conceptual attributes from natural language documentation. This indexing method is based on words' co-occurrences. It first uses EXTRACT, a co-occurrence knowledge compiler for extracting potential attributes from textual documents. Conceptually relevant collocations are then selected according to their resolving power, which scales down the noise due to context words. This fully automated indexing tool thus goes further than keyword-based tools in the understanding of a document without the brittleness of knowledge-based tools. The indexing component of GURU is fully implemented, and some results are given in the paper.
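
    As a rough illustration of indexing by word co-occurrence, the sketch below counts unordered word pairs that appear close together in a token list. It does not reproduce EXTRACT or GURU: the function name, the sliding-window definition of co-occurrence, and the default window size are assumptions, and the selection by resolving power mentioned in the summary is not modeled.

```python
from collections import Counter
from typing import List


def cooccurrence_counts(tokens: List[str], window: int = 5) -> Counter:
    """Count unordered pairs of distinct words at most `window` positions apart."""
    counts: Counter = Counter()
    for i, left in enumerate(tokens):
        # Look only forward so each pair of positions is counted once.
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts
```

    For example, calling `cooccurrence_counts("copy file to directory then copy file".split(), window=1)` counts the pair ("copy", "file") twice, which makes it a stronger candidate attribute than pairs seen only once.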