
Proceedings of ECIR'11, the 2011 European Conference on Information Retrieval

Fullname: ECIR 2011: Advances in Information Retrieval: 33rd European Conference on IR Research
Editors: Paul Clough; Colum Foley; Cathal Gurrin; Gareth J. F. Jones; Wessel Kraaij; Hyowon Lee; Vanessa Murdock
Location: Dublin, Ireland
Dates: 2011-Apr-18 to 2011-Apr-21
Publisher: Springer Berlin Heidelberg
Series: Lecture Notes in Computer Science 6611
Standard No: DOI: 10.1007/978-3-642-20161-5; hcibib: ECIR11; ISBN: 978-3-642-20160-8 (print), 978-3-642-20161-5 (online)
Papers: 95
Pages: 774
Links: Online Proceedings | Conference Home Page
  1. Keynote Talks
  2. Text Categorisation (I)
  3. Recommender Systems
  4. Web IR (I)
  5. IR Evaluation
  6. IR for Social Networks (I)
  7. Cross-Language IR
  8. IR Theory (I)
  9. Text Categorisation (II)
  10. Multimedia IR
  11. IR for Social Networks (II)
  12. IR Applications
  13. Text Categorisation (III)
  14. IR for Social Networks (III)
  15. Web IR (II)
  16. IR Theory (II)
  17. Interactive IR
  18. Question Answering / NLP
  19. Posters
  20. Demonstrations

Keynote Talks

IR Research: Systems, Interaction, Evaluation and Theories BIBAFull-Text 1-3
  Kalervo Järvelin
The ultimate goal of information retrieval (IR) research is to create ways of supporting humans in accessing information so that they can better carry out their (work) tasks. Because of this, IR research has a primarily technological interest in knowledge creation -- how to find information (better)? IR research therefore has a constructive aspect (to create novel systems) and an evaluative aspect (are they any good?). Evaluation is sometimes referred to as a hallmark and distinctive feature of IR research. No claim about IR system performance is granted any merit unless proven through evaluation. Technological innovation alone is not sufficient. In fact, much research in IR deals with IR evaluation and its methodology.
   Evaluation, in general, is the systematic determination of merit and significance of something using criteria against some standards. Evaluation therefore requires some object that is evaluated and some goal that should be achieved or served. In IR, both can be set in many ways. The object usually is an IR system -- but what is an IR system? The goal is typically the quality of the retrieved result -- but what is the retrieved result and how does one measure quality? These questions can be answered in alternative ways leading to different kinds of IR evaluation.
   Practical life with all its variability is difficult and expensive to investigate. Therefore surrogate and more easily measurable goals are employed in IR evaluation, typically the quality of the ranked result list instead of the work task result. The task performance process may also be cut down from a work task to a search task and down to running an individual query in a test collection. This simplification has led to standardization of research designs and tremendous success in IR research. However, as the goals and systems drift farther away from the practical life condition, one needs to ask whether the findings still best serve the initial goal of evaluation (supporting human performance). If means (outputs) replace ends (outcomes), one runs the risk of sub-optimization.
   It is important to evaluate all subsystems of information retrieval processes, in addition to the search engines. Through a wider perspective one may be able to put the subsystems and their contributions in relation to each other. We will discuss nested IR evaluation frameworks ranging from IR-system-centered evaluation to work-task-based evaluation. We will also point to the Pandora's box of problems that the enlargement of the scope of research entails. Is science at risk here?
   The contributions of a research area, in addition to constructive and evaluative contributions, may be generally empirical, theoretical and methodological. Why should anyone in IR care about anything beyond IR experimentation (i.e. evaluation) using test collections? The Cranfield model seeks to relate texts (documents), queries, their representations and matching to topical relevance in ranked output. Who relates this, and a range of possible other contributing factors, to outcomes in search task performance or work task performance? The talk will outline some possibilities for descriptive, explanatory and theoretical research in IR. As an example of descriptive research, we will discuss information access in task processes. Regarding explanatory and theoretical research, we look at unit theories that connect work task stages and properties to information need properties, information sources, and searching. Such studies do not solve a technical problem, nor evaluate any particular technique, and may therefore be considered impractical. However, they may identify mechanisms that mediate between IR processes and task outcomes and position factors in the processes of information access into a balanced perspective. Therefore they may help focus research efforts on technical problems or evaluation.
Ad Retrieval Systems in vitro and in vivo: Knowledge-Based Approaches to Computational Advertising BIBAFull-Text 4-5
  Evgeniy Gabrilovich
Over the past decade, online advertising has become the principal economic force behind many an Internet service, from major search engines to globe-spanning social networks to blogs. There is often a tension between online advertising and user experience, but on the other hand, advertising revenue enables a myriad of free Web services to the public and fosters a great deal of innovation. Matching the advertisers' message to a receptive and interested audience benefits both sides; indeed, literally hundreds of millions of users occasionally click on the ads, hence by current IR evaluation principles the ads should be considered relevant to the users' information needs. The utility of ads can be better explained by considering advertising as a medium of information [2,3]. Similarly to aggregated search [1], which enhances users' Web search experience with relevant news, local results, user-generated content, or multimedia, online advertising provides another rich source of content. This source, however, is in a complexity class of its own, due to the brevity of bid phrases, ad text being optimized for presentation rather than indexing, and multiple, possibly contradictory utility functions.
   A new scientific sub-discipline -- Computational Advertising -- has recently emerged, which strives to make online advertising integral to the user experience and relevant to the users' information needs, as well as economically worthwhile to the advertiser and the publisher. In this talk we discuss the unique algorithmic challenges posed by searching the ad corpus, and report on empirical evaluation of large-scale advertising systems in vivo. At first approximation, finding user-relevant ads is akin to ad hoc information retrieval, where the user context is distilled into a query executed against an index of ads. However, the elaborate structure of ad campaigns, along with the cornucopia of pertinent non-textual information, makes ad retrieval substantially and interestingly different. We show how to adapt standard IR methods for ad retrieval, by developing structure-aware indexing techniques and by augmenting the ad selection process with exogenous knowledge. Computational advertising also employs a host of NLP techniques, from text summarization for just-in-time ad matching, to machine translation for cross-language ad retrieval, to natural language generation for automatic construction of advertising campaigns. Last but not least, we study the interplay between the algorithmic and sponsored search results, as well as formulate and explore context transfer, which characterizes the user's transition from Web search to the context of the landing page following an ad-click. These studies offer deep insights into how users interact with ads, and facilitate better understanding of the much broader notion of relevance in ad retrieval compared to Web search.
The Value of User Feedback BIBAFull-Text 6
  Thorsten Joachims
Information retrieval systems and their users engage in a dialogue. While the main flow of information is from the system to the user, feedback from the user to the system provides many opportunities for short-term and long-term learning. In this talk, I will explore two interrelated questions that are central to the effective use of feedback. First, how can user feedback be collected so that it does not place a burden on the user? I will argue that the mechanisms for collecting feedback have to be integrated into the design of the retrieval process, so that the user's short-term goals are well-aligned with the system's goal of collecting feedback. Second, if this integration succeeds, how valuable is the information that user feedback provides? For the tasks of retrieval evaluation and query disambiguation, the talk will quantify how much user feedback can save human annotator effort and improve retrieval quality, respectively.

Text Categorisation (I)

Text Classification for a Large-Scale Taxonomy Using Dynamically Mixed Local and Global Models for a Node BIBAKFull-Text 7-18
  Heung-Seon Oh; Yoonjung Choi; Sung-Hyon Myaeng
Hierarchical text classification for a large-scale Web taxonomy is challenging because the number of hierarchically organized categories is large and the training data for deep categories are usually sparse. It has been shown that a narrow-down approach involving a search of the taxonomical tree is an effective method for the problem. A recent study showed that both local and global information for a node is useful for further improvement. This paper introduces two methods for dynamically mixing local and global models for individual nodes and shows that they improve classification effectiveness by 5% and 30%, respectively, over and above the state-of-the-art method.
Keywords: Web Taxonomy; Hierarchical Text Classification; ODP
User-Related Tag Expansion for Web Document Clustering BIBAFull-Text 19-31
  Peng Li; Bin Wang; Wei Jin; Yachao Cui
As high-quality descriptors of web page semantics, social annotations or tags have been used for web document clustering and have achieved promising results. However, most web pages have few tags (less than 10). This sparsity seriously limits the usefulness of tags for clustering. In this work, we propose a user-related tag expansion method to overcome the problem, which incorporates additional useful tags into the original tag document by utilizing user tagging as background knowledge. Unfortunately, simply adding tags may cause topic drift, i.e., the dominant topic(s) of the original document may be changed. We address this problem by designing a novel generative model called Folk-LDA, which jointly models original and expanded tags as independent observations. Experimental results show that (1) our user-related tag expansion method can be effectively applied to over 90% of tagged web documents; (2) Folk-LDA can alleviate topic drift during expansion, especially for topic-specific documents; (3) compared to word-based clustering, our approach using only tags achieves a statistically significant increase of 39% in F1 score while reducing the number of terms involved in the computation by up to 76%.
A Comparative Experimental Assessment of a Threshold Selection Algorithm in Hierarchical Text Categorization BIBAFull-Text 32-42
  Andrea Addis; Giuliano Armano; Eloisa Vargiu
Most of the research on text categorization has focused on mapping text documents to a set of categories among which structural relationships hold, i.e., on hierarchical text categorization. For solutions of a hierarchical problem that make use of an ensemble of classifiers, the behavior of each classifier typically depends on an acceptance threshold, which turns a degree of membership into a dichotomous decision. In principle, the problem of finding the best acceptance thresholds for a set of classifiers related with taxonomic relationships is a hard problem. Hence, devising effective ways for finding suboptimal solutions to this problem may have great importance. In this paper, we assess a greedy threshold selection algorithm aimed at finding a suboptimal combination of thresholds in a hierarchical text categorization setting. Comparative experiments, performed on Reuters, report the performance of the proposed threshold selection algorithm against a relaxed brute-force algorithm and against two state-of-the-art algorithms. Results highlight the effectiveness of the approach.
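The greedy threshold search described in this abstract can be sketched in a few lines. The following is an illustrative reconstruction, not the authors' actual algorithm: `evaluate` is a hypothetical stand-in for a utility measured on validation data (e.g. hierarchical F1), and the top-down node ordering is an assumption.

```python
# Illustrative sketch of greedy acceptance-threshold selection for a
# hierarchy of classifiers; all names are invented for this example.

def greedy_thresholds(nodes, evaluate, candidates):
    """Greedily pick one acceptance threshold per classifier node.

    nodes: classifier nodes, visited in top-down taxonomy order
    evaluate: callable(thresholds_dict) -> utility to be maximized
    candidates: candidate threshold values, e.g. [0.1, 0.2, ..., 0.9]
    """
    thresholds = {node: 0.5 for node in nodes}  # neutral starting point
    for node in nodes:  # other nodes stay fixed while one is tuned
        best_t, best_u = thresholds[node], evaluate(thresholds)
        for t in candidates:
            thresholds[node] = t
            u = evaluate(thresholds)
            if u > best_u:
                best_t, best_u = t, u
        thresholds[node] = best_t  # keep the locally best threshold
    return thresholds
```

Because each node is tuned once with the others held fixed, the result is suboptimal in general, which is exactly the trade-off that a comparison against a (relaxed) brute-force search would quantify.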

Recommender Systems

Improving Tag-Based Recommendation by Topic Diversification BIBAFull-Text 43-54
  Christian Wartena; Martin Wibbels
Collaborative tagging has emerged as a mechanism to describe items in large on-line collections. Tags are assigned by users to describe and later retrieve items, but it is also tempting to describe the users in terms of the tags they assign or in terms of the tags of the items they are interested in. The tag-based profile thus obtained can be used to recommend new items.
   If we recommend new items by computing their similarity to the user profile or to all items seen by the user, we run the risk of recommending only neutral items that are slightly relevant to each topic a user is interested in. In order to increase user satisfaction, many recommender systems optimize not only for accuracy but also for diversity. Often it is assumed that there exists a trade-off between accuracy and diversity.
   In this paper we introduce topic-aware recommendation algorithms. Topic-aware algorithms first detect different interests in the user profile and then generate recommendations for each of these interests. We study topic-aware variants of three tag-based recommendation algorithms and show that each of them gives better recommendations than its base variant, both in terms of precision and recall and in terms of diversity.
A Joint Model of Feature Mining and Sentiment Analysis for Product Review Rating BIBAKFull-Text 55-66
  Jorge Carrillo de Albornoz; Laura Plaza; Pablo Gervás; Alberto Díaz
The information in customer reviews is of great interest to both companies and consumers. This information is usually presented as unstructured free text, so automatically extracting and rating user opinions about a product is a challenging task. Moreover, the overall opinion depends heavily on the product features on which the user's judgments and impressions are expressed. Following this idea, our goal is to predict the overall rating of a product review based on the user's opinion about the different product features that are evaluated in the review. To this end, the system first identifies the features that are relevant to consumers when evaluating a certain type of product, as well as the relative importance or salience of such features. The system then extracts from the review the user opinions about the different product features and quantifies those opinions. The salience of the different product features and the values that quantify the user opinions about them are used to construct a Vector of Feature Intensities, which represents the review and serves as the input to a machine learning model that classifies the review into different rating categories. Our method is evaluated over 1000 hotel reviews from booking.com. The results compare favorably with those achieved by other systems addressing similar evaluations.
Keywords: automatic product rating; feature mining; polarity detection; sentiment analysis
Modeling Answerer Behavior in Collaborative Question Answering Systems BIBAFull-Text 67-79
  Qiaoling Liu; Eugene Agichtein
A key functionality in Collaborative Question Answering (CQA) systems is the assignment of questions from information seekers to potential answerers. An attractive solution is to automatically recommend the questions to potential answerers with expertise or interest in the question topic. However, previous work has largely ignored a key problem in question recommendation -- namely, whether the potential answerer is likely to accept and answer the recommended questions in a timely manner. This paper explores the contextual factors that influence answerer behavior in a large, popular CQA system, with the goal of informing the construction of question routing and recommendation systems. Specifically, we consider when users tend to answer questions in a large-scale CQA system, and how answerers tend to choose the questions to answer. Our results over a dataset of more than 1 million questions drawn from a real CQA system could help develop more realistic evaluation methods for question recommendation, and inform the design of future question recommender systems.

Web IR (I)

Clash of the Typings BIBAKFull-Text 80-91
  Karl Gyllstrom; Marie-Francine Moens
The TadPolemic system identifies whether web search queries (1) are controversial in nature and/or (2) pertain to children's topics. We are incorporating it into a children's web search engine to assist children's search on difficult topics, as well as to filter or mitigate bias in results when children search for contentious topics. We show through an evaluation that the system is effective at detecting kids' topics and controversies for a broad range of topics. Though designed to assist children, we believe these methods generalize beyond young audiences and can be usefully applied in other contexts.
Keywords: controversy detection; children's search
Are Semantically Related Links More Effective for Retrieval? BIBAKFull-Text 92-103
  Marijn Koolen; Jaap Kamps
Why do links work? Link-based ranking algorithms are based on the often implicit assumption that linked documents are semantically related to each other, and that link information is therefore useful for retrieval. Although the benefits of link information are well researched, this underlying assumption on why link evidence works remains untested, and the main aim of this paper is to do exactly that. Specifically, we use Wikipedia because it has a dense link structure in combination with a large category structure, which allows for an independent measurement of the semantic relatedness of linked documents. Our main findings are that: 1) global, query-independent link evidence is not affected by the semantic nature of the links, and 2) for local, query-dependent link evidence, the effectiveness of links increases as their semantic distance decreases. That is, we directly observe that links between semantically related pages are more effective for ad hoc retrieval than links between unrelated ones. These findings confirm and quantify the underlying assumption of existing link-based methods, which sheds further light on our understanding of the nature of link evidence. Such deeper understanding is instrumental for the development of novel link-based methods.
Keywords: Links; Semantic Relatedness; Effectiveness; Wikipedia
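The independent relatedness measurement the abstract mentions can be approximated very simply. The sketch below uses Jaccard overlap of category sets as a generic proxy for such a measure; it is an illustration, not the paper's exact definition.

```python
def category_relatedness(cats_a, cats_b):
    """Jaccard overlap of two pages' category sets: a simple proxy
    for the semantic relatedness of a link between the pages."""
    if not cats_a or not cats_b:
        return 0.0  # no category information: treat as unrelated
    return len(cats_a & cats_b) / len(cats_a | cats_b)
```

Under such a measure, a link between two pages sharing most of their categories would count as semantically close, while a link between pages with disjoint categories would count as distant.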
Caching for Realtime Search BIBAFull-Text 104-116
  Edward Bortnikov; Ronny Lempel; Kolman Vornovitsky
Modern search engines feature real-time indices, which incorporate changes to content within seconds. As search engines also cache search results for reducing user latency and back-end load, without careful real-time management of search results caches, the engine might return stale search results to users despite the efforts invested in keeping the underlying index up to date. A recent paper proposed an architectural component called CIP -- the cache invalidation predictor. CIPs invalidate supposedly stale cache entries upon index modifications. Initial evaluation showed the ability to keep the performance benefits of caching without sacrificing much of the freshness of search results returned to users. However, it was conducted on a synthetic workload in a simplified setting, using many assumptions. We propose new CIP heuristics, and evaluate them in an authentic environment -- on the real evolving corpus and query stream of a large commercial news search engine. Our CIPs operate in conjunction with realistic cache settings, and we use standard metrics for evaluating cache performance. We show that a classical cache replacement policy, LRU, completely fails to guarantee freshness over time, whereas our CIPs serve 97% of the queries with fresh results. Our policies incur a negligible impact on the baseline's cache hit rate, in contrast with traditional age-based invalidation, which must severely reduce the cache performance in order to achieve the same freshness. We demonstrate that the computational overhead of our algorithms is minor, and that they even allow reducing the cache's memory footprint.
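To make the freshness problem concrete, the following toy sketch (all names invented; the paper's CIP heuristics are far more elaborate) shows an LRU results cache extended with an explicit invalidation hook. Without that hook, a plain LRU cache keeps serving stale entries after index updates until they happen to be evicted.

```python
from collections import OrderedDict

class ResultsCache:
    """Toy LRU results cache with explicit invalidation.

    Plain LRU keeps serving whatever was cached; an invalidation
    predictor (here: the caller) must evict entries it believes
    became stale after an index update. Names are illustrative.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # query -> cached result list

    def get(self, query):
        if query not in self._entries:
            return None  # miss: the engine must re-run the query
        self._entries.move_to_end(query)  # refresh LRU position
        return self._entries[query]

    def put(self, query, results):
        self._entries[query] = results
        self._entries.move_to_end(query)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used

    def invalidate_if(self, predicate):
        """Drop entries a predictor flags as stale for an index change."""
        stale = [q for q, r in self._entries.items() if predicate(q, r)]
        for query in stale:
            del self._entries[query]
```

`invalidate_if` is where a CIP-style predictor would plug in: on each index modification it flags the cached queries whose results it believes the change affects, forcing those queries to be recomputed against the fresh index.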
Enhancing Deniability against Query-Logs BIBAFull-Text 117-128
  Avi Arampatzis; Pavlos Efraimidis; George Drosatos
We propose a method for search privacy on the Internet, focusing on enhancing plausible deniability against search engine query-logs. The method approximates the target search results, without submitting the intended query and avoiding other exposing queries, by employing sets of queries representing more general concepts. We model the problem theoretically, and investigate the practical feasibility and effectiveness of the proposed solution with a set of real queries with privacy issues on a large web collection. The findings may have implications for other IR research areas, such as query expansion and fusion in meta-search.

IR Evaluation

On the Contributions of Topics to System Evaluation BIBAFull-Text 129-140
  Stephen Robertson
We consider the selection of good subsets of topics for system evaluation. It has previously been suggested that some individual topics and some subsets of topics are better for system evaluation than others: given limited resources, choosing the best subset of topics may give significantly better prediction of overall system effectiveness than (for example) choosing random subsets. Earlier experimental results are extended, with particular reference to generalisation: the ability of a subset of topics selected on the basis of one collection of system runs to perform well in evaluating another collection of system runs. It turns out to be hard to establish generalisability; it is not at all clear that it is possible to identify subsets of topics that are good for general evaluation.
A Methodology for Evaluating Aggregated Search Results BIBAFull-Text 141-152
  Jaime Arguello; Fernando Diaz; Jamie Callan; Ben Carterette
Aggregated search is the task of incorporating results from different specialized search services, or verticals, into Web search results. While most prior work focuses on deciding which verticals to present, the task of deciding where in the Web results to embed the vertical results has received less attention. We propose a methodology for evaluating an aggregated set of results. Our method elicits a relatively small number of human judgements for a given query and then uses these to facilitate a metric-based evaluation of any possible presentation for the query. An extensive user study with 13 verticals confirms that, when users prefer one presentation of results over another, our metric agrees with the stated preference. By using Amazon's Mechanical Turk, we show that reliable assessments can be obtained quickly and inexpensively.
Design and Implementation of Relevance Assessments Using Crowdsourcing BIBAFull-Text 153-164
  Omar Alonso; Ricardo Baeza-Yates
In recent years, crowdsourcing has emerged as a viable platform for conducting relevance assessments. The main reason behind this trend is that it makes it possible to conduct experiments extremely fast, with good results and at low cost. However, as in any experiment, there are several details that determine whether an experiment works or fails. To gather useful results, user interface guidelines, inter-agreement metrics, and justification analysis are important aspects of a successful crowdsourcing experiment. In this work we explore the design and execution of relevance judgments using Amazon Mechanical Turk as the crowdsourcing platform, introducing a methodology for crowdsourcing relevance assessments and presenting the results of a series of experiments on TREC 8 with a fixed budget. Our findings indicate that workers are as good as TREC experts, even providing detailed feedback for certain query-document pairs. We also explore the importance of document design and presentation when performing relevance assessment tasks. Finally, we show our methodology at work with several examples that are interesting in their own right.
In Search of Quality in Crowdsourcing for Search Engine Evaluation BIBAKFull-Text 165-176
  Gabriella Kazai
Crowdsourcing is increasingly looked upon as a feasible alternative to traditional methods of gathering relevance labels for the evaluation of search engines, offering a solution to the scalability problem that hinders traditional approaches. However, crowdsourcing raises a range of questions regarding the quality of the resulting data. What indeed can be said about the quality of the data that is contributed by anonymous workers who are only paid cents for their efforts? Can higher pay guarantee better quality? Do better qualified workers produce higher quality labels? In this paper, we investigate these and similar questions via a series of controlled crowdsourcing experiments where we vary pay, required effort and worker qualifications and observe their effects on the resulting label quality, measured based on agreement with a gold set.
Keywords: IR evaluation; relevance data gathering; crowdsourcing
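Measuring label quality as agreement with a gold set, as this study does, reduces to a simple computation. The sketch below is a generic accuracy-style measure with invented names, not the paper's exact metric.

```python
def gold_agreement(worker_labels, gold_labels):
    """Fraction of gold-labelled items on which a worker agrees.

    worker_labels, gold_labels: dicts mapping item id -> label.
    Only items present in both the gold set and the worker's
    submissions are scored.
    """
    scored = [item for item in gold_labels if item in worker_labels]
    if not scored:
        return 0.0  # no overlap with the gold set: nothing to score
    agree = sum(worker_labels[i] == gold_labels[i] for i in scored)
    return agree / len(scored)
```

Comparing this score across experimental conditions (pay level, required effort, worker qualifications) is the kind of analysis the abstract describes.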

IR for Social Networks (I)

Summarizing a Document Stream BIBAFull-Text 177-188
  Hiroya Takamura; Hikaru Yokono; Manabu Okumura
We introduce the task of summarizing a stream of short documents on microblogs such as Twitter. On microblogs, thousands of short documents on a certain topic, such as a sports match or a TV drama, are posted by users. Noticeable characteristics of microblog data are that documents are often highly redundant and aligned on a timeline. There can be thousands of documents on one event in the topic, and two very similar documents may refer to two distinct events when they are temporally distant. We examine microblog data to gain more understanding of these characteristics, and propose a summarization model for a stream of short documents on a timeline, along with an approximate fast algorithm for generating summaries. We empirically show that our model generates good summaries on datasets of microblog documents about sports matches.
A Link Prediction Approach to Recommendations in Large-Scale User-Generated Content Systems BIBAKFull-Text 189-200
  Nitin Chiluka; Nazareno Andrade; Johan Pouwelse
Recommending interesting and relevant content from the vast repositories of User-Generated Content systems (UGCs) such as YouTube, Flickr and Digg is a significant challenge. Part of this challenge stems from the fact that classical collaborative filtering techniques -- such as k-Nearest Neighbor -- cannot be assumed to perform as well in UGCs as in other applications. Such techniques have severe limitations regarding data sparsity and scalability that make them unfitting for UGCs. In this paper, we employ adaptations of popular Link Prediction algorithms that were shown to be effective in massive online social networks for recommending items in UGCs. We evaluate these algorithms on a large dataset we collected from Flickr. Our results suggest that Link Prediction algorithms are a more scalable and accurate alternative to classical collaborative filtering in the context of UGCs. Moreover, our experiments show that algorithms that consider only the immediate neighborhood of users in a user-item graph to recommend items outperform algorithms that use the entire graph structure for the same purpose. Finally, we find that, contrary to intuition, exploiting explicit social links among users in the recommendation algorithms improves their performance only marginally.
Keywords: User-Generated Content Systems; Recommendation; Collaborative Filtering; Link Prediction; Flickr
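As a hypothetical illustration of the local, neighborhood-based scoring the abstract finds most effective, a common-neighbors rule on the user-item bipartite graph can be sketched as follows (names are invented; the paper adapts several such Link Prediction scores):

```python
def common_neighbor_scores(user_items, target_user):
    """Score unseen items for target_user by counting 2-hop paths
    in the user-item bipartite graph (a local link-prediction rule).

    user_items: dict mapping user -> set of items that user adopted.
    Returns dict mapping candidate item -> score.
    """
    seen = user_items[target_user]
    scores = {}
    for other, items in user_items.items():
        if other == target_user:
            continue
        overlap = len(seen & items)  # shared items = common neighbors
        if overlap == 0:
            continue  # no path of length 2 through this user
        for item in items - seen:
            scores[item] = scores.get(item, 0) + overlap
    return scores
```

Ranking the candidate items by score yields a recommendation list that depends only on each user's immediate neighborhood, in contrast to methods that exploit the entire graph structure.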
Topic Classification in Social Media Using Metadata from Hyperlinked Objects BIBAFull-Text 201-206
  Sheila Kinsella; Alexandre Passant; John G. Breslin
Social media presents unique challenges for topic classification, including the brevity of posts, the informal nature of conversations, and the frequent reliance on external hyperlinks to give context to a conversation. In this paper we investigate the usefulness of these external hyperlinks for determining the topic of an individual post. We focus specifically on hyperlinks to objects which have related metadata available on the Web, including Amazon products and YouTube videos. Our experiments show that including metadata from hyperlinked objects in addition to the original post content improved classifier performance, measured by F-score, from 84% to 90%. Further, even classification based on object metadata alone outperforms classification based on the original post content.
Peddling or Creating? Investigating the Role of Twitter in News Reporting BIBAFull-Text 207-213
  Ilija Subašić; Bettina Berendt
The widespread use of social media is regarded by many as the emergence of a new highway for information and news sharing, promising a new information-driven "social revolution". In this paper, we analyze how this idea transfers to the news reporting domain. To analyze the role of social media in news reporting, we ask whether citizen journalists tend to create news or to peddle (re-report) existing content. We introduce a framework for exploring divergence between news sources by providing multiple views on the corpora being compared. The results of our case study comparing Twitter and other news sources suggest that the major role of Twitter authors is neither creating news nor peddling it, but extending it by commenting on it.

Cross-Language IR

Latent Sentiment Model for Weakly-Supervised Cross-Lingual Sentiment Classification BIBAKFull-Text 214-225
  Yulan He
In this paper, we present a novel weakly-supervised method for cross-lingual sentiment analysis. Specifically, we propose a latent sentiment model (LSM) based on latent Dirichlet allocation where sentiment labels are treated as topics. Prior information extracted from English sentiment lexicons through machine translation is incorporated into LSM model learning, where preferences on expectations of sentiment labels of those lexicon words are expressed using generalized expectation criteria. An efficient parameter estimation procedure using variational Bayes is presented. Experimental results on Chinese product reviews show that the weakly-supervised LSM model performs comparably to supervised classifiers such as Support Vector Machines, with an average accuracy of 81% achieved over a total of 5484 review documents. Moreover, starting with a generic sentiment lexicon, the LSM model is able to extract highly domain-specific polarity words from text.
Keywords: Latent sentiment model (LSM); cross-lingual sentiment analysis; Generalized expectation; latent Dirichlet allocation
Fractional Similarity: Cross-Lingual Feature Selection for Search BIBAFull-Text 226-237
  Jagadeesh Jagarlamudi; Paul N. Bennett
Training data as well as supplementary data such as usage-based click behavior may abound in one search market (i.e., a particular region, domain, or language) and be much scarcer in another market. Transfer methods attempt to improve performance in these resource-scarce markets by leveraging data across markets. However, differences in feature distributions across markets can change the optimal model. We introduce a method called Fractional Similarity, which uses query-based variance within a market to obtain more reliable estimates of feature deviations across markets. An empirical analysis demonstrates that using this scoring method as a feature selection criterion in cross-lingual transfer improves relevance ranking in the foreign language and compares favorably to a baseline based on KL divergence.
Is a Query Worth Translating: Ask the Users! BIBAKFull-Text 238-250
  Ahmed Hefny; Kareem Darwish; Ali Alkahky
Users in many regions of the world are multilingual, and they issue similar queries in different languages. Given a source-language query, we propose query picking, which involves finding equivalent target-language queries in a large query log. Query picking treats translation as a search problem, and can serve as a translation method in the context of cross-language and multilingual search. Further, given that users usually issue queries when they think they can find relevant content, the success of query picking can serve as a strong indicator of the projected success of cross-language and multilingual search. In this paper we describe a system that performs query picking, and we show that picked queries yield results that are statistically indistinguishable from a monolingual baseline. Further, using query picking to predict the effectiveness of cross-language results can have a statistically significant effect on the success of multilingual search, with improvements over a monolingual baseline. Multilingual merging methods that do not account for the success of query picking can often hurt retrieval effectiveness.
Keywords: cross-language search; multilingual search; query translation mining

IR Theory (I)

Balancing Exploration and Exploitation in Learning to Rank Online BIBAFull-Text 251-263
  Katja Hofmann; Shimon Whiteson; Maarten de Rijke
As retrieval systems become more complex, learning to rank approaches are being developed to automatically tune their parameters. Using online learning to rank approaches, retrieval systems can learn directly from implicit feedback, while they are running. In such an online setting, algorithms need to both explore new solutions to obtain feedback for effective learning, and exploit what has already been learned to produce results that are acceptable to users. We formulate this challenge as an exploration-exploitation dilemma and present the first online learning to rank algorithm that works with implicit feedback and balances exploration and exploitation. We leverage existing learning to rank data sets and recently developed click models to evaluate the proposed algorithm. Our results show that finding a balance between exploration and exploitation can substantially improve online retrieval performance, bringing us one step closer to making online learning to rank work in practice.
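The exploration-exploitation trade-off described above can be illustrated with a toy epsilon-greedy scheme. The paper itself balances the two by interleaving candidate rankings; the linear ranker, feature names, and `epsilon` below are illustrative assumptions, not the proposed algorithm:

```python
import random

def rank(weights, docs):
    # Score each document as a dot product of weights and features,
    # then return document indices in descending score order (exploit).
    def score(d):
        return sum(weights.get(f, 0.0) * v for f, v in d.items())
    return sorted(range(len(docs)), key=lambda i: -score(docs[i]))

def select_ranking(weights, docs, epsilon, rng):
    """With probability `epsilon`, show a shuffled ranking to gather
    feedback on unexplored orderings; otherwise show the ranking under
    the current model. A simplification of the paper's interleaving."""
    order = rank(weights, docs)
    if rng.random() < epsilon:
        rng.shuffle(order)
        return order, "explore"
    return order, "exploit"
```

Implicit feedback (clicks) on the presented list would then drive the weight update, closing the online learning loop.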
ReFER: Effective Relevance Feedback for Entity Ranking BIBAFull-Text 264-276
  Tereza Iofciu; Gianluca Demartini; Nick Craswell; Arjen P. de Vries
Web search increasingly deals with structured data about people, places and things, their attributes and relationships. In such an environment an important sub-problem is matching a user's unstructured free-text query to a set of relevant entities. For example, a user might request 'Olympic host cities'. The most challenging general problem is to find relevant entities, of the correct type and characteristics, based on a free-text query that need not conform to any single ontology or category structure. This paper presents an entity ranking relevance feedback model, based on example entities specified by the user or on pseudo feedback. It employs the Wikipedia category structure, but augments that structure with 'smooth categories' to deal with the sparseness of the raw category information. Our experiments show the effectiveness of the proposed method, whether applied as a pseudo relevance feedback method or interactively with the user in the loop.
The Limits of Retrieval Effectiveness BIBAFull-Text 277-282
  Ronan Cummins; Mounia Lalmas; Colm O'Riordan
Best-match systems in Information Retrieval have long been among the most predominant models used in both research and practice. It has been argued that the effectiveness of these systems for the ad hoc task in IR has plateaued. In this short paper, we conduct experiments to find the upper limits of performance of these systems from three different perspectives. Our results on TREC data show that there is much room for improvement in term weighting and query reformulation for the ad hoc task, given an entire information need.
Learning Conditional Random Fields from Unaligned Data for Natural Language Understanding BIBAFull-Text 283-288
  Deyu Zhou; Yulan He
In this paper, we propose a learning approach to train conditional random fields from unaligned data for natural language understanding, where the input to model learning is sentences paired with predicate formulae (or abstract semantic annotations) without word-level annotations. The learning approach resembles the expectation maximization algorithm. It has two advantages: one is that only abstract annotations are needed instead of full word-level annotations, and the other is that the proposed learning framework can easily be extended to train other discriminative models, such as support vector machines, from abstract annotations. The proposed approach has been tested on the DARPA Communicator Data. Experimental results show that it outperforms the hidden vector state (HVS) model, a modified hidden Markov model also trained on abstract annotations. Furthermore, the proposed method has been compared with two other approaches: one is the hybrid framework (HF) combining the HVS model and the support vector hidden Markov model, and the other is discriminative training of the HVS model (DT). The proposed approach gives relative error reductions of 18.7% and 8.3% in F-measure when compared with HF and DT respectively.

Text Categorisation (II)

Subspace Tracking for Latent Semantic Analysis BIBAFull-Text 289-300
  Radim Rehurek
Modern applications of Latent Semantic Analysis (LSA) must deal with enormous (often practically infinite) data collections, calling for a single-pass matrix decomposition algorithm that operates in constant memory w.r.t. the collection size. This paper introduces a streamed distributed algorithm for incremental SVD updates. Apart from the theoretical derivation, we present experiments measuring numerical accuracy and runtime performance of the algorithm over several data collections, one of which is the whole of the English Wikipedia.
Text Retrieval Methods for Item Ranking in Collaborative Filtering BIBAKFull-Text 301-306
  Alejandro Bellogín; Jun Wang; Pablo Castells
Collaborative Filtering (CF) aims at predicting the unknown ratings of a user from those of other, similar users. The uniqueness of the problem has made its formulation distinct from other information retrieval problems. While the formulation has proved to be effective in rating prediction tasks, it has limited the potential connections between these algorithms and Information Retrieval (IR) models. In this paper we propose a common notational framework for IR and rating-based CF, as well as a technique to give CF data a particular structure, in order to be able to use any IR weighting function with it. We argue that the flexibility of our approach may yield much better-performing algorithms. In fact, in this work we have found that IR models, along with different normalization strategies, perform well in item ranking tasks.
Keywords: Collaborative Filtering; Text Retrieval; Unified Models
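As a rough illustration of the abstract's framing (users as documents, items as terms, ratings as term frequencies), one can score unseen items with a cosine similarity over co-rated items plus an idf-like popularity discount. The function below is a sketch under those assumptions, not the paper's exact weighting:

```python
import math
from collections import defaultdict

def rank_items(ratings, user):
    """Rank items unseen by `user`, casting rating-based CF as text
    retrieval: cosine similarity plays the query-document match and
    an idf-like factor discounts very popular items."""
    n_users = len(ratings)
    raters = defaultdict(list)                 # item -> [(user, rating)]
    for u, items in ratings.items():
        for i, r in items.items():
            raters[i].append((u, r))

    def sim(u, v):
        # Cosine similarity over the items both users rated.
        common = set(ratings[u]) & set(ratings[v])
        if not common:
            return 0.0
        num = sum(ratings[u][i] * ratings[v][i] for i in common)
        du = math.sqrt(sum(r * r for r in ratings[u].values()))
        dv = math.sqrt(sum(r * r for r in ratings[v].values()))
        return num / (du * dv)

    scores = {}
    for item, who in raters.items():
        if item in ratings[user]:
            continue                           # only rank unseen items
        idf = math.log(1 + n_users / len(who))
        scores[item] = idf * sum(sim(user, v) * r for v, r in who)
    return sorted(scores, key=scores.get, reverse=True)
```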
Classifying with Co-stems BIBAFull-Text 307-313
  Nedim Lipka; Benno Stein
Besides content, writing style is an important discriminator in information filtering tasks. Ideally, the solution to a filtering task employs a text representation that models both kinds of characteristics. In this respect, word stems clearly capture content, whereas word suffixes qualify as writing-style indicators. Though the latter feature type is used for part-of-speech tagging, it has not yet been employed for information filtering in general. We propose a text representation that combines the output of a stemming algorithm (stems) with the stem-reduced words (co-stems). A co-stem can be a prefix, an infix, a suffix, or a concatenation of prefixes, infixes, or suffixes. Using accepted standard corpora, we analyze the discriminative power of this representation for a broad range of information filtering tasks to provide new insights into the adequacy and task-specificity of text representation models. Altogether, we observe that co-stem-based representations outperform the classical bag-of-words model for several filtering tasks.

Multimedia IR

Interactive Trademark Image Retrieval by Fusing Semantic and Visual Content BIBAKFull-Text 314-325
  Marçal Rusiñol; David Aldavert; Dimosthenis Karatzas; Ricardo Toledo; Josep Lladós
In this paper we propose an efficient query-by-example retrieval system which is able to retrieve trademark images by similarity from the digital libraries of patent and trademark offices. Logo images are described by both their semantic content, by means of Vienna codes, and their visual content, using shape and color as visual cues. The trademark descriptors are then indexed by a locality-sensitive hashing data structure aiming to perform approximate k-NN search in high-dimensional spaces in sub-linear time. The resulting ranked lists are combined using a weighted Condorcet method, and a relevance feedback step helps to iteratively revise the query and refine the obtained results. The experiments demonstrate the effectiveness and efficiency of this system on a realistic and large dataset.
Keywords: Multimedia Information Retrieval; Trademark Image Retrieval; Graphics Recognition; Feature Indexing
Dynamic Two-Stage Image Retrieval from Large Multimodal Databases BIBAFull-Text 326-337
  Avi Arampatzis; Konstantinos Zagoris; Savvas A. Chatzichristofis
Content-based image retrieval (CBIR) with global features is notoriously noisy, especially for image queries with low percentages of relevant images in a collection. Moreover, CBIR typically ranks the whole collection, which is inefficient for large databases. We experiment with a method for image retrieval from multimodal databases, which improves both the effectiveness and efficiency of traditional CBIR by exploring secondary modalities. We perform retrieval in a two-stage fashion: first rank by a secondary modality, and then perform CBIR only on the top-K items. Thus, effectiveness is improved by performing CBIR on a 'better' subset. Using a relatively 'cheap' first stage, efficiency is also improved via the fewer CBIR operations performed. Our main novelty is that K is dynamic, i.e. estimated per query to optimize a predefined effectiveness measure. We show that such dynamic two-stage setups can be significantly more effective and robust than similar setups with static thresholds previously proposed.

IR for Social Networks (II)

Comparing Twitter and Traditional Media Using Topic Models BIBAKFull-Text 338-349
  Wayne Xin Zhao; Jing Jiang; Jianshu Weng; Jing He; Ee-Peng Lim; Hongfei Yan; Xiaoming Li
Twitter, as a new form of social media, can potentially contain much useful information, but content analysis on Twitter has not been well studied. In particular, it is not clear whether, as an information source, Twitter can simply be regarded as a faster news feed that covers mostly the same information as traditional news media. In this paper we empirically compare the content of Twitter with a traditional news medium, the New York Times, using unsupervised topic modeling. We use a Twitter-LDA model to discover topics from a representative sample of Twitter. We then use text mining techniques to compare these Twitter topics with topics from the New York Times, taking into consideration topic categories and types. We also study the relation between the proportions of opinionated tweets and retweets and topic categories and types. Our comparisons yield interesting and useful findings for downstream IR and DM applications.
Keywords: Twitter; microblogging; topic modeling
Exploiting Thread Structures to Improve Smoothing of Language Models for Forum Post Retrieval BIBAKFull-Text 350-361
  Huizhong Duan; Chengxiang Zhai
Due to the many unique characteristics of forum data, forum post retrieval differs from traditional document retrieval and web search, raising interesting research questions about how to optimize the accuracy of forum post retrieval. In this paper, we study how to exploit the naturally available raw thread structures of forums to improve retrieval accuracy in the language modeling framework. Specifically, we propose and study two different schemes for smoothing the language model of a forum post based on the thread containing the post. We explore several variants of the two schemes to exploit thread structures in different ways. We also create a human-annotated test data set for forum post retrieval and evaluate the proposed smoothing methods using it. The experimental results show that the proposed methods for leveraging forum threads to improve the estimation of document language models are effective, and that they outperform existing smoothing methods for the forum post retrieval task.
Keywords: Forum post retrieval; language modeling; smoothing
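One natural instance of thread-based smoothing is a three-way Jelinek-Mercer interpolation of post, thread, and collection language models. The sketch below assumes this form with illustrative mixture weights; it does not reproduce either of the paper's two schemes:

```python
from collections import Counter

def smoothed_lm(post_tokens, thread_tokens, coll_tokens, lam_t=0.3, lam_c=0.1):
    """Estimate P(w | post) by interpolating the post's maximum-likelihood
    model with a thread model (local context) and a collection model
    (global fallback). lam_t and lam_c are illustrative weights."""
    post, thread, coll = Counter(post_tokens), Counter(thread_tokens), Counter(coll_tokens)
    n_p, n_t, n_c = sum(post.values()), sum(thread.values()), sum(coll.values())
    lam_p = 1.0 - lam_t - lam_c
    vocab = set(coll)
    return {w: lam_p * post[w] / n_p
             + lam_t * thread[w] / n_t
             + lam_c * coll[w] / n_c
            for w in vocab}
```

Because the thread shares topical vocabulary with the post, it provides a better smoothing source than the collection alone for terms the short post never mentions.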
Incorporating Query Expansion and Quality Indicators in Searching Microblog Posts BIBAFull-Text 362-367
  Kamran Massoudi; Manos Tsagkias; Maarten de Rijke; Wouter Weerkamp
We propose a retrieval model for searching microblog posts for a given topic of interest. We develop a language modeling approach tailored to microblogging characteristics, where redundancy-based IR methods cannot be used in a straightforward manner. We enhance this model with two groups of quality indicators: textual and microblog specific. Additionally, we propose a dynamic query expansion model for microblog post retrieval. Experimental results on Twitter data reveal the usefulness of boolean search, and demonstrate the utility of quality indicators and query expansion in microblog search.
Discovering Fine-Grained Sentiment with Latent Variable Structured Prediction Models BIBAFull-Text 368-374
  Oscar Täckström; Ryan McDonald
In this paper we investigate the use of latent variable structured prediction models for fine-grained sentiment analysis in the common situation where only coarse-grained supervision is available. Specifically, we show how sentence-level sentiment labels can be effectively learned from document-level supervision using hidden conditional random fields (HCRFs) [10]. Experiments show that this technique reduces sentence classification errors by 22% relative to using a lexicon and 13% relative to machine-learning baselines.

IR Applications

Combining Global and Local Semantic Contexts for Improving Biomedical Information Retrieval BIBAKFull-Text 375-386
  Duy Dinh; Lynda Tamine
In the context of biomedical information retrieval (IR), this paper explores the relationship between a document's global context and a query's local context in an attempt to overcome the term mismatch problem between user queries and documents in the collection. Most solutions to this problem focus on expanding the query by discovering its context, either global or local. In a global strategy, all documents in the collection are used to examine word occurrences and relationships in the corpus as a whole, and this information is used to expand the original query. In a local strategy, the top-ranked documents retrieved for a given query are examined to determine terms for query expansion. We propose to combine the document's global context and the query's local context in an attempt to increase the term overlap between user queries and documents via document expansion (DE) and query expansion (QE). The DE technique is based on a statistical (IR-based) method to extract the most appropriate concepts (global context) from each document. The QE technique is based on a blind feedback approach using the top-ranked documents (local context) obtained in a first retrieval stage. A comparative experiment on the TREC 2004 Genomics collection demonstrates that the combination of the document's global context and the query's local context yields a significant improvement over the baseline: MAP is raised from 0.4097 to 0.4532, an improvement of +10.62%. The IR performance of the combined method in terms of MAP is also superior to the official runs submitted to TREC 2004 Genomics and is comparable to that of the best run (0.4075).
Keywords: Term Mismatch; Concept Extraction; Document Expansion; Query Expansion; Biomedical Information Retrieval
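The query-side (local context) step is a blind-feedback procedure. A minimal sketch, assuming plain frequency-based term scoring over the top-ranked documents rather than the paper's weighting:

```python
from collections import Counter

def expand_query(query_terms, ranked_docs, k_docs=3, n_terms=5):
    """Blind (pseudo-relevance) feedback: pool terms from the top
    `k_docs` documents of a first retrieval pass, drop terms already
    in the query, and append the `n_terms` most frequent ones."""
    pool = Counter()
    for doc in ranked_docs[:k_docs]:                 # doc = list of tokens
        pool.update(t for t in doc if t not in query_terms)
    expansion = [t for t, _ in pool.most_common(n_terms)]
    return list(query_terms) + expansion
```

The expanded query is then issued in a second retrieval stage; the document-side (DE) step would analogously enrich each document with extracted concepts before indexing.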
Smoothing Click Counts for Aggregated Vertical Search BIBAFull-Text 387-398
  Jangwon Seo; W. Bruce Croft; Kwang Hyun Kim; Joon Ho Lee
Clickthrough data is a critical feature for improving web search ranking. Recently, many search portals have provided aggregated search, which retrieves relevant information from various heterogeneous collections called verticals. In addition to the well-known problem of rank bias, clickthrough data recorded in the aggregated search environment suffers from severe sparseness problems due to the limited number of results presented for each vertical. This skew in clickthrough data, which we call rank cut, makes optimization of vertical searches more difficult. In this work, we focus on mitigating the negative effect of rank cut for aggregated vertical searches. We introduce a technique for smoothing click counts based on spectral graph analysis. Using real clickthrough data from a vertical recorded in an aggregated search environment, we show empirically that clickthrough data smoothed by this technique is effective for improving the vertical search.
Automatic People Tagging for Expertise Profiling in the Enterprise BIBAFull-Text 399-410
  Pavel Serdyukov; Mike Taylor; Vishwa Vinay; Matthew Richardson; Ryen W. White
In an enterprise search setting, there is a class of queries for which people, rather than documents, are desirable answers. However, presenting users with just a list of names of knowledgeable employees, without any description of their expertise, may lead to confusion, lack of trust in search results, and abandonment of the search engine. At the same time, building a concise, meaningful description for a person is not a trivial summarization task. In this paper, we propose a solution to this problem by automatically tagging people for the purpose of profiling their expertise areas in the scope of the enterprise where they are employed. We address the novel task of automatic people tagging by using a machine learning algorithm that combines evidence that a certain tag is relevant to a certain employee acquired from different sources in the enterprise. We experiment with data from a large distributed organization, which also allows us to study sources of expertise evidence that have been previously overlooked, such as personal click-through history. The evaluation of the proposed methods shows that our technique clearly outperforms state-of-the-art approaches.

Text Categorisation (III)

Text Classification: A Sequential Reading Approach BIBAFull-Text 411-423
  Gabriel Dulac-Arnold; Ludovic Denoyer; Patrick Gallinari
We propose to model the text classification process as a sequential decision process, in which an agent learns to classify documents into topics while reading the document's sentences sequentially, and learns to stop as soon as enough information has been read to make a decision. The proposed algorithm is based on modeling text classification as a Markov Decision Process and learns using Reinforcement Learning. Experiments on four classical mono-label corpora show that the proposed approach performs comparably to classical SVM approaches for large training sets, and better for small training sets. In addition, the model automatically adapts its reading process to the quantity of training information provided.
Domain Adaptation for Text Categorization by Feature Labeling BIBAKFull-Text 424-435
  Cristina Kadar; José Iria
We present a novel approach to domain adaptation for text categorization, which merely requires that the source domain data are weakly annotated in the form of labeled features. The main advantage of our approach resides in the fact that labeling words is less expensive than labeling documents. We propose two methods, the first of which seeks to minimize the divergence between the distributions of the source domain, which contains labeled features, and the target domain, which contains only unlabeled data. The second method augments the labeled features set in an unsupervised way, via the discovery of a shared latent concept space between source and target. We empirically show that our approach outperforms standard supervised and semi-supervised methods, and obtains results competitive to those reported by state-of-the-art domain adaptation methods, while requiring considerably less supervision.
Keywords: Domain Adaptation; Generalized Expectation Criteria; Weakly-Supervised Latent Dirichlet Allocation

IR for Social Networks (III)

TEMPER: A Temporal Relevance Feedback Method BIBAFull-Text 436-447
  Mostafa Keikha; Shima Gerani; Fabio Crestani
The goal of a blog distillation (blog feed search) method is to rank blogs according to their recurrent relevance to the query. An interesting property of blog distillation which differentiates it from traditional retrieval tasks is its dependency on time. In this paper we investigate the effect of time dependency on query expansion. We propose a framework, TEMPER, which selects different terms for different times and ranks blogs according to their relevance to the query over time. By generating multiple expanded queries based on time, we are able to capture the dynamics of the topic in both its aspects and its vocabulary usage. We show performance gains over baseline techniques which generate a single expanded query using the top retrieved posts or blogs irrespective of time.
Terms of a Feather: Content-Based News Recommendation and Discovery Using Twitter BIBAFull-Text 448-459
  Owen Phelan; Kevin McCarthy; Mike Bennett; Barry Smyth
User-generated content has dominated the web's recent growth, and today the so-called real-time web provides us with unprecedented access to the real-time opinions, views, and ratings of millions of users. For example, Twitter's 200m+ users are generating in the region of 1000+ tweets per second. In this work, we propose that this data can be harnessed as a useful source of recommendation knowledge. We describe a social news service called Buzzer that is capable of adapting to the conversations taking place on Twitter in order to rank personal RSS subscriptions. This is achieved by a content-based approach of mining trending terms from both the public Twitter timeline and from the timeline of tweets published by a user's own Twitter friend subscriptions. We also present the results of a live-user evaluation which demonstrates how these ranking strategies can add item filtering and discovery value to conventional recency-based RSS ranking techniques.
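The mining-and-ranking idea can be sketched in a few lines: count terms in recent tweets, then re-rank articles by how strongly they overlap with the trending terms. Tokenization and scoring below are simplifications, not Buzzer's actual pipeline:

```python
from collections import Counter

def rerank_articles(articles, tweets, n_trending=50):
    """Re-rank RSS article texts by the summed frequency of trending
    Twitter terms they contain. Whitespace tokenization and raw-count
    scoring are illustrative assumptions."""
    trending = Counter()
    for tw in tweets:
        trending.update(tw.lower().split())
    top = dict(trending.most_common(n_trending))
    def score(article):
        # Each distinct trending term in the article contributes its count.
        return sum(top.get(t, 0) for t in set(article.lower().split()))
    return sorted(articles, key=score, reverse=True)
```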
Topical and Structural Linkage in Wikipedia BIBAFull-Text 460-465
  Kelly Y. Itakura; Charles L. A. Clarke; Shlomo Geva; Andrew Trotman; Wei Chi Huang
We explore statistical properties of links within Wikipedia. We demonstrate that a simple algorithm can predict many of the links that would normally be added to a new article, without considering the topic of the article itself. We then explore a variant of topic-oriented PageRank, which can effectively identify topical links within existing articles, when compared with manual judgments of their topical relevance. Based on these results, we suggest that linkages within Wikipedia arise from a combination of structural requirements and topical relationships.

Web IR (II)

An Analysis of Time-Instability in Web Search Results BIBAFull-Text 466-478
  Jinyoung Kim; Vitor R. Carvalho
Due to the dynamic nature of the web and the complex architectures of modern commercial search engines, the top results in major search engines can change dramatically over time. Our experimental data show that, for all three major search engines (Google, Bing and Yahoo!), approximately 90% of queries have their top 10 results altered within a period of ten days. Although this instability is expected in some situations, such as news-related queries, it is problematic in general because it can dramatically affect retrieval performance measurements and negatively affect users' perception of search quality (for instance, when users cannot re-find a previously found document).
   In this work we present the first large-scale study of the degree and nature of these changes. We introduce several types of query instability and several metrics to quantify them. We then present a quantitative analysis using 12,600 queries collected from a commercial web search engine over several weeks. Our analysis shows that the results of all major search engines have similar levels of instability, and that many of these changes are temporary. We also identified classes of queries with clearly different instability profiles -- for instance, navigational queries are considerably more stable than non-navigational ones, while longer queries are significantly less stable than shorter ones.
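One plausible way to quantify such instability (an assumption for illustration; the paper defines its own metrics) is the average Jaccard distance between consecutive top-k result lists for the same query:

```python
def top_k_instability(snapshots, k=10):
    """Average Jaccard distance between consecutive top-k result lists.
    `snapshots` is a chronologically ordered list of ranked result lists
    for one query; 0.0 means perfectly stable, 1.0 means the top-k set
    is completely replaced at every observation."""
    dists = []
    for prev, cur in zip(snapshots, snapshots[1:]):
        a, b = set(prev[:k]), set(cur[:k])
        dists.append(1 - len(a & b) / len(a | b))
    return sum(dists) / len(dists)
```

Set-based distances like this capture churn in which documents appear; a rank-correlation variant would additionally capture reordering among surviving results.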
Rules of Thumb for Information Acquisition from Large and Redundant Data BIBAFull-Text 479-490
  Wolfgang Gatterbauer
We develop an abstract model of information acquisition from redundant data. We assume a random sampling process over data that contain information with bias, and we are interested in the fraction of information we expect to learn as a function of (i) the sampled fraction (recall) and (ii) the varying bias of information (redundancy distributions). We develop two rules of thumb with varying robustness. We first show that, when information bias follows a Zipf distribution, the 80-20 rule or Pareto principle surprisingly does not hold: rather, we expect to learn less than 40% of the information when randomly sampling 20% of the overall data. We then analytically prove that for large data sets, randomized sampling from power-law distributions leads to "truncated distributions" with the same power-law exponent. This second rule is very robust, and also holds for distributions that deviate substantially from a strict power law. We further give one particular family of power-law functions that remain completely invariant under sampling. Finally, we validate our model with two large Web data sets: link distributions to web domains and tag distributions on delicious.com.
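The first rule of thumb can be probed by simulation: draw a fraction of mentions from a Zipf-distributed corpus and measure which fraction of the distinct items is seen at least once. All parameters below are illustrative, not the paper's experimental setup:

```python
import random

def fraction_learned(n_items=1000, exponent=1.0, recall=0.2, seed=0):
    """Monte-Carlo sketch: build a corpus whose item frequencies follow
    a Zipf law with the given exponent, sample a `recall` fraction of
    the mentions without replacement, and return the fraction of
    distinct items observed in the sample."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_items + 1)]
    corpus = rng.choices(range(n_items), weights=weights, k=20 * n_items)
    sample = rng.sample(corpus, int(recall * len(corpus)))
    return len(set(sample)) / n_items
```

Sweeping `recall` reproduces the qualitative effect the abstract describes: because redundancy concentrates on head items, learned information grows much more slowly than the sampled fraction.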
Bringing Why-QA to Web Search BIBAFull-Text 491-496
  Suzan Verberne; Lou Boves; Wessel Kraaij
We investigated to what extent users can be satisfied by a web search engine when answering causal questions. We used an assessment environment in which a web search interface was simulated. For 1401 why-queries from a search engine log, we pre-retrieved the first 10 results using Bing. 311 queries were assessed by human judges. We found that, even without clicking a result, 25.2% of the why-questions are answered on the first result page. If we count an intended click on a result as a vote for relevance, then 74.4% of the why-questions get at least one relevant answer in the top 10. According to human assessors, 10% of the why-queries asked of web search engines are not answerable.
The Power of Peers BIBAFull-Text 497-502
  Nick Craswell; Dennis Fetterly; Marc Najork
We present a study of the contributions of three classes of ranking signals: BM25F, a retrieval function based on the words in the content of web pages and the anchors that link to them; SALSA, a link-based feature that takes all or part of the result set of a query as input; and matching-anchor count (MAC), a feature that measures precise matches between queries and anchors pointing to result pages. All three features incorporate both link and textual evidence, but in varying degrees. BM25F is the state-of-the-art exponent of Salton's term-vector model and is based on a solid theoretical foundation; the other two features are somewhat more ad hoc. We studied the impact of two factors that go into the formation of SALSA's "base" set: whether to use conjunctive or disjunctive query semantics, and how many results to include in the base set. We found that the choice of query semantics has little impact on the effectiveness of SALSA (with conjunctive semantics having a slight edge); more surprisingly, we found that limiting the base set to a few hundred results of high expected quality maximizes performance. Furthermore, we experimented with various linear combinations of BM25F, MAC, and SALSA. In doing so, we made a remarkable observation: adding BM25F to a two-way weighted linear combination of MAC and SALSA does not increase performance in any statistically significant way.
Introducing the User-over-Ranking Hypothesis BIBAFull-Text 503-509
  Benno Stein; Matthias Hagen
The User-over-Ranking hypothesis states that the user herself, rather than a web search engine's ranking algorithm, can help to improve retrieval performance. The means are longer queries that provide additional keywords.
   Readers who take this hypothesis for granted should recall that virtually no user, and none of the search index providers, considers its implications. For readers who are unconvinced by the claim, our paper provides empirical evidence.
Second Chance: A Hybrid Approach for Dynamic Result Caching in Search Engines BIBAKFull-Text 510-516
  I. Sengor Altingovde; Rifat Ozcan; B. Barla Cambazoglu; Özgür Ulusoy
Result caches are vital for efficiency of search engines. In this work, we propose a novel caching strategy in which a dynamic result cache is split into two layers: an HTML cache and a docID cache. The HTML cache in the first layer stores the result pages computed for queries. The docID cache in the second layer stores ids of documents in search results. Experiments under various scenarios show that, in terms of average query processing time, this hybrid caching approach outperforms the traditional approach, which relies only on the HTML cache.
Keywords: Search engines; query processing; result cache
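The two-layer design can be sketched with two LRU maps: a small HTML layer that answers queries directly, and a larger docID layer from which result pages can be re-rendered without re-running the query against the index. Capacities and the LRU eviction policy are illustrative assumptions:

```python
from collections import OrderedDict

class TwoLayerResultCache:
    """Hybrid dynamic result cache: an HTML layer storing full result
    pages and a docID layer storing only the ids of matching documents."""

    def __init__(self, html_capacity, docid_capacity):
        self.html_capacity = html_capacity
        self.docid_capacity = docid_capacity
        self.html = OrderedDict()    # query -> rendered result page
        self.docids = OrderedDict()  # query -> ids of result documents

    def _put(self, layer, capacity, query, value):
        layer[query] = value
        layer.move_to_end(query)
        while len(layer) > capacity:
            layer.popitem(last=False)          # evict least recently used

    def store(self, query, doc_ids, html_page):
        self._put(self.docids, self.docid_capacity, query, doc_ids)
        self._put(self.html, self.html_capacity, query, html_page)

    def lookup(self, query, render):
        # Fast path: serve the cached page directly.
        if query in self.html:
            self.html.move_to_end(query)
            return self.html[query], "html-hit"
        # Slower path: rebuild the page from cached doc ids, skipping
        # the expensive posting-list traversal entirely.
        if query in self.docids:
            self.docids.move_to_end(query)
            return render(self.docids[query]), "docid-hit"
        return None, "miss"
```

Since docID entries are far smaller than HTML pages, the second layer can hold many more queries in the same memory budget, which is the source of the reported latency gains.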

IR Theory (II)

Learning Models for Ranking Aggregates BIBAFull-Text 517-529
  Craig Macdonald; Iadh Ounis
Aggregate ranking tasks are those where documents are not the final ranking outcome, but instead an intermediary component. For instance, in expert search, a ranking of candidate persons with expertise relevant to a query is generated after consideration of a document ranking. Many models exist for aggregate ranking tasks; however, obtaining an effective and robust setting across different aggregate ranking tasks is difficult. In this work, we propose a novel learned approach to aggregate ranking, which combines different document ranking features as well as aggregate ranking approaches. We experiment with our proposed approach using two TREC test collections for expert and blog search. Our experimental results attest to the effectiveness and robustness of a learned model for aggregate ranking across different settings.
Efficient Compressed Inverted Index Skipping for Disjunctive Text-Queries BIBAFull-Text 530-542
  Simon Jonassen; Svein Erik Bratsberg
In this paper we look at a combination of bulk compression, partial query processing, and skipping for document-ordered inverted indexes. We propose a new inverted index organization, and provide an updated version of the MaxScore method by Turtle and Flood and a skipping-adapted version of the space-limited adaptive pruning method by Lester et al. Both our methods significantly reduce the number of processed elements and cut the average query latency by more than a factor of three. Our experiments with a real implementation and a large document collection provide a valuable basis for further research on inverted index skipping and query processing optimizations.
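Skipping over document-ordered postings is conventionally implemented with skip pointers. The sketch below uses the textbook sqrt(n) skip distance rather than the paper's block organization, to show the operation methods like MaxScore rely on:

```python
import math

class SkippedPostings:
    """Document-ordered posting list with evenly spaced skip pointers,
    supporting the next_geq ("next greater-or-equal") jump used to
    pass over documents that cannot enter the result set."""

    def __init__(self, doc_ids):
        self.ids = doc_ids                       # sorted document ids
        self.skip = max(1, math.isqrt(len(doc_ids)))

    def next_geq(self, target, start=0):
        """Return the index of the first posting with doc id >= target."""
        i = start
        # Follow skip pointers while the skipped-to posting is still < target.
        while i + self.skip < len(self.ids) and self.ids[i + self.skip] < target:
            i += self.skip
        # Linear scan within the final block.
        while i < len(self.ids) and self.ids[i] < target:
            i += 1
        return i
```

With block-compressed indexes, the skip step would land on block boundaries so that only blocks containing candidates need to be decompressed.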
Within-Document Term-Based Index Pruning with Statistical Hypothesis Testing BIBAFull-Text 543-554
  Sree Lekha Thota; Ben Carterette
Document-centric static index pruning methods provide smaller indexes and faster query times by dropping some within-document term information from inverted lists. We present a method of pruning inverted lists derived from the formulation of unigram language models for retrieval. Our method is based on the statistical significance of term frequency ratios: using the two-sample two-proportion (2P2N) test, we statistically compare the frequency of occurrence of a word within a given document to the frequency of its occurrence in the collection to decide whether to prune it. Experimental results show that this technique can significantly decrease the size of the index and the query processing time, with less compromise to retrieval effectiveness than similar heuristic methods. Furthermore, we give a formal statistical justification for such methods.
SkipBlock: Self-indexing for Block-Based Inverted List BIBAFull-Text 555-561
  Stéphane Campinas; Renaud Delbru; Giovanni Tummarello
In large web search engines the performance of Information Retrieval systems is a key issue. Block-based compression methods are often used to improve search performance, but current self-indexing techniques are not adapted to such data structures and provide sub-optimal performance. In this paper, we present SkipBlock, a self-indexing model for block-based inverted lists. Based on a cost model, we show that it is possible to achieve significant improvements in both search performance and the structure's storage space.
Weight-Based Boosting Model for Cross-Domain Relevance Ranking Adaptation BIBAFull-Text 562-567
  Peng Cai; Wei Gao; Kam-Fai Wong; Aoying Zhou
Adaptation techniques based on importance weighting were shown effective for RankSVM and RankNet, viz., each training instance is assigned a target weight denoting its importance to the target domain and incorporated into the loss function. In this work, we extend RankBoost using the importance-weighting framework for ranking adaptation. We find it non-trivial to incorporate the target weight into boosting-based ranking algorithms because it plays a role contradictory to the innate weight of boosting, namely the source weight, which focuses on adjusting source-domain ranking accuracy. Our experiments show that among three variants, the additive weight-based RankBoost, which dynamically balances the two types of weights, significantly and consistently outperforms the baseline trained directly on the source domain.

Interactive IR

What Makes Re-finding Information Difficult? A Study of Email Re-finding BIBAFull-Text 568-579
  David Elsweiler; Mark Baillie; Ian Ruthven
Re-finding information that has been seen or accessed before is a task which can be relatively straightforward, but often it can be extremely challenging, time-consuming and frustrating. Little is known, however, about what makes one re-finding task harder or easier than another. We performed a user study to learn about the contextual factors that influence users' perception of task difficulty in the context of re-finding email messages. 21 participants were issued re-finding tasks to perform on their own personal collections. The participants' responses to questions about the tasks, combined with demographic data and collection statistics for the experimental population, provide a rich basis to investigate the variables that can influence the perception of difficulty. A logistic regression model was developed to examine the relationships between variables and determine whether any factors were associated with perceived task difficulty. The model reveals strong relationships between difficulty and the time elapsed since a message was read, remembering when the sought-after email was sent, remembering other recipients of the email, the experience of the user and the user's filing strategy. We discuss what these findings mean for the design of re-finding interfaces and future re-finding research.
A User-Oriented Model for Expert Finding BIBAFull-Text 580-592
  Elena Smirnova; Krisztian Balog
Expert finding addresses the problem of retrieving a ranked list of people who are knowledgeable on a given topic. Several models have been proposed to solve this task, but so far these have focused solely on returning the most knowledgeable people as experts on a particular topic. In this paper we argue that in a real-world organizational setting the notion of the "best expert" also depends on the individual user and her needs. We propose a user-oriented approach that balances two factors that influence the user's choice: the time to contact an expert, and the knowledge value gained afterwards. We use the distance between the user and an expert in a social network to estimate contact time, and consider various social graphs, based on organizational hierarchy, geographical location, and collaboration, as well as the combination of these. Using a realistic test set, created from interactions of employees with a university-wide expert search engine, we demonstrate substantial improvements over a state-of-the-art baseline on all retrieval measures.
Simulating Simple and Fallible Relevance Feedback BIBAKFull-Text 593-604
  Feza Baskaya; Heikki Keskustalo; Kalervo Järvelin
Much of the research in relevance feedback (RF) has been performed under laboratory conditions using test collections and either test persons or simple simulation. These studies have given mixed results. The design of the present study is unique. First, the initial queries are realistically short queries generated by real end-users. Second, we perform a user simulation with several RF scenarios. Third, we simulate human fallibility in providing RF, i.e., incorrectness in feedback. Fourth, we employ graded relevance assessments in the evaluation of the retrieval results. The research question is: how does RF affect IR performance when initial queries are short and feedback is fallible? Our findings indicate that very fallible feedback is no different from pseudo-relevance feedback (PRF) and not effective on short initial queries. However, RF with empirically observed fallibility is as effective as correct RF and able to improve the performance of short initial queries.
Keywords: Relevance feedback; fallibility; simulation
AutoEval: An Evaluation Methodology for Evaluating Query Suggestions Using Query Logs BIBAFull-Text 605-610
  M-Dyaa Albakour; Udo Kruschwitz; Nikolaos Nanas; Yunhyong Kim; Dawei Song; Maria Fasli; Anne De Roeck
User evaluations of search engines are expensive and not easy to replicate. The problem is even more pronounced when assessing adaptive search systems, for example system-generated query modification suggestions that can be derived from past user interactions with a search engine. Automatically predicting the performance of different modification suggestion models before getting the users involved is therefore highly desirable. AutoEval is an evaluation methodology that assesses the quality of query modifications generated by a model using the query logs of past user interactions with the system. We present experimental results of applying this methodology to different adaptive algorithms which suggest that the predicted quality of different algorithms is in line with user assessments. This makes AutoEval a suitable evaluation framework for adaptive interactive search engines.
To Seek, Perchance to Fail: Expressions of User Needs in Internet Video Search BIBAKFull-Text 611-616
  Christoph Kofler; Martha Larson; Alan Hanjalic
This work investigates user expressions of content needs in Internet video search, focusing on cases in which users have failed to meet their search goals, although relevant content is reasonably certain to exist. We study expressions of user needs in the form of requests (i.e., questions) formulated in natural language and published to Yahoo! Answers. Experiments show that classifiers can distinguish requests associated with search-goal failure. We identify a group of 'easy-to-predict' requests (cases for which the classifier predicts search-goal failure well) and compile an inventory of strategies used by users to express search goals in these cases. In a final set of experiments, we demonstrate the feasibility of predicting search-goal failure based on query-like representations of the original natural-language requests. The results of our study are intended to inform the future development of indexing and retrieval techniques for Internet video that target difficult queries.
Keywords: Multimedia retrieval; Internet video; user information need; search-goal failure; crowdsourcing

Question Answering / NLP

Passage Reranking for Question Answering Using Syntactic Structures and Answer Types BIBAKFull-Text 617-628
  Elif Aktolga; James Allan; David A. Smith
Passage Retrieval is a crucial step in question answering systems, one that has been well researched in the past. Due to the vocabulary mismatch problem and independence assumption of bag-of-words retrieval models, correct passages are often ranked lower than other incorrect passages in the retrieved list. Whereas in previous work, passages are reranked only on the basis of syntactic structures of questions and answers, our method achieves a better ranking by aligning the syntactic structures based on the question's answer type and detected named entities in the candidate passage. We compare our technique with strong retrieval and reranking baselines. Experimental results using the TREC QA 1999-2003 datasets show that our method significantly outperforms the baselines over all ranks in terms of the MRR measure.
Keywords: Passage Retrieval; Question Answering; Reranking; Dependency Parsing; Named Entities
An Iterative Approach to Text Segmentation BIBAKFull-Text 629-640
  Fei Song; William M. Darling; Adnan Duric; Fred W. Kroon
We present divSeg, a novel method for text segmentation that iteratively splits a portion of text at its weakest point in terms of the connectivity strength between two adjacent parts. To search for the weakest point, we apply two different measures: one is based on language modeling of text segmentation and the other, on the interconnectivity between two segments. Our solution produces a deep and narrow binary tree -- a dynamic object that describes the structure of a text and that is fully adaptable to a user's segmentation needs. We treat it as a separate task to flatten the tree into a broad and shallow hierarchy either through supervised learning of a document set or explicit input of how a text should be segmented. The rich structure of our created tree further allows us to segment documents at varying levels such as topic, sub-topic, etc. We evaluated our new solution on a set of 265 articles from Discover magazine where the topic structures are unknown and need to be discovered. Our experimental results show that the iterative approach has the potential to generate better segmentation results than several leading baselines, and the separate flattening step allows us to adapt the results to different levels of details and user preferences.
Keywords: Text Segmentation; Language Modeling
Improving Query Focused Summarization Using Look-Ahead Strategy BIBAKFull-Text 641-652
  Rama Badrinath; Suresh Venkatasubramaniyan; C. E. Veni Madhavan
Query focused summarization is the task of producing a compressed text of an original set of documents based on a query. Documents can be viewed as a graph with sentences as nodes and edges added based on sentence similarity. Graph-based ranking algorithms which use a 'biased random surfer model', like topic-sensitive LexRank, have been successfully applied to query focused summarization. In these algorithms, the random walk is biased towards the sentences which contain query-relevant words. Specifically, it is assumed that the random surfer knows the query relevance score of the sentence to which he jumps. However, the neighbourhood information of that sentence is completely ignored. In this paper, we propose a look-ahead version of topic-sensitive LexRank. We assume that the random surfer not only knows the query relevance of the sentence to which he jumps but can also look N steps ahead from that sentence to find the query relevance scores of future sentences. Using this look-ahead information, we identify sentences which are indirectly related to the query by counting the number of hops needed to reach a sentence that contains query-relevant words. We then bias the random walk towards these indirectly query-relevant sentences as well as the sentences which contain query-relevant words. Experimental results show a 20.2% increase in ROUGE-2 score compared to topic-sensitive LexRank on the DUC 2007 data set. Further, our system outperforms the best systems in DUC 2006 and results are comparable to state-of-the-art systems.
Keywords: Topic Sensitive LexRank; Look-ahead; Biased random walk
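A one-step look-ahead variant of topic-sensitive LexRank can be sketched as follows. This is a minimal illustration under our own assumptions: the propagation rule (taking a decayed share of the relevance reachable from a sentence's neighbours) and all parameter names are ours, not the paper's exact formulas.

```python
def lookahead_lexrank(sim, rel, decay=0.5, d=0.85, iters=60):
    """Sketch of a look-ahead-biased LexRank. `sim` is a sentence-similarity
    matrix (list of lists), `rel` the query relevance score of each
    sentence. Sentences without query terms inherit a decayed share of the
    relevance reachable via their neighbours, and the random walk teleports
    proportionally to these propagated scores."""
    n = len(rel)
    # Row-normalise similarities into a random-walk transition matrix.
    P = [[sim[i][j] / sum(sim[i]) for j in range(n)] for i in range(n)]
    # One-step look-ahead: indirect query relevance via neighbours.
    spread = [max(rel[i], decay * sum(P[i][j] * rel[j] for j in range(n)))
              for i in range(n)]
    total = sum(spread)
    bias = [s / total for s in spread]          # teleport distribution
    p = [1.0 / n] * n
    for _ in range(iters):                      # biased power iteration
        p = [(1 - d) * bias[i] + d * sum(P[j][i] * p[j] for j in range(n))
             for i in range(n)]
    return p
```

With plain topic-sensitive LexRank the teleport distribution would be proportional to `rel` alone; the look-ahead step is what lifts sentences that merely neighbour query-relevant content.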
A Generalized Method for Word Sense Disambiguation Based on Wikipedia BIBAKFull-Text 653-664
  Chenliang Li; Aixin Sun; Anwitaman Datta
In this paper we propose a general framework for word sense disambiguation using knowledge latent in Wikipedia. Specifically, we exploit the rich and growing Wikipedia corpus in order to achieve a large and robust knowledge repository consisting of keyphrases and their associated candidate topics. Keyphrases are mainly derived from Wikipedia article titles and anchor texts associated with wikilinks. The disambiguation of a given keyphrase is based on both the commonness of a candidate topic and the context-dependent relatedness where unnecessary (and potentially noisy) context information is pruned. With extensive experimental evaluations using different relatedness measures, we show that the proposed technique achieved comparable disambiguation accuracies with respect to state-of-the-art techniques, while incurring orders of magnitude less computation cost.
Keywords: Word Sense Disambiguation; Wikipedia; Context Pruning

Posters

Representing Document Lengths with Identifiers BIBAFull-Text 665-669
  Raffaele Perego; Fabrizio Silvestri; Nicola Tonellotto
The length of each indexed document is needed by most common text retrieval scoring functions to rank it with respect to the current query. For efficiency, information retrieval systems maintain this information in main memory. This paper proposes a novel strategy to encode the length of each document directly in its document identifier, thus reducing main-memory demand. The technique is based on a simple document identifier assignment method and a function that allows the approximate length of each indexed document to be computed analytically.
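One way to realise this idea is to assign identifiers in ascending length order, so the ID itself carries length information. The sketch below is our own simplified variant (piecewise-linear interpolation between a few sampled anchor lengths), not the paper's exact encoding:

```python
import bisect

def assign_ids_by_length(doc_lengths, n_anchors=8):
    """Assign docIDs in ascending length order and keep only a handful of
    sampled (id, length) anchors instead of a full per-document length
    array. Returns the lengths in ID order plus the anchor structure."""
    order = sorted(doc_lengths)                 # lengths in docID order
    step = max(1, len(order) // n_anchors)
    anchor_ids = list(range(0, len(order), step)) + [len(order) - 1]
    anchor_lens = [order[i] for i in anchor_ids]
    return order, (anchor_ids, anchor_lens)

def approx_length(doc_id, anchors):
    """Recover an approximate document length from the docID alone by
    linear interpolation between the two surrounding anchors."""
    ids, lens = anchors
    j = bisect.bisect_right(ids, doc_id) - 1
    if j >= len(ids) - 1:
        return lens[-1]
    frac = (doc_id - ids[j]) / (ids[j + 1] - ids[j])
    return lens[j] + frac * (lens[j + 1] - lens[j])
```

Since length-based scoring functions tolerate small approximation errors, a few anchors can replace a per-document length array that would otherwise occupy main memory.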
Free-Text Search versus Complex Web Forms BIBAFull-Text 670-674
  Kien Tjin-Kam-Jet; Dolf Trieschnigg; Djoerd Hiemstra
We investigated the use of free-text queries as an alternative means for searching 'behind' web forms. We conducted a user study where we evaluated our prototype free-text interface in a travel planner scenario. Our results show that users prefer this free-text interface over the original web form and that they are about 9% faster on average at completing their search tasks.
Multilingual Log Analysis: LogCLEF BIBAFull-Text 675-678
  Giorgio Maria Di Nunzio; Johannes Leveling; Thomas Mandl
The current lack of recent and long-term query logs makes the verifiability and repeatability of log analysis experiments very limited. A first attempt in this direction was made within the Cross-Language Evaluation Forum in 2009 in a track named LogCLEF, which aims to stimulate research on user behaviour in multilingual environments and to promote standard evaluation collections of log data. We report on similarities and differences of the most recent activities for LogCLEF.
A Large-Scale System Evaluation on Component-Level BIBAFull-Text 679-682
  Jens Kürsten; Maximilian Eibl
This article describes a large-scale empirical evaluation across different types of English text collections. We ran about 140,000 experiments and analyzed the results at the system component level to find out whether we can select configurations that perform reliably on specific types of corpora. To our own surprise we observed that a specific set of configuration parameters achieved 95% of the optimal average MAP across all collections. We conclude that this configuration could be used as a baseline reference for the evaluation of new IR approaches on English text corpora.
Should MT Systems Be Used as Black Boxes in CLIR? BIBAFull-Text 683-686
  Walid Magdy; Gareth J. F. Jones
The translation stage in cross language information retrieval (CLIR) acts as the main enabling stage to cross the language barrier between documents and queries. In recent years machine translation (MT) systems have become the dominant approach to translation in CLIR. However, unlike information retrieval (IR), MT focuses on the morphological and syntactical quality of the sentence. This requires large training resources and high computational power for training and translation. We present a novel technique for MT designed specifically for CLIR. In this method, IR text pre-processing in the form of stop-word removal and stemming is applied to the MT training corpus prior to the training phase. Applying this pre-processing step is found to significantly speed up the translation process without affecting retrieval quality.
Video Retrieval Based on Words-of-Interest Selection BIBAKFull-Text 687-690
  Lei Wang; Dawei Song; Eyad Elyan
Query-by-example video retrieval has received increasing attention in recent years. One of the state-of-the-art approaches is the Bag-of-visual-Words (BoW) technique, where images are described by a set of local features mapped to a discrete set of visual words. Such techniques, however, ignore the spatial relations between visual words. In this paper, we present a content-based video retrieval technique based on selected Words-of-Interest (WoI) that utilizes a visual-word spatial proximity constraint identified from the query. Experiments carried out on a public video database demonstrate that our approach delivers promising results, outperforming the classical BoW approach.
Keywords: Bag-of-Words; Content based video retrieval; Words-of-Interest
Classic Children's Literature -- Difficult to Read? BIBAFull-Text 691-694
  Dolf Trieschnigg; Claudia Hauff
Classic children's literature is nowadays freely available thanks to initiatives such as Project Gutenberg. Due to diverging vocabularies and style, these texts are often not readily understandable to children in the present day. Our goal is to make such texts more accessible by aiding children in the reading process, in particular by automatically identifying the terms that result in low readability. As a first step, in this poster we report on a preliminary user study that investigates the extent of the vocabulary problem. We also propose and evaluate a basic approach to detect such difficult terminology.
Applying Machine Learning Diversity Metrics to Data Fusion in Information Retrieval BIBAFull-Text 695-698
  David Leonard; David Lillis; Lusheng Zhang; Fergus Toolan; Rem W. Collier; John Dunnion
The Supervised Machine Learning task of classification has parallels with Information Retrieval (IR): in each case, items (documents in the case of IR) are required to be categorised into discrete classes (relevant or non-relevant). Thus a parallel can also be drawn between classifier ensembles, where evidence from multiple classifiers is combined to achieve a superior result, and the IR data fusion task.
   This paper presents preliminary experimental results on the applicability of classifier ensemble diversity metrics in data fusion. Initial results indicate a relationship between the quality of the fused result set (as measured by MAP) and the diversity of its inputs.
Reranking Collaborative Filtering with Multiple Self-contained Modalities BIBAKFull-Text 699-703
  Yue Shi; Martha Larson; Alan Hanjalic
A reranking algorithm, Multi-Rerank, is proposed to refine the recommendation list generated by collaborative filtering approaches. Multi-Rerank is capable of capturing multiple self-contained modalities, i.e., item modalities extractable from user-item matrix, to improve recommendation lists. Experimental results indicate that Multi-Rerank is effective for improving various CF approaches and additional benefits can be achieved when reranking with multiple modalities rather than a single modality.
Keywords: Recommender systems; collaborative filtering; reranking; multiple modalities; self-contained modalities
How Far Are We in Trust-Aware Recommendation? BIBAKFull-Text 704-707
  Yue Shi; Martha Larson; Alan Hanjalic
Social trust holds great potential for improving recommendation and much recent work focuses on the use of social trust for rating prediction, in particular, in the context of the Epinions dataset. An experimental comparison with trust-free, naïve approaches suggests that state-of-the-art social-trust-aware recommendation approaches, in particular Social Trust Ensemble (STE), can fail to isolate the true added value of trust. We demonstrate experimentally that not only trust-set users, but also random users can be exploited to yield recommendation improvement via STE. Specific users, however, do benefit from use of social trust, and we conclude with an investigation of their characteristics.
Keywords: Recommender systems; social trust; trust-aware recommendation
Re-ranking for Multimedia Indexing and Retrieval BIBAKFull-Text 708-711
  Bahjat Safadi; Georges Quénot
We propose a re-ranking method for improving the performance of semantic video indexing and retrieval. Experimental results show that the proposed re-ranking method is effective and improves system performance on average by about 16-22% on the TRECVID 2010 semantic indexing task.
Keywords: Multimedia Indexing and Retrieval; Re-ranking
Combining Query Translation Techniques to Improve Cross-Language Information Retrieval BIBAFull-Text 712-715
  Benjamin Herbert; György Szarvas; Iryna Gurevych
In this paper we address the combination of query translation approaches for cross-language information retrieval (CLIR). We translate queries with Google Translate and extend them with new translations obtained by mapping noun phrases in the query to concepts in the target language using Wikipedia. For two CLIR collections, we show that the proposed model provides meaningful translations that improve the strong baseline CLIR model based on a top performing SMT system.
Back to the Roots: Mean-Variance Analysis of Relevance Estimations BIBAFull-Text 716-720
  Guido Zuccon; Leif Azzopardi; Keith van Rijsbergen
Recently, mean-variance analysis has been proposed as a novel paradigm to model document ranking in Information Retrieval. The main merit of this approach is that it diversifies the ranking of retrieved documents. In its original formulation, the strategy considers both the mean of relevance estimates of retrieved documents and their variance. However, when this strategy has been empirically instantiated, the concepts of mean and variance are discarded in favour of a point-wise estimation of relevance (to replace the mean) and of a parameter to be tuned or, alternatively, a quantity dependent upon the document length (to replace the variance). In this paper we revisit this ranking strategy by going back to its roots: mean and variance. For each retrieved document, we infer a relevance distribution from a series of point-wise relevance estimations provided by a number of different systems. This is used to compute the mean and the variance of document relevance estimates. On the TREC ClueWeb collection, we show that this approach improves retrieval performance. This development could lead to new strategies for fusing relevance estimates provided by different systems.
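The ranking criterion described above, a mean of per-system relevance estimates discounted by their variance, can be sketched in a few lines (the trade-off parameter `b` and the function name are our assumptions; mean-variance ranking in IR conventionally scores documents as mean minus b times variance):

```python
import statistics

def mean_variance_rank(estimates, b=1.0):
    """Rank documents by the mean of their relevance estimates minus b
    times the variance. `estimates` maps each document to the list of
    point-wise relevance scores provided by the different systems; b > 0
    penalises documents the systems disagree on."""
    scored = []
    for doc, scores in estimates.items():
        mu = statistics.mean(scores)
        var = statistics.pvariance(scores)   # population variance
        scored.append((doc, mu - b * var))
    # Highest risk-adjusted score first.
    return sorted(scored, key=lambda x: x[1], reverse=True)
```

With b = 0 this degenerates to ranking by the mean alone (plain score fusion); increasing b trades expected relevance against the uncertainty across systems.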
A Novel Re-ranking Approach Inspired by Quantum Measurement BIBAFull-Text 721-724
  Xiaozhao Zhao; Peng Zhang; Dawei Song; Yuexian Hou
Quantum theory (QT) has recently been employed to advance the theory of information retrieval (IR). A typical method, namely the Quantum Probability Ranking Principle (QPRP), was proposed to re-rank top retrieved documents by considering the inter-dependencies between documents through the "quantum interference". In this paper, we attempt to explore another important QT concept, namely the "quantum measurement". Inspired by the photon polarization experiment underpinning the "quantum measurement", we propose a novel re-ranking approach. Evaluation on several TREC data sets shows that in ad-hoc retrieval, our method can significantly improve the first-round ranking from a baseline retrieval model, and also outperform the QPRP.
Simple vs. Sophisticated Approaches for Patent Prior-Art Search BIBAFull-Text 725-728
  Walid Magdy; Patrice Lopez; Gareth J. F. Jones
Patent prior-art search is concerned with finding all filed patents relevant to a given patent application. We report a comparison between two search approaches representing the state-of-the-art in patent prior-art search. The first approach uses simple and straightforward information retrieval (IR) techniques, while the second uses much more sophisticated techniques which try to model the steps taken by a patent examiner in patent search. Experiments show that the retrieval effectiveness of the two techniques is statistically indistinguishable when patent applications contain some initial citations. However, the advanced search technique is statistically better when no initial citations are provided. Our findings suggest that time and effort can be saved by applying simple IR approaches when initial citations are provided.
Towards Quantum-Based DB+IR Processing Based on the Principle of Polyrepresentation BIBAFull-Text 729-732
  David Zellhöfer; Ingo Frommholz; Ingo Schmitt; Mounia Lalmas; Keith van Rijsbergen
The cognitively motivated principle of polyrepresentation still lacks a theoretical foundation in IR. In this work, we discuss two competing polyrepresentation frameworks that are based on quantum theory. Both approaches support different aspects of polyrepresentation, where one is focused on the geometric properties of quantum theory while the other has a strong logical basis. We compare both approaches and outline how they can be combined to express further aspects of polyrepresentation.
ATTention: Understanding Authors and Topics in Context of Temporal Evolution BIBAFull-Text 733-737
  Nasir Naveed; Sergej Sizov; Steffen Staab
Understanding thematic trends and user roles is an important challenge in the field of information retrieval. In this contribution, we present a novel model for analyzing the evolution of users' interests with respect to the content they produce over time. Our approach, ATTention (a name derived from analysis of Authors and Topics in the Temporal context), addresses this problem by means of Bayesian modeling of relations between authors, latent topics and temporal information. We also present results of preliminary evaluations with scientific publication datasets and discuss opportunities for using the model in novel mining and recommendation scenarios.
Role of Emotional Features in Collaborative Recommendation BIBAFull-Text 738-742
  Yashar Moshfeghi; Joemon M. Jose
The aim of this poster is to investigate the role of emotion in the collaborative filtering task. For this purpose, a kernel-based collaborative recommendation technique is used. The experiment is conducted on two MovieLens data sets. The emotional features are extracted from the movie reviews and plot summaries. The results show that emotional features are capable of enhancing recommendation effectiveness.
The Importance of the Depth for Text-Image Selection Strategy in Learning-To-Rank BIBAFull-Text 743-746
  David Buffoni; Sabrina Tollari; Patrick Gallinari
We examine the effect that the number of documents pooled for constructing training sets has on the performance of the learning-to-rank (LTR) approaches used to build ranking functions. Our investigation takes place in a multimedia setting and uses the ImageCLEF photo 2006 dataset based on text and visual features. Experiments show that our LTR algorithm, OWPC, outperforms other baselines.
Personal Blog Retrieval Using Opinion Features BIBAFull-Text 747-750
  Shima Gerani; Mostafa Keikha; Mark Carman; Fabio Crestani
Faceted blog distillation aims at finding blogs with recurring interest to a topic while satisfying a specific facet of interest. In this paper we focus on the personal facet and propose a method that uses opinion features as indicators of personal content. Experimental results on TREC BLOG08 data-set confirm our intuition that personal blogs are more opinionated.
Processing Queries in Session in a Quantum-Inspired IR Framework BIBAFull-Text 751-754
  Ingo Frommholz; Benjamin Piwowarski; Mounia Lalmas; Keith van Rijsbergen
In a search session, users tend to reformulate their queries, for instance because they want to generalise or specify them, or because they are undergoing a drift in their information need. This motivates regarding queries not in isolation, but within the session in which they are embedded. In this poster, we propose an approach inspired by quantum mechanics to represent queries and their reformulations as density operators. Differently constructed densities can potentially be applied to different types of query reformulation. To this end, we propose and discuss indicators that can hint at the type of query reformulation we are dealing with.
Towards Predicting Relevance Using a Quantum-Like Framework BIBAFull-Text 755-758
  Emanuele Di Buccio; Massimo Melucci; Dawei Song
In this paper, the user's relevance state is modeled using quantum-like probability and the interference term is proposed so as to model the evolution of the state and the user's uncertainty about the assessment. The theoretical framework has been formulated and the results of an experimental user study based on a TREC test collection have been reported.
Fusion vs. Two-Stage for Multimodal Retrieval BIBAFull-Text 759-762
  Avi Arampatzis; Konstantinos Zagoris; Savvas A. Chatzichristofis
We compare two methods for retrieval from multimodal collections. The first is a score-based fusion of results, retrieved visually and textually. The second is a two-stage method that visually re-ranks the top-K results textually retrieved. We discuss their underlying hypotheses and practical limitations, and conduct a comparative evaluation on a standardized snapshot of Wikipedia. Both methods are found to be significantly more effective than single-modality baselines, with no clear winner but with different robustness features. Nevertheless, two-stage retrieval provides efficiency benefits over fusion.
Combination of Feature Selection Methods for Text Categorisation BIBAFull-Text 763-766
  Robert Neumayer; Rudolf Mayer; Kjetil Nørvåg
Feature selection plays a vital role in text categorisation. A range of different methods have been developed, each having unique properties and selecting different features. We show some results of an extensive study of feature selection approaches using a wide range of combination methods. We performed experiments on 18 test collections and report a subset of the results.

Demonstrations

Time-Surfer: Time-Based Graphical Access to Document Content BIBAFull-Text 767-771
  Hector Llorens; Estela Saquete; Borja Navarro; Robert Gaizauskas
This demonstration presents a novel interactive graphical interface to document content focusing on the time dimension. The objective of Time-Surfer is to let users search and explore information related to a specific period, event, or event participant within a document. The system is based on the automatic detection not only of time expressions, but also of events and temporal relations. Through a zoomable timeline interface, it gives users a dynamic picture of the temporal distribution of events within a document. Time-Surfer has been successfully applied to history and biographical articles from Wikipedia.
ARES: A Retrieval Engine Based on Sentiments BIBAFull-Text 772-775
  Gianluca Demartini
This paper introduces a system enriching the standard web search engine interface with sentiment information. Additionally, it exploits such annotations to diversify the result list based on the different sentiments expressed by retrieved web pages. Thanks to the annotations, the end user is aware of which opinions the search engine is showing her and, thanks to the diversification, she can see an overview of the different opinions expressed about the requested topic. We describe the methods used for computing sentiment scores of web search results and for re-ranking them in order to cover different sentiment classes. The proposed system, built on top of commercial search engine APIs, is available on-line.
Web Search Query Assistance Functionality for Young Audiences BIBAFull-Text 776-779
  Carsten Eickhoff; Tamara Polajnar; Karl Gyllstrom; Sergio Duarte Torres; Richard Glassey
The Internet plays an important role in people's daily lives. This is not only true for adults, but also holds for children; however, current web search engines are designed with adult users and their cognitive abilities in mind. Consequently, children face considerable barriers when using these information systems. In this work, we demonstrate the use of query assistance and search moderation techniques as well as appropriate interface design to overcome or mitigate these challenges.
Conversation Retrieval from Twitter BIBAKFull-Text 780-783
  Matteo Magnani; Danilo Montesi; Gabriele Nunziante; Luca Rossi
The process of retrieving conversations from social network sites differs from traditional Web information retrieval because it involves human communication aspects, like the degree of interest in the conversation explicitly or implicitly expressed by the interacting people and their influence/popularity. Our demo allows users to include these aspects into the search process. The system allows the retrieval of millions of conversations generated on the popular Twitter social network site, and in particular conversations about trending topics.
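A conversation score of this kind blends content relevance with communication signals such as reply activity and author popularity; the weights and the saturating normalizations below are illustrative assumptions, not the authors' formula.

```python
import math

def score_conversation(content_rel, n_replies, author_followers,
                       w_rel=0.6, w_int=0.3, w_pop=0.1):
    """Blend content relevance with conversational interest (replies)
    and author popularity (followers), each squashed into [0, 1)."""
    interest = 1 - math.exp(-n_replies / 10)          # saturates with activity
    popularity = 1 - math.exp(-author_followers / 1000)
    return w_rel * content_rel + w_int * interest + w_pop * popularity
```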
Keywords: Conversation retrieval; Twitter
Finding Useful Users on Twitter: Twittomender the Followee Recommender BIBAFull-Text 784-787
  John Hannon; Kevin McCarthy; Barry Smyth
This paper examines an application for finding pertinent friends (followees) on Twitter. Whilst Twitter provides a great basis for receiving information, we believe a key shortcoming is the lack of an effective way for users to find other Twitter users worth following. We apply several recommendation techniques to build a followee recommender for Twitter. We evaluate a variety of different recommendation strategies, using real-user data, to demonstrate the potential for this recommender system to correctly identify and promote interesting users who are worth following.
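One of the strategies such a recommender can use is content-based: represent each user by a term profile built from their tweets and rank candidates by cosine similarity to the target user. The sketch below illustrates that single strategy under assumed toy profiles; it is not the Twittomender implementation.

```python
import math

def recommend_followees(target_profile, candidate_profiles, k=10):
    """Content-based followee recommendation: rank candidate users by
    cosine similarity between term-frequency profiles of their tweets."""
    def cosine(a, b):
        num = sum(a[t] * b[t] for t in set(a) & set(b))
        den = (math.sqrt(sum(v * v for v in a.values()))
               * math.sqrt(sum(v * v for v in b.values())))
        return num / den if den else 0.0

    ranked = sorted(candidate_profiles.items(),
                    key=lambda kv: cosine(target_profile, kv[1]),
                    reverse=True)
    return [user for user, _ in ranked[:k]]
```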
Visual Exploration of Health Information for Children BIBAFull-Text 788-792
  Frans van der Sluis; Sergio Duarte Torres; Djoerd Hiemstra; Betsy van Dijk; Frea Kruisinga
Children experience several difficulties retrieving information using current Information Retrieval (IR) systems. In particular, children struggle to find the right keywords to construct queries, given their lack of domain knowledge. This problem is even more critical in the specialized health domain. In this work we present a novel method to address this problem using a cross-media search interface in which the textual data is searched through visual images. This solution aims to ease the recall problem, which is salient for health information, by replacing the need to recall vocabulary with the easier task of recognising the different body parts.
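At its simplest, such a recognition-based interface maps a clicked image region to a set of pre-authored query terms, so the child never has to type a keyword; the region names and term lists below are hypothetical examples, not the system's actual vocabulary.

```python
# Hypothetical mapping from clickable body-part regions to health queries,
# replacing vocabulary recall with recognition of an image region.
BODY_PART_QUERIES = {
    "head": ["headache", "dizziness"],
    "stomach": ["stomach ache", "nausea"],
}

def queries_for_click(region):
    """Return the query terms associated with a clicked body-part region."""
    return BODY_PART_QUERIES.get(region, [])
```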