| IR Research: Systems, Interaction, Evaluation and Theories | | BIBA | Full-Text | 1-3 | |
| Kalervo Järvelin | |||
| The ultimate goal of information retrieval (IR) research is to create ways
of supporting humans in accessing information better, so that they can better
carry out their (work) tasks. Because of this, IR research has a primarily technological
interest in knowledge creation -- how to find information (better)? IR research
therefore has a constructive aspect (to create novel systems) and an evaluative
aspect (are they any good?). Evaluation is sometimes referred to as a hallmark
and distinctive feature of IR research. No claim on IR system performance is
granted any merit unless proven through evaluation. Technological innovation
alone is not sufficient. In fact, much research in IR deals with IR evaluation
and its methodology.
| Evaluation, in general, is the systematic determination of the merit and significance of something, using criteria judged against some standards. Evaluation therefore requires some object that is evaluated and some goal that should be achieved or served. In IR, both can be set in many ways. The object usually is an IR system -- but what is an IR system? The goal is typically the quality of the retrieved result -- but what is the retrieved result and how does one measure its quality? These questions can be answered in alternative ways, leading to different kinds of IR evaluation.
| Practical life, with all its variability, is difficult and expensive to investigate. Therefore surrogate, more easily measurable goals are employed in IR evaluation, typically the quality of the ranked result list instead of the work-task result. The task performance process may also be cut down from a work task to a search task, and further down to running an individual query in a test collection. This simplification has led to the standardization of research designs and to tremendous success in IR research. However, as the goals and systems drift farther away from practical life, one needs to ask whether the findings still serve the initial goal of evaluation (supporting human performance). If means (outputs) replace ends (outcomes), one runs the risk of sub-optimization. It is important to evaluate all subsystems of information retrieval processes, in addition to the search engines. Through a wider perspective one may be able to put the subsystems and their contributions in relation to each other. We will discuss nested IR evaluation frameworks, ranging from IR-system-centered evaluation to work-task-based evaluation. We will also point to the Pandora's box of problems that enlarging the scope of research entails. Is science at risk here?
| The contributions of a research area, in addition to constructive and evaluative ones, may be empirical, theoretical and methodological. Why should anyone in IR care about anything beyond IR experimentation (i.e. evaluation) using test collections? The Cranfield model seeks to relate texts (documents), queries, their representations and matching to topical relevance in ranked output. Who relates these, and a range of other possible contributing factors, to outcomes in search-task or work-task performance? The talk will outline some possibilities for descriptive, explanatory and theoretical research in IR. As an example of descriptive research, we will discuss information access in task processes. Regarding explanatory and theoretical research, we look at unit theories that connect work-task stages and properties to information-need properties, information sources, and searching. Such studies do not solve a technical problem, nor evaluate any particular technique, and may therefore be considered impractical. However, they may identify mechanisms that mediate between IR processes and task outcomes, and put the factors involved in information access into a balanced perspective. They may therefore help focus research efforts on technical problems or evaluation. | |||
| Ad Retrieval Systems in vitro and in vivo: Knowledge-Based Approaches to Computational Advertising | | BIBA | Full-Text | 4-5 | |
| Evgeniy Gabrilovich | |||
Over the past decade, online advertising has become the principal economic force
behind many an Internet service, from major search engines to globe-spanning
social networks to blogs. There is often a tension between online advertising
and user experience, but on the other hand, advertising revenue enables a
myriad of free Web services to the public and fosters a great deal of
innovation. Matching the advertisers' message to a receptive and interested
audience benefits both sides; indeed, hundreds of millions of users
occasionally click on ads, so by current IR evaluation principles the ads
should be considered relevant to those users' information needs. The utility of
ads can be better explained by considering advertising as a medium of
information [2,3]. Similarly to aggregated search [1], which enhances users'
Web search experience with relevant news, local results, user-generated
content, or multimedia, online advertising provides another rich source of
content. This source, however, is in a complexity class of its own, due to the
brevity of bid phrases, ad text being optimized for presentation rather than
indexing, and multiple, possibly contradictory utility functions.
| A new scientific sub-discipline -- Computational Advertising -- has recently emerged, which strives to make online advertising integral to the user experience and relevant to the users' information needs, as well as economically worthwhile to the advertiser and the publisher. In this talk we discuss the unique algorithmic challenges posed by searching the ad corpus, and report on empirical evaluation of large-scale advertising systems in vivo.
| At first approximation, finding user-relevant ads is akin to ad hoc information retrieval, where the user context is distilled into a query executed against an index of ads. However, the elaborate structure of ad campaigns, along with the cornucopia of pertinent non-textual information, makes ad retrieval substantially and interestingly different. We show how to adapt standard IR methods for ad retrieval, by developing structure-aware indexing techniques and by augmenting the ad selection process with exogenous knowledge. Computational advertising also employs a host of NLP techniques, from text summarization for just-in-time ad matching, to machine translation for cross-language ad retrieval, to natural language generation for automatic construction of advertising campaigns.
| Last but not least, we study the interplay between the algorithmic and sponsored search results, as well as formulate and explore context transfer, which characterizes the user's transition from Web search to the context of the landing page following an ad-click. These studies offer deep insights into how users interact with ads, and facilitate better understanding of the much broader notion of relevance in ad retrieval compared to Web search. | |||
| The Value of User Feedback | | BIBA | Full-Text | 6 | |
| Thorsten Joachims | |||
| Information retrieval systems and their users engage in a dialogue. While the main flow of information is from the system to the user, feedback from the user to the system provides many opportunities for short-term and long-term learning. In this talk, I will explore two interrelated questions that are central to the effective use of feedback. First, how can user feedback be collected so that it does not place a burden on the user? I will argue that the mechanisms for collecting feedback have to be integrated into the design of the retrieval process, so that the user's short-term goals are well-aligned with the system's goal of collecting feedback. Second, if this integration succeeds, how valuable is the information that user feedback provides? For the tasks of retrieval evaluation and query disambiguation, the talk will quantify by how much user feedback can save human annotator effort and improve retrieval quality, respectively. | |||
| Text Classification for a Large-Scale Taxonomy Using Dynamically Mixed Local and Global Models for a Node | | BIBAK | Full-Text | 7-18 | |
| Heung-Seon Oh; Yoonjung Choi; Sung-Hyon Myaeng | |||
| Hierarchical text classification for a large-scale Web taxonomy is
challenging because the number of categories hierarchically organized is large
and the training data for deep categories are usually sparse. It has been shown
that a narrow-down approach involving a search of the taxonomical tree is an
effective method for the problem. A recent study showed that both local and
global information for a node is useful for further improvement. This paper
introduces two methods for mixing local and global models dynamically for
individual nodes and shows they improve classification effectiveness by 5% and
30%, respectively, over and above the state-of-the-art method. Keywords: Web Taxonomy; Hierarchical Text Classification; ODP | |||
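To make the idea of dynamic mixing concrete, here is a minimal sketch, not the authors' method: the weighting scheme and all names are assumptions. A node's local and global scores are interpolated with a weight that grows with the node's local training-set size, so sparse deep nodes lean on the global model.

```python
# Sketch: dynamically mixing local and global classifier scores per node.
# The mixing weight favors the local model when a node has ample training
# data and falls back to the global model for sparse deep nodes.
# The weighting scheme and all names here are illustrative assumptions.

def mixing_weight(n_train_docs, pivot=50.0):
    """Weight in [0, 1) that grows with the local training-set size."""
    return n_train_docs / (n_train_docs + pivot)

def mixed_score(local_score, global_score, n_train_docs):
    lam = mixing_weight(n_train_docs)
    return lam * local_score + (1.0 - lam) * global_score

if __name__ == "__main__":
    # A deep, sparsely trained node: the global model dominates.
    print(mixed_score(local_score=0.9, global_score=0.3, n_train_docs=5))
    # A well-trained node: the local model dominates.
    print(mixed_score(local_score=0.9, global_score=0.3, n_train_docs=500))
```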
| User-Related Tag Expansion for Web Document Clustering | | BIBA | Full-Text | 19-31 | |
| Peng Li; Bin Wang; Wei Jin; Yachao Cui | |||
| As high-quality descriptors of web page semantics, social annotations or tags have been used for web document clustering and achieved promising results. However, most web pages have few tags (fewer than 10). This sparsity seriously limits the use of tags for clustering. In this work, we propose a user-related tag expansion method to overcome the problem, which incorporates additional useful tags into the original tag document by utilizing user tagging as background knowledge. Unfortunately, simply adding tags may cause topic drift, i.e., the dominant topic(s) of the original document may be changed. We address this problem by designing a novel generative model called Folk-LDA, which jointly models original and expanded tags as independent observations. Experimental results show that (1) our user-related tag expansion method can be effectively applied to over 90% of tagged web documents; (2) Folk-LDA can alleviate topic drift during expansion, especially for topic-specific documents; (3) compared to word-based clustering, our approach using only tags achieves a statistically significant increase of 39% in F1 score while reducing the number of terms involved in the computation by up to 76%. | |||
| A Comparative Experimental Assessment of a Threshold Selection Algorithm in Hierarchical Text Categorization | | BIBA | Full-Text | 32-42 | |
| Andrea Addis; Giuliano Armano; Eloisa Vargiu | |||
| Most of the research on text categorization has focused on mapping text documents to a set of categories among which structural relationships hold, i.e., on hierarchical text categorization. For solutions of a hierarchical problem that make use of an ensemble of classifiers, the behavior of each classifier typically depends on an acceptance threshold, which turns a degree of membership into a dichotomous decision. In principle, the problem of finding the best acceptance thresholds for a set of classifiers related with taxonomic relationships is a hard problem. Hence, devising effective ways for finding suboptimal solutions to this problem may have great importance. In this paper, we assess a greedy threshold selection algorithm aimed at finding a suboptimal combination of thresholds in a hierarchical text categorization setting. Comparative experiments, performed on Reuters, report the performance of the proposed threshold selection algorithm against a relaxed brute-force algorithm and against two state-of-the-art algorithms. Results highlight the effectiveness of the approach. | |||
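A hedged sketch of what greedy threshold selection can look like for one classifier in the ensemble (the paper's algorithm coordinates thresholds across the taxonomy; the candidate grid and toy data below are invented):

```python
# Sketch: greedy per-category threshold selection. For each category,
# sweep candidate thresholds and keep the one that maximizes F1 on
# held-out validation data, holding the other categories' thresholds fixed.

def f1(preds, golds):
    tp = sum(p and g for p, g in zip(preds, golds))
    fp = sum(p and not g for p, g in zip(preds, golds))
    fn = sum(g and not p for p, g in zip(preds, golds))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def select_threshold(scores, golds, candidates=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Return the candidate threshold with the best validation F1."""
    return max(candidates,
               key=lambda t: f1([s >= t for s in scores], golds))

if __name__ == "__main__":
    # Toy validation data: membership scores and true labels for one category.
    scores = [0.95, 0.80, 0.40, 0.30, 0.10]
    golds = [True, True, True, False, False]
    print(select_threshold(scores, golds))  # picks a fairly low threshold
```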
| Improving Tag-Based Recommendation by Topic Diversification | | BIBA | Full-Text | 43-54 | |
| Christian Wartena; Martin Wibbels | |||
| Collaborative tagging has emerged as a mechanism to describe items in large
on-line collections. Tags are assigned by users to describe items and to find
them again, but it is also tempting to describe the users in terms of the tags they
assign or in terms of the tags of the items they are interested in. The
tag-based profile thus obtained can be used to recommend new items.
If we recommend new items by computing their similarity to the user profile or to all items seen by the user, we run the risk of recommending only bland items that are slightly relevant to each of the topics a user is interested in. In order to increase user satisfaction, many recommender systems optimize not only for accuracy but also for diversity. Often it is assumed that there exists a trade-off between accuracy and diversity. In this paper we introduce topic-aware recommendation algorithms. Topic-aware algorithms first detect different interests in the user profile and then generate recommendations for each of these interests. We study topic-aware variants of three tag-based recommendation algorithms and show that each of them gives better recommendations than its base variant, both in terms of precision and recall and in terms of diversity. | |||
| A Joint Model of Feature Mining and Sentiment Analysis for Product Review Rating | | BIBAK | Full-Text | 55-66 | |
| Jorge Carrillo de Albornoz; Laura Plaza; Pablo Gervás; Alberto Díaz | |||
| The information in customer reviews is of great interest to both companies
and consumers. This information is usually presented as unstructured free
text, so automatically extracting and rating user opinions about a product is
a challenging task. Moreover, these opinions depend heavily on the product
features about which the user judgments and impressions are expressed.
Following this idea, our goal is to predict the overall rating of a product
review based on the user opinion about the different product features that are
evaluated in the review. To this end, the system first identifies the features
that are relevant to consumers when evaluating a certain type of product, as
well as the relative importance or salience of such features. The system then
extracts from the review the user opinions about the different product features
and quantifies such opinions. The salience of the different product features
and the values that quantify the user opinions about them are used to construct
a Vector of Feature Intensities which represents the review and will be the
input to a machine learning model that classifies the review into different
rating categories. Our method is evaluated over 1000 hotel reviews from
booking.com. The results compare favorably with those achieved by other systems
addressing similar evaluations. Keywords: automatic product rating; feature mining; polarity detection; sentiment
analysis | |||
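As a rough illustration of the Vector of Feature Intensities, here is a sketch under invented assumptions: the feature set, saliences and opinion scores below are made up, whereas the paper learns them from data.

```python
# Sketch: building a Vector of Feature Intensities (VFI) for a hotel review.
# Feature saliences and opinion scores are invented for illustration; the
# paper derives them from data and a sentiment quantification step.

# Relative importance of each hotel feature (assumed values).
SALIENCE = {"location": 0.30, "staff": 0.25, "room": 0.25, "breakfast": 0.20}

def vfi(opinions):
    """Map {feature: opinion score in [-1, 1]} to a salience-weighted vector.
    Features not mentioned in the review contribute 0."""
    return [SALIENCE[f] * opinions.get(f, 0.0) for f in sorted(SALIENCE)]

if __name__ == "__main__":
    review_opinions = {"staff": 0.8, "room": -0.4}   # extracted opinions
    x = vfi(review_opinions)
    print(x)  # this vector would be the input to a rating classifier
```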
| Modeling Answerer Behavior in Collaborative Question Answering Systems | | BIBA | Full-Text | 67-79 | |
| Qiaoling Liu; Eugene Agichtein | |||
| A key functionality in Collaborative Question Answering (CQA) systems is the assignment of questions from information seekers to potential answerers. An attractive solution is to automatically recommend questions to potential answerers with expertise or interest in the question topic. However, previous work has largely ignored a key problem in question recommendation -- namely, whether the potential answerer is likely to accept and answer the recommended questions in a timely manner. This paper explores the contextual factors that influence answerer behavior in a large, popular CQA system, with the goal of informing the construction of question routing and recommendation systems. Specifically, we consider when users tend to answer questions in a large-scale CQA system, and how answerers tend to choose the questions to answer. Our results over a dataset of more than 1 million questions drawn from a real CQA system could help develop more realistic evaluation methods for question recommendation, and inform the design of future question recommender systems. | |||
| Clash of the Typings | | BIBAK | Full-Text | 80-91 | |
| Karl Gyllstrom; Marie-Francine Moens | |||
| The TadPolemic system identifies whether web search queries (1) are
controversial in nature and/or (2) pertain to children's topics. We are
incorporating it into a children's web search engine to assist children
searching difficult topics, as well as to provide filtering or mitigation
of bias in results when children search for contentious topics. We show through
an evaluation that the system is effective at detecting kids' topics and
controversies for a broad range of topics. Though designed to assist children,
we believe these methods are generalizable beyond young audiences and can be
usefully applied in other contexts. Keywords: controversy detection; children's search | |||
| Are Semantically Related Links More Effective for Retrieval? | | BIBAK | Full-Text | 92-103 | |
| Marijn Koolen; Jaap Kamps | |||
| Why do links work? Link-based ranking algorithms are based on the often
implicit assumption that linked documents are semantically related to each
other, and that link information is therefore useful for retrieval. Although
the benefits of link information are well researched, this underlying
assumption on why link evidence works remains untested, and the main aim of
this paper is to do exactly that. Specifically, we use Wikipedia because it has
a dense link structure in combination with a large category structure, which
allows for an independent measurement of the semantic relatedness of linked
documents. Our main findings are that: 1) global, query-independent link
evidence is not affected by the semantic nature of the links, and 2) for
local, query-dependent link evidence, the effectiveness of links increases as
their semantic distance decreases. That is, we directly observe that links
between semantically related pages are more effective for ad hoc retrieval than
links between unrelated ones. These findings confirm and quantify the
underlying assumption of existing link-based methods, which sheds further light
on our understanding of the nature of link evidence. Such deeper understanding
is instrumental for the development of novel link-based methods. Keywords: Links; Semantic Relatedness; Effectiveness; Wikipedia | |||
| Caching for Realtime Search | | BIBA | Full-Text | 104-116 | |
| Edward Bortnikov; Ronny Lempel; Kolman Vornovitsky | |||
| Modern search engines feature real-time indices, which incorporate changes to content within seconds. As search engines also cache search results to reduce user latency and back-end load, without careful real-time management of the results cache the engine might return stale search results to users despite the efforts invested in keeping the underlying index up to date. A recent paper proposed an architectural component called CIP -- the cache invalidation predictor. CIPs invalidate supposedly stale cache entries upon index modifications. Initial evaluation showed the ability to keep the performance benefits of caching without sacrificing much of the freshness of search results returned to users. However, it was conducted on a synthetic workload in a simplified setting, using many assumptions.
| We propose new CIP heuristics, and evaluate them in an authentic environment -- on the real evolving corpus and query stream of a large commercial news search engine. Our CIPs operate in conjunction with realistic cache settings, and we use standard metrics for evaluating cache performance. We show that a classical cache replacement policy, LRU, completely fails to guarantee freshness over time, whereas our CIPs serve 97% of the queries with fresh results. Our policies incur a negligible impact on the baseline's cache hit rate, in contrast with traditional age-based invalidation, which must severely reduce cache performance in order to achieve the same freshness. We demonstrate that the computational overhead of our algorithms is minor, and that they even allow reducing the cache's memory footprint. | |||
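The CIP idea can be illustrated with a deliberately naive sketch. This is not the paper's predictor, which also has to handle documents that should newly enter cached results; this version only invalidates entries that contain a document touched by an index update.

```python
# Sketch of a search-results cache with a naive invalidation predictor:
# when the index ingests a new or changed document, invalidate any cached
# query whose stored results contain that document.
from collections import OrderedDict

class ResultsCache:
    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.entries = OrderedDict()   # query -> list of doc ids (LRU order)
        self.doc_to_queries = {}       # doc id -> set of queries caching it

    def put(self, query, doc_ids):
        if len(self.entries) >= self.capacity:
            old_q, old_docs = self.entries.popitem(last=False)  # evict LRU
            for d in old_docs:
                self.doc_to_queries.get(d, set()).discard(old_q)
        self.entries[query] = doc_ids
        for d in doc_ids:
            self.doc_to_queries.setdefault(d, set()).add(query)

    def get(self, query):
        if query in self.entries:
            self.entries.move_to_end(query)   # refresh LRU position
            return self.entries[query]
        return None

    def on_index_update(self, doc_id):
        """Invalidate every cached entry containing the updated document."""
        for q in self.doc_to_queries.pop(doc_id, set()):
            self.entries.pop(q, None)
```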
| Enhancing Deniability against Query-Logs | | BIBA | Full-Text | 117-128 | |
| Avi Arampatzis; Pavlos Efraimidis; George Drosatos | |||
| We propose a method for search privacy on the Internet, focusing on enhancing plausible deniability against search engine query-logs. The method approximates the target search results, without submitting the intended query and avoiding other exposing queries, by employing sets of queries representing more general concepts. We model the problem theoretically, and investigate the practical feasibility and effectiveness of the proposed solution with a set of real queries with privacy issues on a large web collection. The findings may have implications for other IR research areas, such as query expansion and fusion in meta-search. | |||
| On the Contributions of Topics to System Evaluation | | BIBA | Full-Text | 129-140 | |
| Stephen Robertson | |||
| We consider the selection of good subsets of topics for system evaluation. It has previously been suggested that some individual topics and some subsets of topics are better for system evaluation than others: given limited resources, choosing the best subset of topics may give significantly better prediction of overall system effectiveness than (for example) choosing random subsets. Earlier experimental results are extended, with particular reference to generalisation: the ability of a subset of topics selected on the basis of one collection of system runs to perform well in evaluating another collection of system runs. It turns out to be hard to establish generalisability; it is not at all clear that it is possible to identify subsets of topics that are good for general evaluation. | |||
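One way to make "choosing the best subset of topics" concrete is a greedy search that maximizes rank correlation with the full topic set; a sketch under that assumption (not necessarily the paper's exact procedure) follows.

```python
# Sketch: greedily select a topic subset whose induced system ranking best
# matches the ranking obtained from all topics (Kendall's tau as the
# agreement measure).
from scipy.stats import kendalltau

def system_means(scores, topics):
    """scores[s][t] = effectiveness of system s on topic t."""
    return [sum(scores[s][t] for t in topics) / len(topics)
            for s in range(len(scores))]

def greedy_topic_subset(scores, k):
    all_topics = list(range(len(scores[0])))
    full = system_means(scores, all_topics)
    chosen = []
    while len(chosen) < k:
        best = max((t for t in all_topics if t not in chosen),
                   key=lambda t: kendalltau(
                       full, system_means(scores, chosen + [t]))[0])
        chosen.append(best)
    return chosen

if __name__ == "__main__":
    # Toy matrix: 4 systems x 5 topics.
    scores = [[0.9, 0.2, 0.5, 0.4, 0.6],
              [0.7, 0.3, 0.4, 0.4, 0.5],
              [0.5, 0.1, 0.3, 0.2, 0.4],
              [0.2, 0.4, 0.2, 0.1, 0.3]]
    print(greedy_topic_subset(scores, k=2))
```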
| A Methodology for Evaluating Aggregated Search Results | | BIBA | Full-Text | 141-152 | |
| Jaime Arguello; Fernando Diaz; Jamie Callan; Ben Carterette | |||
| Aggregated search is the task of incorporating results from different specialized search services, or verticals, into Web search results. While most prior work focuses on deciding which verticals to present, the task of deciding where in the Web results to embed the vertical results has received less attention. We propose a methodology for evaluating an aggregated set of results. Our method elicits a relatively small number of human judgements for a given query and then uses these to facilitate a metric-based evaluation of any possible presentation for the query. An extensive user study with 13 verticals confirms that, when users prefer one presentation of results over another, our metric agrees with the stated preference. By using Amazon's Mechanical Turk, we show that reliable assessments can be obtained quickly and inexpensively. | |||
| Design and Implementation of Relevance Assessments Using Crowdsourcing | | BIBA | Full-Text | 153-164 | |
| Omar Alonso; Ricardo Baeza-Yates | |||
| In recent years crowdsourcing has emerged as a viable platform for conducting relevance assessments. The main reason behind this trend is that it makes it possible to conduct experiments extremely fast, with good results and at low cost. However, as in any experiment, there are several details that can make an experiment succeed or fail. To gather useful results, user interface guidelines, inter-agreement metrics, and justification analysis are important aspects of a successful crowdsourcing experiment. In this work we explore the design and execution of relevance judgments using Amazon Mechanical Turk as the crowdsourcing platform, introducing a methodology for crowdsourcing relevance assessments and the results of a series of experiments using TREC 8 data with a fixed budget. Our findings indicate that workers are as good as TREC experts, even providing detailed feedback for certain query-document pairs. We also explore the importance of document design and presentation when performing relevance assessment tasks. Finally, we show our methodology at work with several examples that are interesting in their own right. | |||
| In Search of Quality in Crowdsourcing for Search Engine Evaluation | | BIBAK | Full-Text | 165-176 | |
| Gabriella Kazai | |||
| Crowdsourcing is increasingly looked upon as a feasible alternative to
traditional methods of gathering relevance labels for the evaluation of search
engines, offering a solution to the scalability problem that hinders
traditional approaches. However, crowdsourcing raises a range of questions
regarding the quality of the resulting data. What indeed can be said about the
quality of the data that is contributed by anonymous workers who are only paid
cents for their efforts? Can higher pay guarantee better quality? Do better
qualified workers produce higher quality labels? In this paper, we investigate
these and similar questions via a series of controlled crowdsourcing
experiments where we vary pay, required effort and worker qualifications and
observe their effects on the resulting label quality, measured based on
agreement with a gold set. Keywords: IR evaluation; relevance data gathering; crowdsourcing | |||
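Label quality "measured based on agreement with a gold set" can be computed, for example, as raw agreement plus Cohen's kappa; a small self-contained sketch:

```python
# Sketch: measuring crowdsourced label quality against a gold set using
# raw agreement and Cohen's kappa (chance-corrected agreement).
from collections import Counter

def agreement(labels, gold):
    return sum(l == g for l, g in zip(labels, gold)) / len(gold)

def cohens_kappa(labels, gold):
    n = len(gold)
    p_o = agreement(labels, gold)
    c1, c2 = Counter(labels), Counter(gold)
    # Expected agreement for a worker labeling at random with these marginals.
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0

if __name__ == "__main__":
    gold   = ["rel", "rel", "non", "non", "rel", "non"]
    worker = ["rel", "non", "non", "non", "rel", "rel"]
    print(round(agreement(worker, gold), 3))     # 0.667
    print(round(cohens_kappa(worker, gold), 3))  # 0.333
```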
| Summarizing a Document Stream | | BIBA | Full-Text | 177-188 | |
| Hiroya Takamura; Hikaru Yokono; Manabu Okumura | |||
| We introduce the task of summarizing a stream of short documents on microblogs such as Twitter. On microblogs, thousands of short documents on a certain topic, such as sports matches or TV dramas, are posted by users. Noticeable characteristics of microblog data are that documents are often highly redundant and aligned on a timeline. There can be thousands of documents on one event in the topic, and two very similar documents may refer to two distinct events when the documents are temporally distant. We examine microblog data to gain more understanding of these characteristics, and propose a summarization model for a stream of short documents on a timeline, along with an approximate fast algorithm for generating summaries. We empirically show that our model generates good summaries on datasets of microblog documents about sports matches. | |||
| A Link Prediction Approach to Recommendations in Large-Scale User-Generated Content Systems | | BIBAK | Full-Text | 189-200 | |
| Nitin Chiluka; Nazareno Andrade; Johan Pouwelse | |||
| Recommending interesting and relevant content from the vast repositories of
User-Generated Content systems (UGCs) such as YouTube, Flickr and Digg is a
significant challenge. Part of this challenge stems from the fact that
classical collaborative filtering techniques -- such as k-Nearest Neighbor --
cannot be assumed to perform as well in UGCs as in other applications. Such
techniques have severe limitations regarding data sparsity and scalability
that make them unsuitable for UGCs. In this paper, we employ adaptations of popular Link
Prediction algorithms that were shown to be effective in massive online social
networks for recommending items in UGCs. We evaluate these algorithms on a
large dataset we collect from Flickr. Our results suggest that Link Prediction
algorithms are a more scalable and accurate alternative to classical
collaborative filtering in the context of UGCs. Moreover, our experiments show
that algorithms which consider the immediate neighborhood of users in a
user-item graph to recommend items outperform algorithms that use the
entire graph structure for the same purpose. Finally, we find that, contrary to
intuition, exploiting explicit social links among users in the recommendation
algorithms improves their performance only marginally. Keywords: User-Generated Content Systems; Recommendation; Collaborative Filtering;
Link Prediction; Flickr | |||
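A minimal sketch of the immediate-neighborhood flavor of link prediction mentioned above (common-neighbor counting on a toy user-item graph; the paper evaluates more elaborate variants on Flickr data):

```python
# Sketch: common-neighbor link prediction on a user-item graph. An unseen
# item is scored by counting weighted two-hop paths from the user: items
# favored by people who share items with the user score highest.
from collections import Counter

def recommend(user, user_items, k=3):
    seen = user_items[user]
    scores = Counter()
    for other, items in user_items.items():
        if other == user:
            continue
        overlap = len(seen & items)        # shared items = common neighbors
        if overlap:
            for item in items - seen:
                scores[item] += overlap
    return [item for item, _ in scores.most_common(k)]

if __name__ == "__main__":
    user_items = {
        "ann": {"v1", "v2"},
        "bob": {"v1", "v2", "v3"},
        "cat": {"v2", "v4"},
        "dan": {"v5"},
    }
    print(recommend("ann", user_items))  # ['v3', 'v4']; v3 via stronger overlap
```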
| Topic Classification in Social Media Using Metadata from Hyperlinked Objects | | BIBA | Full-Text | 201-206 | |
| Sheila Kinsella; Alexandre Passant; John G. Breslin | |||
| Social media presents unique challenges for topic classification, including the brevity of posts, the informal nature of conversations, and the frequent reliance on external hyperlinks to give context to a conversation. In this paper we investigate the usefulness of these external hyperlinks for determining the topic of an individual post. We focus specifically on hyperlinks to objects which have related metadata available on the Web, including Amazon products and YouTube videos. Our experiments show that the inclusion of metadata from hyperlinked objects in addition to the original post content improved classifier performance measured with the F-score from 84% to 90%. Further, even classification based on object metadata alone outperforms classification based on the original post content. | |||
| Peddling or Creating? Investigating the Role of Twitter in News Reporting | | BIBA | Full-Text | 207-213 | |
| Ilija Subašić; Bettina Berendt | |||
| The widespread use of social media is regarded by many as the emergence of a new highway for information and news sharing, promising a new information-driven "social revolution". In this paper, we analyze how this idea transfers to the news reporting domain. To analyze the role of social media in news reporting, we ask whether citizen journalists tend to create news or to peddle (re-report) existing content. We introduce a framework for exploring divergence between news sources by providing multiple views on the corpora under comparison. The results of our case study comparing Twitter and other news sources suggest that the major role of Twitter authors is neither creating nor peddling news, but extending it by commenting on it. | |||
| Latent Sentiment Model for Weakly-Supervised Cross-Lingual Sentiment Classification | | BIBAK | Full-Text | 214-225 | |
| Yulan He | |||
| In this paper, we present a novel weakly-supervised method for cross-lingual
sentiment analysis. Specifically, we propose a latent sentiment model (LSM)
based on latent Dirichlet allocation where sentiment labels are considered as
topics. Prior information extracted from English sentiment lexicons through
machine translation is incorporated into LSM model learning, where preferences
on expectations of sentiment labels of those lexicon words are expressed using
generalized expectation criteria. An efficient parameter estimation procedure
using variational Bayes is presented. Experimental results on the Chinese
product reviews show that the weakly-supervised LSM model performs comparably
to supervised classifiers such as Support Vector Machines, with an average of
81% accuracy achieved over a total of 5484 review documents. Moreover, starting
with a generic sentiment lexicon, the LSM model is able to extract highly
domain-specific polarity words from text. Keywords: Latent sentiment model (LSM); cross-lingual sentiment analysis; Generalized
expectation; latent Dirichlet allocation | |||
| Fractional Similarity: Cross-Lingual Feature Selection for Search | | BIBA | Full-Text | 226-237 | |
| Jagadeesh Jagarlamudi; Paul N. Bennett | |||
| Training data as well as supplementary data such as usage-based click behavior may abound in one search market (i.e., a particular region, domain, or language) and be much scarcer in another market. Transfer methods attempt to improve performance in these resource-scarce markets by leveraging data across markets. However, differences in feature distributions across markets can change the optimal model. We introduce a method called Fractional Similarity, which uses query-based variance within a market to obtain more reliable estimates of feature deviations across markets. An empirical analysis demonstrates that using this scoring method as a feature selection criterion in cross-lingual transfer improves relevance ranking in the foreign language and compares favorably to a baseline based on KL divergence. | |||
| Is a Query Worth Translating: Ask the Users! | | BIBAK | Full-Text | 238-250 | |
| Ahmed Hefny; Kareem Darwish; Ali Alkahky | |||
| Users in many regions of the world are multilingual and they issue similar
queries in different languages. Given a source language query, we propose query
picking which involves finding equivalent target language queries in a large
query log. Query picking treats translation as a search problem, and can serve
as a translation method in the context of cross-language and multilingual
search. Further, given that users usually issue queries when they think they
can find relevant content, the success of query picking can serve as a strong
indicator to the projected success of cross-language and multilingual search.
In this paper we describe a system that performs query picking and we show that
picked queries yield results that are statistically indistinguishable from a
monolingual baseline. Further, using query picking to predict the effectiveness
of cross-language results can have a statistically significant effect on the
success of multilingual search, with improvements over a monolingual baseline.
Multilingual merging methods that do not account for the success of query
picking can often hurt retrieval effectiveness. Keywords: cross-language search; multilingual search; query translation mining | |||
| Balancing Exploration and Exploitation in Learning to Rank Online | | BIBA | Full-Text | 251-263 | |
| Katja Hofmann; Shimon Whiteson; Maarten de Rijke | |||
| As retrieval systems become more complex, learning to rank approaches are being developed to automatically tune their parameters. Using online learning to rank approaches, retrieval systems can learn directly from implicit feedback, while they are running. In such an online setting, algorithms need to both explore new solutions to obtain feedback for effective learning, and exploit what has already been learned to produce results that are acceptable to users. We formulate this challenge as an exploration-exploitation dilemma and present the first online learning to rank algorithm that works with implicit feedback and balances exploration and exploitation. We leverage existing learning to rank data sets and recently developed click models to evaluate the proposed algorithm. Our results show that finding a balance between exploration and exploitation can substantially improve online retrieval performance, bringing us one step closer to making online learning to rank work in practice. | |||
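The exploration-exploitation knob can be pictured with a simple epsilon-style blend of two rankings. The paper's algorithm is more refined; this sketch only shows where the balance parameter enters.

```python
# Sketch: build the result list by drawing each rank from the exploitative
# ranking with probability 1 - epsilon, and from an exploratory candidate
# ranking otherwise.
import random

def blend_rankings(exploit, explore, epsilon=0.2, rng=random):
    result = []
    pools = {"exploit": list(exploit), "explore": list(explore)}
    while pools["exploit"] or pools["explore"]:
        pick = "explore" if rng.random() < epsilon else "exploit"
        if not pools[pick]:                 # fall back if one pool is empty
            pick = "exploit" if pick == "explore" else "explore"
        doc = pools[pick].pop(0)
        result.append(doc)
        for pool in pools.values():         # keep the lists duplicate-free
            if doc in pool:
                pool.remove(doc)
    return result

if __name__ == "__main__":
    random.seed(7)
    print(blend_rankings(["d1", "d2", "d3"], ["d9", "d2", "d8"], epsilon=0.3))
```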
| ReFER: Effective Relevance Feedback for Entity Ranking | | BIBA | Full-Text | 264-276 | |
| Tereza Iofciu; Gianluca Demartini; Nick Craswell; Arjen P. de Vries | |||
| Web search increasingly deals with structured data about people, places and things, their attributes and relationships. In such an environment an important sub-problem is matching a user's unstructured free-text query to a set of relevant entities. For example, a user might request 'Olympic host cities'. The most challenging general problem is to find relevant entities, of the correct type and characteristics, based on a free-text query that need not conform to any single ontology or category structure. This paper presents an entity ranking relevance feedback model, based on example entities specified by the user or on pseudo feedback. It employs the Wikipedia category structure, but augments that structure with 'smooth categories' to deal with the sparseness of the raw category information. Our experiments show the effectiveness of the proposed method, whether applied as a pseudo relevance feedback method or interactively with the user in the loop. | |||
| The Limits of Retrieval Effectiveness | | BIBA | Full-Text | 277-282 | |
| Ronan Cummins; Mounia Lalmas; Colm O'Riordan | |||
| Best-match systems have long been among the predominant models in both Information Retrieval research and practice. It is argued that the effectiveness of these types of systems for the ad hoc task in IR has plateaued. In this short paper, we conduct experiments to find the upper limits of performance of these systems from three different perspectives. Our results on TREC data show that there is much room for improvement in terms of term-weighting and query reformulation in the ad hoc task, given an entire information need. | |||
| Learning Conditional Random Fields from Unaligned Data for Natural Language Understanding | | BIBA | Full-Text | 283-288 | |
| Deyu Zhou; Yulan He | |||
| In this paper, we propose a learning approach to train conditional random fields from unaligned data for natural language understanding, where the input to model learning consists of sentences paired with predicate formulae (or abstract semantic annotations) without word-level annotations. The learning approach resembles the expectation maximization algorithm. It has two advantages: one is that only abstract annotations are needed instead of full word-level annotations, and the other is that the proposed learning framework can be easily extended for training other discriminative models, such as support vector machines, from abstract annotations. The proposed approach has been tested on the DARPA Communicator Data. Experimental results show that it outperforms the hidden vector state (HVS) model, a modified hidden Markov model also trained on abstract annotations. Furthermore, the proposed method has been compared with two other approaches: one is the hybrid framework (HF) combining the HVS model and the support vector hidden Markov model, and the other is discriminative training of the HVS model (DT). The proposed approach gives a relative error reduction rate of 18.7% and 8.3% in F-measure when compared with HF and DT respectively. | |||
| Subspace Tracking for Latent Semantic Analysis | | BIBA | Full-Text | 289-300 | |
| Radim Řehůřek | |||
| Modern applications of Latent Semantic Analysis (LSA) must deal with enormous (often practically infinite) data collections, calling for a single-pass matrix decomposition algorithm that operates in constant memory w.r.t. the collection size. This paper introduces a streamed distributed algorithm for incremental SVD updates. Apart from the theoretical derivation, we present experiments measuring numerical accuracy and runtime performance of the algorithm over several data collections, one of which is the whole of the English Wikipedia. | |||
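The core single-pass ingredient, folding a new batch of document vectors into an existing truncated SVD, can be sketched with a standard QR-based update. This is the textbook construction, not the paper's exact distributed variant.

```python
# Sketch: incremental truncated SVD update. Merge a new batch of column
# vectors C into an existing rank-k factorization U * diag(S), using a QR
# step on the residual that lies outside the current subspace.
import numpy as np

def svd_update(U, S, C, k):
    W = U.T @ C                    # component of C inside the current subspace
    R = C - U @ W                  # residual outside the subspace
    Q, Rr = np.linalg.qr(R)
    # Small core matrix capturing the old factors plus the new batch.
    K = np.block([[np.diag(S), W],
                  [np.zeros((Q.shape[1], len(S))), Rr]])
    Uk, Sk, _ = np.linalg.svd(K, full_matrices=False)
    U_new = np.hstack([U, Q]) @ Uk[:, :k]
    return U_new, Sk[:k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 40))          # terms x documents
    U, S, _ = np.linalg.svd(A[:, :20], full_matrices=False)
    U, S = U[:, :5], S[:5]                       # rank-5 model of first batch
    U, S = svd_update(U, S, A[:, 20:], k=5)      # fold in the second batch
    print(U.shape, S[:3])                        # (100, 5) plus singular values
```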
| Text Retrieval Methods for Item Ranking in Collaborative Filtering | | BIBAK | Full-Text | 301-306 | |
| Alejandro Bellogín; Jun Wang; Pablo Castells | |||
| Collaborative Filtering (CF) aims at predicting unknown ratings of a user
from other similar users. The uniqueness of the problem has made its
formulation distinct from that of other information retrieval problems. While the
formulation has proved to be effective in rating prediction tasks, it has
limited the potential connections between these algorithms and Information
Retrieval (IR) models. In this paper we propose a common notational framework
for IR and rating-based CF, as well as a technique to provide CF data with a
particular structure, in order to be able to use any IR weighting function with
it. We argue that the flexibility of our approach may yield much
better-performing algorithms. In fact, in this work we have found that IR models
perform well in item ranking tasks, along with different normalization
strategies. Keywords: Collaborative Filtering; Text Retrieval; Unified Models | |||
| Classifying with Co-stems | | BIBA | Full-Text | 307-313 | |
| Nedim Lipka; Benno Stein | |||
| Besides the content, the writing style is an important discriminator in information filtering tasks. Ideally, the solution of a filtering task employs a text representation that models both kinds of characteristics. In this respect word stems are clearly content-capturing, whereas word suffixes qualify as writing-style indicators. Though the latter feature type is used for part-of-speech tagging, it has not yet been employed for information filtering in general. We propose a text representation that combines both the output of a stemming algorithm (stems) and the stem-reduced words (co-stems). A co-stem can be a prefix, an infix, a suffix, or a concatenation of prefixes, infixes, or suffixes. Using accepted standard corpora, we analyze the discriminative power of this representation for a broad range of information filtering tasks to provide new insights into the adequacy and task-specificity of text representation models. Altogether we observe that co-stem-based representations outperform the classical bag-of-words model for several filtering tasks. | |||
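A toy sketch of the stem/co-stem split (a real system would use a proper stemming algorithm; the suffix list here is invented, and only suffix co-stems are handled):

```python
# Sketch: splitting words into stems and co-stems with a toy suffix
# stripper. The co-stem is whatever the stemmer removed, which is the
# writing-style signal the paper exploits.

SUFFIXES = ("ization", "ously", "ingly", "ness", "ing", "ed", "ly", "s")

def stem_and_costem(word):
    for suf in SUFFIXES:                 # try longer suffixes first
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[:-len(suf)], suf
    return word, ""                      # no suffix matched: empty co-stem

def featurize(text):
    stems, costems = [], []
    for w in text.lower().split():
        s, c = stem_and_costem(w)
        stems.append(s)
        if c:
            costems.append("~" + c)      # mark co-stems as their own namespace
    return stems + costems

if __name__ == "__main__":
    print(featurize("She obviously enjoyed reading amazingly long reviews"))
```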
| Interactive Trademark Image Retrieval by Fusing Semantic and Visual Content | | BIBAK | Full-Text | 314-325 | |
| Marçal Rusiñol; David Aldavert; Dimosthenis Karatzas; Ricardo Toledo; Josep Lladós | |||
| In this paper we propose an efficient query-by-example retrieval system
which is able to retrieve trademark images by similarity from patent and
trademark offices' digital libraries. Logo images are described by both their
semantic content, by means of the Vienna codes, and their visual contents, by
using shape and color as visual cues. The trademark descriptors are then
indexed by a locality-sensitive hashing data structure aiming to perform
approximate k-NN search in high dimensional spaces in sub-linear time. The
resulting ranked lists are combined by using a weighted Condorcet method and a
relevance feedback step helps to iteratively revise the query and refine the
obtained results. The experiments demonstrate the effectiveness and efficiency
of this system on a realistic and large dataset. Keywords: Multimedia Information Retrieval; Trademark Image Retrieval; Graphics
Recognition; Feature Indexing | |||
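The locality-sensitive hashing step can be sketched with the classic random-hyperplane family for cosine similarity (dimensions, bit counts and data below are arbitrary; the paper's descriptors and index details differ):

```python
# Sketch: LSH with random hyperplanes. Descriptors with high cosine
# similarity tend to share the same bit signature, so a query is compared
# only against its own bucket (sub-linear search).
import numpy as np

class HyperplaneLSH:
    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))
        self.buckets = {}

    def signature(self, v):
        return tuple((self.planes @ v) > 0)

    def index(self, key, v):
        self.buckets.setdefault(self.signature(v), []).append((key, v))

    def query(self, v, k=5):
        candidates = self.buckets.get(self.signature(v), [])
        sims = [(key, float(c @ v) /
                 (np.linalg.norm(c) * np.linalg.norm(v) + 1e-12))
                for key, c in candidates]
        return sorted(sims, key=lambda kv: -kv[1])[:k]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    lsh = HyperplaneLSH(dim=64, n_bits=8)
    base = rng.standard_normal(64)
    for i in range(100):
        lsh.index(f"logo{i}", base + 0.1 * rng.standard_normal(64))
    print(lsh.query(base, k=3))   # near-duplicates of the query direction
```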
| Dynamic Two-Stage Image Retrieval from Large Multimodal Databases | | BIBA | Full-Text | 326-337 | |
| Avi Arampatzis; Konstantinos Zagoris; Savvas A. Chatzichristofis | |||
| Content-based image retrieval (CBIR) with global features is notoriously noisy, especially for image queries with low percentages of relevant images in a collection. Moreover, CBIR typically ranks the whole collection, which is inefficient for large databases. We experiment with a method for image retrieval from multimodal databases, which improves both the effectiveness and efficiency of traditional CBIR by exploring secondary modalities. We perform retrieval in a two-stage fashion: first rank by a secondary modality, and then perform CBIR only on the top-K items. Thus, effectiveness is improved by performing CBIR on a 'better' subset. Using a relatively 'cheap' first stage, efficiency is also improved via the fewer CBIR operations performed. Our main novelty is that K is dynamic, i.e. estimated per query to optimize a predefined effectiveness measure. We show that such dynamic two-stage setups can be significantly more effective and robust than similar setups with static thresholds previously proposed. | |||
| Comparing Twitter and Traditional Media Using Topic Models | | BIBAK | Full-Text | 338-349 | |
| Wayne Xin Zhao; Jing Jiang; Jianshu Weng; Jing He; Ee-Peng Lim; Hongfei Yan; Xiaoming Li | |||
| Twitter as a new form of social media can potentially contain much useful
information, but content analysis on Twitter has not been well studied. In
particular, it is not clear whether as an information source Twitter can be
simply regarded as a faster news feed that covers mostly the same information
as traditional news media. In this paper we empirically compare the content of
Twitter with a traditional news medium, the New York Times, using unsupervised
topic modeling. We use a Twitter-LDA model to discover topics from a
representative sample of the whole of Twitter. We then use text mining techniques
to compare these Twitter topics with topics from the New York Times, taking into
consideration topic categories and types. We also study the relation between
the proportions of opinionated tweets and retweets and topic categories and
types. Our comparisons show interesting and useful findings for downstream IR
or DM applications. Keywords: Twitter; microblogging; topic modeling | |||
| Exploiting Thread Structures to Improve Smoothing of Language Models for Forum Post Retrieval | | BIBAK | Full-Text | 350-361 | |
| Huizhong Duan; Chengxiang Zhai | |||
| Due to many unique characteristics of forum data, forum post retrieval is
different from traditional document retrieval and web search, raising
interesting research questions about how to optimize the accuracy of forum post
retrieval. In this paper, we study how to exploit the naturally available raw
thread structures of forums to improve retrieval accuracy in the language
modeling framework. Specifically, we propose and study two different schemes
for smoothing the language model of a forum post based on the thread containing
the post. We explore several different variants of the two schemes to exploit
thread structures in different ways. We also create a human-annotated test data
set for forum post retrieval and evaluate the proposed smoothing methods using
this data set. The experimental results show that the proposed methods for
leveraging forum threads to improve estimation of document language models are
effective, and they outperform the existing smoothing methods for the forum
post retrieval task. Keywords: Forum post retrieval; language modeling; smoothing | |||
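One plausible reading of thread-based smoothing is a two-level Dirichlet back-off, post to thread to collection; a sketch under that assumption (parameters invented):

```python
# Sketch: two-level Dirichlet smoothing for forum post retrieval. A post's
# language model backs off to its thread's model, which in turn backs off
# to the whole collection.
from collections import Counter

def lm(tokens):
    return Counter(tokens), len(tokens)

def p_dirichlet(word, counts, length, background, mu):
    return (counts[word] + mu * background(word)) / (length + mu)

def make_post_model(post, thread, collection, mu_post=50, mu_thread=500):
    pc, pn = lm(post)
    tc, tn = lm(thread)
    cc, cn = lm(collection)
    p_coll = lambda w: cc[w] / cn
    p_thread = lambda w: p_dirichlet(w, tc, tn, p_coll, mu_thread)
    return lambda w: p_dirichlet(w, pc, pn, p_thread, mu_post)

if __name__ == "__main__":
    collection = "how to install linux driver error kernel panic help".split()
    thread = "kernel panic after driver install help please".split()
    post = "try reinstalling the driver".split()
    p = make_post_model(post, thread, collection)
    print(p("driver"), p("kernel"))  # 'kernel' gets mass via the thread model
```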
| Incorporating Query Expansion and Quality Indicators in Searching Microblog Posts | | BIBA | Full-Text | 362-367 | |
| Kamran Massoudi; Manos Tsagkias; Maarten de Rijke; Wouter Weerkamp | |||
| We propose a retrieval model for searching microblog posts for a given topic of interest. We develop a language modeling approach tailored to microblogging characteristics, where redundancy-based IR methods cannot be used in a straightforward manner. We enhance this model with two groups of quality indicators: textual and microblog-specific. Additionally, we propose a dynamic query expansion model for microblog post retrieval. Experimental results on Twitter data reveal the usefulness of Boolean search, and demonstrate the utility of quality indicators and query expansion in microblog search. | |||
| Discovering Fine-Grained Sentiment with Latent Variable Structured Prediction Models | | BIBA | Full-Text | 368-374 | |
| Oscar Täckström; Ryan McDonald | |||
| In this paper we investigate the use of latent variable structured prediction models for fine-grained sentiment analysis in the common situation where only coarse-grained supervision is available. Specifically, we show how sentence-level sentiment labels can be effectively learned from document-level supervision using hidden conditional random fields (HCRFs) [10]. Experiments show that this technique reduces sentence classification errors by 22% relative to using a lexicon and 13% relative to machine-learning baselines. | |||
| Combining Global and Local Semantic Contexts for Improving Biomedical Information Retrieval | | BIBAK | Full-Text | 375-386 | |
| Duy Dinh; Lynda Tamine | |||
| In the context of biomedical information retrieval (IR), this paper explores
the relationship between the document's global context and the query's local
context in an attempt to overcome the term mismatch problem between the user
query and documents in the collection. Most solutions to this problem have been
focused on expanding the query by discovering its context, either global or
local. In a global strategy, all documents in the collection are used to
examine word occurrences and relationships in the corpus as a whole, and use
this information to expand the original query. In a local strategy, the
top-ranked documents retrieved for a given query are examined to determine
terms for query expansion. We propose to combine the document's global context
and the query's local context in an attempt to increase the term overlap
between the user query and documents in the collection via document expansion
(DE) and query expansion (QE). The DE technique is based on a statistical
method (IR-based) to extract the most appropriate concepts (global context)
from each document. The QE technique is based on a blind feedback approach
using the top-ranked documents (local context) obtained in the first retrieval
stage. A comparative experiment on the TREC 2004 Genomics collection
demonstrates that the combination of the document's global context and the
query's local context shows a significant improvement over the baseline. The
MAP rises significantly, from 0.4097 to 0.4532, an improvement of +10.62%
over the baseline. The IR performance of the
combined method in terms of MAP is also superior to the official runs submitted
to TREC 2004 Genomics and is comparable to the performance of the best run
(0.4075). Keywords: Term Mismatch; Concept Extraction; Document Expansion; Query Expansion;
Biomedical Information Retrieval | |||
| Smoothing Click Counts for Aggregated Vertical Search | | BIBA | Full-Text | 387-398 | |
| Jangwon Seo; W. Bruce Croft; Kwang Hyun Kim; Joon Ho Lee | |||
| Clickthrough data is a critical feature for improving web search ranking. Recently, many search portals have provided aggregated search, which retrieves relevant information from various heterogeneous collections called verticals. In addition to the well-known problem of rank bias, clickthrough data recorded in the aggregated search environment suffers from severe sparseness problems due to the limited number of results presented for each vertical. This skew in clickthrough data, which we call rank cut, makes optimization of vertical searches more difficult. In this work, we focus on mitigating the negative effect of rank cut for aggregated vertical searches. We introduce a technique for smoothing click counts based on spectral graph analysis. Using real clickthrough data from a vertical recorded in an aggregated search environment, we show empirically that clickthrough data smoothed by this technique is effective for improving the vertical search. | |||
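As a stand-in for the paper's spectral-graph technique, here is a much simpler one-step propagation over the click graph that conveys the intuition: a query borrows counts from queries with similar click patterns. All details are assumptions.

```python
# Sketch: smoothing sparse click counts by propagating them over the
# query-document click graph (two hops through shared documents).
from collections import defaultdict
from math import sqrt

def smooth_clicks(clicks, alpha=0.3):
    """clicks: {query: {doc: count}}. Returns smoothed counts."""
    def cosine(a, b):
        shared = set(a) & set(b)
        num = sum(a[d] * b[d] for d in shared)
        den = sqrt(sum(v * v for v in a.values())) * \
              sqrt(sum(v * v for v in b.values()))
        return num / den if den else 0.0

    smoothed = defaultdict(dict)
    for q, docs in clicks.items():
        neigh = {q2: cosine(docs, d2) for q2, d2 in clicks.items() if q2 != q}
        z = sum(neigh.values()) or 1.0
        for d in set(docs) | {d for q2 in neigh for d in clicks[q2]}:
            borrowed = sum(w * clicks[q2].get(d, 0)
                           for q2, w in neigh.items()) / z
            smoothed[q][d] = (1 - alpha) * docs.get(d, 0) + alpha * borrowed
    return dict(smoothed)

if __name__ == "__main__":
    clicks = {"seattle news": {"doc1": 9, "doc2": 1},
              "news seattle": {"doc1": 2}}
    print(smooth_clicks(clicks))
```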
| Automatic People Tagging for Expertise Profiling in the Enterprise | | BIBA | Full-Text | 399-410 | |
| Pavel Serdyukov; Mike Taylor; Vishwa Vinay; Matthew Richardson; Ryen W. White | |||
| In an enterprise search setting, there is a class of queries for which people, rather than documents, are desirable answers. However, presenting users with just a list of names of knowledgeable employees, without any description of their expertise, may lead to confusion, lack of trust in search results, and abandonment of the search engine. At the same time, building a concise, meaningful description for a person is not a trivial summarization task. In this paper, we propose a solution to this problem by automatically tagging people for the purpose of profiling their expertise areas in the scope of the enterprise where they are employed. We address the novel task of automatic people tagging by using a machine learning algorithm that combines evidence that a certain tag is relevant to a certain employee, acquired from different sources in the enterprise. We experiment with data from a large distributed organization, which also allows us to study sources of expertise evidence that have been previously overlooked, such as personal click-through history. The evaluation of the proposed methods shows that our technique clearly outperforms state-of-the-art approaches. | |||
| Text Classification: A Sequential Reading Approach | | BIBA | Full-Text | 411-423 | |
| Gabriel Dulac-Arnold; Ludovic Denoyer; Patrick Gallinari | |||
| We propose to model the text classification process as a sequential decision process. In this process, an agent learns to classify documents into topics while reading the document sentences sequentially, and learns to stop as soon as enough information has been read to make a decision. The proposed algorithm is based on modeling text classification as a Markov Decision Process and learns by using Reinforcement Learning. Experiments on four different classical mono-label corpora show that the proposed approach performs comparably to classical SVM approaches for large training sets, and better for small training sets. In addition, the model automatically adapts its reading process to the quantity of training information provided. | |||
| Domain Adaptation for Text Categorization by Feature Labeling | | BIBAK | Full-Text | 424-435 | |
| Cristina Kadar; José Iria | |||
| We present a novel approach to domain adaptation for text categorization,
which merely requires that the source domain data are weakly annotated in the
form of labeled features. The main advantage of our approach resides in the
fact that labeling words is less expensive than labeling documents. We propose
two methods, the first of which seeks to minimize the divergence between the
distributions of the source domain, which contains labeled features, and the
target domain, which contains only unlabeled data. The second method augments
the labeled features set in an unsupervised way, via the discovery of a shared
latent concept space between source and target. We empirically show that our
approach outperforms standard supervised and semi-supervised methods, and
obtains results competitive to those reported by state-of-the-art domain
adaptation methods, while requiring considerably less supervision. Keywords: Domain Adaptation; Generalized Expectation Criteria; Weakly-Supervised
Latent Dirichlet Allocation | |||
| TEMPER: A Temporal Relevance Feedback Method | | BIBA | Full-Text | 436-447 | |
| Mostafa Keikha; Shima Gerani; Fabio Crestani | |||
| The goal of a blog distillation (blog feed search) method is to rank blogs according to their recurrent relevance to the query. An interesting property of blog distillation which differentiates it from traditional retrieval tasks is its dependency on time. In this paper we investigate the effect of time dependency in query expansion. We propose a framework, TEMPER, which selects different terms for different times and ranks blogs according to their relevance to the query over time. By generating multiple expanded queries based on time, we are able to capture the dynamics of the topic both in aspects and in vocabulary usage. We show performance gains over baseline techniques which generate a single expanded query using the top retrieved posts or blogs irrespective of time. | |||
| Terms of a Feather: Content-Based News Recommendation and Discovery Using Twitter | | BIBA | Full-Text | 448-459 | |
| Owen Phelan; Kevin McCarthy; Mike Bennett; Barry Smyth | |||
| User-generated content has dominated the web's recent growth, and today the so-called real-time web provides us with unprecedented access to the real-time opinions, views, and ratings of millions of users. For example, Twitter's 200m+ users are generating in the region of 1000+ tweets per second. In this work, we propose that this data can be harnessed as a useful source of recommendation knowledge. We describe a social news service called Buzzer that is capable of adapting to the conversations taking place on Twitter in order to rank personal RSS subscriptions. This is achieved by a content-based approach that mines trending terms from both the public Twitter timeline and from the timeline of tweets published by a user's own Twitter friend subscriptions. We also present the results of a live-user evaluation which demonstrates how these ranking strategies can add better item filtering and discovery value to conventional recency-based RSS ranking techniques. | |||
| Topical and Structural Linkage in Wikipedia | | BIBA | Full-Text | 460-465 | |
| Kelly Y. Itakura; Charles L. A. Clarke; Shlomo Geva; Andrew Trotman; Wei Chi Huang | |||
| We explore statistical properties of links within Wikipedia. We demonstrate that a simple algorithm can predict many of the links that would normally be added to a new article, without considering the topic of the article itself. We then explore a variant of topic-oriented PageRank, which can effectively identify topical links within existing articles, when compared with manual judgments of their topical relevance. Based on these results, we suggest that linkages within Wikipedia arise from a combination of structural requirements and topical relationships. | |||
| An Analysis of Time-Instability in Web Search Results | | BIBA | Full-Text | 466-478 | |
| Jinyoung Kim; Vitor R. Carvalho | |||
| Due to the dynamic nature of the web and the complex architectures of modern
commercial search engines, top results in major search engines can change
dramatically over time. Our experimental data shows that, for all three major
search engines (Google, Bing and Yahoo!), approximately 90% of queries have
their top 10 results altered within a period of ten days. Although this
instability is expected in some situations such as in news-related queries, it
is problematic in general because it can dramatically affect retrieval
performance measurements and negatively affect users' perception of search
quality (for instance, when users cannot re-find a previously found document).
In this work we present the first large-scale study on the degree and nature of these changes. We introduce several types of query instability, and several metrics to quantify them. We then present a quantitative analysis using 12,600 queries collected from a commercial web search engine over several weeks. Our analysis shows that the results from all major search engines have similar levels of instability, and that many of these changes are temporary. We also identify classes of queries with clearly different instability profiles -- for instance, navigational queries are considerably more stable than non-navigational, while longer queries are significantly less stable than shorter ones. | |||
| Rules of Thumb for Information Acquisition from Large and Redundant Data | | BIBA | Full-Text | 479-490 | |
| Wolfgang Gatterbauer | |||
| We develop an abstract model of information acquisition from redundant data. We assume a random sampling process from data which contain information with bias, and we are interested in the fraction of information we expect to learn as a function of (i) the sampled fraction (recall) and (ii) the varying bias of information (redundancy distributions). We develop two rules of thumb with varying robustness. We first show that, when information bias follows a Zipf distribution, the 80-20 rule or Pareto principle surprisingly does not hold: we expect to learn less than 40% of the information when randomly sampling 20% of the overall data. We then analytically prove that for large data sets, randomized sampling from power-law distributions leads to "truncated distributions" with the same power-law exponent. This second rule is very robust and also holds for distributions that deviate substantially from a strict power law. We further give one particular family of power-law functions that remains completely invariant under sampling. Finally, we validate our model with two large Web data sets: link distributions to web domains and tag distributions on delicious.com. | |||
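The first rule of thumb is easy to check empirically. Here is a toy simulation (the corpus construction, the Zipf-like frequency profile, and all parameters are our assumptions): randomly sample 20% of the mentions in a redundant corpus and report the fraction of distinct items learned.

```python
import random

def fraction_learned(n_items=50_000, head_freq=1_000, recall=0.2, seed=7):
    """Each distinct piece of information (rank r) appears
    max(1, head_freq // r) times in the data -- a Zipf-like profile in
    which most items occur exactly once. Randomly sample `recall` of
    all mentions and return the fraction of distinct items seen."""
    rng = random.Random(seed)
    corpus = [r for r in range(1, n_items + 1)
              for _ in range(max(1, head_freq // r))]
    sample = rng.sample(corpus, int(recall * len(corpus)))
    return len(set(sample)) / n_items

print(fraction_learned())  # well below 0.4, in line with the rule of thumb
```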
| Bringing Why-QA to Web Search | | BIBA | Full-Text | 491-496 | |
| Suzan Verberne; Lou Boves; Wessel Kraaij | |||
| We investigated to what extent users can be satisfied by a web search engine when answering causal questions. We used an assessment environment in which a web search interface was simulated. For 1401 why-queries from a search engine log we pre-retrieved the first 10 results using Bing; 311 queries were assessed by human judges. We found that even without clicking a result, 25.2% of the why-questions are answered on the first result page. If we count an intended click on a result as a vote for relevance, then 74.4% of the why-questions get at least one relevant answer in the top 10. According to the human assessors, 10% of the why-queries asked of web search engines are not answerable. | |||
| The Power of Peers | | BIBA | Full-Text | 497-502 | |
| Nick Craswell; Dennis Fetterly; Marc Najork | |||
| We present a study of the contributions of three classes of ranking signals: BM25F, a retrieval function that is based on words in the content of web pages and the anchors that link to them; SALSA, a link-based feature that takes all or part of the result set to a query as input; and matching-anchor count (MAC), a feature that measures precise matches between queries and anchors pointing to result pages. All three features incorporate both link and textual features, but in varying degrees. BM25F is the state-of-the-art exponent of Salton's term-vector model, and is based on a solid theoretical foundation; the two other features are somewhat more ad-hoc. We studied the impact of two factors that go into the formation of SALSA's "base" set: whether to use conjunctive or disjunctive query semantics, and how many results to include in the base set. We found that the choice of query semantics has little impact on the effectiveness of SALSA (with conjunctive semantics having a slight edge); more surprisingly, we found that limiting the base set to a few hundred results of high expected quality maximizes performance. Furthermore, we experimented with various linear combinations of BM25F, MAC and SALSA. In doing so, we made a remarkable observation: adding BM25F to a two-way weighted linear combination of MAC and SALSA does not increase performance in any statistically significant way. | |||
| Introducing the User-over-Ranking Hypothesis | | BIBA | Full-Text | 503-509 | |
| Benno Stein; Matthias Hagen | |||
| The User-over-Ranking hypothesis states that the user herself, rather than a web search engine's ranking algorithm, can help to improve retrieval performance. The means to this end are longer queries that provide additional keywords. Readers who take this hypothesis for granted should recall that virtually no users, and none of the search index providers, consider its implications. For readers who doubt the claim, our paper gives empirical evidence. | |||
| Second Chance: A Hybrid Approach for Dynamic Result Caching in Search Engines | | BIBAK | Full-Text | 510-516 | |
| I. Sengor Altingovde; Rifat Ozcan; B. Barla Cambazoglu; Özgür Ulusoy | |||
| Result caches are vital to the efficiency of search engines. In this work, we propose a novel caching strategy in which a dynamic result cache is split into two layers: an HTML cache and a docID cache. The HTML cache in the first layer stores the result pages computed for queries. The docID cache in the second layer stores the ids of documents in search results. Experiments under various scenarios show that, in terms of average query processing time, this hybrid caching approach outperforms the traditional approach, which relies only on the HTML cache. Keywords: Search engines; query processing; result cache | |||
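A minimal sketch of the two-layer idea (the LRU eviction policy, the capacities, and the interfaces are our assumptions; the paper evaluates its own policies): a miss in the expensive HTML layer can still be served cheaply from the docID layer by re-rendering the page, avoiding a full query evaluation.

```python
from collections import OrderedDict

class HybridResultCache:
    """Toy two-layer result cache: a small HTML cache holding rendered
    result pages and a larger docID cache holding only the ids of
    matching documents. Both layers use LRU eviction here."""

    def __init__(self, html_capacity, docid_capacity):
        self.html = OrderedDict()    # query -> result page (expensive to rebuild)
        self.docids = OrderedDict()  # query -> list of document ids (cheap)
        self.html_capacity = html_capacity
        self.docid_capacity = docid_capacity

    def lookup(self, query, render, retrieve):
        if query in self.html:                 # fastest path: serve cached page
            self.html.move_to_end(query)
            return self.html[query]
        if query in self.docids:               # medium path: re-render from ids
            self.docids.move_to_end(query)
            page = render(self.docids[query])
        else:                                  # slowest path: full evaluation
            ids = retrieve(query)
            self._put(self.docids, query, ids, self.docid_capacity)
            page = render(ids)
        self._put(self.html, query, page, self.html_capacity)
        return page

    @staticmethod
    def _put(cache, key, value, capacity):
        cache[key] = value
        cache.move_to_end(key)
        if len(cache) > capacity:              # evict least recently used
            cache.popitem(last=False)

# Toy usage with stand-in render/retrieve functions.
cache = HybridResultCache(html_capacity=2, docid_capacity=8)
page = cache.lookup("ecir 2011",
                    render=lambda ids: "<ol>%s</ol>" % ids,
                    retrieve=lambda q: [3, 1, 4])
```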
| Learning Models for Ranking Aggregates | | BIBA | Full-Text | 517-529 | |
| Craig Macdonald; Iadh Ounis | |||
| Aggregate ranking tasks are those where documents are not the final ranking outcome, but instead an intermediary component. For instance, in expert search, a ranking of candidate persons with relevant expertise for a query is generated after consideration of a document ranking. Many models exist for aggregate ranking tasks; however, an effective and robust setting across different aggregate ranking tasks is difficult to obtain. In this work, we propose a novel learned approach to aggregate ranking, which combines different document ranking features as well as aggregate ranking approaches. We experiment with our proposed approach using two TREC test collections for expert and blog search. Our experimental results attest to the effectiveness and robustness of a learned model for aggregate ranking across different settings. | |||
| Efficient Compressed Inverted Index Skipping for Disjunctive Text-Queries | | BIBA | Full-Text | 530-542 | |
| Simon Jonassen; Svein Erik Bratsberg | |||
| In this paper we look at a combination of bulk compression, partial query processing and skipping for document-ordered inverted indexes. We propose a new inverted index organization, and provide an updated version of the MaxScore method by Turtle and Flood and a skipping-adapted version of the space-limited adaptive pruning method by Lester et al. Both our methods significantly reduce the number of processed elements and cut average query latency by a factor of more than three. Our experiments with a real implementation and a large document collection provide a valuable basis for further research on inverted index skipping and query processing optimizations. | |||
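For readers unfamiliar with MaxScore, here is a simplified document-at-a-time sketch of the underlying idea (the posting-list layout, the bookkeeping, and the binary-search stand-in for index skipping are all our assumptions; the paper's updated variant differs in detail): lists whose combined maximum contribution cannot lift a document past the current top-k threshold become non-essential and are only probed selectively.

```python
import heapq
from bisect import bisect_left

def max_score_topk(postings, k):
    """Simplified MaxScore for a disjunctive query. `postings` maps each
    query term to a list of (doc_id, score) pairs sorted by doc_id."""
    terms = sorted(postings, key=lambda t: max(s for _, s in postings[t]))
    max_scores = [max(s for _, s in postings[t]) for t in terms]
    prefix, running = [], 0.0            # prefix[i] = sum of max_scores[0..i]
    for m in max_scores:
        running += m
        prefix.append(running)

    heap, threshold = [], 0.0            # min-heap of current top-k scores

    def first_essential():
        """Index of the first list whose lists-below could beat the threshold."""
        for i in range(len(terms)):
            if prefix[i] > threshold:
                return i
        return len(terms)

    def lookup(term, doc):
        """Probe one posting list for a document (stand-in for skipping)."""
        plist = postings[term]
        i = bisect_left(plist, (doc, float("-inf")))
        return plist[i][1] if i < len(plist) and plist[i][0] == doc else 0.0

    e = first_essential()
    candidates = sorted({d for t in terms[e:] for d, _ in postings[t]})
    for doc in candidates:
        score = sum(lookup(t, doc) for t in terms[e:])
        # Even with the best-case non-essential contribution, can it make top k?
        if e > 0 and score + prefix[e - 1] <= threshold:
            continue                      # no: skip the remaining lookups
        score += sum(lookup(t, doc) for t in terms[:e])
        if len(heap) < k:
            heapq.heappush(heap, score)
        elif score > heap[0]:
            heapq.heapreplace(heap, score)
        if len(heap) == k:
            threshold = heap[0]
            e = max(e, first_essential()) # more lists may become non-essential
    return sorted(heap, reverse=True)
```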
| Within-Document Term-Based Index Pruning with Statistical Hypothesis Testing | | BIBA | Full-Text | 543-554 | |
| Sree Lekha Thota; Ben Carterette | |||
| Document-centric static index pruning methods provide smaller indexes and faster query times by dropping some within-document term information from inverted lists. We present a method of pruning inverted lists derived from the formulation of unigram language models for retrieval. Our method is based on the statistical significance of term frequency ratios: using the two-sample two-proportion (2P2N) test, we statistically compare the frequency of occurrence of a word within a given document to the frequency of its occurrence in the collection to decide whether to prune it. Experimental results show that this technique can significantly decrease index size and query processing time with less compromise to retrieval effectiveness than similar heuristic methods. Furthermore, we give a formal statistical justification for such methods. | |||
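The two-sample two-proportion z-statistic itself is standard; here is a small sketch of how it could drive a pruning decision (the z cutoff and the exact decision rule are our assumptions, not the paper's derived thresholds):

```python
import math

def two_proportion_z(tf_d, doc_len, ctf, coll_len):
    """2P2N z-statistic comparing a term's rate in one document
    (tf_d / doc_len) to its rate in the collection (ctf / coll_len)."""
    p1 = tf_d / doc_len
    p2 = ctf / coll_len
    p = (tf_d + ctf) / (doc_len + coll_len)   # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / doc_len + 1 / coll_len))
    return (p1 - p2) / se

def keep_posting(tf_d, doc_len, ctf, coll_len, z_cutoff=2.0):
    """Keep the term's posting for this document only if its in-document
    rate is significantly above the collection rate (cutoff assumed)."""
    return two_proportion_z(tf_d, doc_len, ctf, coll_len) > z_cutoff

# e.g. a term occurring 12 times in a 300-word document vs. 9,000 times
# in a 90-million-word collection is clearly significant:
print(keep_posting(12, 300, 9_000, 90_000_000))  # True
```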
| SkipBlock: Self-indexing for Block-Based Inverted List | | BIBA | Full-Text | 555-561 | |
| Stéphane Campinas; Renaud Delbru; Giovanni Tummarello | |||
| In large web search engines the performance of Information Retrieval systems is a key issue. Block-based compression methods are often used to improve search performance, but current self-indexing techniques are not adapted to such data structures and provide sub-optimal performance. In this paper, we present SkipBlock, a self-indexing model for block-based inverted lists. Based on a cost model, we show that it is possible to achieve significant improvements in both search performance and the structure's storage space. | |||
| Weight-Based Boosting Model for Cross-Domain Relevance Ranking Adaptation | | BIBA | Full-Text | 562-567 | |
| Peng Cai; Wei Gao; Kam-Fai Wong; Aoying Zhou | |||
| Adaptation techniques based on importance weighting were shown to be effective for RankSVM and RankNet: each training instance is assigned a target weight denoting its importance to the target domain, which is incorporated into the loss function. In this work, we extend RankBoost with the importance-weighting framework for ranking adaptation. We find it non-trivial to incorporate the target weight into boosting-based ranking algorithms because it plays a contradictory role against boosting's innate weight, the source weight, which focuses on adjusting source-domain ranking accuracy. Our experiments show that among three variants, the additive weight-based RankBoost, which dynamically balances the two types of weights, significantly and consistently outperforms the baseline trained directly on the source domain. | |||
| What Makes Re-finding Information Difficult? A Study of Email Re-finding | | BIBA | Full-Text | 568-579 | |
| David Elsweiler; Mark Baillie; Ian Ruthven | |||
| Re-finding information that has been seen or accessed before is a task which can be relatively straightforward, but it can often be extremely challenging, time-consuming and frustrating. Little is known, however, about what makes one re-finding task harder or easier than another. We performed a user study to learn about the contextual factors that influence users' perception of task difficulty in the context of re-finding email messages. 21 participants were issued re-finding tasks to perform on their own personal collections. The participants' responses to questions about the tasks, combined with demographic data and collection statistics for the experimental population, provide a rich basis for investigating the variables that can influence the perception of difficulty. A logistic regression model was developed to examine the relationships between variables and determine whether any factors were associated with perceived task difficulty. The model reveals strong relationships between difficulty and the time elapsed since a message was read, remembering when the sought-after email was sent, remembering other recipients of the email, the experience of the user, and the user's filing strategy. We discuss what these findings mean for the design of re-finding interfaces and future re-finding research. | |||
| A User-Oriented Model for Expert Finding | | BIBA | Full-Text | 580-592 | |
| Elena Smirnova; Krisztian Balog | |||
| Expert finding addresses the problem of retrieving a ranked list of people who are knowledgeable on a given topic. Several models have been proposed to solve this task, but so far these have focused solely on returning the most knowledgeable people as experts on a particular topic. In this paper we argue that in a real-world organizational setting the notion of the "best expert" also depends on the individual user and her needs. We propose a user-oriented approach that balances two factors influencing the user's choice: the time to contact an expert, and the knowledge value gained afterwards. We use the distance between the user and an expert in a social network to estimate contact time, and consider various social graphs, based on organizational hierarchy, geographical location, and collaboration, as well as the combination of these. Using a realistic test set, created from interactions of employees with a university-wide expert search engine, we demonstrate substantial improvements over a state-of-the-art baseline on all retrieval measures. | |||
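A toy sketch of the balancing idea (the linear combination, the BFS-distance estimate of contact time, and all names are illustrative assumptions; the paper's model and its combination of social graphs are more elaborate):

```python
from collections import deque

def rank_experts(graph, knowledge, user, alpha=0.5):
    """Balance an expert's topical knowledge against contact time,
    estimated here as shortest-path distance from the user in a
    social graph (adjacency-list dict)."""
    dist, queue = {user: 0}, deque([user])
    while queue:                       # BFS shortest paths from the user
        node = queue.popleft()
        for nb in graph.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    scores = {}
    for person, k in knowledge.items():
        d = dist.get(person)
        if person == user or d is None:
            continue                   # unreachable experts are skipped
        scores[person] = alpha * k + (1 - alpha) / d
    return sorted(scores, key=scores.get, reverse=True)

graph = {"me": ["ann", "bob"], "ann": ["me", "eve"],
         "bob": ["me"], "eve": ["ann"]}
knowledge = {"ann": 0.4, "bob": 0.3, "eve": 0.95}
print(rank_experts(graph, knowledge, "me"))  # eve's expertise beats ann's proximity
```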
| Simulating Simple and Fallible Relevance Feedback | | BIBAK | Full-Text | 593-604 | |
| Feza Baskaya; Heikki Keskustalo; Kalervo Järvelin | |||
| Much of the research in relevance feedback (RF) has been performed under
laboratory conditions using test collections and either test persons or simple
simulation. These studies have given mixed results. The design of the present
study is unique. First, the initial queries are realistically short queries
generated by real end-users. Second, we perform a user simulation with several
RF scenarios. Third, we simulate human fallibility in providing RF, i.e.,
incorrectness in feedback. Fourth, we employ graded relevance assessments in
the evaluation of the retrieval results. The research question is: how does RF
affect IR performance when initial queries are short and feedback is fallible?
Our findings indicate that very fallible feedback is no different from
pseudo-relevance feedback (PRF) and not effective on short initial queries.
However, RF with empirically observed fallibility is as effective as correct RF
and able to improve the performance of short initial queries. Keywords: Relevance feedback; fallibility; simulation | |||
| AutoEval: An Evaluation Methodology for Evaluating Query Suggestions Using Query Logs | | BIBA | Full-Text | 605-610 | |
| M-Dyaa Albakour; Udo Kruschwitz; Nikolaos Nanas; Yunhyong Kim; Dawei Song; Maria Fasli; Anne De Roeck | |||
| User evaluations of search engines are expensive and not easy to replicate. The problem is even more pronounced when assessing adaptive search systems, for example system-generated query modification suggestions derived from past user interactions with a search engine. Automatically predicting the performance of different modification suggestion models before getting users involved is therefore highly desirable. AutoEval is an evaluation methodology that assesses the quality of query modifications generated by a model using the query logs of past user interactions with the system. We present experimental results of applying this methodology to different adaptive algorithms, which suggest that the predicted quality of the algorithms is in line with user assessments. This makes AutoEval a suitable evaluation framework for adaptive interactive search engines. | |||
| To Seek, Perchance to Fail: Expressions of User Needs in Internet Video Search | | BIBAK | Full-Text | 611-616 | |
| Christoph Kofler; Martha Larson; Alan Hanjalic | |||
| This work investigates user expressions of content needs in Internet video
search, focusing on cases in which users have failed to meet their search
goals, although relevant content is reasonably certain to exist. We study
expressions of user needs in the form of requests (i.e., questions) formulated
in natural language and published to Yahoo! Answers. Experiments show that
classifiers can distinguish requests associated with search-goal failure. We
identify a group of 'easy-to-predict' requests (cases for which the classifier
predicts search-goal failure well) and compile an inventory of strategies used
by users to express search goals in these cases. In a final set of experiments,
we demonstrate the feasibility of predicting search-goal failure based on
query-like representations of the original natural-language requests. The
results of our study are intended to inform the future development of indexing
and retrieval techniques for Internet video that target difficult queries. Keywords: Multimedia retrieval; Internet video; user information need; search-goal
failure; crowdsourcing | |||
| Passage Reranking for Question Answering Using Syntactic Structures and Answer Types | | BIBAK | Full-Text | 617-628 | |
| Elif Aktolga; James Allan; David A. Smith | |||
| Passage Retrieval is a crucial step in question answering systems, one that has been well researched in the past. Due to the vocabulary mismatch problem and the independence assumption of bag-of-words retrieval models, correct passages are often ranked lower than incorrect ones in the retrieved list. Whereas in previous work passages are reranked only on the basis of the syntactic structures of questions and answers, our method achieves a better ranking by aligning the syntactic structures based on the question's answer type and the named entities detected in the candidate passage. We compare our technique with strong retrieval and reranking baselines. Experimental results using the TREC QA 1999-2003 datasets show that our method significantly outperforms the baselines over all ranks in terms of the MRR measure. Keywords: Passage Retrieval; Question Answering; Reranking; Dependency Parsing; Named Entities | |||
| An Iterative Approach to Text Segmentation | | BIBAK | Full-Text | 629-640 | |
| Fei Song; William M. Darling; Adnan Duric; Fred W. Kroon | |||
| We present divSeg, a novel method for text segmentation that iteratively
splits a portion of text at its weakest point in terms of the connectivity
strength between two adjacent parts. To search for the weakest point, we apply
two different measures: one is based on language modeling of text segmentation
and the other, on the interconnectivity between two segments. Our solution
produces a deep and narrow binary tree -- a dynamic object that describes the
structure of a text and that is fully adaptable to a user's segmentation needs.
We treat it as a separate task to flatten the tree into a broad and shallow
hierarchy either through supervised learning of a document set or explicit
input of how a text should be segmented. The rich structure of our created tree
further allows us to segment documents at varying levels such as topic,
sub-topic, etc. We evaluated our new solution on a set of 265 articles from
Discover magazine where the topic structures are unknown and need to be
discovered. Our experimental results show that the iterative approach has the
potential to generate better segmentation results than several leading
baselines, and the separate flattening step allows us to adapt the results to
different levels of details and user preferences. Keywords: Text Segmentation; Language Modeling | |||
| Improving Query Focused Summarization Using Look-Ahead Strategy | | BIBAK | Full-Text | 641-652 | |
| Rama Badrinath; Suresh Venkatasubramaniyan; C. E. Veni Madhavan | |||
| Query focused summarization is the task of producing a compressed text of an original set of documents based on a query. Documents can be viewed as a graph with sentences as nodes and edges added based on sentence similarity. Graph-based ranking algorithms that use a 'biased random surfer' model, such as topic-sensitive LexRank, have been successfully applied to query focused summarization. In these algorithms, the random walk is biased towards the sentences that contain query-relevant words. Specifically, it is assumed that the random surfer knows the query-relevance score of the sentence to which he jumps; neighbourhood information about that sentence, however, is completely ignored. In this paper, we propose a look-ahead version of topic-sensitive LexRank. We assume that the random surfer not only knows the query relevance of the sentence to which he jumps but can also look N steps ahead from that sentence to find the query-relevance scores of future sentences. Using this look-ahead information, we identify sentences that are indirectly related to the query by counting the number of hops needed to reach a sentence with query-relevant words. We then bias the random walk towards these indirectly query-relevant sentences as well as the sentences that contain query-relevant words themselves. Experimental results show a 20.2% increase in ROUGE-2 score compared to topic-sensitive LexRank on the DUC 2007 data set. Further, our system outperforms the best systems in DUC 2006, and results are comparable to state-of-the-art systems. Keywords: Topic Sensitive LexRank; Look-ahead; Biased random walk | |||
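A compact sketch of the biased random walk with look-ahead (the bias construction via N-hop propagation is our reading of the abstract; the damping value, the normalisation, and the matrix conventions are assumptions):

```python
import numpy as np

def lookahead_lexrank(sim, rel, n_ahead=1, d=0.15, iters=100):
    """Topic-sensitive LexRank with a look-ahead bias.
    sim: row-stochastic sentence-similarity matrix (n x n).
    rel: query-relevance score per sentence (length n).
    The jump distribution is biased not only by a sentence's own
    relevance but by relevance reachable within n_ahead hops."""
    rel = np.asarray(rel, dtype=float)
    bias, hop = rel.copy(), rel.copy()
    for _ in range(n_ahead):          # add relevance visible N steps ahead
        hop = sim @ hop
        bias += hop
    bias /= bias.sum()
    scores = np.full(len(rel), 1.0 / len(rel))
    for _ in range(iters):            # biased-random-walk power iteration
        scores = d * bias + (1 - d) * (sim.T @ scores)
    return scores

# Toy usage: three sentences, only the first contains query terms;
# its neighbours now also receive bias via the look-ahead.
sim = np.array([[0.0, 0.5, 0.5],
                [0.5, 0.0, 0.5],
                [0.5, 0.5, 0.0]])
print(lookahead_lexrank(sim, rel=[1.0, 0.0, 0.0]))
```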
| A Generalized Method for Word Sense Disambiguation Based on Wikipedia | | BIBAK | Full-Text | 653-664 | |
| Chenliang Li; Aixin Sun; Anwitaman Datta | |||
| In this paper we propose a general framework for word sense disambiguation using knowledge latent in Wikipedia. Specifically, we exploit the rich and growing Wikipedia corpus to build a large and robust knowledge repository consisting of keyphrases and their associated candidate topics. Keyphrases are mainly derived from Wikipedia article titles and the anchor texts associated with wikilinks. The disambiguation of a given keyphrase is based on both the commonness of a candidate topic and its context-dependent relatedness, where unnecessary (and potentially noisy) context information is pruned. Through extensive experimental evaluations using different relatedness measures, we show that the proposed technique achieves disambiguation accuracy comparable to state-of-the-art techniques, while incurring orders of magnitude less computation cost. Keywords: Word Sense Disambiguation; Wikipedia; Context Pruning | |||
| Representing Document Lengths with Identifiers | | BIBA | Full-Text | 665-669 | |
| Raffaele Perego; Fabrizio Silvestri; Nicola Tonellotto | |||
| The length of each indexed document is needed by most common text retrieval scoring functions to rank the document with respect to the current query. For efficiency, information retrieval systems maintain this information in main memory. This paper proposes a novel strategy to encode the length of each document directly in the document identifier, thus reducing main memory demand. The technique is based on a simple document identifier assignment method and a function that allows the approximate length of each indexed document to be computed analytically. | |||
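One way to realise such a scheme, sketched below under our own assumptions (the paper's assignment method and analytic function may differ): assign identifiers in length order and fit a two-parameter power law, so an approximate length can be recovered from the identifier alone with no per-document table.

```python
import math

def assign_ids(doc_lengths):
    """Assign identifiers in order of decreasing document length, then
    fit len(id) ~ a * (id + 1) ** b by least squares in log-log space,
    so an approximate length is computable from the identifier alone."""
    order = sorted(range(len(doc_lengths)), key=lambda i: -doc_lengths[i])
    id_of = {doc: new_id for new_id, doc in enumerate(order)}
    xs = [math.log(i + 1) for i in range(len(order))]
    ys = [math.log(doc_lengths[doc]) for doc in order]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = math.exp(my - b * mx)
    return id_of, lambda doc_id: a * (doc_id + 1) ** b

lengths = [1500, 80, 400, 4000, 95, 230]
id_of, approx_len = assign_ids(lengths)
print(id_of[3], approx_len(id_of[3]))   # the longest document gets id 0
```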
| Free-Text Search versus Complex Web Forms | | BIBA | Full-Text | 670-674 | |
| Kien Tjin-Kam-Jet; Dolf Trieschnigg; Djoerd Hiemstra | |||
| We investigated the use of free-text queries as an alternative means for searching 'behind' web forms. We conducted a user study where we evaluated our prototype free-text interface in a travel planner scenario. Our results show that users prefer this free-text interface over the original web form and that they are about 9% faster on average at completing their search tasks. | |||
| Multilingual Log Analysis: LogCLEF | | BIBA | Full-Text | 675-678 | |
| Giorgio Maria Di Nunzio; Johannes Leveling; Thomas Mandl | |||
| The current lack of recent and long-term query logs makes the verifiability and repeatability of log analysis experiments very limited. A first attempt in this direction has been made within the Cross-Language Evaluation Forum in 2009 in a track named LogCLEF which aims to stimulate research on user behaviour in multilingual environments and promote standard evaluation collections of log data. We report on similarities and differences of the most recent activities for LogCLEF. | |||
| A Large-Scale System Evaluation on Component-Level | | BIBA | Full-Text | 679-682 | |
| Jens Kürsten; Maximilian Eibl | |||
| This article describes a large-scale empirical evaluation across different types of English text collections. We ran about 140,000 experiments and analyzed the results at the system component level to find out whether we can select configurations that perform reliably on specific types of corpora. To our surprise, we observed that one specific set of configuration parameters achieved 95% of the optimal average MAP across all collections. We conclude that this configuration could be used as a baseline reference for the evaluation of new IR approaches on English text corpora. | |||
| Should MT Systems Be Used as Black Boxes in CLIR? | | BIBA | Full-Text | 683-686 | |
| Walid Magdy; Gareth J. F. Jones | |||
| The translation stage in cross-language information retrieval (CLIR) is the main enabling step for crossing the language barrier between documents and queries. In recent years machine translation (MT) systems have become the dominant approach to translation in CLIR. However, unlike information retrieval (IR), MT focuses on the morphological and syntactic quality of the sentence, which requires large training resources and high computational power for training and translation. We present a novel technique for MT designed specifically for CLIR: IR text pre-processing, in the form of stop-word removal and stemming, is applied to the MT training corpus prior to the training phase. Applying this pre-processing step is found to significantly speed up the translation process without affecting retrieval quality. | |||
| Video Retrieval Based on Words-of-Interest Selection | | BIBAK | Full-Text | 687-690 | |
| Lei Wang; Dawei Song; Eyad Elyan | |||
| Query-by-example video retrieval has received increasing attention in recent years. One of the state-of-the-art approaches is the Bag-of-visual-Words (BoW) technique, where images are described by a set of local features mapped to a discrete set of visual words. Such techniques, however, ignore the spatial relations between visual words. In this paper, we present a content-based video retrieval technique based on selected Words-of-Interest (WoI) that utilizes a visual-word spatial proximity constraint identified from the query. Experiments carried out on a public video database demonstrate promising results, with our approach outperforming the classical BoW approach. Keywords: Bag-of-Words; Content based video retrieval; Words-of-Interest | |||
| Classic Children's Literature -- Difficult to Read? | | BIBA | Full-Text | 691-694 | |
| Dolf Trieschnigg; Claudia Hauff | |||
| Classic children's literature is nowadays freely available thanks to initiatives such as Project Gutenberg. Due to diverging vocabularies and styles, however, these texts are often not readily understandable to children in the present day. Our goal is to make such texts more accessible by aiding children in the reading process, in particular by automatically identifying the terms that result in low readability. As a first step, in this poster we report on a preliminary user study that investigates the extent of the vocabulary problem. We also propose and evaluate a basic approach to detecting such difficult terminology. | |||
| Applying Machine Learning Diversity Metrics to Data Fusion in Information Retrieval | | BIBA | Full-Text | 695-698 | |
| David Leonard; David Lillis; Lusheng Zhang; Fergus Toolan; Rem W. Collier; John Dunnion | |||
| The Supervised Machine Learning task of classification has parallels with Information Retrieval (IR): in each case, items (documents in the case of IR) must be categorised into discrete classes (relevant or non-relevant). A parallel can thus also be drawn between classifier ensembles, where evidence from multiple classifiers is combined to achieve a superior result, and the IR data fusion task.
This paper presents preliminary experimental results on the applicability of classifier ensemble diversity metrics in data fusion. Initial results indicate a relationship between the quality of the fused result set (as measured by MAP) and the diversity of its inputs. | |||
| Reranking Collaborative Filtering with Multiple Self-contained Modalities | | BIBAK | Full-Text | 699-703 | |
| Yue Shi; Martha Larson; Alan Hanjalic | |||
| A reranking algorithm, Multi-Rerank, is proposed to refine the recommendation lists generated by collaborative filtering approaches. Multi-Rerank is capable of capturing multiple self-contained modalities, i.e., item modalities extractable from the user-item matrix, to improve recommendation lists. Experimental results indicate that Multi-Rerank is effective at improving various CF approaches, and additional benefits can be achieved when reranking with multiple modalities rather than a single modality. Keywords: Recommender systems; collaborative filtering; reranking; multiple modalities; self-contained modalities | |||
| How Far Are We in Trust-Aware Recommendation? | | BIBAK | Full-Text | 704-707 | |
| Yue Shi; Martha Larson; Alan Hanjalic | |||
| Social trust holds great potential for improving recommendation and much
recent work focuses on the use of social trust for rating prediction, in
particular, in the context of the Epinions dataset. An experimental comparison
with trust-free, naïve approaches suggests that state-of-the-art
social-trust-aware recommendation approaches, in particular Social Trust
Ensemble (STE), can fail to isolate the true added value of trust. We
demonstrate experimentally that not only trust-set users, but also random users
can be exploited to yield recommendation improvement via STE. Specific users,
however, do benefit from use of social trust, and we conclude with an
investigation of their characteristics. Keywords: Recommender systems; social trust; trust-aware recommendation | |||
| Re-ranking for Multimedia Indexing and Retrieval | | BIBAK | Full-Text | 708-711 | |
| Bahjat Safadi; Georges Quénot | |||
| We propose a re-ranking method for improving the performance of semantic video indexing and retrieval. Experimental results show that the proposed re-ranking method is effective, improving system performance by about 16-22% on average on the TRECVID 2010 semantic indexing task. Keywords: Multimedia Indexing and Retrieval; Re-ranking | |||
| Combining Query Translation Techniques to Improve Cross-Language Information Retrieval | | BIBA | Full-Text | 712-715 | |
| Benjamin Herbert; György Szarvas; Iryna Gurevych | |||
| In this paper we address the combination of query translation approaches for cross-language information retrieval (CLIR). We translate queries with Google Translate and extend them with new translations obtained by mapping noun phrases in the query to concepts in the target language using Wikipedia. For two CLIR collections, we show that the proposed model provides meaningful translations that improve the strong baseline CLIR model based on a top performing SMT system. | |||
| Back to the Roots: Mean-Variance Analysis of Relevance Estimations | | BIBA | Full-Text | 716-720 | |
| Guido Zuccon; Leif Azzopardi; Keith van Rijsbergen | |||
| Recently, mean-variance analysis has been proposed as a novel paradigm for modeling document ranking in Information Retrieval. The main merit of this approach is that it diversifies the ranking of retrieved documents. In its original formulation, the strategy considers both the mean of the relevance estimates of retrieved documents and their variance. However, when this strategy has been empirically instantiated, the concepts of mean and variance are discarded in favour of a point-wise estimation of relevance (to replace the mean) and of a parameter to be tuned or, alternatively, a quantity dependent upon the document length (to replace the variance). In this paper we revisit this ranking strategy by going back to its roots: mean and variance. For each retrieved document, we infer a relevance distribution from a series of point-wise relevance estimations provided by a number of different systems. This is used to compute the mean and the variance of the document's relevance estimates. On the TREC ClueWeb collection, we show that this approach improves retrieval performance. This development could lead to new strategies for fusing the relevance estimates provided by different systems. | |||
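A minimal sketch of ranking by the mean and variance of cross-system relevance estimates (the Markowitz-style objective and the trade-off parameter b are assumptions about the instantiation; the paper infers a full relevance distribution per document):

```python
import numpy as np

def mean_variance_rank(system_scores, b=1.0):
    """Rank documents by their mean relevance estimate across systems
    minus a risk penalty proportional to the variance of those
    estimates. system_scores has shape (n_systems, n_docs)."""
    scores = np.asarray(system_scores, dtype=float)
    mean = scores.mean(axis=0)         # consensus relevance estimate
    var = scores.var(axis=0)           # disagreement across systems
    objective = mean - b * var
    return np.argsort(-objective), objective

# Toy usage: three systems scoring four documents.
runs = [[0.9, 0.4, 0.7, 0.2],
        [0.5, 0.5, 0.7, 0.3],
        [0.1, 0.45, 0.7, 0.1]]
order, obj = mean_variance_rank(runs)
print(order)  # doc 2 first: a high mean with zero cross-system disagreement
```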
| A Novel Re-ranking Approach Inspired by Quantum Measurement | | BIBA | Full-Text | 721-724 | |
| Xiaozhao Zhao; Peng Zhang; Dawei Song; Yuexian Hou | |||
| Quantum theory (QT) has recently been employed to advance the theory of information retrieval (IR). A typical method, namely the Quantum Probability Ranking Principle (QPRP), was proposed to re-rank top retrieved documents by considering the inter-dependencies between documents through the "quantum interference". In this paper, we attempt to explore another important QT concept, namely the "quantum measurement". Inspired by the photon polarization experiment underpinning the "quantum measurement", we propose a novel re-ranking approach. Evaluation on several TREC data sets shows that in ad-hoc retrieval, our method can significantly improve the first-round ranking from a baseline retrieval model, and also outperform the QPRP. | |||
| Simple vs. Sophisticated Approaches for Patent Prior-Art Search | | BIBA | Full-Text | 725-728 | |
| Walid Magdy; Patrice Lopez; Gareth J. F. Jones | |||
| Patent prior-art search is concerned with finding all filed patents relevant to a given patent application. We report a comparison between two search approaches representing the state of the art in patent prior-art search. The first approach uses simple and straightforward information retrieval (IR) techniques, while the second uses much more sophisticated techniques which try to model the steps taken by a patent examiner in patent search. Experiments show that the retrieval effectiveness of the two techniques is statistically indistinguishable when patent applications contain some initial citations. However, the advanced search technique is statistically better when no initial citations are provided. Our findings suggest that when initial citations are provided, simple IR approaches suffice, saving time and effort. | |||
| Towards Quantum-Based DB+IR Processing Based on the Principle of Polyrepresentation | | BIBA | Full-Text | 729-732 | |
| David Zellhöfer; Ingo Frommholz; Ingo Schmitt; Mounia Lalmas; Keith van Rijsbergen | |||
| The cognitively motivated principle of polyrepresentation still lacks a theoretical foundation in IR. In this work, we discuss two competing polyrepresentation frameworks that are based on quantum theory. Both approaches support different aspects of polyrepresentation, where one is focused on the geometric properties of quantum theory while the other has a strong logical basis. We compare both approaches and outline how they can be combined to express further aspects of polyrepresentation. | |||
| ATTention: Understanding Authors and Topics in Context of Temporal Evolution | | BIBA | Full-Text | 733-737 | |
| Nasir Naveed; Sergej Sizov; Steffen Staab | |||
| Understanding thematic trends and user roles is an important challenge in the field of information retrieval. In this contribution, we present a novel model for analyzing the evolution of users' interests with respect to the content they produce over time. Our approach, ATTention (a name derived from the analysis of Authors and Topics in the Temporal context), addresses this problem by means of Bayesian modeling of the relations between authors, latent topics and temporal information. We also present results of preliminary evaluations with scientific publication datasets and discuss opportunities for using the model in novel mining and recommendation scenarios. | |||
| Role of Emotional Features in Collaborative Recommendation | | BIBA | Full-Text | 738-742 | |
| Yashar Moshfeghi; Joemon M. Jose | |||
| The aim of this poster is to investigate the role of emotion in the collaborative filtering task. For this purpose, a kernel-based collaborative recommendation technique is used. The experiment is conducted on two MovieLens data sets. The emotional features are extracted from the movie reviews and plot summaries. The results show that emotional features are capable of enhancing recommendation effectiveness. | |||
| The Importance of the Depth for Text-Image Selection Strategy in Learning-To-Rank | | BIBA | Full-Text | 743-746 | |
| David Buffoni; Sabrina Tollari; Patrick Gallinari | |||
| We examine the effect that the number of documents pooled when constructing training sets has on the performance of the learning-to-rank (LTR) approaches that use those sets to build ranking functions. Our investigation takes place in a multimedia setting and uses the ImageCLEF photo 2006 dataset with text and visual features. Experiments show that our LTR algorithm, OWPC, outperforms the other baselines. | |||
| Personal Blog Retrieval Using Opinion Features | | BIBA | Full-Text | 747-750 | |
| Shima Gerani; Mostafa Keikha; Mark Carman; Fabio Crestani | |||
| Faceted blog distillation aims at finding blogs with a recurring interest in a topic that also satisfy a specific facet of interest. In this paper we focus on the personal facet and propose a method that uses opinion features as indicators of personal content. Experimental results on the TREC BLOG08 dataset confirm our intuition that personal blogs are more opinionated. | |||
| Processing Queries in Session in a Quantum-Inspired IR Framework | | BIBA | Full-Text | 751-754 | |
| Ingo Frommholz; Benjamin Piwowarski; Mounia Lalmas; Keith van Rijsbergen | |||
| In a search session, users tend to reformulate their queries, for instance because they want to generalise or specify them, or because they are undergoing a drift in their information need. This motivates regarding queries not in isolation, but within the session in which they are embedded. In this poster, we propose an approach inspired by quantum mechanics to represent queries and their reformulations as density operators. Differently constructed densities can potentially be applied to different types of query reformulation. To this end, we propose and discuss indicators that can hint at the type of query reformulation we are dealing with. | |||
| Towards Predicting Relevance Using a Quantum-Like Framework | | BIBA | Full-Text | 755-758 | |
| Emanuele Di Buccio; Massimo Melucci; Dawei Song | |||
| In this paper, the user's relevance state is modeled using quantum-like probability, and an interference term is proposed to model the evolution of that state and the user's uncertainty about the assessment. We formulate the theoretical framework and report the results of an experimental user study based on a TREC test collection. | |||
| Fusion vs. Two-Stage for Multimodal Retrieval | | BIBA | Full-Text | 759-762 | |
| Avi Arampatzis; Konstantinos Zagoris; Savvas A. Chatzichristofis | |||
| We compare two methods for retrieval from multimodal collections. The first is a score-based fusion of results retrieved visually and textually. The second is a two-stage method that visually re-ranks the top-K results retrieved textually. We discuss their underlying hypotheses and practical limitations, and conduct a comparative evaluation on a standardized snapshot of Wikipedia. Both methods are found to be significantly more effective than single-modality baselines, with no clear winner but with different robustness features. Nevertheless, two-stage retrieval provides efficiency benefits over fusion. | |||
| Combination of Feature Selection Methods for Text Categorisation | | BIBA | Full-Text | 763-766 | |
| Robert Neumayer; Rudolf Mayer; Kjetil Nørvåg | |||
| Feature selection plays a vital role in text categorisation. A range of different methods have been developed, each having unique properties and selecting different features. We show some results of an extensive study of feature selection approaches using a wide range of combination methods. We performed experiments on 18 test collections and report a subset of the results. | |||
| Time-Surfer: Time-Based Graphical Access to Document Content | | BIBA | Full-Text | 767-771 | |
| Hector Llorens; Estela Saquete; Borja Navarro; Robert Gaizauskas | |||
| This demonstration presents a novel interactive graphical interface to document content that focuses on the time dimension. The objective of Time-Surfer is to let users search and explore information related to a specific period, event, or event participant within a document. The system is based on the automatic detection not only of time expressions, but also of events and temporal relations. Through a zoomable timeline interface, it gives users a dynamic picture of the temporal distribution of events within a document. Time-Surfer has been successfully applied to history and biographical articles from Wikipedia. | |||
| ARES: A Retrieval Engine Based on Sentiments | | BIBA | Full-Text | 772-775 | |
| Gianluca Demartini | |||
| This paper introduces a system enriching the standard web search engine interface with sentiment information. Additionally, it exploits such annotations to diversify the result list based on the different sentiments expressed by retrieved web pages. Thanks to the annotations, the end user is aware of which opinions the search engine is showing her and, thanks to the diversification, she can see an overview of the different opinions expressed about the requested topic. We describe the methods used for computing sentiment scores of web search results and for re-ranking them in order to cover different sentiment classes. The proposed system, built on top of commercial search engine APIs, is available on-line. | |||
| Web Search Query Assistance Functionality for Young Audiences | | BIBA | Full-Text | 776-779 | |
| Carsten Eickhoff; Tamara Polajnar; Karl Gyllstrom; Sergio Duarte Torres; Richard Glassey | |||
| The Internet plays an important role in people's daily lives. This is not only true for adults, but also holds for children; however, current web search engines are designed with adult users and their cognitive abilities in mind. Consequently, children face considerable barriers when using these information systems. In this work, we demonstrate the use of query assistance and search moderation techniques as well as appropriate interface design to overcome or mitigate these challenges. | |||
| Conversation Retrieval from Twitter | | BIBAK | Full-Text | 780-783 | |
| Matteo Magnani; Danilo Montesi; Gabriele Nunziante; Luca Rossi | |||
| The process of retrieving conversations from social network sites differs
from traditional Web information retrieval because it involves human
communication aspects, like the degree of interest in the conversation
explicitly or implicitly expressed by the interacting people and their
influence/popularity. Our demo allows users to include these aspects into the
search process. The system allows the retrieval of millions of conversations
generated on the popular Twitter social network site, and in particular
conversations about trending topics. Keywords: Conversation retrieval; Twitter | |||
| Finding Useful Users on Twitter: Twittomender the Followee Recommender | | BIBA | Full-Text | 784-787 | |
| John Hannon; Kevin McCarthy; Barry Smyth | |||
| This paper examines an application for finding pertinent friends (followees) on Twitter. Whilst Twitter provides a great basis for receiving information, we believe a potential shortcoming lies in the lack of an effective way for users of Twitter to find other Twitter users to follow. We apply several recommendation techniques to build a followee recommender for Twitter. We evaluate a variety of different recommendation strategies, using real-user data, to demonstrate the potential for this recommender system to correctly identify and promote interesting users who are worth following. | |||
| Visual Exploration of Health Information for Children | | BIBA | Full-Text | 788-792 | |
| Frans van der Sluis; Sergio Duarte Torres; Djoerd Hiemstra; Betsy van Dijk; Frea Kruisinga | |||
| Children experience several difficulties retrieving information using current Information Retrieval (IR) systems. In particular, children struggle to find the right keywords to construct queries, given their lack of domain knowledge. This problem is even more critical in the specialized health domain. In this work we present a novel method to address this problem using a cross-media search interface in which textual data is searched through visual images. This solution aims to overcome the recall and recognition problems salient in health information by replacing the need for a vocabulary with the easier task of recognising different body parts. | |||