HCI Bibliography Home | HCI Conferences | IR Archive | Detailed Records | RefWorks | EndNote | Hide Abstracts
IR Tables of Contents: 9900010203040506070809101112131415

Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Fullname:Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval
Editors:Mark Sanderson; ChengXiang Zhai; Justin Zobel; James Allan; Javed Aslam
Location:Boston, Massachusetts, USA
Dates:2009-Jul-19 to 2009-Jul-23
Standard No:ISBN: 1-60558-483-5, 978-1-60558-483-6; ACM DL: Table of Contents hcibib: IR09
Links:Conference Home Page
  1. Keynote
  2. Classification and clustering
  3. Novel search features
  4. Expansion and feedback
  5. Speech and linguistic processing
  6. Retrieval models I
  7. Web 2.0
  8. Efficiency
  9. Question answering
  10. Recommenders I
  11. Web Retrieval I
  12. Learning to rank I
  13. Information extraction
  14. Retrieval models II
  15. Vertical search
  16. Clickthrough models
  17. Interactive search
  18. Multimedia I (music and video)
  19. Federated, distributed search
  20. Keynote
  21. Evaluation and measurement I
  22. Learning to rank II
  23. Multimedia II (images and tags)
  24. Evaluation and measurement II
  25. Recommenders II
  26. Query formulation
  27. Web Retrieval II
  28. Spamming
  29. Posters
  30. Demonstrations
  31. Doctoral consortium


An interdisciplinary perspective on information retrieval BIBKFull-Text 1-2
  Susan T. Dumais
Keywords: Salton award lecture, evaluation, human-computer interaction, information retrieval, interdisciplinary research

Classification and clustering

Context-aware query classification BIBAKFull-Text 3-10
  Huanhuan Cao; Derek Hao Hu; Dou Shen; Daxin Jiang; Jian-Tao Sun; Enhong Chen; Qiang Yang
Understanding users'search intent expressed through their search queries is crucial to Web search and online advertisement. Web query classification (QC) has been widely studied for this purpose. Most previous QC algorithms classify individual queries without considering their context information. However, as exemplified by the well-known example on query "jaguar", many Web queries are short and ambiguous, whose real meanings are uncertain without the context information. In this paper, we incorporate context information into the problem of query classification by using conditional random field (CRF) models. In our approach, we use neighboring queries and their corresponding clicked URLs (Web pages) in search sessions as the context information. We perform extensive experiments on real world search logs and validate the effectiveness and efficiency of our approach. We show that we can improve the F1 score by 52% as compared to other state-of-the-art baselines.
Keywords: query classification, search context
Refined experts: improving classification in large taxonomies BIBAKFull-Text 11-18
  Paul N. Bennett; Nam Nguyen
While large-scale taxonomies -- especially for web pages -- have been in existence for some time, approaches to automatically classify documents into these taxonomies have met with limited success compared to the more general progress made in text classification. We argue that this stems from three causes: increasing sparsity of training data at deeper nodes in the taxonomy, error propagation where a mistake made high in the hierarchy cannot be recovered, and increasingly complex decision surfaces in higher nodes in the hierarchy. While prior research has focused on the first problem, we introduce methods that target the latter two problems -- first by biasing the training distribution to reduce error propagation and second by propagating up "first-guess" expert information in a bottom-up manner before making a refined top down choice. Finally, we present an empirical study demonstrating that the suggested changes lead to 10-30% improvements in F1 scores versus an accepted competitive baseline, hierarchical SVMs.
Keywords: large-scale hierarchy, text classification
Dynamicity vs. effectiveness: studying online clustering for scatter/gather BIBAKFull-Text 19-26
  Weimao Ke; Cassidy R. Sugimoto; Javed Mostafa
We proposed and implemented a novel clustering algorithm called LAIR2, which has constant running time average for on-the-fly Scatter/Gather browsing [4]. Our experiments showed that when running on a single processor, the LAIR2 on-line clustering algorithm was several hundred times faster than a parallel Buckshot algorithm running on multiple processors [11]. This paper reports on a study that examined the effectiveness of the LAIR2 algorithm in terms of clustering quality and its impact on retrieval performance. We conducted a user study on 24 subjects to evaluate on-the-fly LAIR2 clustering in Scatter/Gather search tasks by comparing its performance to the Buckshot algorithm, a classic method for Scatter/Gather browsing [4]. Results showed significant differences in terms of subjective perceptions of clustering quality. Subjects perceived that the LAIR2 algorithm produced significantly better quality clusters than the Buckshot method did. Subjects felt that it took less effort to complete the tasks with the LAIR2 system, which was more effective in helping them in the tasks. Interesting patterns also emerged from subjects' comments in the final open-ended questionnaire. We discuss implications and future research.
Keywords: clustering, effectiveness, efficiency, exploratory search, interactive visualization, scalability, scatter/gather, user study

Novel search features

Web searching for daily living BIBAKFull-Text 27-34
  Takuya Maekawa; Yutaka Yanagisawa; Yasushi Sakurai; Yasue Kishino; Koji Kamei; Takeshi Okadome
The new concept proposed in this paper is a query free web search that automatically retrieves a web page including information related to the daily activity that we are currently engaged in for automatically displaying the page on Internet-connected domestic appliances around us such as televisions. When we are washing a coffee maker, for example, a web page is retrieved that includes tips such as 'cleaning a coffee maker with vinegar removes stains well.' A method designed on the basis of this concept automatically searches for a web page by using a query constructed from the use of ordinary household objects that is detected by sensors attached to the objects. An in-situ experiment tests a variety of IR techniques and the experiment confirmed that our daily activities can produce related web pages with high accuracy.
Keywords: daily living, experiment, sensors, web search
Global ranking by exploiting user clicks BIBAKFull-Text 35-42
  Shihao Ji; Ke Zhou; Ciya Liao; Zhaohui Zheng; Gui-Rong Xue; Olivier Chapelle; Gordon Sun; Hongyuan Zha
It is now widely recognized that user interactions with search results can provide substantial relevance information on the documents displayed in the search results. In this paper, we focus on extracting relevance information from one source of user interactions, i.e., user click data, which records the sequence of documents being clicked and not clicked in the result set during a user search session. We formulate the problem as a global ranking problem, emphasizing the importance of the sequential nature of user clicks, with the goal to predict the relevance labels of all the documents in a search session. This is distinct from conventional learning to rank methods that usually design a ranking model defined on a single document; in contrast, in our model the relational information among the documents as manifested by an aggregation of user clicks is exploited to rank all the documents jointly. In particular, we adapt several sequential supervised learning algorithms, including the conditional random field (CRF), the sliding window method and the recurrent sliding window method, to the global ranking problem. Experiments on the click data collected from a commercial search engine demonstrate that our methods can outperform the baseline models for search results re-ranking.
Keywords: conditional random field, experimental evaluation, implicit relevance feedback, learning to rank, sequential supervised learning, user clicks
Good abandonment in mobile and PC internet search BIBAKFull-Text 43-50
  Jane Li; Scott Huffman; Akihito Tokuda
Query abandonment by search engine users is generally considered to be a negative signal. In this paper, we explore the concept of good abandonment. We define a good abandonment as an abandoned query for which the user's information need was successfully addressed by the search results page, with no need to click on a result or refine the query. We present an analysis of abandoned internet search queries across two modalities (PC and mobile) in three locales. The goal is to approximate the prevalence of good abandonment, and to identify types of information needs that may lead to good abandonment, across different locales and modalities. Our study has three key findings: First, queries potentially indicating good abandonment make up a significant portion of all abandoned queries. Second, the good abandonment rate from mobile search is significantly higher than that from PC search, across all locales tested. Third, classified by type of information need, the major classes of good abandonment vary dramatically by both locale and modality. Our findings imply that it is a mistake to uniformly consider query abandonment as a negative signal. Further, there is a potential opportunity for search engines to drive additional good abandonment, especially for mobile search users, by improving search features and result snippets.
Keywords: PC internet search, good abandonment, mobile internet search, query analysis

Expansion and feedback

Efficient query expansion for advertisement search BIBAKFull-Text 51-58
  Haofen Wang; Yan Liang; Linyun Fu; Gui-Rong Xue; Yong Yu
Online advertising represents a growing part of the revenues of major Internet service providers such as Google and Yahoo. A commonly used strategy is to place advertisements (ads) on the search result pages according to the users' submitted queries. Relevant ads are likely to be clicked by a user and to increase the revenues of both advertisers and publishers. However, bid phrases defined by ad-owners are usually contained in limited number of ads. Directly matching user queries with bid phrases often results in finding few appropriate ads. To address this shortcoming, query expansion is often used to increase the chances to match the ads. Nevertheless, query expansion on top of the traditional inverted index faces efficiency issues such as high time complexity and heavy I/O costs. Moreover, precision cannot always be improved, sometimes even hurt due to the involvement of additional noise.
   In this paper, we propose an efficient ad search solution relying on a block-based index able to tackle the issues associated with query expansion. Our index structure places clusters of similar bid phrases in corresponding blocks with their associated ads. It reduces the number of merge operations significantly during query expansion and allows sequential scans rather than random accesses, saving I/O costs. We adopt flexible block sizes according to the clustering results of bid phrases to further optimize the index structure for efficient ad search. The pre-computation of such clusters is achieved through an agglomerative iterative clustering algorithm. Finally, we adapt the spreading activation mechanism to return the top-k relevant ads, improving search precision. The experimental results of our prototype, AdSearch, show that we can indeed return a larger number of relevant ads without sacrificing execution speed.
Keywords: ad rank, ad search, agglomerative clustering, block-based index, efficient query expansion
Query dependent pseudo-relevance feedback based on wikipedia BIBAKFull-Text 59-66
  Yang Xu; Gareth J. F. Jones; Bin Wang
Pseudo-relevance feedback (PRF) via query-expansion has been proven to be eîective in many information retrieval (IR) tasks. In most existing work, the top-ranked documents from an initial search are assumed to be relevant and used for PRF. One problem with this approach is that one or more of the top retrieved documents may be non-relevant, which can introduce noise into the feedback process. Besides, existing methods generally do not take into account the significantly different types of queries that are often entered into an IR system. Intuitively, Wikipedia can be seen as a large, manually edited document collection which could be exploited to improve document retrieval effectiveness within PRF. It is not obvious how we might best utilize information from Wikipedia in PRF, and to date, the potential of Wikipedia for this task has been largely unexplored. In our work, we present a systematic exploration of the utilization of Wikipedia in PRF for query dependent expansion. Specifically, we classify TREC topics into three categories based on Wikipedia: 1) entity queries, 2) ambiguous queries, and 3) broader queries. We propose and study the effectiveness of three methods for expansion term selection, each modeling the Wikipedia based pseudo-relevance information from a different perspective. We incorporate the expansion terms into the original query and use language modeling IR to evaluate these methods. Experiments on four TREC test collections, including the large web collection GOV2, show that retrieval performance of each type of query can be improved. In addition, we demonstrate that the proposed method out-performs the baseline relevance model in terms of precision and robustness.
Keywords: entity, information retrieval, pseudo-relevance feedback, query expansion, wikipedia
Segment-level display time as implicit feedback: a comparison to eye tracking BIBAKFull-Text 67-74
  Georg Buscher; Ludger van Elst; Andreas Dengel
We examine two basic sources for implicit relevance feedback on the segment level for search personalization: eye tracking and display time. A controlled study has been conducted where 32 participants had to view documents in front of an eye tracker, query a search engine, and give explicit relevance ratings for the results. We examined the performance of the basic implicit feedback methods with respect to improved ranking and compared their performance to a pseudo relevance feedback baseline on the segment level and the original ranking of a Web search engine.
   Our results show that feedback based on display time on the segment level is much coarser than feedback from eye tracking. But surprisingly, for re-ranking and query expansion it did work as well as eye-tracking-based feedback. All behavior-based methods performed significantly better than our non-behavior-based baseline and especially improved poor initial rankings of the Web search engine.
   The study shows that segment-level display time yields comparable results as eye-tracking-based feedback. Thus, it should be considered in future personalization systems as an inexpensive but precise method for implicit feedback.
Keywords: display time, eye tracking, implicit feedback, personalization

Speech and linguistic processing

Addressing morphological variation in alphabetic languages BIBAKFull-Text 75-82
  Paul McNamee; Charles Nicholas; James Mayfield
The selection of indexing terms for representing documents is a key decision that limits how effective subsequent retrieval can be. Often stemming algorithms are used to normalize surface forms, and thereby address the problem of not finding documents that contain words related to query terms through infectional or derivational morphology. However, rule-based stemmers are not available for every language and it is unclear which methods for coping with morphology are most effective. In this paper we investigate an assortment of techniques for representing text and compare these approaches using data sets in eighteen languages and five different writing systems.
   We find character n-gram tokenization to be highly effective. In half of the languages examined n-grams outperform unnormalized words by more than 25%; in highly infective languages relative improvements over 50% are obtained. In languages with less morphological richness the choice of tokenization is not as critical and rule-based stemming can be an attractive option, if available. We also conducted an experiment to uncover the source of n-gram power and a causal relationship between the morphological complexity of a language and n-gram effectiveness was demonstrated.
Keywords: CLIR, character n-grams, morphology, stemming, tokenization
Web derived pronunciations for spoken term detection BIBAKFull-Text 83-90
  Dogan Can; Erica Cooper; Arnab Ghoshal; Martin Jansche; Sanjeev Khudanpur; Bhuvana Ramabhadran; Michael Riley; Murat Saraclar; Abhinav Sethy; Morgan Ulinski; Christopher White
Indexing and retrieval of speech content in various forms such as broadcast news, customer care data and on-line media has gained a lot of interest for a wide range of applications, from customer analytics to on-line media search. For most retrieval applications, the speech content is typically first converted to a lexical or phonetic representation using automatic speech recognition (ASR). The first step in searching through indexes built on these representations is the generation of pronunciations for named entities and foreign language query terms. This paper summarizes the results of the work conducted during the 2008 JHU Summer Workshop by the Multilingual Spoken Term Detection team, on mining the web for pronunciations and analyzing their impact on spoken term detection. We will first present methods to use the vast amount of pronunciation information available on the Web, in the form of IPA and ad-hoc transcriptions. We describe techniques for extracting candidate pronunciations from Web pages and associating them with orthographic words, filtering out poorly extracted pronunciations, normalizing IPA pronunciations to better conform to a common transcription standard, and generating phonemic representations from ad-hoc transcriptions. We then present an analysis of the effectiveness of using these pronunciations to represent Out-Of-Vocabulary (OOV) query terms on the performance of a spoken term detection (STD) system. We will provide comparisons of Web pronunciations against automated techniques for pronunciation generation as well as pronunciations generated by human experts. Our results cover a range of speech indexes based on lattices, confusion networks and one-best transcriptions at both word and word fragments levels.
Keywords: open vocabulary, pronunciation normalization, spoken term detection, web pronunciation extraction
Combining LVCSR and vocabulary-independent ranked utterance retrieval for robust speech search BIBAKFull-Text 91-98
  J. Scott Olsson; Douglas W. Oard
Well tuned Large-Vocabulary Continuous Speech Recognition (LVCSR) has been shown to generally be more effective than vocabulary-independent techniques for ranked retrieval of spoken content when one or the other approach is used alone. Tuning LVCSR systems to a topic domain can be costly, however, and the experiments in this paper show that Out-Of-Vocabulary (OOV) query terms can significantly reduce retrieval effectiveness when that tuning is not performed. Further experiments demonstrate, however, that retrieval effectiveness for queries with OOV terms can be substantially improved by combining evidence from LVCSR with additional evidence from vocabulary-independent Ranked Utterance Retrieval (RUR). The combination is performed by using relevance judgments from held-out topics to learn generic (i.e., topic-independent), smooth, non-decreasing transformations from LVCSR and RUR system scores to probabilities of topical relevance. Evaluated using a CLEF collection that includes topics, spontaneous conversational speech audio, and relevance judgments, the system recovers 57% of the mean uninterpolated average precision that could have been obtained through LVCSR domain tuning for very short queries (or 41% for longer queries).
Keywords: ranked utterance retrieval, speech retrieval, vocabulary-independent spoken term detection

Retrieval models I

Risky business: modeling and exploiting uncertainty in information retrieval BIBAKFull-Text 99-106
  Jianhan Zhu; Jun Wang; Ingemar J. Cox; Michael J. Taylor
Most retrieval models estimate the relevance of each document to a query and rank the documents accordingly. However, such an approach ignores the uncertainty associated with the estimates of relevancy. If a high estimate of relevancy also has a high uncertainty, then the document may be very relevant or not relevant at all. Another document may have a slightly lower estimate of relevancy but the corresponding uncertainty may be much less. In such a circumstance, should the retrieval engine risk ranking the first document highest, or should it choose a more conservative (safer) strategy that gives preference to the second document? There is no definitive answer to this question, as it depends on the risk preferences of the user and the information retrieval system. In this paper we present a general framework for modeling uncertainty and introduce an asymmetric loss function with a single parameter that can model the level of risk the system is willing to accept. By adjusting the risk preference parameter, our approach can effectively adapt to users' different retrieval strategies.
   We apply this asymmetric loss function to a language modeling framework and a practical risk-aware document scoring function is obtained. Our experiments on several TREC collections show that our "risk-averse" approach significantly improves the Jelinek-Mercer smoothing language model, and a combination of our "risk-averse" approach and the Jelinek-Mercer smoothing method generally outperforms the Dirichlet smoothing method. Experimental results also show that the "risk-averse" approach, even without smoothing from the collection statistics, performs as well as three commonly-adopted retrieval models, namely, the Jelinek-Mercer and Dirichlet smoothing methods, and BM25 model.
Keywords: language models, loss functions, probabilistic retrieval models, probability ranking principle, risk adjustment
Approximating true relevance distribution from a mixture model based on irrelevance data BIBAKFull-Text 107-114
  Peng Zhang; Yuexian Hou; Dawei Song
Pseudo relevance feedback (PRF), which has been widely applied in IR, aims to derive a distribution from the top n pseudo relevant documents D. However, these documents are often a mixture of relevant and irrelevant documents. As a result, the derived distribution is actually a mixture model, which has long been limiting the performance of PRF. This is particularly the case when we deal with difficult queries where the truly relevant documents in D are very sparse. In this situation, it is often easier to identify a small number of seed irrelevant documents, which can form a seed irrelevant distribution. Then, a fundamental and challenging problem arises: solely based on the mixed distribution and a seed irrelevance distribution, how to automatically generate an optimal approximation of the true relevance distribution? In this paper, we propose a novel distribution separation model (DSM) to tackle this problem. Theoretical justifications of the proposed algorithm are given. Evaluation results from our extensive simulated experiments on several large scale TREC data sets demonstrate the effectiveness of our method, which outperforms a well respected PRF Model, the Relevance Model (RM), as well as the use of RM on D with the seed negative documents directly removed.
Keywords: distribution separation model, irrelevant data, pseudo-relevance feedback, true relevance distribution
Portfolio theory of information retrieval BIBAKFull-Text 115-122
  Jun Wang; Jianhan Zhu
This paper studies document ranking under uncertainty. It is tackled in a general situation where the relevance predictions of individual documents have uncertainty, and are dependent between each other. Inspired by the Modern Portfolio Theory, an economic theory dealing with investment in financial markets, we argue that ranking under uncertainty is not just about picking individual relevant documents, but about choosing the right combination of relevant documents. This motivates us to quantify a ranked list of documents on the basis of its expected overall relevance (mean) and its variance; the latter serves as a measure of risk, which was rarely studied for document ranking in the past. Through the analysis of the mean and variance, we show that an optimal rank order is the one that balancing the overall relevance (mean) of the ranked list against its risk level (variance). Based on this principle, we then derive an efficient document ranking algorithm. It generalizes the well-known probability ranking principle (PRP) by considering both the uncertainty of relevance predictions and correlations between retrieved documents. Moreover, the benefit of diversification is mathematically quantified; we show that diversifying documents is an effective way to reduce the risk of document ranking. Experimental results in text retrieval confirm performance.
Keywords: document ranking under uncertainty, mean-variance analysis, modern portfolio theory, risk management, the probability ranking principle

Web 2.0

A statistical comparison of tag and query logs BIBAKFull-Text 123-130
  Mark J. Carman; Mark Baillie; Robert Gwadera; Fabio Crestani
We investigate tag and query logs to see if the terms people use to annotate websites are similar to the ones they use to query for them. Over a set of URLs, we compare the distribution of tags used to annotate each URL with the distribution of query terms for clicks on the same URL. Understanding the relationship between the distributions is important to determine how useful tag data may be for improving search results and conversely, query data for improving tag prediction. In our study, we compare both term frequency distributions using vocabulary overlap and relative entropy. We also test statistically whether the term counts come from the same underlying distribution. Our results indicate that the vocabulary used for tagging and searching for content are similar but not identical. We further investigate the content of the websites to see which of the two distributions (tag or query) is most similar to the content of the annotated/searched URL. Finally, we analyze the similarity for different categories of URLs in our sample to see if the similarity between distributions is dependent on the topic of the website or the popularity of the URL.
Keywords: folksonomy, query log, tag, web search
Simultaneously modeling semantics and structure of threaded discussions: a sparse coding approach and its applications BIBAKFull-Text 131-138
  Chen Lin; Jiang-Ming Yang; Rui Cai; Xin-Jing Wang; Wei Wang
The huge amount of knowledge in web communities has motivated the research interests in threaded discussions. The dynamic nature of threaded discussions poses lots of challenging problems for computer scientists. Although techniques such as semantic models and structural models have been shown to be useful in a number of areas, they are inefficient in understanding threaded discussions due to three reasons: (I) as most of users read existing messages before posting, posts in a discussion thread are temporally dependent on the previous ones; It causes the semantics and structure to be coupled with each other in threaded discussions; (II) in online discussion threads, there are a lot of junk posts which are useless and may disturb content analysis; and (III) it is very hard to judge the quality of a post. In this paper, we propose a sparse coding-based model named SMSS to Simultaneously Model Semantics and Structure of threaded discussions. The model projects each post into a topic space, and approximates each post by a linear combination of previous posts in the same discussion thread. Meanwhile, the model also imposes two sparse constraints to force a sparse post reconstruction in the topic space and a sparse post approximation from previous posts. The sparse properties effectively take into account the characteristics of threaded discussions. Towards the above three problems, we demonstrate the competency of our model in three applications: reconstructing reply structure of threaded discussions, identifying junk posts, and finding experts in a given board/sub-board in web communities. Experimental results show encouraging performance of the proposed SMSS model in all these applications.
Keywords: expert finding, junk identification, reply reconstruction, sparse coding, threaded discussion
Enhancing cluster labeling using wikipedia BIBAKFull-Text 139-146
  David Carmel; Haggai Roitman; Naama Zwerdling
This work investigates cluster labeling enhancement by utilizing Wikipedia, the free on-line encyclopedia. We describe a general framework for cluster labeling that extracts candidate labels from Wikipedia in addition to important terms that are extracted directly from the text. The "labeling quality" of each candidate is then evaluated by several independent judges and the top evaluated candidates are recommended for labeling.
   Our experimental results reveal that the Wikipedia labels agree with manual labels associated by humans to a cluster, much more than with significant terms that are extracted directly from the text. We show that in most cases even when human's associated label appears in the text, pure statistical methods have difficulty in identifying them as good descriptors. Furthermore, our experiments show that for more than 85% of the clusters in our test collection, the manual label (or an inflection, or a synonym of it) appears in the top five labels recommended by our system.
Keywords: cluster labeling, wikipedia


Compressing term positions in web indexes BIBAKFull-Text 147-154
  Hao Yan; Shuai Ding; Torsten Suel
Large search engines process thousands of queries per second on billions of pages, making query processing a major factor in their operating costs. This has led to a lot of research on how to improve query throughput, using techniques such as massive parallelism, caching, early termination, and inverted index compression. We focus on techniques for compressing term positions in web search engine indexes. Most previous work has focused on compressing docID and frequency data, or position information in other types of text collections. Compression of term positions in web pages is complicated by the fact that term occurrences tend to cluster within documents but not across document boundaries, making it harder to exploit clustering effects. Also, typical access patterns for position data are different from those for docID and frequency data. We perform a detailed study of a number of existing and new techniques for compressing position data in web indexes. We also study how to efficiently access position data for ranking functions that take proximity features into account.
Keywords: index compression, inverted index, search engines
Brute force and indexed approaches to pairwise document similarity comparisons with MapReduce BIBAKFull-Text 155-162
  Jimmy Lin
This paper explores the problem of computing pairwise similarity on document collections, focusing on the application of "more like this" queries in the life sciences domain. Three MapReduce algorithms are introduced: one based on brute force, a second where the problem is treated as large-scale ad hoc retrieval, and a third based on the Cartesian product of postings lists. Each algorithm supports one or more approximations that trade effectiveness for efficiency, the characteristics of which are studied experimentally. Results show that the brute force algorithm is the most efficient of the three when exact similarity is desired. However, the other two algorithms support approximations that yield large efficiency gains without significant loss of effectiveness.
Keywords: distributed algorithms, hadoop
Efficiency trade-offs in two-tier web search systems BIBAKFull-Text 163-170
  Ricardo Baeza-Yates; Vanessa Murdock; Claudia Hauff
Search engines rely on searching multiple partitioned corpora to return results to users in a reasonable amount of time. In this paper we analyze the standard two-tier architecture for Web search with the difference that the corpus to be searched for a given query is predicted in advance. We show that any predictor better than random yields time savings, but this decrease in the processing time yields an increase in the infrastructure cost. We provide an analysis and investigate this trade-off in the context of two different scenarios on real-world data. We demonstrate that in general the decrease in answer time is justified by a small increase in infrastructure cost.
Keywords: distributed systems, pre-retrieval prediction, tiered indexing

Question answering

A classification-based approach to question answering in discussion boards BIBAKFull-Text 171-178
  Liangjie Hong; Brian D. Davison
Discussion boards and online forums are important platforms for people to share information. Users post questions or problems onto discussion boards and rely on others to provide possible solutions and such question-related content sometimes even dominates the whole discussion board. However, to retrieve this kind of information automatically and effectively is still a non-trivial task. In addition, the existence of other types of information (e.g., announcements, plans, elaborations, etc.) makes it difficult to assume that every thread in a discussion board is about a question. We consider the problems of identifying question-related threads and their potential answers as classification tasks. Experimental results across multiple datasets demonstrate that our method can significantly improve the performance in both question detection and answer finding subtasks. We also do a careful comparison of how different types of features contribute to the final result and show that non-content features play a key role in improving overall performance. Finally, we show that a ranking scheme based on our classification approach can yield much better performance than prior published methods.
Keywords: classification, discussion boards, online forums, question answering
Ranking community answers by modeling question-answer relationships via analogical reasoning BIBAKFull-Text 179-186
  Xin-Jing Wang; Xudong Tu; Dan Feng; Lei Zhang
The method of finding high-quality answers has significant impact on user satisfaction in community question answering systems. However, due to the lexical gap between questions and answers as well as spam typically existing in user-generated content, filtering and ranking answers is very challenging. Previous solutions mainly focus on generating redundant features, or finding textual clues using machine learning techniques; none of them ever consider questions and their answers as relational data but instead model them as independent information. Moreover, they only consider the answers of the current question, and ignore any previous knowledge that would be helpful to bridge the lexical and semantic gap. We assume that answers are connected to their questions with various types of latent links, i.e. positive indicating high-quality answers, negative links indicating incorrect answers or user-generated spam, and propose an analogical reasoning-based approach which measures the analogy between the new question-answer linkages and those of relevant knowledge which contains only positive links; the candidate answer which has the most analogous link is assumed to be the best answer. We conducted experiments based on 29.8 million Yahoo!Answer question-answer threads and showed the effectiveness of our approach.
Keywords: analogical reasoning, community question answering, probabilistic relational modeling, ranking
A syntactic tree matching approach to finding similar questions in community-based qa services BIBAKFull-Text 187-194
  Kai Wang; Zhaoyan Ming; Tat-Seng Chua
While traditional question answering (QA) systems tailored to the TREC QA task work relatively well for simple questions, they do not suffice to answer real world questions. The community-based QA systems offer this service well, as they contain large archives of such questions where manually crafted answers are directly available. However, finding similar questions in the QA archive is not trivial. In this paper, we propose a new retrieval framework based on syntactic tree structure to tackle the similar question matching problem. We build a ground-truth set from Yahoo! Answers, and experimental results show that our method outperforms traditional bag-of-word or tree kernel based methods by 8.3% in mean average precision. It further achieves up to 50% improvement by incorporating semantic features as well as matching of potential answers. Our model does not rely on training, and it is demonstrated to be robust against grammatical errors as well.
Keywords: Yahoo! answer, question answering, question matching, syntactic structure

Recommenders I

On social networks and collaborative recommendation BIBAKFull-Text 195-202
  Ioannis Konstas; Vassilios Stathopoulos; Joemon M. Jose
Social network systems, like last.fm, play a significant role in Web 2.0, containing large amounts of multimedia-enriched data that are enhanced both by explicit user-provided annotations and implicit aggregated feedback describing the personal preferences of each user. It is also a common tendency for these systems to encourage the creation of virtual networks among their users by allowing them to establish bonds of friendship and thus provide a novel and direct medium for the exchange of data.
   We investigate the role of these additional relationships in developing a track recommendation system. Taking into account both the social annotation and friendships inherent in the social graph established among users, items and tags, we created a collaborative recommendation system that effectively adapts to the personal information needs of each user. We adopt the generic framework of Random Walk with Restarts in order to provide with a more natural and efficient way to represent social networks.
   In this work we collected a representative enough portion of the music social network last.fm, capturing explicitly expressed bonds of friendship of the user as well as social tags. We performed a series of comparison experiments between the Random Walk with Restarts model and a user-based collaborative filtering method using the Pearson Correlation similarity. The results show that the graph model system benefits from the additional information embedded in social knowledge. In addition, the graph model outperforms the standard collaborative filtering method.
Keywords: collaborative filtering, social network, social tagging, web 2.0 IR
Learning to recommend with social trust ensemble BIBAKFull-Text 203-210
  Hao Ma; Irwin King; Michael R. Lyu
As an indispensable technique in the field of Information Filtering, Recommender System has been well studied and developed both in academia and in industry recently. However, most of current recommender systems suffer the following problems: (1) The large-scale and sparse data of the user-item matrix seriously affect the recommendation quality. As a result, most of the recommender systems cannot easily deal with users who have made very few ratings. (2) The traditional recommender systems assume that all the users are independent and identically distributed; this assumption ignores the connections among users, which is not consistent with the real world recommendations. Aiming at modeling recommender systems more accurately and realistically, we propose a novel probabilistic factor analysis framework, which naturally fuses the users' tastes and their trusted friends' favors together. In this framework, we coin the term Social Trust Ensemble to represent the formulation of the social trust restrictions on the recommender systems. The complexity analysis indicates that our approach can be applied to very large datasets since it scales linearly with the number of observations, while the experimental results show that our method performs better than the state-of-the-art approaches.
Keywords: matrix factorization, recommender systems, social network, social trust ensemble
Fast nonparametric matrix factorization for large-scale collaborative filtering BIBAKFull-Text 211-218
  Kai Yu; Shenghuo Zhu; John Lafferty; Yihong Gong
With the sheer growth of online user data, it becomes challenging to develop preference learning algorithms that are sufficiently flexible in modeling but also affordable in computation. In this paper we develop nonparametric matrix factorization methods by allowing the latent factors of two low-rank matrix factorization methods, the singular value decomposition (SVD) and probabilistic principal component analysis (pPCA), to be data-driven, with the dimensionality increasing with data size. We show that the formulations of the two nonparametric models are very similar, and their optimizations share similar procedures. Compared to traditional parametric low-rank methods, nonparametric models are appealing for their flexibility in modeling complex data dependencies. However, this modeling advantage comes at a computational price -- it is highly challenging to scale them to large-scale problems, hampering their application to applications such as collaborative filtering. In this paper we introduce novel optimization algorithms, which are simple to implement, which allow learning both nonparametric matrix factorization models to be highly efficient on large-scale problems. Our experiments on EachMovie and Netflix, the two largest public benchmarks to date, demonstrate that the nonparametric models make more accurate predictions of user ratings, and are computationally comparable or sometimes even faster in training, in comparison with previous state-of-the-art parametric matrix factorization models.
Keywords: collaborative filtering, matrix factorization, nonparametric models

Web Retrieval I

Building enriched document representations using aggregated anchor text BIBAKFull-Text 219-226
  Donald Metzler; Jasmine Novak; Hang Cui; Srihari Reddy
It is well known that anchor text plays a critical role in a variety of search tasks performed over hypertextual domains, including enterprise search, wiki search, and web search. It is common practice to enrich a document's standard textual representation with all of the anchor text associated with its incoming hyperlinks. However, this approach does not help match relevant pages with very few inlinks. In this paper, we propose a method for overcoming anchor text sparsity by enriching document representations with anchor text that has been aggregated across the hyperlink graph. This aggregation mechanism acts to smooth, or diffuse, anchor text within a domain. We rigorously evaluate our proposed approach on a large web search test collection. Our results show the approach significantly improves retrieval effectiveness, especially for longer, more difficult queries.
Keywords: anchor text, link structure, term weighting, web search
Using anchor texts with their hyperlink structure for web search BIBAKFull-Text 227-234
  Zhicheng Dou; Ruihua Song; Jian-Yun Nie; Ji-Rong Wen
As a good complement to page content, anchor texts have been extensively used, and proven to be useful, in commercial search engines. However, anchor texts have been assumed to be independent, whether they come from the same Web site or not. Intuitively, an anchor text from unrelated Web sites should be considered as stronger evidence than that from the same site. This paper proposes two new methods to take into account the possible relationships between anchor texts. We consider two relationships in this paper: links from the same site and links from related sites. The importance assigned to the anchor texts in these two situations is discounted. Experimental results show that these two new models outperform the baseline model which assumes independence between hyperlinks.
Keywords: anchor text, hyperlink structure, web site relationship
Link analysis for private weighted graphs BIBAKFull-Text 235-242
  Jun Sakuma; Shigenobu Kobayashi
Link analysis methods have been used successfully for knowledge discovery from the link structure of mutually linking entities. Existing link analysis methods have been inherently designed based on the fact that the entire link structure of the target graph is observable such as public web documents; however, link information in graphs in the real world, such as human relationship or economic activities, is rarely open to public. If link analysis can be performed using graphs with private links in a privacy-preserving way, it enables us to rank entities connected with private ties, such as people, organizations, or business transactions. In this paper, we present a secure link analysis for graphs with private links by means of cryptographic protocols. Our solutions are designed as privacy-preserving expansions of well-known link analysis methods, PageRank and HITS. The outcomes of our protocols are completely equivalent to those of PageRank and HITS. Furthermore, our protocols theoretically guarantee that the private link information possessed by each node is not revealed to other nodes. We demonstrate the efficiency of our solution by experimental studies, comparing with existing solutions, such as secure function evaluation, decentralized spectral analysis, and privacy-preserving link-analysis.
Keywords: hits, link analysis, pagerank, privacy, ranking

Learning to rank I

Learning to rank for quantity consensus queries BIBAKFull-Text 243-250
  Somnath Banerjee; Soumen Chakrabarti; Ganesh Ramakrishnan
Web search is increasingly exploiting named entities like persons, places, businesses, addresses and dates. Entity ranking is also of current interest at INEX and TREC. Numerical quantities are an important class of entities, especially in queries about prices and features related to products, services and travel. We introduce Quantity Consensus Queries (QCQs), where each answer is a tight quantity interval distilled from evidence of relevance in thousands of snippets. Entity search and factoid question answering have benefited from aggregating evidence from multiple promising snippets, but these do not readily apply to quantities. Here we propose two new algorithms that learn to aggregate information from multiple snippets. We show that typical signals used in entity ranking, like rarity of query words and their lexical proximity to candidate quantities, are very noisy. Our algorithms learn to score and rankquantity intervals directly, combining snippet quantity and snippet text information. We report on experiments using hundreds of QCQs with ground truth taken from TREC QA, Wikipedia Infoboxes, and other sources, leading to tens of thousands of candidate snippets and quantities. Our algorithms yield about 20% better MAP and NDCG compared to the best-known collective rankers, and are 35% better than scoring snippets independent of each other.
Keywords: aggregating evidence from snippets, learning to rank, quantity search
Learning in a pairwise term-term proximity framework for information retrieval BIBAKFull-Text 251-258
  Ronan Cummins; Colm O'Riordan
Traditional ad hoc retrieval models do not take into account the closeness or proximity of terms. Document scores in these models are primarily based on the occurrences or non-occurrences of query-terms considered independently of each other. Intuitively, documents in which query-terms occur closer together should be ranked higher than documents in which the query-terms appear far apart.
   This paper outlines several term-term proximity measures and develops an intuitive framework in which they can be used to fully model the proximity of all query-terms for a particular topic. As useful proximity functions may be constructed from many proximity measures, we use a learning approach to combine proximity measures to develop a useful proximity function in the framework. An evaluation of the best proximity functions show that there is a significant improvement over the baseline ad hoc retrieval model and over other more recent methods that employ the use of single proximity measures.
Keywords: information retrieval, learning to rank, proximity
Robust sparse rank learning for non-smooth ranking measures BIBAKFull-Text 259-266
  Zhengya Sun; Tao Qin; Qing Tao; Jue Wang
Recently increasing attention has been focused on directly optimizing ranking measures and inducing sparsity in learning models. However, few attempts have been made to relate them together in approaching the problem of learning to rank. In this paper, we consider the sparse algorithms to directly optimize the Normalized Discounted Cumulative Gain (NDCG) which is a widely-used ranking measure. We begin by establishing a reduction framework under which we reduce ranking, as measured by NDCG, to the importance weighted pairwise classification. Furthermore, we provide a sound theoretical guarantee for this reduction, bounding the realized NDCG regret in terms of a properly weighted pairwise classification regret, which implies that good performance can be robustly transferred from pairwise classification to ranking. Based on the converted pairwise loss function, it is conceivable to take into account sparsity in ranking models and to come up with a gradient possessing certain performance guarantee. For the sake of achieving sparsity, a novel algorithm named RSRank has also been devised, which performs L1 regularization using truncated gradient descent. Finally, experimental results on benchmark collection confirm the significant advantage of RSRank in comparison with several baseline methods.
Keywords: information retrieval, learn to rank, rsrank, truncated gradient

Information extraction

Named entity recognition in query BIBAKFull-Text 267-274
  Jiafeng Guo; Gu Xu; Xueqi Cheng; Hang Li
This paper addresses the problem of Named Entity Recognition in Query (NERQ), which involves detection of the named entity in a given query and classification of the named entity into predefined classes. NERQ is potentially useful in many applications in web search. The paper proposes taking a probabilistic approach to the task using query log data and Latent Dirichlet Allocation. We consider contexts of a named entity (i.e., the remainders of the named entity in queries) as words of a document, and classes of the named entity as topics. The topic model is constructed by a novel and general learning method referred to as WS-LDA (Weakly Supervised Latent Dirichlet Allocation), which employs weakly supervised learning (rather than unsupervised learning) using partially labeled seed entities. Experimental results show that the proposed method based on WS-LDA can accurately perform NERQ, and outperform the baseline methods.
Keywords: named entity recognition, topic model
A 2-poisson model for probabilistic coreference of named entities for improved text retrieval BIBAKFull-Text 275-282
  Seung-Hoon Na; Hwee Tou Ng
Text retrieval queries frequently contain named entities. The standard approach of term frequency weighting does not work well when estimating the term frequency of a named entity, since anaphoric expressions (like he, she, the movie, etc) are frequently used to refer to named entities in a document, and the use of anaphoric expressions causes the term frequency of named entities to be underestimated. In this paper, we propose a novel 2-Poisson model to estimate the frequency of anaphoric expressions of a named entity, without explicitly resolving the anaphoric expressions. Our key assumption is that the frequency of anaphoric expressions is distributed over named entities in a document according to the probabilities of whether the document is elite for the named entities. This assumption leads us to formulate our proposed Co-referentially Enhanced Entity Frequency (CEEF). Experimental results on the text collection of TREC Blog Track show that CEEF achieves significant and consistent improvements over state-of-the-art retrieval methods using standard term frequency estimation. In particular, we achieve a 3% increase of MAP over the best performing run of TREC 2008 Blog Track.
Keywords: co-reference resolution, entity retrieval, term frequency
Mining employment market via text block detection and adaptive cross-domain information extraction BIBAKFull-Text 283-290
  Tak-Lam Wong; Wai Lam; Bo Chen
We have developed an approach for analyzing online job advertisements in different domains (industries) from different regions worldwide. Our approach is able to extract precise information from the text content supporting useful employment market analysis locally and globally. A major component in our approach is an information extraction framework which is composed of two challenging tasks. The first task is to detect unformatted text blocks automatically based on an unsupervised learning model. Identifying these useful text blocks through this learning model allows the generation of highly effective features for the next task which is text fragment extraction learning. The task of text fragment extraction learning is formulated as a domain adaptation model for text fragment classification. One advantage of our approach is that it can easily adapt to a large number of online job advertisements in different and new domains. Extensive experiments have been conducted to demonstrate the effectiveness and flexibility of our approach.
Keywords: information extraction, text block detection

Retrieval models II

A proximity language model for information retrieval BIBAKFull-Text 291-298
  Jinglei Zhao; Yeogirl Yun
The proximity of query terms in a document is a very important information to enable ranking models go beyond the "bag of word" assumption in information retrieval. This paper studies the integration of term proximity information into the unigram language modeling. A new proximity language model (PLM) is proposed which views query terms' proximity centrality as the Dirichlet hyper-parameter that weights the parameters of the unigram document language model. Several forms of proximity measure are developed to be used in PLM which could compute a query term's proximate centrality in a specific document. In experiments, the proximity language model is compared with the basic language model and previous works that combine the proximity information with language model using linear score combination. The experiment results show that the proposed model performs better in both top precision and average precision.
Keywords: information retrieval, proximity language model, proximity model
Positional language models for information retrieval BIBAKFull-Text 299-306
  Yuanhua Lv; ChengXiang Zhai
Although many variants of language models have been proposed for information retrieval, there are two related retrieval heuristics remaining "external" to the language modeling approach: (1) proximity heuristic which rewards a document where the matched query terms occur close to each other; (2) passage retrieval which scores a document mainly based on the best matching passage. Existing studies have only attempted to use a standard language model as a "black box" to implement these heuristics, making it hard to optimize the combination parameters.
   In this paper, we propose a novel positional language model (PLM) which implements both heuristics in a unified language model. The key idea is to define a language model for each position of a document, and score a document based on the scores of its PLMs. The PLM is estimated based on propagated counts of words within a document through a proximity-based density function, which both captures proximity heuristics and achieves an effect of "soft" passage retrieval. We propose and study several representative density functions and several different PLM-based document ranking strategies. Experiment results on standard TREC test collections show that the PLM is effective for passage retrieval and performs better than a state-of-the-art proximity-based retrieval model.
Keywords: passage retrieval, positional language models, proximity
A Bayesian learning approach to promoting diversity in ranking for biomedical information retrieval BIBAKFull-Text 307-314
  Xiangji Huang; Qinmin Hu
In this paper, we propose a Bayesian learning approach to promoting diversity for information retrieval in biomedicine and a re-ranking model to improve retrieval performance in the biomedical domain. First, the re-ranking model computes the maximum posterior probability of the hidden property corresponding to each retrieved passage. Then it iteratively groups the passages into subsets according to their properties. Finally, these passages are re-ranked from the subsets as our output. There is no need for our proposed method to use any external biomedical resource. We evaluate our Bayesian learning approach by conducting extensive experiments on the TREC 2004-2007 Genomics data sets. The experimental results show the effectiveness of the proposed Bayesian learning approach for promoting diversity in ranking for biomedical information retrieval on four years TREC data sets.
Keywords: bayesian learning, biomedical IR, promoting diversity

Vertical search

Sources of evidence for vertical selection BIBAKFull-Text 315-322
  Jaime Arguello; Fernando Diaz; Jamie Callan; Jean-Francois Crespo
Web search providers often include search services for domain-specific subcollections, called verticals, such as news, images, videos, job postings, company summaries, and artist profiles. We address the problem of vertical selection, predicting relevant verticals (if any) for queries issued to the search engine's main web search page. In contrast to prior query classification and resource selection tasks, vertical selection is associated with unique resources that can inform the classification decision. We focus on three sources of evidence: (1) the query string, from which features are derived independent of external resources, (2) logs of queries previously issued directly to the vertical, and (3) corpora representative of vertical content. We focus on 18 different verticals, which differ in terms of semantics, media type, size, and level of query traffic. We compare our method to prior work in federated search and retrieval effectiveness prediction. An in-depth error analysis reveals unique challenges across different verticals and provides insight into vertical selection for future work.
Keywords: aggregated search, distributed information retrieval, query classification, resource selection, vertical selection
Adaptation of offline vertical selection predictions in the presence of user feedback BIBAKFull-Text 323-330
  Fernando Diaz; Jaime Arguello
Web search results often integrate content from specialized corpora known as verticals. Given a query, one important aspect of aggregated search is the selection of relevant verticals from a set of candidate verticals. One drawback to previous approaches to vertical selection is that methods have not explicitly modeled user feedback. However, production search systems often record a variety of feedback information. In this paper, we present algorithms for vertical selection which adapt to user feedback. We evaluate algorithms using a novel simulator which models performance of a vertical selector situated in realistic query traffic.
Keywords: aggregated search, distributed information retrieval, resource selection, simulation, user feedback, vertical selection
A probabilistic topic-based ranking framework for location-sensitive domain information retrieval BIBAKFull-Text 331-338
  Huajing Li; Zhisheng Li; Wang-Chien Lee; Dik Lun Lee
It has been observed that many queries submitted to search engines are location-sensitive. Traditional search techniques fail to interpret the significance of such geographical clues and as such are unable to return highly relevant search results. Although there have been efforts in the literature to support location-aware information retrieval, critical challenges still remain in terms of search result quality and data scalability. In this paper, we propose an innovative probabilistic ranking framework for domain information retrieval where users are interested in a set of location-sensitive topics. Our proposed method recognizes the geographical distribution of topic influence in the process of ranking documents and models it accurately using probabilistic Gaussian Process classifiers. Additionally, we demonstrate the effectiveness of the proposed ranking framework by implementing it in a Web search service for NBA news. Extensive performance evaluation is performed on real Web document collections, which confirms that our proposed mechanism works significantly better (around 29.7% averagely using DCG20 measure) than other popular location-aware information retrieval techniques in ranking quality.
Keywords: gaussian process, ranking, spatial search, topic model

Clickthrough models

Entropy-biased models for query representation on the click graph BIBAKFull-Text 339-346
  Hongbo Deng; Irwin King; Michael R. Lyu
Query log analysis has received substantial attention in recent years, in which the click graph is an important technique for describing the relationship between queries and URLs. State-of-the-art approaches based on the raw click frequencies for modeling the click graph, however, are not noise-eliminated. Nor do they handle heterogeneous query-URL pairs well. In this paper, we investigate and develop a novel entropy-biased framework for modeling click graphs. The intuition behind this model is that various query-URL pairs should be treated differently, i.e., common clicks on less frequent but more specific URLs are of greater value than common clicks on frequent and general URLs. Based on this intuition, we utilize the entropy information of the URLs and introduce a new concept, namely the inverse query frequency (IQF), to weigh the importance (discriminative ability) of a click on a certain URL. The IQF weighting scheme is never explicitly explored or statistically examined for any bipartite graphs in the information retrieval literature. We not only formally define and quantify this scheme, but also incorporate it with the click frequency and user frequency information on the click graph for an effective query representation. To illustrate our methodology, we conduct experiments with the AOL query log data for query similarity analysis and query suggestion tasks. Experimental results demonstrate that considerable improvements in performance are obtained with our entropy-biased models. Moreover, our method can also be applied to other bipartite graphs.
Keywords: click frequency, click graph, entropy-biased model, inverse query frequency, user frequency
Click-through prediction for news queries BIBAKFull-Text 347-354
  Arnd Christian König; Michael Gamon; Qiang Wu
A growing trend in commercial search engines is the display of specialized content such as news, products, etc. interleaved with web search results. Ideally, this content should be displayed only when it is highly relevant to the search query, as it competes for space with "regular" results and advertisements. One measure of the relevance to the search query is the click-through rate the specialized content achieves when displayed; hence, if we can predict this click-through rate accurately, we can use this as the basis for selecting when to show specialized content. In this paper, we consider the problem of estimating the click-through rate for dedicated news search results. For queries for which news results have been displayed repeatedly before, the click-through rate can be tracked online; however, the key challenge for which previously unseen queries to display news results remains. In this paper we propose a supervised model that offers accurate prediction of news click-through rates and satisfies the requirement of adapting quickly to emerging news events.
Keywords: CTR prediction, click-through rates, news search
Smoothing clickthrough data for web search ranking BIBAKFull-Text 355-362
  Jianfeng Gao; Wei Yuan; Xiao Li; Kefeng Deng; Jian-Yun Nie
Incorporating features extracted from clickthrough data (called clickthrough features) has been demonstrated to significantly improve the performance of ranking models for Web search applications. Such benefits, however, are severely limited by the data sparseness problem, i.e., many queries and documents have no or very few clicks. The ranker thus cannot rely strongly on clickthrough features for document ranking. This paper presents two smoothing methods to expand clickthrough data: query clustering via Random Walk on click graphs and a discounting method inspired by the Good-Turing estimator. Both methods are evaluated on real-world data in three Web search domains. Experimental results show that the ranking models trained on smoothed clickthrough features consistently outperform those trained on unsmoothed features. This study demonstrates both the importance and the benefits of dealing with the sparseness problem in clickthrough data.
Keywords: clickthrough data, discounting, learning to rank, random walk, smoothing, web search

Interactive search

Predicting user interests from contextual information BIBAKFull-Text 363-370
  Ryen W. White; Peter Bailey; Liwei Chen
Search and recommendation systems must include contextual information to effectively model users' interests. In this paper, we present a systematic study of the effectiveness of five variant sources of contextual information for user interest modeling. Post-query navigation and general browsing behaviors far outweigh direct search engine interaction as an information-gathering activity. Therefore we conducted this study with a focus on Website recommendations rather than search results. The five contextual information sources used are: social, historic, task, collection, and user interaction. We evaluate the utility of these sources, and overlaps between them, based on how effectively they predict users' future interests. Our findings demonstrate that the sources perform differently depending on the duration of the time window used for future prediction, and that context overlap outperforms any isolated source. Designers of Website suggestion systems can use our findings to provide improved support for post-query navigation and general browsing behaviors.
Keywords: context, user interest modeling, website recommendation
A comparison of query and term suggestion features for interactive searching BIBAKFull-Text 371-378
  Diane Kelly; Karl Gyllstrom; Earl W. Bailey
Query formulation is one of the most difficult and important aspects of information seeking and retrieval. Two techniques, term relevance feedback and query suggestion, provide methods to help users formulate queries, but each is limited in different ways. In this research we combine these two techniques by automatically creating query suggestions using term relevance feedback techniques. To evaluate our approach, we conducted an interactive information retrieval study with 55 subjects and 20 topics. Each subject completed four topics, half with a term suggestion system and half with a query suggestion system. We also investigated the source of the suggestions: approximately half of all subjects were provided with system-generated suggestions, while half were provided with user-generated suggestions. Results show that subjects used more query suggestions than term suggestions and saved more documents with these suggestions, even though there were no significant differences in performance. Subjects preferred the query suggestion system and rated it higher along a number of dimensions including its ability to help them think of new approaches to searching. Qualitative data provided insight into subjects' usage and ratings, and indicated that subjects often used the suggestions even when they did not click on them.
Keywords: interactive searching, query suggestion, relevance feedback, term suggestion
An aspectual interface for supporting complex search tasks BIBAKFull-Text 379-386
  Robert Villa; Iván Cantador; Hideo Joho; Joemon M. Jose
With the increasing importance of search systems on the web, there is a continuing push to design interfaces which are a better match with the kinds of real-world tasks in which users are engaged. In this paper, we consider how broad, complex search tasks may be supported via the search interface. In particular, we consider search tasks which may be composed of multiple aspects, or multiple related subtasks. For example, in decision making tasks the user may investigate multiple possible solutions before settling on a single, final solution, while other tasks, such as report writing, may involve searching on multiple interrelated topics.
   A search interface is presented which is designed to support such broad search tasks, allowing a user to create search aspects, each of which models an independent subtask of some larger task. The interface is built on the intuition that users should be able to structure their searching environment when engaged on complex search tasks, where the act of structuring and organization may aid the user in understanding his or her task. A user study was carried out which compared our aspectual interface to a standard web-search interface. The results suggest that an aspectual interface can aid users when engaged in broad search tasks where the search aspects must be identified during searching; for a task where search aspects were pre-defined, no advantage over the baseline was found. Results for a decision making task were less clear cut, but show some evidence for improved task performance.
Keywords: aspectual search, complex search tasks, exploratory search

Multimedia I (music and video)

Combining audio content and social context for semantic music discovery BIBAKFull-Text 387-394
  Douglas R. Turnbull; Luke Barrington; Gert Lanckriet; Mehrdad Yazdani
When attempting to annotate music, it is important to consider both acoustic content and social context. This paper explores techniques for collecting and combining multiple sources of such information for the purpose of building a query-by-text music retrieval system. We consider two representations of the acoustic content (related to timbre and harmony) and two social sources (social tags and web documents). We then compare three algorithms that combine these information sources: calibrated score averaging (CSA), RankBoost, and kernel combination support vector machines (KC-SVM). We demonstrate empirically that each of these algorithms is superior to algorithms that use individual information sources.
Keywords: calibrated score averaging, combining data sources, kernel combination svm, music ir, rankboost
Automatic video tagging using content redundancy BIBAKFull-Text 395-402
  Stefan Siersdorfer; Jose San Pedro; Mark Sanderson
The analysis of the leading social video sharing platform YouTube reveals a high amount of redundancy, in the form of videos with overlapping or duplicated content. In this paper, we show that this redundancy can provide useful information about connections between videos. We reveal these links using robust content-based video analysis techniques and exploit them for generating new tag assignments. To this end, we propose different tag propagation methods for automatically obtaining richer video annotations. Our techniques provide the user with additional information about videos, and lead to enhanced feature representations for applications such as automatic data organization and search. Experiments on video clustering and classification as well as a user evaluation demonstrate the viability of our approach.
Keywords: automatic tagging, content-based links, data organization, neighbor-based tagging, tag propagation, video duplicates
CompositeMap: a novel framework for music similarity measure BIBAKFull-Text 403-410
  Bingjun Zhang; Jialie Shen; Qiaoliang Xiang; Ye Wang
With the continuing advances in data storage and communication technology, there has been an explosive growth of music information from different application domains. As an effective technique for organizing, browsing, and searching large data collections, music information retrieval is attracting more and more attention. How to measure and model the similarity between different music items is one of the most fundamental yet challenging research problems. In this paper, we introduce a novel framework based on a multimodal and adaptive similarity measure for various applications. Distinguished from previous approaches, our system can effectively combine music properties from different aspects into a compact signature via supervised learning. In addition, an incremental Locality Sensitive Hashing algorithm has been developed to support efficient retrieval processes with different kinds of queries. Experimental results based on two large music collections reveal various advantages of the proposed framework including effectiveness, efficiency, adaptiveness, and scalability.
Keywords: browsing, music, personalization, recommendation, search, similarity measure

Federated, distributed search

Quantifying performance and quality gains in distributed web search engines BIBAKFull-Text 411-418
  B. Barla Cambazoglu; Vassilis Plachouras; Ricardo Baeza-Yates
Distributed search engines based on geographical partitioning of a central Web index emerge as a feasible solution to the immense growth of the Web, user bases, and query traffic. However, there is still lack of research in quantifying the performance and quality gains that can be achieved by such architectures. In this paper, we develop various cost models to evaluate the performance benefits of a geographically distributed search engine architecture based on partial index replication and query forwarding. Specifically, we focus on possible performance gains due to the distributed nature of query processing and Web crawling processes. We show that any response time gain achieved by distributed query processing can be utilized to improve search relevance as the use of complex but more accurate algorithms can now be enabled for document ranking. We also show that distributed Web crawling leads to better Web coverage and try to see if this improves the search quality. We verify the validity of our claims over large, real-life datasets via simulations.
Keywords: data centers, distributed query processing, index partitioning, web crawling
SUSHI: scoring scaled samples for server selection BIBAKFull-Text 419-426
  Paul Thomas; Milad Shokouhi
Modern techniques for distributed information retrieval use a set of documents sampled from each server, but these samples have been underutilised in server selection. We describe a new server selection algorithm, SUSHI, which unlike earlier algorithms can make full use of the text of each sampled document and which does not need training data. SUSHI can directly optimise for many common cases, including high precision retrieval, and by including a simple stopping condition can do so while reducing network traffic.
   Our experiments compare SUSHI with alternatives and show it achieves the same effectiveness as the best current methods while being substantially more efficient, selecting as few as 20% as many servers.
Keywords: document samples
Effective query expansion for federated search BIBAKFull-Text 427-434
  Milad Shokouhi; Leif Azzopardi; Paul Thomas
While query expansion techniques have been shown to improve retrieval performance in a centralized setting, they have not been well studied in a federated setting. In this paper, we consider how query expansion may be adapted to federated environments and propose several new methods: where focused expansions are used in a selective fashion to produce specific queries for each source (or a set of sources). On a number of different testbeds, we show that focused query expansion can significantly outperform the previously proposed global expansion method, and -- contrary to earlier work -- show that query expansion can improve performance over standard federated retrieval.
   These findings motivate further research examining the different methods for query expansion, and other forms of system and user interaction, in order to continue improving the performance of interactive federated search systems.
Keywords: distributed information retrieval, query expansion


From networks to human behavior BIBAKFull-Text 435
  Albert-László Barabási
Highly interconnected networks with amazingly complex topology describe systems as diverse as the World Wide Web, our cells, social systems or the economy. Recent studies indicate that these networks are the result of self-organizing processes governed by simple but generic laws, resulting in architectural features that makes them much more similar to each other than one would have expected by chance. I will discuss the amazing order characterizing our interconnected world and its implications to network robustness and spreading processes. Finally, most of these networks are driven by the temporal patterns characterizing human activity. I will use communication and web browsing data to show that there is deep order in the temporal domain of human dynamics, and discuss the different ways to understand and model the emerging patterns.
Keywords: network science, world wide web

Evaluation and measurement I

On rank correlation and the distance between rankings BIBAKFull-Text 436-443
  Ben Carterette
Rank correlation statistics are useful for determining whether a there is a correspondence between two measurements, particularly when the measures themselves are of less interest than their relative ordering. Kendall's -- in particular has found use in Information Retrieval as a "meta-evaluation" measure: it has been used to compare evaluation measures, evaluate system rankings, and evaluate predicted performance. In the meta-evaluation domain, however, correlations between systems confound relationships between measurements, practically guaranteeing a positive and significant estimate of -- regardless of any actual correlation between the measurements. We introduce an alternative measure of distance between rankings that corrects this by explicitly accounting for correlations between systems over a sample of topics, and moreover has a probabilistic interpretation for use in a test of statistical significance. We validate our measure with theory, simulated data, and experiment.
Keywords: distance measures, evaluation, information retrieval, rank correlation
Score adjustment for correction of pooling bias BIBAKFull-Text 444-451
  William Webber; Laurence A. F. Park
Information retrieval systems are evaluated against test collections of topics, documents, and assessments of which documents are relevant to which topics. Documents are chosen for relevance assessment by pooling runs from a set of existing systems. New systems can return unassessed documents, leading to an evaluation bias against them. In this paper, we propose to estimate the degree of bias against an unpooled system, and to adjust the system's score accordingly. Bias estimation can be done via leave-one-out experiments on the existing, pooled systems, but this requires the problematic assumption that the new system is similar to the existing ones. Instead, we propose that all systems, new and pooled, be fully assessed against a common set of topics, and the bias observed against the new system on the common topics be used to adjust scores on the existing topics. We demonstrate using resampling experiments on TREC test sets that our method leads to a marked reduction in error, even with only a relatively small number of common topics, and that the error decreases as the number of topics increases.
Keywords: evaluation, retrieval experiment, system measurement
Towards methods for the collective gathering and quality control of relevance assessments BIBAKFull-Text 452-459
  Gabriella Kazai; Natasa Milic-Frayling; Jamie Costello
Growing interest in online collections of digital books and video content motivates the development and optimization of adequate retrieval systems. However, traditional methods for collecting relevance assessments to tune system performance are challenged by the nature of digital items in such collections, where assessors are faced with a considerable effort to review and assess content by extensive reading, browsing, and within-document searching. The extra strain is caused by the length and cohesion of the digital item and the dispersion of topics within it. We propose a method for the collective gathering of relevance assessments using a social game model to instigate participants' engagement. The game provides incentives for assessors to follow a predefined review procedure and makes provisions for the quality control of the collected relevance judgments. We discuss the approach in detail, and present the results of a pilot study conducted on a book corpus to validate the approach. Our analysis reveals intricate relationships between the affordances of the system, the incentives of the social game, and the behavior of the assessors. We show that the proposed game design achieves two designated goals: the incentive structure motivates endurance in assessors and the review process encourages truthful assessment.
Keywords: relevance assessments, social game, test collection construction

Learning to rank II

On the local optimality of LambdaRank BIBAKFull-Text 460-467
  Pinar Donmez; Krysta M. Svore; Christopher J. C. Burges
A machine learning approach to learning to rank trains a model to optimize a target evaluation measure with respect to training data. Currently, existing information retrieval measures are impossible to optimize directly except for models with a very small number of parameters. The IR community thus faces a major challenge: how to optimize IR measures of interest directly. In this paper, we present a solution. Specifically, we show that LambdaRank, which smoothly approximates the gradient of the target measure, can be adapted to work with four popular IR target evaluation measures using the same underlying gradient construction. It is likely, therefore, that this construction is extendable to other evaluation measures. We empirically show that LambdaRank finds a locally optimal solution for mean NDCG@10, mean NDCG, MAP and MRR with a 99% confidence rate. We also show that the amount of effective training data varies with IR measure and that with a sufficiently large training set size, matching the training optimization measure to the target evaluation measure yields the best accuracy.
Keywords: learning to rank, web search
Document selection methodologies for efficient and effective learning-to-rank BIBAKFull-Text 468-475
  Javed A. Aslam; Evangelos Kanoulas; Virgil Pavlu; Stefan Savev; Emine Yilmaz
Learning-to-rank has attracted great attention in the IR community. Much thought and research has been placed on query-document feature extraction and development of sophisticated learning-to-rank algorithms. However, relatively little research has been conducted on selecting documents for learning-to-rank data sets nor on the effect of these choices on the efficiency and effectiveness of learning-to-rank algorithms.
   In this paper, we employ a number of document selection methodologies, widely used in the context of evaluation-depth-k pooling, sampling (infAP, statAP), active-learning (MTC), and on-line heuristics (hedge). Certain methodologies, e.g. sampling and active-learning, have been shown to lead to efficient and effective evaluation. We investigate whether they can also enable efficient and effective learning-to-rank. We compare them with the document selection methodology used to create the LETOR datasets.
   Further, all of the utilized methodologies are different in nature, and thus they construct training data sets with different properties, such as the proportion of relevant documents in the data or the similarity among them. We study how such properties affect the efficiency, effectiveness, and robustness of learning-to-rank collections.
Keywords: document selection methodologies, evaluation, learning-to-rank, sampling
An improved markov random field model for supporting verbose queries BIBAKFull-Text 476-483
  Matthew Lease
Recent work in supervised learning of term-based retrieval models has shown significantly improved accuracy can often be achieved via better model estimation. In this paper, we show retrieval accuracy with Metzler and Croft's Markov random field (MRF) approach can be similarly improved via supervised learning. While the original MRF method estimates a parameter for each of its three feature classes from data, parameters within each class are set via a uniform weighting scheme adopted from the standard unigram. We conjecture greater MRF retrieval accuracy should be possible by better estimating within-class parameters, particularly for verbose queries employing natural language terms. Retrieval experiments with these queries on three TREC document collections show our improved MRF consistently out-performs both the original MRF and supervised unigram baselines. Additional experiments using blind-feedback and evaluation with optimal weighting demonstrate both the immediate value and further potential of our method.
Keywords: information retrieval, markov random field, supervised learning

Multimedia II (images and tags)

Placing flickr photos on a map BIBAKFull-Text 484-491
  Pavel Serdyukov; Vanessa Murdock; Roelof van Zwol
In this paper we investigate generic methods for placing photos uploaded to Flickr on the World map. As primary input for our methods we use the textual annotations provided by the users to predict the single most probable location where the image was taken. Central to our approach is a language model based entirely on the annotations provided by users. We define extensions to improve over the language model using tag-based smoothing and cell-based smoothing, and leveraging spatial ambiguity. Further we demonstrate how to incorporate GeoNames (http://www.geonames.org visited May 2009), a large external database of locations. For varying levels of granularity, we are able to place images on a map with at least twice the precision of the state-of-the-art reported in the literature.
Keywords: flickr, image localization, language models
An automatic translation of tags for multimedia contents using folksonomy networks BIBAKFull-Text 492-499
  Tae-Gil Noh; Seong-Bae Park; Hee-Geun Yoon; Sang-Jo Lee; Se-Young Park
This paper proposes a novel method to translate tags attached to multimedia contents for cross-language retrieval. The main issue in this problem is the sense disambiguation of tags given with few textual contexts. In order to solve this problem, the proposed method represents both tags and its translation candidates as networks of co-occurring tags since a network allows richer expression of contexts than other expressions such as co-occurrence vectors. The method translates a tag by selecting the optimal one from possible candidates based on a network similarity even when neither the textual contexts nor sophisticated language resources are available. The experiments on the MIR Flickr-2008 test set show that the proposed method achieves 90.44% accuracy in translating tags from English into German, which is significantly higher than the baseline methods of a frequency based translation and a co-occurrence-based translation.
Keywords: cross-language ir, image ir, meta-data, social tagging
CrowdReranking: exploring multiple search engines for visual search reranking BIBAKFull-Text 500-507
  Yuan Liu; Tao Mei; Xian-Sheng Hua
Most existing approaches to visual search reranking predominantly focus on mining information within the initial search results. However, the initial ranked list cannot provide enough cues for reranking by itself due to the typically unsatisfying visual search performance. This paper presents a new method for visual search reranking called CrowdReranking, which is characterized by mining relevant visual patterns from image search results of multiple search engines which are available on the Internet. Observing that different search engines might have different data sources for indexing and methods for ranking, it is reasonable to assume that there exist different search results yet certain common visual patterns relevant to a given query among those results. We first construct a set of visual words based on the local image patches collected from multiple image search engines. We then explicitly detect two kinds of visual patterns, i.e., salient and concurrent patterns, among the visual words. Theoretically, we formalize reranking as an optimization problem on the basis of the mined visual patterns and propose a close-form solution. Empirically, we conduct extensive experiments on several real-world search engines and one benchmark dataset, and show that the proposed CrowdReranking is superior to the state-of-the-art works.
Keywords: data mining, search reranking, visual search

Evaluation and measurement II

Including summaries in system evaluation BIBAKFull-Text 508-515
  Andrew Turpin; Falk Scholer; Kalvero Jarvelin; Mingfang Wu; J. Shane Culpepper
In batch evaluation of retrieval systems, performance is calculated based on predetermined relevance judgements applied to a list of documents returned by the system for a query. This evaluation paradigm, however, ignores the current standard operation of search systems which require the user to view summaries of documents prior to reading the documents themselves.
   In this paper we modify the popular IR metrics MAP and P@10 to incorporate the summary reading step of the search process, and study the effects on system rankings using TREC data. Based on a user study, we establish likely disagreements between relevance judgements of summaries and of documents, and use these values to seed simulations of summary relevance in the TREC data. Re-evaluating the runs submitted to the TREC Web Track, we find the average correlation between system rankings and the original TREC rankings is 0.8 (Kendall τ), which is lower than commonly accepted for system orderings to be considered equivalent. The system that has the highest MAP in TREC generally remains amongst the highest MAP systems when summaries are taken into account, but other systems become equivalent to the top ranked system depending on the simulated summary relevance.
   Given that system orderings alter when summaries are taken into account, the small amount of effort required to judge summaries in addition to documents (19 seconds vs 88 seconds on average in our data) should be undertaken when constructing test collections.
Keywords: ir evaluation, summaries, trec
When more is less: the paradox of choice in search engine use BIBAKFull-Text 516-523
  Antti Oulasvirta; Janne P. Hukkinen; Barry Schwartz
In numerous everyday domains, it has been demonstrated that increasing the number of options beyond a handful can lead to paralysis and poor choice and decrease satisfaction with the choice. Were this so-called paradox of choice to hold in search engine use, it would mean that increasing recall can actually work counter to user satisfaction if it implies choice from a more extensive set of result items. The existence of this effect was demonstrated in an experiment where users (N=24) were shown a search scenario and a query and were required to choose the best result item within 30 seconds. Having to choose from six results yielded both higher subjective satisfaction with the choice and greater confidence in its correctness than when there were 24 items on the results page. We discuss this finding in the wider context of "choice architecture" -- that is, how result presentation affects choice and satisfaction.
Keywords: relevance judgments, satisfaction, search engines, user interfaces
Where to stop reading a ranked list?: threshold optimization using truncated score distributions BIBAKFull-Text 524-531
  Avi Arampatzis; Jaap Kamps; Stephen Robertson
Ranked retrieval has a particular disadvantage in comparison with traditional Boolean retrieval: there is no clear cut-off point where to stop consulting results. This is a serious problem in some setups. We investigate and further develop methods to select the rank cut-off value which optimizes a given effectiveness measure. Assuming no other input than a system's output for a query -- document scores and their distribution -- the task is essentially a score-distributional threshold optimization problem. The recent trend in modeling score distributions is to use a normal-exponential mixture: normal for relevant, and exponential for non-relevant document scores. We discuss the two main theoretical problems with the current model, support incompatibility and non-convexity, and develop new models that address them. The main contributions of the paper are two truncated normal-exponential models, varying in the way the out-truncated score ranges are handled. We conduct a range of experiments using the TREC 2007 and 2008 Legal Track data, and show that the truncated models lead to significantly better results.
Keywords: distributed retrieval, effectiveness measure optimization, expectation maximization, filtering, fusion, meta-search, probability of relevance, score distribution, score normalization, threshold optimization, truncated distribution

Recommenders II

The wisdom of the few: a collaborative filtering approach based on expert opinions from the web BIBAKFull-Text 532-539
  Xavier Amatriain; Neal Lathia; Josep M. Pujol; Haewoon Kwak; Nuria Oliver
Nearest-neighbor collaborative filtering provides a successful means of generating recommendations for web users. However, this approach suffers from several shortcomings, including data sparsity and noise, the cold-start problem, and scalability. In this work, we present a novel method for recommending items to users based on expert opinions. Our method is a variation of traditional collaborative filtering: rather than applying a nearest neighbor algorithm to the user-rating data, predictions are computed using a set of expert neighbors from an independent dataset, whose opinions are weighted according to their similarity to the user. This method promises to address some of the weaknesses in traditional collaborative filtering, while maintaining comparable accuracy. We validate our approach by predicting a subset of the Netflix data set. We use ratings crawled from a web portal of expert reviews, measuring results both in terms of prediction accuracy and recommendation list precision. Finally, we explore the ability of our method to generate useful recommendations, by reporting the results of a user-study where users prefer the recommendations generated by our approach.
Keywords: collaborative filtering, cosine similarity, experts, nearest neighbors, recommender system, top-n recommendations
Personalized tag recommendation using graph-based ranking on multi-type interrelated objects BIBAKFull-Text 540-547
  Ziyu Guan; Jiajun Bu; Qiaozhu Mei; Chun Chen; Can Wang
Social tagging is becoming increasingly popular in many Web 2.0 applications where users can annotate resources (e.g. Web pages) with arbitrary keywords (i.e. tags). A tag recommendation module can assist users in tagging process by suggesting relevant tags to them. It can also be directly used to expand the set of tags annotating a resource. The benefits are twofold: improving user experience and enriching the index of resources. However, the former one is not emphasized in previous studies, though a lot of work has reported that different users may describe the same concept in different ways. We address the problem of personalized tag recommendation for text documents. In particular, we model personalized tag recommendation as a "query and ranking" problem and propose a novel graph-based ranking algorithm for interrelated multi-type objects. When a user issues a tagging request, both the document and the user are treated as a part of the query. Tags are then ranked by our graph-based ranking algorithm which takes into consideration both relevance to the document and preference of the user. Finally, the top ranked tags are presented to the user as suggestions. Experiments on a large-scale tagging data set collected from Del.icio.us have demonstrated that our proposed algorithm significantly outperforms algorithms which fail to consider the diversity of different users' interests.
Keywords: personalization, ranking, recommender systems, social tagging
Leveraging sources of collective wisdom on the web for discovering technology synergies BIBAKFull-Text 548-555
  Cai-Nicolas Ziegler; Stefan Jung
One of the central tasks of R&D strategy and portfolio management at large technology companies and research institutions refers to the identification of technological synergies throughout the organization. These efforts are geared towards saving resources by consolidating scattered expertise, sharing best practices, and reusing available technologies across multiple product lines. In the past, this task has been done in a manual evaluation process by technical domain experts. While feasible, the major drawback of this approach is the enormous effort in terms of availability and time: For a structured and complete analysis every combination of any two technologies has to be rated explicitly. We present a novel approach that recommends technological synergies in an automated fashion, making use of abundant collective wisdom from the Web, both in pure textual form as well as classification ontologies. Our method has been deployed for practical support of the synergy evaluation process within our company. We have also conducted empirical evaluations based on randomly selected technology pairs so as to benchmark the accuracy of our approach, as compared to a group of general computer science technologists as well as a control group of domain experts.
Keywords: collective wisdom, semantic similarity, text mining, web 2.0

Query formulation

Query side evaluation: an empirical analysis of effectiveness and effort BIBAKFull-Text 556-563
  Leif Azzopardi
Typically, Information Retrieval evaluation focuses on measuring the performance of the system's ability at retrieving relevant information, and not the query's ability. However, the effectiveness of a retrieval system is strongly influenced by the quality of the query submitted. In this paper, the effectiveness and effort of querying is empirically examined in the context of the Principle of Least Effort, Zipf's Law and the Law of Diminishing Returns. This query focused investigation leads to a number of novel findings which should prove useful in the development of future retrieval methods and evaluation techniques. While, also motivating further research into query side evaluation.
Keywords: evaluation, simulation
Reducing long queries using query quality predictors BIBAKFull-Text 564-571
  Giridhar Kumaran; Vitor R. Carvalho
Long queries frequently contain many extraneous terms that hinder retrieval of relevant documents. We present techniques to reduce long queries to more effective shorter ones that lack those extraneous terms. Our work is motivated by the observation that perfectly reducing long TREC description queries can lead to an average improvement of 30% in mean average precision. Our approach involves transforming the reduction problem into a problem of learning to rank all sub-sets of the original query (sub-queries) based on their predicted quality, and selecting the top sub-query. We use various measures of query quality described in the literature as features to represent sub-queries, and train a classifier. Replacing the original long query with the top-ranked sub-query chosen by the ranker results in a statistically significant average improvement of 8% on our test sets. Analysis of the results shows that query reduction is well-suited for moderately-performing long queries, and a small set of query quality predictors are well-suited for the task of ranking sub-queries.
Keywords: long queries, query quality, query reduction, verbose queries
Extracting structured information from user queries with semi-supervised conditional random fields BIBAKFull-Text 572-579
  Xiao Li; Ye-Yi Wang; Alex Acero
When search is against structured documents, it is beneficial to extract information from user queries in a format that is consistent with the backend data structure. As one step toward this goal, we study the problem of query tagging which is to assign each query term to a pre-defined category. Our problem could be approached by learning a conditional random field (CRF) model (or other statistical models) in a supervised fashion, but this would require substantial human-annotation effort. In this work, we focus on a semi-supervised learning method for CRFs that utilizes two data sources: (1) a small amount of manually-labeled queries, and (2) a large amount of queries in which some word tokens have derived labels, i.e., label information automatically obtained from additional resources. We present two principled ways of encoding derived label information in a CRF model. Such information is viewed as hard evidence in one setting and as soft evidence in the other. In addition to the general methodology of how to use derived labels in semi-supervised CRFs, we also present a practical method on how to obtain them by leveraging user click data and an in-domain database that contains structured documents. Evaluation on product search queries shows the effectiveness of our approach in improving tagging accuracies.
Keywords: conditional random fields, information extraction, metadata, semi-supervised learning

Web Retrieval II

The impact of crawl policy on web search effectiveness BIBAKFull-Text 580-587
  Dennis Fetterly; Nick Craswell; Vishwa Vinay
Crawl selection policy has a direct influence on Web search effectiveness, because a useful page that is not selected for crawling will also be absent from search results. Yet there has been little or no work on measuring this effect. We introduce an evaluation framework, based on relevance judgments pooled from multiple search engines, measuring the maximum potential NDCG that is achievable using a particular crawl. This allows us to evaluate different crawl policies and investigate important scenarios like selection stability over multiple iterations. We conduct two sets of crawling experiments at the scale of 1~billion and 100~million pages respectively. These show that crawl selection based on PageRank, indegree and trans-domain indegree all allow better retrieval effectiveness than a simple breadth-first crawl of the same size. PageRank is the most reliable and effective method. Trans-domain indegree can outperform PageRank, but over multiple crawl iterations it is less effective and more unstable. Finally we experiment with combinations of crawl selection methods and per-domain page limits, which yield crawls with greater potential NDCG than PageRank.
Keywords: corpus selection, crawl ordering, web crawling
Optimizing search engine revenue in sponsored search BIBAKFull-Text 588-595
  Yunzhang Zhu; Gang Wang; Junli Yang; Dakan Wang; Jun Yan; Jian Hu; Zheng Chen
Displaying sponsored ads alongside the search results is a key monetization strategy for search engine companies. Since users are more likely to click ads that are relevant to their query, it is crucial for search engine to deliver the right ads for the query and the order in which they are displayed. There are several works investigating on how to learn a ranking function to maximize the number of ad clicks. In this paper, we address a new revenue optimization
   problem and aim to answer the question: how to construct a ranking model that can deliver high quality ads to the user as well as maximize search engine revenue? We introduce two novel methods from different machine learning perspectives, and both of them take the revenue component into careful considerations. The algorithms are built upon the click-through log data with real ad clicks and impressions. The extensively experimental results verify the proposed
   algorithm that can produce more revenue than other methods
   as well as avoid losing relevance accuracy. To provide deep insight into the importance of each feature to search engine revenue, we extract twelve basic features from four categories. The experimental study provides a feature ranking list according to the revenue benefit of each feature.
Keywords: machine learning, ranking, revenue optimization, sponsored search
Web-derived resources for web information retrieval: from conceptual hierarchies to attribute hierarchies BIBAKFull-Text 596-603
  A Marius Pa#351;ca; Enrique Alfonseca
A weakly-supervised extraction method identifies concepts within conceptual hierarchies, at the appropriate level of specificity (e.g., Bank vs. Institution), to which attributes (e.g., routing number) extracted from unstructured text best apply. The extraction exploits labeled classes of instances acquired from a combination of Web documents and query logs, and inserted into existing conceptual hierarchies. The correct concept is identified within the top three positions on average over gold-standard attributes, which corresponds to higher accuracy than in alternative experiments.
Keywords: class attributes, conceptual hierarchies, knowledge acquisition, named entities, unstructured text, web search


Spam filter evaluation with imprecise ground truth BIBAKFull-Text 604-611
  Gordon V. Cormack; Aleksander Kolcz
When trained and evaluated on accurately labeled datasets, online email spam filters are remarkably effective, achieving error rates an order of magnitude better than classifiers in similar applications. But labels acquired from user feedback or third-party adjudication exhibit higher error rates than the best filters -- even filters trained using the same source of labels. It is appropriate to use naturally occuring labels -- including errors -- as training data in evaluating spam filters. Erroneous labels are problematic, however, when used as ground truth to measure filter effectiveness. Any measurement of the filter's error rate will be augmented and perhaps masked by the label error rate. Using two natural sources of labels, we demonstrate automatic and semi-automatic methods that reduce the influence of labeling errors on evaluation, yielding substantially more precise measurements of true filter error rates.
Keywords: classification, email, filtering, noise, spam
Telling experts from spammers: expertise ranking in folksonomies BIBAKFull-Text 612-619
  Michael G. Noll; Ching-man Au Yeung; Nicholas Gibbins; Christoph Meinel; Nigel Shadbolt
With a suitable algorithm for ranking the expertise of a user in a collaborative tagging system, we will be able to identify experts and discover useful and relevant resources through them. We propose that the level of expertise of a user with respect to a particular topic is mainly determined by two factors. Firstly, an expert should possess a high quality collection of resources, while the quality of a Web resource depends on the expertise of the users who have assigned tags to it. Secondly, an expert should be one who tends to identify interesting or useful resources before other users do. We propose a graph-based algorithm, SPEAR (SPamming-resistant Expertise Analysis and Ranking), which implements these ideas for ranking users in a folksonomy. We evaluate our method with experiments on data sets collected from Delicious.com comprising over 71,000 Web documents, 0.5 million users and 2 million shared bookmarks. We also show that the algorithm is more resistant to spammers than other methods such as the original HITS algorithm and simple statistical measures.
Keywords: collaborative tagging, expertise, folksonomy, ranking, spam
Detecting spammers and content promoters in online video social networks BIBAKFull-Text 620-627
  Fabrício Benevenuto; Tiago Rodrigues; Virgílio Almeida; Jussara Almeida; Marcos Gonçalves
A number of online video social networks, out of which YouTube is the most popular, provides features that allow users to post a video as a response to a discussion topic. These features open opportunities for users to introduce polluted content, or simply pollution, into the system. For instance, spammers may post an unrelated video as response to a popular one aiming at increasing the likelihood of the response being viewed by a larger number of users. Moreover, opportunistic users -- promoters -- may try to gain visibility to a specific video by posting a large number of (potentially unrelated) responses to boost the rank of the responded video, making it appear in the top lists maintained by the system. Content pollution may jeopardize the trust of users on the system, thus compromising its success in promoting social interactions. In spite of that, the available literature is very limited in providing a deep understanding of this problem.
   In this paper, we go a step further by addressing the issue of detecting video spammers and promoters. Towards that end, we manually build a test collection of real YouTube users, classifying them as spammers, promoters, and legitimates. Using our test collection, we provide a characterization of social and content attributes that may help distinguish each user class. We also investigate the feasibility of using a state-of-the-art supervised classification algorithm to detect spammers and promoters, and assess its effectiveness in our test collection. We found that our approach is able to correctly identify the majority of the promoters, misclassifying only a small percentage of legitimate users. In contrast, although we are able to detect a significant fraction of spammers, they showed to be much harder to distinguish from legitimate users.
Keywords: promoter, social media, social networks, spammer, video promotion, video response, video spam


AdOn: an intelligent overlay video advertising system BIBAKFull-Text 628-629
  Jinlian Guo; Tao Mei; Falin Liu; Xian-Sheng Hua
This paper presents a new video advertising system, called AdOn, which supports intelligent overlay video ads. Unlike most current ad-networks such as Youtube that overlay the ads at fixed positions in the videos (e.g., on the bottom fifth of videos 15 seconds in), AdOn is able to automatically detect a set of spatio-temporal nonintrusive positions and associate the contextually relevant ads with these positions. The overlay positions are obtained on the basis of video structuring, face and text detection, as well as visual saliency analysis, so that the intrusiveness to the users can be minimized. The ads are selected according to content-based multimodal relevance so that advertising relevance can be maximized. AdOn represents one of the first attempts towards intelligent overlay video advertising by leveraging video content analysis techniques.
Keywords: contextual relevance, saliency detection, video advertising
Agreement among statistical significance tests for information retrieval evaluation at varying sample sizes BIBAKFull-Text 630-631
  Mark D. Smucker; James Allan; Ben Carterette
Research has shown that little practical difference exists between the randomization, Student's paired t, and bootstrap tests of statistical significance for TREC ad-hoc retrieval experiments with 50 topics. We compared these three tests on runs with topic sizes down to 10 topics. We found that these tests show increasing disagreement as the number of topics decreases. At smaller numbers of topics, the randomization test tended to produce smaller p-values than the t-test for p-values less than 0.1. The bootstrap exhibited a systematic bias towards p-values strictly less than the t-test with this bias increasing as the number of topics decreased. We recommend the use of the randomization test although the t-test appears to be suitable even when the number of topics is small.
Keywords: statistical significance
Annotation of URLs: more than the sum of parts BIBAKFull-Text 632-633
  Max Hinne; Wessel Kraaij; Stephan Raaijmakers; Suzan Verberne; Theo van der Weide; Maarten van der Heijden
Recently a number of studies have demonstrated that search engine logfiles are an important resource to determine the relevance relation between URLs and query terms. We hypothesized that the queries associated with a URL could also be presented as useful URL metadata in a search engine result list, e.g. for helping to determine the semantic category of a URL. We evaluated this hypothesis by a classification experiment based on the DMOZ dataset. Our method can also annotate URLs that have no associated queries.
Keywords: URL annotation, click data
Automatic URL completion and prediction using fuzzy type-ahead search BIBAKFull-Text 634-635
  Jiannan Wang; Guoliang Li; Jianhua Feng
Type-ahead search is a new information-access paradigm, in which systems can find answers to keyword queries "on-the-fly" as a user types in a query. It improves traditional autocomplete search by allowing query keywords to appear at different places in an answer. In this paper we study the problem of automatic URL completion and prediction using fuzzy type-ahead search. That is, we interactively find relevant URLs that contain words matching query keywords, even approximately, as the user types in a query. Supporting fuzzy search is very important when the user has limited knowledge about URLs. We describe the design and implementation of our method, and report the experimental results on Firefox.
Keywords: URL completion, fuzzy search, type-ahead search
Beyond session segmentation: predicting changes in search intent with client-side user interactions BIBAKFull-Text 636-637
  Qi Guo; Eugene Agichtein
Effective search session segmentation "grouping queries according to common task or intent" can be useful for improving relevance, search evaluation, and query suggestion. Previous work has largely attempted to segment search sessions off-line, after the fact. In contrast, we present preliminary investigation of predicting, in real time, whether a user is about to switch interest -- that is, whether the user is about to finish the current search and switch to another search task (or stop searching altogether). We explore an approach for this task using client-side user behavior such as clicks, scrolls, and mouse movements, contextualized by the content of the search result pages and previous searches. Our experiments over thousands of real searches show that we can identify context and user behavior patterns that indicate that a user is about to switch to a new search task. These preliminary results can be helpful for more effective query suggestion and personalization.
Keywords: search behavior modeling, search sessions, user intent inference
Blog distillation using random walks BIBAKFull-Text 638-639
  Mostafa Keikha; Mark James Carman; Fabio Crestani
This paper addresses the blog distillation problem. That is, given a user query find the blogs most related to the query topic. We model the blogosphere as a single graph that includes extra information besides the content of the posts. By performing a random walk on this graph we extract most relevant blogs for each query. Our experiments on the TREC'07 data set show 15% improvement in MAP and 8% improvement in Precision@10 over the Language Modeling baseline.
Keywords: blog retrieval, random walks
A case for improved evaluation of query difficulty prediction BIBAKFull-Text 640-641
  Falk Scholer; Steven Garcia
Query difficulty prediction aims to identify, in advance, how well an information retrieval system will perform when faced with a particular search request. The current standard evaluation methodology involves calculating a correlation coefficient, to indicate how strongly the predicted query difficulty is related with an actual system performance measure, usually Average Precision. We run a series of experiments based on predictors that have been shown to perform well in the literature, comparing these across different TREC runs. Our results demonstrate that the current evaluation methodology is severely limited. Although it can be used to demonstrate the performance of a predictor for a single system, such performance is not consistent over a variety of retrieval systems. We conclude that published results in the query difficulty area are generally not comparable, and recommend that prediction be evaluated against a spectrum of underlying search systems.
Keywords: evaluation, query difficulty prediction
Characterizing the subjectivity of topics BIBAKFull-Text 642-643
  Marc-Allen Cartright; Elif Aktolga; Jeffrey Dalton
A document or web page in isolation may appear completely reasonable, but may represent a biased perspective on the topic being discussed. Given the topic of a document, we propose new metrics provocativeness and balance that suggest when the topic could be controversial. We explore the use of these metrics to characterize the subjectivity of the topics in the TREC Blog Track.
Keywords: opinion classification, sentiment analysis, subjectivity/polarity detection
Classifying library catalogue by author profiling BIBAKFull-Text 644-645
  Tadashi Nomoto
This paper presents a novel approach to classifying library records by making use of what we call "author profile," a representation of an author's expertise along a library classification. Coupled with a string kernel classifier, the idea is shown to bring a significant improvement over a baseline.
Keywords: SVM, author profiling, library classification, string kernel
Cluster-based query expansion BIBAKFull-Text 646-647
  Inna Gelfer Kalmanovich; Oren Kurland
We demonstrate the merits of using document clusters that are created offline to improve the overall effectiveness and performance robustness of a state-of-the-art pseudo-feedback-based query expansion method -- the relevance model.
Keywords: clusters, query expansion, relevance model
Comparing both relevance and robustness in selection of web ranking functions BIBAKFull-Text 648-649
  Fan Li; Xin Li; Shihao Ji; Zhaohui Zheng
In commercial search engines, a ranking function is selected for deployment mainly by comparing the relevance measurements over candidates. In this paper we suggest to select Web ranking functions according to both their relevance and robustness to the changes that may lead to relevance degradation over time. We argue that the ranking robustness can be effectively measured by taking into account the ranking score distribution across Web pages. We then improve NDCG with two new metrics and show their superiority in terms of stability to ranking score turbulence and stability in function selection.
Keywords: NDCG, ranking robustness, web ranking
A comparison of retrieval-based hierarchical clustering approaches to person name disambiguation BIBAKFull-Text 650-651
  Christof Monz; Wouter Weerkamp
This paper describes a simple clustering approach to person name disambiguation of retrieved documents. The methods are based on standard IR concepts and do not require any task-specific features. We compare different term-weighting and indexing methods and evaluate their performance against the Web People Search task (WePS). Despite their simplicity these approaches achieve very competitive performance.
Keywords: clustering, person name disambiguation
Compression-based document length prior for language models BIBAKFull-Text 652-653
  Javier Parapar; David E. Losada; Álvaro Barreiro
The inclusion of document length factors has been a major topic in the development of retrieval models. We believe that current models can be further improved by more refined estimations of the document's scope. In this poster we present a new document length prior that uses the size of the compressed document. This new prior is introduced in the context of Language Modeling with Dirichlet smoothing. The evaluation performed on several collections shows significant improvements in effectiveness.
Keywords: compression, document length, document priors, language models
Concept representation based video indexing BIBAKFull-Text 654-655
  Meng Wang; Yan Song; Xian-Sheng Hua
This poster introduces a novel concept-based video indexing approach. It is developed based on a rich set of base concepts, of which the models are available. Then, for a given concept with several labeled samples, we combine the base concepts to fit it and its model can thus be obtained accordingly. Empirical results demonstrate that this method can achieve great performance even with very limited labeled data. We have compared different representation approaches including both sparse and non-sparse methods. Our conclusion is that the sparse method will lead to much better performance.
Keywords: concept detection, sparse representation, video indexing
Context transfer in search advertising BIBAKFull-Text 656-657
  Hila Becker; Andrei Broder; Evgeniy Gabrilovich; Vanja Josifovski; Bo Pang
We define and study the process of context transfer in search advertising, which is the transition of a user from the context of Web search to the context of the landing page that follows an ad-click. We conclude that in the vast majority of cases, the user is shown one of three types of pages, which can be accurately distinguished using automatic text classification.
Keywords: landing page taxonomy, online advertising
Counting ancestors to estimate authority BIBAKFull-Text 658-659
  Jian Wang; Brian D. Davison
The AncestorRank algorithm calculates an authority score by using just one characteristic of the web graph-the number of ancestors per node. For scalability, we estimate the number of ancestors by using a probabilistic counting algorithm. We also consider the case in which ancestors which are closer to the node have more influence than those farther from the node. Thus we further apply a decay factor delta on the contributions from successively earlier ancestors. The resulting authority score is used in combination with a content-based ranking algorithm. Our experiments show that as long as delta is in the range of [0.1, 0.9], AncestorRank can greatly improve BM25 performance, and in our experiments is often better than PageRank.
Keywords: PageRank, link analysis, probabilistic counting
Cross language name matching BIBAKFull-Text 660-661
  J. Scott McCarley
Cross language information retrieval methods are used to determine which segments of Arabic language documents match name-based English queries. We investigate and contrast a word-based translation model with a character-based transliteration model in order to handle spelling variation and previously unseen names. We measure performance by making a novel use of the training data from the 2007 ACE Entity Translation.
Keywords: algorithms, named entities, sentence retrieval
Deep versus shallow judgments in learning to rank BIBAKFull-Text 662-663
  Emine Yilmaz; Stephen Robertson
Much research in learning to rank has been placed on developing sophisticated learning methods, treating the training set as a given. However, the number of judgments in the training set directly affects the quality of the learned system. Given the expense of obtaining relevance judgments for constructing training data, one often has a limited budget in terms of how many judgments he can get. The major problem then is how to distribute this judgment effort across different queries. In this paper, we investigate the tradeoff between the number of queries and the number of judgments per query when training sets are constructed. In particular, we show that up to a limit, training sets with more queries but shallow (less) judgments per query are more cost effective than training sets with less queries but deep (more) judgments per query.
Keywords: learning to rank, relevance judgments, training
Developing energy efficient filtering systems BIBAKFull-Text 664-665
  Leif Azzopardi; Wim Vanderbauwhede; Mahmoud Moadeli
Processing large volumes of information generally requires massive amounts of computational power, which consumes a significant amount of energy. An emerging challenge is the development of "environmentally friendly" systems that are not only efficient in terms of time, but also energy efficient. In this poster, we outline our initial efforts at developing greener filtering systems by employing Field Programmable Gate Arrays (FPGA) to perform the core information processing task. FPGAs enable code to be executed in parallel at a chip level, while consuming only a fraction of the power of a standard (von Neuman style) processor. On a number of test collections, we demonstrate that the FPGA filtering system performs 10-20 times faster than the Itanium based implementation, resulting in considerable energy savings.
Keywords: FPGA, efficiency, filtering
Enhancing topical ranking with preferences from click-through data BIBAKFull-Text 666-667
  Yi Chang; Anlei Dong; Ciya Liao; Zhaohui Zheng
To overcome the training data insufficiency problem for dedicated model in topical ranking, this paper proposes to utilize click-through data to improve learning. The efficacy of click-through data is explored under the framework of preference learning. The empirical experiment on a commercial search engine shows that, the model trained with the dedicated labeled data combined with skip-next preferences could beat the baseline model and the generic model in NDCG5 for 4.9% and 2.4% respectively.
Keywords: click-through data, preference learning, topical ranking
Equivalence between nonnegative tensor factorization and tensorial probabilistic latent semantic analysis BIBAKFull-Text 668-669
  Wei Peng
This paper establishes a connection between NMF and PLSA on multi-way data, called NTF and T-PLSA respectively. Two types of T-PLSA models are proven to be equivalent to non-negative PARAFAC and non-negative Tucker3. This paper also shows that by running NTF and T-PLSA alternatively, they can jump out of each other's local minima and achieve a better clustering solution.
Keywords: NTF, PLSA, multi-way data, tensor
The ESA retrieval model revisited BIBAKFull-Text 670-671
  Maik Anderka; Benno Stein
Among the retrieval models that have been proposed in the last years, the ESA model of Gabrilovich and Markovitch received much attention. The authors report on a significant improvement in the retrieval performance, which is explained with the semantic concepts introduced by the document collection underlying ESA. Their explanation appears plausible but our analysis shows that the connections are more involved and that the "concept hypothesis" does not hold. In our contribution we analyze several properties that in fact affect the retrieval performance. Moreover, we introduce a formalization of ESA, which reveals its close connection to existing retrieval models.
Keywords: explicit semantic analysis, retrieval models
Estimating query performance using class predictions BIBAKFull-Text 672-673
  Kevyn Collins-Thompson; Paul N. Bennett
We investigate using topic prediction data, as a summary of document content, to compute measures of search result quality. Unlike existing quality measures such as query clarity that require the entire content of the top-ranked results, class-based statistics can be computed efficiently online, because class information is compact enough to precompute and store in the index. In an empirical study we compare the performance of class-based statistics to their language-model counterparts for predicting two measures: query difficulty and expansion risk. Our findings suggest that using class predictions can offer comparable performance to full language models while reducing computation overhead.
Keywords: query difficulty, text classification
Evaluating effects of machine translation accuracy on cross-lingual patent retrieval BIBAKFull-Text 674-675
  Atsushi Fujii; Masao Utiyama; Mikio Yamamoto; Takehito Utsuro
We organized a machine translation (MT) task at the Seventh NTCIR Workshop. Participating groups were requested to machine translate sentences in patent documents and also search topics for retrieving patent documents across languages. We analyzed the relationship between the accuracy of MT and its effects on the retrieval accuracy.
Keywords: NTCIR, cross-lingual information retrieval, evaluation measures, machine translation
Evaluating web search using task completion time BIBAKFull-Text 676-677
  Ya Xu; David Mease
We consider experiments to measure the quality of a web search algorithm based on how much total time users take to complete assigned search tasks using that algorithm. We first analyze our data to verify that there is in fact a negative relationship between a user's total search time and a user's satisfaction for the types of tasks under consideration. Secondly, we fit a model with the user's total search time as the response to compare two different search algorithms. Finally, we propose an alternative experimental design which we demonstrate to be a substantial improvement over our current design in terms of variance reduction and efficiency.
Keywords: evaluation metrics, experiment design, interactive ir and visualization, question answering
An evaluation of entity and frequency based query completion methods BIBAKFull-Text 678-679
  Edgar Meij; Peter Mika; Hugo Zaragoza
We present a semantic approach to suggesting query completions which leverages entity and type information. When compared to a frequency-based approach, we show that such information mostly helps rare queries.
Keywords: query completion, query log analysis, web search
Evolutionary document summarization for disaster management BIBAKFull-Text 680-681
  Dingding Wang; Li Zheng; Tao Li; Yi Deng
In this poster, we develop an evolutionary document summarization system for discovering the changes and differences in each phase of a disaster evolution. Given a collection of document streams describing an event, our system generates a short summary delivering the main development theme of the event by extracting the most representative and discriminative sentences at each phase. Experimental results on the collection of press releases for Hurricane Wilma in 2005 demonstrate the efficacy of our proposal.
Keywords: evolutionary summarization
Experiments in CLIR using fuzzy string search based on surface similarity BIBAKFull-Text 682-683
  Sethuramalingam Subramaniam; Anil Kumar Singh; Pradeep Dasigi; Vasudeva Varma
Cross Language Information Retrieval (CLIR) between languages of the same origin is an interesting topic of research. The similarity of the writing systems used for these languages can be used effectively to not only improve CLIR, but to overcome the problems of textual variations, textual errors, and even the lack of linguistic resources like stemmers to an extent. We have conducted CLIR experiments between three languages which use writing systems (scripts) of Brahmi-origin, namely Hindi, Bengali and Marathi. We found significant improvements for all the six language pairs using a method for fuzzy text search based on Surface Similarity. In this paper we report these results and compare them with a baseline CLIR system and a CLIR system that uses Scaled Edit Distance (SED) for fuzzy string matching.
Keywords: fuzzy text search, spelling variations, surface similarity
Feature selection for automatic taxonomy induction BIBAKFull-Text 684-685
  Hui Yang; Jamie Callan
Most existing automatic taxonomy induction systems exploit one or more features to induce a taxonomy; nevertheless there is no systematic study examining which are the best features for the task under various conditions. This paper studies the impact of using different features on taxonomy induction for different types of relations and for terms at different abstraction levels. The evaluation shows that different conditions need different technologies or different combination of the technologies. In particular, co-occurrence and lexico-syntactic patterns are good features for is-a, sibling and part-of relations; contextual, co-occurrence, patterns, and syntactic features work well for concrete terms; co-occurrence works well for abstract terms.
Keywords: ontology learning, semantic features, taxonomy
Finding advertising keywords on video scripts BIBAKFull-Text 686-687
  Jung-Tae Lee; Hyungdong Lee; Hee-Seon Park; Young-In Song; Hae-Chang Rim
A key to success to contextual in-video advertising is finding advertising keywords on video contents effectively, but there has been little literature in the area so far. This paper presents some preliminary results of our learning-based system that finds relevant advertising keywords on particular scene of video contents using their scripts. The system is trained with not only features proven useful in earlier studies but novel features that reflect the situation of a targeted scene. Experimental results show that the new features are potentially helpful for enhancing the accuracy of keyword extraction for contextual in-video advertising.
Keywords: contextual in-video advertising, keyword extraction
Fitting score distribution for blog opinion retrieval BIBAKFull-Text 688-689
  Ben He; Jie Peng; Iadh Ounis
Current blog opinion retrieval approaches cannot be applied if the topic relevance and opinion score distributions by rank are dissimilar. This problem severely limits the feasibility of these approaches. We propose to tackle this problem by fitting the distribution of opinion scores, which replaces the original topic relevance score distribution with the simulated one. Our proposed score distribution fitting method markedly enhances the feasibility of a state-of-the-art dictionary-based opinion retrieval approach. Evaluation on a standard TREC blog test collection shows significant improvements over high quality topic relevance baselines.
Keywords: blog search, distribution fitting, opinion finding
A graph-based approach to mining multilingual word associations from wikipedia BIBAKFull-Text 690-691
  Zheng Ye; Xiangji Huang; Hongfei Lin
In this paper, we propose a graph-based approach to constructing a multilingual association dictionary from Wikipedia, in which we exploit two kinds of links in Wikipedia articles to associate multilingual words and concepts together in a graph. The mined association dictionary is applied in cross language information retrieval (CLIR) to verify its quality. We evaluate our approach on four CLIR data sets and the experimental results show that it is possible to mine a good multilingual association dictionary from Wikipedia articles.
Keywords: CLIR, association dictionary, wikipedia
Has adhoc retrieval improved since 1994? BIBAFull-Text 692-693
  Timothy G. Armstrong; Alistair Moffat; William Webber; Justin Zobel
Evaluation forums such as TREC allow systematic measurement and comparison of information retrieval techniques. The goal is consistent improvement, based on reliable comparison of the effectiveness of different approaches and systems. In this paper we report experiments to determine whether this goal has been achieved. We ran five publicly available search systems, in a total of seventeen different configurations, against nine TREC adhoc-style collections, spanning 1994 to 2005. These runsets were then used as a benchmark for reassessing the relative effectiveness of the original TREC runs for those collections. Surprisingly, there appears to have been no overall improvement in effectiveness for either median or top-end TREC submissions, even after allowing for several possible confounds. We therefore question whether the effectiveness of adhoc information retrieval has improved over the past decade and a half.
High precision retrieval using relevance-flow graph BIBAKFull-Text 694-695
  Jangwon Seo; Jiwoon Jeon
Traditional bag-of-words information retrieval models use aggregated term statistics to measure the relevance of documents, making it difficult to detect non-relevant documents that contain many query terms by chance or in the wrong context. In-depth document analysis is needed to filter out these deceptive documents. In this paper, we hypothesize that truly relevant documents have relevant sentences in predictable patterns. Our experimental results show that we can successfully identify and exploit these patterns to significantly improve retrieval precision at top ranks.
Keywords: re-ranking, relevance flow, relevant sentence
Identifying the original contribution of a document via language modeling BIBAKFull-Text 696-697
  Benyah Shaparenko; Thorsten Joachims
One goal of text mining is to provide readers with automatic methods for quickly finding the key ideas in individual documents and whole corpora. To this effect, we propose a statistically well-founded method for identifying the original ideas that a document contributes to a corpus, focusing on self-referential diachronic corpora such as research publications, blogs, email, and news articles. Our statistical model of passage impact defines (interesting) original content through a combination of impact and novelty, and it can be used to identify the most original passages in a document. Unlike heuristic approaches, this statistical model is extensible and open to analysis. We evaluate the approach on both synthetic and real data, showing that the passage impact model outperforms a heuristic baseline method.
Keywords: language modeling, original contributions, topic modeling
The importance of manual assessment in link discovery BIBAKFull-Text 698-699
  Wei Che Huang; Andrew Trotman; Shlomo Geva
Using a ground truth extracted from the Wikipedia, and a ground truth created through manual assessment, we show that the apparent performance advantage seen in machine learning approaches to link discovery are an artifact of trivial links that are actively rejected by manual assessors.
Keywords: INEX, assessment, evaluation, link discovery, wikipedia
Improving search relevance for implicitly temporal queries BIBKFull-Text 700-701
  Donald Metzler; Rosie Jones; Fuchun Peng; Ruiqiang Zhang
Keywords: query log analysis, temporal queries, web search
Improving user confidence in cultural heritage aggregated results BIBAKFull-Text 702-703
  Junte Zhang; Alia Amin; Henriette S. M. Cramer; Vanessa Evers; Lynda Hardman
State of the art web search systems enable aggregation of information from many sources. Users are challenged to assess the reliability of information from different sources. We report on an empirical user study on the effect of displaying credibility ratings of multiple cultural heritage sources (e.g. museum websites, art blogs) on users' search performance and selection. The results of our online interactive study (N=122) show that when explicitly presenting these ratings, people become significantly more confident in their selection of information from aggregated results.
Keywords: aggregated search, credibility, cultural heritage, quantitative user study
Incorporating prior knowledge into a transductive ranking algorithm for multi-document summarization BIBAKFull-Text 704-705
  Massih R. Amini; Nicolas Usunier
This paper presents a transductive approach to learn ranking functions for extractive multi-document summarization. At the first stage, the proposed approach identifies topic themes within a document collection, which help to identify two sets of relevant and irrelevant sentences to a question. It then iteratively trains a ranking function over these two sets of sentences by optimizing a ranking loss and fitting a prior model built on keywords. The output of the function is used to find further relevant and irrelevant sentences. This process is repeated until a desired stopping criterion is met.
Keywords: learning to rank, multi-document summarization
Integrating clusters created offline with query-specific clusters for document retrieval BIBAKFull-Text 706-707
  Lior Meister; Oren Kurland; Inna Gelfer Kalmanovich
Previous work on cluster-based document retrieval has used either static document clusters that are created offline, or query-specific (dynamic) document clusters that are created from top-retrieved documents. We present the potential merit of integrating these two types of clusters.
Keywords: cluster-based retrieval, re-ranking
Integrating phrase inseparability in phrase-based model BIBAKFull-Text 708-709
  Lixin Shi; Jian-Yun Nie
In this paper, we propose a new phrase-based IR model, which integrates a measure of "inseparability" of phrases. Our experiments show its high potential to produce large improvements in retrieval effectiveness.
Keywords: inseparability, language model, phrase model
Is spam an issue for opinionated blog post search? BIBAKFull-Text 710-711
  Craig Macdonald; Iadh Ounis; Ian Soboroff
In opinion-finding, the retrieval system is tasked with retrieving not just relevant documents, but those that also express an opinion towards the query target entity. This task has been studied in the context of the blogosphere by groups participating in the 2006-2008 TREC Blog tracks. Spam blogs (splogs) are thought to be a problem on the blogosphere. In this paper, we investigate the extent to which spam has affected the participating groups' retrieval systems over the three years of the TREC Blog track opinion-finding task. Our results show that spam can be an issue, with most systems retrieving some spam for every topic. However, removing spam from the rankings does not markedly change the relative performance of opinion-finding approaches.
Keywords: blogs, opinion-finding, spam, splogs
Is this urgent?: exploring time-sensitive information needs in collaborative question answering BIBAKFull-Text 712-713
  Yandong Liu; Nitya Narasimhan; Venu Vasudevan; Eugene Agichtein
As online Collaborative Question Answering (CQA) services such as Yahoo!Answers and Baidu Knows are attracting users, questions, and answers at an explosive rate, the truly urgent and important questions are increasingly getting lost in the crowd. That is, questions that require immediate responses are pushed out of the way by the trivial but more recently arriving questions. Unlike other questions in collaborative question answering (CQA) for which users might be willing to wait until good answers appear, urgent questions are likely to be of interest to the asker only if answered in the next few minutes or hours. For such questions, late responses are either not useful or are simply not applicable. Unfortunately, current collaborative question-answering systems do not distinguish urgent questions from the rest, and could thus be ineffective for urgent information needs. We explore text- and data-mining methods for automatically identifying urgent questions in the CQA setting. Our results indicate that modeling the question context (i.e., the particular forum/category where the question was posted) can increase classification accuracy compared to the text of the question alone.
Keywords: community question answering, social media
It pays to be picky: an evaluation of thread retrieval in online forums BIBAKFull-Text 714-715
  Jonathan L. Elsas; Jaime G. Carbonell
Online forums host a rich information exchange, often with contributions from many subject matter experts. In this work we evaluate algorithms for thread retrieval in a large and active online forum community. We compare methods that utilize thread structure to a naïve method that treats a thread as a single document. We find that thread structure helps, and additionally selective methods of thread scoring, which only use evidence from a small number of messages in the thread, significantly and consistently outperform inclusive methods which use all the messages in the thread.
Keywords: message boards, online forums, thread search
Knowledge transformation for cross-domain sentiment classification BIBAKFull-Text 716-717
  Tao Li; Vikas Sindhwani; Chris Ding; Yi Zhang
With the explosion of user-generated web2.0 content in the form of blogs, wikis and discussion forums, the Internet has rapidly become a massive dynamic repository of public opinion on an unbounded range of topics. A key enabler of opinion extraction and summarization is sentiment classification: the task of automatically identifying whether a given piece of text expresses positive or negative opinion towards a topic of interest. Building high-quality sentiment classifiers using standard text categorization methods is challenging due to the lack of labeled data in a target domain. In this paper, we consider the problem of cross-domain sentiment analysis: can one, for instance, download rated movie reviews from rottentomatoes.com or IMBD discussion forums, learn linguistic expressions and sentiment-laden terms that generally characterize opinionated reviews and then successfully transfer this knowledge to the target domain, thereby building high-quality sentiment models without manual effort? We outline a novel sentiment transfer mechanism based on constrained non-negative matrix tri-factorizations of term-document matrices in the source and target domains. We report some preliminary results with this approach.
Keywords: non-negative matrix factorization, sentiment analysis, transfer learning
K-tree: large scale document clustering BIBAKFull-Text 718-719
  Christopher M. De Vries; Shlomo Geva
We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare performance and quality to CLUTO using document collections. The K-tree has a low time complexity that is suitable for large document collections. This tree structure allows for efficient disk based implementations where space requirements exceed that of main memory.
Keywords: B-tree, K-means, K-tree, document clustering, search tree
A latent topic model for linked documents BIBAKFull-Text 720-721
  Zhen Guo; Shenghuo Zhu; Yun Chi; Zhongfei Zhang; Yihong Gong
Documents in many corpora, such as digital libraries and webpages, contain both content and link information. To explicitly consider the document relations represented by links, in this paper we propose a citation-topic (CT) model which assumes a probabilistic generative process for corpora. In the CT model a given document is modeled as a mixture of a set of topic distributions, each of which is borrowed (cited) from a document that is related to the given document. Moreover, the CT model contains a random process for selecting the related documents according to the structure of the generative model determined by links and therefore, the transitivity of the relations among documents is captured. We apply the CT model on the document clustering task and the experimental comparisons against several state-of-the-art approaches demonstrate very promising performances.
Keywords: document clustering, topic model
Measuring constraint violations in information retrieval BIBAKFull-Text 722-723
  Ronan Cummins; Colm O'Riordan
Recently, an inductive approach to modelling term-weighting function correctness has provided a number of axioms (constraints), to which all good term-weighting functions should adhere. These constraints have been shown to be theoretically and empirically sound in a number of works. It has been shown that when a term-weighting function breaks one or more of the constraints, it typically indicates sub-optimality of that function. This elegant inductive approach may more accurately model the human process of determining the relevance a document. It is intuitive that a person's notion of relevance changes as terms that are either on or off-topic are encountered in a given document. Ultimately, it would be desirable to be able to mathematically determine the performance of term-weighting functions without the need for test collections.
   Many modern term-weighting functions do not satisfy the constraints in an unconditional manner. However, the degree to which these functions violate the constraints has not been investigated. A comparison between weighting functions from this perspective may shed light on the poor performance of certain functions in certain settings. Moreover, if a correlation exists between performance and the number of violations, measuring the degree of violation could help more accurately predict how a certain scheme will perform on a given collection.
Keywords: axioms, constraints, information retrieval
Measuring the descriptiveness of web comments BIBAKFull-Text 724-725
  Martin Potthast
This paper investigates whether Web comments are of descriptive nature, that is, whether the combined text of a set of comments is similar in topic to the commented object. If so, comments may be used in place of the respective object in all kinds of cross-media retrieval tasks. Our experiments reveal that comments on textual objects are indeed descriptive: 10 comments suffice to expect a high similarity between the comments and the commented text; 100-500 comments suffice to replace the commented text in a ranking task, and to measure the contribution of the commenters beyond the commented text.
Keywords: comment descriptiveness, cross-media retrieval
Mining product reviews based on shallow dependency parsing BIBAKFull-Text 726-727
  Qi Zhang; Yuanbin Wu; Tao Li; Mitsunori Ogihara; Joseph Johnson; Xuanjing Huang
This paper presents a novel method for mining product reviews, where it mines reviews by identifying product features, expressions of opinions and relations between them. By taking advantage of the fact that most of product features are phrases, a concept of shallow dependency parsing is introduced, which extends traditional dependency parsing to phrase level. This concept is then implemented for extracting relation between product features and expressions of opinions. Experimental evaluations show that the mining task can benefit from shallow dependency parsing.
Keywords: dependency parsing, product mining
Modeling facial expressions and peripheral physiological signals to predict topical relevance BIBAKFull-Text 728-729
  Ioannis Arapakis; Ioannis Konstas; Joemon M. Jose; Ioannis Kompatsiaris
By analyzing explicit & implicit feedback information retrieval systems can determine topical relevance and tailor search criteria to the user's needs. In this paper we investigate whether it is possible to infer what is relevant by observing user affective behaviour. The sensory data employed range between facial expressions and peripheral physiological signals. We extract a set of features from the signals and analyze the data using classification methods, such as SVM and KNN. The results of our initial evaluation indicate that prediction of relevance is possible, to a certain extent, and implicit feedback models can benefit from taking into account user affective behavior.
Keywords: affective feedback, classification, facial expression analysis, multimedia retrieval, pattern recognition, physiological signal processing, support vector machines
Modeling search response time BIBAKFull-Text 730-731
  Dan Zhang; Luo Si
Modeling the response time of search engines is an important task for many applications such as resource selection in federated text search. Limited research has been conducted to address this task. Prior research calculated the search response time of all queries in the same way either with the average response time of several sample queries or with a single probability distribution, which is irrelevant to the characteristics of queries. However, the search response time may vary a lot for different types of queries. This paper proposes a novel query-specific and source-specific approach to model search response time. Some training data is acquired by measuring the search response time of some sample queries from a search engine. Then, a query-specific model is estimated with the training data and their corresponding response times by utilizing Ridge Regression. The obtained model can be used to predict search response times for new queries. A set of empirical studies are conducted to show the effectiveness of the proposed method.
Keywords: response time, ridge regression, source selection
Multiclass VisualRank: image ranking method in clustered subsets based on visual features BIBAKFull-Text 732-733
  Mitsuru Ambai; Yuichi Yoshida
This paper proposes Multiclass VisualRank, a method that expands the idea of VisualRank into more than one category of images. Multiclass VisualRank divides images retrieved from search engines into several categories based on distinctive patterns of visual features, and gives ranking within the category. Experimental results show that our method can extract several different image categories relevant to given keyword and gives good ranking scores to retrieved images.
Keywords: clustering, ranking, visual feature
Multiple approaches to analysing query diversity BIBAKFull-Text 734-735
  Paul Clough; Mark Sanderson; Murad Abouammoh; Sergio Navarro; Monica Paramita
In this paper we examine user queries with respect to diversity: providing a mix of results across different interpretations. Using two query log analysis techniques (click entropy and reformulated queries), 14.9 million queries from the Microsoft Live Search log were analysed. We found that a broad range of query types may benefit from diversification. Additionally, although there is a correlation between word ambiguity and the need for diversity, the range of results users may wish to see for an ambiguous query stretches well beyond traditional notions of word sense.
Keywords: ambiguity, diversity
Multiview clustering: a late fusion approach using latent models BIBAKFull-Text 736-737
  Eric Bruno; Stephane Marchand-Maillet
Multi-view clustering is an important problem in information retrieval due to the abundance of data offering many perspectives and generating multi-view representations. We investigate in this short note a late fusion approach for multi-view clustering based on the latent modeling of cluster-cluster relationships. We derive a probabilistic multi-view clustering model outperforming an early-fusion approach based on multi-view feature correlation analysis.
Keywords: data fusion, latent models, multi-view clustering
On efficient posting list intersection with multicore processors BIBKFull-Text 738-739
  Shirish Tatikonda; Flavio Junqueira; B. Barla Cambazoglu; Vassilis Plachouras
Keywords: list intersection, multicores, parallel query processing
On perfect document rankings for expert search BIBAKFull-Text 740-741
  Craig Macdonald; Iadh Ounis
Expert search systems often employ a document search component to identify on-topic documents, which are then used to identify people likely to have relevant expertise. This work investigates the impact of the retrieval effectiveness of the underlying document search component. It has been previously shown that applying techniques to the underlying document search component that normally improve the effectiveness of a document search engine also have a positive impact on the retrieval effectiveness of the expert search engine. In this work, we experiment with fictitious perfect document rankings, to attempt to identify an upper-bound in expert search system performance. Our surprising results infer that non-relevant documents can bring useful expertise evidence, and that removing these does not lead to an upper-bound in retrieval performance.
Keywords: document search, expert search
On single-pass indexing with MapReduce BIBAKFull-Text 742-743
  Richard M. C. McCreadie; Craig Macdonald; Iadh Ounis
Indexing is an important Information Retrieval (IR) operation, which must be parallelised to support large-scale document corpora. We propose a novel adaptation of the state-of-the-art single-pass indexing algorithm in terms of the MapReduce programming model. We then experiment with this adaptation, in the context of the Hadoop MapReduce implementation. In particular, we explore the scale of improvements that can be achieved when using firstly more processing hardware and secondly larger corpora. Our results show that indexing speed increases in a close to linear fashion when scaling corpus size or number of processing machines. This suggests that the proposed indexing implementation is viable to support upcoming large-scale corpora.
Keywords: MapReduce, indexing
On the relative age of spam and ham training samples for email filtering BIBAKFull-Text 744-745
  Gordon V. Cormack; Jose-Marcio Martins da Cruz
Email spam filters are commonly trained on a sample of spam and ham (non-spam) messages. We investigate the effect on filter performance of using samples of spam and ham messages sent months before those to be filtered. Our results show that filter performance deteriorates with the overall age of spam and ham samples, but at different rates. Spam and ham samples of different ages may be mixed to advantage, provided temporal cues are elided.
Keywords: email, filtering, spam, training
Page Hunt: improving search engines using human computation games BIBAKFull-Text 746-747
  Hao Ma; Raman Chandrasekar; Chris Quirk; Abhishek Gupta
There has been a lot of work on evaluating and improving the relevance of web search engines. In this paper, we suggest using human computation games to elicit data from players that can be used to improve search. We describe Page Hunt, a single-player game. The data elicited using Page Hunt has several applications including providing metadata for pages, providing query alterations for use in query refinement, and identifying ranking issues. We describe an experiment with over 340 game players, and highlight some interesting aspects of the data obtained.
Keywords: human computation games, query alterations, relevance, web search
Personalized music emotion recognition BIBAKFull-Text 748-749
  Yi-Hsuan Yang; Yu-Ching Lin; Homer Chen
In recent years, there has been a dramatic proliferation of research on information retrieval based on highly subjective concepts such as emotion, preference and aesthetic. Such retrieval methods are fascinating but challenging since it is difficult to built a general retrieval model that performs equally well to everyone. In this paper, we propose two novel methods, bag-of-users model and residual modeling, to accommodate the individual differences for emotion-based music retrieval. The proposed methods are intuitive and generally applicable to other information retrieval tasks that involve subjective perception. Evaluation result shows the effectiveness of the proposed methods.
Keywords: emotion, perceptual residual, personalization
Predicting stopping behaviour: a preliminary analysis BIBAKFull-Text 750-751
  Elaine G. Toms; Luanne Freund
The analysis of search transaction logs often characterizes a search session but rarely looks at the end point. When do users stop, and what cues are present suggesting that stopping is eminent? In this preliminary analysis of the logs of 288 search sessions conducted in a laboratory setting, we identified the activity performed by participants as well as search transitions that were invoked over the course of a search session. The 4331 search transitions (15 per task on average) contained a total of 9295 actions. We isolated the final transition in each search session for detailed analysis. As hypothesized some behaviours are predictable, and suggestive of stopping behavior, with the potential for modeling.
Keywords: human experiment, interactive ir, stopping behaviour
Protein identification as an information retrieval problem BIBAKFull-Text 752-753
  Yiming Yang; Subramaniam Ganapathy; Abhay Harpale
We present the first interdisciplinary work on transforming a popular problem in proteomics, i.e. protein identification from tandem mass spectra, to an Information Retrieval (IR) problem. We present an empirical comparison of popular IR approaches, such as those available from Indri and Lemur toolkits on benchmark datasets, to representative popular baselines in the proteomics literature. Our experiments demonstrate statistically significant evidence that popular IR approaches outperform representative baseline approaches in proteomics.
Keywords: interdisciplinary investigation, protein identification, tandem mass spectra, vector space models
Query sampling for ranking learning in web search BIBAKFull-Text 754-755
  Linjun Yang; Li Wang; Bo Geng; Xian-Sheng Hua
Learning to rank has become a popular approach to build a ranking model for Web search recently. Based on our observation, the constitution of the training set will greatly influence the performance of the learned ranking model. Meanwhile, the number of queries in Web search is nearly infinite and the human labeling cost is expensive, hence a subset of queries need to be carefully selected for training. In this paper, we develop a greedy algorithm to sample the queries, by simultaneously taking the query density, difficulty and diversity into consideration. The experimental results on a collected Web search dataset comprising 2024 queries show that the proposed method can lead to a more informative training set for building an effective model.
Keywords: learning to rank, training set construction
A ranking approach to keyphrase extraction BIBAKFull-Text 756-757
  Xin Jiang; Yunhua Hu; Hang Li
This paper addresses the issue of automatically extracting keyphrases from a document. Previously, this problem was formalized as classification and learning methods for classification were utilized. This paper points out that it is more essential to cast the problem as ranking and employ a learning to rank method to perform the task. Specifically, it employs Ranking SVM, a state-of-art method of learning to rank, in keyphrase extraction. Experimental results on three datasets show that Ranking SVM significantly outperforms the baseline methods of SVM and Naive Bayes, indicating that it is better to exploit learning to rank techniques in keyphrase extraction.
Keywords: keyphrase extraction, learning to rank, ranking SVM
Reciprocal rank fusion outperforms condorcet and individual rank learning methods BIBAKFull-Text 758-759
  Gordon V. Cormack; Charles L. A. Clarke; Stefan Buettcher
Reciprocal Rank Fusion (RRF), a simple method for combining the document rankings from multiple IR systems, consistently yields better results than any individual system, and better results than the standard method Condorcet Fuse. This result is demonstrated by using RRF to combine the results of several TREC experiments, and to build a meta-learner that ranks the LETOR 3 dataset better than any previously reported method.
Keywords: aggregation, fusion, ranking
Relevance criteria for e-commerce: a crowdsourcing-based experimental analysis BIBAKFull-Text 760-761
  Omar Alonso; Stefano Mizzaro
We discuss the concept of relevance criteria in the context of e-Commerce search. A vast body of research literature describes the beyond-topical criteria used to determine the relevance of the document to the need. We argue that in an e-Commerce scenario there are some differences, and novel and different criteria can be used to determine relevance. We experimentally validate this hypothesis by means of Amazon Mechanical Turk using a crowdsourcing approach.
Keywords: IR evaluation, relevance, relevance criteria, user study
A relevance model based filter for improving ad quality BIBAKFull-Text 762-763
  Hema Raghavan; Dustin Hillard
Recently there has been a surge in research that predicts retrieval relevance using historical click-through data. While a larger number of clicks between a query and a document provides a stronger "confidence" of relevance, most models in the literature that learn from clicks are error-prone as they do not take into account any confidence estimates. Sponsored Search models are especially prone to this error as they are typically trained on search engine logs in order to predict click-through-rate (CTR). The estimated CTR ultimately determines the rank at which an ad is shown and also impacts the price (cost-per-click) for the advertiser. In this paper, we improve a model that applies collaborative filtering on click data by training a filter that has been trained to predict pure relevance. Applying the filter to ads that have seen few clicks on live traffic results in improved CTR and click-yield (CY). Additionally, in offline experiments we find that using features based on the organic results improves the relevance based filter's performance.
Keywords: advertising, relevance modeling
A relevance-based topic model for news event tracking BIBAKFull-Text 764-765
  Viet Ha-Thuc; Yelena Mejova; Christopher Harris; Padmini Srinivasan
Event tracking is the task of discovering temporal patterns of popular events from text streams. Existing approaches for event tracking have two limitations: scalability and inability to rule out non-relevant portions in text streams. In this study, we propose a novel approach to tackle these limitations. To demonstrate the approach, we track news events across a collection of weblogs spanning a two-month time period.
Keywords: LDA, event tracking, relevance models, topic models
Revisiting logical imaging for information retrieval BIBAKFull-Text 766-767
  Guido Zuccon; Leif Azzopardi; Cornelis J. van Rijsbergen
Retrieval with Logical Imaging is derived from belief revision and provides a novel mechanism for estimating the relevance of a document through logical implication (i.e. P(q->d). In this poster, we perform the first comprehensive evaluation of Logical Imaging (LI) in Information Retrieval (IR) across several TREC test Collections. When compared against standard baseline models, we show that LI fails to improve performance. This failure can be attributed to a nuance within the model that means non-relevant documents are promoted in the ranking, while relevant documents are demoted. This is an important contribution because it not only contextualizes the effectiveness of LI, but crucially explains why it fails. By addressing this nuance, future LI models could be significantly improved.
Keywords: logical imaging, probability kinematics
A robust retrieval system of polyphonic music based on chord progression similarity BIBAKFull-Text 768-769
  Pierre Hanna; Thomas Rocher; Matthias Robine
Retrieval systems for polyphonic music rely on the automatic estimation of similarity between two musical pieces. In the case of symbolic music, existing systems either consider a monophonic reduction based on melody or propose algorithms with high complexity. In this paper, we propose a new approach. Musical pieces are represented as a sequence of chords which are estimated from groups of notes sounding at the same time. A root and a mode are associated to each chord. Local alignment is then applied for estimating a similarity score between these sequences. Experiments performed on MIDI files collected on the Internet show that the system proposed allows the retrieval of different versions of the same song.
Keywords: music information retrieval
Score and rank convergence of HITS BIBAKFull-Text 770-771
  Enoch Peserico; Luca Pretto
How many iterations does the (ever more) popular HITS algorithm require to converge in score and, perhaps more importantly, in rank (i.e. to get the nodes of a graph "in the right order")? After pinning down the elusive notion of convergence in rank we provide the first non-trivial bounds on the convergence of HITS. A "worst case" example, requiring a number of iterations superexponential in the size of the target graph to achieve even "mild" convergence, suggests the need for greater caution in the experimental evaluation of the algorithm -- as recent results of poor performance (e.g. vs. SALSA) might be due to insufficient iterations, rather than to an intrinsic deficiency of HITS. An almost matching upper bound shows that, as long as one employs exponential acceleration e.g. through a "squaring trick", a polynomial running time (practical in many application domains) always provides strong convergence guarantees.
Keywords: HITS, convergence, link analysis, rank
A search engine in a few lines.: yes, we can! BIBAKFull-Text 772-773
  Stefan Savev
Many research implementations of search engines are written in C, C++, or Java. They are difficult to understand and modify because they are at least a few thousand lines of code and contain many low-level details. In this paper, we show how to achieve a much shorter and higher level implementation: one in about a few hundred lines. We accomplish this result through the use of a high-level functional programming language, F#, and some of its features such as sequences, pipes and structured input and output. By using a search engine implementation as a case study, we argue that functional programming fits the domain of Information Retrieval problems much better than imperative/OO languages like C++ and Java.
   Functional programming languages are ideal for rapid algorithm prototyping and data exploration in the field of Information Retrieval (IR).
   Additionally, our implementation can be used as case study in an IR course since it is a very high level, but nevertheless executable specification of a search engine.
Keywords: search engine implementation techniques
Search engine predilection towards news media providers BIBAKFull-Text 774-775
  Leif Azzopardi; Ciaran Owens
In this poster paper, we present a preliminary study on the predilection of web search engines towards various online news media provider sites using an access based measure.
Keywords: accessibility, retrievability, search engine bias
Searching documentation using text, OCR, and image BIBAKFull-Text 776-777
  Tom Yeh; Boris Katz
We describe a mixed-modality method to index and search software documentation in three ways: plain text, OCR text of embedded figures, and visual features of these figures. Using a corpus of 102 computer books with a total of 62,943 pages and 75,800 figures, we empirically demonstrate that our method achieves better precision/recall than do alternatives based on single modalities.
Keywords: computer vision, content-based image retrieval, multimodal search
Selecting hierarchical clustering cut points for web person-name disambiguation BIBAKFull-Text 778-779
  Jun Gong; Douglas W. Oard
Hierarchical clustering is often used to cluster person-names referring to the same entities. Since the correct number of clusters for a given person-name is not known a priori, some way of deciding where to cut the resulting dendrogram to balance risks of over- or under-clustering is needed. This paper reports on experiments in which outcome-specific and result-set measures are used to learn a global similarity threshold. Results on the Web People Search (WePS)-2 task indicate that approximately 85% of the optimal F1 measure can be achieved on held-out data.
Keywords: clustering, person-name disambiguation
Serendipitous search via wikipedia: a query log analysis BIBAKFull-Text 780-781
  Tetsuya Sakai; Kenichi Nogami
We analyse the query log of a click-oriented Japanese search engine that utilises the link structures of Wikipedia for encouraging the user to change his information need and to perform repeated, serendipitous, exploratory search. Our results show that users tend to make transitions within the same query type: from person names to person names, from place names to place names, and so on.
Keywords: clickthrough, search engines
Spoken information retrieval for Turkish broadcast news BIBAKFull-Text 782-783
  Siddika Parlak; Murat Saraclar
Speech Retrieval systems utilize automatic speech recognition (ASR) to generate textual data for indexing. However, automatic transcriptions include errors, either because of out-of-vocabulary (OOV) words or due to ASR inaccuracy. In this work, we address spoken information retrieval in Turkish, a morphologically rich language where OOV rates are high. We apply several techniques, such as using subword units and indexing alternative hypotheses, to cope with the OOV problem and ASR inaccuracy.
   Experiments are performed on our Turkish Broadcast News (BN) Corpus which also incorporates a spoken IR collection. Results indicate that word segmentation is quite useful but the efficiency of indexing alternative hypotheses depends on retrieval type.
Keywords: speech retrieval, spoken document retrieval, spoken term detection, subword indexing
A study of inter-annotator agreement for opinion retrieval BIBAKFull-Text 784-785
  Adam Bermingham; Alan F. Smeaton
Evaluation of sentiment analysis, like large-scale IR evaluation, relies on the accuracy of human assessors to create judgments. Subjectivity in judgments is a problem for relevance assessment and even more so in the case of sentiment annotations. In this study we examine the degree to which assessors agree upon sentence-level sentiment annotation. We show that inter-assessor agreement is not contingent on document length or frequency of sentiment but correlates positively with automated opinion retrieval performance. We also examine the individual annotation categories to determine which categories pose most difficulty for annotators.
Keywords: annotation, inter-annotator agreement, opinion retrieval, sentiment analysis
SugarCube: quantification of topic propagation in the blogosphere using percolation theory BIBAKFull-Text 786-787
  Ali Azimi Bolourian; Yashar Moshfeghi; C. J. van Rijsbergen
Blogs facilitate online debates and discussions for millions of people around the world. Identifying the most popular and prevailing topics discussed in the Blogosphere is a crucial task. This poster describes our novel approach to the quantification of the level of topic propagation in the Blogosphere. Our model uses graph-theoretic representations of the Blogosphere's link structures that allows it to deduce the 'Percolation Threshold', which is then used in the quantification and definition of a global topic. The result of our experiments on a blog collection shows that our model is able to quantify the propagation of topics. Moreover, our model is successful in identifying specific topics that propagate throughout the Blogosphere and classifies them as 'Global'.
Keywords: blogosphere, citation analysis, percolation theory, social networks, topic propagation
System scoring using partial prior information BIBAKFull-Text 788-789
  Sri Devi Ravana; Laurence A. Park; Alistair Moffat
We introduce smoothing of retrieval effectiveness scores, which balances results from prior incomplete query sets against limited additional complete information, in order to obtain more refined system orderings than would be possible on the new queries alone.
Keywords: partial information, score aggregation, smoothing, system ordering
Tag-based object similarity computation using term space dimension reduction BIBAKFull-Text 790-791
  Yong Ki Lee; Sung Jun Lee; Jonghun Park
In this paper, we propose a novel approach for measuring similarity between web objects. Our similarity measure is defined based on the representation of a web object as a collection of tags. Precisely, we first construct a vector space in which multiple terms are mapped into a single dimension by using information available from Open Directory Project and Delicious.com. Then we position web objects in the vector space and apply the traditional cosine measure for similarity computation. We demonstrate that the proposed similarity computation method is able to overcome the limitation of traditional vector space approach while at the same time require less computational cost compares to LSI (Latent Semantic Indexing).
Keywords: open directory project, similarity, vector space model, web search
Tagging products using image classification BIBAKFull-Text 792-793
  Brian Tomasik; Phyo Thiha; Douglas Turnbull
Associating labels with online products can be a labor-intensive task. We study the extent to which a standard "bag of visual words" image classifier can be used to tag products with useful information, such as whether a sneaker has laces or velcro straps. Using Scale Invariant Feature Transform (SIFT) image descriptors at random keypoints, a hierarchical visual vocabulary, and a variant of nearest-neighbor classification, we achieve accuracies between 66% and 98% on 2- and 3-class classification tasks using several dozen training examples. We also increase accuracy by combining information from multiple views of the same product.
Keywords: bag of visual words, content-based tagging, image classification
Template-independent wrapper for web forums BIBAKFull-Text 794-795
  Qi Zhang; Yang Shi; Xuanjing Huang; Lide Wu
This paper presents a novel work on the task of extracting data from Web forums. Millions of users contribute rich information to Web forum everyday, which has become an important resource for manyWeb applications, such as product opinion retrieval, social network analysis, and so on. The novelty of the proposed algorithm is that it can not only extract the pure text but also distinguish between the original post and replies. Experimental results on a large number of real Web forums indicate that the proposed algorithm can correctly extract data from websites with various styles in most cases.
Keywords: conditional random fields, crawler
Temporal collaborative filtering with adaptive neighbourhoods BIBAKFull-Text 796-797
  Neal Lathia; Stephen Hailes; Licia Capra
Collaborative Filtering aims to predict user tastes, by minimising the mean error produced when predicting hidden user ratings. The aim of a deployed recommender system is to iteratively predict users' preferences over a dynamic, growing dataset, and system administrators are confronted with the problem of having to continuously tune the parameters calibrating their CF algorithm. In this work, we formalise CF as a time-dependent, iterative prediction problem. We then perform a temporal analysis of the Netflix dataset, and evaluate the temporal performance of two CF algorithms. We show that, due to the dynamic nature of the data, certain prediction methods that improve prediction accuracy on the Netflix probe set do not show similar improvements over a set of iterative train-test experiments with growing data. We then address the problem of parameter selection and update, and propose a method to automatically assign and update per-user neighbourhood sizes that (on the temporal scale) outperforms setting global parameters.
Keywords: temporal collaborative filtering
Temporal query substitution for ad search BIBAKFull-Text 798-799
  Wen Zhang; Jun Yan; Shuicheng Yan; Ning Liu; Zheng Chen
Recently, information retrieval researchers have witnessed the increasing interest in query substitution for ad search. Most previous works substitute search queries via content based query similarities, and few of them take the temporal characteristics of queries into consideration. In this extended abstract, we propose a novel temporal similarity measurement for query substitution in ad search task. We firstly extract temporal features, such as burst and periodicity, from query frequency curves and then define the temporal query similarity by integrating these new features with the temporal query frequency distribution. Compared to the traditional temporal similarity measurements such as correlation coefficient, our proposed approach is more effective owing to the explicit extraction of high-level semantic query temporal features for similarity measure. The experimental results demonstrate that the proposed similarity measure can make the ads more relevant to user search queries compared to ad search without temporal features.
Keywords: ad search, query substitution, temporal similarity.
Term-based commercial intent analysis BIBAKFull-Text 800-801
  Azin Ashkan; Charles L. A. Clarke
In this work, we investigate the contribution of query terms and their corresponding ad click rates on commercial intent of queries. A probabilistic model is proposed following the hypothesis that a query is likely to receive ad clicks based on contributions from its individual terms.
Keywords: ad targeting, clickthrough, query intent, query log
Topic (query) selection for IR evaluation BIBAKFull-Text 802-803
  Jianhan Zhu; Jun Wang; Vishwa Vinay; Ingemar J. Cox
The need for evaluating large amounts of topics (queries) makes IR evaluation an uneasy task. In this paper, we study a topic selection problem for IR evaluation. The selection criterion is based on the overall difficulty of the chosen set, as well as the uncertainty of the final IR metric applied to the systems. Our preliminary experiments demonstrate that our approach helps to identify a set of topics that provides confident estimates of systems' performance while keeping the requirement of the query difficulty.
Keywords: mean-variance analysis, portfolio theory, topic selection
Topic prerogative feature selection using multiple query examples for automatic video retrieval BIBAKFull-Text 804-805
  P. Punitha; Joemon M. Jose; Anuj Goyal
Well acceptance of relevance feedback and collaborative systems has given the users to express their preferences in terms of multiple query examples. The technology devised to utilize these user preferences, is expected to mine the semantic knowledge embedded within these query examples. In this paper, we propose a video mining framework based on dynamic learning from queries, using a statistical model for topic prerogative feature selection. The proposed method is specifically designed for multiple query example scenarios. The effectiveness of the proposed framework has been established with an extensive experimentation on TRECVid2007 data collection. The results reveal that our approach achieves a performance that is in par with the best results for this corpus without the requirement of any textual data.
Keywords: feature selection, multimedia retrieval, multiple query examples, performance evaluation, result fusion
Topic set size redux BIBAKFull-Text 806-807
  Ellen M. Voorhees
The cost as well as the power and reliability of a retrieval test collection are all proportional to the number of topics included in it. Test collections created through community evaluations such as TREC generally use 50 topics. Prior work estimated the reliability of 50-topic sets by extrapolating confidence levels from those of smaller sets, and concluded that 50 topics are sufficient to have high confidence in a comparison, especially when the comparison is statistically significant. Using topic sets that actually contain 50 topics, this paper shows that statistically significant differences can be wrong, even when statistical significance is accompanied by moderately large (>10%) relative differences in scores. Further, using standardized evaluation scores rather than raw evaluation scores does not increase the reliability of these paired comparisons. Researchers should continue to be skeptical of conclusions demonstrated on only a single test collection.
Keywords: evaluation, test collections, variability
Transforming patents into prior-art queries BIBAKFull-Text 808-809
  Xiaoibng Xue; W. Bruce Croft
Searching for prior-art patents is an essential step for the patent examiner to validate or invalidate a patent application. In this paper, we consider the whole patent as the query, which reduces the burden on the user, and also makes many more potential search features available. We explore how to automatically transform the query patent into an effective search query, especially focusing on the effect of different patent fields. Experiments show that the background summary of a patent is the most useful source of terms for generating a query, even though most previous work used the patent claims.
Keywords: information retrieval, patent retrieval, prior-art search
Two-stage query segmentation for information retrieval BIBAKFull-Text 810-811
  Michael Bendersky; W. Bruce Croft; David A. Smith
Modeling term dependence has been shown to have a significant positive impact on retrieval. Current models, however, use sequential term dependencies, leading to an increased query latency, especially for long queries. In this paper, we examine two query segmentation models that reduce the number of dependencies. We find that two-stage segmentation based on both query syntactic structure and external information sources such as query logs, attains retrieval performance comparable to the sequential dependence model, while achieving a 50% reduction in query latency.
Keywords: long queries, query segmentation, term dependence
Undergraduates' evaluations of assigned search topics BIBAKFull-Text 812-813
  Earl W. Bailey; Diane Kelly; Karl Gyllstrom
This paper evaluates undergraduate students' knowledge, interests and experiences with 20 topics from the TREC Robust Track collection. The goal is to characterize these topics along several dimensions to help researchers make more informed decisions about which topics are most appropriate to use in experimental IIR evaluations with undergraduate student subjects.
Keywords: assigned tasks, motivation, task difficulty, topic selection
A unified inverted index for an efficient image and text retrieval BIBAKFull-Text 814-815
  Jonathan Mamou; Yosi Mass; Michal Shmueli-Scheuer; Benjamin Sznajder
We present an efficient method for approximate search in a combination of several metric spaces -- which are a generalization of low level image features -- using an inverted index. Our approximation gives very high recall with subsecond response time on a real data set of one million images extracted from Flickr.
   We further exploit the inverted index to improve efficiency of the query processing by combining our search in metric features with search in associated textual metadata.
Keywords: IR, approximation, inverted index, metric spaces, multimedia
Usefulness of click-through data in expert search BIBAKFull-Text 816-817
  Craig Macdonald; Ryen W. White
The task in expert finding is to identify members of an organisation with relevant expertise on a given topic. Typically, an expert search engine uses evidence from the authors of on-topic documents found in the organisation's intranet by search engines. The search result click-through behaviour of many intranet search engine users provides an additional source of evidence to identify topically-relevant documents, and via document authorship, experts. In this poster, we assess the usefulness of click-through log data for expert finding. We find that ranking authors based solely on the clicks their documents receive is reasonably effective at correctly identifying relevant experts. Moreover, we show that this evidence can successfully be integrated with an existing expert search engine to increase its retrieval effectiveness.
Keywords: click-through log data, expert finding, intranet
User-centric multi-criteria information retrieval BIBAKFull-Text 818-819
  Shawn R. Wolfe; Yi Zhang
Information retrieval models usually represent content only, and not other considerations, such as authority, cost, and recency. How could multiple criteria be utilized in information retrieval, and how would it effect the results? In our experiments, using multiple user-centric criteria always produced better results than a single criteria.
Keywords: multi-criteria decision making
Users' stopping behaviors and estimates of recall BIBAKFull-Text 820-821
  Maureen Dostert; Diane Kelly
This paper investigates subjects' stopping behaviors and estimates of recall in an interactive information retrieval (IIR) experiment. Subjects completed four recall-oriented search tasks and were asked to estimate how many of the relevant documents they believed they had found after each task. Subjects also responded to an interview question probing their reasons for stopping a search. Results show that most subjects believed they found about 51-60% of the relevant documents and that this estimate was correlated positively with number of documents saved and actual recall, even though subjects' recall estimates were inaccurate. Reasons given for stopping search are also explored.
Keywords: decision making, recall estimates, stopping behavior, user study
Using dynamic markov compression to detect vandalism in the wikipedia BIBAKFull-Text 822-823
  Kelly Y. Itakura; Charles L. A. Clarke
We apply the Dynamic Markov Compression model to detect spam edits in the Wikipedia. The method appears to outperform previous efforts based on compression models, providing performance comparable to methods based on manually constructed rules.
Keywords: information retrieval, spam filtering
Using wikipedia categories for ad hoc search BIBAKFull-Text 824-825
  Rianne Kaptein; Marijn Koolen; Jaap Kamps
In this paper we explore the use of category information for ad hoc retrieval in Wikipedia. We show that techniques for entity ranking exploiting this category information can also be applied to ad hoc topics and lead to significant improvements. Automatically assigned target categories are good surrogates for manually assigned categories, which perform only slightly better.
Keywords: ad hoc retrieval, category information, wikipedia
Visualizing the problems with the INEX topics BIBAKFull-Text 826
  Andrew Trotman; Maria del Rocio Gomez Crisostomo; Mounia Lalmas
Topics form a crucial component of a test collection. We show, through visualization, that the INEX 2008 topics have shortcomings, which questions their validity for evaluating XML retrieval effectiveness.
Keywords: INEX, IR methodology, XML-IR, element retrieval
What queries are likely to recur in web search? BIBAKFull-Text 827-828
  Dell Zhang; Jinsong Lu
We study the recurrence dynamics of queries in Web search by analysing a large real-world query log dataset. We find that query frequency is more useful in predicting collective query recurrence whereas query recency is more useful in predicting individual query recurrence. Our findings provide valuable insights for understanding and improving Web search.
Keywords: personalisation, query log analysis, web mining, web search
When is query performance prediction effective? BIBAKFull-Text 829-830
  Claudia Hauff; Leif Azzopardi
The utility of Query Performance Prediction (QPP) methods is commonly evaluated by reporting correlation coefficients to denote how well the methods perform at predicting the retrieval performance of a set of queries. However, a quintessential question remains unexplored: how strong does the correlation need to be in order to realize an increase in retrieval performance? In this work, we address this question in the context of Selective Query Expansion (SQE) and perform a large-scale experiment. The results show that to consistently and predictably improve retrieval effectiveness in the ideal SQE setting, a Kendall's Tau correlation of tau>=0.5 is required, a threshold which most existing query performance prediction methods fail to reach.
Keywords: evaluation, query performance prediction
Who said what to whom?: capturing the structure of debates BIBAKFull-Text 831-832
  Rianne Kaptein; Maarten Marx; Jaap Kamps
Transcripts of meetings are a document genre characterized by a complex narrative structure. The essence is not only what is said, but also by who and to whom. This paper investigates whether we can use semantic annotations like the speaker in order to capture this debate structure, as well as the related content of the debate. The structure is visualized in a graph, while the content is condensed into word clouds, that are created using a parsimonious language model. Evaluation shows that both tools adequately capture the structure and content of the debate at an aggregated level.
Keywords: political data, visualization, word clouds


EvaluatIR: an online tool for evaluating and comparing IR systems BIBFull-Text 833
  Timothy G. Armstrong; Alistair Moffat; William Webber; Justin Zobel
Expertise search in academia using facets BIBKFull-Text 834
  Duncan McDougall; Craig Macdonald
Keywords: expert search, faceted search, terrier
Exploiting social context for expertise propagation BIBKFull-Text 835
  Greg P. Milette; Michael K. Schneider; Kathy Ryall; Robert Hyland
Keywords: expertise finding, recommendation, social network analysis, topic models
Social networks and discovery in the enterprise (SaND) BIBKFull-Text 836
  Inbal Ronen; Elad Shahar; Sigalit Ur; Erel Uziel; Sivan Yogev; Naama Zwerdling; David Carmel; Ido Guy; Nadav Har'el; Shila Ofek-Koifman
Keywords: enterprise search, social network, social search
Sifting micro-blogging stream for events of user interest BIBAKFull-Text 837
  Maxim Grinev; Maria Grineva; Alexander Boldakov; Leonid Novak; Andrey Syssoev; Dmitry Lizorkin
Micro-blogging is a new form of social communication that encourages users to share information about anything they are seeing or doing, the motivation facilitated by the ability to post brief text messages through a variety of devices. Twitter, the most popular micro-blogging tool, is exhibiting rapid growth [3]: up to 11% of online Americans are using Twitter by December 2008, compared to 6% in May 2008. Due to its nature, micro-blogosphere has unique features: (i) It is a source of extremely up-to-date information about what is happening in the world; (ii) It captures the wisdom of millions of people and covers a broad range of domains. These features make micro-blogosphere more than a popular medium of social communication: we believe that it has additionally become a valuable source of extremely up-to-date news on virtually any subject of user interest. Making use of micro-blogosphere in this new role we meet the following challenges: (A) Since any given subject is generally mentioned in the micro-blogging stream on the continuous basis, a method is needed for locating periods of news on this subject. (B) Additionally, even for such periods, stream filtering is required for removing noise and for extracting messages that best describe the news. To address these challenges we make and exploit the following observations: (A) For an arbitrary subject, events that catch user interest gain distinguishably more attention than the average mentioning of the subject resulting in message activity bursts for it. (B) Most of the messages in an activity burst describe common event in close variations -- either rephrased or "retweeted" between the users. We demonstrate TweetSieve -- a system that allows obtaining news on any given subject by sifting the Twitter stream. Our work is related to frequency-based analysis applied to blogs [1], but higher latency and lower coverage in blogs makes the analysis less effective than in case of micro-blogs. In TweetSieve demo, the user is able to express the subject of her interest by an arbitrary search string. The system shows the period of events occuring for the subject and outputs tweets that best describe each of the events. Figure 1 shows a screenshot of the system for "Semantic search" as a sample subject. The underlying process consists of two steps: Identifying activity bursts. Counting the messages matching the search string in the stream over time, the frequency curve is constructed. Activity bursts in the curve are identified by taking the periods of frequency exceeding the standard deviation from the average. Selecting messages that best describe news events. For the set of all messages matching the search string in an activity burst, we apply the message-granular variation of our keyphrase extraction algorithm [2] that is specifically suited to efficiently filtering noisy data. The algorithm clusters messages with respect to their similarity to each other and chooses central messages from the most dense clusters. As the similarity measure we use Jaccard coefficient for the "bag of words" representation of messages. The demonstration illustrates the potential of our approach in bringing news acquisition to a new level of promptness and coverage range.
Keywords: micro-blogging, news, twitter
Incentives for social annotation BIBAKFull-Text 838
  Heather Roinestad; John Burgoon; Benjamin Markines; Filippo Menczer
The effectiveness of community-driven annotation, such as social bookmarking, depends on user participation. Since the participation of many users is motivated by selfish reasons, an effective way to encourage participation is to create useful or entertaining applications. We demo two such tools -- a browser extension and a game.
Keywords: interactive IR and visualization, social tagging, web 2.0 IR
Accommodating colorblind users in image search BIBAKFull-Text 839
  Meng Wang; Bo Liu; Linjun Yang; Xian-Sheng Hua
There are about 8% of men and 0.8% of women suffering from colorblindness. Due to certain loss of color information, the existing image search techniques may not provide satisfactory results for these users. In this demonstration, we show an image search system that can accommodate colorblind users. It can help these special users find and enjoy what they want by providing multiple services for them, including search results reranking, image recoloring and color indication.
Keywords: colorblindness, image search
Generic similarity search engine demonstrated by an image retrieval application BIBAKFull-Text 840
  David Novak; Michal Batko; Pavel Zezula
We introduce a generic engine for large-scale similarity search and demonstrate it on a set of 100 million Flickr images.
Keywords: content-based search, metric space, scalability, similarity search
Pharos: an audiovisual search platform BIBKFull-Text 841
  Alessandro Bozzon; Marco Brambilla; Piero Fraternali; Francesco Nucci; Stefan Debald; Eric Moore; Wolfgang Neidl; Michel Plu; Patrick Aichroth; Olli Pihlajamaa; Cyril Laurier; Serge Zagorac; Gerhard Backfried; Daniel Weinland; Vincenzo Croce
Keywords: audiovideo, multimedia, search, search based application
Agate: information gathering for risk monitoring BIBAKFull-Text 842
  Arnaud Saval; Yann Mombrun
Internet sources provide new ways to acquire information about risk and to follow up the evolution of natural disasters in real time. We will present first, an architecture dedicated to unstructured information processing. Then, we will show how the spatial and temporal representation extended with semantic properties answers information ambiguity problem. Agate platform was evaluated by a non-governmental organization which filtered alerts according to types of disaster and their locations.
Keywords: crawling, geographic, risks, search
wikiSearch: enabling interactivity in search BIBAKFull-Text 843
  Elaine G. Toms; Tayze Mackenzie; Chris Jordan; Sam Hall
wikiSearch, is a search engine customized for the Wikipedia corpus but with design features that may be generalized to other search systems. Its features enhance basic functionality and enable more fluid interactivity while supporting both workflow in the search process and the experimental process used in lab testing.
Keywords: interactive IR, search engine, search workflow, wikipedia
CDPRanking: discovering and ranking cross-document paths between entities BIBKFull-Text 844
  Wei Jin; Xin Wu
Keywords: graph mining, information extraction, text mining

Doctoral consortium

Context-based health information retrieval BIBKFull-Text 845
  Carla Lopes
Keywords: contextual information retrieval, health information retrieval, user context
Modeling uncertainty in video retrieval: a retrieval model for uncertain semantic representations of videos BIBKFull-Text 846
  Robin Aly
Keywords: concept based search, multimedia retrieval, probabilistic information retrieval, video retrieval
Toponym ambiguity in geographical information retrieval BIBAKFull-Text 847
  Davide Buscaldi
The objectives of this research work is to study the effects of toponym (place name) ambiguity in the Geographical Information Retrieval (GIR) task. Our experience with GIR systems shows that toponym ambiguity may be an important factor in the inability of these systems to take advantage from geographical knowledge. Previous studies over ambiguity and Information Retrieval (IR) suggested that disambiguation may be useful in some specific IR scenario. We suppose that GIR may constitute such a scenario. This preliminary study was carried out over the WordNet based, manually disambiguated collection developed for the CLIR-WSD task, using the GeoCLEF collection of 100 geographically related topics. The employed GIR system was based on the GeoWorSE system that participated in GeoCLEF 2008. The experiments were carried out considering the manual disambiguation and comparing this result with those obtained by randomly disambiguating the document collection and those obtained by using always the most common referent. The obtained results show no significant difference in the overall results, although the work gave an insight into some errors that are produced by toponym ambiguity and how they may affect the results. These preliminary results also suggest that WordNet is not a suitable resource for the planned research.
Keywords: geographical information retrieval, information retrieval, toponym disambiguation, word sense disambiguation, wordnet
Exploiting temporal information in retrieval of archived documents BIBAKFull-Text 848
  Nattiya Kanhabua
In a text retrieval community, many researchers have shown a good quality of searching a current snapshot of the Web. However, only a small number have demonstrated a good quality of searching a long-term archival domain, where documents are preserved for a long time, i.e., ten years or more. In such a domain, a search application is not only applicable for archivists or historians, but also in a context of national library and enterprise search (searching document repositories, emails, etc.). In the rest of this paper, we will explain three problems of searching document archives and propose possible approaches to solve these problems. Our main research question is: How to improve the quality of search in a document archive using temporal information?
Keywords: language models, query expansion, ranking, temporal search
Using document structure for automatic summarization BIBKFull-Text 849
  Aurélien Bossard
Keywords: automatic summarization
Topic structure for information retrieval BIBAKFull-Text 850
  Jiyin He
In my research, I propose a coherence measure, with the goal of discovering and using topic structures within and between documents, of which I explore its extensions and applications in information retrieval.
Keywords: semantic relatedness, topic structure
Using computational community interest as an indicator for ranking BIBAKFull-Text 851
  Xiaozhong Liu
Ranking documents in response to users' information needs is a challenging task, due, in part, to the dynamic nature of users' interests with respect to a query. I hypothesize that the interests of a given user are similar to the interests of the broader community of which he is a part and propose an innovative method that uses social media to characterize the interests of the community and use this characterization to improve future rankings. By generating a community interest vector (CIV) for a given query, we use community interest to alter the ranking score of individual documents retrieved by the query. The CIV is based on a continuously updated set of recent (daily or past few hours) user-oriented text data. The user-oriented data can be user blogs or user comment tagged news. Preliminary evaluation shows that the new ranking method significantly improves ranking performance.
Keywords: blog, comments, community interest, ranking, user
Affective adaptive retrieval: study of emotion in adaptive retrieval BIBKFull-Text 852
  Yashar Moshfeghi
Keywords: adaptive retrieval, affect, emotion, filtering, personalisation, recommendation
A study on performance volatility in information retrieval BIBAKFull-Text 853
  Mehdi Hosseini
A common practice in comparative evaluation of information retrieval (IR) systems is to create a test collection comprising a set of topics (queries), a document corpus, and relevance judgments, and to monitor the performance of retrieval systems over such a collection. A typical evaluation of a system involves computing a performance metric, e.g., Average Precision (AP), for each topic and then using the average performance metric, e.g., Mean Average Precision (MAP) to express the overall system performance. However, averages do not capture all the important aspects of system performance, and used alone may not thoroughly express system effectiveness, i.e., average of performance can mask large variance in individual topic effectiveness. The author hypothesis is that, in addition to the average of overall performance, attention needs to be paid to how a system performance varies across topics. This variability can be measured by calculating the standard deviation (SD) of individual performance scores. We refer to this performance variation as Volatility.
Keywords: evaluation, prediction, volatility
Novelty detection across different source types and languages BIBKFull-Text 854
  Johannes Schanda
Keywords: new event detection, novelty detection
Personalizing information retrieval using task features, topic knowledge, and task product BIBAKFull-Text 855
  Jingjing Liu
Personalization of information retrieval tailors search towards individual users to meet their particular information needs. Personalization systems obtain additional information about users and their contexts beyond the queries they submit to the systems, and use this information to bring the desired documents to top ranks. The additional information can come from various sources: user preferences, user behaviors, contexts, etc. [1] To avoid users taking extra effort in providing explicit preferences, most personalization approaches have adopted an implicit strategy to obtain users' interests from their behaviors and/or contexts, such as query history, browsing history, and so on.
   Task, topic knowledge, and desktop information have been used as evidence for personalization. Tailoring display time threshold based on task information was found to improve implicit relevance feedback performance [5]. User's familiarity with search topics was found to be positively correlated with reading time but negatively correlated with search efficacy [3]. This indicated the possibility of inferring topic familiarity from searching behavior. Desktop information was also found to be a good source for personalization [2, 4], and personalization using only those files relevant to user queries are more effective than using the entire desktop data [2]. Since search often happens in a work task environment, we examine how user-generated products and retained documents can help improve search performance.
   To these ends, this study looks at how the following factors can help personalize search: features of user's work tasks (including task stage and task type), user's familiarity with work task topic, user's saving and using behaviors, and task product(s) that the user generated for the work task. Work tasks are designed to include multiple sub-tasks, each being a stage. Two types of sub-task interdependence are considered: parallel, where the sub-tasks do not depend upon each other, or dependent, where one sub-task depends upon the accomplishment of other sub-task(s). The study examines the interaction effects of these factors, dwell time, and document usefulness. It also looks at a personalization technique that extracts terms for query expansion from work task product(s) and user behaviors. There are three research questions: RQ1: Does the stage of the user's task help predict document usefulness from dwell time in the parallel and the dependent tasks, respectively? RQ2. Does the user's familiarity with work task topic help predict document usefulness from dwell time in the parallel and the dependent tasks, respectively? RQ3. Do user's task product(s) and saving and using behaviors help with query disambiguation?
   Twenty-four participants are recruited, each coming three times (as three experiment sessions) to a usability laboratory working on three sub-tasks in a general task, either a parallel or a dependent. Take the parallel task as an example. It asks the participants to write a three-section article on hybrid cars, and each section is finished in one session. The three sections focus on Honda Civic sedan hybrid, Nissan Altima sedan hybrid, and Toyota Camry sedan hybrid, respectively. When searching for information, half of the participants use a query expansion condition, where the system recommends search terms based on their work in previous sessions, and the other half use a non-query expansion system condition. Data are collected by three major means: logging software that records user-system interactions, an eye tracker that records eye movement, and questionnaires that elicit users' background information and their perceptions on a number of aspects. The results will provide new evidence on personalizing search by taking account of the examined contextual factors.
Keywords: contextual factors in IR, personalization of IR, task
Exploiting memory cues in personal lifelog retrieval BIBAKFull-Text 856
  Yi Chen
In recent years personal lifelogs (PLs) have become an emerging field of research. PLs are collections of digital data taken from an individual's life experiences, gathered from both digital and physical worlds. PLs collections can potentially be collected over periods of years and thus can be very large. In our research group, four researchers have been engaged in collection of individual one-year-long PLs data sets. These collections include logs of computer and mobile phone activities, digital photos, in particular passively captured Microsoft SenseCam images, geo-location (via GPS), surrounding people or objects (via Bluetooth), and various biometric data. The complex data types and heterogeneous structure of this corpus brings great challenges to traditional content based information retrieval (IR). Yet, the rich connections integral to personal experience offer exciting potential opportunities to leverage features from human memory and associated models to support retrieval.
   My PhD project aims to develop an interface to assist IR from PLs. In doing this I plan to exploit features in human memory, in particular the mechanisms in associative memory models. Previous studies in personal information re-finding have explored the use of generally well-remembered attributes or metadata of the search targets, such as date, item type/format, authors of documents [1]. There have also been systems which utilize associated computer items or real life events (e.g. [2, 3]) to assist re-finding tasks. However, few of them looked into exactly what types of associated items/events people tend to recall. I plan to explore associations among PL items, as well as their attributes regarding their role in an individual's memory, since I believe that some associations and types of metadata which are available and feasible for use, may have been omitted in existing systems; due to the methods used in previous research where the users' behaviour may have been guided by the searching or management tools available to them. As indicated by some information seeking studies (e.g. [4]), different search context, search motivation, or personal differences such as habits, may lead to varied recall of contents and information seeking behaviours. For this reason, I will also investigate: the influences on personal information re-finding behaviour of context, lifestyle, and differences in prior personal experiences of IR tools. Results from these studies will be used to explore personalisation in search, e.g. to dynamically increase the importance of geo-location in scoring of search results for subjects who travel frequently.
   As indicated by [4], people tend to make small steps to approach the targets they are looking for, rather than trying to do this in a single search with a "perfect query" comprising all of the relevant details, partially because of their trouble in recalling them. To relieve users from the heavy cognitive burden of recalling the exact target, and on the other hand to reduce the rate of inaccurate queries caused by false recall, my proposed interface will be based on browsing and recognizing, instead of traditional recalling and searching. For example, a user will be able to browse and narrow results by recognizing landmarks and estimating the target activities' time range from the user's digital or physical life [5]. An important issue in my work will be to consider the challenges of evaluating my work with only a very limited number of PL datasets. To partially address this issue, I am currently in engaged in a number of smaller scale diary studies of searching experiences for larger numbers of subjects.
Keywords: associative memory, interactive IR, personal lifelogs