HCI Bibliography Home | HCI Conferences | IR Archive | Detailed Records | RefWorks | EndNote | Hide Abstracts
IR Tables of Contents: 03040506070809101112131415

Proceedings of the 2013 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Fullname:Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval
Editors:Gareth J.F. Jones; Páraic Sheridan; Diane Kelly; Maarten de Rijke; Tetsuya Sakai
Location:Dublin, Ireland
Dates:2013-Jul-28 to 2013-Aug-01
Standard No:ISBN: 978-1-4503-2034-4; ACM DL: Table of Contents; hcibib: IR13
Links:Conference Website
  1. Keynote address
  2. User behaviour
  3. Social media and network analysis I
  4. Queries I
  5. Users and interactive IR I
  6. Efficiency I
  7. Topic modeling
  8. Users and interactive IR II
  9. Recommender systems
  10. Retrieval models and ranking I
  11. Time
  12. Evaluation I
  13. Multimedia
  14. Search sessions
  15. Click models
  16. Social media and network analysis II
  17. Queries II
  18. Diversity
  19. Evaluation II
  20. Retrieval models and ranking II
  21. Efficiency II
  22. Short Papers 1 -- evaluation
  23. Short papers 1 -- filtering and recommending
  24. Short papers 1 -- multimedia IR
  25. Short papers 1 -- queries and query analysis
  26. Short papers 1 -- retrieval models and ranking
  27. Short papers 1 -- social media IR
  28. Short papers 1 -- topic models
  29. Short papers 1 -- users and interactive IR
  30. Short papers 2 -- evaluation
  31. Short papers 2 -- filtering and recommending
  32. Short papers 2 -- multimedia IR
  33. Short papers 2 -- queries and query analysis
  34. Short papers 2 -- retrieval models and ranking
  35. Short papers 2 -- social media IR
  36. Short papers 2 -- topic models
  37. Short papers 2 -- users and interactive IR
  38. Demonstrations 1 -- Users and interactive IR
  39. Demonstrations 1 -- IR and structured data
  40. Demonstrations 1 -- information extraction
  41. Demonstrations 1 -- filtering and recommending
  42. Demonstrations 1 -- classification and clustering
  43. Demonstrations 2 -- users and interactive IR
  44. Demonstrations 2 -- IR and structured data
  45. Demonstrations 2 -- information extraction
  46. Demonstrations 2 -- filtering and recommending
  47. Demonstrations 2 -- classification and clustering
  48. Tutorials
  49. Workshops
  50. Doctoral consortium

Keynote address

Riding the multimedia big data wave BIBAFull-Text 1-2
  John R. Smith
In this talk we present a perspective across multiple industry problems, including safety and security, medical, Web, social and mobile media, and motivate the need for large-scale analysis and retrieval of multimedia data. We describe a multi-layer architecture that incorporates capabilities for audio-visual feature extraction, machine learning and semantic modeling and provides a powerful framework for learning and classifying contents of multimedia data. We discuss the role semantic ontologies for representing audio-visual concepts and relationships, which are essential for training semantic classifiers. We discuss the importance of using faceted classification schemes in particular for organizing multimedia semantic concepts in order to achieve effective learning and retrieval. We also show how training and scoring of multimedia semantics can be implemented on big data distributed computing platforms to address both massive-scale analysis and low-latency processing. We describe multiple efforts at IBM on image and video analysis and retrieval, including IBM Multimedia Analysis and Retrieval System (IMARS), and show recent results for semantic-based classification and retrieval. We conclude with future directions for improving analysis of multimedia through interactive and curriculum-based techniques for multimedia semantics-based learning and retrieval.

User behaviour

Beliefs and biases in web search BIBAFull-Text 3-12
  Ryen White
People's beliefs, and unconscious biases that arise from those beliefs, influence their judgment, decision making, and actions, as is commonly accepted among psychologists. Biases can be observed in information retrieval in situations where searchers seek or are presented with information that significantly deviates from the truth. There is little understanding of the impact of such biases in search. In this paper we study search-related biases via multiple probes: an exploratory retrospective survey, human labeling of the captions and results returned by a Web search engine, and a large-scale log analysis of search behavior on that engine. Targeting yes-no questions in the critical domain of health search, we show that Web searchers exhibit their own biases and are also subject to bias from the search engine. We clearly observe searchers favoring positive information over negative and more than expected given base rates based on consensus answers from physicians. We also show that search engines strongly favor a particular, usually positive, perspective, irrespective of the truth. Importantly, we show that these biases can be counterproductive and affect search outcomes; in our study, around half of the answers that searchers settled on were actually incorrect. Our findings have implications for search engine design, including the development of ranking algorithms that con-sider the desire to satisfy searchers (by validating their beliefs) and providing accurate answers and properly considering base rates. Incorporating likelihood information into search is particularly important for consequential tasks, such as those with a medical focus.
Improving search result summaries by using searcher behavior data BIBAFull-Text 13-22
  Mikhail Ageev; Dmitry Lagun; Eugene Agichtein
Query-biased search result summaries, or "snippets", help users decide whether a result is relevant for their information need, and have become increasingly important for helping searchers with difficult or ambiguous search tasks. Previously published snippet generation algorithms have been primarily based on selecting document fragments most similar to the query, which does not take into account which parts of the document the searchers actually found useful. We present a new approach to improving result summaries by incorporating post-click searcher behavior data, such as mouse cursor movements and scrolling over the result documents. To achieve this aim, we develop a method for collecting behavioral data with precise association between searcher intent, document examination behavior, and the corresponding document fragments. In turn, this allows us to incorporate page examination behavior signals into a novel Behavior-Biased Snippet generation system (BeBS). By mining searcher examination data, BeBS infers document fragments of most interest to users, and combines this evidence with text-based features to select the most promising fragments for inclusion in the result summary. Our extensive experiments and analysis demonstrate that our method improves the quality of result summaries compared to existing state-of-the-art methods. We believe that this work opens a new direction for improving search result presentation, and we make available the code and the search behavior data used in this study to encourage further research in this area.
How query cost affects search behavior BIBAFull-Text 23-32
  Leif Azzopardi; Diane Kelly; Kathy Brennan
affects how users interact with a search system. Microeconomic theory is used to generate the cost-interaction hypothesis that states as the cost of querying increases, users will pose fewer queries and examine more documents per query. A between-subjects laboratory study with 36 undergraduate subjects was conducted, where subjects were randomly assigned to use one of three search interfaces that varied according to the amount of physical cost required to query: Structured (high cost), Standard (medium cost) and Query Suggestion (low cost). Results show that subjects who used the Structured interface submitted significantly fewer queries, spent more time on search results pages, examined significantly more documents per query, and went to greater depths in the search results list. Results also showed that these subjects spent longer generating their initial queries, saved more relevant documents and rated their queries as more successful. These findings have implications for the usefulness of microeconomic theory as a way to model and explain search interaction, as well as for the design of query facilities.
Search engine switching detection based on user personal preferences and behavior patterns BIBAFull-Text 33-42
  Denis Savenkov; Dmitry Lagun; Qiaoling Liu
Sometimes, during a search task users may switch from one search engine to another for several reasons, e.g., dissatisfaction with the current search results or desire for broader topic coverage. Detecting the fact of switching is difficult but important for understanding users' satisfaction with the search engine and the complexity of their search tasks, leading to economic significance for search providers. Previous research on switching detection mainly focused on studying different signals useful for the task and particular reasons for switching. Although it is known that switching is a personal choice of a user and different users have different search behavior, little has been done to understand how these differences could be used for switching detection. In this paper we study the effectiveness of learning personal behavior patterns for switching detection and present a personalized approach which uses user's session history containing sessions with and without switches. Experiments show that users' personal habits and behavior patterns are indeed among the most informative signals. Our findings can be used by a search log analyzer for engine switching detection and potentially other log mining problems, thus providing valuable signals for search providers to improve user experience.

Social media and network analysis I

Emerging topic detection for organizations from microblogs BIBAFull-Text 43-52
  Yan Chen; Hadi Amiri; Zhoujun Li; Tat-Seng Chua
Microblog services have emerged as an essential way to strengthen the communications among individuals and organizations. These services promote timely and active discussions and comments towards products, markets as well as public events, and have attracted a lot of attentions from organizations. In particular, emerging topics are of immediate concerns to organizations since they signal current concerns of, and feedback by their users. Two challenges must be tackled for effective emerging topic detection. One is the problem of real-time relevant data collection and the other is the ability to model the emerging characteristics of detected topics and identify them before they become hot topics. To tackle these challenges, we first design a novel scheme to crawl the relevant messages related to the designated organization by monitoring multi-aspects of microblog content, including users, the evolving keywords and their temporal sequence. We then develop an incremental clustering framework to detect new topics, and employ a range of content and temporal features to help in promptly detecting hot emerging topics. Extensive evaluations on a representative real-world dataset based on Twitter data demonstrate that our scheme is able to characterize emerging topics well and detect them before they become hot topics.
Pseudo test collections for training and tuning microblog rankers BIBAFull-Text 53-62
  Richard Berendsen; Manos Tsagkias; Wouter Weerkamp; Maarten de Rijke
Recent years have witnessed a persistent interest in generating pseudo test collections, both for training and evaluation purposes. We describe a method for generating queries and relevance judgments for microblog search in an unsupervised way. Our starting point is this intuition: tweets with a hashtag are relevant to the topic covered by the hashtag and hence to a suitable query derived from the hashtag. Our baseline method selects all commonly used hashtags, and all associated tweets as relevance judgments; we then generate a query from these tweets. Next, we generate a timestamp for each query, allowing us to use temporal information in the training process. We then enrich the generation process with knowledge derived from an editorial test collection for microblog search.
   We use our pseudo test collections in two ways. First, we tune parameters of a variety of well known retrieval methods on them. Correlations with parameter sweeps on an editorial test collection are high on average, with a large variance over retrieval algorithms. Second, we use the pseudo test collections as training sets in a learning to rank scenario. Performance close to training on an editorial test collection is achieved in many cases. Our results demonstrate the utility of tuning and training microblog search algorithms on automatically generated training material.
Learning latent friendship propagation networks with interest awareness for link prediction BIBAFull-Text 63-72
  Jun Zhang; Chaokun Wang; Philip S. Yu; Jianmin Wang
It's well known that the transitivity of friendship is a popular sociological principle in social networks. However, it's still unknown that to what extent people's friend-making behaviors follow this principle and to what extent it can benefit the link prediction task.
   In this paper, we try to adopt this sociological principle to explain the evolution of networks and study the latent friendship propagation. Unlike traditional link prediction approaches, we model link formation as results of individuals' friend-making behaviors combined with personal interests. We propose the Latent Friendship Propagation Network (LFPN), which depicts the evolution progress of one's egocentric network and reveals future growth potentials driven by the transitivity of friendship based on personal interests. We model individuals' social behaviors using the Latent Friendship Propagation Model (LFPM), a probabilistic generative model from which the LFPN can be learned effectively. To evaluate the power of the friendship propagation in link prediction, we design LFPN-RW which models the friend-making behavior as a random walk upon the LFPN naturally and captures the co-influence effect of the friend circles as well as personal interests to provide more accurate prediction.
   Experimental results on real-world datasets show that LFPN-RW outperforms the state-of-the-art approaches. This convinces that the transitivity of friendship actually plays important roles in the evolution of social networks.
An experimental study on implicit social recommendation BIBAFull-Text 73-82
  Hao Ma
Social recommendation problems have drawn a lot of attention recently due to the prevalence of social networking sites. The experiments in previous literature suggest that social information is very effective in improving traditional recommendation algorithms. However, explicit social information is not always available in most of the recommender systems, which limits the impact of social recommendation techniques. In this paper, we study the following two research problems: (1) In some systems without explicit social information, can we still improve recommender systems using implicit social information? (2) In the systems with explicit social information, can the performance of using implicit social information outperform that of using explicit social information? In order to answer these two questions, we conduct comprehensive experimental analysis on three recommendation datasets. The result indicates that: (1) Implicit user and item social information, including similar and dissimilar relationships, can be employed to improve traditional recommendation methods. (2) When comparing implicit social information with explicit social information, the performance of using implicit information is slightly worse. This study provides additional insights to social recommendation techniques, and also greatly widens the utility and spreads the impact of previous and upcoming social recommendation approaches.

Queries I

Task-aware query recommendation BIBAFull-Text 83-92
  Henry Feild; James Allan
When generating query recommendations for a user, a natural approach is to try and leverage not only the user's most recently submitted query, or reference query, but also information about the current search context, such as the user's recent search interactions. We focus on two important classes of queries that make up search contexts: those that address the same information need as the reference query (on-task queries), and those that do not (off-task queries). We analyze the effects on query recommendation performance of using contexts consisting of only on-task queries, only off-task queries, and a mix of the two. Using TREC Session Track data for simulations, we demonstrate that on-task context is helpful on average but can be easily overwhelmed when off-task queries are interleaved -- a common situation according to several analyses of commercial search logs. To minimize the impact of off-task queries on recommendation performance, we consider automatic methods of identifying such queries using a state of the art search task identification technique. Our experimental results show that automatic search task identification can eliminate the effect of off-task queries in a mixed context.
   We also introduce a novel generalized model for generating recommendations over a search context. While we only consider query text in this study, the model can handle integration over arbitrary user search behavior, such as page visits, dwell times, and query abandonment. In addition, it can be used for other types of recommendation, including personalized web search.
Extracting query facets from search results BIBAFull-Text 93-102
  Weize Kong; James Allan
Web search queries are often ambiguous or multi-faceted, which makes a simple ranked list of results inadequate. To assist information finding for such faceted queries, we explore a technique that explicitly represents interesting facets of a query using groups of semantically related terms extracted from search results. As an example, for the query "baggage allowance", these groups might be different airlines, different flight types (domestic, international), or different travel classes (first, business, economy). We name these groups query facets and the terms in these groups facet terms. We develop a supervised approach based on a graphical model to recognize query facets from the noisy candidates found. The graphical model learns how likely a candidate term is to be a facet term as well as how likely two terms are to be grouped together in a query facet, and captures the dependencies between the two factors. We propose two algorithms for approximate inference on the graphical model since exact inference is intractable. Our evaluation combines recall and precision of the facet terms with the grouping quality. Experimental results on a sample of web queries show that the supervised method significantly outperforms existing approaches, which are mostly unsupervised, suggesting that query facet extraction can be effectively learned.
Learning to personalize query auto-completion BIBAFull-Text 103-112
  Milad Shokouhi
Query auto-completion (QAC) is one of the most prominent features of modern search engines. The list of query candidates is generated according to the prefix entered by the user in the search box and is updated on each new key stroke. Query prefixes tend to be short and ambiguous, and existing models mostly rely on the past popularity of matching candidates for ranking. However, the popularity of certain queries may vary drastically across different demographics and users. For instance, while instagram and imdb have comparable popularities overall and are both legitimate candidates to show for prefix i, the former is noticeably more popular among young female users, and the latter is more likely to be issued by men.
   In this paper, we present a supervised framework for personalizing auto-completion ranking. We introduce a novel labelling strategy for generating offline training labels that can be used for learning personalized rankers. We compare the effectiveness of several user-specific and demographic-based features and show that among them, the user's long-term search history and location are the most effective for personalizing auto-completion rankers. We perform our experiments on the publicly available AOL query logs, and also on the larger-scale logs of Bing. The results suggest that supervised rankers enhanced by personalization features can significantly outperform the existing popularity-based base-lines, in terms of mean reciprocal rank (MRR) by up to 9%.
Leveraging conceptual lexicon: query disambiguation using proximity information for patent retrieval BIBAFull-Text 113-122
  Parvaz Mahdabi; Shima Gerani; Jimmy Xiangji Huang; Fabio Crestani
Patent prior art search is a task in patent retrieval where the goal is to rank documents which describe prior art work related to a patent application. One of the main properties of patent retrieval is that the query topic is a full patent application and does not represent a focused information need. This query by document nature of patent retrieval introduces new challenges and requires new investigations specific to this problem. Researchers have addressed this problem by considering different information resources for query reduction and query disambiguation. However, previous work has not fully studied the effect of using proximity information and exploiting domain specific resources for performing query disambiguation.
   In this paper, we first reduce the query document by taking the first claim of the document itself. We then build a query-specific patent lexicon based on definitions of the International Patent Classification (IPC). We study how to expand queries by selecting expansion terms from the lexicon that are focused on the query topic. The key problem is how to capture whether an expansion term is focused on the query topic or not. We address this problem by exploiting proximity information. We assign high weights to expansion terms appearing closer to query terms based on the intuition that terms closer to query terms are more likely to be related to the query topic.
   Experimental results on two patent retrieval datasets show that the proposed method is effective and robust for query expansion, significantly outperforming the standard pseudo relevance feedback (PRF) and existing baselines in patent retrieval.

Users and interactive IR I

Aggregated search interface preferences in multi-session search tasks BIBAFull-Text 123-132
  Marc Bron; Jasmijn van Gorp; Frank Nack; Lotte Belice Baltussen; Maarten de Rijke
Aggregated search interfaces provide users with an overview of results from various sources. Two general types of display exist: tabbed, with access to each source in a separate tab, and blended, which combines multiple sources into a single result page. Multi-session search tasks, e.g., a research project, consist of multiple stages, each with its own sub-tasks. Several factors involved in multi-session search tasks have been found to influence user search behavior. We investigate whether user preference for source presentation changes during a multi-session search task.
   The dynamic nature of multi-session search tasks makes the design of a controlled experiment a non-trivial challenge. We adopt a methodology based on triangulation and conduct two types of observational study: a longitudinal study and a laboratory study. In the longitudinal study we follow the use of tabbed and blended displays by 25 students during a project.
   We find that while a tabbed display is used more than a blended display, subjects repeatedly switch between displays during the project. Use of the tabbed display is motivated by a need to zoom in on a specific source, while the blended display is used to explore available material across sources whenever the information need changes.
   In a laboratory study 44 students completed a multi-session search task composed of three sub-tasks, the first with a tabbed display, the second and third with blended displays. The tasks were manipulated by either providing three task about the same topic or about three different topics. We find that a stable information need over multiple sub-tasks negatively influences perceived usability of the blended displays, while we do not find an influence when the information need changes.
An effective implicit relevance feedback technique using affective, physiological and behavioural features BIBAFull-Text 133-142
  Yashar Moshfeghi; Joemon M. Jose
The effectiveness of various behavioural signals for implicit relevance feedback models has been exhaustively studied. Despite the advantages of such techniques for a real time information retrieval system, most of the behavioural signals are noisy and therefore not reliable enough to be employed. Among many, a combination of dwell time and task information has been shown to be effective for relevance judgement prediction. However, the task information might not be available to the system at all times. Thus, there is a need for other sources of information which can be used as a substitute for task information. Recently, affective and physiological signals have shown promise as a potential source of information for relevance judgement prediction. However, their accuracy is not high enough to be applicable on their own.
   This paper investigates whether affective and physiological signals can be used as a complementary source of information for behavioural signals (i.e. dwell time) to create a reliable signal for relevance judgement prediction. Using a video retrieval system as a use case, we study and compare the effectiveness of the affective and physiological signals on their own, as well as in combination with behavioural signals for the relevance judgment prediction task across four different search intentions: seeking information, re-finding a particular information object, and two different entertainment intentions (i.e. entertainment by adjusting arousal level, and entertainment by adjusting mood). Our experimental results show that the effectiveness of studied signals varies across different search intentions, and when affective and physiological signals are combined with dwell time, a significant improvement can be achieved. Overall, these findings will help to implement better search engines in the future.
How do users respond to voice input errors?: lexical and phonetic query reformulation in voice search BIBAFull-Text 143-152
  Jiepu Jiang; Wei Jeng; Daqing He
Voice search offers users with a new search experience: instead of typing, users can vocalize their search queries. However, due to voice input errors (such as speech recognition errors and improper system interruptions), users need to frequently reformulate queries to handle the incorrectly recognized queries. We conducted user experiments with native English speakers on their query reformulation behaviors in voice search and found that users often reformulate queries with both lexical and phonetic changes to previous queries. In this paper, we first characterize and analyze typical voice input errors in voice search and users' corresponding reformulation strategies. Then, we evaluate the impacts of typical voice input errors on users' search progress and the effectiveness of different reformulation strategies on handling these errors. This study provides a clearer picture on how to further improve current voice search systems.
Mining touch interaction data on mobile devices to predict web search result relevance BIBAFull-Text 153-162
  Qi Guo; Haojian Jin; Dmitry Lagun; Shuai Yuan; Eugene Agichtein
Fine-grained search interactions in the desktop setting, such as mouse cursor movements and scrolling, have been shown valuable for understanding user intent, attention, and their preferences for Web search results. As web search on smart phones and tablets becomes increasingly popular, previously validated desktop interaction models have to be adapted for the available touch interactions such as pinching and swiping, and for the different device form factors. In this paper, we present, to our knowledge, the first in-depth study of modeling interactions on touch-enabled device for improving Web search ranking. In particular, we evaluate a variety of touch interactions on a smart phone as implicit relevance feedback, and compare them with the corresponding fine-grained interactions on a desktop computer with mouse and keyboard as the primary input devices. Our experiments are based on a dataset collected from two user studies with 56 users in total, using a specially instrumented version of a popular mobile browser to capture the interaction data. We report a detailed analysis of the similarities and differences of fine-grained search interactions between the desktop and the smart phone modalities, and identify novel patterns of touch interactions indicative of result relevance. Finally, we demonstrate significant improvements to search ranking quality by mining touch interaction data.

Efficiency I

An information-theoretic account of static index pruning BIBAFull-Text 163-172
  Ruey-Cheng Chen; Chia-Jung Lee
In this paper, we recast static index pruning as a model induction problem under the framework of Kullback's principle of minimum cross-entropy. We show that static index pruning has an approximate analytical solution in the form of convex integer program. Further analysis on computation feasibility suggests that one of its surrogate model can be solved efficiently. This result has led to the rediscovery of uniform pruning, a simple yet powerful pruning method proposed in 2001 and later easily ignored by many of us. To empirically verify this result, we conducted experiments under a new design in which prune ratio is strictly controlled. Our result on standard ad-hoc retrieval benchmarks has confirmed that uniform pruning is robust to high prune ratio and its performance is currently state of the art.
Document identifier reassignment and run-length-compressed inverted indexes for improved search performance BIBAFull-Text 173-182
  Diego Arroyuelo; Senén González; Mauricio Oyarzún; Victor Sepulveda
Text search engines are a fundamental tool nowadays. Their efficiency relies on a popular and simple data structure: the inverted indexes. Currently, inverted indexes can be represented very efficiently using index compression schemes. Recent investigations also study how an optimized document ordering can be used to assign document identifiers (docIDs) to the document database. This yields important improvements in index compression and query processing time. In this paper we follow this line of research, yet from a different perspective. We propose a docID reassignment method that allows one to focus on a given subset of inverted lists to improve their performance. We then use run-length encoding to compress these lists (as many consecutive 1s are generated). We show that by using this approach, not only the performance of the particular subset of inverted lists is improved, but also that of the whole inverted index. Our experimental results indicate a reduction of about 10% in the space usage of the whole index docID reassignment was focused. Also, decompression speed is up to 1.22 times faster if the runs must be explicitly decompressed and up to 4.58 times faster if implicit decompression of runs is allowed. Finally, we also improve the Document-at-a-Time query processing time of AND queries (by up to 12%), WAND queries (by up to 23%) and full (non-ranked) OR queries (by up to 86%).
Fast document-at-a-time query processing using two-tier indexes BIBAFull-Text 183-192
  Cristian Rossi; Edleno S. de Moura; Andre L. Carvalho; Altigran S. da Silva
In this paper we present two new algorithms designed to reduce the overall time required to process top-k queries. These algorithms are based on the document-at-a-time approach and modify the best baseline we found in the literature, Blockmax WAND (BMW), to take advantage of a two-tiered index, in which the first tier is a small index containing only the higher impact entries of each inverted list. This small index is used to pre-process the query before accessing a larger index in the second tier, resulting in considerable speeding up the whole process. The first algorithm we propose, named BMW-CS, achieves higher performance, but may result in small changes in the top results provided in the final ranking. The second algorithm, named BMW-t, preserves the top results and, while slower than BMW-CS, it is faster than BMW. In our experiments, BMW-CS was more than 40 times faster than BMW when computing top 10 results, and, while it does not guarantee preserving the top results, it preserved all ranking results evaluated at this level.
Faster and smaller inverted indices with treaps BIBAFull-Text 193-202
  Roberto Konow; Gonzalo Navarro; Charles L. A. Clarke; Alejandro López-Ortíz
We introduce a new representation of the inverted index that performs faster ranked unions and intersections while using less space. Our index is based on the treap data structure, which allows us to intersect/merge the document identifiers while simultaneously thresholding by frequency, instead of the costlier two-step classical processing methods. To achieve compression we represent the treap topology using compact data structures. Further, the treap invariants allow us to elegantly encode differentially both document identifiers and frequencies. Results show that our index uses about 20% less space, and performs queries up to three times faster, than state-of-the-art compact representations.

Topic modeling

An unsupervised topic segmentation model incorporating word order BIBAFull-Text 203-212
  Shoaib Jameel; Wai Lam
We present a new unsupervised topic discovery model for a collection of text documents. In contrast to the majority of the state-of-the-art topic models, our model does not break the document's structure such as paragraphs and sentences. In addition, it preserves word order in the document. As a result, it can generate two levels of topics of different granularity, namely, segment-topics and word-topics. In addition, it can generate n-gram words in each topic. We also develop an approximate inference scheme using Gibbs sampling method. We conduct extensive experiments using publicly available data from different collections and show that our model improves the quality of several text mining tasks such as the ability to support fine grained topics with n-gram words in the correlation graph, the ability to segment a document into topically coherent sections, document classification, and document likelihood estimation.
Semantic hashing using tags and topic modeling BIBAFull-Text 213-222
  Qifan Wang; Dan Zhang; Luo Si
It is an important research problem to design efficient and effective solutions for large scale similarity search. One popular strategy is to represent data examples as compact binary codes through semantic hashing, which has produced promising results with fast search speed and low storage cost. Many existing semantic hashing methods generate binary codes for documents by modeling document relationships based on similarity in a keyword feature space. Two major limitations in existing methods are: (1) Tag information is often associated with documents in many real world applications, but has not been fully exploited yet; (2) The similarity in keyword feature space does not fully reflect semantic relationships that go beyond keyword matching.
   This paper proposes a novel hashing approach, Semantic Hashing using Tags and Topic Modeling (SHTTM), to incorporate both the tag information and the similarity information from probabilistic topic modeling. In particular, a unified framework is designed for ensuring hashing codes to be consistent with tag information by a formal latent factor model and preserving the document topic/semantic similarity that goes beyond keyword matching. An iterative coordinate descent procedure is proposed for learning the optimal hashing codes. An extensive set of empirical studies on four different datasets has been conducted to demonstrate the advantages of the proposed SHTTM approach against several other state-of-the-art semantic hashing techniques. Furthermore, experimental results indicate that the modeling of tag information and utilizing topic modeling are beneficial for improving the effectiveness of hashing separately, while the combination of these two techniques in the unified framework obtains even better results.
Incorporating popularity in topic models for social network analysis BIBAFull-Text 223-232
  Youngchul Cha; Bin Bi; Chu-Cheng Hsieh; Junghoo Cho
Topic models are used to group words in a text dataset into a set of relevant topics. Unfortunately, when a few words frequently appear in a dataset, the topic groups identified by topic models become noisy because these frequent words repeatedly appear in "irrelevant" topic groups. This noise has not been a serious problem in a text dataset because the frequent words (e.g., the and is) do not have much meaning and have been simply removed before a topic model analysis. However, in a social network dataset we are interested in, they correspond to popular persons (e.g., Barack Obama and Justin Bieber) and cannot be simply removed because most people are interested in them.
   To solve this "popularity problem", we explicitly model the popularity of nodes (words) in topic models. For this purpose, we first introduce a notion of a "popularity component" and propose topic model extensions that effectively accommodate the popularity component. We evaluate the effectiveness of our models with a real-world Twitter dataset. Our proposed models achieve significantly lower perplexity (i.e., better prediction power) compared to the state-of-the-art baselines.
   In addition to the popularity problem caused by the nodes with high incoming edge degree, we also investigate the effect of the outgoing edge degree with another topic model extensions. We show that considering outgoing edge degree does not help much in achieving lower perplexity.
Topic hierarchy construction for the organization of multi-source user generated contents BIBAFull-Text 233-242
  Xingwei Zhu; Zhao-Yan Ming; Xiaoyan Zhu; Tat-Seng Chua
User generated contents (UGCs) carry a huge amount of high quality information. However, the information overload and diversity of UGC sources limit their potential uses. In this research, we propose a framework to organize information from multiple UGC sources by a topic hierarchy which is automatically generated and updated using the UGCs. We explore the unique characteristics of UGCs like blogs, cQAs, microblogs, etc., and introduce a novel scheme to combine them. We also propose a graph-based method to enable incremental update of the generated topic hierarchy. Using the hierarchy, users can easily obtain a comprehensive, in-depth and up-to-date picture of their topics of interests. The experiment results demonstrate how information from multiple heterogeneous sources improves the resultant topic hierarchies. It also shows that the proposed method achieves better F1 scores in hierarchy generation as compared to the state-of-the-art methods.

Users and interactive IR II

Looking ahead: query preview in exploratory search BIBAFull-Text 243-252
  Pernilla Qvarfordt; Gene Golovchinsky; Tony Dunnigan; Elena Agapie
Exploratory search is a complex, iterative information seeking activity that involves running multiple queries and finding and examining many documents. We designed a query preview control that visualizes the distribution of newly-retrieved and re-retrieved documents prior to running the query. When evaluating the preview control with a control condition, we found effects on both people's information seeking behavior and improved retrieval performance. People spent more time formulating a query and were more likely to explore search results more deeply, retrieved a more diverse set of documents, and found more different relevant documents when using the preview.
News vertical search: when and what to display to users BIBAFull-Text 253-262
  Richard McCreadie; Craig Macdonald; Iadh Ounis
News reporting has seen a shift toward fast-paced online reporting in new sources such as social media. Web Search engines that support a news vertical have historically relied upon articles published by major newswire providers when serving news-related queries. In this paper, we investigate to what extent real-time content from newswire, blogs, Twitter and Wikipedia sources are useful to return to the user in the current fast-paced news search setting. In particular, we perform a detailed user study using the emerging medium of crowdsourcing to determine when and where integrating news-related content from these various sources can better serve the user's news need. We sampled approximately 300 news-related search queries using Google Trends and Bitly data in real-time for two time periods. For these queries, we have crowdsourced workers compare Web search rankings for each, with similar rankings integrating real-time news content from sources such as Twitter or the blogosphere. Our results show that users exhibited a preference for rankings integrating newswire articles for only half of our queries, indicating that relying solely on newswire providers for news-related content is now insufficient. Moreover, our results show that users preferred rankings that integrate tweets more often than those that integrate newswire articles, showing the potential of using social media to better serve news queries.
Toward self-correcting search engines: using underperforming queries to improve search BIBAFull-Text 263-272
  Ahmed Hassan; Ryen W. White; Yi-Min Wang
Search engines receive queries with a broad range of different search intents. However, they do not perform equally well for all queries. Understanding where search engines perform poorly is critical for improving their performance. In this paper, we present a method for automatically identifying poorly-performing query groups where a search engine may not meet searcher needs. This allows us to create coherent query clusters that help system designers generate actionable insights about necessary changes and helps learning-to-rank algorithms better learn relevance signals via specialized rankers. The result is a framework capable of estimating dissatisfaction from Web search logs and learning to improve performance for dissatisfied queries. Through experimentation, we show that our method yields good quality groups that align with established retrieval performance metrics. We also show that we can significantly improve retrieval effectiveness via specialized rankers, and that coherent grouping of underperforming queries generated by our method is important in improving each group.
Fighting search engine amnesia: reranking repeated results BIBAFull-Text 273-282
  Milad Shokouhi; Ryen W. White; Paul Bennett; Filip Radlinski
Web search engines frequently show the same documents repeatedly for different queries within the same search session, in essence forgetting when the same documents were already shown to users. Depending on previous user interaction with the repeated results, and the details of the session, we show that sometimes the repeated results should be promoted, while some other times they should be demoted.
   Analysing search logs from two different commercial search engines, we find that results are repeated in about 40% of multi-query search sessions, and that users engage differently with repeats than with results shown for the first time. We demonstrate how statistics about result repetition within search sessions can be incorporated into ranking for personalizing search results. Our results on query logs of two large-scale commercial search engines suggest that we successfully promote documents that are more likely to be clicked by the user in the future while maintaining performance over standard measures of non-personalized relevance.

Recommender systems

Addressing cold-start in app recommendation: latent user models constructed from Twitter followers BIBAFull-Text 283-292
  Jovian Lin; Kazunari Sugiyama; Min-Yen Kan; Tat-Seng Chua
As a tremendous number of mobile applications (apps) are readily available, users have difficulty in identifying apps that are relevant to their interests. Recommender systems that depend on previous user ratings (i.e., collaborative filtering, or CF) can address this problem for apps that have sufficient ratings from past users. But for apps that are newly released, CF does not have any user ratings to base recommendations on, which leads to the cold-start problem.
   In this paper, we describe a method that accounts for nascent information culled from Twitter to provide relevant recommendation in such cold-start situations. We use Twitter handles to access an app's Twitter account and extract the IDs of their Twitter-followers. We create pseudo-documents that contain the IDs of Twitter users interested in an app and then apply latent Dirichlet allocation to generate latent groups. At test time, a target user seeking recommendations is mapped to these latent groups. By using the transitive relationship of latent groups to apps, we estimate the probability of the user liking the app. We show that by incorporating information from Twitter, our approach overcomes the difficulty of cold-start app recommendation and significantly outperforms other state-of-the-art recommendation techniques by up to 33%.
A location-based news article recommendation with explicit localized semantic analysis BIBAFull-Text 293-302
  Jeong-Woo Son; A-Yeong Kim; Seong-Bae Park
The interest of users in handheld devices is strongly related to their location. Therefore, the user location is important, as a user context, for news article recommendation in a mobile environment. This paper proposes a novel news article recommendation that reflects the geographical context of the user. For this purpose, we propose the Explicit Localized Semantic Analysis (ELSA), an ESA-based topical representation of documents. Every location has its own geographical topics, which can be captured from the geo-tagged documents related to the location. Thus, not only news articles but locations are also represented as topic vectors. The main advantage of ELSA is that it stresses only the topics that are relevant to a given location, whereas all topics are equally important in ESA. As a result, geographical topics have different importance according to the user location in ELSA, even if they come from the same article. Another advantage of ELSA is that it allows a simple comparison of the user location and news articles, because it projects both locations and articles onto an identical space composed of Wikipedia topics. In the evaluation of ELSA with the New York Times corpus, it outperformed two simple baselines of Bag-Of-Words and LDA as well as two ESA-based methods. Rt10 of ELSA was improved up to 46.25% over other methods, and its NDCG@k was always higher than those of the others regardless of k.
Opportunity model for e-commerce recommendation: right product; right time BIBAFull-Text 303-312
  Jian Wang; Yi Zhang
Most of existing e-commerce recommender systems aim to recommend the right product to a user, based on whether the user is likely to purchase or like a product. On the other hand, the effectiveness of recommendations also depends on the time of the recommendation. Let us take a user who just purchased a laptop as an example. She may purchase a replacement battery in 2 years (assuming that the laptop's original battery often fails to work around that time) and purchase a new laptop in another 2 years. In this case, it is not a good idea to recommend a new laptop or a replacement battery right after the user purchased the new laptop. It could hurt the user's satisfaction of the recommender system if she receives a potentially right product recommendation at the wrong time. We argue that a system should not only recommend the most relevant item, but also recommend at the right time.
   This paper studies the new problem: how to recommend the right product at the right time? We adapt the proportional hazards modeling approach in survival analysis to the recommendation research field and propose a new opportunity model to explicitly incorporate time in an e-commerce recommender system. The new model estimates the joint probability of a user making a follow-up purchase of a particular product at a particular time. This joint purchase probability can be leveraged by recommender systems in various scenarios, including the zero-query pull-based recommendation scenario (e.g. recommendation on an e-commerce web site) and a proactive push-based promotion scenario (e.g. email or text message based marketing). We evaluate the opportunity modeling approach with multiple metrics. Experimental results on a data collected by a real-world e-commerce website (shop.com) show that it can predict a user's follow-up purchase behavior at a particular time with descent accuracy. In addition, the opportunity model significantly improves the conversion rate in pull-based systems and the user satisfaction/utility in push-based systems.
Improve collaborative filtering through bordered block diagonal form matrices BIBAFull-Text 313-322
  Yongfeng Zhang; Min Zhang; Yiqun Liu; Shaoping Ma
Collaborative Filtering-based recommendation algorithms have achieved widespread success on the Web, but little work has been performed to investigate appropriate user-item relationship structures of rating matrices. This paper presents a novel and general collaborative filtering framework based on (Approximate) Bordered Block Diagonal Form structure of user-item rating matrices. We show formally that matrices in (A)BBDF structures correspond to community detection on the corresponding bipartite graphs, and they reveal relationships among users and items intuitionally in recommendation tasks. By this framework, general and special interests of a user are distinguished, which helps to improve prediction accuracy in collaborative filtering tasks. Experimental results on four real-world datasets, including the Yahoo! Music dataset, which is currently the largest, show that the proposed framework helps many traditional collaborative filtering algorithms, such as User-based, Item-based, SVD and NMF approaches, to make more accurate rating predictions. Moreover, by leveraging smaller and denser submatrices to make predictions, this framework contributes to the scalability of recommender systems.

Retrieval models and ranking I

Personalized ranking model adaptation for web search BIBAFull-Text 323-332
  Hongning Wang; Xiaodong He; Ming-Wei Chang; Yang Song; Ryen W. White; Wei Chu
Search engines train and apply a single ranking model across all users, but searchers' information needs are diverse and cover a broad range of topics. Hence, a single user-independent ranking model is insufficient to satisfy different users' result preferences. Conventional personalization methods learn separate models of user interests and use those to re-rank the results from the generic model. Those methods require significant user history information to learn user preferences, have low coverage in the case of memory-based methods that learn direct associations between query-URL pairs, and have limited opportunity to markedly affect the ranking given that they only re-order top-ranked items.
   In this paper, we propose a general ranking model adaptation framework for personalized search. Using a given user-independent ranking model trained offline and limited number of adaptation queries from individual users, the framework quickly learns to apply a series of linear transformations, e.g., scaling and shifting, over the parameters of the given global ranking model such that the adapted model can better fit each individual user's search preferences. Extensive experimentation based on a large set of search logs from a major commercial Web search engine confirms the effectiveness of the proposed method compared to several state-of-the-art ranking model adaptation methods.
Ranking document clusters using Markov random fields BIBAFull-Text 333-342
  Fiana Raiber; Oren Kurland
An important challenge in cluster-based document retrieval is ranking document clusters by their relevance to the query. We present a novel cluster ranking approach that utilizes Markov Random Fields (MRFs). MRFs enable the integration of various types of cluster-relevance evidence; e.g., the query-similarity values of the cluster's documents and query-independent measures of the cluster. We use our method to re-rank an initially retrieved document list by ranking clusters that are created from the documents most highly ranked in the list. The resultant retrieval effectiveness is substantially better than that of the initial list for several lists that are produced by effective retrieval methods. Furthermore, our cluster ranking approach significantly outperforms state-of-the-art cluster ranking methods. We also show that our method can be used to improve the performance of (state-of-the-art) results-diversification methods.
A novel TF-IDF weighting scheme for effective ranking BIBAFull-Text 343-352
  Jiaul H. Paik
Term weighting schemes are central to the study of information retrieval systems. This article proposes a novel TF-IDF term weighting scheme that employs two different within document term frequency normalizations to capture two different aspects of term saliency. One component of the term frequency is effective for short queries, while the other performs better on long queries. The final weight is then measured by taking a weighted combination of these components, which is determined on the basis of the length of the corresponding query.
   Experiments conducted on a large number of TREC news and web collections demonstrate that the proposed scheme almost always outperforms five state of the art retrieval models with remarkable significance and consistency. The experimental results also show that the proposed model achieves significantly better precision than the existing models.
Retrieving documents with mathematical content BIBAFull-Text 353-362
  Shahab Kamali; Frank Wm. Tompa
Many documents with mathematical content are published on the Web, but conventional search engines that rely on keyword search only cannot fully exploit their mathematical information. In particular, keyword search is insufficient when expressions in a document are not annotated with natural keywords or the user cannot describe her query with keywords. Retrieving documents by querying their mathematical content directly is very appealing in various domains such as education, digital libraries, engineering, patent documents, medical sciences, etc. Capturing the relevance of mathematical expressions also greatly enhances document classification in such domains.
   Unlike text retrieval, where keywords carry enough semantics to distinguish text documents and rank them, math symbols do not contain much semantic information on their own. In fact, mathematical expressions typically consist of few alphabetical symbols organized in rather complex structures. Hence, the structure of an expression, which describes the way such symbols are combined, should also be considered. Unfortunately, there is no standard testbed with which to evaluate the effectiveness of a mathematics retrieval algorithm.
   In this paper we study the fundamental and challenging problems in mathematics retrieval, that is how to capture the relevance of mathematical expressions, how to query them, and how to evaluate the results. We describe various search paradigms and propose retrieval systems accordingly. We discuss the benefits and drawbacks of each approach, and further compare them through an extensive empirical study.


Time-aware point-of-interest recommendation BIBAFull-Text 363-372
  Quan Yuan; Gao Cong; Zongyang Ma; Aixin Sun; Nadia Magnenat-Thalmann
The availability of user check-in data in large volume from the rapid growing location based social networks (LBSNs) enables many important location-aware services to users. Point-of-interest (POI) recommendation is one of such services, which is to recommend places where users have not visited before. Several techniques have been recently proposed for the recommendation service. However, no existing work has considered the temporal information for POI recommendations in LBSNs. We believe that time plays an important role in POI recommendations because most users tend to visit different places at different time in a day, eg visiting a restaurant at noon and visiting a bar at night. In this paper, we define a new problem, namely, the time-aware POI recommendation, to recommend POIs for a given user at a specified time in a day. To solve the problem, we develop a collaborative recommendation model that is able to incorporate temporal information. Moreover, based on the observation that users tend to visit nearby POIs, we further enhance the recommendation model by considering geographical information. Our experimental results on two real-world datasets show that the proposed approach outperforms the state-of-the-art POI recommendation methods substantially.
Modeling user's receptiveness over time for recommendation BIBAFull-Text 373-382
  Wei Chen; Wynne Hsu; Mong Li Lee
Existing recommender systems model user interests and the social influences independently. In reality, user interests may change over time, and as the interests change, new friends may be added while old friends grow apart and the new friendships formed may cause further interests change. This complex interaction requires the joint modeling of user interest and social relationships over time. In this paper, we propose a probabilistic generative model, called Receptiveness over Time Model (RTM), to capture this interaction. We design a Gibbs sampling algorithm to learn the receptiveness and interest distributions among users over time. The results of experiments on a real world dataset demonstrate that RTM-based recommendation outperforms the state-of-the-art recommendation methods. Case studies also show that RTM is able to discover the user interest shift and receptiveness change over time.
Query representation for cross-temporal information retrieval BIBAFull-Text 383-392
  Miles Efron
This paper addresses the problem of long-term language change in information retrieval (IR) systems. IR research has often ignored lexical drift. But in the emerging domain of massive digitized book collections, the risk of vocabulary mismatch due to language change is high. Collections such as Google Books and the Hathi Trust contain text written in the vernaculars of many centuries. With respect to IR, changes in vocabulary and orthography make 14th-Century English qualitatively different from 21st-Century English. This challenges retrieval models that rely on keyword matching. With this challenge in mind, we ask: given a query written in contemporary English, how can we retrieve relevant documents that were written in early English? We argue that search in historically diverse corpora is similar to cross-language retrieval (CLIR). By considering "modern" English and "archaic" English as distinct languages, CLIR techniques can improve what we call cross-temporal IR (CTIR). We focus on ways to combine evidence to improve CTIR effectiveness, proposing and testing several ways to handle language change during book search. We find that a principled combination of three sources of evidence during relevance feedback yields strong CTIR performance.

Evaluation I

On the measurement of test collection reliability BIBAFull-Text 393-402
  Julián Urbano; Mónica Marrero; Diego Martín
The reliability of a test collection is proportional to the number of queries it contains. But building a collection with many queries is expensive, so researchers have to find a balance between reliability and cost. Previous work on the measurement of test collection reliability relied on data-based approaches that contemplated random what if scenarios, and provided indicators such as swap rates and Kendall tau correlations. Generalizability Theory was proposed as an alternative founded on analysis of variance that provides reliability indicators based on statistical theory. However, these reliability indicators are hard to interpret in practice, because they do not correspond to well known indicators like Kendall tau correlation. We empirically established these relationships based on data from over 40 TREC collections, thus filling the gap in the practical interpretation of Generalizability Theory. We also review the computation of these indicators, and show that they are extremely dependent on the sample of systems and queries used, so much that the required number of queries to achieve a certain level of reliability can vary in orders of magnitude. We discuss the computation of confidence intervals for these statistics, providing a much more reliable tool to measure test collection reliability. Reflecting upon all these results, we review a wealth of TREC test collections, arguing that they are possibly not as reliable as generally accepted and that the common choice of 50 queries is insufficient even for stable rankings.
Deciding on an adjustment for multiplicity in IR experiments BIBAFull-Text 403-412
  Leonid Boytsov; Anna Belova; Peter Westfall
We evaluate statistical inference procedures for small-scale IR experiments that involve multiple comparisons against the baseline. These procedures adjust for multiple comparisons by ensuring that the probability of observing at least one false positive in the experiment is below a given threshold. We use only publicly available test collections and make our software available for download. In particular, we employ the TREC runs and runs constructed from the Microsoft learning-to-rank (MSLR) data set. Our focus is on non-parametric statistical procedures that include the Holm-Bonferroni adjustment of the permutation test p-values, the MaxT permutation test, and the permutation-based closed testing. In TREC-based simulations, these procedures retain from 66% to 92% of individually significant results (i.e., those obtained without taking other comparisons into account). Similar retention rates are observed in the MSLR simulations. For the largest evaluated query set size (i.e., 6400), procedures that adjust for multiplicity find at most 5% fewer true differences compared to unadjusted tests. At the same time, unadjusted tests produce many more false positives.
Preference based evaluation measures for novelty and diversity BIBAFull-Text 413-422
  Praveen Chandar; Ben Carterette
Novel and diverse document ranking is an effective strategy that involves reducing redundancy in a ranked list to maximize the amount of novel and relevant information available to users. Evaluation for novelty and diversity typically involves an assessor judging each document for relevance against a set of pre-identified subtopics, which may be disambiguations of the query, facets of an information need, or nuggets of information. Alternately, when expressing a preference for document A or document B, users may implicitly take subtopics into account, but may also take into account other factors such as recency, readability, length, and so on, each of which may have more or less importance depending on user. A user profile contains information about the extent to which each factor, including subtopic relevance, plays a role in the user's preference for one document over another. A preference-based evaluation can then take this user profile information into account to better model utility to the space of users.
   In this work, we propose an evaluation framework that not only can consider implicit factors but also handles differences in user preference due to varying underlying information need. Our proposed framework is based on the idea that a user scanning a ranked list from top to bottom and stopping at rank $k$ gains some utility from every document that is relevant their information need. Thus, we model the expected utility of a ranked list by estimating the utility of a document at a given rank using preference judgments and define evaluation measures based on the same. We validate our framework by comparing it to existing measures such as α-nDCG, ERR-IA, and subtopic recall that require explicit subtopic judgments We show that our proposed measures correlate well with existing measures while having the potential to capture various other factors when real data is used. We also show that the proposed measures can easily handle relevance assessments against multiple user profiles, and that they are robust to noisy and incomplete judgments.


Competence-based song recommendation BIBAFull-Text 423-432
  Lidan Shou; Kuang Mao; Xinyuan Luo; Ke Chen; Gang Chen; Tianlei Hu
Singing is a popular social activity and a good way of expressing one's feelings. One important reason for unsuccessful singing performance is because the singer fails to choose a suitable song. In this paper, we propose a novel singing competence-based song recommendation framework. It is distinguished from most existing music recommendation systems which rely on the computation of listeners' interests or similarity. We model a singer's vocal competence as singer profile, which takes voice pitch, intensity, and quality into consideration. Then we propose techniques to acquire singer profiles. We also present a song profile model which is used to construct a human annotated song database. Finally, we propose a learning-to-rank scheme for recommending songs by singer profile. The experimental study on real singers demonstrates the effectiveness of our approach and its advantages over two baseline methods. To the best of our knowledge, our work is the first to study competence-based song recommendation.
A low rank structural large margin method for cross-modal ranking BIBAFull-Text 433-442
  Xinyan Lu; Fei Wu; Siliang Tang; Zhongfei Zhang; Xiaofei He; Yueting Zhuang
Cross-modal retrieval is a classic research topic in multimedia information retrieval. The traditional approaches study the problem as a pairwise similarity function problem. In this paper, we consider this problem from a new perspective as a listwise ranking problem and propose a general cross-modal ranking algorithm to optimize the listwise ranking loss with a low rank embedding, which we call Latent Semantic Cross-Modal Ranking (LSCMR). The latent low-rank embedding space is discriminatively learned by structural large margin learning to optimize for certain ranking criteria directly. We evaluate LSCMR on the Wikipedia and NUS-WIDE dataset. Experimental results show that this method obtains significant improvements over the state-of-the-art methods.
Learning to name faces: a multimodal learning scheme for search-based face annotation BIBAFull-Text 443-452
  Dayong Wang; Steven C. H. Hoi; Pengcheng Wu; Jianke Zhu; Ying He; Chunyan Miao
Automated face annotation aims to automatically detect human faces from a photo and further name the faces with the corresponding human names. In this paper, we tackle this open problem by investigating a search-based face annotation (SBFA) paradigm for mining large amounts of web facial images freely available on the WWW. Given a query facial image for annotation, the idea of SBFA is to first search for top-n similar facial images from a web facial image database and then exploit these top-ranked similar facial images and their weak labels for naming the query facial image. To fully mine those information, this paper proposes a novel framework of Learning to Name Faces (L2NF) -- a unified multimodal learning approach for search-based face annotation, which consists of the following major components: (i) we enhance the weak labels of top-ranked similar images by exploiting the "label smoothness" assumption; (ii) we construct the multimodal representations of a facial image by extracting different types of features; (iii) we optimize the distance measure for each type of features using distance metric learning techniques; and finally (iv) we learn the optimal combination of multiple modalities for annotation through a learning to rank scheme. We conduct a set of extensive empirical studies on two real-world facial image databases, in which encouraging results show that the proposed algorithms significantly boost the naming accuracy of search-based face annotation task.

Search sessions

Utilizing query change for session search BIBAFull-Text 453-462
  Dongyi Guan; Sicong Zhang; Hui Yang
Session search is the Information Retrieval (IR) task that performs document retrieval for a search session. During a session, a user constantly modifies queries in order to find relevant documents that fulfill the information need. This paper proposes a novel query change retrieval model (QCM), which utilizes syntactic editing changes between adjacent queries as well as the relationship between query change and previously retrieved documents to enhance session search. We propose to model session search as a Markov Decision Process (MDP). We consider two agents in this MDP: the user agent and the search engine agent. The user agent's actions are query changes that we observe and the search agent's actions are proposed in this paper. Experiments show that our approach is highly effective and outperforms top session search systems in TREC 2011 and 2012.
Toward whole-session relevance: exploring intrinsic diversity in web search BIBAFull-Text 463-472
  Karthik Raman; Paul N. Bennett; Kevyn Collins-Thompson
Current research on web search has focused on optimizing and evaluating single queries. However, a significant fraction of user queries are part of more complex tasks [20] which span multiple queries across one or more search sessions [26,24]. An ideal search engine would not only retrieve relevant results for a user's particular query but also be able to identify when the user is engaged in a more complex task and aid the user in completing that task [29,1]. Toward optimizing whole-session or task relevance, we characterize and address the problem of intrinsic diversity (ID) in retrieval [30], a type of complex task that requires multiple interactions with current search engines. Unlike existing work on extrinsic diversity [30] that deals with ambiguity in intent across multiple users, ID queries often have little ambiguity in intent but seek content covering a variety of aspects on a shared theme. In such scenarios, the underlying needs are typically exploratory, comparative, or breadth-oriented in nature. We identify and address three key problems for ID retrieval: identifying authentic examples of ID tasks from post-hoc analysis of behavioral signals in search logs; learning to identify initiator queries that mark the start of an ID search task; and given an initiator query, predicting which content to prefetch and rank.
Summaries, ranked retrieval and sessions: a unified framework for information access evaluation BIBAFull-Text 473-482
  Tetsuya Sakai; Zhicheng Dou
We introduce a general information access evaluation framework that can potentially handle summaries, ranked document lists and even multi query sessions seamlessly. Our framework first builds a trailtext which represents a concatenation of all the texts read by the user during a search session, and then computes an evaluation metric called U-measure over the trailtext. Instead of discounting the value of a retrieved piece of information based on ranks, U-measure discounts it based on its position within the trailtext. U-measure takes the document length into account just like Time-Biased Gain (TBG), and has the diminishing return property. It is therefore more realistic than rank-based metrics. Furthermore, it is arguably more flexible than TBG, as it is free from the linear traversal assumption (i.e., that the user scans the ranked list from top to bottom), and can handle information access tasks other than ad hoc retrieval.
   This paper demonstrates the validity and versatility of the U-measure framework. Our main conclusions are: (a) For ad hoc retrieval, U-measure is at least as reliable as TBG in terms of rank correlations with traditional metrics and discriminative power; (b) For diversified search, our diversity versions of U-measure are highly correlated with state-of-the-art diversity metrics; (c) For multi-query sessions, U-measure is highly correlated with Session nDCG; and (d) Unlike rank-based metrics such as DCG, U-measure can quantify the differences between linear and nonlinear traversals in sessions.
   We argue that our new framework is useful for understanding the user's search behaviour and for comparison across different information access styles (e.g. examining a direct answer vs. examining a ranked list of web pages).

Click models

Modeling click-through based word-pairs for web search BIBAFull-Text 483-492
  Jagadeesh Jagarlamudi; Jianfeng Gao
Statistical translation models and latent semantic analysis (LSA) are two effective approaches to exploiting click-through data for Web search ranking. While the former learns semantic relationships between query terms and document terms directly, the latter maps a document and the queries for which it has been clicked to vectors in a lower dimensional semantic space. This paper presents two document ranking models that combine the strengths of both the approaches by explicitly modeling word-pairs. The first model, called PairModel, is a monolingual ranking model based on word-pairs derived from click-through data. It maps queries and documents into a concept space spanned by these word-pairs. The second model, called Bilingual Paired Topic Model (BPTM), uses bilingual word translations and can jointly model query-document collections written in multiple languages. This model uses topics to capture term dependencies and maps queries and documents in multiple languages into a lower dimensional semantic sub-space spanned by the topics. These models are evaluated on the Web search task using real world data sets in three different languages. Results show that they consistently outperform various state-of-the-art baseline models, and the best result is obtained by interpolating PairModel and BPTM.
Click model-based information retrieval metrics BIBAFull-Text 493-502
  Aleksandr Chuklin; Pavel Serdyukov; Maarten de Rijke
In recent years many models have been proposed that are aimed at predicting clicks of web search users. In addition, some information retrieval evaluation metrics have been built on top of a user model. In this paper we bring these two directions together and propose a common approach to converting any click model into an evaluation metric. We then put the resulting model-based metrics as well as traditional metrics (like DCG or Precision) into a common evaluation framework and compare them along a number of dimensions.
   One of the dimensions we are particularly interested in is the agreement between offline and online experimental outcomes. It is widely believed, especially in an industrial setting, that online A/B-testing and interleaving experiments are generally better at capturing system quality than offline measurements. We show that offline metrics that are based on click models are more strongly correlated with online experimental outcomes than traditional offline metrics, especially in situations when we have incomplete relevance judgements.
Incorporating vertical results into search click models BIBAFull-Text 503-512
  Chao Wang; Yiqun Liu; Min Zhang; Shaoping Ma; Meihong Zheng; Jing Qian; Kuo Zhang
In modern search engines, an increasing number of search result pages (SERPs) are federated from multiple specialized search engines (called verticals, such as Image or Video). As an effective approach to interpret users' click-through behavior as feedback information, most click models were designed to reduce the position bias and improve ranking performance of ordinary search results, which have homogeneous appearances. However, when vertical results are combined with ordinary ones, significant differences in presentation may lead to user behavior biases and thus failure of state-of-the-art click models. With the help of a popular commercial search engine in China, we collected a large scale log data set which contains behavior information on both vertical and ordinary results. We also performed eye-tracking analysis to study user's real-world examining behavior. According these analysis, we found that different result appearances may cause different behavior biases both for vertical results (local effect) and for the whole result lists (global effect). These biases include: examine bias for vertical results (especially those with multimedia components), trust bias for result lists with vertical results, and a higher probability of result revisitation for vertical results. Based on these findings, a novel click model considering these biases besides position bias was constructed to describe interaction with SERPs containing verticals. Experimental results show that the new Vertical-aware Click Model (VCM) is better at interpreting user click behavior on federated searches in terms of both log-likelihood and perplexity than existing models.

Social media and network analysis II

Personalized time-aware tweets summarization BIBAFull-Text 513-522
  Zhaochun Ren; Shangsong Liang; Edgar Meij; Maarten de Rijke
We focus on the problem of selecting meaningful tweets given a user's interests; the dynamic nature of user interests, the sheer volume, and the sparseness of individual messages make this an challenging problem. Specifically, we consider the task of time-aware tweets summarization, based on a user's history and collaborative social influences from "social circles." We propose a time-aware user behavior model, the Tweet Propagation Model (TPM), in which we infer dynamic probabilistic distributions over interests and topics. We then explicitly consider novelty, coverage, and diversity to arrive at an iterative optimization algorithm for selecting tweets. Experimental results validate the effectiveness of our personalized time-aware tweets summarization method based on TPM.
Exploiting hybrid contexts for Tweet segmentation BIBAFull-Text 523-532
  Chenliang Li; Aixin Sun; Jianshu Weng; Qi He
Twitter has attracted hundred millions of users to share and disseminate most up-to-date information. However, the noisy and short nature of tweets makes many applications in information retrieval (IR) and natural language processing (NLP) challenging. Recently, segment-based tweet representation has demonstrated effectiveness in named entity recognition (NER) and event detection from tweet streams. To split tweets into meaningful phrases or segments, the previous work is purely based on external knowledge bases, which ignores the rich local context information embedded in the tweets. In this paper, we propose a novel framework for tweet segmentation in a batch mode, called HybridSeg. HybridSeg incorporates local context knowledge with global knowledge bases for better tweet segmentation. HybridSeg consists of two steps: learning from off-the-shelf weak NERs and learning from pseudo feedback. In the first step, the existing NER tools are applied to a batch of tweets. The named entities recognized by these NERs are then employed to guide the tweet segmentation process. In the second step, HybridSeg adjusts the tweet segmentation results iteratively by exploiting all segments in the batch of tweets in a collective manner. Experiments on two tweet datasets show that HybridSeg significantly improves tweet segmentation quality compared with the state-of-the-art algorithm. We also conduct a case study by using tweet segments for the task of named entity recognition from tweets. The experimental results demonstrate that HybridSeg significantly benefits the downstream applications.
Sumblr: continuous summarization of evolving tweet streams BIBAFull-Text 533-542
  Lidan Shou; Zhenhua Wang; Ke Chen; Gang Chen
With the explosive growth of microblogging services, short-text messages (also known as tweets) are being created and shared at an unprecedented rate. Tweets in its raw form can be incredibly informative, but also overwhelming. For both end-users and data analysts it is a nightmare to plow through millions of tweets which contain enormous noises and redundancies. In this paper, we study continuous tweet summarization as a solution to address this problem. While traditional document summarization methods focus on static and small-scale data, we aim to deal with dynamic, quickly arriving, and large-scale tweet streams. We propose a novel prototype called Sumblr (SUMmarization By stream cLusteRing) for tweet streams. We first propose an online tweet stream clustering algorithm to cluster tweets and maintain distilled statistics called Tweet Cluster Vectors. Then we develop a TCV-Rank summarization technique for generating online summaries and historical summaries of arbitrary time durations. Finally, we describe a topic evolvement detection method, which consumes online and historical summaries to produce timelines automatically from tweet streams. Our experiments on large-scale real tweets demonstrate the efficiency and effectiveness of our approach.
Exploiting user feedback to learn to rank answers in q&a forums: a case study with stack overflow BIBAFull-Text 543-552
  Daniel Hasan Dalip; Marcos André Gonçalves; Marco Cristo; Pavel Calado
Collaborative web sites, such as collaborative encyclopedias, blogs, and forums, are characterized by a loose edit control, which allows anyone to freely edit their content. As a consequence, the quality of this content raises much concern. To deal with this, many sites adopt manual quality control mechanisms. However, given their size and change rate, manual assessment strategies do not scale and content that is new or unpopular is seldom reviewed. This has a negative impact on the many services provided, such as ranking and recommendation. To tackle with this problem, we propose a learning to rank (L2R) approach for ranking answers in Q&A forums. In particular, we adopt an approach based on Random Forests and represent query and answer pairs using eight different groups of features. Some of these features are used in the Q&A domain for the first time. Our L2R method was trained to learn the answer rating, based on the feedback users give to answers in Q&A forums. Using the proposed method, we were able (i) to outperform a state of the art baseline with gains of up to 21% in NDCG, a metric used to evaluate rankings; we also conducted a comprehensive study of the features, showing that (ii) review and user features are the most important in the Q&A domain although text features are useful for assessing quality of new answers; and (iii) the best set of new features we proposed was able to yield the best quality rankings.

Queries II

An incremental approach to efficient pseudo-relevance feedback BIBAFull-Text 553-562
  Hao Wu; Hui Fang
Pseudo-relevance feedback is an important strategy to improve search accuracy. It is often implemented as a two-round retrieval process: the first round is to retrieve an initial set of documents relevant to an original query, and the second round is to retrieve final retrieval results using the original query expanded with terms selected from the previously retrieved documents. This two-round retrieval process is clearly time consuming, which could arguably be one of main reasons that hinder the wide adaptation of the pseudo-relevance feedback methods in real-world IR systems.
   In this paper, we study how to improve the efficiency of pseudo-relevance feedback methods. The basic idea is to reduce the time needed for the second round of retrieval by leveraging the query processing results of the first round. Specifically, instead of processing the expand query as a newly submitted query, we propose an incremental approach, which resumes the query processing results (i.e. document accumulators) for the first round of retrieval and process the second round of retrieval mainly as a step of adjusting the scores in the accumulators. Experimental results on TREC Terabyte collections show that the proposed incremental approach can improve the efficiency of pseudo-relevance feedback methods by a factor of two without sacrificing their effectiveness.
Query expansion using path-constrained random walks BIBAFull-Text 563-572
  Jianfeng Gao; Gu Xu; Jinxi Xu
This paper exploits Web search logs for query expansion (QE) by presenting a new QE method based on path-constrained random walks (PCRW), where the search logs are represented as a labeled, directed graph, and the probability of picking an expansion term for an input query is computed by a learned combination of constrained random walks on the graph. The method is shown to be generic in that it covers most of the popular QE models as special cases and flexible in that it provides a principled mathematical framework in which a wide variety of information useful for QE can be incorporated in a unified way. Evaluation is performed on the Web document ranking task using a real-world data set. Results show that the PCRW-based method is very effective for the expansion of rare queries, i.e., low-frequency queries that are unseen in search logs, and that it outperforms significantly other state-of-the-art QE methods.
Efficient query construction for large scale data BIBAFull-Text 573-582
  Elena Demidova; Xuan Zhou; Wolfgang Nejdl
In recent years, a number of open databases have emerged on the Web, providing Web users with platforms to collaboratively create structured information. As these databases are intended to accommodate heterogeneous information and knowledge, they usually comprise a very large schema and billions of instances. Browsing and searching data on such a scale is not an easy task for a Web user. In this context, interactive query construction offers an intuitive interface for novice users to retrieve information from databases neither requiring any knowledge of structured query languages, nor any prior knowledge of the database schema. However, the existing mechanisms do not scale well on large scale datasets. This paper presents a set of techniques to boost the scalability of interactive query construction, from the perspective of both, user interaction cost and performance. We connect an abstract ontology layer to the database schema to shorten the process of user-computer interaction. We also introduce a search mechanism to enable efficient exploration of query interpretation spaces over large scale data. Extensive experiments show that our approach scales well on Freebase -- an open database containing more than 7,000 relational tables in more than 100 domains.
Compact query term selection using topically related text BIBAFull-Text 583-592
  K. Tamsin Maxwell; W. Bruce Croft
Many recent and highly effective retrieval models for long queries use query reformulation methods that jointly optimize term weights and term selection. These methods learn using word context and global context but typically fail to capture query context. In this paper, we present a novel term ranking algorithm, PhRank, that extends work on Markov chain frameworks for query expansion to select compact and focused terms from within a query itself. This focuses queries so that one to five terms in an unweighted model achieve better retrieval effectiveness than weighted term selection models that use up to 30 terms. PhRank terms are also typically compact and contain 1-2 words compared to competing models that use query subsets up to 7 words long. PhRank captures query context with an affinity graph constructed using word co-occurrence in pseudo-relevant documents. A random walk of the graph is used for term ranking in combination with discrimination weights. Empirical evaluation using newswire and web collections demonstrates that performance of reformulated queries is significantly improved for long queries and at least as good for short, keyword queries compared to highly competitive information retrieval (IR) models.


Sentiment diversification with different biases BIBAFull-Text 593-602
  Elif Aktolga; James Allan
Prior search result diversification work focuses on achieving topical variety in a ranked list, typically equally across all aspects. In this paper, we diversify with sentiments according to an explicit bias. We want to allow users to switch the result perspective to better grasp the polarity of opinionated content, such as during a literature review. For this, we first infer the prior sentiment bias inherent in a controversial topic -- the 'Topic Sentiment'. Then, we utilize this information in 3 different ways to diversify results according to various sentiment biases: (1) Equal diversification to achieve a balanced and unbiased representation of all sentiments on the topic; (2) Diversification towards the Topic Sentiment, in which the actual sentiment bias in the topic is mirrored to emphasize the general perception of the topic; (3) Diversification against the Topic Sentiment, in which documents about the 'minority' or outlying sentiment(s) are boosted and those with the popular sentiment are demoted.
   Since sentiment classification is an essential tool for this task, we experiment by gradually degrading the accuracy of a perfect classifier down to 40%, and show which diversification approaches prove most stable in this setting. The results reveal that the proportionality-based methods and our SCSF model, considering sentiment strength and frequency in the diversified list, yield the highest gains. Further, in case the Topic Sentiment cannot be reliably estimated, we show how performance is affected by equal diversification when actually an emphasis either towards or against the Topic Sentiment is desired: in the former case, an average of 6.48% is lost across all evaluation measures, whereas in the latter case this is 16.23%, confirming that bias-specific sentiment diversification is crucial.
Term level search result diversification BIBAFull-Text 603-612
  Van Dang; Bruce W. Croft
Current approaches for search result diversification have been categorized as either implicit or explicit. The implicit approach assumes each document represents its own topic, and promotes diversity by selecting documents for different topics based on the difference of their vocabulary. On the other hand, the explicit approach models the set of query topics, or aspects. While the former approach is generally less effective, the latter usually depends on a manually created description of the query aspects, the automatic construction of which has proven difficult. This paper introduces a new approach: term-level diversification. Instead of modeling the set of query aspects, which are typically represented as coherent groups of terms, our approach uses terms without the grouping. Our results on the ClueWeb collection show that the grouping of topic terms provides very little benefit to diversification compared to simply using the terms themselves. Consequently, we demonstrate that term-level diversification, with topic terms identified automatically from the search results using a simple greedy algorithm, significantly outperforms methods that attempt to create a full topic structure for diversification.
Search result diversification in resource selection for federated search BIBAFull-Text 613-622
  Dzung Hong; Luo Si
Prior research in resource selection for federated search mainly focused on selecting a small number of information sources that are most relevant to a user query. However, result novelty and diversification are largely unexplored, which does not reflect the various kinds of information needs of users in real world applications.
   This paper proposes two general approaches to model both result relevance and diversification in selecting sources, in order to provide more comprehensive coverage of multiple aspects of a user query. The first approach focuses on diversifying the document ranking on a centralized sample database before selecting information sources under the framework of Relevant Document Distribution Estimation (ReDDE). The second approach first evaluates the relevance of information sources with respect to each aspect of the query, and then ranks the sources based on the novelty and relevance that they offer. Both approaches can be applied with a wide range of existing resource selection algorithms such as ReDDE, CRCS, CORI and Big Document. Moreover, this paper proposes a learning based approach to combine multiple resource selection algorithms for result diversification, which can further improve the performance. We propose a set of new metrics for resource selection in federated search to evaluate the diversification performance of different approaches. To our best knowledge, this is the first piece of work that addresses the problem of search result diversification in federated search. The effectiveness of the proposed approaches has been demonstrated by an extensive set of experiments on the federated search testbed of the Clueweb dataset.

Evaluation II

The effect of threshold priming and need for cognition on relevance calibration and assessment BIBAFull-Text 623-632
  Falk Scholer; Diane Kelly; Wan-Ching Wu; Hanseul S. Lee; William Webber
Human assessments of document relevance are needed for the construction of test collections, for ad-hoc evaluation, and for training text classifiers. Showing documents to assessors in different orderings, however, may lead to different assessment outcomes. We examine the effect that threshold priming, seeing varying degrees of relevant documents, has on people's calibration of relevance. Participants judged the relevance of a prologue of documents containing highly relevant, moderately relevant, or non-relevant documents, followed by a common epilogue of documents of mixed relevance. We observe that participants exposed to only non-relevant documents in the prologue assigned significantly higher average relevance scores to prologue and epilogue documents than participants exposed to moderately or highly relevant documents in the prologue. We also examine how need for cognition, an individual difference measure of the extent to which a person enjoys engaging in effortful cognitive activity, impacts relevance assessments. High need for cognition participants had a significantly higher level of agreement with expert assessors than low need for cognition participants did. Our findings indicate that assessors should be exposed to documents from multiple relevance levels early in the judging process, in order to calibrate their relevance thresholds in a balanced way, and that individual difference measures might be a useful way to screen assessors.
User model-based metrics for offline query suggestion evaluation BIBAFull-Text 633-642
  Eugene Kharitonov; Craig Macdonald; Pavel Serdyukov; Iadh Ounis
Query suggestion or auto-completion mechanisms are widely used by search engines and are increasingly attracting interest from the research community. However, the lack of commonly accepted evaluation methodology and metrics means that it is not possible to compare results and approaches from the literature. Moreover, often the metrics used to evaluate query suggestions tend to be an adaptation from other domains without a proper justification. Hence, it is not necessarily clear if the improvements reported in the literature would result in an actual improvement in the users' experience. Inspired by the cascade user models and state-of-the-art evaluation metrics in the web search domain, we address the query suggestion evaluation, by first studying the users behaviour from a search engine's query log and thereby deriving a new family of user models describing the users interaction with a query suggestion mechanism. Next, assuming a query log-based evaluation approach, we propose two new metrics to evaluate query suggestions, pSaved and eSaved. Both metrics are parameterised by a user model. pSaved is defined as the probability of using the query suggestions while submitting a query. eSaved equates to the expected relative amount of effort (keypresses) a user can avoid due to the deployed query suggestion mechanism. Finally, we experiment with both metrics using four user model instantiations as well as metrics previously used in the literature on a dataset of 6.1M sessions. Our results demonstrate that pSaved and eSaved show the best alignment with the users satisfaction amongst the considered metrics.
A general evaluation measure for document organization tasks BIBAFull-Text 643-652
  Enrique Amigó; Julio Gonzalo; Felisa Verdejo
A number of key Information Access tasks -- Document Retrieval, Clustering, Filtering, and their combinations -- can be seen as instances of a generic document organization problem that establishes priority and relatedness relationships between documents (in other words, a problem of forming and ranking clusters). As far as we know, no analysis has been made yet on the evaluation of these tasks from a global perspective. In this paper we propose two complementary evaluation measures -- Reliability and Sensitivity -- for the generic Document Organization task which are derived from a proposed set of formal constraints (properties that any suitable measure must satisfy).
   In addition to be the first measures that can be applied to any mixture of ranking, clustering and filtering tasks, Reliability and Sensitivity satisfy more formal constraints than previously existing evaluation metrics for each of the subsumed tasks. Besides their formal properties, its most salient feature from an empirical point of view is their strictness: a high score according to the harmonic mean of Reliability and Sensitivity ensures a high score with any of the most popular evaluation metrics in all the Document Retrieval, Clustering and Filtering datasets used in our experiments.

Retrieval models and ranking II

Modeling term dependencies with quantum language models for IR BIBAFull-Text 653-662
  Alessandro Sordoni; Jian-Yun Nie; Yoshua Bengio
Traditional information retrieval (IR) models use bag-of-words as the basic representation and assume that some form of independence holds between terms. Representing term dependencies and defining a scoring function capable of integrating such additional evidence is theoretically and practically challenging. Recently, Quantum Theory (QT) has been proposed as a possible, more general framework for IR. However, only a limited number of investigations have been made and the potential of QT has not been fully explored and tested. We develop a new, generalized Language Modeling approach for IR by adopting the probabilistic framework of QT. In particular, quantum probability could account for both single and compound terms at once without having to extend the term space artificially as in previous studies. This naturally allows us to avoid the weight-normalization problem, which arises in the current practice by mixing scores from matching compound terms and from matching single terms. Our model is the first practical application of quantum probability to show significant improvements over a robust bag-of-words baseline and achieves better performance on a stronger non bag-of-words baseline.
Copulas for information retrieval BIBAFull-Text 663-672
  Carsten Eickhoff; Arjen P. de Vries; Kevyn Collins-Thompson
In many domains of information retrieval, system estimates of document relevance are based on multidimensional quality criteria that have to be accommodated in a unidimensional result ranking. Current solutions to this challenge are often inconsistent with the formal probabilistic framework in which constituent scores were estimated, or use sophisticated learning methods that make it difficult for humans to understand the origin of the final ranking. To address these issues, we introduce the use of copulas, a powerful statistical framework for modeling complex multi-dimensional dependencies, to information retrieval tasks. We provide a formal background to copulas and demonstrate their effectiveness on standard IR tasks such as combining multidimensional relevance estimates and fusion of results from multiple search engines. We introduce copula-based versions of standard relevance estimators and fusion methods and show that these lead to significant performance improvements on several tasks, as evaluated on large-scale standard corpora, compared to their non-copula counterparts. We also investigate criteria for understanding the likely effect of using copula models in a given retrieval scenario.
Taily: shard selection using the tail of score distributions BIBAFull-Text 673-682
  Robin Aly; Djoerd Hiemstra; Thomas Demeester
Search engines can improve their efficiency by selecting only few promising shards for each query. State-of-the-art shard selection algorithms first query a central index of sampled documents, and their effectiveness is similar to searching all shards. However, the search in the central index also hurts efficiency. Additionally, we show that the effectiveness of these approaches varies substantially with the sampled documents. This paper proposes Taily, a novel shard selection algorithm that models a query's score distribution in each shard as a Gamma distribution and selects shards with highly scored documents in the tail of the distribution. Taily estimates the parameters of score distributions based on the mean and variance of the score function's features in the collections and shards. Because Taily operates on term statistics instead of document samples, it is efficient and has deterministic effectiveness. Experiments on large web collections (Gov2, CluewebA and CluewebB) show that Taily achieves similar effectiveness to sample-based approaches, and improves upon their efficiency by roughly 20% in terms of used resources and response time.
A mutual information-based framework for the analysis of information retrieval systems BIBAFull-Text 683-692
  Peter B. Golbus; Javed A. Aslam
We consider the problem of information retrieval evaluation and the methods and metrics used for such evaluations. We propose a probabilistic framework for evaluation which we use to develop new information-theoretic evaluation metrics. We demonstrate that these new metrics are powerful and generalizable, enabling evaluations heretofore not possible.
   We introduce four preliminary uses of our framework: (1) a measure of conditional rank correlation, information tau, a powerful meta-evaluation tool whose use we demonstrate on understanding novelty and diversity evaluation; (2) a new evaluation measure, relevance information correlation, which is correlated with traditional evaluation measures and can be used to (3) evaluate a collection of systems simultaneously, which provides a natural upper bound on metasearch performance; and (4) a measure of the similarity between rankers on judged documents, information difference, which allows us to determine whether systems with similar performance are in fact different.

Efficiency II

The impact of solid state drive on search engine cache management BIBAFull-Text 693-702
  Jianguo Wang; Eric Lo; Man Lung Yiu; Jiancong Tong; Gang Wang; Xiaoguang Liu
Caching is an important optimization in search engine architectures. Existing caching techniques for search engine optimization are mostly biased towards the reduction of random accesses to disks, because random accesses are known to be much more expensive than sequential accesses in traditional magnetic hard disk drive (HDD). Recently, solid state drive (SSD) has emerged as a new kind of secondary storage medium, and some search engines like Baidu have already used SSD to completely replace HDD in their infrastructure. One notable property of SSD is that its random access latency is comparable to its sequential access latency. Therefore, the use of SSDs to replace HDDs in a search engine infrastructure may void the cache management of existing search engines. In this paper, we carry out a series of empirical experiments to study the impact of SSD on search engine cache management. The results give insights to practitioners and researchers on how to adapt the infrastructure and how to redesign the caching policies for SSD-based search engines.
Faster upper bounding of intersection sizes BIBAFull-Text 703-712
  Daisuke Takuma; Hiroki Yanagisawa
There is a long history of developing efficient algorithms for set intersection, which is a fundamental operation in information retrieval and databases. In this paper, we describe a new data structure, a Cardinality Filter, to quickly compute an upper bound on the size of a set intersection. Knowing an upper bound of the size can be used to accelerate many applications such as top-k query processing in text mining. Given finite sets A and B, the expected computation time for the upper bound of the size of the intersection |A cap B| is O( (|A| + |B|) w), where w is the machine word length. This is much faster than the current best algorithm for the exact intersection, which runs in O((|A| + |B|) / √w + |A cap B|) expected time. Our performance studies show that our implementations of Cardinality Filters are from 2 to 10 times faster than existing set intersection algorithms, and the time for a top-k query in a text mining application can be reduced by half.
Cache-conscious performance optimization for similarity search BIBAFull-Text 713-722
  Maha Alabduljalil; Xun Tang; Tao Yang
All-pairs similarity search can be implemented in two stages. The first stage is to partition the data and group potentially similar vectors. The second stage is to run a set of tasks where each task compares a partition of vectors with other candidate partitions. Because of data sparsity, accessing feature vectors in memory for runtime comparison in the second stage, incurs significant overhead due to the presence of memory hierarchy. This paper proposes a cache-conscious data layout and traversal optimization to reduce the execution time through size-controlled data splitting and vector coalescing. It also provides an analysis to guide the optimal choice for the parameter setting. Our evaluation with several application datasets verifies the performance gains obtained by the optimization and shows that the proposed scheme is upto 2.74x as fast as the cache-oblivious baseline.
A candidate filtering mechanism for fast top-k query processing on modern cpus BIBAFull-Text 723-732
  Constantinos Dimopoulos; Sergey Nepomnyachiy; Torsten Suel
A large amount of research has focused on faster methods for finding top-k results in large document collections, one of the main scalability challenges for web search engines. In this paper, we propose a method for accelerating such top-k queries that builds on and generalizes methods recently proposed by several groups of researchers based on Block-Max Indexes. In particular, we describe a system that uses a new filtering mechanism, based on a combination of block maxima and bitmaps, that radically reduces the number of documents that have to be further evaluated. Our filtering mechanism exploits the SIMD processing capabilities of current microprocessors, and it is optimized through caching policies that select and store suitable filter structures based on properties of the query load. Our experimental evaluation shows that the mechanism results in very significant speed-ups for disjunctive top-k queries under several state-of-the-art algorithms, including a speed-up of more than a factor of 2 over the fastest previously known methods.

Short Papers 1 -- evaluation

A test collection for entity search in DBpedia BIBAFull-Text 737-740
  Krisztian Balog; Robert Neumayer
We develop and make publicly available an entity search test collection based on the DBpedia knowledge base. This includes a large number of queries and corresponding relevance judgments from previous benchmarking campaigns, covering a broad range of information needs, ranging from short keyword queries to natural language questions. Further, we present baseline results for this collection with a set of retrieval models based on language modeling and BM25. Finally, we perform an initial analysis to shed light on certain characteristics that make this data set particularly challenging.
Author disambiguation by hierarchical agglomerative clustering with adaptive stopping criterion BIBAFull-Text 741-744
  Lei Cen; Eduard C. Dragut; Luo Si; Mourad Ouzzani
Entity disambiguation is an important step in many information retrieval applications. This paper proposes new research for entity disambiguation with the focus of name disambiguation in digital libraries. In particular, pairwise similarity is first learned for publications that share the same author name string (ANS) and then a novel Hierarchical Agglomerative Clustering approach with Adaptive Stopping Criterion (HACASC) is proposed to adaptively cluster a set of publications that share a same ANS to individual clusters of publications with different author identities. The HACASC approach utilizes a mixture of kernel ridge regressions to intelligently determine the threshold in clustering. This obtains more appropriate clustering granularity than non-adaptive stopping criterion. We conduct a large scale empirical study with a dataset of more than 2 million publication record pairs to demonstrate the advantage of the proposed HACASC approach.
Document features predicting assessor disagreement BIBAFull-Text 745-748
  Praveen Chandar; William Webber; Ben Carterette
The notion of relevance differs between assessors, thus giving rise to assessor disagreement. Although assessor disagreement has been frequently observed, the factors leading to disagreement are still an open problem. In this paper we study the relationship between assessor disagreement and various topic independent factors such as readability and cohesiveness. We build a logistic model using reading level and other simple document features to predict assessor disagreement and rank documents by decreasing probability of disagreement. We compare the predictive power of these document-level features with that of a meta-search feature that aggregates a document's ranking across multiple retrieval runs. Our features are shown to be on a par with the meta-search feature, without requiring a large and diverse set of retrieval runs to calculate. Surprisingly, however, we find that the reading level features are negatively correlated with disagreement, suggesting that they are detecting some other aspect of document content.
Exploring semi-automatic nugget extraction for Japanese one click access evaluation BIBAFull-Text 749-752
  Matthew Ekstrand-Abueg; Virgil Pavlu; Makoto Kato; Tetsuya Sakai; Takehiro Yamamoto; Mayu Iwata
Building test collections based on nuggets is useful evaluating systems that return documents, answers, or summaries. However, nugget construction requires a lot of manual work and is not feasible for large query sets. Towards an efficient and scalable nugget-based evaluation, we study the applicability of semi-automatic nugget extraction in the context of the ongoing NTCIR One Click Access (1CLICK) task. We compare manually-extracted and semi-automatically-extracted Japanese nuggets to demonstrate the coverage and efficiency of the semi-automatic nugget extraction. Our findings suggest that the manual nugget extraction can be replaced with a direct adaptation of the English semi-automatic nugget extraction system, especially for queries for which the user desires broad answers from free-form text.
Report from the NTCIR-10 1CLICK-2 Japanese subtask: baselines, upperbounds and evaluation robustness BIBAFull-Text 753-756
  Makoto P. Kato; Tetsuya Sakai; Takehiro Yamamoto; Mayu Iwata
The One Click Access Task (1CLICK) of NTCIR requires systems to return a concise multi-document summary of web pages in response to a query which is assumed to have been submitted in a mobile context. Systems are evaluated based on information units (or iUnits), and are required to present important pieces of information first and to minimise the amount of text the user has to read. Using the official Japanese results of the second round of the 1CLICK task from NTCIR-10, we discuss our task setting and evaluation framework. Our analyses show that: (1) Simple baseline methods that leverage search engine snippets or Wikipedia are effective for 'lookup' type queries but not necessarily for other query types; (2) There is still a substantial gap between manual and automatic runs; and (3) Our evaluation metrics are relatively robust to the incompleteness of iUnits.
Building a web test collection using social media BIBAFull-Text 757-760
  Chia-Jung Lee; W. Bruce Croft
Community Question Answering (CQA) platforms contain a large number of questions and associated answers. Answerers sometimes include URLs as part of the answers to provide further information. This paper describes a novel way of building a test collection for web search by exploiting the link information from this type of social media data. We propose to build the test collection by regarding CQA questions as queries and the associated linked web pages as relevant documents. To evaluate this approach, we collect approximately ten thousand CQA queries, whose answers contained links to ClueWeb09 documents after spam filtering. Experimental results using this collection show that the relative effectiveness between different retrieval models on the ClueWeb-CQA query set is consistent with that on the TREC Web Track query sets, confirming the reliability of our test collection. Further analysis shows that the large number of queries generated through this approach compensates for the sparse relevance judgments in determining significant differences.
Summary of the NTCIR-10 INTENT-2 task: subtopic mining and search result diversification BIBAFull-Text 761-764
  Tetsuya Sakai; Zhicheng Dou; Takehiro Yamamoto; Yiqun Liu; Min Zhang; Makoto P. Kato; Ruihua Song; Mayu Iwata
The NTCIR INTENT task comprises two subtasks: Subtopic Mining, where systems are required to return a ranked list of subtopic strings for each given query; and Document Ranking, where systems are required to return a diversified web search result for each given query. This paper summarises the novel features of the Second INTENT task at NTCIR-10 and its main findings, and poses some questions for future diversified search evaluation.
Is relevance hard work?: evaluating the effort of making relevant assessments BIBAFull-Text 765-768
  Robert Villa; Martin Halvey
The judging of relevance has been a subject of study in information retrieval for a long time, especially in the creation of relevance judgments for test collections. While the criteria by which assessors? judge relevance has been intensively studied, little work has investigated the process individual assessors go through to judge the relevance of a document. In this paper, we focus on the process by which relevance is judged, and in particular, the degree of effort a user must expend to judge relevance. By better understanding this effort in isolation, we may provide data which can be used to create better models of search. We present the results of an empirical evaluation of the effort users must exert to judge the relevance of document, investigating the effect of relevance level and document size. Results suggest that 'relevant' documents require more effort to judge when compared to highly relevant and not relevant documents, and that effort increases as document size increases.

Short papers 1 -- filtering and recommending

A weakly-supervised detection of entity central documents in a stream BIBAFull-Text 769-772
  Ludovic Bonnefoy; Vincent Bouvier; Patrice Bellot
Filtering a time-ordered corpus for documents that are highly relevant to an entity is a task receiving more and more attention over the years. One application is to reduce the delay between the moment an information about an entity is being first observed and the moment the entity entry in a knowledge base is being updated. Current state-of-the-art approaches are highly supervised and require training examples for each entity monitored. We propose an approach which does not require new training data when processing a new entity. To capture intrinsic characteristics of highly relevant documents our approach relies on three types of features: document centric features, entity profile related features and time features. Evaluated within the framework of the "Knowledge Base Acceleration" track at TREC 2012, it outperforms current state-of-the-art approaches.
Sentiment analysis of user comments for one-class collaborative filtering over ted talks BIBAFull-Text 773-776
  Nikolaos Pappas; Andrei Popescu-Belis
User-generated texts such as reviews, comments or discussions are valuable indicators of users' preferences. Unlike previous works which focus on labeled data from user-contributed reviews, we focus here on user comments which are not accompanied by explicit rating labels. We investigate their utility for a one-class collaborative filtering task such as bookmarking, where only the user actions are given as ground truth. We propose a sentiment-aware nearest neighbor model (SANN) for multimedia recommendations over TED talks, which makes use of user comments. The model outperforms significantly, by more than 25% on unseen data, several competitive baselines.
Modeling the uniqueness of the user preferences for recommendation systems BIBAFull-Text 777-780
  Haggai Roitman; David Carmel; Yosi Mass; Iris Eiron
In this paper we propose a novel framework for modeling the uniqueness of the user preferences for recommendation systems. User uniqueness is determined by learning to what extent the user's item preferences deviate from those of an "average user" in the system. Based on this framework, we suggest three different recommendation strategies that trade between uniqueness and conformity. Using two real item datasets, we demonstrate the effectiveness of our uniqueness based recommendation framework.
Recommending personalized touristic sights using google places BIBAFull-Text 781-784
  Maya Sappelli; Suzan Verberne; Wessel Kraaij
The purpose of the Contextual Suggestion track, an evaluation task at the TREC 2012 conference, is to suggest personalized tourist activities to an individual, given a certain location and time. In our content-based approach, we collected initial recommendations using the location context as search query in Google Places. We first ranked the recommendations based on their textual similarity to the user profiles. In order to improve the ranking of popular sights, we combined the initial ranking with rankings based on Google Search, popularity and categories. Finally, we performed filtering based on the temporal context. Overall, our system performed well above average and median, and outperformed the baseline -- Google Places only -- run.
Optimizing top-n collaborative filtering via dynamic negative item sampling BIBAFull-Text 785-788
  Weinan Zhang; Tianqi Chen; Jun Wang; Yong Yu
Collaborative filtering techniques rely on aggregated user preference data to make personalized predictions. In many cases, users are reluctant to explicitly express their preferences and many recommender systems have to infer them from implicit user behaviors, such as clicking a link in a webpage or playing a music track. The clicks and the plays are good for indicating the items a user liked (i.e., positive training examples), but the items a user did not like (negative training examples) are not directly observed. Previous approaches either randomly pick negative training samples from unseen items or incorporate some heuristics into the learning model, leading to a biased solution and a prolonged training period. In this paper, we propose to dynamically choose negative training samples from the ranked list produced by the current prediction model and iteratively update our model. The experiments conducted on three large-scale datasets show that our approach not only reduces the training time, but also leads to significant performance gains.

Short papers 1 -- multimedia IR

Towards retrieving relevant information graphics BIBAFull-Text 789-792
  Zhuo Li; Matthew Stagitis; Sandra Carberry; Kathleen F. McCoy
Information retrieval research has made significant progress in the retrieval of text documents and images. However, relatively little attention has been given to the retrieval of information graphics (non-pictorial images such as bar charts and line graphs) despite their proliferation in popular media such as newspapers and magazines. Our goal is to build a system for retrieving bar charts and line graphs that reasons about the content of the graphic itself in deciding its relevance to the user query. This paper presents the first steps toward such a system, with a focus on identifying the category of intended message of potentially relevant bar charts and line graphs. Our learned model achieves accuracy higher than 80% on a corpus of collected user queries.
Hybrid retrieval approaches to geospatial music recommendation BIBAFull-Text 793-796
  Markus Schedl; Dominik Schnitzer
Recent advances in music retrieval and recommendation algorithms highlight the necessity to follow multimodal approaches in order to transcend limits imposed by methods that solely use audio, web, or collaborative filtering data. In this paper, we propose hybrid music recommendation algorithms that combine information on the music content, the music context, and the user context, in particular, integrating location-aware weighting of similarities. Using state-of-the-art techniques to extract audio features and contextual web features, and a novel standardized data set of music listening activities inferred from microblogs (MusicMicro), we propose several multimodal retrieval functions.
   The main contributions of this paper are (i) a systematic evaluation of mixture coefficients between state-of-the-art audio features and web features, using the first standardized microblog data set of music listening events for retrieval purposes and (ii) novel geospatial music recommendation approaches using location information of microblog users, and a comprehensive evaluation thereof.
Leveraging viewer comments for mood classification of music video clips BIBAFull-Text 797-800
  Takehiro Yamamoto; Satoshi Nakamura
This short paper proposes a method to classify music video clips uploaded to a video sharing service into music mood categories such as 'cheerful,' 'wistful,' and 'aggressive.' The method leverages viewer comments posted to the music video clips for the music mood classification. It extracts specific features from the comments: (1) adjectives in comments, (2) lengthened words in comments, and (3) comments in chorus sections. Our experimental results classifying 695 video clips into six mood categories showed that our method outperformed the baseline in terms of macro and micro averaged F-measures. In addition, our method outperformed the existing approaches that utilize lyrics and audio signals of songs.

Short papers 1 -- queries and query analysis

Exploiting semantics for improving clinical information retrieval BIBAFull-Text 801-804
  Atanaz Babashzadeh; Jimmy Huang; Mariam Daoud
Clinical information retrieval (IR) presents several challenges including terminology mismatch and granularity mismatch. One of the main objectives in clinical IR is to fill the semantic gap among the queries and documents and go beyond keywords matching. To address these issues, in this paper we attempt to use semantic information to improve the performance of clinical IR systems by representing queries in an expressive and meaningful context. To model a query context initially we model and develop query domain ontology. The query domain ontology represents concepts closely related with query concepts. Query context represents concepts extracted from query domain ontology and weighted according to their semantic relatedness to query concept(s). The query context is then exploited in query expansion and patients records re-ranking for improving clinical retrieval performance. We evaluate our approach on the TREC Medical Records dataset. Results show that our proposed approach significantly improves the retrieval performance compare to classic keyword-based IR model.
Interpretation of coordinations, compound generation, and result fusion for query variants BIBAFull-Text 805-808
  Johannes Leveling
We investigate interpreting coordinations (e.g. word sequences connected with coordinating conjunctions such as "and" and "or") as logical disjunctions of terms to generate a set of disjunction-free query variants for information retrieval (IR) queries. In addition, so-called hyphen coordinations are resolved by generating full compound forms and rephrasing the original query, e.g. "rice im-and export" is transformed into "rice import and export". Query variants are then processed separately and retrieval results are merged using a standard data fusion technique. We evaluate the approach on German standard IR benchmarking data. The results show that: i) Our proposed approach to generate compounds from hyphen coordinations produces the correct results for all test topics. ii) Our proposed heuristics to identify coordinations and generate query variants based on shallow natural language processing (NLP) techniques is highly accurate on the topics and does not rely on parsing or part-of-speech tagging. iii) Using query variants to produce multiple retrieval results and merging the results decreases precision at top ranks. However, in combination with blind relevance feedback (BRF), this approach can show significant improvement over the standard BRF baseline using the original queries.
Time-aware structured query suggestion BIBAFull-Text 809-812
  Taiki Miyanishi; Tetsuya Sakai
Most commercial search engines have a query suggestion feature, which is designed to capture various possible search intents behind the user's original query. However, even though different search intents behind a given query may have been popular at different time periods in the past, existing query suggestion methods neither utilize nor present such information. In this study, we propose Time-aware Structured Query Suggestion (TaSQS) which clusters query suggestions along a timeline so that the user can narrow down his search from a temporal point of view. Moreover, when a suggested query is clicked, TaSQS presents web pages from query-URL bipartite graphs after ranking them according to the click counts within a particular time period. Our experiments using data from a commercial search engine log show that the time-aware clustering and the time-aware document ranking features of TaSQS are both effective.
Flat vs. hierarchical phrase-based translation models for cross-language information retrieval BIBAFull-Text 813-816
  Ferhan Ture; Jimmy Lin
Although context-independent word-based approaches remain popular for cross-language information retrieval, many recent studies have shown that integrating insights from modern statistical machine translation systems can lead to substantial improvements in effectiveness. In this paper, we compare flat and hierarchical phrase-based translation models for query translation. Both approaches yield significantly better results than either a token-based or a one-best translation baseline on standard test collections. The choice of model manifests interesting tradeoffs in terms of effectiveness, efficiency, and model compactness.
Here and there: goals, activities, and predictions about location from geotagged queries BIBAFull-Text 817-820
  Robert West; Ryen W. White; Eric Horvitz
A significant portion of Web search is performed in mobile settings. We explore the links between users' queries on mobile devices and their locations and movement, with a focus on interpreting queries about addresses. We find that users tend to have a primary location, likely corresponding to home or workplace, and that a user's location relative to this primary location systematically influences the patterns of address searches. We apply our findings to construct a statistical model that can predict with high accuracy whether a user will be soon observed at an address that had been recently retrieved via search. Such an ability to predict that a user will transition to a location can be harnessed for multiple uses including provision of directions and traffic information, the rendering of competitive advertising, and guiding the opportunistic completion of pending tasks that can be accomplished en route to a target location.
Query change as relevance feedback in session search BIBAFull-Text 821-824
  Sicong Zhang; Dongyi Guan; Hui Yang
Session search is the Information Retrieval (IR) task that performs document retrieval for an entire session. During a session, users often change queries to explore and investigate the information needs. In this paper, we propose to use query change as a new form of relevance feedback for better session search. Evaluation conducted over TREC 2012 Session Track shows that query change is a highly effective form of feedback as compared with existing relevance feedback methods. The proposed method outperforms the state-of-the-art relevance feedback methods for the TREC 2012 Session Track by a significant improvement of >25%.

Short papers 1 -- retrieval models and ranking

Is uncertain logical-matching equivalent to conditional probability? BIBAFull-Text 825-828
  Karam Abdulahhad; Jean-Pierre Chevallet; Catherine Berrut
Logic-based Information Retrieval (IR) models represent the retrieval decision as a logical implication d->q between a document d and a query q, where d and q are logical sentences. However, d->q is a binary decision, we thus need a measure to estimate the degree to which d implies q, denoted P(d->q). In this study, we revisit the Van Rijsbergen's assumptions about: 1- the logical implication ->' is not the material one, and 2- P(d->q) could be estimated by the conditional probability P(q|d). More precisely, we claim that the material implication is an appropriate implication for IR, and also we mathematically prove that replacing P(d->q) by P(q|d) is a correct choice. In order to prove the Van Rijsbergen's assumption, we use the Propositional Logic and the Lattice theory. We also exploit the notion of degree of implication that is proposed by Knuth.
Boosting novelty for biomedical information retrieval through probabilistic latent semantic analysis BIBAFull-Text 829-832
  Xiangdong An; Jimmy Xiangji Huang
In information retrieval, we are interested in the information that is not only relevant but also novel. In this paper, we study how to boost novelty for biomedical information retrieval through probabilistic latent semantic analysis. We conduct the study based on TREC Genomics Track data. In TREC Genomics Track, each topic is considered to have an arbitrary number of aspects, and the novelty of a piece of information retrieved, called a passage, is assessed based on the amount of new aspects it contains. In particular, the aspect performance of a ranked list is rewarded by the number of new aspects reached at each rank and penalized by the amount of irrelevant passages that are rated higher than the novel ones. Therefore, to improve aspect performance, we should reach as many aspects as possible and as early as possible. In this paper, we make a preliminary study on how probabilistic latent semantic analysis can help capture different aspects of a ranked list, and improve its performance by re-ranking. Experiments indicate that the proposed approach can greatly improve the aspect-level performance over baseline algorithm Okapi BM25.
Learning to combine representations for medical records search BIBAFull-Text 833-836
  Nut Limsopatham; Craig Macdonald; Iadh Ounis
The complexity of medical terminology raises challenges when searching medical records. For example, 'cancer', 'tumour', and 'neoplasms', which are synonyms, may prevent a traditional search system from retrieving relevant records that contain only synonyms of the query terms. Prior works use bag-of-concepts approaches, to deal with this by representing medical terms sharing the same meanings using concepts from medical resources (e.g. MeSH). The relevance scores are then combined with a traditional bag-of-words representation, when inferring the relevance of medical records. Even though the existing approaches are effective, the predicted retrieval effectiveness of either the bag-of-words or bag-of-concepts representation, which may be used to effectively model the score combination and hence improve retrieval performance, is not taken into account. In this paper, we propose a novel learning framework that models the importance of the bag-of-words and the bag-of-concepts representations, combining their scores on a per-query basis. Our proposed framework leverages retrieval performance predictors, such as the clarity score and AvIDF, calculated on both representations as learning features. We evaluate our proposed framework using the TREC Medical Records track's test collections. As our proposed framework can significantly outperform an existing approach that linearly merges the relevance scores, we conclude that retrieval performance predictors can be effectively leveraged when combining the relevance scores.
Kinship contextualization: utilizing the preceding and following structural elements BIBAFull-Text 837-840
  Muhammad A. Norozi; Paavo Arvola
The textual context of an element, structurally, contains traces of evidences. Utilizing this context in scoring is called contextualization. In this study we hypothesize that the context of an XML-element originated from its preceding and following elements in the sequential ordering of a document improves the quality of retrieval. In the tree form of the document's structure, kinship contextualization means, contextualization based on the horizontal and vertical elements in the kinship tree, or elements in closer to a wider structural kinship. We have tested several variants of kinship contextualization and verified notable improvements in comparison with the baseline system and gold standards in the retrieval of focused elements.
The cluster hypothesis for entity oriented search BIBAFull-Text 841-844
  Hadas Raviv; Oren Kurland; David Carmel
In this work we study the cluster hypothesis for entity oriented search (EOS). Specifically, we show that the hypothesis can hold to a substantial extent for several entity similarity measures. We also demonstrate the retrieval effectiveness merits of using clusters of similar entities for EOS.
Self reinforcement for important passage retrieval BIBAFull-Text 845-848
  Ricardo Ribeiro; Luís Marujo; David Martins de Matos; João P. Neto; Anatole Gershman; Jaime Carbonell
In general, centrality-based retrieval models treat all elements of the retrieval space equally, which may reduce their effectiveness. In the specific context of extractive summarization (or important passage retrieval), this means that these models do not take into account that information sources often contain lateral issues, which are hardly as important as the description of the main topic, or are composed by mixtures of topics. We present a new two-stage method that starts by extracting a collection of key phrases that will be used to help centrality-as-relevance retrieval model. We explore several approaches to the integration of the key phrases in the centrality model. The proposed method is evaluated using different datasets that vary in noise (noisy vs clean) and language (Portuguese vs English). Results show that the best variant achieves relative performance improvements of about 31% in clean data and 18% in noisy data.
What can pictures tell us about web pages?: improving document search using images BIBAFull-Text 849-852
  Sergio Rodriguez-Vaamonde; Lorenzo Torresani; Andrew Fitzgibbon
Traditional Web search engines do not use the images in the HTML pages to find relevant documents for a given query. Instead, they typically operate by computing a measure of agreement between the keywords provided by the user and only the text portion of each page. In this paper we study whether the content of the pictures appearing in a Web page can be used to enrich the semantic description of an HTML document and consequently boost the performance of a keyword-based search engine. We present a Web-scalable system that exploits a pure text-based search engine to find an initial set of candidate documents for a given query. Then, the candidate set is reranked using semantic information extracted from the images contained in the pages. The resulting system retains the computational efficiency of traditional text-based search engines with only a small additional storage cost needed to encode the visual information. We test our approach on the TREC 2009 Million Query Track, where we show that our use of visual content yields improvement in accuracies for two distinct text-based search engines, including the system with the best reported performance on this benchmark.
Estimating query representativeness for query-performance prediction BIBAFull-Text 853-856
  Mor Sondak; Anna Shtok; Oren Kurland
The query-performance prediction (QPP) task is estimating retrieval effectiveness with no relevance judgments. We present a novel probabilistic framework for QPP that gives rise to an important aspect that was not addressed in previous work; namely, the extent to which the query effectively represents the information need for retrieval. Accordingly, we devise a few query-representativeness measures that utilize relevance language models. Experiments show that integrating the most effective measures with state-of-the-art predictors in our framework often yields prediction quality that significantly transcends that of using the predictors alone.
Interoperability ranking for mobile applications BIBAFull-Text 857-860
  Dragomir Yankov; Pavel Berkhin; Rajen Subba
At present, most major app marketplaces perform ranking and recommendation based on search relevance features or marketplace "popularity" statistics. For instance, they check similarity between app descriptions and user search queries, or rank-order the apps according to statistics such as number of downloads, user ratings etc. Rankings derived from such signals, important as they are, are insufficient to capture the dynamics of the apps ecosystem. Consider for example the questions: In a particular user context, is app A more likely to be launched than app B? Or does app C provide complementary functionality to app D? Answering these questions requires identifying and analyzing the dependencies between apps in the apps ecosystem. Ranking mechanisms that reflect such interdependences are thus necessary.
   In this paper we introduce the notion of interoperability ranking for mobile applications. Intuitively, apps with high rank are such apps which are inferred to be somehow important to other apps in the ecosystem. We demonstrate how interoperability ranking can help answer the above questions and also provide the basis for solving several problems which are rapidly attracting the attention of both researchers and the industry, such as building personalized real-time app recommender systems or intelligent mobile agents. We describe a set of methods for computing interoperability ranks and analyze their performance on real data from the Windows Phone app marketplace.

Short papers 1 -- social media IR

Sopra: a new social personalized ranking function for improving web search BIBAFull-Text 861-864
  Mohamed Reda Bouadjenek; Hakim Hacid; Mokrane Bouzeghoub
We present in this paper a contribution to IR modeling by proposing a new ranking function called SoPRa that considers the social dimension of the Web. This social dimension is any social information that surrounds documents along with the social context of users. Currently, our approach relies on folksonomies for extracting these social contexts, but it can be extended to use any social meta-data, e.g. comments, ratings, tweets, etc. The evaluation performed on our approach shows its benefits for personalized search.
Browse with a social web directory BIBAFull-Text 865-868
  Hao Huang; Yunjun Gao; Lu Chen; Rui Li; Kevin Chiew; Qinming He
Browse with either web directories or social bookmarks is an important complementation to search by keywords in web information retrieval. To improve users' browse experiences and facilitate the web directory construction, in this paper, we propose a novel browse system called Social Web Directory (SWD for short) by integrating web directories and social bookmarks. In SWD, (1) web pages are automatically categorized to a hierarchical structure to be retrieved efficiently, and (2) the popular web pages, hottest tags, and expert users in each category are ranked to help users find information more conveniently. Extensive experimental results demonstrate the effectiveness of our SWD system.
Who will retweet me?: finding retweeters in Twitter BIBAFull-Text 869-872
  Zhunchen Luo; Miles Osborne; Jintao Tang; Ting Wang
An important aspect of communication in Twitter (and other Social Network is message propagation -- people creating posts for others to share. Although there has been work on modelling how tweets in Twitter are propagated (retweeted), an untackled problem has been who will retweet a message. Here we consider the task of finding who will retweet a message posted on Twitter. Within a learning to-rank framework, we explore a wide range of features, such as retweet history, followers status, followers active time and followers interests. We find that followers who retweeted or mentioned the author's tweets frequently before and have common interests are more likely to be retweeters.
A financial cost metric for result caching BIBAFull-Text 873-876
  Fethi Burak Sazoglu; B. Barla Cambazoglu; Rifat Ozcan; Ismail Sengor Altingovde; Özgür Ulusoy
Web search engines cache results of frequent and/or recent queries. Result caching strategies can be evaluated using different metrics, hit rate being the most well-known. Recent works take the processing overhead of queries into account when evaluating the performance of result caching strategies and propose cost-aware caching strategies. In this paper, we propose a financial cost metric that goes one step beyond and takes also the hourly electricity prices into account when computing the cost. We evaluate the most well-known static, dynamic, and hybrid result caching strategies under this new metric. Moreover, we propose a financial-cost-aware version of the well-known LRU strategy and show that it outperforms the original LRU strategy in terms of the financial cost metric.

Short papers 1 -- topic models

Document classification by topic labeling BIBAFull-Text 877-880
  Swapnil Hingmire; Sandeep Chougule; Girish K. Palshikar; Sutanu Chakraborti
In this paper, we propose Latent Dirichlet Allocation (LDA) [1] based document classification algorithm which does not require any labeled dataset. In our algorithm, we construct a topic model using LDA, assign one topic to one of the class labels, aggregate all the same class label topics into a single topic using the aggregation property of the Dirichlet distribution and then automatically assign a class label to each unlabeled document depending on its "closeness" to one of the aggregated topics.
   We present an extension to our algorithm based on the combination of Expectation-Maximization (EM) algorithm and a naive Bayes classifier. We show effectiveness of our algorithm on three real world datasets.
Mining web search topics with diverse spatiotemporal patterns BIBAFull-Text 881-884
  Di Jiang; Wilfred Ng
Mining the latent topics from web search data and capturing their spatiotemporal patterns have many applications in information retrieval. As web search is heavily influenced by the spatial and temporal factors, the latent topics usually demonstrate a variety of spatiotemporal patterns. In the face of the diversity of these patterns, existing models are increasingly ineffective, since they capture only one dimension of the spatiotemporal patterns (either the spatial or temporal dimension) or simply assume that there exists only one kind of spatiotemporal patterns. Such oversimplification risks distorting the latent data structure and hindering the downstream usage of the discovered topics. In this paper, we introduce the Spatiotemporal Search Topic Model (SSTM) to discover the latent topics from web search data with capturing their diverse spatiotemporal patterns simultaneously. The SSTM can flexibly support diverse spatiotemporal patterns and seamlessly integrate the unique features in web search such as query words, URLs, timestamps and search sessions. The SSTM is demonstrated as an effective exploratory tool for large-scale web search data and it performs superiorly in quantitative comparisons to several state-of-the-art topic models.
A novel topic model for automatic term extraction BIBAFull-Text 885-888
  Sujian Li; Jiwei Li; Tao Song; Wenjie Li; Baobao Chang
Automatic term extraction (ATE) aims at extracting domain-specific terms from a corpus of a certain domain. Termhood is one essential measure for judging whether a phrase is a term. Previous researches on termhood mainly depend on the word frequency information. In this paper, we propose to compute termhood based on semantic representation of words. A novel topic model, namely i-SWB, is developed to map the domain corpus into a latent semantic space, which is composed of some general topics, a background topic and a documents-specific topic. Experiments on four domains demonstrate that our approach outperforms the state-of-the-art ATE approaches.
Improving LDA topic models for microblogs via tweet pooling and automatic labeling BIBAFull-Text 889-892
  Rishabh Mehrotra; Scott Sanner; Wray Buntine; Lexing Xie
Twitter, or the world of 140 characters poses serious challenges to the efficacy of topic models on short, messy text. While topic models such as Latent Dirichlet Allocation (LDA) have a long history of successful application to news articles and academic abstracts, they are often less coherent when applied to microblog content like Twitter. In this paper, we investigate methods to improve topics learned from Twitter content without modifying the basic machinery of LDA; we achieve this through various pooling schemes that aggregate tweets in a data preprocessing step for LDA. We empirically establish that a novel method of tweet pooling by hashtags leads to a vast improvement in a variety of measures for topic coherence across three diverse Twitter datasets in comparison to an unmodified LDA baseline and a variety of pooling schemes. An additional contribution of automatic hashtag labeling further improves on the hashtag pooling results for a subset of metrics. Overall, these two novel schemes lead to significantly improved LDA topic models on Twitter content.

Short papers 1 -- users and interactive IR

Extractive summarisation via sentence removal: condensing relevant sentences into a short summary BIBAFull-Text 893-896
  Marco Bonzanini; Miguel Martinez-Alvarez; Thomas Roelleke
Many on-line services allow users to describe their opinions about a product or a service through a review. In order to help other users to find out the major opinion about a given topic, without the effort to read several reviews, multi-document summarisation is required. This research proposes an approach for extractive summarisation, supporting different scoring techniques, such as cosine similarity or divergence, as a method for finding representative sentences. The main contribution of this paper is the definition of an algorithm for sentence removal, developed to maximise the score between the summary and the original document. Instead of ranking the sentences and selecting the most important ones, the algorithm iteratively removes unimportant sentences until a desired compression rate is reached. Experimental results show that variations of the sentence removal algorithm provide good performance.
Characterizing stages of a multi-session complex search task through direct and indirect query modifications BIBAFull-Text 897-900
  Jiyin He; Marc Bron; Arjen P. de Vries
Search systems use context to effectively satisfy a user's information need as expressed by a query. Tasks are important factors in determining user context during search and many studies have been conducted that identify tasks and task stages through users' interaction behavior with search systems. The type of interaction available to users, however, depends on the type of search interface features available. Queries are the most pervasive input from users to express their information need regardless of the input method, e.g., typing keywords or clicking facets. Instead of characterizing interaction behavior in terms of interface specific components, we propose to characterize users' search behavior in terms of two types of query modification: (i) direct modification, which refers to reformulations of queries; and (ii) indirect modification, which refers to user operations on additional input components provided by various search interfaces. We investigate the utility of characterizing task stages through direct and indirect query reformulations in a case study and find that it is possible to effectively differentiate subsequent stages of the search task. We found that describing user interaction behavior in such a generic form allowed us to relate user actions to search task stages independent from the specific search interface deployed. The next step will then be to validate this idea in a setting with a wider palette of search tasks and tools.
Displaying relevance scores for search results BIBAFull-Text 901-904
  Guy Shani; Noam Tractinsky
Internet search engines typically compute a relevance score for webpages given the query terms, and then rank the pages by decreasing relevance scores. The popular search engines do not, however, present the relevance scores that were computed during this process. We suggest that these relevance scores may contain information that can help users make conscious decisions. In this paper we evaluate in a user study how users react to the display of such scores. The results indicate that users understand graphical displays of relevance, and make decisions based on these scores. Our results suggest that in the context of exploratory search, relevance scores may cause users to explore more search results.
Studying page life patterns in dynamical web BIBAFull-Text 905-908
  Alexey Tikhonov; Ivan Bogatyy; Pavel Burangulov; Liudmila Ostroumova; Vitaliy Koshelev; Gleb Gusev
With the ever-increasing speed of content turnover on the web, it is particularly important to understand the patterns that pages' popularity follows. This paper focuses on the dynamical part of the web, i.e. pages that have a limited lifespan and experience a short popularity outburst within it. We classify these pages into five patterns based on how quickly they gain popularity and how quickly they lose it. We study the properties of pages that belong to each pattern and determine content topics that contain disproportionately high fractions of particular patterns. These developments are utilized to create an algorithm that approximates with reasonable accuracy the expected popularity pattern of a web page based on its URL and, if available, prior knowledge about its domain's topics.

Short papers 2 -- evaluation

A document rating system for preference judgements BIBAFull-Text 909-912
  Maryam Bashir; Jesse Anderton; Jie Wu; Peter B. Golbus; Virgil Pavlu; Javed A. Aslam
High quality relevance judgments are essential for the evaluation of information retrieval systems. Traditional methods of collecting relevance judgments are based on collecting binary or graded nominal judgments, but such judgments are limited by factors such as inter-assessor disagreement and the arbitrariness of grades. Previous research has shown that it is easier for assessors to make pairwise preference judgments. However, unless the preferences collected are largely transitive, it is not clear how to combine them in order to obtain document relevance scores. Another difficulty is that the number of pairs that need to be assessed is quadratic in the number of documents. In this work, we consider the problem of inferring document relevance scores from pairwise preference judgments by analogy to tournaments using the Elo rating system. We show how to combine a linear number of pairwise preference judgments from multiple assessors to compute relevance scores for every document.
Relevance dimensions in preference-based IR evaluation BIBAFull-Text 913-916
  Jinyoung Kim; Gabriella Kazai; Imed Zitouni
Evaluation of information retrieval (IR) systems has recently been exploring the use of preference judgments over two search result lists. Unlike the traditional method of collecting relevance labels per single result, this method allows to consider the interaction between search results as part of the judging criteria. For example, one result list may be preferred over another if it has a more diverse set of relevant results, covering a wider range of user intents. In this paper, we investigate how assessors determine their preference for one list of results over another with the aim to understand the role of various relevance dimensions in preference-based evaluation. We run a series of experiments and collect preference judgments over different relevance dimensions in side-by-side comparisons of two search result lists, as well as relevance judgments for the individual documents. Our analysis of the collected judgments reveals that preference judgments combine multiple dimensions of relevance that go beyond the traditional notion of relevance centered on topicality. Measuring performance based on single document judgments and NDCG aligns well with topicality based preferences, but shows misalignment with judges' overall preferences, largely due to the diversity dimension. As a judging method, dimensional preference judging is found to lead to improved judgment quality.
Composition of TF normalizations: new insights on scoring functions for ad hoc IR BIBAFull-Text 917-920
  François Rousseau; Michalis Vazirgiannis
Previous papers in ad hoc IR reported that scoring functions should satisfy a set of heuristic retrieval constraints, providing a mathematical justification for the normalizations historically applied to the term frequency (TF). In this paper, we propose a further level of abstraction, claiming that the successive normalizations are carried out through composition. Thus we introduce a principled framework that fully explains BM25 as a variant of TF-IDF with an inverse order of function composition. Our experiments over standard datasets indicate that the respective orders of composition chosen in the original papers for both TF-IDF and BM25 are the most effective ones. Moreover, since the order is different between the two models, they also demonstrated that the order is instrumental in the design of weighting models. In fact, while considering more complex scoring functions such as BM25+, we discovered a novel weighting model in terms of order of composition that consistently outperforms all the rest. Our contribution here is twofold: we provide a unifying mathematical framework for IR and a novel scoring function discovered using this framework.
The impact of intent selection on diversified search evaluation BIBAFull-Text 921-924
  Tetsuya Sakai; Zhicheng Dou; Charles L. A. Clarke
To construct a diversified search test collection, a set of possible subtopics (or intents) needs to be determined for each topic, in one way or another, and pertinent relevance assessments need to be obtained. In the TREC Web Track Diversity Task, subtopics are manually developed at NIST, based on results of automatic click log analysis; in the NTCIR INTENT Task, intents are determined by manually clustering 'subtopics strings' returned by participating systems. In this study, we address the following research question: Does the choice of intents for a test collection affect relative performances of diversified search systems? To this end, we use the TREC 2012 Web Track Diversity Task data and the NTCIR-10 INTENT-2 Task data, which share a set of 50 topics but have different intent sets. Our initial results suggest that the choice of intents may affect relative performances, and that this choice may be far more important than how many intents are selected for each topic.
A comparison of the optimality of statistical significance tests for information retrieval evaluation BIBAFull-Text 925-928
  Julián Urbano; Mónica Marrero; Diego Martín
Previous research has suggested the permutation test as the theoretically optimal statistical significance test for IR evaluation, and advocated for the discontinuation of the Wilcoxon and sign tests. We present a large-scale study comprising nearly 60 million system comparisons showing that in practice the bootstrap, t-test and Wilcoxon test outperform the permutation test under different optimality criteria. We also show that actual error rates seem to be lower than the theoretically expected 5%, further confirming that we may actually be underestimating significance.
Assessor disagreement and text classifier accuracy BIBAFull-Text 929-932
  William Webber; Jeremy Pickens
Text classifiers are frequently used for high-yield retrieval from large corpora, such as in e-discovery. The classifier is trained by annotating example documents for relevance. These examples may, however, be assessed by people other than those whose conception of relevance is authoritative. In this paper, we examine the impact that disagreement between actual and authoritative assessor has upon classifier effectiveness, when evaluated against the authoritative conception. We find that using alternative assessors leads to a significant decrease in binary classification quality, though less so ranking quality. A ranking consumer would have to go on average 25% deeper in the ranking produced by alternative-assessor training to achieve the same yield as for authoritative-assessor training.
Sequential testing in classifier evaluation yields biased estimates of effectiveness BIBAFull-Text 933-936
  William Webber; Mossaab Bagdouri; David D. Lewis; Douglas W. Oard
It is common to develop and validate classifiers through a process of repeated testing, with nested training and/or test sets of increasing size. We demonstrate in this paper that such repeated testing leads to biased estimates of classifier effectiveness. Experiments on a range of text classification tasks under three sequential testing frameworks show all three lead to optimistic estimates of effectiveness. We calculate empirical adjustments to unbias estimates on our data set, and identify directions for research that could lead to general techniques for avoiding bias while reducing labeling costs.
Relating retrievability, performance and length BIBAFull-Text 937-940
  Colin Wilkie; Leif Azzopardi
Retrievability provides a different way to evaluate an Information Retrieval (IR) system as it focuses on how easily documents can be found. It is intrinsically related to retrieval performance because a document needs to be retrieved before it can be judged relevant. In this paper, we undertake an empirical investigation into the relationship between the retrievability of documents, the retrieval bias imposed by a retrieval system, and the retrieval performance, across different amounts of document length normalization. To this end, two standard IR models are used on three TREC test collections to show that there is a useful and practical link between retrievability and performance. Our findings show that minimizing the bias across the document collection leads to good performance (though not the best performance possible). We also show that past a certain amount of document length normalization the retrieval bias increases, and the retrieval performance significantly and rapidly decreases. These findings suggest that the relationship between retrievability and effectiveness may offer a way to automatically tune systems.

Short papers 2 -- filtering and recommending

Cumulative citation recommendation: classification vs. ranking BIBAFull-Text 941-944
  Krisztian Balog; Heri Ramampiaro
Cumulative citation recommendation refers to the task of filtering a time-ordered corpus for documents that are highly relevant to a predefined set of entities. This task has been introduced at the TREC Knowledge Base Acceleration track in 2012, where two main families of approaches emerged: classification and ranking. In this paper we perform an experimental comparison of these two strategies using supervised learning with a rich feature set. Our main finding is that ranking outperforms classification on all evaluation settings and metrics. Our analysis also reveals that a ranking-based approach has more potential for future improvements.
Tagcloud-based explanation with feedback for recommender systems BIBAFull-Text 945-948
  Wei Chen; Wynne Hsu; Mong Li Lee
Personalized recommender systems aim to push only the relevant items and information directly to the users without requiring them to browse through millions of web resources. The challenge of these systems is to achieve a high user acceptance rate on their recommendations. In this paper, we aim to increase the user acceptance of recommendations by providing more intuitive tag-based explanations of why the items are recommended. Tags are used as intermediary entities that not only relate target users to the recommended items but also understand users' intents. Our system also allows tag-based online relevance feedback. Experiment results on the Movielens dataset show that the proposed approach is able to increase the acceptance rate of recommendations and improve user satisfaction.
Collaborative factorization for recommender systems BIBAFull-Text 949-953
  Chaosheng Fan; Yanyan Lan; Jiafeng Guo; Zuoquan Lin; Xueqi Cheng
Recommender system has become an effective tool for information filtering, which usually provides the most useful items to users by a top-k ranking list. Traditional recommendation techniques such as Nearest Neighbors (NN) and Matrix Factorization (MF) have been widely used in real recommender systems. However, neither approaches can well accomplish recommendation task since that: (1) most NN methods leverage the neighbor's behaviors for prediction, which may suffer the severe data sparsity problem; (2) MF methods are less sensitive to sparsity, but neighbors' influences on latent factors are not fully explored, since the latent factors are often used independently. To overcome the above problems, we propose a new framework for recommender systems, called collaborative factorization. It expresses the user as the combination of his own factors and those of the neighbors', called collaborative latent factors, and a ranking loss is then utilized for optimization. The advantage of our approach is that it can both enjoy the merits of NN and MF methods. In this paper, we take the logistic loss in RankNet and the likelihood loss in ListMLE as examples, and the corresponding collaborative factorization methods are called CoF-Net and CoF-MLE. Our experimental results on three benchmark datasets show that they are more effective than several state-of-the-art recommendation methods.
RecSys for distributed events: investigating the influence of recommendations on visitor plans BIBAFull-Text 953-956
  Richard Schaller; Morgan Harvey; David Elsweiler
Distributed events are collections of events taking place within a small area over the same time period and relating to a single topic. There are often a large number of events on offer and the times in which they can be visited are heavily constrained, therefore the task of choosing events to visit and in which order can be very difficult. In this work we investigate how visitors can be assisted by means of a recommender system via 2 large-scale naturalistic studies (n=860 and n=1047). We show that a recommender system can influence users to select events that result in tighter and more compact routes, thus allowing users to spend less time travelling and more time visiting events.

Short papers 2 -- multimedia IR

Ranking-oriented nearest-neighbor based method for automatic image annotation BIBAFull-Text 957-960
  Chaoran Cui; Jun Ma; Tao Lian; Xiaofang Wang; Zhaochun Ren
Automatic image annotation plays a critical role in keyword-based image retrieval systems. Recently, the nearest-neighbor based scheme has been proposed and achieved good performance for image annotation. Given a new image, the scheme is to first find its most similar neighbors from labeled images, and then propagate the keywords associated with the neighbors to it. Many studies focused on designing a suitable distance metric between images so that all labeled images can be ranked by their distance to the given image. However, higher accuracy in distance prediction does not necessarily lead to better ordering of labeled images. In this paper, we propose a ranking-oriented neighbor search mechanism to rank labeled images directly without going through the intermediate step of distance prediction. In particular, a new learning to rank algorithm is developed, which exploits the implicit preference information of labeled images and underlines the accuracy of the top-ranked results. Experiments on two benchmark datasets demonstrate the effectiveness of our approach for image annotation.
Linking transcribed conversational speech BIBAFull-Text 961-964
  Joseph Malionek; Douglas W. Oard; Abhijeet Sangwan; John H. L. Hansen
As large collections of historically significant recorded speech become increasingly available, scholars are faced with the challenge of making sense of what they hear. This paper proposes automatically linking conversational speech to related resources as one way of supporting that sense-making task. Experiment results with transcribed conversations suggest that this kind of linking has promise for helping to contextualize recordings of detail-oriented conversations, and that simple sliding-window bag-of-words techniques can identify some useful links.
On contextual photo tag recommendation BIBAFull-Text 965-968
  Philip J. McParlane; Yashar Moshfeghi; Joemon M. Jose
Image tagging is a growing application on social media websites, however, the performance of many auto-tagging methods are often poor. Recent work has exploited an image's context (e.g. time and location) in the tag recommendation process, where tags which co-occur highly within a given time interval or geographical area are promoted. These models, however, fail to address how and when different image contexts can be combined. In this paper, we propose a weighted tag recommendation model, building on an existing state-of-the-art, which varies the importance of time and location in the recommendation process, based on a given set of input tags. By retrieving more temporally and geographically relevant tags, we achieve statistically significant improvements to recommendation accuracy when testing on 519k images collected from Flickr. The result of this paper is an important step towards more effective image annotation and retrieval systems.
The knowing camera: recognizing places-of-interest in smartphone photos BIBAFull-Text 969-972
  Pai Peng; Lidan Shou; Ke Chen; Gang Chen; Sai Wu
This paper presents a framework called Knowing Camera for real-time recognizing places-of-interest in smartphone photos, with the availability of online geotagged images of such places. We propose a probabilistic field-of-view model which captures the uncertainty in camera sensor data. This model can be used to retrieve a set of candidate images. The visual similarity computation of the candidate images relies on the sparse coding technique. We also propose an ANN filtering technique to speedup the sparse coding. The final ranking combines an uncertain geometric relevance with the visual similarity. Our preliminary experiments conducted in an urban area of a large city show promising results. The most distinguishing feature of our framework is its ability to perform well in contaminated, real-world online image database. Besides, our framework is highly scalable as it does not incur any complex data structure.

Short papers 2 -- queries and query analysis

Question retrieval with user intent BIBAFull-Text 973-976
  Long Chen; Dell Zhang; Mark Levene
Community Question Answering (CQA) services, such as Yahoo! Answers and WikiAnswers, have become popular with users as one of the central paradigms for satisfying users' information needs. The task of question retrieval in CQA aims to resolve one's query directly by finding the most relevant questions (together with their answers) from an archive of past questions. However, as users can ask any question that they like, a large number of questions in CQA are not about objective (factual) knowledge, but about subjective (sentiment-based) opinions or social interactions. The inhomogeneous nature of CQA leads to reduced performance of standard retrieval models. To address this problem, we present a hybrid approach that blends several language modelling techniques for question retrieval, namely, the classic (query-likelihood) language model, the state-of-the-art translation-based language model, and our proposed intent-based language model. The user intent of each candidate question (objective/subjective/social) is given by a probabilistic classifier which makes use of both textual features and metadata features. Our experiments on two real-world datasets show that our approach can significantly outperform existing ones.
Mapping queries to questions: towards understanding users' information needs BIBAFull-Text 977-980
  Yunjun Gao; Lu Chen; Rui Li; Gang Chen
In this paper, for the first time, we study the problem of mapping keyword queries to questions on community-based question answering (CQA) sites. Mapping general web queries to questions enables search engines not only to discover explicit and specific information needs (questions) behind keywords queries, but also to find high quality information (answers) for answering keyword queries. In order to map queries to questions, we propose a ranking algorithm containing three steps: Candidate Question Selection, Candidate Question Ranking, and Candidate Question Grouping. Preliminary experimental results using 60 queries from search logs of a commercial engine show that the presented approach can efficiently find the questions which capture user's information needs explicitly.
From keywords to keyqueries: content descriptors for the web BIBAFull-Text 981-984
  Tim Gollub; Matthias Hagen; Maximilian Michel; Benno Stein
We introduce the concept of keyqueries as dynamic content descriptors for documents. Keyqueries are defined implicitly by the index and the retrieval model of a reference search engine: keyqueries for a document are the minimal queries that return the document in the top result ranks. Besides applications in the fields of information retrieval and data mining, keyqueries have the potential to form the basis of a dynamic classification system for future digital libraries -- the modern version of keywords for content description.
   To determine the keyqueries for a document, we present an exhaustive search algorithm along with effective pruning strategies. For applications where a small number of diverse keyqueries is sufficient, two tailored search strategies are proposed. Our experiments emphasize the role of the reference search engine and show the potential of keyqueries as innovative document descriptors for large, fast evolving bodies of digital content such as the web.
Commodity query by snapping BIBAFull-Text 985-988
  Hao Huang; Yunjun Gao; Kevin Chiew; Qinming He; Lu Chen
Commodity information such as prices and public reviews is always the concern of consumers. Helping them conveniently acquire these information as an instant reference is often of practical significance for their purchase activities. Nowadays, Web 2.0, linked data clouds, and the pervasiveness of smart hand held devices have created opportunities for this demand, i.e., users could just snap a photo of any commodity that is of interest at anytime and anywhere, and retrieve the relevant information via their Internet-linked mobile devices. Nonetheless, compared with the traditional keyword-based information retrieval, extracting the hidden information related to the commodities in photos is a much more complicated and challenging task, involving techniques such as pattern recognition, knowledge base construction, semantic comprehension, and statistic deduction. In this paper, we propose a framework to address this issue by leveraging on various techniques, and evaluate the effectiveness and efficiency of this framework with experiments on a prototype.
Temporal variance of intents in multi-faceted event-driven information needs BIBAFull-Text 989-992
  Stewart Whiting; Ke Zhou; Joemon Jose; Mounia Lalmas
Time is often important for understanding user intent during search activity, especially for information needs related to event-driven topics. Diversity for multi-faceted information needs ensures that ranked documents optimally cover multiple facets when a user's intent is uncertain. Effective diversity is reliant on methods to (i) discover and represent facets, and (ii) determine how likely each facet is the user's intent (i.e., its popularity). Past work has developed several techniques addressing these issues, however, they have concentrated on static approaches which do not consider the temporal nature of new and evolving intents and their popularity. In many cases, what a user expects may change dramatically over time as events develop. In this work we study the temporal variance of search intents for event-driven information needs using Wikipedia. First, we model intents based upon the structure represented by the section hierarchy of Wikipedia articles closely related to the information need. Using this technique, we investigate whether temporal changes in the content structure, i.e. in a section's text, reflect the temporal popularity of the intent. We map intents taken from a query-log (as ground-truth) to Wikipedia article sections and found that a large proportion are indeed reflected in topic-related article structure. By correlating the change activity of each section with the use of the intent query over time, we found that section change activity does reflect temporal popularity of many intents. Furthermore, we show that popularity between intents changes over time for event-driven topics.
Pursuing insights about healthcare utilization via geocoded search queries BIBAFull-Text 993-996
  Shuang-Hong Yang; Ryen W. White; Eric Horvitz
Mobile devices provide people with a conduit to the rich information resources of the Web. With consent, the devices can also provide streams of information about search activity and location that can be used in population studies and real-time assistance. We analyzed geotagged mobile queries in a privacy-sensitive study of potential transitions from health information search to in-world healthcare utilization. We note differences in people's health information seeking before, during, and after the appearance of evidence that a medical facility has been visited. We find that we can accurately estimate statistics about such potential user engagement with healthcare providers. The findings highlight the promise of using geocoded search for sensing and predicting activities in the world.

Short papers 2 -- retrieval models and ranking

Effectiveness/efficiency tradeoffs for candidate generation in multi-stage retrieval architectures BIBAFull-Text 997-1000
  Nima Asadi; Jimmy Lin
This paper examines a multi-stage retrieval architecture consisting of a candidate generation stage, a feature extraction stage, and a reranking stage using machine-learned models. Given a fixed set of features and a learning-to-rank model, we explore effectiveness/efficiency tradeoffs with three candidate generation approaches: postings intersection with SvS, conjunctive query evaluation with WAND, and disjunctive query evaluation with WAND. We find no significant differences in end-to-end effectiveness as measured by NDCG between conjunctive and disjunctive WAND, but conjunctive query evaluation is substantially faster. Postings intersection with SvS, while fast, yields substantially lower end-to-end effectiveness, suggesting that document and term frequencies remain important in the initial ranking stage. These findings show that conjunctive WAND is the best overall candidate generation strategy of those we examined.
Estimating topical context by diverging from external resources BIBAFull-Text 1001-1004
  Romain Deveaud; Eric SanJuan; Patrice Bellot
Improving query understanding is crucial for providing the user with information that suits her needs. To this end, the retrieval system must be able to deal with several sources of knowledge from which it could infer a topical context. The use of external sources of information for improving document retrieval has been extensively studied. Improvements with either structured or large sets of data have been reported. However, in these studies resources are often used separately and rarely combined together. We experiment in this paper a method that discounts documents based on their weighted divergence from a set of external resources. We present an evaluation of the combination of four resources on two standard TREC test collections. Our proposed method significantly outperforms a state-of-the-art Mixture of Relevance Models on one test collection, while no significant differences are detected on the other one.
Finding knowledgeable groups in enterprise corpora BIBAFull-Text 1005-1008
  Shangsong Liang; Maarten de Rijke
The task of finding groups is a natural extension of search tasks aimed at retrieving individual entities. We introduce a group finding task: given a query topic, find knowledgeable groups that have expertise on that topic. We present four general strategies to this task. The models are formalized using generative language models. Two of the models aggregate expertise scores of the experts in the same group for the task, one locates documents associated with experts in the group and then determines how closely the documents are associated with the topic, whilst the remaining model directly estimates the degree to which a group is a knowledgeable group for a given topic. We construct a test collections based on the TREC 2005 and 2006 Enterprise collections. We find significant differences between different ways of estimating the association between a topic and a group. Experiments show that our knowledgeable group finding models achieve high absolute scores.
Neighbourhood preserving quantisation for LSH BIBAFull-Text 1009-1012
  Sean Moran; Victor Lavrenko; Miles Osborne
We introduce a scheme for optimally allocating multiple bits per hyperplane for Locality Sensitive Hashing (LSH). Existing approaches binarise LSH projections by thresholding at zero yielding a single bit per dimension. We demonstrate that this is a sub-optimal bit allocation approach that can easily destroy the neighbourhood structure in the original feature space. Our proposed method, dubbed Neighbourhood Preserving Quantization (NPQ), assigns multiple bits per hyperplane based upon adaptively learned thresholds. NPQ exploits a pairwise affinity matrix to discretise each dimension such that nearest neighbours in the original feature space fall within the same quantisation thresholds and are therefore assigned identical bits. NPQ is not only applicable to LSH, but can also be applied to any low-dimensional projection scheme. Despite using half the number of hyperplanes, NPQ is shown to improve LSH-based retrieval accuracy by up to 65% compared to the state-of-the-art.
Shame to be sham: addressing content-based grey hat search engine optimization BIBAFull-Text 1013-1016
  Fiana Raiber; Kevyn Collins-Thompson; Oren Kurland
We present an initial study identifying a form of content-based grey hat search engine optimization, in which a Web page contains both potentially relevant content and manipulated content: we call such pages sham documents, because they lie in the grey area between 'ham' (clearly normal) and 'spam' (clearly fake). Sham documents are often ranked artificially high in response to certain queries, but also may contain some useful information and cannot be considered as absolute spam. We report a novel annotation effort performed with the ClueWeb09 benchmark where pages were labeled as being spam, sham, or legitimate content. Significant inter-annotator agreement rates support the claim that there are sham documents that are highly ranked by a very effective retrieval approach, yet are not spam. We also present an initial study of predictors that may indicate whether a query is the target of shamming.
IRWR: incremental random walk with restart BIBAFull-Text 1017-1020
  Weiren Yu; Xuemin Lin
Random Walk with Restart (RWR) has become an appealing measure of node proximities in emerging applications eg recommender systems and automatic image captioning. In practice, a real graph is typically large, and is frequently updated with small changes. It is often cost-inhibitive to recompute proximities from scratch via batch algorithms when the graph is updated. This paper focuses on the incremental computations of RWR in a dynamic graph, whose edges often change over time. The prior attempt of RWR [1] deploys \kdash to find top-$k$ highest proximity nodes for a given query, which involves a strategy to incrementally estimate upper proximity bounds. However, due to its aim to prune needless calculation, such an incremental strategy is approximate: in $O(1)$ time for each node. The main contribution of this paper is to devise an exact and fast incremental algorithm of RWR for edge updates. Our solution, \IRWR\!, can incrementally compute any node proximity in $O(1)$ time for each edge update without loss of exactness. The empirical evaluations show the high efficiency and exactness of \IRWR for computing proximities on dynamic networks against its batch counterparts.
Bias-variance decomposition of ir evaluation BIBAFull-Text 1021-1024
  Peng Zhang; Dawei Song; Jun Wang; Yuexian Hou
It has been recognized that, when an information retrieval (IR) system achieves improvement in mean retrieval effectiveness (e.g. mean average precision (MAP)) over all the queries, the performance (e.g., average precision (AP)) of some individual queries could be hurt, resulting in retrieval instability. Some stability/robustness metrics have been proposed. However, they are often defined separately from the mean effectiveness metric. Consequently, there is a lack of a unified formulation of effectiveness, stability and overall retrieval quality (considering both). In this paper, we present a unified formulation based on the bias-variance decomposition. Correspondingly, a novel evaluation methodology is developed to evaluate the effectiveness and stability in an integrated manner. A case study applying the proposed methodology to evaluation of query language modeling illustrates the usefulness and analytical power of our approach.
An adaptive evidence weighting method for medical record search BIBAFull-Text 1025-1028
  Dongqing Zhu; Ben Carterette
In this paper, we present a medical record search system which is useful for identifying cohorts required in clinical studies. In particular, we propose a query-adaptive weighting method that can dynamically aggregate and score evidence in multiple medical reports (from different hospital departments or from different tests within the same department) of a patient. Furthermore, we explore several informative features for learning our retrieval model.
Fresh BrowseRank BIBAFull-Text 1029-1032
  Maxim Zhukovskiy; Andrei Khropov; Gleb Gusev; Pavel Serdyukov
In the last years, a lot of attention was attracted by the problem of page authority computation based on user browsing behavior. However, the proposed methods have a number of limitations. In particular, they run on a single snapshot of a user browsing graph ignoring substantially dynamic nature of user browsing activity, which makes such methods recency unaware. This paper proposes a new method for computing page importance, referred to as Fresh BrowseRank. The score of a page by our algorithm equals to the weight in a stationary distribution of a flexible random walk, which is controlled by recency-sensitive weights of vertices and edges. Our method generalizes some previous approaches, provides better capability for capturing the dynamics of the Web and users behavior, and overcomes essential limitations of BrowseRank. The experimental results demonstrate that our method enables to achieve more relevant and fresh ranking results than the classic BrowseRank.

Short papers 2 -- social media IR

Competition-based networks for expert finding BIBAFull-Text 1033-1036
  Çigdem Aslay; Neil O'Hare; Luca Maria Aiello; Alejandro Jaimes
Finding experts in question answering platforms has important applications, such as question routing or identification of best answers. Addressing the problem of ranking users with respect to their expertise, we propose Competition-Based Expertise Networks (CBEN), a novel community expertise network structure based on the principle of competition among the answerers of a question. We evaluate our approach on a very large dataset from Yahoo! Answers using a variety of centrality measures. We show that it outperforms state-of-the-art network structures and, unlike previous methods, is able to consistently outperform simple metrics like best answer count. We also analyse question answering forums in Yahoo! Answers, and show that they can be characterised by factual or subjective information seeking behavior, social discussions and the conducting of polls or surveys. We find that the ability to identify experts greatly depends on the type of forum, which is directly reflected in the structural properties of the expertise networks.
A study on the accuracy of Flickr's geotag data BIBAFull-Text 1037-1040
  Claudia Hauff
Obtaining geographically tagged multimedia items from social Web platforms such as Flickr is beneficial for a variety of applications including the automatic creation of travelogues and personalized travel recommendations. In order to take advantage of the large number of photos and videos that do not contain (GPS-based) latitude/longitude coordinates, a number of approaches have been proposed to estimate the geographic location where they were taken. Such location estimation methods rely on existing geotagged multimedia items as training data. Across application and usage scenarios, it is commonly assumed that the available geotagged items contain (reasonably) accurate latitude/longitude coordinates. Here, we consider this assumption and investigate how accurate the provided location data is. We conduct a study of Flickr images and videos and find that the accuracy of the geotag information is highly dependent on the popularity of the location: images/videos taken at popular (unpopular) locations, are likely to be geotagged with a high (low) degree of accuracy with respect to the ground truth.
Finding impressive social content creators: searching for SNS illustrators using feedback on motifs and impressions BIBAFull-Text 1041-1044
  Yohei Seki; Kiyoto Miyajima
We propose a method for finding impressive creators in online social network sites (SNSs). Many users are actively engaged in publishing their own works, sharing visual content on sites such as YouTube or Flickr. In this paper, we focus on the Japanese illustration-sharing SNS, Pixiv. We implement an illustrator search system based on user impression categories. The impressions of illustrators are estimated from clues in the crowdsourced social-tag annotations on their illustrations. We evaluated our system in terms of normalized discounted cumulative gain and found that using feedback on motifs and impressions for illustrations of relevant illustrators improved illustrator search by 11%.
Informational friend recommendation in social media BIBAFull-Text 1045-1048
  Shengxian Wan; Yanyan Lan; Jiafeng Guo; Chaosheng Fan; Xueqi Cheng
It is well recognized that users rely on social media (e.g. Twitter or Digg) to fulfill two common needs (i.e. social need and informational need) that is to keep in touch with their friends in the real world and to have access to information they are interested in. Traditional friend recommendation methods in social media mainly focus on a user's social need, but seldom address their informational need (i.e. suggesting friends that can provide information one may be interested in but have not been able to obtain so far). In this paper, we propose to recommend friends according to the informational utility, which stands for the degree to which a friend satisfies the target user's unfulfilled informational need, called informational friend recommendation. In order to capture users' informational need, we view a post in social media as an item and utilize collaborative filtering techniques to predict the rating for each post. The candidate friends are then ranked according to their informational utility for recommendation. In addition, we also show how to further consider diversity in such recommendations. Experiments on benchmark datasets demonstrate that our approach can significantly outperform the traditional friend recommendation methods under informational evaluation measures.

Short papers 2 -- topic models

Using social annotations to enhance document representation for personalized search BIBAFull-Text 1049-1052
  Mohamed Reda Bouadjenek; Hakim Hacid; Mokrane Bouzeghoub; Athena Vakali
In this paper, we present a contribution to IR modeling. We propose an approach that computes on the fly, a Personalized Social Document Representation (PSDR) of each document per user based on his social activities. The PSDRs are used to rank documents with respect to a query. This approach has been intensively evaluated on a large public dataset, showing significant benefits for personalized search.
The bag-of-repeats representation of documents BIBAFull-Text 1053-1056
  Matthias Gallé
n-gram representations of documents may improve over a simple bag-of-word representation by relaxing the independence assumption of word and introducing context. However, this comes at a cost of adding features which are non-descriptive, and increasing the dimension of the vector space model exponentially.
   We present new representations that avoid both pitfalls. They are based on sound theoretical notions of stringology, and can be computed in optimal asymptotic time with algorithms using data structures from the suffix family. While maximal repeats have been used in the past for similar tasks, we show how another equivalence class of repeats -- largest-maximal repeats -- obtain similar or better results, with only a fraction of the features. This class acts as a minimal generative basis of all repeated substrings. We also report their use for topic modeling, showing easier to interpret models.
An LDA-smoothed relevance model for document expansion: a case study for spoken document retrieval BIBAFull-Text 1057-1060
  Debasis Ganguly; Johannes Leveling; Gareth J. F. Jones
Document expansion (DE) in information retrieval (IR) involves modifying each document in the collection by introducing additional terms into the document. It is particularly useful to improve retrieval of short and noisy documents where the additional terms can improve the description of the document content. Existing approaches to DE assume that documents to be expanded are from a single topic. In the case of multi-topic documents this can lead to a topic bias in terms selected for DE and hence may result in poor retrieval quality due to the lack of coverage of the original document topics in the expanded document. This paper proposes a new DE technique providing a more uniform selection and weighting of DE terms from all constituent topics. We show that our proposed method significantly outperforms the most recently reported relevance model based DE method on a spoken document retrieval task for both manual and automatic speech recognition transcripts.

Short papers 2 -- users and interactive IR

Timeline generation with social attention BIBAFull-Text 1061-1064
  Xin Wayne Zhao; Yanwei Guo; Rui Yan; Yulan He; Xiaoming Li
Timeline generation is an important research task which can help users to have a quick understanding of the overall evolution of any given topic. It thus attracts much attention from research communities in recent years. Nevertheless, existing work on timeline generation often ignores an important factor, the attention attracted to topics of interest (hereafter termed "social attention"). Without taking into consideration social attention, the generated timelines may not reflect users' collective interests. In this paper, we study how to incorporate social attention in the generation of timeline summaries. In particular, for a given topic, we capture social attention by learning users' collective interests in the form of word distributions from Twitter, which are subsequently incorporated into a unified framework for timeline summary generation. We construct four evaluation sets over six diverse topics. We demonstrate that our proposed approach is able to generate both informative and interesting timelines. Our work sheds light on the feasibility of incorporating social attention into traditional text mining tasks.
Explicit feedback in local search tasks BIBAFull-Text 1065-1068
  Dmitry Lagun; Avneesh Sud; Ryen W. White; Peter Bailey; Georg Buscher
Modern search engines make extensive use of people's contextual information to finesse result rankings. Using a searcher's location provides an especially strong signal for adjusting results for certain classes of queries where people may have clear preference for local results, without explicitly specifying the location in the query directly. However, if the location estimate is inaccurate or searchers want to obtain many results from a particular location, they have limited control on the location focus in the search results returned. In this paper we describe a user study that examines the effect of offering searchers more control over how local preferences are gathered and used. We studied providing users with functionality to offer explicit relevance feedback (ERF) adjacent to results automatically identified as location-dependent (i.e., more from this location). They can use this functionality to indicate whether they are interested in a particular search result and desire more results from that result's location. We compared the ERF system against a baseline (NoERF) that used the same underlying mechanisms to retrieve and rank results, but did not offer ERF support. User performance was assessed across 12 experimental participants over 12 location-sensitive topics, in a fully counter-balanced design. We found that participants interacted with ERF frequently, and there were signs that ERF has the potential to improve success rates and lead to more efficient searching for location-sensitive search tasks than NoERF.
Ranking explanatory sentences for opinion summarization BIBAFull-Text 1069-1072
  Hyun Duk Kim; Malu G. Castellanos; Meichun Hsu; ChengXiang Zhai; Umeshwar Dayal; Riddhiman Ghosh
We introduce a novel sentence ranking problem called explanatory sentence extraction (ESE) which aims to rank sentences in opinionated text based on their usefulness for helping users understand the detailed reasons of sentiments (i.e., "explanatoriness"). We propose and study several general methods for scoring the explanatoriness of a sentence. We create new data sets and propose a new measure for evaluation. Experiment results show that the proposed methods are effective, outperforming a state of the art sentence ranking method for standard text summarization.
#trapped!: social media search system requirements for emergency management professionals BIBAFull-Text 1073-1076
  Stefan Raue; Leif Azzopardi; Chris W. Johnson
Social media provides a new and potentially rich source of information for emergency management services. However, extracting the relevant information from such streams poses a number of difficult challenges. In this short paper, we survey emergency management professionals to ascertain how social media is used when responding to incidents, the search strategies that they undertake, and the challenges that they face when using social media streams. This research indicates that emergency management professionals employ two main strategies when searching social media streams: keyword-centric and account-centric search strategies. Furthermore, current search interfaces are inadequate regarding the requirements of command and control environments in the emergency management domain, where the process of information seeking is collaborative in nature and needs to support multiple information seekers.

Demonstrations 1 -- Users and interactive IR

ThemeStreams: visualizing the stream of themes discussed in politics BIBAFull-Text 1077-1078
  Ork de Rooij; Daan Odijk; Maarten de Rijke
The political landscape is fluid. Discussions are always ongoing and new "hot topics" continue to appear in the headlines. But what made people start talking about that topic? And who started it? Because of the speed at which discussions sometimes take place this can be difficult to track down. We describe ThemeStreams: a demonstrator that maps political discussions to themes and influencers and illustrate how this mapping is used in an interactive visualization that shows us which themes are being discussed, and that helps us answer the question "Who put this issue on the map?" in streams of political data.
BATC: a benchmark for aggregation techniques in crowdsourcing BIBAFull-Text 1079-1080
  Quoc Viet Hung Nguyen; Thanh Tam Nguyen; Ngoc Tran Lam; Karl Aberer
As the volumes of AI problems involving human knowledge are likely to soar, crowdsourcing has become essential in a wide range of world-wide-web applications. One of the biggest challenges of crowdsourcing is aggregating the answers collected from crowd workers; and thus, many aggregate techniques have been proposed. However, given a new application, it is difficult for users to choose the best-suited technique as well as appropriate parameter values since each of these techniques has distinct performance characteristics depending on various factors (e.g. worker expertise, question difficulty). In this paper, we develop a benchmarking tool that allows to (i) simulate the crowd and (ii) evaluate aggregate techniques in different aspects (accuracy, sensitivity to spammers, etc.). We believe that this tool will be able to serve as a practical guideline for both researchers and software developers. While researchers can use our tool to assess existing or new techniques, developers can reuse its components to reduce the development complexity.
Spacious: an interactive mental search interface BIBAFull-Text 1081-1082
  Phong D. Vo; Hichem Sahbi
We introduce in this work a novel approach for semantic indexing and mental image search. Given semantic concepts defined by few training examples, our formulation is transductive and learns a mapping from an initial ambient space, related to low level visual features, to an output space spanned by a well defined semantic basis where data can be easily explored. With this method, searching for a mental visual target reduces to scanning data according to their coordinates in the learned semantic space. We illustrate the proposed method through our graphical user interface "Spacious", for the purpose of visualization and interactive navigation in generic image databases and satellite images.

Demonstrations 1 -- IR and structured data

Flex-BaseX: an XML engine with a flexible extension of Xquery full-text BIBAFull-Text 1083-1084
  Emanuele Panzeri; Gabriella Pasi
XML is the most used language for structuring data and documents, besides being the de-facto standard for data exchange. Keyword based search has been implemented by the XQuery Full-Text language extension, allowing document fragments to be retrieved and ranked via keyword-based matching in the Information Retrieval style. In this demo the implementation of an XQuery extension allowing users to express their vague knowledge of the underlying XML structure is presented. The integration has been performed on top of the BaseX query engine; the work, as initially done by Panzeri at al. in IIR 2013 as a proof-of-concept has been further enhanced and extended.
ProductSeeker: entity-based product retrieval for e-commerce BIBAFull-Text 1085-1086
  Hongzhi Wang; Xiaodong Zhang; Jianzhong Li; Hong Gao
The retrieval results of online products information in e-commerce web sites are often difficult for users to use because of different descriptions for the same product. This paper proposes ProductSeeker, a product retrieval system organizing results according to their referring real-world entities for the conveniences of users. In the demonstration, we will present our system providing friendly interface to retrieve fresh product information and refining results according to feedback.

Demonstrations 1 -- information extraction

Live nuggets extractor: a semi-automated system for text extraction and test collection creation BIBAFull-Text 1087-1088
  Matthew Ekstrand-Abueg; Virgil Pavlu; Javed A. Aslam
The Live Nugget Extractor system provides users with a method of efficiently and accurately collecting relevant information for any web query rather than providing a simple ranked lists of documents. The system utilizes an online learning procedure to infer relevance of unjudged documents while extracting and ranking information from judged documents. This creates a set of judged and inferred relevance scores for both documents and text fragments, which can be used for test collections, summarization, and other tasks where high accuracy and large collections with minimal human effort are needed.
X-ENS: semantic enrichment of web search results at real-time BIBAFull-Text 1089-1090
  Pavlos Fafalios; Yannis Tzitzikas
While more and more semantic data are published on the Web, an important question is how typical web users can access and exploit this body of knowledge. Although, existing interaction paradigms in semantic search hide the complexity behind an easy-to-use interface, they have not managed to cover common search needs. In this paper, we present X-ENS (eXplore ENtities in Search), a web search application that enhances the classical, keyword-based, web searching with semantic information, as a means to combine the pros of both Semantic Web standards and common Web Searching. X-ENS identifies entities of interest in the snippets of the top search results which can be further exploited in a faceted interaction scheme, and thereby can help the user to limit the -- often very large -- search space to those hits that contain a particular piece of information. Moreover, X-ENS permits the exploration of the identified entities by exploiting semantic repositories.
Accurate and robust text detection: a step-in for text retrieval in natural scene images BIBAFull-Text 1091-1092
  Xu-Cheng Yin; Xuwang Yin; Kaizhu Huang; Hong-Wei Hao
We propose and implement a robust text detection system, which is a prominent step-in for text retrieval in natural scene images or videos. Our system includes several key components: (1) A fast and effective pruning algorithm is designed to extract Maximally Stable Extremal Regions as character candidates using the strategy of minimizing regularized variations. (2) Character candidates are grouped into text candidates by the single-link clustering algorithm, where distance weights and threshold of clustering are learned automatically by a novel self-training distance metric learning algorithm. (3) The posterior probabilities of text candidates corresponding to non-text are estimated with an character classifier; text candidates with high probabilities are then eliminated and finally texts are identified with a text classifier. The proposed system is evaluated on the ICDAR 2011 Robust Reading Competition dataset and a publicly available multilingual dataset; the f measures are over 76% and 74% which are significantly better than the state-of-the-art performances of 71% and 65%, respectively.

Demonstrations 1 -- filtering and recommending

A framework for specific term recommendation systems BIBAFull-Text 1093-1094
  Thomas Lüke; Philipp Schaer; Philipp Mayr
In this paper we present the IRSA framework that enables the automatic creation of search term suggestion or recommendation systems (TS). Such TS are used to operationalize interactive query expansion and help users in refining their information need in the query formulation phase. Our recent research has shown TS to be more effective when specific to a certain domain. The presented technical framework allows owners of Digital Libraries to create their own specific TS constructed via OAI-harvested metadata with very little effort.
TweetMogaz: a news portal of tweets BIBAFull-Text 1095-1096
  Walid Magdy
Twitter is currently one of the largest social hubs for users to spread and discuss news. For most of the top news stories happening, there are corresponding discussions on social media. In this demonstration TweetMogaz is presented, which is a platform for microblog search and filtering. It creates a real-time comprehensive report about what people discuss and share around news happening in certain regions. TweetMogaz reports the most popular tweets, jokes, videos, images, and news articles that people share about top news stories. Moreover, it allows users to search for specific topics. A scalable automatic technique for microblog filtering is used to obtain relevant tweets to a certain news category in a region. TweetMogaz.com demonstrates the effectiveness of our filtering technique for reporting public response toward news in different Arabic regions including Egypt and Syria in real-time.

Demonstrations 1 -- classification and clustering

InfoLand: information lay-of-land for session search BIBAFull-Text 1097-1098
  Jiyun Luo; Dongyi Guan; Hui Yang
Search result clustering (SRC) is a post-retrieval process that hierarchically organizes search results. The hierarchical structure offers overview for the search results and displays an "information lay-of-land" that intents to guide the users throughout a search session. However, SRC hierarchies are sensitive to query changes, which are common among queries in the same session. This instability may leave users seemly random overviews throughout the session. We present a new tool called InfoLand that integrates external knowledge from Wikipedia when building SRC hierarchies and increase their stability. Evaluation on TREC 2010-2011 Session Tracks shows that InfoLand produces more stable results organization than a commercial search engine.
A portable multilingual medical directory by automatic categorization of Wikipedia articles BIBAFull-Text 1099-1100
  Fernando Ruiz-Rico; María-Consuelo Rubio-Sánchez; David Tomás; Jose-Luis Vicedo
Wikipedia has become one of the most important sources of information available all over the world. However, the categorization of Wikipedia articles is not standardized and the searches are mainly performed on keywords rather than concepts. In this paper we present an application that builds a hierarchical structure to organize all Wikipedia entries, so that medical articles can be reached from general to particular, using the well known Medical Subject Headings (MeSH) thesaurus. Moreover, the language links between articles will allow using the directory created in different languages. The final system can be packed and ported to mobile devices as a standalone offline application.

Demonstrations 2 -- users and interactive IR

A geolinguistic web application based on linked open data BIBAFull-Text 1101-1102
  Emanuele Di Buccio; Giorgio Maria Di Nunzio; Gianmaria Silvello
Digital Geolinguistic systems encourage collaboration between linguists, historians, archaeologists, ethnographers, as they explore the relationship between language and cultural adaptation and change. In this demo, we propose a Linked Open Data approach for increasing the level of interoperability of geolinguistic applications and the reuse of the data. We present a case study of a geolinguistic project named Atlante Sintattico d'Italia, Syntactic Atlas of Italy (ASIt).
TopicVis: a GUI for topic-based feedback and navigation BIBAFull-Text 1103-1104
  Debasis Ganguly; Manisha Ganguly; Johannes Leveling; Gareth J. F. Jones
This paper describes a search system which includes topic model visualization to improve the user search experience. The system graphically renders the topics in a retrieved set of documents, enables a user to selectively refine search results and allows easy navigation through information on selective topics within documents.
Information seeking in digital cultural heritage with PATHS BIBAFull-Text 1105-1106
  Mark M. Hall; Paul D. Clough; Samuel Fernando; Paula Goodale; Mark Stevenson; Eneko Agirre; Arantxa Otegi; Aitor Soroa; Kate Fernie; Jillian Griffiths; Runar Bergheim
Current Information Retrieval systems for digital cultural heritage support only the actual search aspect of the information seeking process. This demonstration presents the second PATHS system which provides the exploration, analysis, and sense-making features to support the full information seeking process.

Demonstrations 2 -- IR and structured data

Answering natural language queries over linked data graphs: a distributional semantics approach BIBAFull-Text 1107-1108
  André Freitas; Fabrício F. de Faria; Seán O'Riain; Edward Curry
This paper demonstrates Treo, a natural language query mechanism for Linked Data graphs. The approach uses a distributional semantic vector space model to semantically match user query terms with data, supporting vocabulary-independent (or schema-agnostic) queries over structured data.
Removing the mismatch headache in XML keyword search BIBAFull-Text 1109-1110
  Yong Zeng; Zhifeng Bao; Tok Wang Ling; Guoliang Li
In this demo, we study one category of query refinement problems in the context of XML keyword search, where what users search for do not exist in the data while useless results are returned by the search engine. It is a hidden but important problem. We refer to it as the MisMatch problem. We propose a practical yet efficient way to detect the MisMatch problem and generate helpful suggestions to users, namely MisMatch detector and suggester. Our approach can be viewed as a post-processing job of query evaluation. An online XML keyword search engine embedding the MisMatch detector and suggester has been built and is available at [1].
YaLi: a crowdsourcing plug-in for NERD BIBAFull-Text 1111-1112
  Yafang Wang; Lili Jiang; Johannes Hoffart; Gerhard Weikum
We demonstrate the YaLi browser plug-in which discovers named entities in Web pages and provides background knowledge about them. The plug-in is implemented with two purposes. From a user perspective, it enriches the browsing experience with entities, helping users with their information needs. From the research perspective, we aim to improve the methods that are used for named entity recognition and disambiguation (NERD) by leveraging the plug-in as an implicit crowdsourcing platform. YaLi tracks the system's errors and the users' corrections, and also gathers implicit training data for improving NERD accuracy.

Demonstrations 2 -- information extraction

SearchResultFinder: federated search made easy BIBAFull-Text 1113-1114
  Dolf Trieschnigg; Kien Tjin-Kam-Jet; Djoerd Hiemstra
Building a federated search engine based on a large number existing web search engines is a challenge: implementing the programming interface (API) for each search engine is an exacting and time-consuming job. In this demonstration we present SearchResultFinder, a browser plugin which speeds up determining reusable XPaths for extracting search result items from HTML search result pages. Based on a single search result page, the tool presents a ranked list of candidate extraction XPaths and allows highlighting to view the extraction result. An evaluation with 148 web search engines shows that in 90% of the cases a correct XPath is suggested.

Demonstrations 2 -- filtering and recommending

Online matching of web content to closed captions in IntoNow BIBAFull-Text 1115-1116
  Carlos Castillo; Gianmarco De Francisci Morales; Ajay Shekhawat
IntoNow is a mobile application that provides a second-screen experience to television viewers. IntoNow uses the microphone of the companion device to sample the audio coming from the TV set, and compares it against a database of TV shows in order to identify the program being watched.
   The system we demonstrate is activated by IntoNow for specific types of shows. It retrieves information related to the program the user is watching by using closed captions, which are provided by each broadcasting network along the TV signal. It then matches the stream of closed captions in real-time against multiple sources of content. More specifically, during news programs it displays links to online news articles and the profiles of people and organizations in the news, and during music shows it displays links to songs. The matching models are machine-learned from editorial judgments, and tuned to achieve approximately 90% precision.
Match the news: a firefox extension for real-time news recommendation BIBAFull-Text 1117-1118
  Margarita Karkali; Dimitris Pontikis; Michalis Vazirgiannis
We present Match the News, a browser extension for real time news recommendation. Our extension works on the client side to recommend in real time recently published articles that are relevant to the web page the user is currently visiting. Match the News is fed from Google News RSS and applies syntactic matching to find the relevant articles. We implement an innovative weighting function to perform the keyword extraction task, BM25H. With BM25H we extract keywords not only relevant to currently browsed web page, but also novel with respect to the user's recent browsing history. The novelty feature in keyword extraction task results in meaningful news recommendations with regards to the web page the users currently visits. Moreover the extension offers a salient visualization of the terms corresponding to the users recent browsing history making thus the extension a comprehensive tool for real time news recommendation and self assessment.

Demonstrations 2 -- classification and clustering

Demonstration of citation pattern analysis for plagiarism detection BIBFull-Text 1119-1120
  Bela Gipp; Norman Meuschke; Corinna Breitinger; Mario Lipinski; Andreas Nürnberger
A multilingual and multiplatform application for medicinal plants prescription from medical symptoms BIBAFull-Text 1121-1122
  Fernando Ruiz-Rico; David Tomás; Jose-Luis Vicedo; María-Consuelo Rubio-Sánchez
This paper presents an application for medicinal plants prescription based on text classification techniques. The system receives as an input a free text describing the symptoms of a user, and retrieves a ranked list of medicinal plants related to those symptoms. In addition, a set of links to Wikipedia are also provided, enriching the information about every medicinal plant presented to the user. In order to improve the accessibility to the application, the input can be written in six different languages, adapting the results accordingly. The application interface can be accessed from different devices and platforms.


Searching in the city of knowledge: challenges and recent developments BIBAFull-Text 1123
  Veli Bicer; Vanessa Lopez
Today plenty of data is emerging from various city systems. Beyond the classical Web resources, large amounts of data are retrieved from sensors, devices, social networks, governmental applications, or service networks. In such a diversity of information, answering specific information needs of city inhabitants requires holistic IR techniques, capable of harnessing different types of city data and turned it into actionable insights to answer different queries. This tutorial will present deep insights, challenges, opportunities and techniques to make heterogeneous city data searchable and show how emerging IR techniques models can be employed to retrieve relevant information for the citizens.
Scalability and efficiency challenges in commercial web search engines BIBAFull-Text 1124
  B. Barla Cambazoglu; Ricardo Baeza-Yates
Commercial web search engines rely on very large compute infrastructures to be able to cope with the continuous growth of the Web and user bases. Achieving scalability and efficiency in such large-scale search engines requires making careful architectural design choices while devising algorithmic performance optimizations. Unfortunately, most details about the internal functioning of commercial web search engines remain undisclosed due to their financial value and the high level of competition in the search market. The main objective of this tutorial is to provide an overview of the fundamental scalability and efficiency challenges in commercial web search engines, bridging the existing gap between the industry and academia.
Music similarity and retrieval BIBAFull-Text 1125
  Peter Knees; Markus Schedl
This tutorial serves as an introductory course to the field of and state-of-the-art in music information retrieval (MIR) and in particular to music similarity estimation which is an essential component of music retrieval. Apart from explaining approaches that estimate similarity based on acoustic properties of an audio signal, we review methods that exploit (mostly textual) meta-data from the Web to build representations of music then used for similarity calculation. Additionally, topics such as (large-scale) music indexing, information extraction for music, personalization in music retrieval, and evaluation of MIR systems are addressed.
The cluster hypothesis in information retrieval BIBFull-Text 1126
  Oren Kurland
Entity linking and retrieval BIBAFull-Text 1127
  Edgar Meij; Krisztian Balog; Daan Odijk
This full-day tutorial presents a comprehensive introduction to entity linking and retrieval. Part I provides a detailed overview of entity linking: identifying and disambiguating entity occurrences in unstructured text. Part II focuses on entity retrieval, by first considering scenarios where explicit representations of entities are available, and then moving to a setting where evidence needs to be collected and aggregated from multiple documents or even collections, thereby combining techniques from both entity linking and entity retrieval. Part III concludes the tutorial with an overview and hands-on comparative analysis of applications and publicly available toolkits and web services.
Kernel-based learning to rank with syntactic and semantic structures BIBAFull-Text 1128
  Alessandro Moschitti
Kernel Methods (KMs) are powerful machine learning techniques that can alleviate the data representation problem as they substitute scalar product between feature vectors with similarity functions (kernels) directly defined between data instances, e.g., syntactic trees, (thus features are not needed any longer). This tutorial aims at introducing essential and simplified theory of Support Vector Machines and KMs for the design of practical applications. It will describe effective kernels for easily engineering automatic classifiers and learning to rank algorithms using structured data and semantic processing. Some examples will be drawn from Question Answering, Passage Re-ranking, Short and Long Text Categorization, Relation Extraction, Named Entity Recognition, Co-Reference Resolution. Moreover, some practical demonstrations will be given using the SVM-Light-TK (tree kernel) toolkit.
Designing search usability BIBAFull-Text 1129
  Tony Russell-Rose
Search is not just a box and ten blue links. Search is a journey: an exploration where what we encounter along the way changes what we seek. But in order to guide people along this journey, we must understand both the art and science of search experience design. The aim of this tutorial is to deliver a course grounded in good scholarship, integrating the latest research findings with insights derived from the practical experience of designing and optimizing an extensive range of commercial search applications. It focuses on the development of transferable, practical skills that can be learnt and practiced within a half-day session.
Diversity and novelty in information retrieval BIBAFull-Text 1130
  Rodrygo L. T. Santos; Pablo Castells; Ismail Sengor Altingovde; Fazli Can
This tutorial aims to provide a unifying account of current research on diversity and novelty in different IR domains, namely, in the context of search engines, recommender systems, and data streams.
Multimedia recommendation: technology and techniques BIBAFull-Text 1131
  Jialie Shen; Meng Wang; Shuicheng Yan; Peng Cui
In recent years, we have witnessed a rapid growth in the availability of digital multimedia on various application platforms and domains. Consequently, the problem of information overload has become more and more serious. In order to tackle the challenge, various multimedia recommendation technologies have been developed by different research communities (e.g., multimedia systems, information retrieval, machine learning and computer version). Meanwhile, many commercial web systems (e.g., Flick, YouTube, and Last.fm) have successfully applied recommendation techniques to provide users personalized content and services in a convenient and flexible way.
   When looking back, the information retrieval (IR) community has a long history of studying and contributing recommender system design and related issues. It has been proven that the recommender systems can effectively assist users in handling information overload and provide high-quality personalization. While several courses were dedicated to multimedia retrieval in the recent decade, to the best of our knowledge, the tutorial is the first one specifically focusing on multimedia recommender systems and their applications on various domains and media contents. We plan to summarize the research along this direction and provide an impetus for further research on this important topic.
Building test collections: an interactive tutorial for students and others without their own evaluation conference series BIBAFull-Text 1132
  Ian M. Soboroff
While existing test collections and evaluation conference efforts may sufficiently support one's research, one can easily find oneself wanting to solve problems no one else is solving yet. But how can research in IR be done (or be published!) without solid data and experiments? Not everyone can talk TREC, CLEF, INEX, or NTCIR into running a track to build a collection.
   This tutorial aims to teach how to build a test collection using resources at hand, how to measure the quality of that collection, how to understand its limitations, and how to communicate them. The intended audience is advanced students who find themselves in need of a test collection, or actually in the process of building a test collection, to support their own research. The goal of this tutorial is to lay out issues, procedures, pitfalls, and practical advice.


Workshop on benchmarking adaptive retrieval and recommender systems: BARS 2013 BIBAFull-Text 1133
  Pablo Castells; Frank Hopfgartner; Alan Said; Mounia Lalmas
Evaluating adaptive and personalized information retrieval techniques is known to be a difficult endeavor. The rapid evolution of novel technologies in this scope raises additional challenges that further stress the need for new evaluation approaches and methodologies. The BARS 2013 workshop seeks to provide a specific venue for work on novel, personalization-centric benchmarking approaches to evaluate adaptive retrieval and recommender systems.
SIGIR 2013 workshop on modeling user behavior for information retrieval evaluation BIBAFull-Text 1134
  Charles L. A. Clarke; Luanne Freund; Mark D. Smucker; Emine Yilmaz
The SIGIR 2013 Workshop on Modeling User Behavior for Information Retrieval Evaluation (MUBE 2013) brings together people to discuss existing and new approaches, ways to collaborate, and other ideas and issues involved in improving information retrieval evaluation through the modeling of user behavior.
Internet advertising: theory and practice BIBAFull-Text 1135
  Bin Gao; Jun Yan; Dou Shen; Tie-Yan Liu
Internet advertising, a form of advertising that utilizes the Internet to deliver marketing messages and attract customers, has seen exponential growth since its inception around twenty years ago; it has been pivotal to the success of the World Wide Web. The dramatic growth of internet advertising poses great challenges to information retrieval, machine learning, data mining and game theory, and it calls for novel technologies to be developed. The main purpose of this workshop is to bring together researchers and practitioners in the area of Internet Advertising and enable them to share their latest research results, to express their opinions, and to discuss future directions.
Exploration, navigation and retrieval of information in cultural heritage: ENRICH 2013 BIBAFull-Text 1136
  Séamus Lawless; Maristella Agosti; Paul Clough; Owen Conlan
The Exploration, Navigation and Retrieval of Information in Cultural Heritage Workshop (ENRICH 2013) offers a forum to 1) discuss the challenges and opportunities in Information Retrieval research in the area of Cultural Heritage; 2) encourage collaboration between researchers engaged in work in this specialist area of Information Retrieval, and to foster the formation of a research community; and 3) identify a set of actions which the community should undertake to progress the research agenda. The workshop will foster a new stream of Information Retrieval research and support the design of search tools that can help end-users fully exploit the wonderful Cultural Heritage material that is available across the globe.
SIGIR 2013 workshop on time aware information access (#TAIA2013) BIBAFull-Text 1137
  Fernando Diaz; Susan Dumais; Miles Efron; Kira Radinsky; Maarten de Rijke; Milad Shokouhi
Web content increasingly reflects the current state of the physical and social world, manifested both in traditional news media sources along with user-generated publishing sites such as Twitter, Foursquare, and Facebook. At the same time, web searching increasingly reflects problems grounded in the real world. As a result of this blending of the web with the real world, we observe that the web, both in its composition and use, has incorporated many of the dynamics of the real world. Few of the problems associated with searching dynamic collections are well understood, such as defining time-sensitive relevance, understanding user query behavior over time and understanding why certain web content changes.
   We believe that, just as static collections often benefit from modeling topics, dynamic collections will likely benefit from temporal modeling of events and time-sensitive user interests and intents, which were rarely addressed in the literature. There have been preliminary efforts in the research and industrial communities to address algorithms, architectures, evaluation methodologies and metrics.
   We aim to bring together practitioners and researchers to discuss their recent breakthroughs and the challenges with addressing time-aware information access, both from the algorithmic and the architectural perspectives.
   This workshop is a successor to the successful SIGIR 2012 Workshop on Time Aware Information Access (#TAIA2012). Where the 2012 edition was the first to bring together a broad set of academic and industrial researchers around the topic of time-aware information access, the specific focus of this workshop is on the many time-aware benchmarking activities that are ongoing in 2013.
Workshop on health search and discovery: helping users and advancing medicine BIBAFull-Text 1138
  Ryen W. White; Elad Yom-Tov; Eric Horvitz; Eugene Agichtein; William Hersh
This workshop brings together researchers and practitioners from industry and academia to discuss search and discovery in the medical domain. The event focuses on ways to make medical and health information more accessible to laypeople (including enhancements to ranking algorithms and search interfaces), and how we can discover new medical facts and phenomena from information sought online, as evidenced in query streams and other sources such as social media. This domain also offers many opportunities for applications that monitor and improve quality of life of those affected by medical conditions, by providing tools to support their health-related information behavior.
EuroHCIR2013: the 3rd European workshop on human-computer interaction and information retrieval BIBAFull-Text 1139
  Max L. Wilson; Birger Larsen; Preben Hansen; Kristian Norling; Tony Russell-Rose
A proposal summary for the EuroHCIR workshop at SIGIR2013.

Doctoral consortium

Beyond relevance: on novelty and diversity in tag recommendation BIBAFull-Text 1140
  Fabiano Belém
We propose to explicitly exploit issues related to novelty and diversity in tag recommendation tasks, an unexplored research avenue (only relevance issues have been investigated so far), in order to improve user experience and satisfaction. We propose new tag recommendation strategies to cover these issues and highlight the involved challenges.
Group-support for task-based information searching: a knowledge-based approach BIBFull-Text 1141
  Thilo Boehm
Diversified relevance feedback BIBAFull-Text 1142
  Matt Crane
The need for a search engine to deal with ambiguous queries has been known for a long time (diversification). However, it is only recently that this need has become a focus within information retrieval research. How to respond to indications that a result is relevant to a query (relevance feedback) has also been a long focus of research. When thinking about the results for a query as being clustered by topic, these two areas of information retrieval research appear to be opposed to each other. Interestingly though, they both appear to improve the performance of search engines, raising the question: they can be combined or made to work with each other?
   When presented with an ambiguous query there are a number of techniques that can be employed to better select results. The primary technique being researched now is diversification, which aims to populate the results with a set of documents that cover different possible interpretations for the query, while maintaining a degree of relevance, as determined by the search engine. For example, given a query of "java" it is unclear whether the user, without any other information, means the programming language, the coffee, the island of Indonesia or a multitude of other meanings.
   In order to do this the assumption that documents are independent of each other when assessing potential relevance has to be broken. That is, a documents relevance, as calculated by the search engine, is no longer dependent only on the query, but also the other documents that have been selected. How a document is identified as being similar to previously selected documents, and the trade off between estimated relevance and topic coverage are current areas for information retrieval research.
   For unambiguous queries, or for search engines that do not perform diversification, it is possible to improve the results selected by reacting to information identifying a given result as truly relevant or not. This mechanism is known as relevance feedback. The most common response to relevance feedback is to investigate the documents for their most content-bearing terms, and either add, or subtract, their influence to a newly formed query which is then re-run on the remaining documents to re-order them.
   There has been a scant amount of research into the combination of these methods. However, Carbonell et al. [1] show that an initially diverse result set can provide a better approach for identifying the topic a user is interested in for a relevance feedback style approach. This approach was further extended by Raman et al. [4].
   An important aspect of relevance feedback is the selection of documents to use. In the 2008 TREC relevance feedback track, Meij et al. [3] generated a diversified result set which outperformed other rankings as a source of feedback documents.
   The use of pseudo-relevance feedback (assuming the top ranked documents are relevant) to extract sub-topics for use in diversification was explored by Santos et al. [5]. These previous approaches suggest that these two ideas are more linked than expected.
   The ATIRE search engine [6] will be used to further explore the relationship between diversification and relevance feedback. ATIRE was selected because it is developed locally, and is designed to be small and fast. ATIRE also produces a competitive baseline, which would have placed 6th in the 2011 TREC diversity task while performing no diversification and index-time spam filtering [2], although we concede this is not equivalent to submitting a run.
Segmentation strategies for passage retrieval in audio-visual documents BIBAFull-Text 1143
  Petra Galušcáková
The importance of Information Retrieval (IR) in audio-visual recordings has been increasing with steeply growing numbers of audio-visual documents available on-line. Compared to traditional IR methods, this task requires specific techniques, such as Passage Retrieval which can accelerate the search process by retrieving the exact relevant passage of a recording instead of the full document. In Passage Retrieval, full recordings are divided into shorter segments which serve as individual documents for the further IR setup. This technique also allows normalizing document length and applying positional information. It was shown that it can even improve retrieval results.
   In this work, we examine two general strategies for Passage Retrieval: blind segmentation into overlapping regular-length passages and segmentation into variable-length passages based on semantics of their content.
   Time-based segmentation was already shown to improve retrieval of textual documents and audio-visual recordings. Our experiments performed on the test collection used in the Search subtask of the Search and Hyperlinking Task in MediaEval Benchmarking 2012 confirm those findings and show that parameters (segment length and shift) tuning for a specific test collection can further improve the results. Our best results on this collection were achieved by using 45-second long segments with 15-second shifts.
   Semantic-based segmentation can be divided into three types: similarity-based (producing segments with high intra-similarity and low inter-similarity), lexical-chain-based (producing segments with frequent lexically connected words), and feature-based (combining various features which signalize a segment break in a machine-learning setting). In this work, we mainly focus on feature-based segmentation which allows exploiting various features from all modalities of the data (including segment length) in a single trainable model and produces segments which can eventually overlap.
   Our preliminary results show that even simple semantic-based segmentation outperforms regular segmentation. Our model is a decision tree incorporating the following features: shot segments, output of TextTiling algorithm, cue words (well, thanks, so, I, now), sentence breaks, and the length of the silence after the previous word. In terms of the MASP, the relative improvement over regular segmentation is more than 19%.
Indexing and querying overlapping structures BIBAFull-Text 1144
  Faegheh Hasibi
Structural information retrieval is mostly based on hierarchy. However, in real life information is not purely hierarchical and structural elements may overlap each other. The most common example is a document with two distinct structural views, where the logical view is section/ subsection/ paragraph and the physical view is page/ line. Each single structural view of this document is a hierarchy and the components are either disjoint or nested inside each other. The overlapping issue arises when one structural element cannot be neatly nested into others. For instance, when a paragraph starts in one page and terminates in the next page. Similar situations can appear in videos and other multimedia contents, where temporal or spatial constituents of a media file may overlap each other.
   Querying over overlapping structures is one of the challenges of large scale search engines. For instance, FSIS (FAST Search for Internet Sites) [1] is a Microsoft search platform, which encounters overlaps while analysing content of textual data. FSIS uses a pipeline process to extract structure and semantic information of documents. The pipeline contains several components, where each component writes annotations to the input data. These annotations consist of structural elements and some of them may overlap each other. Handling overlapping structures in search engines will add a novel capability of searching, where users can ask queries such as "Find all the words that overlap two lines" or "Find the music played during Intro scene of Avatar movie". There are also other use cases, where the user of the search engine is not a person, but is a specific program with complex, non-traditional information retrieval needs.
   This research attempts to index overlapping structures and provide efficient query processing for large-scale search engines. The current research on overlapping structures revolves around encoding and modelling data, while indexing and query processing methods need investigations. Moreover, due to intrinsic complexity of overlaps, XML indexing and query processing techniques cannot be used for overlapping structures. Hence, my research on overlapping structures comprises three main parts: (1) an indexing method that supports both hierarchies and overlaps; (2) a query processing method based on the indexing technique and (3) a query language that is close to natural language and supports both full text and structural queries.
   Our approach for indexing overlaps is to adapt the PrePost [3] XML indexing method to overlapping structures. This method labels each node with its start and end positions and requires modest storage space. However, PrePost indexing cannot be used for overlapping nodes. To overcome this issue, we need to define a data model for overlapping structures. Since hierarchies are not sufficient to describe overlapping components, several data structures have been introduced by scholars. One of the most interesting data models is GODDAG [2]. GODDAG is a tree-like graph, where nodes can have multiple parentage. This model can support overlaps as well as simple inheritance. Our proposed data model for indexing overlaps is such a tree-like structure, where we can define overlapping, parent-child and ancestor-descendant relationships.
A query and patient understanding framework for medical records search BIBAFull-Text 1145
  Nut Limsopatham
Electronic medical records (EMRs) are being increasingly used worldwide to facilitate improved healthcare services [2,3]. They describe the clinical decision process relating to a patient, detailing the observed symptoms, the conducted diagnostic tests, the identified diagnoses and the prescribed treatments. However, medical records search is challenging, due to the implicit knowledge inherent within the medical records -- such knowledge may be known by medical practitioners, but hidden to an information retrieval (IR) system [3]. For instance, the mention of a treatment such as a drug may indicate to a practitioner that a particular diagnosis has been made even if this was not explicitly mentioned in the patient's EMRs. Moreover, the fact that a symptom has not been observed by a clinician may rule out some specific diagnoses.
   Our work focuses on searching EMRs to identify patients with medical histories relevant to the medical condition(s) stated in a query. The resulting system can be beneficial to healthcare providers, administrators, and researchers who may wish to analyse the effectiveness of a particular medical procedure to combat a specific disease [2,4]. During retrieval, a healthcare provider may indicate a number of inclusion criteria to describe the type of patients of interest. For example, the used criteria may include personal profiles (e.g. age and gender) or some specific medical symptoms and tests, allowing to identify patients that have EMRs matching the criteria.
   To attain effective retrieval performance, we hypothesise that, in such a medical IR system, both the information needs and patients should be modelled based on how the medical process is developed. Specifically, our thesis states that since the medical decision process typically encompasses four aspects (symptom, diagnostic test, diagnosis, and treatment), a medical search system should take into account these aspects and apply inferences to recover possible implicit knowledge. We postulate that considering these aspects and their derived implicit knowledge at different levels of the retrieval process (namely, sentence, record, and inter-record level) enhances the retrieval performance. Indeed, we propose to build a query and patient understanding framework that can gain insights from EMRs and queries, by modelling and reasoning during retrieval in terms of the four aforementioned aspects (symptom, diagnostic test, diagnosis, and treatment) at three different levels of the retrieval process.
Semantic models for answer re-ranking in question answering BIBAFull-Text 1146
  Piero Molino
The task of Question Answering (QA) is to find correct answers to users' questions expressed in natural language. In the last few years non-factoid QA received more attention. It focuses on causation, manner and reason questions, where the expected answer has the form of a passage of text. The presence of question and answers corpora allows the adoption of Learning to Rank (MLR) algorithms in order to output a sensible ranking of the candidate answers. The importance and effectiveness of linguistically motivated features, obtained from syntax, lexical semantics and semantic role labeling, was shown in literature [2-4], but there are still several different possible semantic features that have not been taken into account so far and our goal is to find out if their use could lead to performance improvement. In particular features coming from Semantic Models (SM) like Distributional Semantic Models (DSMs), Explicit Semantic Analysis (ESA), Latent Dirichlet Allocation (LDA) induced topics have never been applied to the task so far. Based on the usefulness that those models show in other tasks, we think that SM can have a significant role in improving current state-of-the-art systems' performance in answer re-ranking.
   The questions this research wants to answer are: 1) Do semantic features bring information that is not present in the bag-of-words and syntactic features? 2) Do they bring different information or does it overlap with that of other features? 3) Are additional semantic features useful for answer re-ranking? Does their adoption improve systems' performance? 4) Which of them is more effective and under which circumstances?
   We performed a preliminary evaluation of DSMs on the ResPubliQA 2010 Dataset. We built a DSM based answer scorer that represents the question and the answer as the sums of the vectors of their terms taken term-term co-occurrence matrix and calculates their cosine similarity. We replaced the term-term matrix with the ones obtained by Random Indexing (RI), Latent Semantic Analysis (LSA) and LSA over the RI. Considering each DSM on its own, the results prove that all the DSMs are better than the baseline (the standard term-term co-occurrence matrix), and the improvement is always significant. The best improvement for the MRR in English is obtained by LSA (+180%), while in Italian by LSARI (+161%). We also showed that combining the DSMs with overlap based measures via CombSum the ranking is significantly better than the baseline obtained by the overlap measures alone. For English we have obtained an improvement in MRR of about 16% and for Italian, we achieve a even higher improvement in MRR of 26%. Finally, adopting RankNet for combining the overlap features and the DSMs features, improves the MRR of about 13%. More details can be found in [1]. In order to investigate the effectiveness of the semantic features, we still need to incorporate other semantic features, such as ESA, LDA and other state-of-the-art linguistic features. Other operators for semantic compositionality, like product, tensor product and circular convolution, will also be investigated.
   Moreover we will experiment on different datasets, focusing mainly on non-factoid QA. The Yahoo! Answers Manner Questions datasets are a good starting point. A new dataset will also be collected with questions from the users of Wikiedi (a QA system over Wikipedia articles, www.wikiedi.it) and answers in the form of paragraphs from Wikipedia pages.
Task differentiation for personal search evaluation BIBFull-Text 1147
  Seyedeh Sargol Sadeghi
The role of current working context in professional search BIBAFull-Text 1148
  Maya Sappelli
Today's working world of knowledge workers is changing rapidly. The available information that they need to process is ever growing. In addition, the characteristics of their work are changing as people can and do their work from home. This has resulted in the need to support knowledge workers in order to prevent burnouts. The project SWELL (http://www.swell-project.net) targets this by developing systems that support user's mental and physical well-being at work and at home. In the PhD project presented in this abstract we aim at maintaining well-being at work through information support.
How far will you go?: characterizing and predicting online search stopping behavior using information scent and need for cognition BIBFull-Text 1149
  Wan-Ching Wu
Effective approaches to retrieving and using expertise in social media BIBAFull-Text 1150
  Reyyan Yeniterzi
Expert retrieval has been widely studied especially after the introduction of Expert Finding task in the TREC's Enterprise Track in 2005 [3]. This track provided two different test collections crawled from two organizations' public-facing websites and internal emails which led to the development of many state-of-the-art algorithms on expert retrieval [1]. Until recently, these datasets were considered good representatives of the information resources available within enterprise. However, the recent growth of social media also influenced the work environment, and social media became a common communication and collaboration tool within organizations. According to a recent survey by McKinsey Global Institute [2], 29% of the companies use at least one social media tool for matching their employees to tasks, and 26% of them assess their employees' performance by using social media. This shows that intra-organizational social media became an important resource to identify expertise within organizations.
   In recent years, in addition to the intra-organizational social media, public social media tools like Twitter, Facebook, LinkedIn also became common environments for searching expertise. These tools provide an opportunity for their users to show their specific skills to the world which motivates recruiters to look for talented job candidates on social media, or writers and reporters to find experts for consulting on specific topics they are working on. With these motivations in mind, in this work we propose to develop expert retrieval algorithms for intra-organizational and public social media tools.
   Social media datasets have both challenges and advantages. In terms of challenges, they do not always contain context on one specific domain, instead one social media tool may contain discussions on technical stuff, hobbies or news concurrently. They may also contain spam posts or advertisements. Compared to well-edited enterprise documents, they are much more informal in language. Furthermore, depending on the social media platform, they may have limits on the number of characters used in posts. Even though they include the challenges stated above, they also bring some unique authority signals, such as votes, comments, follower/following information, which can be useful in estimating expertise. Furthermore, compared to previously used enterprise documents, social media provides clear associations between documents and candidates in the context of authorship information. In this work, we propose to develop expert retrieval approaches which will handle these challenges while making use of the advantages.
   Expert retrieval is a very useful application by itself; furthermore, it can be a step towards improving other social media applications. Social media is different than other web based tools mainly because it is dependent on its users. In social media, users are not just content consumers, but they are also the primary and sometimes the only content creators. Therefore, the quality of any user-generated content in social media depends on its creator. In this thesis, we propose to use expertise of users in order to improve the existing applications so that they can estimate the relevancy of a content not just based on the content, but also based on the expertise of the content creator. By using expertise of the content generator, we also hope to boost contents that are more reliable. We propose to apply this user's expertise information in order to improve ad-hoc search and question answering applications in social media. In this work, previous TREC enterprise datasets, available intra-organizational social media and public social media datasets will be used to test the proposed algorithms.