Proceedings of the 2014 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Fullname:Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval
Editors:Shlomo Geva; Andrew Trotman; Peter Bruza; Charles L. A. Clarke; Kal Järvelin
Location:Gold Coast, Australia
Dates:2014-Jul-06 to 2014-Jul-11
Publisher:ACM
Standard No:ISBN: 978-1-4503-2257-7; ACM DL: Table of Contents; hcibib: IR14
Papers:229
Pages:1298
  1. Athena award lecture
  2. Session 1a: risks and rewards
  3. Session 1b: #microblog #sigir2014
  4. Session 1c: recommendation
  5. Session 2a: (i can't get no) satisfaction
  6. Session 2b: doctors and lawyers
  7. Session 2c: hashing and efficiency
  8. Session 3a: Social media
  9. Session 3b: indexing and efficiency
  10. Session 3c: e pluribus unum
  11. Plenary address
  12. Session 4a: think globally, act locally
  13. Session 4b: scientia potentia est
  14. Session 4c: more hashing
  15. Session 5a: brains!!!
  16. Session 5b0: auto-completion
  17. Session 5b1: how to win friends and influence people
  18. Session 5c: collaborative complex personalization
  19. Plenary address
  20. Session 6a: #moremicroblog #sigir2014
  21. Session 6b: scents and sensibility
  22. Session 6c: users vs. models
  23. Session 7a: sentiments
  24. Session 7b: more like those
  25. Session 7c: signs and symbols
  26. Session 8a: picture this
  27. Session 8b: time and tide
  28. Session 8c0: summaries and semantics
  29. Session 8c1: [citation] recommendation
  30. Poster session (short papers)
  31. Demo session
  32. Doctoral consortium
  33. Tutorials
  34. Workshops

Athena award lecture

Putting searchers into search (pp. 1-2)
  Susan T. Dumais
Over the last two decades the information retrieval landscape has changed dramatically. Twenty years ago, there were fewer than 3k web sites and the earliest web search engines indexed approximately 50k pages. Today, search engines index billions of web pages, images, videos, news, music, social media, books, etc., and have become the main entry point for a wide range of information, services, communications and entertainment. Despite these tremendous accomplishments, we still have a long way to go. Many searches are unsuccessful, and even those that succeed are often harder than they should be. To address these challenges we need to extend our evaluation methods to handle the diversity of searchers, tasks, and interactivity that characterize information systems today. I will discuss recent work on user modeling and temporal dynamics of information systems to illustrate the power of utilizing converging lines of evidence from laboratory, panel, and large-scale log techniques to understand and support searchers.

Session 1a: risks and rewards

Modelling interaction with economic models of search (pp. 3-12)
  Leif Azzopardi
Understanding how people interact when searching is central to the study of Interactive Information Retrieval (IIR). Most of the prior work has been conceptual, observational or empirical. While this has led to numerous insights and findings regarding the interaction between users and systems, the theory has lagged behind. In this paper, we extend the recently proposed search economic theory to make the model more realistic. We then derive eight interaction-based hypotheses regarding search behaviour. To validate the model, we explore whether the search behaviour of thirty-six participants from a lab-based study is consistent with the theory. Our analysis shows that observed search behaviours are in line with predicted search behaviours and that it is possible to provide credible explanations for such behaviours. This work describes a concise and compact representation of search behaviour providing a strong theoretical basis for future IIR research.
Query-performance prediction: setting the expectations straight (pp. 13-22)
  Fiana Raiber; Oren Kurland
The query-performance prediction task has been described as estimating retrieval effectiveness in the absence of relevance judgments. The expectations throughout the years were that improved prediction techniques would translate to improved retrieval approaches. However, this has not yet happened. Herein we provide an in-depth analysis of why this is the case. To this end, we formalize the prediction task in the most general probabilistic terms. Using this formalism we draw novel connections between tasks -- and methods used to address these tasks -- in federated search, fusion-based retrieval, and query-performance prediction. Furthermore, using formal arguments we show that the ability to estimate the probability of effective retrieval with no relevance judgments (i.e., to predict performance) implies knowledge of how to perform effective retrieval. We also explain why the expectation that using previously proposed query-performance predictors would help to improve retrieval effectiveness was not realized. This is due to a misalignment with the actual goal for which these predictors were devised: ranking queries based on the presumed effectiveness of using them for retrieval over a corpus with a specific retrieval method. Focusing on this specific prediction task, namely query ranking by presumed effectiveness, we present a novel learning-to-rank-based approach that uses Markov Random Fields. The resultant prediction quality substantially transcends that of state-of-the-art predictors.
Hypothesis testing for the risk-sensitive evaluation of retrieval systems (pp. 23-32)
  B. Taner Dinçer; Craig Macdonald; Iadh Ounis
The aim of risk-sensitive evaluation is to measure when a given information retrieval (IR) system does not perform worse than a corresponding baseline system for any topic. This paper argues that risk-sensitive evaluation is akin to the underlying methodology of the Student's t test for matched pairs. Hence, we introduce a risk-reward tradeoff measure TRisk that generalises the existing URisk measure (as used in the TREC 2013 Web track's risk-sensitive task) while being theoretically grounded in statistical hypothesis testing and easily interpretable. In particular, we show that TRisk is a linear transformation of the t statistic, which is the test statistic used in the Student's t test. This inherent relationship between TRisk and the t statistic turns risk-sensitive evaluation from a descriptive analysis into a fully-fledged inferential analysis. Specifically, we demonstrate, using past TREC data, that the inferential analysis techniques introduced in this paper allow us to (1) decide whether an observed level of risk for an IR system is statistically significant, and thereby infer whether the system exhibits a real risk, and (2) determine the topics that individually lead to a significant level of risk. Indeed, we show that the latter permits a state-of-the-art learning to rank algorithm (LambdaMART) to focus on those topics in order to learn effective yet risk-averse ranking systems.
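As a rough illustration of the relationship this abstract describes, the sketch below computes a risk-weighted mean of per-topic score differences and divides it by its standard error, which is the form of a matched-pairs t statistic. The function names, the `alpha` penalty parameter, and the weighting scheme (losses multiplied by 1 + alpha) are assumptions made for illustration, not a transcription of the paper's definitions of URisk and TRisk.

```python
import math
from statistics import mean, stdev


def risk_weighted_deltas(system, baseline, alpha=1.0):
    """Per-topic differences (system - baseline); losses are scaled by (1 + alpha).

    The (1 + alpha) loss penalty is an assumed weighting, chosen for illustration.
    """
    deltas = []
    for s, b in zip(system, baseline):
        d = s - b
        deltas.append(d if d >= 0 else (1.0 + alpha) * d)
    return deltas


def u_risk(system, baseline, alpha=1.0):
    """Descriptive measure: mean of the risk-weighted differences."""
    return mean(risk_weighted_deltas(system, baseline, alpha))


def t_risk(system, baseline, alpha=1.0):
    """Inferential counterpart: the mean divided by its standard error.

    With alpha = 0 this is exactly the matched-pairs t statistic.
    """
    deltas = risk_weighted_deltas(system, baseline, alpha)
    standard_error = stdev(deltas) / math.sqrt(len(deltas))
    return mean(deltas) / standard_error
```

Under these assumptions, the value returned by `t_risk` could be compared against a critical value of the t distribution with n - 1 degrees of freedom, which is what would turn the descriptive number into a significance statement.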

Session 1b: #microblog #sigir2014

Temporal feedback for tweet search with non-parametric density estimation (pp. 33-42)
  Miles Efron; Jimmy Lin; Jiyin He; Arjen de Vries
This paper investigates the temporal cluster hypothesis: in search tasks where time plays an important role, do relevant documents tend to cluster together in time? We explore this question in the context of tweet search and temporal feedback: starting with an initial set of results from a baseline retrieval model, we estimate the temporal density of relevant documents, which is then used for result reranking. Our contributions lie in a method to characterize this temporal density function using kernel density estimation, with and without human relevance judgments, and an approach to integrating this information into a standard retrieval model. Experiments on TREC datasets confirm that our temporal feedback formulation improves search effectiveness, thus providing support for our hypothesis. Our approach out-performs both a standard baseline and previous temporal retrieval models. Temporal feedback improves over standard lexical feedback (with and without human judgments), illustrating that temporal relevance signals exist independently of document content.
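The idea of estimating a temporal density from an initial result list and using it for reranking can be sketched as follows. This is a minimal illustration only: the Gaussian kernel, the fixed `bandwidth`, the use of retrieval scores as kernel weights, and the log-linear `mix` between lexical score and temporal density are all assumptions, and the function names are hypothetical rather than taken from the paper.

```python
import math


def gaussian_kde(points, weights, bandwidth):
    """Return a function that estimates a weighted Gaussian kernel density at time t."""
    total = sum(weights)
    norm = bandwidth * math.sqrt(2.0 * math.pi)

    def density(t):
        acc = 0.0
        for x, w in zip(points, weights):
            z = (t - x) / bandwidth
            acc += w * math.exp(-0.5 * z * z)
        return acc / (total * norm)

    return density


def temporal_rerank(results, bandwidth=3600.0, mix=0.5):
    """Rerank (doc_id, score, timestamp) triples; scores must be positive.

    The density is estimated from the initial results themselves (no human
    judgments), weighted by retrieval score, then combined log-linearly with
    the original score.
    """
    times = [t for _, _, t in results]
    scores = [s for _, s, _ in results]
    density = gaussian_kde(times, scores, bandwidth)
    rescored = []
    for doc_id, score, t in results:
        combined = (1.0 - mix) * math.log(score) + mix * math.log(density(t))
        rescored.append((doc_id, combined))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```

In this sketch, a document that is isolated in time is demoted relative to documents posted close to many other highly scored results, which is the behaviour the temporal cluster hypothesis would favour.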
Fine-grained location extraction from tweets with temporal awareness (pp. 43-52)
  Chenliang Li; Aixin Sun
Twitter is a popular platform for sharing activities, plans, and opinions. Through tweets, users often reveal their location information and short-term visiting plans. In this paper, we are interested in extracting fine-grained locations mentioned in tweets with temporal awareness. More specifically, we would like to extract each point-of-interest (POI) mention in a tweet and predict whether the user has visited, is currently at, or will soon visit this POI. Our proposed solution, named PETAR, consists of two main components: a POI inventory and a time-aware POI tagger. The POI inventory is built by exploiting the crowd wisdom of the Foursquare community. It contains not only the formal names of POIs but also their informal abbreviations. The POI tagger, based on a Conditional Random Field (CRF) model, is designed to simultaneously identify the POIs and resolve their associated temporal awareness. In our experiments, we investigated four types of features (i.e., lexical, grammatical, geographical, and BILOU schema features) for time-aware POI extraction. With the four types of features, PETAR achieves promising extraction accuracy and outperforms all baseline methods.
Collaborative personalized Twitter search with topic-language models (pp. 53-62)
  Jan Vosecky; Kenneth Wai-Ting Leung; Wilfred Ng
The vast amount of real-time and social content in microblogs results in an information overload for users when searching microblog data. Given the user's search query, delivering content that is relevant to her interests is a challenging problem. Traditional methods for personalized Web search are insufficient in the microblog domain, because of the diversity of topics, the sparseness of user data and the highly social nature of the medium. In particular, social interactions between users need to be considered, in order to accurately model the user's interests, alleviate data sparseness and tackle the cold-start problem. In this paper, we therefore propose a novel framework for Collaborative Personalized Twitter Search. At its core, we develop a collaborative user model, which exploits the user's social connections in order to obtain a comprehensive account of her preferences. We then propose a novel user model structure to manage the topical diversity in Twitter and to enable semantic-aware query disambiguation. Our framework integrates a variety of information about the user's preferences in a principled manner. A thorough evaluation is conducted using two personalized Twitter search query logs, demonstrating a superior ranking performance of our framework compared with state-of-the-art baselines.

Session 1c: recommendation

Gaussian process factorization machines for context-aware recommendations (pp. 63-72)
  Trung V. Nguyen; Alexandros Karatzoglou; Linas Baltrunas
Context-aware recommendation (CAR) can lead to significant improvements in the relevance of the recommended items by modeling the nuanced ways in which context influences preferences. The dominant approach in context-aware recommendation has been the multidimensional latent factors approach in which users, items, and context variables are represented as latent features in low-dimensional space. An interaction between a user, item, and a context variable is typically modeled as some linear combination of their latent features. However, given the many possible types of interactions between user, items and contextual variables, it may seem unrealistic to restrict the interactions among them to linearity.
   To address this limitation, we develop a novel and powerful non-linear probabilistic algorithm for context-aware recommendation using Gaussian processes. The method, which we call Gaussian Process Factorization Machines (GPFM), is applicable to both the explicit feedback setting (e.g. numerical ratings as in the Netflix dataset) and the implicit feedback setting (i.e. purchases, clicks). We derive a stochastic gradient descent optimization to allow scalability of the model. We test GPFM on five different benchmark contextual datasets. Experimental results demonstrate that GPFM outperforms state-of-the-art context-aware recommendation methods.
Addressing cold start in recommender systems: a semi-supervised co-training algorithm (pp. 73-82)
  Mi Zhang; Jie Tang; Xuchen Zhang; Xiangyang Xue
Cold start is one of the most challenging problems in recommender systems. In this paper we tackle the cold-start problem by proposing a context-aware semi-supervised co-training method named CSEL. Specifically, we use a factorization model to capture fine-grained user-item context. Then, in order to build a model that is able to boost the recommendation performance by leveraging the context, we propose a semi-supervised ensemble learning algorithm. The algorithm constructs different (weak) prediction models using examples with different contexts and then employs the co-training strategy to allow each (weak) prediction model to learn from the other prediction models. The method has several distinguished advantages over the standard recommendation methods for addressing the cold-start problem. First, it defines a fine-grained context that is more accurate for modeling the user-item preference. Second, the method can naturally support supervised learning and semi-supervised learning, which provides a flexible way to incorporate the unlabeled data.
   The proposed algorithms are evaluated on two real-world datasets. The experimental results show that with our method the recommendation accuracy is significantly improved compared to the standard algorithms and the cold-start problem is largely alleviated.
Explicit factor models for explainable recommendation based on phrase-level sentiment analysis (pp. 83-92)
  Yongfeng Zhang; Guokun Lai; Min Zhang; Yi Zhang; Yiqun Liu; Shaoping Ma
Collaborative Filtering (CF)-based recommendation algorithms, such as Latent Factor Models (LFM), work well in terms of prediction accuracy. However, the latent features make it difficult to explain the recommendation results to the users. Fortunately, with the continuous growth of online user reviews, the information available for training a recommender system is no longer limited to just numerical star ratings or user/item features. By extracting explicit user opinions about various aspects of a product from the reviews, it is possible to learn more details about what aspects a user cares about, which further sheds light on the possibility of making explainable recommendations.
   In this work, we propose the Explicit Factor Model (EFM) to generate explainable recommendations while keeping a high prediction accuracy. We first extract explicit product features (i.e. aspects) and user opinions by phrase-level sentiment analysis on user reviews, then generate both recommendations and disrecommendations according to the specific product features to the user's interests and the hidden features learned. In addition, intuitive feature-level explanations about why an item is or is not recommended are generated from the model. Offline experimental results on several real-world datasets demonstrate the advantages of our framework over competitive baseline algorithms on both rating prediction and top-K recommendation tasks. Online experiments show that the detailed explanations make the recommendations and disrecommendations more influential on users' purchasing behavior.

Session 2a: (i can't get no) satisfaction

Context-aware web search abandonment prediction (pp. 93-102)
  Yang Song; Xiaolin Shi; Ryen White; Ahmed Hassan Awadallah
Web search queries without hyperlink clicks are often referred to as abandoned queries. Understanding the reasons for abandonment is crucial for search engines in evaluating their performance. Abandonment can be categorized as good or bad depending on whether user information needs are satisfied by result page content. Previous research has sought to understand abandonment rationales via user surveys, or has developed models to predict those rationales using behavioral patterns. However, these models ignore important contextual factors such as the relationship between the abandoned query and prior abandonment instances. We propose more advanced methods for modeling and predicting abandonment rationales using contextual information from user search sessions by analyzing search engine logs, and discover dependencies between abandoned queries and user behaviors. We leverage these dependency signals to build a sequential classifier using a structured learning framework designed to handle such signals. Our experimental results show that our approach is 22% more accurate than the state-of-the-art abandonment-rationale classifier. Going beyond prediction, we leverage the prediction results to significantly improve relevance using instances of predicted good and bad abandonment.
Impact of response latency on user behavior in web search (pp. 103-112)
  Ioannis Arapakis; Xiao Bai; B. Barla Cambazoglu
Traditionally, the efficiency and effectiveness of search systems have both been of great interest to the information retrieval community. However, an in-depth analysis of the interplay between the response latency of web search systems and users' search experience has been missing so far. In order to fill this gap, we conduct two separate studies aiming to reveal how response latency affects user behavior in web search. First, we conduct a controlled user study trying to understand how users perceive the response latency of a search system and how sensitive they are to increasing delays in response. This study reveals that, when artificial delays are introduced into the response, the users of a fast search system are more likely to notice these delays than the users of a slow search system. The introduced delays become noticeable to users once they exceed a certain threshold value. Second, we perform an analysis using a large-scale query log obtained from Yahoo web search to observe the potential impact of increasing response latency on the click behavior of users. This analysis demonstrates that latency has an impact on the click behavior of users to some extent. In particular, given two content-wise identical search result pages, we show that users are more likely to perform clicks on the result page that is served with lower latency.
Towards better measurement of attention and satisfaction in mobile search (pp. 113-122)
  Dmitry Lagun; Chih-Hung Hsieh; Dale Webster; Vidhya Navalpakkam
Web Search has seen two big changes recently: rapid growth in mobile search traffic, and an increasing trend towards providing answer-like results for relatively simple information needs (e.g., [weather today]). Such results display the answer or relevant information on the search page itself without requiring a user to click. While clicks on organic search results have been used extensively to infer result relevance and search satisfaction, clicks on answer-like results are often rare (or meaningless), making it challenging to evaluate answer quality. Together, these call for better measurement and understanding of search satisfaction on mobile devices. In this paper, we studied whether tracking the browser viewport (visible portion of a web page) on mobile phones could enable accurate measurement of user attention at scale, and provide good measurement of search satisfaction in the absence of clicks. Focusing on answer-like results in web search, we designed a lab study to systematically vary answer presence and relevance (to the user's information need), obtained satisfaction ratings from users, and simultaneously recorded eye gaze and viewport data as users performed search tasks. Using this ground truth, we identified increased scrolling past answer and increased time below answer as clear, measurable signals of user dissatisfaction with answers. While the viewport may contain three to four results at any given time, we found strong correlations between gaze duration and viewport duration on a per result basis, and that the average user attention is focused on the top half of the phone screen, suggesting that we may be able to scalably and reliably identify which specific result the user is looking at, from viewport data alone.
Modeling action-level satisfaction for search task satisfaction prediction (pp. 123-132)
  Hongning Wang; Yang Song; Ming-Wei Chang; Xiaodong He; Ahmed Hassan; Ryen W. White
Search satisfaction is a property of a user's search process. Understanding it is critical for search providers to evaluate the performance and improve the effectiveness of search engines. Existing methods model search satisfaction holistically at the search-task level, ignoring important dependencies between action-level satisfaction and overall task satisfaction. We hypothesize that searchers' latent action-level satisfaction (i.e., whether they believe they were satisfied with the results of a query or click) influences their observed search behaviors and contributes to overall search satisfaction. We conjecture that by modeling search satisfaction at the action level, we can build more complete and more accurate predictors of search-task satisfaction. To do this, we develop a latent structural learning method, whereby rich structured features and dependency relations unique to search satisfaction prediction are explored. Using in-situ search satisfaction judgments provided by searchers, we show that there is significant value in modeling action-level satisfaction in search-task satisfaction prediction. In addition, experimental results on large-scale logs from Bing.com demonstrate clear benefit from using inferred action satisfaction labels for other applications such as document relevance estimation and query suggestion.

Session 2b: doctors and lawyers

Circumlocution in diagnostic medical queries (pp. 133-142)
  Isabelle Stanton; Samuel Ieong; Nina Mishra
Circumlocution is when many words are used to describe what could be said with fewer, e.g., "a machine that takes moisture out of the air" instead of "dehumidifier." Web search is a perfect backdrop for circumlocution where people struggle to name what they seek. In some domains, not knowing the correct term can have a significant impact on the search results that are retrieved. We study the medical domain, where professional medical terms are not commonly known and where the consequence of not knowing the correct term can impact the accuracy of surfaced information, as well as escalation of anxiety, and ultimately the medical care sought. Given a free-form colloquial health search query, our objective is to find the underlying professional medical term. The problem is complicated by the fact that people issue quite varied queries to describe what they have. Machine-learning algorithms can be brought to bear on the problem, but there are two key complexities: creating high-quality training data and identifying predictive features. To our knowledge, no prior work has been able to crack this important problem due to the lack of training data. We give novel solutions and demonstrate their efficacy via extensive experiments, greatly improving over the prior art.
Interactions between health searchers and search engines (pp. 143-152)
  Georg P. Schoenherr; Ryen W. White
The Web is an important resource for understanding and diagnosing medical conditions. Based on exposure to online content, people may develop undue health concerns, believing that common and benign symptoms are explained by serious illnesses. In this paper, we investigate potential strategies to mine queries and searcher histories for clues that could help search engines choose the most appropriate information to present in response to exploratory medical queries. To do this, we performed a longitudinal study of health search behavior using the logs of a popular Web search engine. We found that query variations which might appear innocuous (e.g. "bad headache" vs "severe headache") may hold valuable information about the searcher which could be used by search engines to improve performance. Furthermore, we investigated how medically-concerned users respond differently to search engine result pages (SERPs) and found that their disposition for clicking on concerning pages is pronounced, potentially leading to a self-reinforcement of concern. Finally, we studied the degree to which variations in the SERP impact future search and real-world health-seeking behavior and obtained some surprising results (e.g., viewing concerning pages may lead to a short-term reduction of in-world healthcare utilization).
Evaluation of machine-learning protocols for technology-assisted review in electronic discovery (pp. 153-162)
  Gordon V. Cormack; Maura R. Grossman
Using a novel evaluation toolkit that simulates a human reviewer in the loop, we compare the effectiveness of three machine-learning protocols for technology-assisted review as used in document review for discovery in legal proceedings. Our comparison addresses a central question in the deployment of technology-assisted review: Should training documents be selected at random, or should they be selected using one or more non-random methods, such as keyword search or active learning? On eight review tasks -- four derived from the TREC 2009 Legal Track and four derived from actual legal matters -- recall was measured as a function of human review effort. The results show that entirely non-random training methods, in which the initial training documents are selected using a simple keyword search, and subsequent training documents are selected by active learning, require substantially and significantly less human review effort (P<0.01) to achieve any given level of recall, than passive learning, in which the machine-learning algorithm plays no role in the selection of training documents. Among passive-learning methods, significantly less human review effort (P<0.01) is required when keywords are used instead of random sampling to select the initial training documents. Among active-learning methods, continuous active learning with relevance feedback yields generally superior results to simple active learning with uncertainty sampling, while avoiding the vexing issue of "stabilization" -- determining when training is adequate, and therefore may stop.
ReQ-ReC: high recall retrieval with query pooling and interactive classification (pp. 163-172)
  Cheng Li; Yue Wang; Paul Resnick; Qiaozhu Mei
We consider a scenario where a searcher requires both high precision and high recall from an interactive retrieval process. Such scenarios are very common in real life, exemplified by medical search, legal search, market research, and literature review. When access to the entire data set is available, an active learning loop could be used to ask for additional relevance feedback labels in order to refine a classifier. When data is accessed via search services, however, only limited subsets of the corpus can be considered, subsets defined by queries. In that setting, relevance feedback has been used in a query enhancement loop that updates a query.
   We describe and demonstrate the effectiveness of ReQ-ReC (ReQuery-ReClassify), a double-loop retrieval system that combines iterative expansion of a query set with iterative refinements of a classifier. This permits a separation of concerns, where the query selector's job is to enhance recall while the classifier's job is to maximize precision on the items that have been retrieved by any of the queries so far. The overall process alternates between the query enhancement loop, to increase recall, and the classifier refinement loop, to increase precision. The separation allows the query enhancement process to explore larger parts of the query space. Our experiments show that this distribution of work significantly outperforms previous relevance feedback methods that rely on a single ranking function to balance precision and recall.

Session 2c: hashing and efficiency

Supervised hashing with latent factor models (pp. 173-182)
  Peichao Zhang; Wei Zhang; Wu-Jun Li; Minyi Guo
Due to its low storage cost and fast query speed, hashing has been widely adopted for approximate nearest neighbor search in large-scale datasets. Traditional hashing methods try to learn the hash codes in an unsupervised way where the metric (Euclidean) structure of the training data is preserved. Very recently, supervised hashing methods, which try to preserve the semantic structure constructed from the semantic labels of the training points, have exhibited higher accuracy than unsupervised methods. In this paper, we propose a novel supervised hashing method, called latent factor hashing (LFH), to learn similarity-preserving binary codes based on latent factor models. An algorithm with a convergence guarantee is proposed to learn the parameters of LFH. Furthermore, a linear-time variant with stochastic learning is proposed for training LFH on large-scale datasets. Experimental results on two large datasets with semantic labels show that LFH can achieve higher accuracy than state-of-the-art methods with comparable training time.
Preference preserving hashing for efficient recommendation (pp. 183-192)
  Zhiwei Zhang; Qifan Wang; Lingyun Ruan; Luo Si
Recommender systems usually need to compare a large number of items before users' most preferred ones can be found. This process can be very costly if recommendations are frequently made on large-scale datasets. In this paper, a novel hashing algorithm, named Preference Preserving Hashing (PPH), is proposed to speed up recommendation. Hashing has been widely utilized in large-scale similarity search (e.g. similar image search), and the search speed with binary hashing code is significantly faster than that with real-valued features. However, one challenge of applying hashing to recommendation is that recommendation concerns users' preferences over items rather than their similarities. To address this challenge, PPH contains two novel components that work with the popular matrix factorization (MF) algorithm. In MF, users' preferences over items are calculated as the inner product between the learned real-valued user/item features. The first component of PPH constrains the learning process, so that users' preferences can be well approximated by user-item similarities. The second component, which is a novel quantization algorithm, generates the binary hashing code from the learned real-valued user/item features. Finally, recommendation can be achieved efficiently via fast hashing code search. Experiments on three real-world datasets show that the recommendation speed of the proposed PPH algorithm can be hundreds of times faster than original MF with real-valued features, and the recommendation accuracy is significantly better than previous work on hashing for recommendation.
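To illustrate why searching over binary codes is fast, the sketch below packs the sign pattern of real-valued user and item vectors into integers and ranks items by Hamming distance using XOR and a bit count. Sign-based quantization is a simple stand-in chosen for illustration; it is not the quantization algorithm proposed in the paper, and all function names and example vectors are hypothetical.

```python
def to_binary_code(vector):
    """Pack the sign pattern of a real-valued vector into an int, one bit per dimension."""
    code = 0
    for value in vector:
        code = (code << 1) | (1 if value >= 0 else 0)
    return code


def hamming_distance(a, b):
    """Number of differing bits between two codes of equal length."""
    return bin(a ^ b).count("1")


def recommend(user_vector, item_vectors, k=2):
    """Return the k item ids whose codes are closest to the user's code."""
    user_code = to_binary_code(user_vector)
    item_codes = {item: to_binary_code(v) for item, v in item_vectors.items()}
    ranked = sorted(
        item_codes,
        key=lambda item: hamming_distance(user_code, item_codes[item]),
    )
    return ranked[:k]
```

In practice the item codes would be computed once and stored, so that each recommendation costs only one XOR and one bit count per item instead of a floating-point inner product.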
Load balancing for partition-based similarity search (pp. 193-202)
  Xun Tang; Maha Alabduljalil; Xin Jin; Tao Yang
All pairs similarity search, used in many data mining and information retrieval applications, is a time-consuming process. Although a partition-based approach accelerates this process by simplifying parallelism management and avoiding unnecessary I/O and comparison, it is still challenging to balance the computation load among parallel machines with a distributed architecture. This is mainly due to the variation in partition sizes and irregular dissimilarity relationships in large datasets. This paper presents a two-stage heuristic algorithm to improve the load balance and shorten the overall processing time. We analyze the optimality and competitiveness of the proposed algorithm and demonstrate its effectiveness using several datasets. We also describe a static partitioning algorithm to even out the partition sizes while detecting more dissimilar pairs. The evaluation results show that the proposed scheme outperforms a previously developed solution by up to 41% in the tested cases.
Estimating global statistics for unstructured P2P search in the presence of adversarial peers BIBAFull-Text 203-212
  Sami Richardson; Ingemar J. Cox
A common problem in unstructured peer-to-peer (P2P) information retrieval is the need to compute global statistics of the full collection, when only a small subset of the collection is visible to a peer. Without accurate estimates of these statistics, the effectiveness of modern retrieval models can be reduced. We show that for the case of a probably approximately correct P2P architecture, and using either the BM25 retrieval model or a language model with Dirichlet smoothing, very close approximations of the required global statistics can be estimated with very little overhead and a small extension to the protocol. However, through theoretical modeling and simulations we demonstrate this technique also greatly increases the ability for adversarial peers to manipulate search results. We show an adversary controlling fewer than 10% of peers can censor or increase the rank of documents, or disrupt overall search results. As a defense, we propose a simple modification to the extension, and show global statistics estimation is viable even when up to 40% of peers are adversarial.
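To make concrete which quantities count as global statistics here, the following sketch scores one term with BM25. It assumes one common BM25 variant (non-negative IDF with the usual `k1` and `b` defaults); the abstract does not state which variant the authors use, and the function name is hypothetical. The collection size, document frequency, and average document length are the values a peer would have to estimate.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score one query term for one document under a common BM25 variant.

    Global statistics (not visible to a single peer): n_docs, doc_freq, avg_doc_len.
    Local statistics (visible to the peer holding the document): tf, doc_len.
    """
    idf = math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)


# The same document and term, scored with two different global estimates.
with_accurate_stats = bm25_term_score(2, 100, 100, n_docs=1000, doc_freq=10)
with_poor_stats = bm25_term_score(2, 100, 100, n_docs=100, doc_freq=10)
```

Because the score depends directly on these estimates, a peer that can skew them can also shift rankings, which is the manipulation risk the abstract describes.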

Session 3a: Social media

Hierarchical multi-label classification of social text streams BIBAFull-Text 213-222
  Zhaochun Ren; Maria-Hendrike Peetz; Shangsong Liang; Willemijn van Dolen; Maarten de Rijke
Hierarchical multi-label classification assigns a document to multiple hierarchical classes. In this paper we focus on hierarchical multi-label classification of social text streams. Concept drift, complicated relations among classes, and the limited length of documents in social text streams make this a challenging problem. Our approach includes three core ingredients: short document expansion, time-aware topic tracking, and chunk-based structural learning. We extend each short document in social text streams to a more comprehensive representation via state-of-the-art entity linking and sentence ranking strategies. From documents extended in this manner, we infer dynamic probabilistic distributions over topics by dividing topics into dynamic "global" topics and "local" topics. For the third and final phase we propose a chunk-based structural optimization strategy to classify each document into multiple classes. Extensive experiments conducted on a large real-world dataset show the effectiveness of our proposed method for hierarchical multi-label classification of social text streams.
An adaptive teleportation random walk model for learning social tag relevance BIBAFull-Text 223-232
  Xiaofei Zhu; Wolfgang Nejdl; Mihai Georgescu
Social tags are known to be a valuable source of information for image retrieval and organization. However, contrary to the conventional document retrieval, rich tag frequency information in social sharing systems, such as Flickr, is not available, thus we cannot directly use the tag frequency (analogous to the term frequency in a document) to represent the relevance of tags. Many heuristic approaches have been proposed to address this problem, among which the well-known neighbor voting based approaches are the most effective methods. The basic assumption of these methods is that a tag is considered as relevant to the visual content of a target image if this tag is also used to annotate the visual neighbor images of the target image by lots of different users. The main limitation of these approaches is that they treat the voting power of each neighbor image either equally or simply based on its visual similarity. In this paper, we cast the social tag relevance learning problem as an adaptive teleportation random walk process on the voting graph. In particular, we model the relationships among images by constructing a voting graph, and then propose an adaptive teleportation random walk, in which a confidence factor is introduced to control the teleportation probability, on the voting graph. Through this process, direct and indirect relationships among images can be explored to cooperatively estimate the tag relevance. To quantify the performance of our approach, we compare it with state-of-the-art methods on two publicly available datasets (NUS-WIDE and MIR Flickr). The results indicate that our method achieves substantial performance gains on these datasets.
Predicting the popularity of web 2.0 items based on user comments BIBAFull-Text 233-242
  Xiangnan He; Ming Gao; Min-Yen Kan; Yiqun Liu; Kazunari Sugiyama
In the current Web 2.0 era, the popularity of Web resources fluctuates ephemerally, based on trends and social interest. As a result, content-based relevance signals are insufficient to meet users' constantly evolving information needs in searching for Web 2.0 items. Incorporating future popularity into ranking is one way to counter this. However, predicting popularity as a third party (as in the case of general search engines) is difficult in practice, due to their limited access to item view histories. To enable popularity prediction externally without excessive crawling, we propose an alternative solution by leveraging user comments, which are more accessible than view counts. Due to the sparsity of comments, traditional solutions that are solely based on view histories do not perform well. To deal with this sparsity, we mine comments to recover additional signal, such as social influence. By modeling comments as a time-aware bipartite graph, we propose a regularization-based ranking algorithm that accounts for temporal, social influence and current popularity factors to predict the future popularity of items. Experimental results on three real-world datasets -- crawled from YouTube, Flickr and Last.fm -- show that our method consistently outperforms competitive baselines in several evaluation tasks.
Recommending social media content to community owners BIBAFull-Text 243-252
  Inbal Ronen; Ido Guy; Elad Kravi; Maya Barnea
Online communities within the enterprise offer their leaders an easy and accessible way to attract, engage, and influence others. Our research studies the recommendation of social media content to leaders (owners) of online communities within the enterprise. We developed a system that suggests to owners new content from outside the community, which might interest the community members. As online communities are taking a central role in the pervasion of social media to the enterprise, sharing such recommendations can help owners create a more lively and engaging community. We compared seven different methods for generating recommendations, including content-based, member-based, and hybridization of the two. For member-based recommendations, we experimented with three groups: owners, active members, and regular members. Our evaluation is based on a survey in which 851 community owners rated a total of 8,218 recommended content items. We analyzed the quality of the different recommendation methods and examined the effect of different community characteristics, such as type and size.

Session 3b: indexing and efficiency

Predictive parallelization: taming tail latencies in web search BIBAFull-Text 253-262
  Myeongjae Jeon; Saehoon Kim; Seung-won Hwang; Yuxiong He; Sameh Elnikety; Alan L. Cox; Scott Rixner
Web search engines are optimized to reduce the high-percentile response time to consistently provide fast responses to almost all user queries. This is a challenging task because the query workload exhibits large variability, consisting of many short-running queries and a few long-running queries that significantly impact the high-percentile response time. With modern multicore servers, parallelizing the processing of an individual query is a promising solution to reduce query execution time, but it gives limited benefits compared to sequential execution since most queries see little or no speedup when parallelized. The root of this problem is that short-running queries, which dominate the workload, do not benefit from parallelization. They incur a large parallelization overhead, taking scarce resources from long-running queries. On the other hand, parallelization substantially reduces the execution time of long-running queries with low overhead and high parallelization efficiency. Motivated by these observations, we propose a predictive parallelization framework with two parts: (1) predicting long-running queries, and (2) selectively parallelizing them. For the first part, prediction should be accurate and efficient. For accuracy, we study a comprehensive feature set covering both term features (reflecting dynamic pruning efficiency) and query features (reflecting query complexity). For efficiency, to keep overhead low, we avoid expensive features that have excessive requirements such as large memory footprints. For the second part, we use the predicted query execution time to parallelize long-running queries and process short-running queries sequentially. We implement and evaluate the predictive parallelization framework in Microsoft Bing search. Our measurements show that under moderate to heavy load, the predictive strategy reduces the 99th-percentile response time by 50% (from 200 ms to 100 ms) compared with prior approaches that parallelize all queries.
Skewed partial bitvectors for list intersection BIBAFull-Text 263-272
  Andrew Kane; Frank Wm. Tompa
This paper examines the space-time performance of in-memory conjunctive list intersection algorithms, as used in search engines, where integers represent document identifiers. We demonstrate that the combination of bitvectors, large skips, delta compressed lists and URL ordering produces superior results to using skips or bitvectors alone. We define semi-bitvectors, a new partial bitvector data structure that stores the front of the list using a bitvector and the remainder using skips and delta compression. To make it particularly effective, we propose that documents be ordered so as to skew the postings lists to have dense regions at the front. This can be accomplished by grouping documents by their size in a descending manner and then reordering within each group using URL ordering. In each list, the division point between bitvector and delta compression can occur at any group boundary. We explore the performance of semi-bitvectors using the GOV2 dataset for various numbers of groups, resulting in significant space-time improvements over existing approaches. Semi-bitvectors do not directly support ranking. Indeed, bitvectors are not believed to be useful for ranking based search systems, because frequencies and offsets cannot be included in their structure. To refute this belief, we propose several approaches to improve the performance of ranking-based search systems using bitvectors, and leave their verification for future work. These proposals suggest that bitvectors, and more particularly semi-bitvectors, warrant closer examination by the research community.
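The semi-bitvector idea can be sketched roughly as follows, under simplifying assumptions that are not from the paper: both lists share one division point (`boundary`), a Python integer serves as the bitvector, and skips are omitted. All function names are hypothetical.

```python
def build_semi_bitvector(postings, boundary):
    """Split a sorted posting list: IDs below `boundary` go into a bitvector,
    the remainder is stored as gaps (delta encoding) starting from `boundary`."""
    bits = 0
    gaps = []
    previous = boundary
    for doc_id in postings:
        if doc_id < boundary:
            bits |= 1 << doc_id
        else:
            gaps.append(doc_id - previous)
            previous = doc_id
    return bits, gaps


def decode_tail(gaps, boundary):
    """Rebuild the delta-encoded document IDs."""
    doc_ids = []
    current = boundary
    for gap in gaps:
        current += gap
        doc_ids.append(current)
    return doc_ids


def intersect(a, b, boundary):
    """Intersect two semi-bitvectors that share the same boundary (a simplification)."""
    front = a[0] & b[0]  # dense region: a single bitwise AND
    result = [i for i in range(boundary) if (front >> i) & 1]
    tail_b = set(decode_tail(b[1], boundary))
    result.extend(d for d in decode_tail(a[1], boundary) if d in tail_b)
    return result


list_a = build_semi_bitvector([1, 3, 5, 9, 12], boundary=8)
list_b = build_semi_bitvector([3, 4, 5, 12, 20], boundary=8)
common = intersect(list_a, list_b, boundary=8)
```

Ordering documents so that postings cluster at the front, as the abstract proposes, is what would make the bitvector portion dense enough to pay off.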
Partitioned Elias-Fano indexes BIBAFull-Text 273-282
  Giuseppe Ottaviano; Rossano Venturini
The Elias-Fano representation of monotone sequences has been recently applied to the compression of inverted indexes, showing excellent query performance thanks to its efficient random access and search operations. While its space occupancy is competitive with some state-of-the-art methods such as gamma-delta-Golomb codes and PForDelta, it fails to exploit the local clustering that inverted lists usually exhibit, namely the presence of long subsequences of close identifiers. In this paper we describe a new representation based on partitioning the list into chunks and encoding both the chunks and their endpoints with Elias-Fano, hence forming a two-level data structure. This partitioning enables the encoding to better adapt to the local statistics of the chunk, thus exploiting clustering and improving compression. We present two partition strategies, respectively with fixed and variable-length chunks. For the latter case we introduce a linear-time optimization algorithm which identifies the minimum-space partition up to an arbitrarily small approximation factor.
   We show that our partitioned Elias-Fano indexes offer significantly better compression than plain Elias-Fano, while preserving their query time efficiency. Furthermore, compared with other state-of-the-art compressed encodings, our indexes exhibit the best compression ratio/query time trade-off.
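For readers unfamiliar with the underlying encoding, here is a minimal sketch of plain (unpartitioned) Elias-Fano for a non-empty, non-decreasing sequence. It does not implement the two-level partitioned structure from the paper, it uses a Python integer as the high-bits bitvector for brevity, and the function names are hypothetical.

```python
def ef_encode(values, universe):
    """Encode a non-empty, non-decreasing sequence of integers in [0, universe).

    Each value is split into `low_bits` low bits, stored verbatim, and a high
    part, stored in a unary-style bitvector: bit (high + index) is set.
    """
    n = len(values)
    low_bits = max(0, (universe // n).bit_length() - 1)  # floor(log2(universe / n))
    mask = (1 << low_bits) - 1
    lows = [v & mask for v in values]
    high = 0
    for i, v in enumerate(values):
        high |= 1 << ((v >> low_bits) + i)
    return low_bits, lows, high


def ef_decode(low_bits, lows, high):
    """Recover the original sequence by scanning the set bits of the high part."""
    values = []
    position = 0
    i = 0
    while i < len(lows):
        if (high >> position) & 1:
            values.append(((position - i) << low_bits) | lows[i])
            i += 1
        position += 1
    return values


doc_ids = [3, 4, 7, 13, 14, 15, 21, 43]
encoded = ef_encode(doc_ids, universe=44)
decoded = ef_decode(*encoded)
```

The number of low bits depends on the universe size and the element count of the whole sequence, which hints at why encoding chunks separately can adapt better to locally clustered identifiers.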
Principled dictionary pruning for low-memory corpus compression BIBAFull-Text 283-292
  Jiancong Tong; Anthony Wirth; Justin Zobel
Compression of collections, such as text databases, can both reduce space consumption and increase retrieval efficiency, through better caching and better exploitation of the memory hierarchy. A promising technique is relative Lempel-Ziv coding, in which a sample of material from the collection serves as a static dictionary; in previous work, this method demonstrated extremely fast decoding and good compression ratios, while allowing random access to individual items. However, there is a trade-off between dictionary size and compression ratio, motivating the search for a compact, yet similarly effective, dictionary. In previous work it was observed that, since the dictionary is generated by sampling, some of it (selected substrings) may be discarded with little loss in compression. Unfortunately, simple dictionary pruning approaches are ineffective. We develop a formal model of our approach, based on generating an optimal dictionary for a given collection within a memory bound. We generate measures for identification of low-value substrings in the dictionary, and show on a variety of sizes of text collection that halving the dictionary size leads to only marginal loss in compression ratio. This is a dramatic improvement on previous approaches.

Session 3c: e pluribus unum

Learning for search result diversification BIBAFull-Text 293-302
  Yadong Zhu; Yanyan Lan; Jiafeng Guo; Xueqi Cheng; Shuzi Niu
Search result diversification has gained attention as a way to tackle the ambiguous or multi-faceted information needs of users. Most existing methods on this problem utilize a heuristic predefined ranking function, where limited features can be incorporated and extensive tuning is required for different settings. In this paper, we address search result diversification as a learning problem, and introduce a novel relational learning-to-rank approach to formulate the task. However, the definitions of ranking function and loss function for the diversification problem are challenging. In our work, we firstly show that diverse ranking is in general a sequential selection process from both empirical and theoretical aspects. On this basis, we define ranking function as the combination of relevance score and diversity score between the current document and those previously selected, and loss function as the likelihood loss of ground truth based on Plackett-Luce model, which can naturally model the sequential generation of a diverse ranking list. Stochastic gradient descent is then employed to conduct the unconstrained optimization, and the prediction of a diverse ranking list is provided by a sequential selection process based on the learned ranking function. The experimental results on the public TREC datasets demonstrate the effectiveness and robustness of our approach.
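The likelihood loss mentioned above can be illustrated with a small sketch of the Plackett-Luce negative log-likelihood. As a simplification, each document has a fixed score here, whereas in the paper the score also depends on the documents already selected; the function name is hypothetical.

```python
import math


def plackett_luce_nll(scores_in_rank_order):
    """Negative log-likelihood of a ranking under the Plackett-Luce model.

    `scores_in_rank_order[i]` is the model score of the document placed at rank i
    in the ground-truth ranking. At each step the model picks the next document
    from those remaining, with probability proportional to exp(score).
    """
    nll = 0.0
    for i, score in enumerate(scores_in_rank_order):
        remaining = scores_in_rank_order[i:]
        log_normalizer = math.log(sum(math.exp(s) for s in remaining))
        nll -= score - log_normalizer
    return nll


# A ranking that agrees with the scores is more likely than its reverse.
agreeing = plackett_luce_nll([2.0, 1.0, 0.0])
reversed_order = plackett_luce_nll([0.0, 1.0, 2.0])
```

Minimizing this quantity over training rankings, for example with stochastic gradient descent, pushes the scoring function toward reproducing the ground-truth order.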
Fusion helps diversification BIBAFull-Text 303-312
  Shangsong Liang; Zhaochun Ren; Maarten de Rijke
A popular strategy for search result diversification is to first retrieve a set of documents utilizing a standard retrieval method and then rerank the results. We adopt a different perspective on the problem, based on data fusion. Starting from the hypothesis that data fusion can improve performance in terms of diversity metrics, we examine the impact of standard data fusion methods on result diversification. We take the output of a set of rankers, optimized for diversity or not, and find that data fusion can significantly improve state-of-the-art diversification methods. We also introduce a new data fusion method, called diversified data fusion (DDF), which infers latent topics of a query using topic modeling, without leveraging outside information. Our experiments show that data fusion methods can enhance the performance of diversification and DDF significantly outperforms existing data fusion methods in terms of diversity metrics.
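As an illustration of what a standard data fusion method looks like, the sketch below implements CombSUM with min-max score normalization. The abstract does not name the fusion methods it evaluates, so CombSUM is an assumed example; it is not the diversified data fusion method the paper introduces, and the function names are hypothetical.

```python
def min_max_normalize(run):
    """Rescale one ranker's scores to [0, 1]; `run` maps document ID to score."""
    lowest, highest = min(run.values()), max(run.values())
    if highest == lowest:
        return {doc: 1.0 for doc in run}
    return {doc: (score - lowest) / (highest - lowest) for doc, score in run.items()}


def comb_sum(runs):
    """Fuse several runs by summing normalized scores; ties break by document ID."""
    fused = {}
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + score
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Two illustrative runs with overlapping documents.
runs = [
    {"d1": 3.0, "d2": 2.0, "d3": 1.0},
    {"d2": 10.0, "d3": 5.0, "d4": 0.0},
]
fused_ranking = comb_sum(runs)
```

Documents retrieved by several rankers accumulate score, so a document ranked moderately well in both runs can overtake one ranked first in only one.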
Utilizing relevance feedback in fusion-based retrieval BIBAFull-Text 313-322
  Ella Rabinovich; Ofri Rom; Oren Kurland
Work on using relevance feedback for retrieval has focused on the single retrieved list setting. That is, an initial document list is retrieved in response to the query and feedback for the most highly ranked documents is used to perform a second search. We address a setting wherein the list for which feedback is provided results from fusing several intermediate retrieved lists. Accordingly, we devise methods that utilize the feedback while exploiting the special characteristics of the fusion setting. Specifically, the feedback serves two different, yet complementary, purposes. The first is to directly rank the pool of documents in the intermediate lists. The second is to estimate the effectiveness of the intermediate lists for improved re-fusion. In addition, we present a meta fusion method that uses the feedback for these two purposes simultaneously. Empirical evaluation demonstrates the merits of our approach. As a case in point, the retrieval performance is substantially better than that of using the relevance feedback as in the single list setting. The performance also substantially transcends that of a previously proposed approach to utilizing relevance feedback in fusion-based retrieval.
A simple term frequency transformation model for effective pseudo relevance feedback BIBAFull-Text 323-332
  Zheng Ye; Jimmy Xiangji Huang
Pseudo Relevance Feedback is an effective technique to improve the performance of ad-hoc information retrieval. Traditionally, the expansion terms are extracted either according to the term distributions in the feedback documents, or according to both the term distributions in the feedback documents and in the whole document collection. However, most of the existing models employ a single term frequency normalization mechanism or criterion that cannot take into account various aspects of a term's saliency in the feedback documents. In this paper, we propose a simple and heuristic, but effective model, in which three term frequency transformation techniques are integrated to capture the saliency of a candidate term associated with the original query terms in the feedback documents. Through evaluations and comparisons on six TREC collections, we show that our proposed model is effective and generally superior to recent relevance feedback models.

Plenary address

Seeking simplicity in search user interfaces BIBAFull-Text 333-334
  Marti A. Hearst
It is rare for a new user interface to break through and become successful, especially in information-intensive tasks like search, coming to consensus or building up knowledge. Most complex interfaces end up going unused. Often the successful solution lies in a previously unexplored part of the interface design space that is simple in a new way that works just right. In this talk I will give examples of such successes in the information-intensive interface design space, and attempt to provide stimulating ideas for future research directions.

Session 4a: think globally, act locally

Who is the barbecue king of Texas?: a geo-spatial approach to finding local experts on Twitter BIBAFull-Text 335-344
  Zhiyuan Cheng; James Caverlee; Himanshu Barthwal; Vandana Bachani
This paper addresses the problem of identifying local experts in social media systems like Twitter. Local experts -- in contrast to general topic experts -- have specialized knowledge focused around a particular location, and are important for many applications including answering local information needs and interacting with community experts. And yet identifying these experts is difficult. Hence in this paper, we propose a geo-spatial-driven approach for identifying local experts that leverages the fine-grained GPS coordinates of millions of Twitter users. We propose a local expertise framework that integrates both users' topical expertise and their local authority. Concretely, we estimate a user's local authority via a novel spatial proximity expertise approach that leverages over 15 million geo-tagged Twitter lists. We estimate a user's topical expertise based on expertise propagation over 600 million geo-tagged social connections on Twitter. We evaluate the proposed approach across 56 queries coupled with over 11,000 individual judgments from Amazon Mechanical Turk. We find significant improvement over both general (non-local) expert approaches and comparable local expert finding approaches.
Your neighbors affect your ratings: on geographical neighborhood influence to rating prediction BIBAFull-Text 345-354
  Longke Hu; Aixin Sun; Yong Liu
Rating prediction is to predict the preference rating of a user to an item that she has not rated before. Using the business review data from Yelp, in this paper, we study business rating prediction. A business here can be a restaurant, a shopping mall or another kind of business. Different from most other types of items that have been studied in various recommender systems (e.g., movie, song, book), a business physically exists at a geographical location, and most businesses have geographical neighbors within walking distance. When a user visits a business, there is a good chance that she walks by its neighbors. Through data analysis, we observe that there exists a weak positive correlation between a business's ratings and its neighbors' ratings, regardless of the categories of businesses. Based on this observation, we assume that a user's rating to a business is determined by both the intrinsic characteristics of the business and the extrinsic characteristics of its geographical neighbors. Using the widely adopted latent factor model for rating prediction, in our proposed solution, we use two kinds of latent factors to model a business: one for its intrinsic characteristics and the other for its extrinsic characteristics. The latter encodes the neighborhood influence of this business to its geographical neighbors. In our experiments, we show that by incorporating geographical neighborhood influences, much lower prediction error is achieved than the state-of-the-art models including Biased MF, SVD++, and Social MF. The prediction error is further reduced by incorporating influences from business category and review content.
Processing spatial keyword query as a top-k aggregation query BIBAFull-Text 355-364
  Dongxiang Zhang; Chee-Yong Chan; Kian-Lee Tan
We examine the spatial keyword search problem to retrieve objects of interest that are ranked based on both their spatial proximity to the query location as well as the textual relevance of the object's keywords. Existing solutions for the problem are based on either using a combination of textual and spatial indexes or using specialized hybrid indexes that integrate the indexing of both textual and spatial attribute values. In this paper, we propose a new approach that is based on modeling the problem as a top-k aggregation problem which enables the design of a scalable and efficient solution that is based on the ubiquitous inverted list index. Our performance study demonstrates that our approach outperforms the state-of-the-art hybrid methods by a wide margin.

Session 4b: scientia potentia est

Entity query feature expansion using knowledge base links BIBAFull-Text 365-374
  Jeffrey Dalton; Laura Dietz; James Allan
Recent advances in automatic entity linking and knowledge base construction have resulted in entity annotations for document and query collections. Examples include annotations of entities from large general purpose knowledge bases, such as Freebase and the Google Knowledge Graph. Understanding how to leverage these entity annotations of text to improve ad hoc document retrieval is an open research area. Query expansion is a commonly used technique to improve retrieval effectiveness. Most previous query expansion approaches focus on text, mainly using unigram concepts. In this paper, we propose a new technique, called entity query feature expansion (EQFE), which enriches the query with features from entities and their links to knowledge bases, including structured attributes and text. We experiment using both explicit query entity annotations and latent entities. We evaluate our technique on TREC text collections automatically annotated with knowledge base entity links, including the Google Freebase Annotations (FACC1) data. We find that entity-based feature expansion results in significant improvements in retrieval effectiveness over state-of-the-art text expansion approaches.
QUADS: question answering for decision support BIBAFull-Text 375-384
  Zi Yang; Ying Li; James Cai; Eric Nyberg
As the scale of available on-line data grows ever larger, individuals and businesses must cope with increasing complexity in decision-making processes which utilize large volumes of unstructured, semi-structured and/or structured data to satisfy multiple, interrelated information needs which contribute to an overall decision. Traditional decision support systems (DSSs) have been developed to address this need, but such systems are typically expensive to build, and are purpose-built for a particular decision-making scenario, making them difficult to extend or adapt to new decision scenarios. In this paper, we propose a novel decision representation which allows decision makers to formulate and organize natural language questions or assertions into an analytic hierarchy, which can be evaluated as part of an ad hoc decision process or as a documented, repeatable analytic process. We then introduce a new decision support framework, QUADS, which takes advantage of automatic question answering (QA) technologies to automatically understand and process a decision representation, producing a final decision by gathering and weighting answers to individual questions using a Bayesian learning and inference process. An open source framework implementation is presented and applied to two real world applications: target validation, a fundamental decision-making task for the pharmaceutical industry, and product recommendation from review texts, an everyday decision-making situation faced by on-line consumers. In both applications, we implemented and compared a number of decision synthesis algorithms, and present experimental results which demonstrate the performance of the QUADS approach versus other baseline approaches.
Topic labeled text classification: a weakly supervised approach BIBAFull-Text 385-394
  Swapnil Hingmire; Sutanu Chakraborti
Supervised text classifiers require extensive human expertise and labeling efforts. In this paper, we propose a weakly supervised text classification algorithm based on the labeling of Latent Dirichlet Allocation (LDA) topics. Our algorithm is based on the generative property of LDA. In our algorithm, we ask an annotator to assign one or more class labels to each topic, based on its most probable words. We classify a document based on its posterior topic proportions and the class labels of the topics. We also enhance our approach by incorporating domain knowledge in the form of labeled words. We evaluate our approach on four real world text classification datasets. The results show that our approach is more accurate in comparison to semi-supervised techniques from previous work. A central contribution of this work is an approach that delivers effectiveness comparable to the state-of-the-art supervised techniques in hard-to-classify domains, with very low overheads in terms of manual knowledge engineering.
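One simple reading of the classification step, using a document's posterior topic proportions together with annotator-assigned topic labels, is sketched below. The aggregation rule (summing the proportions of all topics carrying a class label) is an assumption for illustration, not necessarily the rule used in the paper, and the function name is hypothetical.

```python
def classify_by_topic_labels(topic_proportions, topic_labels):
    """Pick the class whose labeled topics carry the most probability mass.

    topic_proportions: posterior proportion of each topic in the document.
    topic_labels: for each topic, the list of class labels an annotator assigned.
    Assumption: a topic's full proportion counts toward every class it is labeled with.
    """
    class_mass = {}
    for proportion, labels in zip(topic_proportions, topic_labels):
        for label in labels:
            class_mass[label] = class_mass.get(label, 0.0) + proportion
    # Ties are broken alphabetically so the result is deterministic.
    return min(class_mass, key=lambda label: (-class_mass[label], label))


# Illustrative values only; real proportions would come from LDA inference.
labels_per_topic = [["sports"], ["politics"], ["politics", "sports"]]
predicted = classify_by_topic_labels([0.5, 0.3, 0.2], labels_per_topic)
```

The annotator labels a handful of topics instead of many individual documents, which is where the reduction in labeling effort would come from.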

Session 4c: more hashing

Discriminative coupled dictionary hashing for fast cross-media retrieval BIBAFull-Text 395-404
  Zhou Yu; Fei Wu; Yi Yang; Qi Tian; Jiebo Luo; Yueting Zhuang
Cross-media hashing, which conducts cross-media retrieval by embedding data from different modalities into a common low-dimensional Hamming space, has attracted intensive attention in recent years. The existing cross-media hashing approaches only aim at learning hash functions to preserve the intra-modality and inter-modality correlations, but do not directly capture the underlying semantic information of the multi-modal data. We propose a discriminative coupled dictionary hashing (DCDH) method in this paper. In DCDH, the coupled dictionary for each modality is learned with side information (e.g., categories). As a result, the coupled dictionaries not only preserve the intra-similarity and inter-correlation among multi-modal data, but also contain dictionary atoms that are semantically discriminative (i.e., the data from the same category is reconstructed by the similar dictionary atoms). To perform fast cross-media retrieval, we learn hash functions which map data from the dictionary space to a low-dimensional Hamming space. Besides, we conjecture that a balanced representation is crucial in cross-media retrieval. We introduce multi-view features on the relatively "weak" modalities into DCDH and extend it to multi-view DCDH (MV-DCDH) in order to enhance their representation capability. The experiments on two real-world data sets show that our DCDH and MV-DCDH outperform the state-of-the-art methods significantly on cross-media retrieval.
Active hashing with joint data example and tag selection BIBAFull-Text 405-414
  Qifan Wang; Luo Si; Zhiwei Zhang; Ning Zhang
Similarity search is an important problem in many large scale applications such as image and text retrieval. Hashing methods have become popular for similarity search due to their fast search speed and low storage cost. Recent research has shown that hashing quality can be dramatically improved by incorporating supervised information, e.g. semantic tags/labels, into hashing function learning. However, most existing supervised hashing methods can be regarded as passive methods, which assume that the labeled data are provided in advance. But in many real world applications, such supervised information may not be available.
   This paper proposes a novel active hashing approach, Active Hashing with Joint Data Example and Tag Selection (AH-JDETS), which actively selects the most informative data examples and tags in a joint manner for hashing function learning. In particular, it first identifies a set of informative data examples and tags for users to label based on the selection criterion that both the data examples and tags should be most uncertain and dissimilar to each other. Then this labeled information is combined with the unlabeled data to generate an effective hashing function.
   An iterative procedure is proposed for learning the optimal hashing function and selecting the most informative data examples and tags. Extensive experiments on four different datasets demonstrate that AH-JDETS achieves good performance compared with state-of-the-art supervised hashing methods but requires much less labeling cost, which overcomes the limitation of passive hashing methods. Furthermore, experimental results also indicate that the joint active selection approach outperforms a random (non-active) selection method and active selection methods only focusing on either data examples or tags.
Latent semantic sparse hashing for cross-modal similarity search BIBAFull-Text 415-424
  Jile Zhou; Guiguang Ding; Yuchen Guo
Similarity search methods based on hashing for effective and efficient cross-modal retrieval on large-scale multimedia databases with massive text and images have attracted considerable attention. The core problem of cross-modal hashing is how to effectively construct correlation between multi-modal representations which are heterogeneous intrinsically in the process of hash function learning. Analogous to Canonical Correlation Analysis (CCA), most existing cross-modal hash methods embed the heterogeneous data into a joint abstraction space by linear projections. However, these methods fail to bridge the semantic gap effectively and to capture high-level latent semantic information, which has been shown to lead to better performance for image retrieval. To address these challenges, in this paper, we propose a novel Latent Semantic Sparse Hashing (LSSH) to perform cross-modal similarity search by employing Sparse Coding and Matrix Factorization. In particular, LSSH uses Sparse Coding to capture the salient structures of images, and Matrix Factorization to learn the latent concepts from text. Then the learned latent semantic features are mapped to a joint abstraction space. Moreover, an iterative strategy is applied to derive optimal solutions efficiently, and it helps LSSH to explore the correlation between multi-modal representations efficiently and automatically. Finally, the unified hashcodes are generated through the high level abstraction space by quantization. Extensive experiments on three different datasets highlight the advantage of our method under cross-modal scenarios and show that LSSH significantly outperforms several state-of-the-art methods.

Session 5a: brains!!!

Predicting term-relevance from brain signals 425-434
  Manuel J.A. Eugster; Tuukka Ruotsalo; Michiel M. Spapé; Ilkka Kosunen; Oswald Barral; Niklas Ravaja; Giulio Jacucci; Samuel Kaski
Term-Relevance Prediction from Brain Signals (TRPB) is proposed to automatically detect relevance of text information directly from brain signals. An experiment with forty participants was conducted to record neural activity of participants while providing relevance judgments to text stimuli for a given topic. High-precision scientific equipment was used to quantify neural activity across 32 electroencephalography (EEG) channels. A classifier based on a multi-view EEG feature representation showed improvement up to 17% in relevance prediction based on brain signals alone. Relevance was also associated with brain activity with significant changes in certain brain areas. Consequently, TRPB is based on changes identified in specific brain areas and does not require user-specific training or calibration. Hence, relevance predictions can be conducted for unseen content and unseen participants. As an application of TRPB we demonstrate a high-precision variant of the classifier that constructs sets of relevant terms for a given unknown topic of interest. Our research shows that detecting relevance from brain signals is possible and allows the acquisition of relevance judgments without a need to observe any other user interaction. This suggests that TRPB could be used in combination with, or as an alternative to, conventional implicit feedback signals, such as dwell time or click-through activity.
Multidimensional relevance modeling via psychometrics and crowdsourcing 435-444
  Yinglong Zhang; Jin Zhang; Matthew Lease; Jacek Gwizdka
While many multidimensional models of relevance have been posited, prior studies have been largely exploratory rather than confirmatory. Lacking a methodological framework to quantify the relationships among factors or measure model fit to observed data, many past models could not be empirically tested or falsified. To enable more positivist experimentation, Xu and Chen [77] proposed a psychometric framework for multidimensional relevance modeling. However, we show their framework exhibits several methodological limitations which could call into question the validity of findings drawn from it. In this work, we identify and address these limitations, scale their methodology via crowdsourcing, and describe quality control methods from psychometrics which stand to benefit crowdsourcing IR studies in general. The methodology we describe for relevance judging is expected to benefit both human-centered and systems-centered IR.

Session 5b0: auto-completion

Learning user reformulation behavior for query auto-completion 445-454
  Jyun-Yu Jiang; Yen-Yu Ke; Pao-Yu Chien; Pu-Jen Cheng
It is crucial for query auto-completion to accurately predict what a user is typing. Given a query prefix and its context (e.g., previous queries), conventional context-aware approaches often produce queries relevant to the context. The purpose of this paper is to investigate the feasibility of exploiting the context to learn user reformulation behavior for boosting prediction performance. We first conduct an in-depth analysis of how the users reformulate their queries. Based on the analysis, we propose a supervised approach to query auto-completion, where three kinds of reformulation-related features are considered, including term-level, query-level and session-level features. These features carefully capture how the users change preceding queries along the query sessions. Extensive experiments have been conducted on the large-scale query log of a commercial search engine. The experimental results demonstrate a significant improvement over 4 competitive baselines.
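To make the idea of term-level reformulation features concrete, here is a minimal sketch. The specific features (terms kept, added, removed, and Jaccard overlap) and the ranking by overlap alone are illustrative assumptions; the abstract does not list the paper's actual features, and a supervised ranker would combine many such signals rather than sort on one.

```python
def term_level_features(previous_query, candidate):
    """Describe how a candidate completion changes the previous query's terms."""
    prev_terms = set(previous_query.lower().split())
    cand_terms = set(candidate.lower().split())
    kept = prev_terms & cand_terms
    union = prev_terms | cand_terms
    return {
        "terms_kept": len(kept),
        "terms_added": len(cand_terms - prev_terms),
        "terms_removed": len(prev_terms - cand_terms),
        "jaccard": len(kept) / len(union) if union else 0.0,
    }


def complete(prefix, candidates, previous_query):
    """Rank prefix-matching candidates by term overlap with the previous query.

    Sorting on a single feature is a stand-in for a learned ranking model.
    """
    matches = [c for c in candidates if c.startswith(prefix)]
    return sorted(
        matches,
        key=lambda c: term_level_features(previous_query, c)["jaccard"],
        reverse=True,
    )


suggestions = complete(
    "sig",
    ["sigma notation", "sigir history", "sigir 2014 accepted papers"],
    previous_query="sigir 2014 papers",
)
```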
A two-dimensional click model for query auto-completion 455-464
  Yanen Li; Anlei Dong; Hongning Wang; Hongbo Deng; Yi Chang; ChengXiang Zhai
Query auto-completion (QAC) facilitates faster user query input by predicting users' intended queries. Most QAC algorithms take a learning-based approach to incorporate various signals for query relevance prediction. However, such models are trained on simulated user inputs from query log data. The lack of real user interaction data in the QAC process prevents them from further improving the QAC performance. In this work, for the first time we collect a high-resolution QAC query log that records every keystroke in a QAC session. Based on this data, we discover two user behaviors, namely the horizontal skipping bias and the vertical position bias, which are crucial for relevance prediction in QAC. In order to better explain them, we propose a novel two-dimensional click model for modeling the QAC process with emphasis on these behaviors. Extensive experiments on our QAC data set from both PC and mobile devices demonstrate that our proposed model can accurately explain the users' behaviors in interacting with a QAC system, and the resulting relevance model significantly improves the QAC performance over existing click models. Furthermore, the learned knowledge about the skipping behavior can be effectively incorporated into existing learning-based models to further improve their performance.

Session 5b1: how to win friends and influence people

On measuring social friend interest similarities in recommender systems 465-474
  Hao Ma
Social recommender systems have become an emerging research topic due to the prevalence of online social networking services during the past few years. In this paper, aiming at providing fundamental support to research on the social recommendation problem, we conduct an in-depth analysis on the correlations between social friend relations and user interest similarities. When evaluating interest similarities without distinguishing different friends a user has, we surprisingly observe that social friend relations generally cannot represent user interest similarities. A user's average similarity on all his/her friends is even correlated with the average similarity on some other randomly selected users. However, when measuring interest similarities using a finer granularity, we find that the similarities between a user and his/her friends are actually controlled by the network structure in the friend network. Factors that affect the interest similarities include subgraph topology, connected components, number of co-friends, etc. We believe our analysis provides substantial impact for social recommendation research and will benefit ongoing research in both recommender systems and other social applications.
IMRank: influence maximization via finding self-consistent ranking 475-484
  Suqi Cheng; Huawei Shen; Junming Huang; Wei Chen; Xueqi Cheng
Influence maximization, fundamental for word-of-mouth marketing and viral marketing, aims to find a set of seed nodes maximizing influence spread on a social network. Early methods mainly fall into two paradigms with certain benefits and drawbacks: (1) Greedy algorithms, selecting seed nodes one by one, give a guaranteed accuracy relying on the accurate approximation of influence spread with high computational cost; (2) Heuristic algorithms, estimating influence spread using efficient heuristics, have low computational cost but unstable accuracy. We first point out that greedy algorithms are essentially finding a self-consistent ranking, where nodes' ranks are consistent with their ranking-based marginal influence spread. This insight motivates us to develop an iterative ranking framework, i.e., IMRank, to efficiently solve the influence maximization problem under the independent cascade model. Starting from an initial ranking, e.g., one obtained from an efficient heuristic algorithm, IMRank finds a self-consistent ranking by reordering nodes iteratively in terms of their ranking-based marginal influence spread computed according to the current ranking. We also prove that IMRank converges to a self-consistent ranking starting from any initial ranking. Furthermore, within this framework, a last-to-first allocating strategy and a generalization of this strategy are proposed to improve the efficiency of estimating ranking-based marginal influence spread for a given ranking. In this way, IMRank achieves both remarkable efficiency and high accuracy by leveraging simultaneously the benefits of greedy algorithms and heuristic algorithms. As demonstrated by extensive experiments on large scale real-world social networks, IMRank always achieves high accuracy comparable to greedy algorithms, while the computational cost is reduced dramatically, about 10-100 times faster than other scalable heuristics.
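For readers unfamiliar with the greedy paradigm that the abstract contrasts with heuristics, the following is a minimal sketch of greedy seed selection under the independent cascade model, with influence spread estimated by Monte Carlo simulation. It is not IMRank: the graph encoding, function names, and simulation count are assumptions chosen for illustration.

```python
import random


def simulate_cascade(graph, seeds, rng):
    """Run one independent-cascade simulation; return the set of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor, prob in graph.get(node, []):
                # Each newly active node gets one chance to activate each neighbor.
                if neighbor not in active and rng.random() < prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active


def estimate_spread(graph, seeds, runs, rng):
    """Average number of activated nodes over repeated simulations."""
    return sum(len(simulate_cascade(graph, seeds, rng)) for _ in range(runs)) / runs


def greedy_seeds(graph, k, runs=200, seed=0):
    """Pick k seeds one by one, each time adding the node with the largest estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for edges in graph.values() for v, _ in edges}
    chosen = []
    for _ in range(k):
        candidates = sorted(nodes - set(chosen))
        best = max(candidates, key=lambda n: estimate_spread(graph, chosen + [n], runs, rng))
        chosen.append(best)
    return chosen


# Toy directed graph: node -> list of (neighbor, activation probability).
toy_graph = {"a": [("b", 1.0), ("c", 1.0)], "d": [("e", 1.0)]}
toy_seeds = greedy_seeds(toy_graph, k=2, runs=5)
```

Every candidate is re-simulated in every round, which illustrates why the abstract describes greedy methods as accurate but computationally expensive.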

Session 5c: collaborative complex personalization

User-driven system-mediated collaborative information retrieval 485-494
  Laure Soulier; Chirag Shah; Lynda Tamine
Most of the previous approaches surrounding collaborative information retrieval (CIR) provide either a user-based mediation, in which the system only supports users' collaborative activities, or a system-based mediation, in which the system plays an active part in balancing user roles, re-ranking results, and distributing them to optimize overall retrieval performance. In this paper, we propose to combine both of these approaches by a role mining methodology that learns from users' actions about the retrieval strategy they adopt. This hybrid method aims at showing how users are different and how to use these differences for suggesting roles. The core of the method is expressed as an algorithm that (1) monitors users' actions in a CIR setting; (2) discovers differences among the collaborators along certain dimensions; and (3) suggests appropriate roles to make the most out of individual skills and optimize IR performance. Our approach is empirically evaluated and relies on two different laboratory studies involving 70 pairs of users. Our experiments show promising results that highlight how role mining could optimize the collaboration within a search session. The contributions of this work include a new algorithm for mining user roles in collaborative IR, an evaluation methodology, and a new approach to improve IR performance with the operationalization of user-driven system-mediated collaboration.
SearchPanel: framing complex search needs 495-504
  Pernilla Qvarfordt; Simon Tretter; Gene Golovchinsky; Tony Dunnigan
People often use more than one query when searching for information. They revisit search results to re-find information and build an understanding of their search need through iterative explorations of query formulation. These tasks are not well-supported by search interfaces and web browsers. We designed and built SearchPanel, a Chrome browser extension that supports people in their ongoing information seeking. This extension combines document and process metadata into an interactive representation of the retrieved documents that can be used for sense-making, navigation, and re-finding documents. In a real-world deployment spanning over two months, results show that SearchPanel appears to have been primarily used for complex information needs, in search sessions with long durations and high numbers of queries. When process metadata was present in the UI, searchers in explorative search sessions submitted more and longer queries and interacted more with the SERP. These results indicate that the process metadata features in SearchPanel seem to be of particular importance for exploratory search.
Cohort modeling for enhanced personalized search 505-514
  Jinyun Yan; Wei Chu; Ryen W. White
Web search engines utilize behavioral signals to develop search experiences tailored to individual users. To be effective, such personalization relies on access to sufficient information about each user's interests and intentions. For new users or new queries, profile information may be sparse or non-existent. To handle these cases, and perhaps also improve personalization for those with profiles, search engines can employ signals from users who are similar along one or more dimensions, i.e., those in the same cohort. In this paper we describe a characterization and evaluation of the use of such cohort modeling to enhance search personalization. We experiment with three pre-defined cohorts (topic, location, and top-level domain preference), independently and in combination, and also evaluate methods to learn cohorts dynamically. We show via extensive experimentation with large-scale logs from a commercial search engine that leveraging cohort behavior can yield significant relevance gains when combined with a production search engine ranking algorithm that uses similar classes of personalization signal but at the individual searcher level. Additional experiments show that our gains can be extended when we dynamically learn cohorts and target easily-identifiable classes of ambiguous or unseen queries.
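One simple way to picture how cohort signals can stand in for a sparse individual profile is to smooth a user's click-through rate toward the cohort's rate. The formula, the `prior_weight` parameter, and the function name below are assumptions made for illustration; the abstract does not specify how the paper combines cohort and individual signals.

```python
def cohort_smoothed_ctr(user_clicks, user_views, cohort_clicks, cohort_views,
                        prior_weight=10.0):
    """Estimate a user's click-through rate, backing off to the cohort rate.

    The cohort rate acts as a prior worth `prior_weight` pseudo-impressions,
    so it dominates for new users and fades as individual evidence accumulates.
    """
    cohort_ctr = cohort_clicks / cohort_views if cohort_views else 0.0
    return (user_clicks + prior_weight * cohort_ctr) / (user_views + prior_weight)


# A new user with no history inherits the cohort rate.
new_user = cohort_smoothed_ctr(0, 0, cohort_clicks=30, cohort_views=100)

# A user with a long history is dominated by their own behavior.
heavy_user = cohort_smoothed_ctr(90, 100, cohort_clicks=30, cohort_views=100)
```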
Characterizing multi-click search behavior and the risks and opportunities of changing results during use 515-524
  Chia-Jung Lee; Jaime Teevan; Sebastian de la Chica
Although searchers often click on more than one result following a query, little is known about how they interact with search results after their first click. Using large scale query log analysis, we characterize what people do when they return to a result page after having visited an initial result. We find that the initial click provides insight into the searcher's subsequent behavior, with short initial dwell times suggesting more future interaction and later clicks occurring close in rank to the first. Although users think of a search result list as static, when people return to a result list following a click there is the opportunity for the list to change, potentially providing additional relevant content. Such change, however, can be confusing, leading to increased abandonment and slower subsequent clicks. We explore the risks and opportunities of changing search results during use, observing, for example, that when results change above a user's initial click that user is less likely to find new content, whereas changes below correlate with increased subsequent interaction. Our results can be used to improve people's search experience during the course of a single query by seamlessly providing new, more relevant content as the user interacts with a search result page, helping them find what they are looking for without having to issue a new query.

Plenary address

The data revolution: how companies are transforming with big data 525-526
  Hugh E. Williams
Spelling correction in the 1990s was all about algorithms and small dictionaries. This century, it is about mining vast data sets of past user behaviors, simple algorithms, and using those to correct mistakes. The large Internet giants are data-driven enterprises that use data to transform and continually improve user experiences. In this talk, Hugh Williams shares stories about data and how it is used to build Internet products, and explains why he believes data will transform businesses as we know them. Every major company is becoming a data-driven company, and Hugh shares examples of transformations occurring in health, aviation, farming, and telecommunications. He recently joined Pivotal, a company that is assembling the toolkit that exists in only a few consumer Internet companies, and making that toolkit open and available to every industry, including big data platforms, development frameworks, and an open, cloud-independent Platform-as-a-Service. He will conclude by sharing details about Pivotal, the Pivotal vision, and roadmap.
   Hugh E. Williams has been Senior Vice President of Research & Development at Pivotal since January 2014. His teams build big data technologies, and development frameworks and services, including Pivotal's Hadoop, Spring Java framework, and Greenplum database offerings. Most recently, he spent four and a half years as an executive with eBay where he was responsible for the team that conceived, designed, and built eBay's user experiences, search engine, big data technologies and platforms. Prior to joining eBay, he managed an R&D team at Microsoft's Bing for four and a half years, spent over ten years researching and developing search technologies, and ran his own startup and consultancy for several years. He has published over 100 works, mostly in the field of Information Retrieval, including two books for O'Reilly Media Inc. He holds 19 U.S. patents, with many more pending. He has a PhD from RMIT University in Australia.

Session 6a: #moremicroblog #sigir2014

Learning similarity functions for topic detection in online reputation monitoring 527-536
  Damiano Spina; Julio Gonzalo; Enrique Amigó
Reputation management experts have to monitor -- among others -- Twitter constantly and decide, at any given time, what is being said about the entity of interest (a company, organization, personality...). Solving this reputation monitoring problem automatically as a topic detection task is both essential -- manual processing of data is either costly or prohibitive -- and challenging -- topics of interest for reputation monitoring are usually fine-grained and suffer from data sparsity. We focus on a solution for the problem that (i) learns a pairwise tweet similarity function from previously annotated data, using all kinds of content-based and Twitter-based features; (ii) applies a clustering algorithm on the previously learned similarity function. Our experiments indicate that (i) Twitter signals can be used to improve the topic detection process with respect to using content signals only; (ii) learning a similarity function is a flexible and efficient way of introducing supervision in the topic detection clustering process. The performance of our best system is substantially better than state-of-the-art approaches and gets close to the inter-annotator agreement rate. A detailed qualitative inspection of the data further reveals two types of topics detected by reputation experts: reputation alerts / issues (which usually spike in time) and organizational topics (which are usually stable across time).
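The two-step pipeline in the abstract above (a pairwise similarity function followed by clustering) can be sketched as follows. Two substitutions are made for illustration: a plain Jaccard word overlap stands in for the learned similarity function, and threshold-based single-link grouping stands in for whatever clustering algorithm the paper uses. Names and the threshold value are hypothetical.

```python
def jaccard(tweet_a, tweet_b):
    """Word-overlap similarity; a stand-in for a learned pairwise similarity."""
    a, b = set(tweet_a.lower().split()), set(tweet_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(tweets, similarity, threshold):
    """Group tweets so that any pair with similarity >= threshold shares a cluster."""
    parent = list(range(len(tweets)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(tweets)):
        for j in range(i + 1, len(tweets)):
            if similarity(tweets[i], tweets[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, tweet in enumerate(tweets):
        groups.setdefault(find(i), []).append(tweet)
    return list(groups.values())


tweets = [
    "acme recalls faulty phones",
    "acme phones recalls announced",
    "acme hires new ceo",
]
topics = cluster(tweets, jaccard, threshold=0.5)
```

Because the clustering step only sees the similarity function, a trained model can be swapped in for `jaccard` without changing `cluster`.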
Predicting trending messages and diffusion participants in microblogging network 537-546
  Jingwen Bian; Yang Yang; Tat-Seng Chua
Microblogging services have emerged as an essential way to strengthen the communications among individuals. One of the most important features of microblog over traditional social networks is the extensive proliferation in information diffusion. As the outbreak of information diffusion often brings in valuable opportunities or devastating effects, it will be beneficial if a mechanism can be provided to predict whether a piece of information will become viral, and which part of the network will participate in propagating this information. In this work, we define three types of influences, namely, interest-oriented influence, social-oriented influence, and epidemic-oriented influence, that will affect a user's decision on whether to perform a diffusion action. We propose a diffusion-targeted influence model to differentiate and quantify various types of influence. Further we model the problem of diffusion prediction by factorizing a user's intention to transmit a microblog into these influences. The learned prediction model is then used to predict the future diffusion state of any new microblog. We conduct experiments on a real-world microblogging dataset to evaluate our method, and the results demonstrate the superiority of the proposed framework as compared to the state-of-the-art approaches.
Leveraging knowledge across media for spammer detection in microblogging 547-556
  Xia Hu; Jiliang Tang; Huan Liu
While microblogging has emerged as an important information sharing and communication platform, it has also become a convenient venue for spammers to overwhelm other users with unwanted content. Currently, spammer detection in microblogging focuses on using social networking information, but little on content analysis due to the distinct nature of microblogging messages. First, label information is hard to obtain. Second, the texts in microblogging are short and noisy. As we know, spammer detection has been extensively studied for years in various media, e.g., emails, SMS and the web. Motivated by abundant resources available in the other media, we investigate whether we can take advantage of the existing resources for spammer detection in microblogging. While people accept that texts in microblogging are different from those in other media, there is no quantitative analysis to show how different they are. In this paper, we first perform a comprehensive linguistic study to compare spam across different media. Inspired by the findings, we present an optimization formulation that enables the design of spammer detection in microblogging using knowledge from external media. We conduct experiments on real-world Twitter datasets to verify (1) whether email, SMS and web spam resources help and (2) how different media help for spammer detection in microblogging.

Session 6b: scents and sensibility

Using information scent and need for cognition to understand online search behavior 557-566
  Wan-Ching Wu; Diane Kelly; Avneesh Sud
The purpose of this study is to investigate the extent to which two theories, Information Scent and Need for Cognition, explain people's search behaviors when interacting with search engine results pages (SERPs). Information Scent, the perception of the value of information sources, was manipulated by varying the number and distribution of relevant results on the first SERP. Need for Cognition (NFC), a personality trait that measures the extent to which a person enjoys cognitively effortful activities, was measured by a standardized scale. A laboratory experiment was conducted with forty-eight participants, who completed six open-ended search tasks. Results showed that while interacting with SERPs containing more relevant documents, participants examined more documents and clicked deeper in the search result list. When interacting with SERPs that contained the same number of relevant results distributed across different ranks, participants were more likely to abandon their queries when relevant documents appeared later on the SERP. With respect to NFC, participants with higher NFC paginated less frequently and paid less attention to results at lower ranks than those with lower NFC. The interaction between NFC and the number of relevant results on the SERP affected the time spent on searching and a participant's likelihood to reformulate, paginate and stop. Our findings suggest evaluating system effectiveness based on the first page of results, even for tasks that require the user to view multiple documents, and varying interface features based on NFC.
Discrimination between tasks with user activity patterns during information search 567-576
  Michael J. Cole; Chathra Hendahewa; Nicholas J. Belkin; Chirag Shah
Can the activity patterns of page use during information search sessions discriminate between different types of information seeking tasks? We model sequences of interactions with search result and content pages during information search sessions. Two representations are created: the sequences of page use and a cognitive representation of page interactions. The cognitive representation is based on logged eye movement patterns of textual information acquisition via the reading process. Page sequence actions from task sessions (n=109) in a user study are analyzed. The study tasks differed from one another in basic dimensions of complexity, specificity, level, and the type of information product (intellectual or factual). The results show that differences in task types can be measured at both the level of observations of page type sequences and at the level of cognitive activity on the pages. We discuss the implications for personalization of search systems, measurement of task similarity and the development of user-centered information systems that can support the user's current and expected search intentions.
Investigating users' query formulations for cognitive search intents 577-586
  Makoto P. Kato; Takehiro Yamamoto; Hiroaki Ohshima; Katsumi Tanaka
This study investigated query formulations by users with Cognitive Search Intents (CSIs), which are users' needs for the cognitive characteristics of documents to be retrieved, e.g. comprehensibility, subjectivity, and concreteness. Our four main contributions are summarized as follows: (i) we proposed an example-based method of specifying search intents to observe query formulations by users without biasing them by presenting a verbalized task description; (ii) we conducted a questionnaire-based user study and found that about half our subjects did not input any keywords representing CSIs, even though they were conscious of CSIs; (iii) our user study also revealed that over 50% of subjects occasionally had experiences with searches with CSIs while our evaluations demonstrated that the performance of a current Web search engine was much lower when we not only considered users' topical search intents but also CSIs; and (iv) we demonstrated that a machine-learning-based query expansion could improve the performances for some types of CSIs. Our findings suggest users over-adapt to current Web search engines, and create opportunities to estimate CSIs with non-verbal user input.

Session 6c: users vs. models

Win-win search: dual-agent stochastic game in session search 587-596
  Jiyun Luo; Sicong Zhang; Hui Yang
Session search is a complex search task that involves multiple search iterations triggered by query reformulations. We observe a Markov chain in session search: the user's judgment of retrieved documents in the previous search iteration affects the user's actions in the next iteration. We thus propose to model session search as a dual-agent stochastic game: the user agent and the search engine agent work together to jointly maximize their long term rewards. The framework, which we term "win-win search", is based on a Partially Observable Markov Decision Process. We mathematically model dynamics in session search, including decision states, query changes, clicks, and rewards, as a cooperative game between the user and the search engine. The experiments on TREC 2012 and 2013 Session datasets show a statistically significant improvement over the state-of-the-art interactive search and session search algorithms.
Injecting user models and time into precision via Markov chains 597-606
  Marco Ferrante; Nicola Ferro; Maria Maistro
We propose a family of new evaluation measures, called Markov Precision (MP), which exploits continuous-time and discrete-time Markov chains in order to inject user models into precision. Continuous-time MP behaves like time-calibrated measures, bringing the time spent by the user into the evaluation of a system; discrete-time MP behaves like traditional evaluation measures. Being part of the same Markovian framework, the time-based and rank-based versions of MP produce values that are directly comparable.
   We show that it is possible to re-create average precision using specific user models and this helps in providing an explanation of Average Precision (AP) in terms of user models more realistic than the ones currently used to justify it. We also propose several alternative models that take into account different possible behaviors in scanning a ranked result list.
   Finally, we conduct a thorough experimental evaluation of MP on standard TREC collections in order to show that MP is as reliable as other measures and we provide an example of calibration of its time parameters based on click logs from Yandex.
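As background for the measure this abstract sets out to explain, here is a minimal sketch of standard Average Precision (AP) over a ranked list of binary relevance judgments. It shows only the conventional rank-based definition, not Markov Precision; the function name and the optional `total_relevant` parameter are assumptions for illustration.

```python
def average_precision(relevance, total_relevant=None):
    """Compute AP for a ranked list of 0/1 relevance judgments.

    AP averages the precision measured at the rank of each relevant document.
    `total_relevant` lets the caller count relevant documents that were never
    retrieved; by default only the retrieved ones are counted.
    """
    if total_relevant is None:
        total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / total_relevant


# Relevant documents at ranks 1 and 3: AP = (1/1 + 2/3) / 2.
example_ap = average_precision([1, 0, 1, 0])
```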
Searching, browsing, and clicking in a search session: changes in user behavior by task and over time 607-616
  Jiepu Jiang; Daqing He; James Allan
There are many existing studies of user behavior in simple tasks (e.g., navigational and informational search) within a short duration of 1-2 queries. However, we know relatively little about user behavior, especially browsing and clicking behavior, in longer search sessions involving complex search tasks. In this paper, we characterize and compare user behavior in relatively long search sessions (10 minutes; about 5 queries) for search tasks of four different types. The tasks differ in two dimensions: (1) the user is locating facts or is pursuing intellectual understanding of a topic; (2) the user has a specific task goal or has an ill-defined and undeveloped goal. We analyze how search behavior as well as browsing and clicking patterns change during a search session in these different tasks. Our results indicate that user behavior in the four types of tasks differs in various aspects, including search activeness, browsing style, clicking strategy, and query reformulation. As a search session progresses, we note that users shift their interests to focus less on the top results but more on results ranked at lower positions in browsing. We also found that results eventually become less and less attractive for the users. The reasons vary and include downgraded search performance of queries, decreased novelty of search results, and decaying persistence of users in browsing. Our study highlights the lack of long session support in existing search engines and suggests different strategies of supporting longer sessions according to different task types.

Session 7a: sentiments

Coarse-to-fine review selection via supervised joint aspect and sentiment model 617-626
  Zhen Hai; Gao Cong; Kuiyu Chang; Wenting Liu; Peng Cheng
Online reviews are immensely valuable for customers to make informed purchase decisions and for businesses to improve the quality of their products and services. However, customer reviews grow exponentially while varying greatly in quality. It is generally very tedious and difficult, if not impossible, for users to read through the huge amount of review data. Fortunately, review quality evaluation enables a system to select the most helpful reviews for users' decision-making. Previous studies predict only the overall review utility about a product, and often focus on developing different data features to learn a quality function for addressing the problem. In this paper, we aim to select the most helpful reviews not only at the product level, but also at a fine-grained product aspect level. We propose a novel supervised joint aspect and sentiment model (SJASM), which is a probabilistic topic modeling framework that jointly discovers aspects and sentiments guided by a review helpfulness metric. One key advantage of SJASM is its ability to infer the underlying aspects and sentiments, which are indicative of the helpfulness of a review. We validate SJASM using publicly available review data, and our experimental results demonstrate the superiority of SJASM over several competing models.
Cross-domain and cross-category emotion tagging for comments of online news 627-636
  Ying Zhang; Ning Zhang; Luo Si; Yanshan Lu; Qifan Wang; Xiaojie Yuan
In many online news services, users often write comments on news that express subjective emotions such as sadness, happiness or anger. Knowing such emotions can help understand the preferences and perspectives of individual users, and therefore may help online publishers provide more relevant services to users. Although building emotion classifiers is a practical task, it highly depends on sufficient training data that is not easy to collect directly, and manually labeling comments can be quite labor intensive. Also, online news has different domains, which makes the problem even harder as different word distributions of the domains require different classifiers with corresponding distinct training data.
   This paper addresses the task of emotion tagging for comments of cross-domain online news. The cross-domain task is formulated as a transfer learning problem which utilizes a small amount of labeled data from a target news domain and abundant labeled data from a different source domain. This paper proposes a novel framework to transfer knowledge across different news domains. More specifically, different approaches have been proposed when the two domains share the same set of emotion categories or use different categories. An extensive set of experimental results on four datasets from popular online news services demonstrates the effectiveness of our proposed models in cross-domain emotion tagging for comments of online news in both the scenarios of sharing the same emotion categories or having different categories in the source and target domains.
Economically-efficient sentiment stream analysis BIBAFull-Text 637-646
  Roberto Lourenco, Jr.; Adriano Veloso; Adriano Pereira; Wagner Meira, Jr.; Renato Ferreira; Srinivasan Parthasarathy
Text-based social media channels, such as Twitter, produce torrents of opinionated data about the most diverse topics and entities. The analysis of such data (a.k.a. sentiment analysis) is quickly becoming a key feature in recommender systems and search engines. A prominent approach to sentiment analysis is based on the application of classification techniques, that is, content is classified according to the attitude of the writer. A major challenge, however, is that Twitter follows the data stream model, and thus classifiers must operate with limited resources, including labeled data and time for building classification models. Also challenging is the fact that sentiment distribution may change as the stream evolves. In this paper we address these challenges by proposing algorithms that select relevant training instances at each time step, so that training sets are kept small while giving the classifier the ability to adapt to, and to recover from, different types of sentiment drift. Providing both capabilities simultaneously, however, is a conflicting-objective problem, and our proposed algorithms employ basic notions of Economics in order to balance both capabilities. We performed the analysis of events that reverberated on Twitter, and the comparison against the state-of-the-art reveals improvements both in terms of error reduction (up to 14%) and reduction of training resources (by orders of magnitude).

Session 7b: more like those

New and improved: modeling versions to improve app recommendation BIBAFull-Text 647-656
  Jovian Lin; Kazunari Sugiyama; Min-Yen Kan; Tat-Seng Chua
Existing recommender systems usually model items as static -- unchanging in attributes, description, and features. However, in domains such as mobile apps, a version update may provide substantial changes to an app, and such updates, reflected by an increment in its version number, may attract a consumer's interest in a previously unappealing app. Version descriptions constitute an important recommendation evidence source as well as a basis for understanding the rationale for a recommendation. We present a novel framework that incorporates features distilled from version descriptions into app recommendation. We use a semi-supervised topic model to construct a representation of an app's version as a set of latent topics from version metadata and textual descriptions. We then discriminate the topics based on genre information and weight them on a per-user basis to generate a version-sensitive ranked list of apps for a target user. Incorporating our version features with state-of-the-art individual and hybrid recommendation techniques significantly improves recommendation quality. An important advantage of our method is that it targets particular versions of apps, allowing previously disfavored apps to be recommended when user-relevant features are added.
Bundle recommendation in ecommerce BIBAFull-Text 657-666
  Tao Zhu; Patrick Harrington; Junjun Li; Lei Tang
Recommender systems have become an important component in modern eCommerce. Recent research on recommender systems has been mainly concentrating on improving the relevance or profitability of individual recommended items. But in reality, users are usually exposed to a set of items and they may buy multiple items in one single order. Thus, the relevance or profitability of one item may actually depend on the other items in the set. In other words, the set of recommendations is a bundle with items interacting with each other. In this paper, we introduce a novel problem called the Bundle Recommendation Problem (BRP). By solving the BRP, we are able to find the optimal bundle of items to recommend with respect to a preferred business objective. However, BRP is a large-scale NP-hard problem. We then show that it may be sufficient to solve a significantly smaller version of BRP depending on properties of input data. This allows us to solve BRP in real-world applications with millions of users and items. Both offline and online experimental results on Walmart.com demonstrate the incremental value of solving BRP across multiple baseline models.
Does product recommendation meet its waterloo in unexplored categories?: no, price comes to help BIBAFull-Text 667-676
  Jia Chen; Qin Jin; Shiwan Zhao; Shenghua Bao; Li Zhang; Zhong Su; Yong Yu
State-of-the-art methods for product recommendation encounter a significant performance drop in categories where a user has no purchase history. This problem needs to be addressed since current online retailers are moving beyond a single category and attempting to diversify. In this paper, we investigate the challenging problem of product recommendation in unexplored categories and discover that the price, a factor transferable across categories, can improve the recommendation performance significantly. Through our investigation, we address four research questions progressively: 1) What is the impact of unexplored categories on recommendation performance? 2) How to represent the price factor from the recommendation point of view? 3) What does the price factor across categories mean to recommendation? 4) How to utilize the price factor across categories for recommendation in unexplored categories? Based on a series of experiments and analysis conducted on a dataset collected from a leading E-commerce website, we discover valuable findings for the above four questions: first, unexplored categories cause a relative performance drop of 40% for current recommendation systems; second, the price factor can be represented as either a quantity for a product or a distribution for a user to improve performance; third, consumer behavior with respect to the price factor across categories is complicated and needs to be carefully modeled; finally and most importantly, we propose a new method which encodes the two perspectives of the price factor. The proposed method significantly improves the recommendation performance in unexplored categories over the state-of-the-art baseline systems and narrows the performance gap by a relative 43%.

Session 7c: signs and symbols

Query expansion for mixed-script information retrieval BIBAFull-Text 677-686
  Parth Gupta; Kalika Bali; Rafael E. Banchs; Monojit Choudhury; Paolo Rosso
For many languages that use non-Roman based indigenous scripts (e.g., Arabic, Greek and Indic languages) one can often find a large amount of user generated transliterated content on the Web in the Roman script. Such content creates a monolingual or multi-lingual space with more than one script which we refer to as the Mixed-Script space. IR in the mixed-script space is challenging because queries written in either the native or the Roman script need to be matched to the documents written in both scripts. Moreover, transliterated content features extensive spelling variations. In this paper, we formally introduce the concept of Mixed-Script IR, and through analysis of the query logs of the Bing search engine, estimate the prevalence and thereby establish the importance of this problem. We also give a principled solution to handle the mixed-script term matching and spelling variation where the terms across the scripts are modelled jointly in a deep-learning architecture and can be compared in a low-dimensional abstract space. We present an extensive empirical analysis of the proposed method along with the evaluation results in an ad-hoc retrieval setting of mixed-script IR where the proposed method achieves significantly better results (12% increase in MRR and 29% increase in MAP) compared to other state-of-the-art baselines.
Retrieval of similar chess positions BIBAFull-Text 687-696
  Debasis Ganguly; Johannes Leveling; Gareth J.F. Jones
We address the problem of retrieving chess game positions similar to a given query position from a collection of archived chess games. We investigate this problem from an information retrieval (IR) perspective. The advantage of our proposed IR-based approach is that it allows using the standard inverted organization of stored chess positions, leading to an efficient retrieval. Moreover, in contrast to retrieving exactly identical board positions, the IR-based approach is able to provide approximate search functionality. In order to define the similarity between two chess board positions, we encode each game state with a textual representation. This textual encoding is designed to represent the position, reachability and the connectivity between chess pieces. Due to the absence of a standard IR dataset that can be used for this search task, a new evaluation benchmark dataset was constructed comprising documents (chess positions) from a freely available chess game archive. Experiments conducted on this dataset demonstrate that our proposed method of similarity computation, which takes into account a combination of the mobility and the connectivities between the chess pieces, performs well on the search task, achieving MAP and nDCG values of 0.4233 and 0.6922 respectively.
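As a rough illustration of the idea of encoding a board as text, the sketch below turns the piece-placement field of a FEN string into "piece+square" terms that could be fed to an ordinary inverted index. It covers only piece positions; the reachability and connectivity terms mentioned in the abstract are not modelled. The function names `fen_to_terms` and `overlap_similarity`, and the use of Jaccard overlap as the similarity, are assumptions made for this example and are not taken from the paper.

```python
def fen_to_terms(fen_placement: str) -> list[str]:
    """Turn the piece-placement field of a FEN string into 'piece+square' terms.

    Hypothetical encoding for illustration: 'Pe4' means a white pawn on e4,
    'ra8' means a black rook on a8.
    """
    terms = []
    ranks = fen_placement.split("/")  # FEN lists rank 8 first, rank 1 last
    for rank_index, rank in enumerate(ranks):
        rank_number = 8 - rank_index
        file_index = 0
        for ch in rank:
            if ch.isdigit():
                file_index += int(ch)  # a digit stands for that many empty squares
            else:
                square = "abcdefgh"[file_index] + str(rank_number)
                terms.append(ch + square)
                file_index += 1
    return terms


def overlap_similarity(terms_a: list[str], terms_b: list[str]) -> float:
    """Jaccard overlap of two term sets (a stand-in for a real IR ranking function)."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


start = fen_to_terms("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR")
after_e4 = fen_to_terms("rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR")
similarity = overlap_similarity(start, after_e4)
```

Because each position becomes a bag of terms, two positions that differ by a single move still share most of their terms, which is what allows approximate rather than exact matching.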
A mathematics retrieval system for formulae in layout presentations BIBAFull-Text 697-706
  Xiaoyan Lin; Liangcai Gao; Xuan Hu; Zhi Tang; Yingnan Xiao; Xiaozhong Liu
The semantics of mathematical formulae depend on their spatial structure, and they usually exist in layout presentations such as PDF, LaTeX, and Presentation MathML, which challenges previous text index and retrieval methods. This paper proposes an innovative mathematics retrieval system along with novel algorithms, which enable efficient formula indexing and retrieval from both webpages and PDF documents. Unlike prior studies, which require users to manually input formula markup language as query, the new system enables users to "copy" formula queries directly from PDF documents. Furthermore, by using a novel indexing and matching model, the system is aimed at searching for similar mathematical formulae based on both textual and spatial similarities. A hierarchical generalization technique is proposed to generate sub-trees from the semi-operator tree of formulae and support substructure match and fuzzy match. Experiments based on massive Wikipedia and CiteSeer repositories show that the new system along with the novel algorithms, compared with two representative mathematics retrieval systems, provides more efficient mathematical formula indexing and retrieval, while simplifying user query input for PDF documents.

Session 8a: picture this

The knowing camera 2: recognizing and annotating places-of-interest in smartphone photos BIBAFull-Text 707-716
  Pai Peng; Lidan Shou; Ke Chen; Gang Chen; Sai Wu
This paper presents a project called Knowing Camera for recognizing and annotating places-of-interest (POI) in smartphone photos in real time, with the availability of online geotagged images of such places. We propose a "Spatial+Visual" (S+V) framework which consists of a probabilistic field-of-view model in the spatial phase and a sparse coding similarity metric in the visual phase to recognize phone-captured POIs. Moreover, we put forward an offline Collaborative Salient Area (COSTAR) mining algorithm to detect common visual features (called Costars) among the noisy photos geotagged on each POI, and thus to clean the geotagged image database. The mining result can be utilized to annotate the region-of-interest on the query image during the online query processing. Besides, this mining procedure further improves the efficiency and accuracy of the S+V framework. Our experiments on real-world and Oxford 5K datasets show promising recognition and annotation performances of the proposed approach, and that the proposed COSTAR mining technique outperforms the state-of-the-art approach.
Click-through-based cross-view learning for image search BIBAFull-Text 717-726
  Yingwei Pan; Ting Yao; Tao Mei; Houqiang Li; Chong-Wah Ngo; Yong Rui
One of the fundamental problems in image search is to rank image documents according to a given textual query. Existing search engines highly depend on surrounding texts for ranking images, or leverage the query-image pairs annotated by human labelers to train a series of ranking functions. However, there are two major limitations: 1) the surrounding texts are often noisy or too few to accurately describe the image content, and 2) the human annotations are resource-intensive and thus cannot be scaled up. We demonstrate in this paper that the above two fundamental challenges can be mitigated by jointly exploring cross-view learning and the use of click-through data. The former aims to create a latent subspace with the ability to compare information from the originally incomparable views (i.e., textual and visual views), while the latter explores the largely available and freely accessible click-through data (i.e., "crowdsourced" human intelligence) for understanding queries. Specifically, we propose a novel cross-view learning method for image search, named Click-through-based Cross-view Learning (CCL), by jointly minimizing the distance between the mappings of query and image in the latent subspace and preserving the inherent structure in each original space. On a large-scale click-based image dataset, CCL achieves a 4.0% improvement over a Support Vector Machine-based method in terms of relevance, while reducing the feature dimension by several orders of magnitude (e.g., from thousands to tens). Moreover, the experiments also demonstrate the superior performance of CCL to several state-of-the-art subspace learning techniques.
Learning to personalize trending image search suggestion BIBAFull-Text 727-736
  Chun-Che Wu; Tao Mei; Winston H. Hsu; Yong Rui
Trending search suggestion is leading a new paradigm of image search, where the user's exploratory search experience is facilitated with the automatic suggestion of trending queries. Existing image search engines, however, only provide general suggestions and hence cannot capture the user's personal interest. In this paper, we move one step forward to investigate personalized suggestion of trending image searches according to users' search behaviors. To this end, we propose a learning-based framework including two novel components. The first component, i.e., trending-aware weight-regularized matrix factorization (TA-WRMF), is able to suggest personalized trending search queries by learning user preference from many users as well as auxiliary common searches. The second component associates the most representative and trending image with each suggested query. The personalized suggestion of image search consists of a trending textual query and its associated trending image. The combined textual-visual queries not only are trending (bursty) and personalized to the user's search preference, but also provide the compelling visual aspect of these queries. We evaluate our proposed learning-based framework on large-scale search logs with 21 million users and 41 million queries in two weeks from a commercial image search engine. The evaluations demonstrate that our system achieves about 50% gain compared with the state of the art in terms of query prediction accuracy.
PRISM: concept-preserving social image search results summarization BIBAFull-Text 737-746
  Boon-Siew Seah; Sourav S. Bhowmick; Aixin Sun
Most existing tag-based social image search engines present search results as a ranked list of images, which cannot be consumed by users in a natural and intuitive manner. In this paper, we present a novel concept-preserving image search results summarization algorithm named Prism. Prism exploits both visual features and tags of the search results to generate a high-quality summary, which not only breaks the results into visually and semantically coherent clusters but also maximizes the coverage of the summary w.r.t. the original search results. It first constructs a visual similarity graph where the nodes are images in the search results and the edges represent visual similarities between pairs of images. This graph is optimally decomposed and compressed into a set of concept-preserving subgraphs based on a set of summarization objectives. Images in a concept-preserving subgraph are visually and semantically cohesive and are described by a minimal set of tags or concepts. Lastly, one or more exemplar images from each subgraph are selected to form the exemplar summary of the result set. Through empirical study, we demonstrate the effectiveness of Prism against state-of-the-art image summarization and clustering algorithms.

Session 8b: time and tide

Time-critical search BIBAFull-Text 747-756
  Nina Mishra; Ryen W. White; Samuel Ieong; Eric Horvitz
We study time-critical search, where users have urgent information needs in the context of an acute problem. As examples, users may need to know how to stem a severe bleed, help a baby who is choking on a foreign object, or respond to an epileptic seizure. While time-critical situations and actions have been studied in the realm of decision-support systems, little has been done with time-critical search and retrieval, and little direct support is offered by search systems. Critical challenges with time-critical search include accurately inferring when users have urgent needs and providing relevant information that can be understood and acted upon quickly. We leverage surveys and search log data from a large mobile search provider to (a) characterize the use of search engines for time-critical situations, and (b) develop predictive models to accurately predict urgent information needs, given a query and a diverse set of features spanning topical, temporal, behavioral, and geospatial attributes. The methods and findings highlight opportunities for extending search and retrieval to consider the urgency of queries.
Learning temporal-dependent ranking models BIBAFull-Text 757-766
  Miguel Costa; Francisco Couto; Mário Silva
Web archives together already hold more than 534 billion files and this number continues to grow as new initiatives arise. Searching on all versions of these files acquired throughout time is challenging, since users expect answers from web archives to be as fast and precise as the ones provided by current web search engines. This work studies, for the first time, how to improve the search effectiveness of web archives, including the creation of novel temporal features that explore the correlation found between web document persistence and relevance. The persistence was analyzed over 14 years of web snapshots. Additionally, we propose a temporal-dependent ranking framework that exploits the variance of web characteristics over time influencing ranking models. Based on the assumption that closer periods are more likely to hold similar web characteristics, our framework learns multiple models simultaneously, each tuned for a specific period. Experimental results show significant improvements over the search effectiveness of single models that learn from all data independently of its time. Thus, our approach represents an important step forward on the state-of-the-art IR technology usually employed in web archives.
Web page segmentation with structured prediction and its application in web page classification BIBAFull-Text 767-776
  Lidong Bing; Rui Guo; Wai Lam; Zheng-Yu Niu; Haifeng Wang
We propose a framework which can perform Web page segmentation with a structured prediction approach. It formulates the segmentation task as a structured labeling problem on a transformed Web page segmentation graph (WPS-graph). The WPS-graph models the candidate segmentation boundaries of a page and the dependency relation among the adjacent segmentation boundaries. Each labeling scheme on the WPS-graph corresponds to a possible segmentation of the page. The task of finding the optimal labeling of the WPS-graph is transformed into a binary Integer Linear Programming problem, which considers the entire WPS-graph as a whole to conduct structured prediction. A learning algorithm based on the structured output Support Vector Machine framework is developed to determine the feature weights, which is capable of considering the inter-dependency among candidate segmentation boundaries. Furthermore, we investigate its efficacy in supporting the development of automatic Web page classification.
Query log driven web search results clustering BIBAFull-Text 777-786
  Jose G. Moreno; Gaël Dias; Guillaume Cleuziou
Several important studies in Web search results clustering have recently shown performance gains from the use of external resources. Following this trend, we present a new algorithm called Dual C-Means, which provides a theoretical background for clustering in different representation spaces. Its originality relies on the fact that external resources can drive the clustering process as well as the labeling task in a single step. To validate our hypotheses, a series of experiments are conducted over different standard datasets and in particular over a new dataset built from the TREC Web Track 2012 to take into account query log information. The comprehensive empirical evaluation of the proposed approach demonstrates its significant advantages over traditional clustering and labeling techniques.

Session 8c0: summaries and semantics

CTSUM: extracting more certain summaries for news articles BIBAFull-Text 787-796
  Xiaojun Wan; Jianmin Zhang
People often read summaries of news articles in order to get reliable information about an event or a topic. However, the information expressed in news articles is not always certain, and some sentences contain uncertain information about the event. Existing summarization systems do not consider whether a sentence in news articles is certain or not. In this paper, we propose a novel system called CTSUM to incorporate the new factor of information certainty into the summarization task. We first analyze the sentences in news articles and automatically predict the certainty levels of sentences by using the support vector regression method with a few useful features. The predicted certainty scores are then incorporated into a summarization system with a graph-based ranking algorithm. Experimental results on a manually labeled dataset verify the effectiveness of the sentence certainty prediction technique, and experimental results on the DUC2007 dataset show that our new summarization system can not only produce summaries with better content quality, but also produce summaries with higher certainty.
Continuous word embeddings for detecting local text reuses at the semantic level BIBAFull-Text 797-806
  Qi Zhang; Jihua Kang; Jin Qian; Xuanjing Huang
Text reuse is a common phenomenon in a variety of user-generated content. Along with the quick expansion of social media, reuses of local text are occurring much more frequently than ever before. The task of detecting these local reuses serves as an essential step for many applications. It has attracted extensive attention in recent years. However, semantic level similarities have not received consideration in most previous works. In this paper, we introduce a novel method to efficiently detect local reuses at the semantic level for large scale problems. We propose to use continuous vector representations of words to capture the semantic level similarities between short text segments. In order to handle tens of billions of documents, methods based on information geometry and hashing methods are introduced to aggregate and map text segments represented by word embeddings to binary hash codes. Experimental results demonstrate that the proposed methods achieve significantly better performance than state-of-the-art approaches in all six document collections belonging to four different categories. At some recall levels, the precisions of the proposed method are even 10 times higher than previous methods. Moreover, the efficiency of the proposed method is comparable to or better than that of some other hashing methods.

Session 8C1: [citation] recommendation

CiteSight: supporting contextual citation recommendation using differential search BIBAFull-Text 807-816
  Avishay Livne; Vivek Gokuladas; Jaime Teevan; Susan T. Dumais; Eytan Adar
A person often uses a single search engine for very different tasks. For example, an author editing a manuscript may use the same academic search engine to find the latest work on a particular topic or to find the correct citation for a familiar article. The author's tolerance for latency and accuracy may vary according to task. However, search engines typically employ a consistent approach for processing all queries. In this paper we explore how a range of search needs and expectations can be supported within a single search system using differential search. We introduce CiteSight, a system that provides personalized citation recommendations to author groups that vary based on task. CiteSight presents cached recommendations instantaneously for online tasks (e.g., active paper writing), and refines these recommendations in the background for offline tasks (e.g., future literature review). We develop an active cache-warming process to enhance the system as the author works, and context-coupling, a technique for augmenting sparse citation networks. By evaluating the quality of the recommendations and collecting user feedback, we show that differential search can provide a high level of accuracy for different tasks on different time scales. We believe that differential search can be used in many situations where the user's tolerance for latency and desired response vary dramatically based on use.
Cross-language context-aware citation recommendation in scientific articles BIBAFull-Text 817-826
  Xuewei Tang; Xiaojun Wan; Xun Zhang
Adequacy of citations is very important for a scientific paper. However, it is not an easy job to find appropriate citations for a given context, especially for citations in different languages. In this paper, we define a novel task of cross-language context-aware citation recommendation, which aims at recommending English citations for a given context of the place where a citation is made in a Chinese paper. This task is very challenging because the contexts and citations are written in different languages and there exists a language gap when matching them. To tackle this problem, we propose the bilingual context-citation embedding algorithm (i.e. BLSRec-I), which can learn a low-dimensional joint embedding space for both contexts and citations. Moreover, two advanced algorithms named BLSRec-II and BLSRec-III are proposed by enhancing BLSRec-I with translation results and abstract information, respectively. We evaluate the proposed methods based on a real dataset that contains Chinese contexts and English citations. The results demonstrate that our proposed algorithms can outperform a few baselines and the BLSRec-II and BLSRec-III methods can outperform the BLSRec-I method.

Poster session (short papers)

Search result diversification via data fusion BIBAFull-Text 827-830
  Shengli Wu; Chunlan Huang
In recent years, researchers have investigated search result diversification through a variety of approaches. In such situations, information retrieval systems need to consider both relevance and diversity for the retrieved documents. On the other hand, previous research has demonstrated that data fusion is useful for improving performance when we are only concerned with relevance. However, it is not clear if it helps when relevance and diversity are both taken into consideration. In this short paper, we propose a few data fusion methods to try to improve performance when both relevance and diversity are of concern. Experiments are carried out with 3 groups of top-ranked results submitted to the TREC web diversity task. We find that data fusion is still a useful approach to performance improvement for diversity, as it was previously for relevance.
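The abstract does not name the fusion methods it proposes. To make the general notion of data fusion concrete, the sketch below uses CombSUM, a classic relevance-only fusion rule that sums min-max normalized scores across runs. It is a stand-in chosen for illustration, not the authors' method, and it does not model diversity. The function names and the toy runs are assumptions.

```python
def min_max_normalize(run: dict[str, float]) -> dict[str, float]:
    """Rescale one run's scores to [0, 1] so runs on different scales are comparable."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}  # all scores equal: treat them as tied at the top
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def comb_sum(runs: list[dict[str, float]]) -> list[tuple[str, float]]:
    """CombSUM: add up each document's normalized scores over all runs that retrieved it."""
    fused: dict[str, float] = {}
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + score
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Two toy runs (document id -> retrieval score); values are made up.
run_a = {"d1": 3.0, "d2": 2.0, "d3": 1.0}
run_b = {"d2": 4.0, "d4": 2.0, "d1": 0.0}
fused_ranking = comb_sum([run_a, run_b])
```

Documents retrieved by several runs accumulate score, which is why fusion tends to promote results on which the component systems agree.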
Hashtag recommendation for hyperlinked tweets BIBAFull-Text 831-834
  Surendra Sedhai; Aixin Sun
The presence of a hyperlink in a tweet is a strong indication that the tweet is more informative. In this paper, we study the problem of hashtag recommendation for hyperlinked tweets (i.e., tweets containing links to Web pages). By recommending hashtags to hyperlinked tweets, we argue that the functions of hashtags such as providing the right context to interpret the tweets, tweet categorization, and tweet promotion, can be extended to the linked documents. The proposed solution for hashtag recommendation consists of two phases. In the first phase, we select candidate hashtags through five schemes by considering the similar tweets, the similar documents, the named entities contained in the document, and the domain of the link. In the second phase, we formulate the hashtag recommendation problem as a learning to rank problem and adopt RankSVM to aggregate and rank the candidate hashtags. Our experiments on a collection of 24 million tweets show that the proposed solution achieves promising results.
Personalized document re-ranking based on Bayesian probabilistic matrix factorization BIBAFull-Text 835-838
  Fei Cai; Shangsong Liang; Maarten de Rijke
A query considered in isolation provides limited information about the searcher's interest. Previous work has considered various types of user behavior, e.g., clicks and dwell time, to obtain a better understanding of the user's intent. We consider the searcher's search and page view history. Using search logs from a commercial search engine, we (i) investigate the impact of features derived from user behavior on reranking a generic ranked list; (ii) optimally integrate the contributions of user behavior and candidate documents by learning their relative importance per query based on similar users. We use dwell time on clicked URLs when estimating the relevance of documents for a query, and perform Bayesian Probabilistic Matrix Factorization as smoothing to predict the relevance. Considering user behavior achieves better rankings than non-personalized rankings. Aggregation of user behavior and query-document features with a user-dependent adaptive weight outperforms combinations with a fixed uniform value.
Using the cross-entropy method to re-rank search results BIBAFull-Text 839-842
  Haggai Roitman; Shay Hummel; Oren Kurland
We present a novel unsupervised approach to re-ranking an initially retrieved list. The approach is based on the Cross Entropy method applied to permutations of the list, and relies on performance prediction. Using pseudo predictors we establish a lower bound on the prediction quality that is required so as to have our approach significantly outperform the original retrieval. Our experiments serve as a proof of concept demonstrating the considerable potential of the proposed approach. As a case in point, only a tiny fraction of the huge space of permutations needs to be explored to attain significant improvements over the original retrieval.
Computing and applying topic-level user interactions in microblog recommendation BIBAFull-Text 843-846
  Xiao Lu; Peng Li; Hongyuan Ma; Shuxin Wang; Anying Xu; Bin Wang
With the development of microblog services, tens of thousands of messages are produced every day, and recommending useful messages according to users' interests is recognized as an effective way to overcome the information overload problem. Collaborative filtering, which originated in recommender systems, has been utilized for microblog recommendation, where social relationship information can help improve the recommendation performance. However, most existing methods only consider the static relationship, i.e., the following relationship, and ignore the relationship conveyed by users' repost behaviors. To explore the effects of behavior-based relationships on recommendation, we propose an Interaction Based Collaborative Filtering (IBCF) approach. Specifically, we first use a topic model to analyze users' interactive behaviors and measure the topic-specific relationship strength; then we incorporate the relationship factor into the matrix factorization framework. Experimental results show that, compared to the current popular social recommendation methods, IBCF achieves better performance on the MAP and NDCG evaluation measures, and has better interpretability for the recommended results.
Towards context-aware search with right click BIBAFull-Text 847-850
  Aixin Sun; Chii-Hian Lou
Many queries are submitted to search engines by right-clicking the marked text (i.e., the query) in Web browsers. Because the document being read by the searcher often provides sufficient contextual information for the query, a search engine could provide much more relevant search results if the query is augmented by the contextual information captured from the source document. How to extract the right contextual information from the source document is the main focus of this study. To this end, we evaluate 7 text component extraction schemes and 5 feature extraction schemes. The former determine from which text component (e.g., title, meta-data, or paragraphs containing the selected query) to extract contextual information; the latter determine which words or phrases to extract. In total, 35 combinations are evaluated, and our evaluation results show that noun phrases extracted from all paragraphs that contain the query word are the best option.
Rendering expressions to improve accuracy of relevance assessment for math search BIBAFull-Text 851-854
  Matthias S. Reichenbach; Anurag Agarwal; Richard Zanibbi
Finding ways to help users assess relevance when they search using math expressions is critical for making Mathematical Information Retrieval (MIR) systems easier to use. We designed a study where participants completed search tasks involving mathematical expressions using two different summary styles, and measured response time and relevance assessment accuracy. The control summary style used Google's regular hit formatting where expressions are presented as text (e.g. in LaTeX), while the second summary style renders the math expressions. Participants were undergraduate and graduate students. Participants in the rendered summary style (n=19) had on average a 17.18% higher assessment accuracy than those in the non-rendered summary style (n=19), with no significant difference in response times. Participants in the rendered condition reported having fewer problems reading hits than participants in the control condition. This suggests that users will benefit from search engines that properly render math expressions in their hit summaries.
Exploring recommendations in internet of things BIBAFull-Text 855-858
  Lina Yao; Quan Z. Sheng; Anne H.H. Ngu; Helen Ashman; Xue Li
With recent advances in radio-frequency identification (RFID), wireless sensor networks, and Web-based services, physical things are becoming an integral part of the emerging ubiquitous Web. In this paper, we focus on the things recommendation problem in the Internet of Things (IoT). In particular, we propose a unified probabilistic framework that fuses information across relationships between users (i.e., users' social network) and things (i.e., things correlations) to make more accurate recommendations. The proposed approach not only inherits the advantages of matrix factorization, but also exploits the merits of social relationships and thing-thing correlations. We validate our approach on an Internet of Things platform, and the experimental results demonstrate its feasibility and effectiveness.
Sig-SR: SimRank search over singular graphs BIBAFull-Text 859-862
  Weiren Yu; Julie A. McCann
SimRank is an attractive structural-context measure of similarity between two objects in a graph. It recursively follows the intuition that "two objects are similar if they are referenced by similar objects". The best known matrix-based method [1] for calculating SimRank, however, implies an assumption that the graph is non-singular, i.e., its adjacency matrix is invertible. In reality, non-singular graphs are very rare; such an assumption in [1] is too restrictive in practice. In this paper, we provide a treatment of [1], by supporting similarity assessment on non-invertible adjacency matrices. Assume that a singular graph G has n nodes, with r(2+Kr2)) time for K iterations. In contrast, the only known matrix-based algorithm that supports singular graphs [1] needs O(r^4 n^2) time. The experimental results on real and synthetic datasets demonstrate the superiority of InvSR on singular graphs against its baselines.
Old dogs are great at new tricks: column stores for ir prototyping BIBAFull-Text 863-866
  Hannes Mühleisen; Thaer Samar; Jimmy Lin; Arjen de Vries
We suggest that instead of implementing custom index structures and query evaluation algorithms, IR researchers should simply store document representations in a column-oriented relational database and implement ranking models using SQL. For rapid prototyping, this is particularly advantageous since researchers can explore new scoring functions and features by simply issuing SQL queries, without needing to write imperative code. We demonstrate the feasibility of this approach by an implementation of conjunctive BM25 using two modern column stores. Experiments on a web collection show that a retrieval engine built in this manner achieves effectiveness and efficiency on par with custom-built retrieval engines, but provides many additional advantages, including cleaner query semantics, a simpler architecture, built-in support for error analysis, and the ability to exploit advances in database technology "for free".
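As a rough illustration of ranking expressed in SQL, the following sketch runs a conjunctive BM25 query from Python. It makes several assumptions that do not come from the paper: SQLite (a row store, used here only because it ships with Python) stands in for a column store, the `docs`, `postings` and `query_terms` tables are a hypothetical schema, `nat_log` is a helper registered from Python, and the BM25 formula is one common variant with default `k1` and `b` values.

```python
import math
import sqlite3
from collections import Counter

BM25_SQL = """
WITH stats AS (SELECT COUNT(*) AS n, AVG(len) AS avgdl FROM docs),
     term_df AS (SELECT term, COUNT(*) AS doc_freq FROM postings
                 WHERE term IN (SELECT term FROM query_terms)
                 GROUP BY term)
SELECT p.doc_id,
       SUM(nat_log(1 + (s.n - t.doc_freq + 0.5) / (t.doc_freq + 0.5))
           * p.tf * (:k1 + 1)
           / (p.tf + :k1 * (1 - :b + :b * d.len / s.avgdl))) AS score
FROM postings p
JOIN term_df t ON t.term = p.term
JOIN docs d ON d.doc_id = p.doc_id
CROSS JOIN stats s
GROUP BY p.doc_id
-- Conjunctive semantics: keep only documents that contain every query term.
HAVING COUNT(DISTINCT p.term) = (SELECT COUNT(*) FROM query_terms)
ORDER BY score DESC, p.doc_id
"""


def build_index(docs):
    """Load whitespace-tokenized documents into the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    # Register a natural-log helper so the query does not depend on
    # SQLite's optional built-in math functions.
    conn.create_function("nat_log", 1, math.log)
    conn.executescript("""
        CREATE TABLE docs(doc_id TEXT PRIMARY KEY, len INTEGER);
        CREATE TABLE postings(term TEXT, doc_id TEXT, tf INTEGER);
        CREATE TABLE query_terms(term TEXT PRIMARY KEY);
    """)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        conn.execute("INSERT INTO docs VALUES (?, ?)", (doc_id, len(tokens)))
        conn.executemany(
            "INSERT INTO postings VALUES (?, ?, ?)",
            [(term, doc_id, tf) for term, tf in Counter(tokens).items()],
        )
    return conn


def search(conn, query, k1=1.2, b=0.75):
    """Return (doc_id, score) rows for documents containing all query terms."""
    terms = set(query.lower().split())
    conn.execute("DELETE FROM query_terms")
    conn.executemany("INSERT INTO query_terms VALUES (?)", [(t,) for t in terms])
    return conn.execute(BM25_SQL, {"k1": k1, "b": b}).fetchall()


conn = build_index({
    "d1": "column stores for rapid prototyping",
    "d2": "column stores store columns",
    "d3": "row stores for transactions",
})
results = search(conn, "column stores")
```

Changing the scoring function here means editing the SQL string only, which is the kind of prototyping loop the abstract describes.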
The role of network distance in LinkedIn people search BIBAFull-Text 867-870
  Shih-Wen Huang; Daniel Tunkelang; Karrie Karahalios
LinkedIn is the world's largest professional network, with over 300 million members. One of the primary activities on the site is people search, for which LinkedIn members are both the users and the corpus. This paper presents insights about people search behavior on LinkedIn, based on a log analysis and a user study. In particular, it examines the role that network distance plays in name searches and non-name searches. For name searches, users primarily click on only one of the results, and closer network distance leads to higher click-through rates. In contrast, for non-name searches, users are more likely to click on multiple results that are not in their existing connections, but with whom they have shared connections. The results show that, while network distance contributes significantly to LinkedIn search engagement in general, its role varies dramatically depending on the type of search query.
Latent community discovery through enterprise user search query modeling BIBAFull-Text 871-874
  Kevin M. Carter; Rajmonda S. Caceres; Ben Priest
Enterprise computer networks are filled with users performing a variety of tasks, ranging from business-critical tasks to personal interest browsing. Due to this multi-modal distribution of behaviors, it is non-trivial to automatically discern which behaviors are business-relevant and which are not. Additionally, it is difficult to infer communities of interest within the enterprise, even given an organizational mapping. In this work, we present a two-step framework for classifying user behavior within an enterprise in a data-driven way. As a first step, we use a latent topic model on active search queries to identify types of behaviors and topics of interest associated with a given user. We then leverage the information about a user's assigned role within the organization to extract relevant topics which are most reflective of self-organizing communities of interest. We demonstrate that our framework is able to identify rich communities of interest that are better representations of how users interact and assemble in an enterprise setting.
Examining collaborative query reformulation: a case of travel information searching BIBAFull-Text 875-878
  Abu Shamim Mohammad Arif; Jia Tina Du; Ivan Lee
Users often reformulate or modify their queries when they search for information, particularly when the search task is complex and exploratory. This paper investigates query reformulation behavior in collaborative tourism information searching on the Web. A user study was conducted with 17 pairs of participants, and each pair worked as a team collaboratively on an exploratory travel search task in two scenarios. We analyzed users' collaborative query (CQ) reformulation behavior in two dimensions: firstly, CQ reformulation strategies; and secondly, the effect of individual queries and chat logs on CQ reformulation. The findings show that individual queries and chat logs were two major sources of query terms in CQ reformulation. The statistical results demonstrate the significant effect of individual queries on CQ reformulation. We also found that five operations were performed to reformulate the CQs, namely: addition, modification, reordering, addition and modification, and addition and reordering. These findings have implications for the design of query suggestions that could be offered to users during searches using collaborative search tools.
Influential nodes selection: a data reconstruction perspective BIBAFull-Text 879-882
  Zhefeng Wang; Hao Wang; Qi Liu; Enhong Chen
Influence maximization is the problem of finding a set of seed nodes in a social network that maximizes the spread of influence. Traditionally, researchers view influence propagation as a stochastic process and formulate the influence maximization problem as a discrete optimization problem. Thus, most previous works focus on finding efficient and effective heuristic algorithms within the greedy framework. In this paper, we view the influence maximization problem from the perspective of data reconstruction and propose a novel framework named Data Reconstruction for Influence Maximization (DRIM). In our framework, we first construct an influence matrix, each row of which is the influence of a node on other nodes. Then, we select the k most informative rows to reconstruct the matrix, and the corresponding nodes are the seed nodes which could maximize the influence spread. Finally, we evaluate our framework on two real-world data sets, and the results show that DRIM is at least as effective as the traditional greedy algorithm.
A fusion approach to cluster labeling BIBAFull-Text 883-886
  Haggai Roitman; Shay Hummel; Michal Shmueli-Scheuer
We present a novel approach to the cluster labeling task using fusion methods. The core idea of our approach is to weigh labels, suggested by any labeler, according to the estimated labeler's decisiveness with respect to each of its suggested labels. We hypothesize that a cluster labeler's labeling choice for a given cluster should remain stable even in the presence of slightly incomplete cluster data. Using state-of-the-art cluster labeling and data fusion methods, evaluated over a large data collection of clusters, we demonstrate that, overall, the cluster labeling fusion methods that further consider the labeler's decisiveness provide the best labeling performance.
Evaluating the effort involved in relevance assessments for images BIBAFull-Text 887-890
  Martin Halvey; Robert Villa
How assessors and end users judge the relevance of images has been studied in information science and information retrieval for a considerable time. The criteria by which assessors judge relevance have been intensively studied, and there has been a large amount of work which has investigated how relevance judgments for test collections can be more cheaply generated, such as through crowd sourcing. Relatively little work has investigated the process individual assessors go through to judge the relevance of an image. In this paper, we focus on the process by which relevance is judged for images, and in particular, the degree of effort a user must expend to judge relevance for different topics. Results suggest that topic difficulty and how semantic/visual a topic is impact user performance and perceived effort.
Diversifying query suggestions based on query documents BIBAFull-Text 891-894
  Youngho Kim; W. Bruce Croft
Many domain-specific search tasks are initiated by document-length queries, e.g., patent invalidity search aims to find prior art related to a new (query) patent. We call this type of search Query Document Search. In this type of search, the initial query document is typically long and contains diverse aspects (or sub-topics). Users tend to issue many queries based on the initial document to retrieve relevant documents. To help users in this situation, we propose a method to suggest diverse queries that can cover multiple aspects of the query document. We first identify multiple query aspects and then provide diverse query suggestions that are effective for retrieving relevant documents as well as being related to more query aspects. In the experiments, we demonstrate that our approach is effective in comparison to previous query suggestion methods.
Comparing client and server dwell time estimates for click-level satisfaction prediction BIBAFull-Text 895-898
  Youngho Kim; Ahmed Hassan; Ryen W. White; Imed Zitouni
Click dwell time is the amount of time that a user spends on a clicked search result. Many previous studies have shown that click dwell time is strongly correlated with result-level satisfaction and document relevance. Accurate estimates of dwell time are therefore important for applications such as search satisfaction prediction and result ranking. However, dwell time can be estimated in different ways according to the information available about the search process. For example, a result reached for the query [Garfield] may involve 145s of "server-side" dwell time (observable to the search engine) and 40s of "client-side" dwell time (observable from the browser). Since search engines can only observe server-side actions (i.e., activity on the search engine result page), server-side dwell times are estimated by measuring the time between a search result click and the next search event (click or query). Conversely, more detailed information about page dwell times can be obtained via client-side methods such as Web browser toolbars. The client-side information enables the estimation of more accurate dwell times by measuring the amount of time that a user spends on pages of interest (either the landing page, or pages on the full navigation trail). In this paper, we define three different dwell times, i.e., server-side, client-side, and trail dwell time, and examine their effectiveness for predicting click satisfaction. For this, we collect toolbar and search engine logs from real users, and provide an analysis of dwell times for improving prediction performance. Moreover, we show further improvements in predicting click-level satisfaction by combining dwell times with other query features (e.g., query clarity).
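The three dwell-time notions can be made concrete with a short Python sketch. The `SearchEvent` and `PageView` records are a hypothetical log schema, not the paper's data format, and the functions follow only the informal descriptions in the abstract; the paper's exact definitions may differ. The example values are chosen to reproduce the 145s and 40s figures from the abstract, and the second page view is invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchEvent:
    """A search-engine log entry (hypothetical schema)."""
    time: float  # seconds since session start
    kind: str    # "query" or "click"


@dataclass
class PageView:
    """A toolbar-observed page view on the post-click trail (hypothetical schema)."""
    start: float
    end: float


def server_side_dwell(events: List[SearchEvent], click_index: int) -> Optional[float]:
    """Time from a result click to the next search event (click or query)."""
    if events[click_index].kind != "click":
        raise ValueError("event at click_index is not a click")
    if click_index + 1 >= len(events):
        return None  # last event in the session: no next event to measure against
    return events[click_index + 1].time - events[click_index].time


def client_side_dwell(trail: List[PageView]) -> float:
    """Time spent on the landing page, i.e. the first page of the trail."""
    return trail[0].end - trail[0].start


def trail_dwell(trail: List[PageView]) -> float:
    """Total time spent on all pages of the navigation trail."""
    return sum(view.end - view.start for view in trail)


events = [SearchEvent(0.0, "query"), SearchEvent(5.0, "click"), SearchEvent(150.0, "query")]
trail = [PageView(5.0, 45.0), PageView(45.0, 105.0)]

server = server_side_dwell(events, 1)   # 145.0
client = client_side_dwell(trail)       # 40.0
full_trail = trail_dwell(trail)         # 100.0
```

In this example the search engine sees 145 seconds between the click and the next query, while the browser shows that only 40 of those seconds were spent on the landing page.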
Score-safe term-dependency processing with hybrid indexes BIBAFull-Text 899-902
  Matthias Petri; Alistair Moffat; J. Shane Culpepper
Score-safe index processing has received a great deal of attention over the last two decades. By pre-calculating maximum term impacts during indexing, the number of scoring operations can be minimized, and the top-k documents for a query can be located efficiently. However, these methods often ignore the importance of the effectiveness gains possible when using sequential dependency models. We present a hybrid approach which leverages score-safe processing and suffix-based self-indexing structures in order to provide efficient and effective top-k document retrieval.
Co-training on authorship attribution with very few labeled examples: methods vs. views BIBAFull-Text 903-906
  Tieyun Qian; Bing Liu; Ming Zhong; Guoliang He
Authorship attribution (AA) aims to identify the authors of a set of documents. Traditional studies in this area often assume that there is a large set of labeled documents available for training. However, in real life, it is hard or expensive to collect a large set of labeled data. For example, in the online review domain, most reviewers (authors) only write a few reviews, which are not enough to serve as the training data for accurate classification. In this paper, we present a novel two-view co-training framework to iteratively identify the authors of a few unlabeled data to augment the training set. The key idea is to first represent each document as several distinct views, and then adopt a co-training technique to exploit the large amount of unlabeled documents. Starting from 10 training texts per author, we systematically evaluate the effectiveness of co-training for authorship attribution with limited labeled data. Two methods and three views are investigated: logistic regression (LR) and support vector machines (SVM) methods, and character, lexical, and syntactic views. The experimental results show that LR is particularly effective for improving co-training in AA, and the lexical view performs the best among the three views when combined with an LR classifier. Furthermore, the co-training framework does not make much difference between one classifier from two views and two classifiers from one view. Instead, it is the learning approach and the view that play a critical role.
Probabilistic text modeling with orthogonalized topics BIBAFull-Text 907-910
  Enpeng Yao; Guoqing Zheng; Ou Jin; Shenghua Bao; Kailong Chen; Zhong Su; Yong Yu
Topic models have been widely used for text analysis. Previous topic models have enjoyed great success in mining the latent topic structure of text documents. Although many efforts have been made to endow the resulting document-topic distributions with different motivations, none of these models has paid any attention to the resulting topic-word distributions. Since the topic-word distribution also plays an important role in the modeling performance, topic models which emphasize only the resulting document-topic representations but pay less attention to the topic-term distributions are limited. In this paper, we propose the Orthogonalized Topic Model (OTM), which imposes an orthogonality constraint on the topic-term distributions. We also propose a novel model fitting algorithm based on the generalized Expectation-Maximization algorithm and the Newton-Raphson method. Quantitative evaluation of text classification demonstrates that OTM outperforms other baseline models and indicates the important role played by topic orthogonalizing.
Evaluating non-deterministic retrieval systems BIBAFull-Text 911-914
  Gaya K. Jayasinghe; William Webber; Mark Sanderson; Lasitha S. Dharmasena; J. Shane Culpepper
The use of sampling, randomized algorithms, or training based on the unpredictable inputs of users in Information Retrieval often leads to non-deterministic outputs. Evaluating the effectiveness of systems incorporating these methods can be challenging since each run may produce different effectiveness scores. Current IR evaluation techniques do not address this problem. Using the context of distributed information retrieval as a case study for our investigation, we propose a solution based on multivariate linear modeling. We show that the approach provides a consistent and reliable method to compare the effectiveness of non-deterministic IR algorithms, and explain how statistics can safely be used to show that two IR algorithms have equivalent effectiveness.
Extending test collection pools without manual runs BIBAFull-Text 915-918
  Gaya K. Jayasinghe; William Webber; Mark Sanderson; J. Shane Culpepper
Information retrieval test collections traditionally use a combination of automatic and manual runs to create a pool of documents to be judged. The quality of the final judgments produced for a collection is a product of the variety across each of the runs submitted and the pool depth. In this work, we explore fully automated approaches to generating a pool. By combining a simple voting approach with machine learning from documents retrieved by automatic runs, we are able to identify a large portion of relevant documents that would normally only be found through manual runs. Our initial results are promising and can be extended in future studies to help test collection curators ensure proper judgment coverage is maintained across complete document collections.
The search duel: a response to a strong ranker BIBAFull-Text 919-922
  Peter Izsak; Fiana Raiber; Oren Kurland; Moshe Tennenholtz
How can a search engine with a relatively weak relevance ranking function compete with a search engine that has a much stronger ranking function? This dual challenge, which to the best of our knowledge has not been addressed in previous work, entails an interesting bi-modal utility function for the weak search engine. That is, the goal is to produce in response to a query a document result list whose effectiveness does not fall much behind that of the strong search engine; and, which is quite different than that of the strong engine. We present a per-query algorithmic approach that leverages fundamental retrieval principles such as pseudo-feedback-based relevance modeling. We demonstrate the merits of our approach using TREC data.
Modeling the evolution of product entities BIBAFull-Text 923-926
  Priya Radhakrishnan; Manish Gupta; Vasudeva Varma
A large number of web queries are related to product entities. Studying evolution of product entities can help analysts understand the change in particular attribute values for these products. However, studying the evolution of a product requires us to be able to link various versions of a product together in a temporal order. While it is easy to temporally link recent versions of products in a few domains manually, solving the problem in general is challenging. The ability to temporally order and link various versions of a single product can also improve product search engines. In this paper, we tackle the problem of finding the previous version (predecessor) of a product entity. Given a repository of product entities, we first parse the product names using a CRF model. After identifying entities corresponding to a single product, we solve the problem of finding the previous version of any given particular version of the product. For the second task, we leverage innovative features with a Naïve Bayes classifier. Our methods achieve a precision of 88% in identifying the product version from product entity names, and a precision of 53% in identifying the predecessor.
Predicting bursts and popularity of hashtags in real-time BIBAFull-Text 927-930
  Shoubin Kong; Qiaozhu Mei; Ling Feng; Fei Ye; Zhe Zhao
Hashtags have been widely used to annotate topics in tweets (short posts on Twitter.com). In this paper, we study the problem of real-time prediction of bursting hashtags. Will a hashtag burst in the near future? If it will, how early can we predict it, and how popular will it become? Based on empirical analysis of data collected from Twitter, we propose solutions to these challenging problems. The performance of different features and possible solutions is evaluated.
Probabilistic ensemble learning for Vietnamese word segmentation BIBAFull-Text 931-934
  Wuying Liu; Li Lin
Word segmentation is a challenging issue, and the corresponding algorithms can be used in many applications of natural language processing. This paper addresses the problem of Vietnamese word segmentation, proposes a probabilistic ensemble learning (PEL) framework, and designs a novel PEL-based word segmentation (PELWS) algorithm. Supported by the data structure of a syllable-syllable frequency index, the PELWS algorithm combines multiple weak segmenters to form a strong segmenter within the PEL framework. The experimental results show that the PELWS algorithm can achieve state-of-the-art performance in the Vietnamese word segmentation task.
Improving unsupervised query segmentation using parts-of-speech sequence information BIBAFull-Text 935-938
  Rishiraj Saha Roy; Yogarshi Vyas; Niloy Ganguly; Monojit Choudhury
We present a generic method for augmenting unsupervised query segmentation by incorporating Parts-of-Speech (POS) sequence information to detect meaningful but rare n-grams. Our initial experiments with an existing English POS tagger employing two different POS tagsets and an unsupervised POS induction technique specifically adapted for queries show that POS information can significantly improve query segmentation performance in all these cases.
Building a query log via crowdsourcing BIBAFull-Text 939-942
  Omar Alonso; Maria Stone
A query log is a key asset in a commercial search engine. Every day, millions of users rely on search engines to find information on the Web by entering a few keywords on a simple search interface. Those queries represent a subset of user behavioral data which is used to mine and discover search patterns for improving the overall end user experience. While queries are very useful, it is not always possible to capture precisely what the user was looking for when the intent is not that clear. We explore an alternative based on human computation to gather a bit more information from users and show the type of query log that would be possible to construct.
Web search without 'stupid' results BIBAFull-Text 943-946
  Aleksandra Lomakina; Nikita Povarov; Pavel Serdyukov
One of the main targets of any search engine is to make every user fully satisfied with her search results. For this reason, much effort is put into improving ranking models in order to show the best results to users. However, there is a class of documents on the Web that can spoil all these efforts when shown to users. When users receive results that are not only irrelevant but also completely outside their expectations, they can get really frustrated. So, we attempted to find a method to detect such documents and reduce their negative impact on users and, as a consequence, on search engines in general.
Discovering real-world use cases for a multimodal math search interface BIBAFull-Text 947-950
  Keita Del Valle Wangari; Richard Zanibbi; Anurag Agarwal
To use math expressions in search, current search engines require knowing expression names or using a structure editor or string encoding (e.g., LaTeX). For mathematical non-experts, this can lead to an "intention gap" between the query they wish to express and what the interface will allow them to express. min is a search interface that supports drawing expressions on a canvas using mouse/touch, keyboard and images. We present a user study examining whether min changes search behavior for mathematical non-experts and identifying real-world usage scenarios for multimodal math search interfaces. Participants found query-by-expression using hand-drawn input useful, and identified scenarios in which they would like to use systems like min, such as locating, editing and sharing complex expressions (e.g., with many Greek letters), and working on complex math problems.
Improving search personalisation with dynamic group formation BIBAFull-Text 951-954
  Thanh Tien Vu; Dawei Song; Alistair Willis; Son Ngoc Tran; Jingfei Li
Recent research has shown that the performance of search engines can be improved by enriching a user's personal profile with information about other users with shared interests. In the existing approaches, groups of similar users are often statically determined, e.g., based on the common documents that users clicked. However, these static grouping methods are query-independent and neglect the fact that users in a group may have different interests with respect to different topics. In this paper, we argue that common interest groups should be dynamically constructed in response to the user's input query. We propose a personalisation framework in which a user profile is enriched using information from other users dynamically grouped with respect to an input query. The experimental results on query logs from a major commercial web search engine demonstrate that our framework improves the performance of the web search engine and also achieves better performance than the static grouping method.
Detection of abnormal profiles on group attacks in recommender systems BIBAFull-Text 955-958
  Wei Zhou; Yun Sing Koh; Junhao Wen; Shafiq Alam; Gillian Dobbie
Recommender systems using Collaborative Filtering techniques are capable of making personalized predictions. However, these systems are highly vulnerable to profile injection attacks. Group attacks are attacks that target a group of items instead of one, and there are common attributes among these items. Such profiles will have a good probability of being similar to a large number of user profiles, making them hard to detect. We propose a novel technique for identifying group attack profiles which uses an improved metric based on Degree of Similarity with Top Neighbors (DegSim) and Rating Deviation from Mean Agreement (RDMA). We also extend our work with a detailed analysis of target item rating patterns. Experiments show that the combined methods can improve detection rates in user-based recommender systems.
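For orientation, the sketch below computes RDMA in its commonly cited baseline form: for each user, the average over rated items of the absolute deviation from the item's mean rating, divided by the number of ratings the item has received. This is an assumption about the underlying metric, not the improved metric proposed in the paper, which the abstract does not specify. The ratings data and user names are made up.

```python
from collections import defaultdict


def rdma(ratings):
    """Baseline RDMA per user; `ratings` maps user -> {item: rating}."""
    item_ratings = defaultdict(list)
    for user_ratings in ratings.values():
        for item, rating in user_ratings.items():
            item_ratings[item].append(rating)

    scores = {}
    for user, user_ratings in ratings.items():
        total = 0.0
        for item, rating in user_ratings.items():
            values = item_ratings[item]
            item_mean = sum(values) / len(values)
            # Deviations on rarely rated items weigh more (division by count).
            total += abs(rating - item_mean) / len(values)
        scores[user] = total / len(user_ratings) if user_ratings else 0.0
    return scores


# Hypothetical data: "attacker" pushes item "c" against the other raters.
ratings = {
    "u1": {"a": 4, "b": 3, "c": 2},
    "u2": {"a": 4, "b": 3, "c": 2},
    "u3": {"a": 5, "b": 3},
    "attacker": {"a": 4, "b": 3, "c": 5},
}
scores = rdma(ratings)
most_suspicious = max(scores, key=scores.get)
```

On this toy data the profile that disagrees most with the consensus on item "c" receives the highest score.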
On run diversity in Evaluation as a Service BIBAFull-Text 959-962
  Ellen M. Voorhees; Jimmy Lin; Miles Efron
"Evaluation as a service" (EaaS) is a new methodology that enables community-wide evaluations and the construction of test collections on documents that cannot be distributed. The basic idea is that evaluation organizers provide a service API through which the evaluation task can be completed. However, this concept violates some of the premises of traditional pool-based collection building and thus calls into question the quality of the resulting test collection. In particular, the service API might restrict the diversity of runs that contribute to the pool: this might hamper innovation by researchers and lead to incomplete judgment pools that affect the reusability of the collection. This paper shows that the distinctiveness of the retrieval runs used to construct the first test collection built using EaaS, the TREC 2013 Microblog collection, is not substantially different from that of the TREC-8 ad hoc collection, a high-quality collection built using traditional pooling. Further analysis using the 'leave out uniques' test suggests that pools from the Microblog 2013 collection are less complete than those from TREC-8, although both collections benefit from the presence of distinctive and effective manual runs. Although we cannot yet generalize to all EaaS implementations, our analyses reveal no obvious flaws in the test collection built using the methodology in the TREC 2013 Microblog track.
Evaluating answer passages using summarization measures BIBAFull-Text 963-966
  Mostafa Keikha; Jae Hyun Park; W. Bruce Croft
Passage-based retrieval models have been studied for some time and have been shown to have some benefits for document ranking. Finding passages that are not only topically relevant, but are also answers to the users' questions would have a significant impact in applications such as mobile search. To develop models for answer passage retrieval, we need to have appropriate test collections and evaluation measures. Making annotations at the passage level is, however, expensive and can have poor coverage. In this paper, we describe the advantages of document summarization measures for evaluating answer passage retrieval and show that these measures have high correlation with existing measures and human judgments.
Analyzing bias in CQA-based expert finding test sets BIBAFull-Text 967-970
  Reyyan Yeniterzi; Jamie Callan
Data retrieved from community question answering (CQA) sites, such as content and users' assessments of content, is commonly used for expertise estimation related tasks. One such task, in which the received votes are directly used as graded relevance assessment values, is ranking the replies to a question. Even though these available assessment values are very practical for evaluation purposes, they may not always reflect the correct assessment value of the content, due to possible temporal or presentation bias introduced by the CQA system during the voting process. This paper analyzes a very commonly used CQA data collection in terms of these introduced biases and their effects on the experimental evaluation of approaches. A more bias-free test set construction approach, whose results correlate with manual assessments, is also proposed in this paper.
Understanding negation and family history to improve clinical information retrieval BIBAFull-Text 971-974
  Bevan Koopman; Guido Zuccon
We present a study to understand the effect that negated terms (e.g., "no fever") and family history (e.g., "family history of diabetes") have on searching clinical records. Our analysis is aimed at devising the most effective means of handling negation and family history. In doing so, we explicitly represent a clinical record according to its different content types: negated, family history and normal content; the retrieval model weights each of these separately. Empirical evaluation shows that overall the presence of negation harms retrieval effectiveness while family history has little effect. We show negation is best handled by weighting negated content (rather than the common practice of removing or replacing it). However, we also show that many queries benefit from the inclusion of negated content and that negation is optimally handled on a per-query basis. Additional evaluation shows that adaptive handling of negated and family history content can have significant benefits.
Modeling dual role preferences for trust-aware recommendation BIBAFull-Text 975-978
  Weilong Yao; Jing He; Guangyan Huang; Yanchun Zhang
Unlike in general recommendation scenarios where a user has only a single role, users in a trust rating network, e.g. Epinions, are associated with two different roles simultaneously: as a truster and as a trustee. With different roles, users can show distinct preferences for rating items, which previous approaches do not take into account. Moreover, based on explicit single links between two users, existing methods cannot capture the implicit correlation between two users who are similar but not socially connected. In this paper, we propose to learn dual role preferences (truster/trustee-specific preferences) for trust-aware recommendation by modeling explicit interactions (e.g., rating and trust) and implicit interactions. In particular, the local link structure of the trust network is exploited as two regularization terms to capture the implicit user correlation, in terms of truster/trustee-specific preferences. Using a real-world and open dataset, we conduct a comprehensive experimental study to investigate the performance of the proposed model, RoRec. The results show that RoRec outperforms other trust-aware recommendation approaches, in terms of prediction accuracy.
Mouse movement during relevance judging: implications for determining user attention BIBAFull-Text 979-982
  Mark D. Smucker; Xiaoyu Sunny Guo; Andrew Toulis
Several researchers have found that a user's mouse position gives an indication of the user's gaze during web search and other tasks. As part of a user study that involved relevance judging of document summaries and full documents, we recorded users' mouse movements. We found that in a large number of cases, the users did nothing more with their mouse than move it to the buttons used for recording the relevance decision. In addition, we found that different search topics can result in large differences in the amount of mouse movement that is indicative of user attention. For simple reading tasks, such as short document summaries, mouse-tracking does not appear to be an effective means of discerning user attention. While more complex tasks may allow mouse movements to provide information regarding user attention, on average, indications of user attention existed in only 59% of the relevance judgments made for full documents.
PAAP: prefetch-aware admission policies for query results cache in web search engines BIBAFull-Text 983-986
  Hongyuan Ma; Wei Liu; Bingjie Wei; Liang Shi; Xiuguo Bao; Lihong Wang; Bin Wang
Caching query results is an efficient technique for Web search engines. An admission policy can prevent infrequent queries from taking cache space away from more frequent queries. In this paper we present two novel admission policies tailored for the query results cache. These policies are based on query results prefetching information. We also propose a demote operation for the query results cache to improve the cache hit ratio. We then use a trace of over 5 million queries to evaluate our admission policies, as well as traditional policies. Experimental results show that our prefetch-aware admission policies can achieve hit ratios better than state-of-the-art admission policies.
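To make the role of an admission policy concrete, the sketch below wraps an LRU results cache with a simple frequency-threshold admission rule. This is a generic illustration, not the prefetch-aware policies or the demote operation proposed in the paper; the class name `AdmissionLRUCache`, the `min_frequency` parameter, and the toy query trace are hypothetical.

```python
from collections import Counter, OrderedDict

class AdmissionLRUCache:
    """LRU query results cache that admits only sufficiently frequent queries."""

    def __init__(self, capacity, min_frequency):
        self.capacity = capacity
        self.min_frequency = min_frequency  # assumed admission threshold
        self.seen = Counter()               # how often each query was issued
        self.entries = OrderedDict()        # cached results, oldest first

    def lookup(self, query, compute_results):
        """Return (results, hit) and update the cache state."""
        self.seen[query] += 1
        if query in self.entries:
            self.entries.move_to_end(query)  # mark as most recently used
            return self.entries[query], True
        results = compute_results(query)
        # Admission policy: rarely seen queries never displace cached ones.
        if self.seen[query] >= self.min_frequency:
            self.entries[query] = results
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return results, False

cache = AdmissionLRUCache(capacity=2, min_frequency=2)
trace = ["a", "b", "a", "a", "c", "b", "b"]
# str.upper stands in for the (expensive) backend query processor.
hits = [cache.lookup(query, str.upper)[1] for query in trace]
hit_ratio = sum(hits) / len(trace)
```

In this toy trace the one-off query "c" is never admitted, so it cannot evict the cached entry for "a".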
User geospatial context for music recommendation in microblogs BIBAFull-Text 987-990
  Markus Schedl; Andreu Vall; Katayoun Farrahi
Music information retrieval and music recommendation are seeing a paradigm shift towards methods that incorporate user context aspects. However, structured experiments on a standardized music dataset to investigate the effects of doing so are scarce. In this paper, we compare performance of various combinations of collaborative filtering and geospatial as well as cultural user models for the task of music recommendation. To this end, we propose a geospatial model that uses GPS coordinates and a cultural model that uses semantic locations (continent, country, and state of the user). We conduct experiments on a novel standardized music collection, the "Million Musical Tweets Dataset" of listening events extracted from microblogs. Overall, we find that modeling listeners' location via Gaussian mixture models and computing similarities from these outperforms both cultural user models and collaborative filtering.
Compositional data analysis (CoDA) approaches to distance in information retrieval BIBAFull-Text 991-994
  Paul Thomas; David Lovell
Many techniques in information retrieval produce counts from a sample, and it is common to analyse these counts as proportions of the whole -- term frequencies are a familiar example. Proportions carry only relative information and are not free to vary independently of one another: for the proportion of one term to increase, one or more others must decrease. These constraints are hallmarks of compositional data. While there has long been discussion in other fields of how such data should be analysed, to our knowledge, Compositional Data Analysis (CoDA) has not been considered in IR.
   In this work we explore compositional data in IR through the lens of distance measures, and demonstrate that common measures, naive to compositions, have some undesirable properties which can be avoided with composition-aware measures. As a practical example, these measures are shown to improve clustering.
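A minimal sketch of one composition-aware distance follows: the Aitchison distance, computed as the Euclidean distance between centred log-ratio (clr) transforms. This is a standard measure from the CoDA literature; whether it is the measure the authors evaluate is an assumption, and the sketch assumes strictly positive parts (zero counts would need smoothing first).

```python
import math

def clr(composition):
    """Centred log-ratio transform of a strictly positive composition."""
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

# Term counts that differ only by document length carry the same
# relative information, so their distance is (numerically) zero.
same_shape = aitchison_distance([1, 2, 3], [10, 20, 30])
different_shape = aitchison_distance([0.1, 0.9], [0.2, 0.8])
```

Because the clr transform removes overall scale, the measure depends only on the ratios between parts. Plain Euclidean distance on raw counts does not have this property.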
Group latent factor model for recommendation with multiple user behaviors BIBAFull-Text 995-998
  Jian Cheng; Ting Yuan; Jinqiao Wang; Hanqing Lu
Recently, some recommendation methods have tried to relieve the data sparsity problem of Collaborative Filtering by exploiting data from users' multiple types of behaviors. However, most existing methods mainly model the correlation between different behaviors and ignore their heterogeneity, which may cause improper information to be transferred and harm the recommendation results. To address this problem, we propose a novel recommendation model, named Group Latent Factor Model (GLFM), which attempts to learn a factorization of the latent factor space into subspaces that are shared across multiple behaviors and subspaces that are specific to each type of behavior. Thus, the correlation and heterogeneity of multiple behaviors can be modeled by these shared and specific latent factors. Experiments on a real-world dataset demonstrate that our model can better integrate users' multiple types of behaviors into recommendation.
Hashing with List-Wise learning to rank BIBAFull-Text 999-1002
  Zhou Yu; Fei Wu; Yin Zhang; Siliang Tang; Jian Shao; Yueting Zhuang
Hashing techniques have been extensively investigated to boost similarity search for large-scale high-dimensional data. Most of the existing approaches formulate their objective as a pair-wise similarity-preserving problem. In this paper, we consider the hashing problem from the perspective of optimizing a list-wise learning to rank problem and propose an approach called List-Wise supervised Hashing (LWH). In LWH, the hash functions are optimized by employing structural SVM in order to explicitly minimize the ranking loss of the whole list-wise permutations instead of merely the point-wise or pair-wise supervision. We evaluate the performance of LWH on two real-world data sets. Experimental results demonstrate that our method obtains a significant improvement over the state-of-the-art hashing approaches due to both the structural large margin and the list-wise ranking pursued in a supervised manner.
A burstiness-aware approach for document dating BIBAFull-Text 1003-1006
  Dimitrios Kotsakos; Theodoros Lappas; Dimitrios Kotzias; Dimitrios Gunopulos; Nattiya Kanhabua; Kjetil Nørvåg
A large number of mainstream applications, like temporal search, event detection, and trend identification, assume knowledge of the timestamp of every document in a given textual collection. In many cases, however, the required timestamps are either unavailable or ambiguous. A characteristic instance of this problem emerges in the context of large repositories of old digitized documents. For such documents, the timestamp may be corrupted during the digitization process, or may simply be unavailable. In this paper, we study the task of approximating the timestamp of a document, so-called document dating. We propose a content-based method and use recent advances in the domain of term burstiness, which allow it to overcome the drawbacks of previous document dating methods, e.g. the fixed time partition strategy. We use an extensive experimental evaluation on different datasets to validate the efficacy and advantages of our methodology, showing that our method outperforms the state-of-the-art methods on document dating.
An analysis of query difficulty for information retrieval in the medical domain BIBAFull-Text 1007-1010
  Lorraine Goeuriot; Liadh Kelly; Johannes Leveling
We present a post-hoc analysis of a benchmarking activity for information retrieval (IR) in the medical domain to determine if performance for queries with different levels of complexity can be associated with different IR methods or techniques. Our analysis is based on data and runs for Task 3 of the CLEF 2013 eHealth lab, which provided patient queries and a large medical document collection for patient centred medical information retrieval technique development. We categorise the queries based on their complexity, which is defined as the number of medical concepts they contain. We then show how query complexity affects performance of runs submitted to the lab, and provide suggestions for improving retrieval quality for this complex retrieval task and similar IR evaluation tasks.
Mobile query reformulations BIBAFull-Text 1011-1014
  Milad Shokouhi; Rosie Jones; Umut Ozertem; Karthik Raghunathan; Fernando Diaz
Users frequently interact with web search systems on their mobile devices via multiple modalities, including touch and speech. These interaction modes are substantially different from the user experience on desktop search. As a result, system designers have new challenges and questions around understanding the intent on these platforms. In this paper, we study the query reformulation patterns in mobile logs. We group query reformulations based on their input method into four categories: text-text, text-voice, voice-text and voice-voice. We discuss the unique characteristics of each of these groups by comparing them against each other and against desktop logs. We also compare the distribution of reformulation types (e.g. adding/dropping words) against desktop logs and show that there are new classes of reformulations that are caused by errors in speech recognition. Our results suggest that users do not tend to switch between different input types (e.g. voice and text). Voice to text switches are largely caused by speech recognition errors, and text to voice switches are unlikely to be about the same intent.
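The four input-method categories described above can be tallied from a session log with a few lines of code. The sketch below is only an illustration: the session format (lists of `(query, input_mode)` pairs) and the example queries are hypothetical, and it treats every pair of consecutive queries in a session as a reformulation, which is a simplifying assumption.

```python
from collections import Counter

def reformulation_categories(sessions):
    """Count consecutive query pairs by their input-method transition."""
    counts = Counter()
    for queries in sessions:
        # Pair each query with the one that follows it in the session.
        for (_, prev_mode), (_, next_mode) in zip(queries, queries[1:]):
            counts[f"{prev_mode}-{next_mode}"] += 1
    return counts

# Hypothetical sessions; each entry is (query text, input mode).
sessions = [
    [("weather", "text"), ("weather seattle", "text"), ("whether seattle", "voice")],
    [("pizza near me", "voice"), ("pizza near me", "text")],
]
counts = reformulation_categories(sessions)
```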
On peculiarities of positional effects in sponsored search BIBAFull-Text 1015-1018
  Vyacheslav Alipov; Valery Topinsky; Ilya Trofimov
Click logs provide a unique and highly valuable source of human judgments on ads' relevance. However, clicks are heavily biased by many factors. The two factors widely acknowledged to be the most influential are neighboring ads and presentation order. The latter is referred to as the positional effect. A popular practice for recovering ad quality free of positional bias is to adopt click models based on the examination or cascade hypothesis, originally developed for organic search. In this paper we show strong evidence that this practice is far from perfect when considering the top ads block on a search engine result page (SERP). We show that the cascade hypothesis is the most questionable one because of important differences between organic and sponsored search results that may encourage users to analyze the whole ads block before clicking. Additionally, we design a testing setup for an unbiased evaluation of click model prediction accuracy.
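For readers unfamiliar with the cascade hypothesis questioned above, here is a minimal sketch of its basic form as known from the click-model literature: the user scans results top to bottom and clicks the first attractive one. Under this assumption the click probability at a rank is the attractiveness of that result times the probability that nothing above it was clicked. The attractiveness values are made up for illustration.

```python
def cascade_click_probabilities(attractiveness):
    """Click probability per rank under the basic cascade hypothesis."""
    probabilities = []
    reach = 1.0  # probability that the user examines the current rank
    for a in attractiveness:
        probabilities.append(reach * a)
        reach *= 1.0 - a  # the user continues only if there was no click
    return probabilities

# Three equally attractive ads (hypothetical values).
click_probs = cascade_click_probabilities([0.5, 0.5, 0.5])
```

Even with identical attractiveness, the predicted click probability halves at each rank here. If users actually read the whole ads block before clicking, as the abstract suggests they may, this top-down discount would be misplaced.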
A collective topic model for milestone paper discovery BIBAFull-Text 1019-1022
  Ziyu Lu; Nikos Mamoulis; David W. Cheung
Prior art lies at the foundation of future work in academic research. However, the increasingly large number of publications makes it difficult for researchers to effectively discover the most important previous works on the topic of their research. In this paper, we study the automatic discovery of the core papers for a research area. We propose a collective topic model on three types of objects: papers, authors and published venues. We model each of these objects as a bag of citations. Based on probabilistic latent semantic analysis (PLSA), authorship, published venues and citation relations are used for quantifying paper importance. Our method handles milestone paper discovery for different cases of input objects. Experiments on the ACL Anthology Network (AAN) indicate that our model is superior in milestone paper discovery when compared to a previous model which considers only papers.
Document summarization based on word associations BIBAFull-Text 1023-1026
  Oskar Gross; Antoine Doucet; Hannu Toivonen
In the age of big data, automatic methods for creating summaries of documents become increasingly important. In this paper we propose a novel, unsupervised method for (multi-)document summarization. In an unsupervised and language-independent fashion, this approach relies on the strength of word associations in the set of documents to be summarized. The summaries are generated by picking sentences which cover the most specific word associations of the document(s). We measure the performance on the DUC 2007 dataset. Our experiments indicate that the proposed method is the best-performing state-of-the-art unsupervised summarization method that makes no use of human-curated knowledge bases.
Do users rate or review?: boost phrase-level sentiment labeling with review-level sentiment classification BIBAFull-Text 1027-1030
  Yongfeng Zhang; Haochen Zhang; Min Zhang; Yiqun Liu; Shaoping Ma
Current approaches for contextual sentiment lexicon construction in phrase-level sentiment analysis assume that the numerical star rating of a review represents the overall sentiment orientation of the review text. Although this assumption is widely adopted, we find through user rating analysis that it is not necessarily true. In this paper, we attempt to bridge the gap between phrase-level and review/document-level sentiment analysis by leveraging the results given by review-level sentiment classification to boost phrase-level sentiment polarity labeling in contextual sentiment lexicon construction tasks, using a novel constrained convex optimization framework. Experimental results on both English and Chinese reviews show that our framework improves the precision of sentiment polarity labeling by up to 5.6%, which is a significant improvement over current approaches.
Random subspace for binary codes learning in large scale image retrieval BIBAFull-Text 1031-1034
  Cong Leng; Jian Cheng; Hanqing Lu
Due to their fast query speed and low storage cost, hashing-based approximate nearest neighbor search methods have attracted much attention recently. Many state-of-the-art methods are based on eigenvalue decomposition. In these approaches, the information captured in different dimensions is unbalanced and generally most of the information is contained in the top eigenvectors. We demonstrate that this leads to an unexpected phenomenon: a longer hashing code does not necessarily yield better performance. In this work, we introduce a random subspace strategy to address this limitation. First, a small fraction of the whole feature space is randomly sampled to train the hashing algorithm each time, and only the top eigenvectors are kept to generate one piece of short code. This process is repeated several times, and the resulting pieces of short code are then concatenated into one piece of long code. Theoretical analysis and experiments on two benchmarks confirm the effectiveness of the proposed strategy for hashing.
Incorporating query-specific feedback into learning-to-rank models BIBAFull-Text 1035-1038
  Ethem F. Can; W. Bruce Croft; R. Manmatha
Relevance feedback has been shown to improve retrieval for a broad range of retrieval models. It is the most common way of adapting a retrieval model for a specific query. In this work, we extend this practice by focusing on an approach that enables us to do query-specific modification of a retrieval model for learning-to-rank problems. Our approach is based on using feedback documents in two ways: 1) to improve the retrieval model directly and 2) to identify a subset of training queries that are more predictive than others. Experiments with the Gov2 collection show that this approach can obtain statistically significant improvements over two baselines: learning-to-rank (SVM-rank) with no feedback and learning-to-rank with standard relevance feedback.
Large-scale author verification: temporal and topical influences BIBAFull-Text 1039-1042
  Michiel van Dam; Claudia Hauff
The task of author verification is concerned with the question of whether or not someone is the author of a given piece of text. Algorithms that extract writing style features from texts are used to determine how close in style different documents are. Currently, evaluations of author verification algorithms are restricted to small-scale corpora with usually fewer than one hundred test cases. In this work, we present a methodology to derive a large-scale author verification corpus based on Wikipedia Talkpages. We create a corpus based on English Wikipedia which is significantly larger than existing corpora. We investigate two dimensions on this corpus which so far have not received sufficient attention: the influence of topic and the influence of time on author verification accuracy.
Evaluating mobile web search performance by taking good abandonment into account BIBAFull-Text 1043-1046
  Olga Arkhipova; Lidia Grauer
Usage of mobile devices for Web search has grown rapidly in recent years. The common tendency of users to want information immediately has led to the incorporation of rich snippets and vertical results into search engine result pages (SERPs) and to an increase in good abandonment. This article provides an offline metric for quality evaluation of mobile Web search, which takes the good abandonment rate into consideration. The metric is a DBN click model that allows for the probability of the user being satisfied directly on the SERP. The model parameters are estimated from the mobile search logs of a controlled experiment. The new metric outperforms the traditional ERR metric on a validation dataset built using a SERP degradation technique.
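The ERR baseline mentioned above can be sketched from its standard definition in the literature (expected reciprocal rank with graded relevance). This shows only the traditional metric, not the proposed DBN-based one; the grade scale and the example rankings are assumptions chosen for illustration.

```python
def expected_reciprocal_rank(grades, max_grade):
    """Standard ERR for a ranked list of integer relevance grades."""
    score = 0.0
    not_satisfied = 1.0  # probability the user is still looking
    for rank, grade in enumerate(grades, start=1):
        # Probability that a result with this grade satisfies the user.
        satisfaction = (2 ** grade - 1) / 2 ** max_grade
        score += not_satisfied * satisfaction / rank
        not_satisfied *= 1.0 - satisfaction
    return score

# Binary grades (0 or 1), hypothetical rankings.
err_top_only = expected_reciprocal_rank([1, 0], max_grade=1)
err_both = expected_reciprocal_rank([1, 1], max_grade=1)
```

In this formulation satisfaction comes only from the ranked results themselves. The abstract's point is that on mobile a user may also be satisfied directly on the SERP, for example by a rich snippet, without clicking anything.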
Assessing the reliability and reusability of an E-discovery privilege test collection BIBAFull-Text 1047-1050
  Jyothi K. Vinjumur; Douglas W. Oard; Jiaul H. Paik
In some jurisdictions, parties to a lawsuit can request documents from each other, but documents subject to a claim of privilege may be withheld. The TREC 2010 Legal Track developed what is presently the only public test collection for evaluating privilege classification. This paper examines the reliability and reusability of that collection. For reliability, the key question is the extent to which privilege judgments correctly reflect the opinion of the senior litigator whose judgment is authoritative. For reusability, the key question is the degree to which systems whose results contributed to creation of the test collection can be fairly compared with other systems that use those privilege judgments in the future. These correspond to measurement error and sampling error, respectively. The results indicate that measurement error is the larger problem.
Modeling evolution of a social network using temporal graph kernels BIBAFull-Text 1051-1054
  Akash Anil; Niladri Sett; Sanasam Ranbir Singh
The majority of studies on modeling the evolution of a social network using spectral graph kernels do not consider temporal effects while estimating the kernel parameters. As a result, such kernels fail to capture structural properties of the evolution over time. In this paper, we propose temporal spectral variants of four popular graph kernels, namely path counting, triangle closing, exponential and Neumann. Their responses in predicting future growth of the network have been investigated in detail, using two large datasets, namely Facebook and DBLP. It is evident from various experimental setups that the proposed temporal spectral graph kernels outperform all of their non-temporal counterparts in predicting future growth of the networks.
On user interactions with query auto-completion BIBAFull-Text 1055-1058
  Bhaskar Mitra; Milad Shokouhi; Filip Radlinski; Katja Hofmann
Query Auto-Completion (QAC) is a popular feature of web search engines that aims to assist users to formulate queries faster and avoid spelling mistakes by presenting them with possible completions as soon as they start typing. However, despite the wide adoption of auto-completion in search systems, there is little published on how users interact with such services.
   In this paper, we present the first large-scale study of user interactions with auto-completion based on query logs of Bing, a commercial search engine. Our results confirm that lower-ranked auto-completion suggestions receive substantially lower engagement than those ranked higher. We also observe that users are most likely to engage with auto-completion after typing about half of the query, and in particular at word boundaries. Interestingly, we also noticed that the likelihood of using auto-completion varies with the distance of query characters on the keyboard.
   Overall, we believe that the results reported in our study provide valuable insights for understanding user engagement with auto-completion, and are likely to inform the design of more effective QAC systems.
Re-ranking approach to classification in large-scale power-law distributed category systems BIBAFull-Text 1059-1062
  Rohit Babbar; Ioannis Partalas; Eric Gaussier; Massih-reza Amini
For large-scale category systems, such as Directory Mozilla, which consist of tens of thousands of categories, it has been empirically verified in earlier studies that the distribution of documents among categories can be modeled as a power-law distribution. This implies that a significant fraction of categories, referred to as rare categories, have very few documents assigned to them. This characteristic of the data makes it harder for learning algorithms to learn effective decision boundaries which can correctly detect such categories in the test set. In this work, we exploit the distribution of documents among categories to (i) derive an upper bound on the accuracy of any classifier, and (ii) propose a ranking-based algorithm which aims to maximize this upper bound. The empirical evaluation on publicly available large-scale datasets demonstrates that the proposed method not only achieves higher accuracy but also much higher coverage of rare categories as compared to state-of-the-art methods.
Enhancing personalization via search activity attribution BIBAFull-Text 1063-1066
  Adish Singla; Ryen W. White; Ahmed Hassan; Eric Horvitz
Online services rely on machine identifiers to tailor services such as personalized search and advertising to individual users. The assumption made is that each identifier comprises the behavior of a single person. However, shared machine usage is common, and in these cases, the activities of multiple users may be generated under a single identifier, creating a potentially noisy signal for applications such as search personalization. We propose enhancing Web search personalization with methods that can disambiguate among different users of a machine, thus connecting the current query with the appropriate search history. Using logs containing both person and machine identifiers, and logs from a popular commercial search engine, we learn models that accurately assign observed search behaviors to each of the different users. This information is then used to augment existing personalization methods that are currently based only on machine identifiers. We show that this new capability to infer users can be used to improve the performance of existing personalization methods. The early findings of our research are promising and have implications for search personalization.
A syntax-aware re-ranker for microblog retrieval BIBAFull-Text 1067-1070
  Aliaksei Severyn; Alessandro Moschitti; Manos Tsagkias; Richard Berendsen; Maarten de Rijke
We tackle the problem of improving microblog retrieval algorithms by proposing a robust structural representation of (query, tweet) pairs. We employ these structures in a principled kernel learning framework that automatically extracts and learns highly discriminative features. We test the generalization power of our approach on the TREC Microblog 2011 and 2012 tasks. We find that relational syntactic features generated by structural kernels are effective for learning to rank (L2R) and can easily be combined with those of other existing systems to boost their accuracy. In particular, the results show that our L2R approach improves on almost all the participating systems at TREC, only using their raw scores as a single feature. Our method yields an average increase of 5% in retrieval effectiveness and 7 positions in system ranks.
Weighted aspect-based collaborative filtering BIBAFull-Text 1071-1074
  YanPing Nie; Yang Liu; Xiaohui Yu
Existing work on collaborative filtering (CF) is often based on the overall ratings the items have received. However, in many cases, understanding how a user rates each aspect of an item may reveal more detailed information about her preferences and thus may lead to more effective CF. Prior work has studied extracting/quantizing sentiments on different aspects from the reviews, based on which the unknown overall ratings are inferred. However, in that work, all the aspects are treated equally, while in reality different users tend to place emphasis on different aspects when reaching the overall rating. For example, users may give a high rating to a movie just for its plot despite its mediocre performances. This emphasis on aspects varies for different users and different items. In this paper, we propose a method that uses tensor factorization to automatically infer the weights of different aspects in forming the overall rating. The main idea is to learn, through constrained optimization, a compact representation of a weight tensor indexed by three dimensions for user, item, and aspect, respectively. Overall ratings can then be predicted using the obtained weights. Experiments on a movie dataset show that our method compares favorably with three baseline methods.
Evaluating intuitiveness of vertical-aware click models BIBAFull-Text 1075-1078
  Aleksandr Chuklin; Ke Zhou; Anne Schuth; Floor Sietsma; Maarten de Rijke
Modeling user behavior on a search engine result page is important for understanding the users and supporting simulation experiments. As result pages become more complex, click models evolve as well in order to capture additional aspects of user behavior in response to new forms of result presentation.
   We propose a method for evaluating the intuitiveness of vertical-aware click models, namely the ability of a click model to capture key aspects of aggregated result pages, such as vertical selection, item selection, result presentation and vertical diversity. This method allows us to isolate model components and therefore gives a multi-faceted view on a model's performance. We argue that our method can be used in conjunction with traditional click model evaluation metrics such as log-likelihood or perplexity. In order to demonstrate the power of our method in situations where result pages can contain more than one type of vertical (e.g., Image and News) we extend the previously studied Federated Click Model such that it models user clicks on such pages. Our evaluation method yields non-trivial yet interpretable conclusions about the intuitiveness of click models, highlighting their strengths and weaknesses.
Recipient recommendation in enterprises using communication graphs and email content BIBAFull-Text 1079-1082
  David Graus; David van Dijk; Manos Tsagkias; Wouter Weerkamp; Maarten de Rijke
We address the task of recipient recommendation for emailing in enterprises. We propose an intuitive and elegant way of modeling the task of recipient recommendation, which uses both the communication graph (i.e., who are most closely connected to the sender) and the content of the email. Additionally, the model can incorporate evidence as prior probabilities. Experiments on two enterprise email collections show that our model achieves very high scores, and that it outperforms two variants that use either the communication graph or the content in isolation.
Analyzing the content emphasis of web search engines BIBAFull-Text 1083-1086
  Mohammed A. Alam; Doug Downey
Millions of people search the Web each day. As a consequence, the ranking algorithms employed by Web search engines have a profound influence on which pages users visit. Characterizing this influence, and informing users when different engines favor certain sites or points of view, enables more transparent access to the Web's information. We present PAWS, a platform for analyzing differences among Web search engines. PAWS measures content emphasis: the degree to which differences across search engines' rankings correlate with features of the ranked content, including point of view (e.g., positive or negative orientation toward their company's products) and advertisements. We propose an approach for identifying the orientations in search results at scale, through a novel technique that minimizes the expected number of human judgments required. We apply PAWS to news search on Google and Bing, and find no evidence that the engines emphasize results that express positive orientation toward the engine company's products. We do find that the engines emphasize particular news sites, and that they also favor pages containing their company's advertisements, as opposed to competitor advertisements.
Effects of task and domain on searcher attention BIBAFull-Text 1087-1090
  Dmitry Lagun; Eugene Agichtein
Previous studies of online user attention during information seeking tasks have mainly focused on analyzing searcher behavior in web search settings. While these studies enabled better understanding of search result examination, their findings might not generalize to the tasks and search interfaces in other domains such as Shopping or Social Media. In this paper we present, to the best of our knowledge, the first cross-domain comparison of search examination behavior and patterns of aggregated attention across Web Search, News, Shopping and Social Network domains. We investigate how the domain of the search and the scope of the information need affect search examination, and find significant differences beyond those arising from natural disparities between individuals. For example, we find that the mean fixation duration, a common indicator of cognitive load, varies significantly across domains (e.g., mean fixation duration in the Social Network domain exceeds that of general Web Search by over 30%). We also find large differences in the aggregate patterns of user attention on the screen, especially in the Shopping and Social Network domains compared to the Web Search domain, emphasizing the need for domain specific user models and evaluation metrics.
Learning sufficient queries for entity filtering BIBAFull-Text 1091-1094
  Miles Efron; Craig Willis; Garrick Sherman
Entity-centric document filtering is the task of analyzing a time-ordered stream of documents and emitting those that are relevant to a specified set of entities (e.g., people, places, organizations). This task is exemplified by the TREC Knowledge Base Acceleration (KBA) track and has broad applicability in other modern IR settings. In this paper, we present a simple yet effective approach based on learning high-quality Boolean queries that can be applied deterministically during filtering. We call these Boolean statements sufficient queries. We argue that using deterministic queries for entity-centric filtering can reduce confounding factors seen in more familiar "score-then-threshold" filtering methods. Experiments on two standard datasets show significant improvements over state-of-the-art baseline models.
PatentLine: analyzing technology evolution on multi-view patent graphs BIBAFull-Text 1095-1098
  Longhui Zhang; Lei Li; Tao Li; Qi Zhang
The fast growth of technologies has driven the advancement of our society. It is often necessary to quickly grasp the evolution of technologies in order to better understand technology trends. The availability of huge volumes of granted patent documents provides a reasonable basis for analyzing technology evolution. In this paper, we propose a unified framework, named PatentLine, to generate a technology evolution tree for a given topic or a classification code related to granted patents. The framework integrates different types of patent information, including patent content, citations of patents, temporal relations, etc., and provides a concise yet comprehensive evolution summary. The generated summary enables a variety of patent-related analyses such as identifying relevant prior art and detecting technology gaps. A case study on a collection of US patents demonstrates the efficacy of our proposed framework.
Query performance prediction for entity retrieval BIBAFull-Text 1099-1102
  Hadas Raviv; Oren Kurland; David Carmel
We address the query-performance-prediction task for entity retrieval; that is, retrieval effectiveness is estimated with no relevance judgements. First we show how to adapt state-of-the-art query-performance predictors proposed for document retrieval to the entity retrieval domain. We then present a novel predictor that is based on the cluster hypothesis. Evaluation performed with the INEX entity ranking track collections shows that our predictor can often outperform the most effective predictors we experimented with.
Second order probabilistic models for within-document novelty detection in academic articles BIBAFull-Text 1103-1106
  Laurence A.F. Park; Simeon Simoff
It is becoming increasingly difficult to stay aware of the state-of-the-art in any research field due to the exponential increase in the number of academic publications. This problem affects authors and reviewers of submissions to academic journals and conferences, who must be able to identify which portions of an article are novel and which are not. Therefore, having a process to automatically judge the flow of novelty through a document would assist academics in their quest for truth. In this article, we propose the concept of Within Document Novelty Location, a method of identifying locations of novelty and non-novelty within a given document. In this preliminary investigation, we examine whether a second order statistical model has any benefit, in terms of accuracy and confidence, over a simpler first order model. Experiments on 928 text sequences taken from three academic articles showed that the second order model provided a significant increase in novelty location accuracy for two of the three documents. There was no significant difference in accuracy for the remaining document, which is likely to be due to the absence of context analysis.
Modeling the dynamics of personal expertise BIBAFull-Text 1107-1110
  Yi Fang; Archana Godavarthy
Personal expertise or interests often evolve over time. Despite much work on expertise retrieval in recent years, very little work has studied the dynamics of personal expertise. In this paper, we propose a probabilistic model to characterize how people change or stick with their expertise. Specifically, three factors are taken into consideration in whether an expert will choose a new expertise area: 1) the personality of the expert in exploring new areas; 2) the similarity between the new area and the expert's current areas; 3) the popularity of the new area. These three factors are integrated into a unified generative process. A predictive language model is derived to estimate the distribution of the expert's words in her future publications. In addition, KL divergence is defined on the predictive language model to quantify and forecast the change of expertise. We conduct experiments on a testbed of academic publications, and the initial results demonstrate the effectiveness of the proposed approach.
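As an illustration of the KL-divergence step mentioned in the abstract above, the following minimal sketch compares two unigram word distributions. It is not the authors' model: the function name, the example distributions, the direction of the divergence (current profile against predicted profile) and the `eps` floor for unseen words are all assumptions made for this example.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) in nats for two word distributions given as dicts.

    Words missing from q are floored at eps (an assumption of this sketch)
    so that the logarithm stays finite.
    """
    total = 0.0
    for word in set(p) | set(q):
        p_w = p.get(word, 0.0)
        if p_w == 0.0:
            continue  # terms with p(w) = 0 contribute nothing
        q_w = max(q.get(word, 0.0), eps)
        total += p_w * math.log(p_w / q_w)
    return total


# Hypothetical word distributions for an expert's current and predicted profile.
current = {"retrieval": 0.5, "ranking": 0.3, "indexing": 0.2}
predicted = {"retrieval": 0.4, "ranking": 0.2, "indexing": 0.1, "recommendation": 0.3}

change = kl_divergence(current, predicted)
```

A larger value of `change` would indicate a larger shift between the two profiles; identical distributions give zero.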
An annotation similarity model in passage ranking for historical fact validation BIBAFull-Text 1111-1114
  Jun Araki; Jamie Callan
State-of-the-art question answering (QA) systems employ passage retrieval based on bag-of-words similarity models with respect to a query and a passage. We propose a combination of a traditional bag-of-words similarity model and an annotation similarity model to improve passage ranking. The proposed annotation similarity model is generic enough to process annotations of arbitrary types. Historical fact validation is a subtask to determine whether a given sentence tells us historically correct information, which is important for a QA task on world history. Experimental results show that the combined model gains up to 7.7% and 4.2% improvements in historical fact validation in terms of precision at rank 1 and mean reciprocal rank, respectively.
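The two measures reported in the abstract above, precision at rank 1 and mean reciprocal rank, can be sketched as follows. This is a generic illustration of the standard definitions, not the authors' evaluation code; the function names and the toy judgments are assumptions.

```python
def precision_at_1(rankings):
    """Fraction of questions whose top-ranked passage is relevant.

    `rankings` is a list of ranked lists of booleans (True = relevant).
    """
    hits = sum(1 for ranked in rankings if ranked and ranked[0])
    return hits / len(rankings)


def mean_reciprocal_rank(rankings):
    """Average of 1/rank of the first relevant passage (0 if none is found)."""
    total = 0.0
    for ranked in rankings:
        for rank, relevant in enumerate(ranked, start=1):
            if relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Hypothetical relevance judgments for three questions.
judged = [
    [True, False],          # first relevant passage at rank 1
    [False, True, False],   # first relevant passage at rank 2
    [False, False, False],  # no relevant passage retrieved
]
p1 = precision_at_1(judged)
mrr = mean_reciprocal_rank(judged)
```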
To hint or not: exploring the effectiveness of search hints for complex informational tasks BIBAFull-Text 1115-1118
  Denis Savenkov; Eugene Agichtein
Extensive previous research has shown that searchers often require assistance with query formulation and refinement. Yet, it is not clear what kind of assistance is most useful, and how effective it is both objectively (e.g., in terms of task success) and subjectively (e.g., in terms of searcher perception of the search difficulty). This work describes the results of a controlled user study comparing the effects of providing specific vs. generic search hints on search success and satisfaction. Our results indicate that specific search hints tend to effectively improve searcher success rates and reduce perceived effort, while generic ones can be detrimental in both search effectiveness and user satisfaction. The results of this study are an important step towards the design of future search systems that could effectively assist and guide the user in accomplishing complex search tasks.
The effect of sampling strategy on inferred measures BIBAFull-Text 1119-1122
  Ellen M. Voorhees
Using the inferred measures framework is a popular choice for constructing test collections when the target document set is too large for pooling to be a viable option. Within the framework, different amounts of assessing effort are placed on different regions of the ranked lists as defined by a sampling strategy. The sampling strategy is critically important to the quality of the resultant collection, but there is little published guidance as to the important factors. This paper addresses this gap by examining the effect on collection quality of different sampling strategies within the inferred measures framework. The quality of a collection is measured by how accurately it distinguishes the set of significantly different system pairs. Top-K pooling is competitive, though not the best strategy because it cannot distinguish topics with large relevant set sizes. Incorporating a deep, very sparsely sampled stratum is a poor choice. Strategies that include a top-10 pool create better collections than those that do not, as well as allow Precision(10) scores to be directly computed.
Cache-conscious runtime optimization for ranking ensembles BIBAFull-Text 1123-1126
  Xun Tang; Xin Jin; Tao Yang
Multi-tree ensemble models have been proven to be effective for document ranking. Using a large number of trees can improve accuracy, but it takes time to calculate ranking scores of matched documents. This paper investigates data traversal methods for fast score calculation with a large ensemble. We propose a 2D blocking scheme for better cache utilization with simpler code structure compared to previous work. The experiments with several benchmarks show significant acceleration in score calculation without loss of ranking accuracy.
Bridging temporal context gaps using time-aware re-contextualization BIBAFull-Text 1127-1130
  Andrea Ceroni; Nam Khanh Tran; Nattiya Kanhabua; Claudia Niederée
Understanding a text that was written some time ago can be compared to translating a text from another language. Complete interpretation requires a mapping, in this case, a kind of time-travel translation between present context knowledge and context knowledge at the time of text creation. In this paper, we study time-aware re-contextualization, the challenging problem of retrieving concise and complementary information in order to bridge this temporal context gap. We propose an approach based on learning to rank techniques using sentence-level context information extracted from Wikipedia. The employed ranking combines relevance, complementarity and time-awareness. The effectiveness of the approach is evaluated by contextualizing articles from a news archive collection using more than 7,000 manually judged relevance pairs. We show that our approach is able to retrieve a significant amount of relevant context information for a given news article.
An enhanced context-sensitive proximity model for probabilistic information retrieval BIBAFull-Text 1131-1134
  Jiashu Zhao; Jimmy Xiangji Huang
We propose to enhance proximity-based probabilistic retrieval models with more contextual information. A term pair with higher contextual relevance of term proximity is assigned a higher weight. Several measures are proposed to estimate the contextual relevance of term proximity. We assume the top ranked documents from a basic weighting model are more relevant to the query, and calculate the contextual relevance of term proximity using the top ranked documents. We propose a context-sensitive proximity model, and the experimental results on standard TREC data sets show the effectiveness of our proposed model.
On the information difference between standard retrieval models BIBAFull-Text 1135-1138
  Peter B. Golbus; Javed A. Aslam
Recent work introduced a probabilistic framework that measures search engine performance information-theoretically. This allows for novel meta-evaluation measures such as Information Difference, which measures the magnitude of the difference between search engines in their ranking of documents for which we have relevance information. Using Information Difference we can compare the behavior of search engines (which documents the search engine prefers) as well as search engine performance (how likely the search engine is to satisfy a hypothetical user). In this work, we a) extend this probabilistic framework to precision-oriented contexts, b) show that Information Difference can be used to detect similar search engines at shallow ranks, and c) demonstrate the utility of the Information Difference methodology by showing that well-tuned search engines employing different retrieval models are more similar than a well-tuned and a poorly tuned implementation of the same retrieval model.
A POMDP model for content-free document re-ranking BIBAFull-Text 1139-1142
  Sicong Zhang; Jiyun Luo; Hui Yang
Log-based document re-ranking is a special form of session search. The task re-ranks documents from a Search Engine Results Page (SERP) according to the search logs, in which both the search activities of other users and a personalized query log for a user are available. The purpose of re-ranking is to provide the user with a new and better ordering of the initially retrieved documents. We test the system on the WSCD 2014 dataset, in which the actual content of the queries and documents is not available due to privacy concerns. The challenge is to perform effective re-ranking purely based on user behaviors, such as clicks and query reformulations, rather than document content. In this paper, we propose to model log-based document re-ranking as a Partially Observable Markov Decision Process (POMDP). Experiments on the document re-ranking task show that our approach is effective and outperforms the baseline rankings provided by a commercial search engine.
Using score differences for search result diversification BIBAFull-Text 1143-1146
  Sadegh Kharazmi; Mark Sanderson; Falk Scholer; David Vallet
We investigate the application of a light-weight approach to result list clustering for the purposes of diversifying search results. We introduce a novel post-retrieval approach, which is independent of external information or even the full-text content of retrieved documents; only the retrieval score of a document is used. Our experiments show that this novel approach is beneficial to effectiveness, albeit only on certain baseline systems. The fact that the method works indicates that the retrieval score is potentially exploitable in diversity.
TREC: topic engineering exercise BIBAFull-Text 1147-1150
  J Shane Culpepper; Stefano Mizzaro; Mark Sanderson; Falk Scholer
In this work, we investigate approaches to engineer better topic sets in information retrieval test collections. By recasting the TREC evaluation exercise from one of building more effective systems to an exercise in building better topics, we present two possible approaches to quantify topic "goodness": topic ease and topic set predictivity. A novel interpretation of a well known result and a twofold analysis of data from several TREC editions lead to a result that has been neglected so far: both topic ease and topic set predictivity have changed significantly across the years, sometimes in a perhaps undesirable way.
How k-12 students search for learning?: analysis of an educational search engine log BIBAFull-Text 1151-1154
  Arif Usta; Ismail Sengor Altingovde; Ibrahim Bahattin Vidinli; Rifat Ozcan; Özgür Ulusoy
In this study, we analyze an educational search engine log to shed light on K-12 students' search behavior in a learning environment. We focus specifically on query, session, user and click characteristics and compare the trends to the findings in the literature for general web search engines. Our analysis helps in understanding how students search with the purpose of learning in an educational vertical, and reveals new directions to improve search performance in the education domain.
The correlation between cluster hypothesis tests and the effectiveness of cluster-based retrieval BIBAFull-Text 1155-1158
  Fiana Raiber; Oren Kurland
We present a study of the correlation between the extent to which the cluster hypothesis holds, as measured by various tests, and the relative effectiveness of cluster-based retrieval with respect to document-based retrieval. We show that the correlation can be affected by several factors, such as the size of the result list of the most highly ranked documents that is analyzed. We further show that some cluster hypothesis tests are often negatively correlated with one another. Moreover, in several settings, some of the tests are also negatively correlated with the relative effectiveness of cluster-based retrieval.
The effect of expanding relevance judgements with duplicates BIBAFull-Text 1159-1162
  Gaurav Baruah; Adam Roegiest; Mark D. Smucker
We examine the effects of expanding a judged set of sentences with their duplicates from a corpus. Including new sentences that are exact duplicates of the previously judged sentences may allow for better estimation of performance metrics and enhance the reusability of a test collection. We perform experiments in the context of the Temporal Summarization Track at TREC 2013. We find that adding duplicate sentences to the judged set does not significantly affect relative system performance. However, we do find statistically significant changes in the performance of nearly half the systems that participated in the Track. We recommend adding exact duplicate sentences to the set of relevance judgements in order to obtain a more accurate estimate of system performance.
On correlation of absence time and search effectiveness BIBAFull-Text 1163-1166
  Sunandan Chakraborty; Filip Radlinski; Milad Shokouhi; Paul Baecke
Online search evaluation metrics are typically derived from implicit user feedback, for instance the number of page clicks, the number of queries, or dwell time on a search result. In a recent paper, Dupret and Lalmas introduced a new metric called absence time, which uses the time interval between successive sessions of users to measure their satisfaction with the system. They evaluated this metric on a version of Yahoo! Answers. In this paper, we investigate the effectiveness of absence time in evaluating new features in a web search engine, such as a new ranking algorithm or a new user interface. We measured the variation of absence time in response to 21 experiments performed on a search engine. Our findings show that the outcomes of absence time agreed with the judgement of human experts performing a thorough analysis of a wide range of online and offline metrics in 14 out of these 21 cases.
   We also investigated the relationship between absence time and a set of commonly-used covariates (features) such as the number of queries and clicks in the session. Our results suggest that users are likely to return to the search engine sooner when their previous session has more queries and more clicks.
Necessary and frequent terms in queries BIBAFull-Text 1167-1170
  Jiepu Jiang; James Allan
Vocabulary mismatch has long been recognized as one of the major issues affecting search effectiveness. Ineffective queries usually fail to incorporate important terms and/or incorrectly include inappropriate keywords. However, in this paper we show another cause of reduced search performance: sometimes users issue reasonable query terms, but systems cannot identify the correct properties of those terms and take advantage of them. Specifically, we study two distinct types of terms that exist in all search queries: (1) necessary terms, for which term occurrence alone is indicative of document relevance; and (2) frequent terms, for which the relative term frequency is indicative of document relevance within the set of documents where the term appears. We evaluate these two properties of query terms in a dataset. Results show that only 1/3 of the terms are both necessary and frequent, while another 1/3 hold only one of the properties and the final third do not hold either property. However, existing retrieval models do not clearly distinguish terms with the two properties or treat them differently. We further show the great potential of improving retrieval models by treating terms with distinct properties differently.
Extracting topics based on authors, recipients and content in microblogs BIBAFull-Text 1171-1174
  Nazneen Fatema N. Rajani; Kate McArdle; Jason Baldridge
Microblogs such as Twitter are important sources for spreading vital information at high speed. They also reflect the general public's reactions and opinions towards major events or stories. With information traveling so quickly, it is helpful to be able to apply unsupervised learning techniques to discover topics for information extraction and analysis. Although graphical models have been traditionally used for topic discovery in microblogs and text streams, previous work may not be as efficient because of the diverse and noisy nature of microblogs.
   In this paper, we demonstrate the application of the Author-Topic and the Author-Recipient-Topic models to microblogs. We extensively compare these models under different settings to an LDA baseline. Our results show that the Author-Recipient-Topic model extracts the most coherent topics, establishing that joint modeling on author-recipient pairs and on the content of tweets leads to quantitatively better topic discovery. This paper also addresses the problem of topic modeling on short text by using clustering techniques, which help boost the performance of our models. Our study reveals interesting traits about Twitter messages, users and their interactions.
Exploiting Twitter and Wikipedia for the annotation of event images BIBAFull-Text 1175-1178
  Philip James McParlane; Joemon Jose
With the rise in popularity of smart phones, there has been a recent increase in the number of images taken at large social (e.g. festivals) and world (e.g. natural disasters) events which are uploaded to image sharing websites such as Flickr. As with all online images, they are often poorly annotated, resulting in a difficult retrieval scenario. To overcome this problem, many photo tag recommendation methods have been introduced; however, these methods all rely on historical Flickr data, which is often problematic for a number of reasons, including the time lag problem (i.e. in our collection, users upload images on average 50 days after taking them, meaning "training data" is often out of date). In this paper, we develop an image annotation model which exploits textual content from related Twitter and Wikipedia data and aims to overcome the discussed problems. The results of our experiments highlight the merits of exploiting social media data for annotating event images, where we are able to achieve recommendation accuracy comparable with a state-of-the-art model.
Learning to translate queries for CLIR BIBAFull-Text 1179-1182
  Artem Sokolov; Felix Hieber; Stefan Riezler
The statistical machine translation (SMT) component of cross-lingual information retrieval (CLIR) systems is often regarded as a black box that is optimized for translation quality independently of the retrieval task. In recent work [10], SMT has been tuned for retrieval by training a reranker on k-best translations ordered according to their retrieval performance. In this paper we propose a decomposable proxy for retrieval quality that obviates the need for costly intermediate retrieval. Furthermore, we explore the full search space of the SMT decoder by directly optimizing decoder parameters under a retrieval-based objective. Experimental results for patent retrieval show our approach to be a promising alternative to the standard pipeline approach.
Predicting query performance in microblog retrieval BIBAFull-Text 1183-1186
  Jesus A. Rodriguez Perez; Joemon M. Jose
Query Performance Prediction (QPP) is the estimation of the retrieval success for a query, without explicit knowledge about relevant documents. QPP is especially interesting in the context of Automatic Query Expansion (AQE) based on Pseudo Relevance Feedback (PRF). PRF-based AQE is known to produce unreliable results when the initial set of retrieved documents is poor. Theoretically, a good predictor would allow PRF-based AQE to be applied selectively when the performance of the initial result set is good enough, thus enhancing the overall robustness of the system. QPP would be of great benefit in the context of microblog retrieval, as AQE was the most widely deployed technique for enhancing retrieval performance at TREC. In this work we study the performance of state-of-the-art predictors under microblog retrieval conditions and introduce our own predictors. Our results show that our proposed predictors outperform the baselines significantly.
An event extraction model based on timeline and user analysis in Latent Dirichlet allocation BIBAFull-Text 1187-1190
  Bayar Tsolmon; Kyung-Soon Lee
Social media such as Twitter has come to reflect the reaction of the general public to major events. Since posts are short and noisy, it is hard to extract reliable events based on word frequency. Even if an event term appears with particularly low frequency, it should be extracted as long as at least one reliable user mentions it. This paper proposes an event extraction method which combines user reliability and timeline analysis. The Latent Dirichlet Allocation (LDA) topic model is adapted with the weights of event terms on timeline and reliable users to extract social events. The reliable users are detected on Twitter according to their tweeting behaviors: socially well-known users and active users. Reliable and low-frequency events can be detected based on reliable users. To assess the effectiveness of the proposed method, experiments are conducted on a Korean tweet collection; the proposed model achieved 72% precision. This shows that LDA with timeline and reliable users is effective for extracting events on the Twitter test collection.
What makes data robust: a data analysis in learning to rank BIBAFull-Text 1191-1194
  Shuzi Niu; Yanyan Lan; Jiafeng Guo; Xueqi Cheng; Xiubo Geng
When applying learning to rank algorithms in real search applications, noise in human labeled training data becomes an inevitable problem which will affect the performance of the algorithms. Previous work mainly focused on studying how noise affects ranking algorithms and how to design robust ranking algorithms. In our work, we investigate what inherent characteristics make training data robust to label noise. The motivation of our work comes from an interesting observation that the same ranking algorithm may show very different sensitivities to label noise over different data sets. We thus investigate the underlying reason for this observation based on two typical kinds of learning to rank algorithms (i.e., pairwise and listwise methods) and three different public data sets (i.e., OHSUMED, TD2003 and MSLR-WEB10K). We find that when label noise increases in training data, it is the document pair noise ratio (i.e., pNoise) rather than the document noise ratio (i.e., dNoise) that can well explain the performance degradation of a ranking algorithm.
Learning to bridge colloquial and formal language applied to linking and search of E-Commerce data BIBAFull-Text 1195-1198
  Ivan Vulic; Susana Zoghbi; Marie-Francine Moens
We study the problem of linking information between different idiomatic usages of the same language, for example, colloquial and formal language. We propose a novel probabilistic topic model called multi-idiomatic LDA (MiLDA). Its modeling principles follow the intuition that certain words are shared between two idioms of the same language, while other words are non-shared, that is, idiom-specific. We demonstrate the ability of our model to learn relations between cross-idiomatic topics in a dataset containing product descriptions and reviews. We intrinsically evaluate our model by the perplexity measure. Following that, as an extrinsic evaluation, we present the utility of the new MiLDA topic model in a recently proposed IR task of linking Pinterest pins (given in colloquial English on the users' side) to online webshops (given in formal English on the retailers' side). We show that our multi-idiomatic model outperforms the standard monolingual LDA model and the pure bilingual LDA model both in terms of perplexity and MAP scores in the IR task.
Uncovering the unarchived web BIBAFull-Text 1199-1202
  Thaer Samar; Hugo C. Huurdeman; Anat Ben-David; Jaap Kamps; Arjen de Vries
Many national and international heritage institutes realize the importance of archiving the web for future cultural heritage. Web archiving is currently performed either by harvesting a national domain, or by crawling a pre-defined list of websites selected by the archiving institution. With either method, crawling harvests more information than just the websites intended for preservation, and this information could be used to reconstruct impressions of pages that existed on the live web at the crawl date but would otherwise have been lost forever. We present a method to create representations of what we will refer to as a web collection's aura: the web documents that were not included in the archived collection, but are known to have existed, due to their mentions on pages that were included in the archived web collection. To create representations of these unarchived pages, we exploit the information about the unarchived URLs that can be derived from the crawls by combining crawl date distribution, anchor text and link structure. We illustrate empirically that the size of the aura can be substantial: in 2012, the Dutch Web archive contained 12.3M unique pages, while we uncover references to 11.9M additional (unarchived) pages.
Inferring topic-dependent influence roles of Twitter users BIBAFull-Text 1203-1206
  Chengyao Chen; Dehong Gao; Wenjie Li; Yuexian Hou
Twitter, as one of the most popular social media platforms, provides a convenient way for people to communicate and interact with each other. It has been well recognized that influence exists during users' interactions. Some pioneering studies on finding influential users have been reported in the literature, but they do not distinguish different influence roles, which are of great value for various marketing purposes. In this paper, we move a step forward by trying to further distinguish the influence roles of Twitter users in a certain topic. By defining three views of features relating to topic, sentiment and popularity respectively, we propose a Multi-view Influence Role Clustering (MIRC) algorithm to group Twitter users into five categories. Experimental results show the effectiveness of the proposed approach in inferring influence roles.
Reputation analysis with a ranked sentiment-lexicon BIBAFull-Text 1207-1210
  Filipa Peleja; João Santos; João Magalhães
Reputation analysis is naturally linked to a sentiment analysis task of the targeted entities. This analysis leverages a sentiment lexicon that includes general sentiment words and domain specific jargon. However, in most cases target entities are themselves part of the sentiment lexicon, creating a loop from which it is difficult to infer an entity's reputation. Sometimes, the entity has become a reference in the domain and is widely cited as an example of a highly reputable entity. For example, in the movies domain it is not uncommon to see reviews citing Batman or Anthony Hopkins as esteemed references. In this paper we describe an unsupervised method for performing a simultaneous analysis of the reputation of multiple named entities. Our method jointly extracts the reputations of named entities and a domain specific sentiment lexicon. The objective is two-fold: (1) named entities are naturally ranked by our method and (2) we can build a reputation graph of the domain's named entities. This framework has immediate applications in terms of visualization or search by reputation.
On predicting religion labels in microblogging networks BIBAFull-Text 1211-1214
  Minh-Thap Nguyen; Ee-Peng Lim
Religious belief plays an important role in how people behave, influencing how they form preferences, interpret events around them, and develop relationships with others. Traditionally, the religion labels of a user population are obtained by conducting a large-scale census study. Such an approach is both costly and time consuming. In this paper, we study the problem of predicting users' religion labels using their microblogging data. We formulate religion label prediction as a classification task, and identify content, structure and aggregate features, considering their self and social variants, for representing a user. We introduce the notion of a representative user to identify users who are important in the religious user community. We further define features using representative users. We show that SVM classifiers using our proposed features can accurately assign Christian and Muslim labels to a set of Twitter users with known religion labels.
Efficiently identify local frequent keyword co-occurrence patterns in geo-tagged Twitter stream BIBAFull-Text 1215-1218
  Xiaoyang Wang; Ying Zhang; Wenjie Zhang; Xuemin Lin
With the prevalence of geo-position enabled devices and services, a rapidly growing number of tweets are associated with geo-tags. Consequently, real-time search on geo-tagged Twitter streams has attracted great attention. In this paper, we advocate the significance of keyword co-occurrence for the analysis of geo-tagged tweet data, which is overlooked by existing studies. In particular, we formally introduce the problem of identifying local frequent keyword co-occurrence patterns over geo-tagged Twitter streams, namely the LFP query. To accommodate the high volume and rapid updates of the Twitter stream, we develop an inverted KMV sketch (IK sketch for short) structure to capture the co-occurrence of keywords in limited space. Efficient algorithms based on the IK sketch are then developed to support LFP queries as well as their variant. An extensive empirical study on a real Twitter dataset confirms the effectiveness and efficiency of our approaches.
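The abstract above does not spell out the IK sketch itself. As background only, the following is a minimal sketch of a plain KMV (k minimum values) synopsis, the building block the name refers to; it is not the authors' inverted structure. The function names, the choice of SHA-256 as the hash, and the parameter `k` are all assumptions made for illustration.

```python
import hashlib


def unit_hash(item: str) -> float:
    """Map an item to a pseudo-uniform value in [0, 1) (hash choice is an assumption)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_sketch(items, k):
    """Keep the k smallest distinct hash values, in ascending order."""
    return sorted({unit_hash(x) for x in items})[:k]


def estimate_distinct(sketch, k):
    """Estimate how many distinct items a sketch summarizes."""
    if len(sketch) < k:
        return float(len(sketch))  # fewer than k distinct items: the count is exact
    # Standard KMV estimator: (k - 1) divided by the k-th smallest hash value.
    return (k - 1) / sketch[k - 1]


def estimate_jaccard(sketch_a, sketch_b, k):
    """Estimate |A ∩ B| / |A ∪ B| from two sketches built with the same k."""
    union = sorted(set(sketch_a) | set(sketch_b))[:k]
    if not union:
        return 0.0
    common = set(sketch_a) & set(sketch_b)
    return sum(1 for h in union if h in common) / len(union)
```

One plausible use, again an assumption rather than the paper's method: build one sketch per keyword over the IDs of tweets containing it, then approximate how often two keywords co-occur as the estimated Jaccard similarity multiplied by the estimated size of the union.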
Item group based pairwise preference learning for personalized ranking BIBAFull-Text 1219-1222
  Shuang Qiu; Jian Cheng; Ting Yuan; Cong Leng; Hanqing Lu
Collaborative filtering with implicit feedback has been steadily receiving more attention, since abundant implicit feedback is more easily collected while explicit feedback is not always available. Several recent works address this problem well using pairwise ranking methods, with the fundamental assumption that a user prefers items with positive feedback to items without observed feedback. This also implies that the items without observed feedback are treated equally, without distinction. However, users prefer different items to different degrees, which can be modeled as a ranking relationship. In this paper, we exploit this prior information about a user's preferences, obtained from the implicit feedback of the user's nearest neighbors, to split items into different item groups with specific ranking relations. We propose a novel PRIGP (Personalized Ranking with Item Group based Pairwise preference learning) algorithm to integrate item based pairwise preference and item group based pairwise preference into the same framework. Experimental results on three real-world datasets demonstrate that the proposed method outperforms competitive baselines on several ranking-oriented evaluation metrics.
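The item-level pairwise assumption described above (a user prefers an item with positive feedback to one without) is commonly trained with a BPR-style objective. The following is a minimal sketch of one such update for a matrix-factorization scorer. It is not the PRIGP algorithm: the item-group term is omitted, and the function names, learning rate `lr` and regularization weight `reg` are assumptions chosen for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def bpr_step(p_u, q_i, q_j, lr=0.05, reg=0.01):
    """One gradient-ascent step on ln sigmoid(x_ui - x_uj) with L2 regularization.

    p_u: latent vector of the user
    q_i: latent vector of an item with observed positive feedback
    q_j: latent vector of an item without observed feedback
    Returns updated copies (new_p_u, new_q_i, new_q_j).
    """
    x_uij = dot(p_u, q_i) - dot(p_u, q_j)
    d = 1.0 / (1.0 + math.exp(x_uij))  # equals sigmoid(-x_uij)
    new_p = [p + lr * (d * (qi - qj) - reg * p) for p, qi, qj in zip(p_u, q_i, q_j)]
    new_qi = [qi + lr * (d * p - reg * qi) for p, qi in zip(p_u, q_i)]
    new_qj = [qj + lr * (-d * p - reg * qj) for p, qj in zip(p_u, q_j)]
    return new_p, new_qi, new_qj
```

Repeating this step over sampled (user, preferred item, other item) triples pushes the preferred item's score above the other's. A group-based variant would add an analogous term comparing item groups, but its exact form is not given in the abstract.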
Where not to go?: detecting road hazards using Twitter BIBAFull-Text 1223-1226
  Avinash Kumar; Miao Jiang; Yi Fang
Conventional approaches to road hazard detection involve manual inspections of roads by government transportation agencies. These approaches are usually expensive to execute, and sometimes are not able to capture the most recent hazards. Moreover, they often only focus on major highways due to a lack of sufficient manpower. Consequently, many hazards on minor roads get ignored, which may pose serious dangers to drivers. In this paper, we demonstrate an application of Twitter to automatically determining road hazards. By building language models based on Twitter users' online communication, our system aims at pinpointing potential road hazards that pose driving risks. The likelihood of poor driving conditions can then be exposed via map overlays to warn drivers about potentially dangerous driving conditions in their locale or on current routes, thereby significantly reducing the chances of an accident occurring. To the best of our knowledge, this is the first work demonstrating the utility of social media to automatically detect road hazards. We conduct experiments on a testbed of tweets discussing road conditions and the initial results demonstrate the effectiveness of our approach.
Enhancing sketch-based sport video retrieval by suggesting relevant motion paths BIBAFull-Text 1227-1230
  Ihab Al Kabary; Heiko Schuldt
Searching for scenes in team sport videos is a task that recurs very often in game analysis and other related activities performed by coaches. In most cases, queries are formulated on the basis of specific motion characteristics the user remembers from the video. Providing sketching interfaces for graphically specifying query input is thus a very natural user interaction for a retrieval application. However, the quality of the query (the sketch) heavily depends on the memory of the user and her ability to accurately formulate the intended search query by transforming this 3D memory of the known item(s) into a 2D sketch query. In this paper, we present an auto-suggest search feature that harnesses spatiotemporal data of team sport videos to suggest potential directions containing relevant data during the formulation of a sketch-based motion query. Users can intuitively select the direction of the desired motion query on-the-fly using the displayed visual clues, thus relaxing the need for relying heavily on memory to formulate the query. At the same time, this significantly enhances the accuracy of the results and the speed at which they appear. A first evaluation has shown the effectiveness and efficiency of our approach.
Dynamic location models BIBAFull-Text 1231-1234
  Vanessa Murdock
Location models built on social media have been shown to be an important step toward understanding places in queries. Current search technology focuses on predicting broad regions such as cities. Hyperlocal scenarios are important because of the increasing prevalence of smartphones and mobile search and recommendation. Users expect the system to recognize their location and provide information about their immediate surroundings.
   In this work we propose an algorithm for constructing hyperlocal models of places that are as small as half a city block. We show that Dynamic Location Models (DLMs) are computationally efficient, and provide better estimates of the language models of hyperlocal places than the standard method of segmenting the globe into approximately equal grid squares. We evaluate the models using a repository of 25 million geotagged public images from Flickr. We show that the indexes produced by DLMs have a larger vocabulary, and smaller average document length than their fixed grid counterparts, for indexes with an equivalent number of locations. This produces location models that are more robust to retrieval parameters, and more accurate in predicting locations in text.
Wikipedia-based query performance prediction BIBAFull-Text 1235-1238
  Gilad Katz; Anna Shtock; Oren Kurland; Bracha Shapira; Lior Rokach
The query-performance prediction task is to estimate retrieval effectiveness with no relevance judgments. Pre-retrieval prediction methods operate prior to retrieval time. Hence, these predictors are often based on analyzing the query and the corpus upon which retrieval is performed. We propose a corpus-independent approach to pre-retrieval prediction which relies on information extracted from Wikipedia. Specifically, we present Wikipedia-based features that can attest to the effectiveness of retrieval performed in response to a query regardless of the corpus upon which search is performed. Empirical evaluation demonstrates the merits of our approach. As a case in point, integrating the Wikipedia-based features with state-of-the-art pre-retrieval predictors that analyze the corpus yields prediction quality that is consistently better than that of using the latter alone.
A revisit to social network-based recommender systems BIBAFull-Text 1239-1242
  Hui Li; Dingming Wu; Nikos Mamoulis
With the rapid expansion of online social networks, social network-based recommendation has become a meaningful and effective way of suggesting new items or activities to users. In this paper, we propose two methods to improve the performance of the state-of-the-art social network-based recommender system (SNRS), which is based on a probabilistic model. Our first method classifies the correlations between pairs of users' ratings. The second makes the system robust to sparse data, i.e., few immediate friends having few common ratings with the target user. Our experimental study demonstrates that our techniques significantly improve the accuracy of SNRS.

Demo session

Relevation!: an open source system for information retrieval relevance assessment BIBAFull-Text 1243-1244
  Bevan Koopman; Guido Zuccon
Relevation! is a system for performing relevance judgements for information retrieval evaluation. Relevation! is web-based, fully configurable and expandable; it allows researchers to effectively collect assessments and additional qualitative data. The system is easily deployed allowing assessors to smoothly perform their relevance judging tasks, even remotely. Relevation! is available as an open source project at: http://ielab.github.io/relevation.
WenZher: comprehensive vertical search for healthcare domain BIBAFull-Text 1245-1246
  Liqiang Nie; Tao Li; Mohammad Akbari; Jialie Shen; Tat-Seng Chua
Online health seeking has transformed how health knowledge is exchanged and reused. The existing general and vertical health search engines, however, just routinely return lists of matched documents or question answer (QA) pairs, which may overwhelm the seekers or not sufficiently meet the seekers' expectations. Instead, our multilingual system is able to return one multi-faceted answer that is well-structured and precisely extracted from multiple heterogeneous healthcare sources. Further, should the seekers not be satisfied with the returned search results, our system can automatically route the unsolved questions to the professionals with relevant expertise.
STICS: searching with strings, things, and cats BIBAFull-Text 1247-1248
  Johannes Hoffart; Dragan Milchevski; Gerhard Weikum
This paper describes an advanced search engine that supports users in querying documents by means of keywords, entities, and categories. Users simply type words, which are automatically mapped onto appropriate suggestions for entities and categories. Based on named-entity disambiguation, the search engine returns documents containing the query's entities and prominent entities from the query's categories.
VIRLab: a web-based virtual lab for learning and studying information retrieval models BIBAFull-Text 1249-1250
  Hui Fang; Hao Wu; Peilin Yang; ChengXiang Zhai
In this paper, we describe VIRLab, a novel web-based virtual laboratory for Information Retrieval (IR). Unlike existing command line based IR toolkits, the VIRLab system provides a more interactive tool that enables easy implementation of retrieval functions with only a few lines of code, a simplified evaluation process over multiple data sets and parameter settings, and a straightforward result analysis interface through operational search engines and pair-wise comparisons. These features make VIRLab a unique and novel tool that can help in teaching IR models, improve the productivity of IR model research, and promote controlled experimental study of IR models.
ServiceXplorer: a similarity-based web service search engine BIBAFull-Text 1251-1252
  Anne H.H. Ngu; Jiangang Ma; Quan Z. Sheng; Lina Yao; Scott Julian
Finding relevant Web services and composing them into value-added applications is becoming increasingly important in cloud and service based marketplaces. The key problem with current approaches to finding relevant Web services is that most of them only provide searches over a discrete set of features using exact keyword matching. We demonstrate in this paper that by utilizing well-known indexing schemes, such as inverted file and R-tree indexes over Web service attributes, the Earth Mover's Distance (EMD) algorithm can be used efficiently to find partial matches between a query and a database of Web services.
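As background on the distance named above, here is a minimal sketch of the Earth Mover's Distance in its simplest setting: two one-dimensional histograms over the same ordered bins, with a unit cost for moving mass between adjacent bins. In that special case the distance reduces to the sum of absolute differences between the cumulative distributions. The general EMD is a transportation problem over an arbitrary ground distance, and how ServiceXplorer represents services or integrates the indexes is not described in the abstract; the function name `emd_1d` is an assumption.

```python
from itertools import accumulate


def emd_1d(hist_a, hist_b):
    """Earth Mover's Distance between two 1-D histograms on the same ordered bins.

    Both histograms are normalized to total mass 1, and moving mass by one bin
    costs 1. Under these assumptions the EMD equals the L1 distance between
    the two cumulative distributions.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("histograms must have positive total mass")
    diffs = (a / total_a - b / total_b for a, b in zip(hist_a, hist_b))
    # accumulate() yields the running difference of the two cumulative sums.
    return sum(abs(c) for c in accumulate(diffs))
```

Unlike exact keyword matching, such a distance degrades gradually: a histogram whose mass sits one bin away from the query's scores closer than one whose mass sits two bins away, which is what makes partial matches rankable.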
Real-time visualization and targeting of online visitors BIBAFull-Text 1253-1254
  Deepak Pai; Sandeep Zechariah George
Identifying and targeting visitors on an e-commerce website with personalized content in real-time is extremely important to marketers. Although such targeting exists today, it is based on demographic attributes of the visitors. We show that dynamic visitor attributes extracted from their click-stream predict visitor intent much better. In this demonstration, we showcase an interactive real-time user interface for marketers to visualize and target visitor segments. Our dashboard not only gives marketers an understanding of their visitors' click patterns, but also lets them target individuals or groups of visitors with offers and promotions.
CharBoxes: a system for automatic discovery of character infoboxes from books BIBAFull-Text 1255-1256
  Manish Gupta; Piyush Bansal; Vasudeva Varma
Entities are central to a large number of real world applications. Wikipedia shows entity infoboxes for a large number of entities. However, not much structured information is available about character entities in books. Automatic discovery of characters from books can help in effective summarization. Such a structured summary, which not only introduces the characters in the book but also provides a high level view of the relationships between them, can be of critical importance for buyers. This task involves the following challenging novel problems: 1. automatic discovery of important characters given a book; 2. automatic social graph construction relating the discovered characters; 3. automatic summarization of text most related to each of the characters; and 4. automatic infobox extraction from such summarized text for each character. As part of this demo, we design mechanisms to address these challenges and experiment with publicly available books.
ADAM: a system for jointly providing ir and database queries in large-scale multimedia retrieval BIBAFull-Text 1257-1258
  Ivan Giangreco; Ihab Al Kabary; Heiko Schuldt
The tremendous increase of multimedia data in recent years has heightened the need for systems that not only allow searching with keywords, but also support content-based retrieval in order to effectively and efficiently query large collections. In this paper, we introduce ADAM, a system that is able to store and retrieve multimedia objects by seamlessly combining aspects from databases and information retrieval. ADAM is able to work with both structured and unstructured data and to jointly provide Boolean retrieval and similarity search. To efficiently handle large volumes of data, it makes use of signature-based indexing and distributes the collection across multiple shards that are queried in a MapReduce style. We present ADAM in the setting of a sketch-based image retrieval application using the ImageNet collection containing 14 million images.
NicePic!: a system for extracting attractive photos from flickr streams BIBAFull-Text 1259-1260
  Sergej Zerr; Stefan Siersdorfer; Jose San Pedro; Jonathon Hare; Xiaofei Zhu
A large number of images are continuously uploaded to popular photo sharing websites and online social communities. In this demonstration we show a novel application which automatically classifies images in a live photo stream according to their attractiveness for the community, based on a number of visual and textual features. The system effectively introduces an additional facet to browse and explore photo collections by highlighting the most attractive photographs and demoting the least attractive.
A perspective-aware approach to search: visualizing perspectives in news search results BIBAFull-Text 1261-1262
  Muhammad Atif Qureshi; Colm O'Riordan; Gabriella Pasi
The result set from a search engine for any user's query may exhibit an inherent perspective due to issues with the search engine or issues with the underlying collection. This demonstration paper presents a system that allows users to specify at query time a perspective together with their query. The system then presents results from well-known search engines with a visualization of the results which allows the users to quickly surmise the presence of the perspective in the returned set.
FitYou: integrating health profiles to real-time contextual suggestion BIBAFull-Text 1263-1264
  Christopher Wing; Hui Yang
Obesity and its associated health consequences such as high blood pressure and cardiac disease affect a significant proportion of the world's population. At the same time, the popularity of location-based services (LBS) and recommender systems is continually increasing with improvements in mobile technology. We observe that the health domain lacks a suggestion system that focuses on healthy lifestyle choices. We introduce the mobile application FitYou, which dynamically generates recommendations according to the user's current location and health condition as a real-time LBS. It utilizes preferences determined from user history and health information from a biometric profile. The system was developed upon a top performing contextual suggestion system in both TREC 2012 and 2013 Contextual Suggestion Tracks.
Semantic full-text search with broccoli BIBAFull-Text 1265-1266
  Hannah Bast; Florian Bäurle; Björn Buchhold; Elmar Haußmann
We combine search in triple stores with full-text search into what we call semantic full-text search. We provide a fully functional web application that allows the incremental construction of complex queries on the English Wikipedia combined with the facts from Freebase. The user is guided by context-sensitive suggestions of matching words, instances, classes, and relations after each keystroke. We also provide a powerful API, which may be used for research tasks or as a back end, e.g., for a question answering system. Our web application and public API are available under http://broccoli.cs.uni-freiburg.de.
Just-for-me: an adaptive personalization system for location-aware social music recommendation BIBAFull-Text 1267-1268
  Zhiyong Cheng; Jialie Shen; Tao Mei
In recent years, location-aware music recommendation has been increasing in popularity, as more and more users consume music on the move. In this demonstration, we present an intelligent system, called Just-for-Me, to facilitate accurate music recommendation based on where the user is. Our system is built on a novel probabilistic generative model, which can effectively integrate location contexts and global music popularity trends. This approach allows us to model user preference more comprehensively and thus significantly enhances music recommendation performance.
A novel system for the semi automatic annotation of event images BIBAFull-Text 1269-1270
  Philip James McParlane; Joemon Jose
With the rise in popularity of smart phones, taking and sharing photographs has never been more openly accessible. Further, photo sharing websites, such as Flickr, have made the distribution of photographs easy, resulting in an increase of visual content uploaded online. Due to the laborious nature of annotating images, however, a large percentage of these images are unannotated, making their organisation and retrieval difficult. Therefore, there has been a recent research focus on the automatic and semi-automatic annotation of these images. Despite the progress made in this field, however, annotating images automatically based on their visual appearance often results in unsatisfactory suggestions, and as a result these models have not been adopted by photo sharing websites. Many methods have therefore looked to exploit new sources of evidence for annotation purposes, such as image context. In this demonstration, we instead explore the scenario of annotating images taken at large-scale events, where evidence can be extracted from a wealth of online textual resources. Specifically, we present a novel tag recommendation system for images taken at a popular music festival which allows the user to select relevant tags from related Tweets and Wikipedia content, thus reducing the workload involved in the annotation process.
An interactive interface for visualizing events on Twitter BIBAFull-Text 1271-1272
  Andrew J. McMinn; Daniel Tsvetkov; Tsvetan Yordanov; Andrew Patterson; Rrobi Szk; Jesus A. Rodriguez Perez; Joemon M. Jose
In recent years, social media has become one of the most popular tools for discovering and following breaking news and ongoing events. However, tools and interfaces have lagged behind users' expectations, with current tools making it difficult to discover new events and failing to provide a solution to the problem of information overload. We have developed an interactive interface for visualizing events, backed by a state-of-the-art event detection approach, which is able to detect, track and summarize events in real-time. Our interface provides up-to-the-second information about ongoing events in an easy to understand manner, including category information, temporal distribution, and location information -- all of which was previously unobtainable in real-time.
ExperTime: tracking expertise over time BIBAFull-Text 1273-1274
  Jan Rybak; Krisztian Balog; Kjetil Nørvåg
This paper presents ExperTime, a web-based system for tracking expertise over time. We visualize a person's expertise profile on a timeline, where we detect and characterize changes in the focus or topics of expertise. It is possible to zoom in on a given time period in order to examine the underlying data that is used as supporting evidence. It is also possible to perform visual and quantitative comparison of two arbitrarily selected time periods in a highly interactive environment. We invite profile owners to evaluate and fine-tune their profiles, and to leave feedback.

Doctoral consortium

Cluster links prediction for literature based discovery using latent structure and semantic features BIBAFull-Text 1275
  Yakub Sebastian
The potential impact of a scientific article has a significant correlation with its ability to establish novel connections between pre-existing knowledge [1-2]. Discovering hidden connections between the existing scientific literature is an interesting yet highly challenging information retrieval problem [2]. Literature based discovery (LBD) uses computational algorithms to discover potential hidden connections between previously disconnected sets of literature [3]. Most of the current LBD methods focus on analyzing latent semantic features in texts but are usually computationally demanding. In particular, they do not aim at predicting novel discovery links between clusters of literature.
   Combining latent semantic and structural features of literature is a promising yet unexplored LBD approach. This approach is potentially scalable and effective. For example, incorporating structural features of Web pages has increased the effectiveness of many large-scale IR systems [4]. The bibliographic structures of scientific papers make it possible to view a corpus of literature as a complex network of nodes (articles) and links (citation relationships) in which recognizable communities or clusters can be observed, each representing a distinct research field [5]. Consequently, potential hidden connections between disparate fields might be found from among non-overlapping clusters that do not have any existing link between their members yet exhibit a high propensity to converge in the future.
   This work approaches LBD as a cluster link prediction problem. We view disjoint literature sets as disjoint clusters in citation networks. Our method searches for hidden connections between disjoint clusters whose member nodes show high probabilities in forming future links. To this end, we address two research problems. The first problem is to group papers into clusters of distinct research areas. We compare the accuracy of well-known community detection algorithms, such as LOUVAIN and INFOMAP [5], in detecting research field clusters from citation networks of physics literature. We evaluate the quality of these clusters using purity, Rand Index, F-measure and Normalized Mutual Information [5-6]. Since ground truth communities are usually unknown, we also propose using alternative textual coherence measures such as Jensen-Shannon divergence [7].
   The second problem is to predict the future formation of links between the nodes in previously disconnected clusters. We introduce a novel algorithm, Latent Domain Similarity (LDS), which uses combinations of semantic features (e.g. distribution of technical terms in titles and abstracts) and structural features (e.g. cited references, citing articles) of two or more articles in order to infer shared latent domains between them. We assume that while two sets of literature could have been published separately in two seemingly unrelated fields, it is possible that they share many similar domains previously unknown to researchers in each field. The goal is to explore whether these shared latent domains correlate with the probability of previously disconnected clusters to form future citation links with each other.
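Two of the cluster-quality measures named in this abstract, purity and Jensen-Shannon divergence, are compact enough to illustrate. The following is a minimal sketch under stated assumptions: cluster assignments and ground-truth field labels are given as parallel lists, and each cluster's text is summarized as a dictionary mapping terms to probabilities. The function names and data layout are illustrative and not taken from the thesis.

```python
import math
from collections import Counter


def purity(cluster_ids, class_labels):
    """Fraction of items that carry the majority ground-truth label of their cluster."""
    by_cluster = {}
    for cluster, label in zip(cluster_ids, class_labels):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    majority = sum(max(counts.values()) for counts in by_cluster.values())
    return majority / len(cluster_ids)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two term distributions.

    p and q map terms to probabilities that each sum to 1. The result lies
    in [0, 1]: 0 for identical distributions, 1 for disjoint vocabularies.
    """
    terms = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in terms}

    def kl_to_m(a):
        # Kullback-Leibler divergence from a to the mixture m; zero-probability terms contribute 0.
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)
```

Purity needs ground-truth labels, whereas the divergence only needs the text of the clusters, which is why a textual measure is a natural fallback when ground-truth communities are unknown.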
Graph-based large scale RDF data compression BIBAFull-Text 1276
  Wei Emma Zhang
We propose a two-stage lossless compression approach on large scale RDF data. Our approach exploits both Representation Compression and Component Compression techniques to support query and dynamic operations directly on the compressed data.
Entity-based retrieval BIBAFull-Text 1277
  Hadas Raviv
We address the core challenge of the entity retrieval task: ranking entities in response to a query by their presumed relevance to the information need that the query represents. As an initial research direction we explored two models for entity ranking that were evaluated using the INEX entity ranking dataset and which posted promising performance. A natural future direction to explore is how to generalize these models to address various types of information needs that are associated with entities.
Improving offline and online web search evaluation by modelling the user behaviour BIBAFull-Text 1278
  Eugene Kharitonov
Measurements are fundamental to any empirical science and, similarly, search evaluation is a vital part of information retrieval (IR). Evaluation ensures the progressive development of approaches, tools, and methods studied in this field. Apart from the scientific perspective, the evaluation approaches are also important from the practical perspective. Indeed, the evaluation experiments enable commercial search engines to make data-driven decisions while developing new features and working on the quality of the user experience. Thus, it is not surprising that evaluation has gained huge attention from the research community, and that this interest spans almost fifty years of research [3]. The Cranfield experiments [3] evolved into the widely used offline system evaluation approach. Despite its convenience and popularity, the offline evaluation approach has several limitations [8]. These limitations resulted in the development and recent growth in popularity of the online user-based evaluation approaches such as interleaving and A/B testing.
Modelling of terms across scripts through autoencoders BIBAFull-Text 1279
  Parth Gupta
For many languages that use non-Roman based indigenous scripts (e.g., Arabic, Greek and Indic languages) one can often find a large amount of user generated transliterated content on the Web in the Roman script. Such content creates a monolingual or cross-lingual space with more than one script, which is referred to as mixed-script space, and information retrieval in this space is referred to as mixed-script information retrieval (MSIR) [1]. In mixed-script space, the documents and queries may be in the native script and/or the Roman transliterated script for a language (mono-lingual scenario). MSIR can be further extended, for example to multi-lingual MSIR, in which terms can be in multiple scripts in multiple languages. Since there are no standard ways of spelling a word in a non-native script, transliterated content almost always features extensive spelling variations. This phenomenon presents a non-trivial term matching problem for search engines, which must match the native-script or Roman-transliterated query with documents in multiple scripts while taking the spelling variations into account. This problem, although prevalent in Web search for users of many languages around the world, has received very little attention to date. Very recently we have formally defined the problem of MSIR and presented a quantitative study of it through Bing query log analysis.
A tag-based personalized item recommendation system using tensor modeling and topic model approaches BIBAFull-Text 1280
  Noor Ifada
This research falls in the area of enhancing the quality of tag-based item recommendation systems. It aims to achieve this by employing a multi-dimensional user profile approach and by analyzing the semantic aspects of tags. Tag-based recommender systems have two characteristics that need to be carefully studied in order to build a reliable system. Firstly, the multi-dimensional correlation, called tag assignment (user, item, tag), should be appropriately modelled in order to create the user profiles [1]. Secondly, the semantics behind the tags should be considered properly, as the flexibility in their design can cause semantic problems such as synonymy and polysemy [2]. This research proposes to address these two challenges for building a tag-based item recommendation system by employing tensor modeling as the multi-dimensional user profile approach, and the topic model as the semantic analysis approach. The first objective is to optimize the tensor model reconstruction and to improve the model's performance in generating quality recommendations. A novel Tensor-based Recommendation using Probabilistic Ranking (TRPR) method [3] has been developed. Results show this method to be scalable for large datasets and to outperform the benchmark methods in terms of accuracy. The memory efficient loop implements the n-mode block-striped (matrix) product for tensor reconstruction as an approximation of the initial tensor. The probabilistic ranking calculates the probability of users selecting candidate items using their tag preference list, based on the entries generated from the reconstructed tensor.
   The second objective is to analyse the tag semantics and utilize the outcome in building the tensor model. This research proposes to investigate the problem using a topic model approach to keep the tags' nature as the "social vocabulary" [4]. For the tag assignment data, topics can be generated from the occurrences of tags given for an item. However, only a limited number of tags is available to represent items as collections of topics, since an item might have been tagged with only a few tags. Consequently, the generated topics might not be able to represent the items appropriately. Furthermore, given that each tag can belong to any topic with various probability scores, the occurrence of tags cannot simply be mapped by the topics to build the tensor model. A standard weighting technique will not appropriately calculate the value of tagging activity, since it will define the context of an item using a tag instead of a topic.
Novelty and diversity enhancement and evaluation in recommender systems and information retrieval BIBAFull-Text 1281
  Saúl Vargas
The development and evaluation of Information Retrieval and Recommender Systems have traditionally focused on the relevance and accuracy of retrieved documents and recommendations, respectively. However, there is an increasing realization that targeting accuracy alone might be a sub-optimal strategy for a successful user experience. Properties such as novelty and diversity have been explored in both fields for assessing and enhancing the usefulness of search results and recommendations. In this doctoral research we study the assessment and enhancement of both properties at the confluence of Information Retrieval and Recommender Systems.
Enrichment of user profiles across multiple online social networks for volunteerism matching for social enterprise BIBAFull-Text 1282
  Xuemeng Song
Volunteers are crucial to nonprofit organizations (NPOs) in sustaining their continuing operations. On the other hand, many talented people are looking for appropriate volunteer opportunities to realize their dreams of making an impact on the world with their expertise. This is a typical supply and demand matching issue. Fortunately, user profiling and the discovery of user volunteering tendency can benefit from users' continuous enthusiasm and active participation in diverse online social networks (OSNs) and the huge amount of publicly available user-generated content (UGC). In this work, we aim to bridge the gap between the supply of talented people with volunteering tendency and the demands of social enterprise, and to enhance social welfare. This is done by incorporating volunteering tendency into user profiling across multiple OSNs. Consequently, this interdisciplinary research opens a new window for both computer science and social science. To the best of our knowledge, this is the first attempt to tackle the problem of volunteer matching for social enterprise based on publicly available UGC. First, we explain the definitions of the main concepts with examples. Second, we propose a system architecture for addressing the problem of volunteerism matching that includes three components: Profile Collection, Profile Enrichment and Profile Matching. Finally, we identify the major challenges encountered in our current research work. This paper discusses our design and progress in this research.

Tutorials

Choices and constraints: research goals and approaches in information retrieval (part 1) BIBAFull-Text 1283
  Diane Kelly; Filip Radlinski; Jaime Teevan
All research projects begin with a goal, for instance to describe search behavior, to predict when a person will enter a second query, or to discover which IR system performs the best. Different research goals suggest different research approaches, ranging from field studies to lab studies to online experimentation. This tutorial will provide an overview of the different types of research goals, common evaluation approaches used to address each type, and the constraints each approach entails. Participants will come away with a broad perspective on research goals and approaches in IR, and an understanding of the benefits and limitations of these research approaches. The tutorial will take place in two independent but interrelated parts, each focusing on a unique set of research approaches but with the same intended tutorial outcomes. These outcomes will be accomplished by deconstructing and analyzing our own published research papers, with further illustrations of each technique using the broader literature. By using our own research as anchors, we will provide insight about the research process, revealing the difficult choices and trade-offs researchers make when designing and conducting IR studies.
Choices and constraints: research goals and approaches in information retrieval (part 2) BIBAFull-Text 1284
  Diane Kelly; Filip Radlinski; Jaime Teevan
All research projects begin with a goal, for instance to describe search behavior, to predict when a person will enter a second query, or to discover which IR system performs the best. Different research goals suggest different research approaches, ranging from field studies to lab studies to online experimentation. This tutorial will provide an overview of the different types of research goals, common evaluation approaches used to address each type, and the constraints each approach entails. Participants will come away with a broad perspective on research goals and approaches in IR, and an understanding of the benefits and limitations of these research approaches. The tutorial will take place in two independent but interrelated parts, each focusing on a unique set of research approaches but with the same intended tutorial outcomes. These outcomes will be accomplished by deconstructing and analyzing our own published research papers, with further illustrations of each technique using the broader literature. By using our own research as anchors, we will provide insight about the research process, revealing the difficult choices and trade-offs researchers make when designing and conducting IR studies.
Scalability and efficiency challenges in large-scale web search engines BIBAFull-Text 1285
  B. Barla Cambazoglu; Ricardo Baeza-Yates
Large-scale web search engines rely on massive compute infrastructures to be able to cope with the continuous growth of the Web and their user bases. In such search engines, achieving scalability and efficiency requires making careful architectural design choices while devising algorithmic performance optimizations. Unfortunately, most details about the internal functioning of commercial web search engines remain undisclosed due to their financial value and the high level of competition in the search market. The main objective of this tutorial is to provide an overview of the fundamental scalability and efficiency challenges in commercial web search engines, bridging the existing gap between the industry and academia.
Statistical significance testing in information retrieval: theory and practice BIBAFull-Text 1286
  Ben Carterette
The past 20 years have seen a great improvement in the rigor of information retrieval experimentation, due primarily to two factors: high-quality, public, portable test collections such as those produced by TREC (the Text REtrieval Conference [2]), and the increased practice of statistical hypothesis testing to determine whether measured improvements can be ascribed to something other than random chance. Together these create a very useful standard for reviewers, program committees, and journal editors; work in information retrieval (IR) increasingly cannot be published unless it has been evaluated using a well-constructed test collection and shown to produce a statistically significant improvement over a good baseline.
   But, as the saying goes, any tool sharp enough to be useful is also sharp enough to be dangerous. Statistical tests of significance are widely misunderstood. Most researchers treat them as a "black box": evaluation results go in and a p-value comes out. Because significance is such an important factor in determining what research directions to explore and what is published, using p-values obtained without thought can have consequences for everyone doing research in IR. Ioannidis has argued that the main consequence in the biomedical sciences is that most published research findings are false; could that be the case in IR as well?
Speech search: techniques and tools for spoken content retrieval BIBFull-Text 1287
  Gareth J.F. Jones
Axiomatic analysis and optimization of information retrieval models BIBAFull-Text 1288
  Hui Fang; ChengXiang Zhai
The axiomatic approach provides a systematic way to think about heuristics, identify the weaknesses of existing methods, and optimize those methods accordingly. This tutorial aims to promote axiomatic thinking that can benefit not only the study of IR models but also the methods for many IR applications.
A general account of effectiveness metrics for information tasks: retrieval, filtering, and clustering BIBAFull-Text 1289
  Enrique Amigó; Julio Gonzalo; Stefano Mizzaro
In this tutorial we will present, review, and compare the most popular evaluation metrics for some of the most salient information-related tasks, covering: (i) Information Retrieval, (ii) Clustering, and (iii) Filtering. The tutorial will place special emphasis on the specification of constraints for suitable metrics in each of the three tasks, and on the systematic comparison of metrics according to such constraints. The last part of the tutorial will investigate the challenge of combining and weighting metrics.
Dynamic information retrieval modeling BIBAFull-Text 1290
  Hui Yang; Marc Sloan; Jun Wang
Dynamic aspects of Information Retrieval (IR), including changes found in data, users and systems, are increasingly being utilized in search engines and information filtering systems. Existing IR techniques are limited in their ability to optimize over changes, learn with a minimal computational footprint, and be responsive and adaptive. The objective of this tutorial is to provide a comprehensive and up-to-date introduction to Dynamic Information Retrieval Modeling, the statistical modeling of IR systems that can adapt to change. It will cover techniques ranging from classic relevance feedback to the latest applications of partially observable Markov decision processes (POMDPs), as well as a handful of useful algorithms and tools for solving IR problems incorporating dynamics.
The retrievability of documents BIBAFull-Text 1291
  Leif Azzopardi
Retrievability is an important and interesting indicator that can be used in a number of ways to analyse Information Retrieval systems and document collections. Rather than focusing solely on relevance, retrievability examines what is retrieved, how often it is retrieved, and whether a user is likely to retrieve a document or not. This is important because a document needs to be retrieved before it can be judged for relevance. In this tutorial, we shall explain the concept of retrievability along with a number of retrievability measures, how it can be estimated and how it can be used for analysis. Since retrieval precedes relevance, we shall also provide an overview of how retrievability relates to effectiveness -- describing some of the insights that researchers have discovered so far. We shall also show how retrievability relates to efficiency, and how the theory of retrievability can be used to improve both effectiveness and efficiency. Then we shall provide an overview of the different applications of retrievability such as Search Engine Bias, Corpus Profiling, etc., before wrapping up with challenges and opportunities. The final session of the day will look at example problems and ways to analyse and apply retrievability to other problems and domains.

Workshops

ERD'14: entity recognition and disambiguation challenge BIBFull-Text 1292
  David Carmel; Ming-Wei Chang; Evgeniy Gabrilovich; Bo-June (Paul) Hsu; Kuansan Wang
SIGIR 2014 workshop on gathering efficient assessments of relevance (GEAR) BIBAFull-Text 1293
  Martin Halvey; Robert Villa; Paul Clough
Evaluation is a fundamental part of Information Retrieval, and in the conventional Cranfield evaluation paradigm, sets of relevance assessments are a fundamental part of test collections. This workshop revisits how relevance assessments can be efficiently created, seeking to provide a forum for discussion and exploration of the topic.
MedIR14: medical information retrieval workshop BIBAFull-Text 1294
  Lorraine Goeuriot; Gareth J.F. Jones; Liadh Kelly; Henning Müller; Justin Zobel
Medical information is accessible from diverse sources including the general web, social media, journal articles, and hospital records; information searchers can be patients and their families, researchers, practitioners and clinicians. Challenges in medical information retrieval include: diversity of users and user knowledge and expertise; variations in the format, reliability, and quality of biomedical and medical information; the multi-modal nature of much of the data; and the need for accuracy and reliability of medical information. The aim of the workshop is to bring together researchers interested in medical information search with the goal of identifying specific challenges that need to be addressed to advance the state-of-the-art.
Privacy-preserving IR: when information retrieval meets privacy and security BIBAFull-Text 1295
  Luo Si; Hui Yang
Information retrieval (IR) and information privacy/security are two fast-growing computer science disciplines. There are many synergies and connections between these two disciplines. However, there have been very limited efforts to connect them. Moreover, due to the lack of mature techniques in privacy-preserving IR, concerns about information privacy and security have become serious obstacles that prevent valuable user data from being used in IR research such as studies on query logs, social media, tweets, sessions, and medical record retrieval. This privacy-preserving IR workshop aims to spur research that brings together the research fields of IR and privacy/security, and research that mitigates privacy threats in information retrieval by constructing novel algorithms and tools that enable web users to better understand associated privacy risks.
SIGIR 2014 workshop on semantic matching in information retrieval BIBAFull-Text 1296
  Julio Gonzalo; Hang Li; Alessandro Moschitti; Jun Xu
Recently, significant progress has been made in research on what we call semantic matching (SM), in web search, question answering, online advertisement, cross-language information retrieval, and other tasks. Advanced technologies based on machine learning have been developed. Let us take Web search as an example of the problem that also pervades the other tasks. When comparing the textual content of queries and documents, Web search still relies heavily on the term-based approach, where the relevance scores between queries and documents are calculated on the basis of the degree of matching between query terms and document terms. This simple approach works rather well in practice, partly because there are many other signals in web search (hypertext, user logs, etc.) that complement it. However, when considering the long tail of web searches, it can suffer from data sparseness, e.g., "Trenton" does not match "New Jersey Capital". Query-document mismatches occur when searcher and author use different terms (representations), and this phenomenon is prevalent due to the nature of human language.
SoMeRA 2014: social media retrieval and analysis workshop BIBAFull-Text 1297
  Markus Schedl; Peter Knees; Jialie Shen
The SoMeRA workshop targets cutting-edge research from all fields of retrieval, recommendation, and browsing in social media, as well as the analysis of users' multifaceted traces therein. Submissions to the workshop cover a broad range of topics including multimedia retrieval and exploration, user-aware recommender systems, network analysis, event detection, and computational linguistics.
SIGIR 2014 workshop on temporal, social and spatially-aware information access (#TAIA2014) BIBFull-Text 1298
  Fernando Diaz; Claudia Hauff; Vanessa Murdock; Maarten de Rijke; Milad Shokouhi