HCI Bibliography Home | HCI Conferences | IR Archive | Detailed Records | RefWorks | EndNote | Hide Abstracts
IR Tables of Contents: 010203040506070809101112131415

Proceedings of the 34th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Fullname:Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval
Editors:Wei-Ying Ma; Jian-Yun Nie; Ricardo Baeza-Yates; Tat-Seng Chua; W. Bruce Croft
Location:Beijing, China
Dates:2011-Jul-25 to 2011-Jul-29
Publisher:ACM
Standard No:ISBN: 1-4503-0757-4, 978-1-4503-0757-4; ACM DL: Table of Contents hcibib: IR11
Papers:238
Pages:1346
Links:Conference Home Page
  1. Keynote address 1
  2. Keynote address 2
  3. Users 1
  4. Query analysis I
  5. Learning to rank
  6. Personalization
  7. Retrieval models I
  8. Social media
  9. Content analysis
  10. Web IR
  11. Collaborative filtering I
  12. Users II
  13. Query analysis II
  14. Communities
  15. Classification
  16. Retrieval models II
  17. Image search
  18. Indexing
  19. Web queries
  20. Collaborative filtering II
  21. Latent semantic analysis
  22. Multimedia IR
  23. Summarization
  24. Vertical & entity search
  25. Query suggestions
  26. Linguistic analysis
  27. Clustering
  28. Effectiveness
  29. Multilingual IR
  30. Efficiency
  31. Recommender systems
  32. Test collections
  33. Posters presentations
  34. Demonstrations
  35. Tutorials
  36. Doctoral consortium
  37. Industrial track

Keynote address 1

Future of the web and search BIBAFull-Text 1-2
  Qi Lu
No one doubts that we have only scratched the surface of what is possible with the Web. The day is coming fast when the Web will become almost a virtual mind reader. Your intent, interests, and needs will be instantly perceived and the information you want will be promptly delivered -- whether you ask for it directly or not -- based on a deep understanding of the meaning of words in your query, knowledge of your preferences and patterns, what others have done before you, your location, and more. In this talk, I will share some of my thoughts about where the Web is heading and how search will be transformed to align to this new Web, laying out some specifics behind Microsoft's vision to empower people with knowledge.

Keynote address 2

Beyond search: statistical topic models for text analysis BIBAFull-Text 3-4
  ChengXiang Zhai
Search is generally a means to the end of finishing a task. While the current search engines are useful to users for finding relevant information, they offer little help to users for further digesting and analyzing the overwhelming found information needed for finishing a complex task. In this talk, I will discuss how statistical topic models can be used to help users analyze and digest the found relevant information and turn search results into actionable knowledge needed to complete a task. I will present several general statistical topic models for extracting and analyzing topics and their patterns in text, and show sample applications of such models in tasks such as opinion integration, comparative summarization, contextual topic trend analysis, and event impact analysis. The talk will conclude with a discussion of novel challenges raised in extending a search engine to an analysis engine that can go beyond search to provide more complete support for users to finish their tasks.

Users 1

Modeling and analysis of cross-session search tasks BIBAFull-Text 5-14
  Alexander Kotov; Paul N. Bennett; Ryen W. White; Susan T. Dumais; Jaime Teevan
The information needs of search engine users vary in complexity, depending on the task they are trying to accomplish. Some simple needs can be satisfied with a single query, whereas others require a series of queries issued over a longer period of time. While search engines effectively satisfy many simple needs, searchers receive little support when their information needs span session boundaries. In this work, we propose methods for modeling and analyzing user search behavior that extends over multiple search sessions. We focus on two problems: (i) given a user query, identify all of the related queries from previous sessions that the same user has issued, and (ii) given a multi-query task for a user, predict whether the user will return to this task in the future. We model both problems within a classification framework that uses features of individual queries and long-term user search behavior at different granularity. Experimental evaluation of the proposed models for both tasks indicates that it is possible to effectively model and analyze cross-session search behavior. Our findings have implications for improving search for complex information needs and designing search engine features to support cross-session search tasks.
The economics in interactive information retrieval BIBAFull-Text 15-24
  Leif Azzopardi
Searching is inherently an interactive process usually requiring numerous iterations of querying and assessing in order to find the desired amount of relevant information. Essentially, the search process can be viewed as a combination of inputs (queries and assessments) which are used to "produce" output (relevance). Under this view, it is possible to adapt microeconomic theory to analyze and understand the dynamics of Interactive Information Retrieval. In this paper, we approach the search process as an economics problem and conduct extensive simulations on TREC test collections analyzing various combinations of inputs in the "production" of relevance. The analysis reveals that the total Cumulative Gain (output) obtained during the course of a search session is functionally related to querying and assessing (inputs), and this can be characterized mathematically by the Cobbs-Douglas production function. Further analysis using cost models, that are grounded using cognitive load as the cost, reveals which search strategies minimize the cost of interaction for a given level of output. This paper demonstrates how economics can be applied to formally model the search process. This development establishes the theoretical foundations of Interactive Information Retrieval, providing numerous directions for empirical experimentation that are motivated directly from theory.
Seeding simulated queries with user-study data for personal search evaluation BIBAFull-Text 25-34
  David Elsweiler; David E. Losada; José C. Toucedo; Ronald T. Fernandez
In this paper we perform a lab-based user study (n=21) of email re-finding behaviour, examining how the characteristics of submitted queries change in different situations. A number of logistic regression models are developed on the query data to explore the relationship between user- and contextual- variables and query characteristics including length, field submitted to and use of named entities. We reveal several interesting trends and use the findings to seed a simulated evaluation of various retrieval models. Not only is this an enhancement of existing evaluation methods for Personal Search, but the results show that different models are more effective in different situations, which has implications both for the design of email search tools and for the way algorithms for Personal Search are evaluated.
Understanding re-finding behavior in naturalistic email interaction logs BIBAFull-Text 35-44
  David Elsweiler; Morgan Harvey; Martin Hacker
In this paper we present a longitudinal, naturalistic study of email behavior (n=47) and describe our efforts at isolating re-finding behavior in the logs through various qualitative and quantitative analyses. The presented work underlines the methodological challenges faced with this kind of research, but demonstrates that it is possible to isolate re-finding behavior from email interaction logs with reasonable accuracy. Using the approaches developed we uncover interesting aspects of email re-finding behavior that have so far been impossible to study, such as how various features of email-clients are used in re-finding and the difficulties people encounter when using these. We explain how our findings could influence the design of email-clients and outline our thoughts on how future, more in depth analyses, can build on the work presented here to achieve a fuller understanding of email behavior and the support that people need.

Query analysis I

People searching for people: analysis of a people search engine log BIBAFull-Text 45-54
  Wouter Weerkamp; Richard Berendsen; Bogomil Kovachev; Edgar Meij; Krisztian Balog; Maarten de Rijke
Recent years show an increasing interest in vertical search: searching within a particular type of information. Understanding what people search for in these "verticals" gives direction to research and provides pointers for the search engines themselves. In this paper we analyze the search logs of one particular vertical: people search engines. Based on an extensive analysis of the logs of a search engine geared towards finding people, we propose a classification scheme for people search at three levels: (a) queries, (b) sessions, and (c) users. For queries, we identify three types, (i) event-based high-profile queries (people that become "popular" because of an event happening), (ii) regular high-profile queries (celebrities), and (iii) low-profile queries (other, less-known people). We present experiments on automatic classification of queries. On the session level, we observe five types: (i) family sessions (users looking for relatives), (ii) event sessions (querying the main players of an event), (iii) spotting sessions (trying to "spot" different celebrities online), (iv) polymerous sessions (sessions without a clear relation between queries), and (v) repetitive sessions (query refinement and copying). Finally, for users we identify four types: (i) monitors, (ii) spotters, (iii) followers, and (iv) polymers.
   Our findings not only offer insight into search behavior in people search engines, but they are also useful to identify future research directions and to provide pointers for search engine improvements.
Learning search tasks in queries and web pages via graph regularization BIBAFull-Text 55-64
  Ming Ji; Jun Yan; Siyu Gu; Jiawei Han; Xiaofei He; Wei Vivian Zhang; Zheng Chen
As the Internet grows explosively, search engines play a more and more important role for users in effectively accessing online information. Recently, it has been recognized that a query is often triggered by a search task that the user wants to accomplish. Similarly, many web pages are specifically designed to help accomplish a certain task. Therefore, learning hidden tasks behind queries and web pages can help search engines return the most useful web pages to users by task matching. For instance, the search task that triggers query "thinkpad T410 broken" is to maintain a computer, and it is desirable for a search engine to return the Lenovo troubleshooting page on the top of the list. However, existing search engine technologies mainly focus on topic detection or relevance ranking, which are not able to predict the task that triggers a query and the task a web page can accomplish.
   In this paper, we propose to simultaneously classify queries and web pages into the popular search tasks by exploiting their content together with click-through logs. Specifically, we construct a task-oriented heterogeneous graph among queries and web pages. Each pair of objects in the graph are linked together as long as they potentially share similar search tasks. A novel graph-based regularization algorithm is designed for search task prediction by leveraging the graph. Extensive experiments in real search log data demonstrate the effectiveness of our method over state-of-the-art classifiers, and the search performance can be significantly improved by using the task prediction results as additional information.
Intentions and attention in exploratory health search BIBAFull-Text 65-74
  Marc-Allen Cartright; Ryen W. White; Eric Horvitz
We study information goals and patterns of attention in exploratory search for health information on the Web, reporting results of a large-scale log-based study. We examine search activity associated with the goal of diagnosing illness from symptoms versus more general information-seeking about health and illness. We decompose exploratory health search into evidence-based and hypothesis-directed information seeking. Evidence-based search centers on the pursuit of details and relevance of signs and symptoms. Hypothesis-directed search includes the pursuit of content on one or more illnesses, including risk factors, treatments, and therapies for illnesses, and on the discrimination among different diseases under the uncertainty that exists in advance of a confirmed diagnosis. These different goals of exploratory health search are not independent, and transitions can occur between them within or across search sessions. We construct a classifier that identifies medically-related search sessions in log data. Given a set of search sessions flagged as health-related, we show how we can identify different intentions persisting as foci of attention within those sessions. Finally, we discuss how insights about foci dynamics can help us better understand exploratory health search behavior and better support health search on the Web.
User behavior in zero-recall eCommerce queries BIBAFull-Text 75-84
  Gyanit Singh; Nish Parikh; Neel Sundaresn
User expectation and experience for web search and eCommerce (product) search are quite different. Product descriptions are concise as compared to typical web documents. User expectation is more specific to find the right product. The difference in the publisher and searcher vocabulary (in case of product search the seller and the buyer vocabulary) combined with the fact that there are fewer products to search over than web documents result in observable numbers of searches that return no results (zero recall searches). In this paper we describe a study of zero recall searches. Our study is focused on eCommerce search and uses data from a leading eCommerce site's user click stream logs. There are 3 main contributions of our study: 1) The cause of zero recall searches; 2) A study of user's reaction and recovery from zero recall; 3) A study of differences in behavior of power users versus novice users to zero recall searches.

Learning to rank

Bagging gradient-boosted trees for high precision, low variance ranking models BIBAFull-Text 85-94
  Yasser Ganjisaffar; Rich Caruana; Cristina Videira Lopes
Recent studies have shown that boosting provides excellent predictive performance across a wide variety of tasks. In Learning-to-rank, boosted models such as RankBoost and LambdaMART have been shown to be among the best performing learning methods based on evaluations on public data sets. In this paper, we show how the combination of bagging as a variance reduction technique and boosting as a bias reduction technique can result in very high precision and low variance ranking models. We perform thousands of parameter tuning experiments for LambdaMART to achieve a high precision boosting model. Then we show that a bagged ensemble of such LambdaMART boosted models results in higher accuracy ranking models while also reducing variance as much as 50%. We report our results on three public learning-to-rank data sets using four metrics. Bagged LamdbaMART outperforms all previously reported results on ten of the twelve comparisons, and bagged LambdaMART outperforms non-bagged LambdaMART on all twelve comparisons. For example, wrapping bagging around LambdaMART increases NDCG@1 from 0.4137 to 0.4200 on the MQ2007 data set; the best prior results in the literature for this data set is 0.4134 by RankBoost.
Learning to rank for freshness and relevance BIBAFull-Text 95-104
  Na Dai; Milad Shokouhi; Brian D. Davison
Freshness of results is important in modern web search. Failing to recognize the temporal aspect of a query can negatively affect the user experience, and make the search engine appear stale. While freshness and relevance can be closely related for some topics (e.g., news queries), they are more independent in others (e.g., time insensitive queries). Therefore, optimizing one criterion does not necessarily improve the other, and can even do harm in some cases.
   We propose a machine-learning framework for simultaneously optimizing freshness and relevance, in which the trade-off is automatically adaptive to query temporal characteristics. We start by illustrating different temporal characteristics of queries, and the features that can be used for capturing these properties. We then introduce our supervised framework that leverages the temporal profile of queries (inferred from pseudo-feedback documents) along with the other ranking features to improve both freshness and relevance of search results. Our experiments on a large archival web corpus demonstrate the efficacy of our techniques.
A cascade ranking model for efficient ranked retrieval BIBAFull-Text 105-114
  Lidan Wang; Jimmy Lin; Donald Metzler
There is a fundamental tradeoff between effectiveness and efficiency when designing retrieval models for large-scale document collections. Effectiveness tends to derive from sophisticated ranking functions, such as those constructed using learning to rank, while efficiency gains tend to arise from improvements in query evaluation and caching strategies. Given their inherently disjoint nature, it is difficult to jointly optimize effectiveness and efficiency in end-to-end systems. To address this problem, we formulate and develop a novel cascade ranking model, which unlike previous approaches, can simultaneously improve both top k ranked effectiveness and retrieval efficiency. The model constructs a cascade of increasingly complex ranking functions that progressively prunes and refines the set of candidate documents to minimize retrieval latency and maximize result set quality. We present a novel boosting algorithm for learning such cascades to directly optimize the tradeoff between effectiveness and efficiency. Experimental results show that our cascades are faster and return higher quality results than comparable ranking models.
Relevant knowledge helps in choosing right teacher: active query selection for ranking adaptation BIBAFull-Text 115-124
  Peng Cai; Wei Gao; Aoying Zhou; Kam-Fai Wong
Learning to adapt in a new setting is a common challenge to our knowledge and capability. New life would be easier if we actively pursued supervision from the right mentor chosen with our relevant but limited prior knowledge. This variant principle of active learning seems intuitively useful to many domain adaptation problems. In this paper, we substantiate its power for advancing automatic ranking adaptation, which is important in web search since it's prohibitive to gather enough labeled data for every search domain for fully training domain-specific rankers. For the cost-effectiveness, it is expected that only those most informative instances in target domain are collected to annotate while we can still utilize the abundant ranking knowledge in source domain. We propose a unified ranking framework to mutually reinforce the active selection of informative target-domain queries and the appropriate weighting of source training data as related prior knowledge. We select to annotate those target queries whose documents' order most disagrees among the members of a committee built on the mixture of source training data and the already selected target data. Then the replenished labeled set is used to adjust the importance of source queries for enhancing their rank transfer. This procedure iterates until labeling budget exhausts. Based on LETOR3.0 and Yahoo! Learning to Rank Challenge data sets, our approach significantly outperforms the random query annotation commonly used in ranking adaptation and the active rank learner on target-domain data only.

Personalization

SCENE: a scalable two-stage personalized news recommendation system BIBAFull-Text 125-134
  Lei Li; Dingding Wang; Tao Li; Daniel Knox; Balaji Padmanabhan
Recommending news articles has become a promising research direction as the Internet provides fast access to real-time information from multiple sources around the world. Traditional news recommendation systems strive to adapt their services to individual users by virtue of both user and news content information. However, the latent relationships among different news items, and the special properties of new articles, such as short shelf lives and value of immediacy, render the previous approaches inefficient.
   In this paper, we propose a scalable two-stage personalized news recommendation approach with a two-level representation, which considers the exclusive characteristics (e.g., news content, access patterns, named entities, popularity and recency) of news items when performing recommendation. Also, a principled framework for news selection based on the intrinsic property of user interest is presented, with a good balance between the novelty and diversity of the recommended result. Extensive empirical experiments on a collection of news articles obtained from various news websites demonstrate the efficacy and efficiency of our approach.
Inferring and using location metadata to personalize web search BIBAFull-Text 135-144
  Paul N. Bennett; Filip Radlinski; Ryen W. White; Emine Yilmaz
Personalization of search results offers the potential for significant improvements in Web search. Among the many observable user attributes, approximate user location is particularly simple for search engines to obtain and allows personalization even for a first-time Web search user. However, acting on user location information is difficult, since few Web documents include an address that can be interpreted as constraining the locations where the document is relevant. Furthermore, many Web documents -- such as local news stories, lottery results, and sports team fan pages -- may not correspond to physical addresses, but the location of the user still plays an important role in document relevance. In this paper, we show how to infer a more general location relevance which uses not only physical location but a more general notion of locations of interest for Web pages. We compute this information using implicit user behavioral data, characterize the most location-centric pages, and show how location information can be incorporated into Web search ranking. Our results show that a substantial fraction of Web search queries can be significantly improved by incorporating location-based features.
Active learning to maximize accuracy vs. effort in interactive information retrieval BIBAFull-Text 145-154
  Aibo Tian; Matthew Lease
We consider an interactive information retrieval task in which the user is interested in finding several to many relevant documents with minimal effort. Given an initial document ranking, user interaction with the system produces relevance feedback (RF) which the system then uses to revise the ranking. This interactive process repeats until the user terminates the search. To maximize accuracy relative to user effort, we propose an active learning strategy. At each iteration, the document whose relevance is maximally uncertain to the system is slotted high into the ranking in order to obtain user feedback for it. Simulated feedback on the Robust04 TREC collection shows our active learning approach dominates several standard RF baselines relative to the amount of feedback provided by the user. Evaluation on Robust04 under noisy feedback and on LETOR collections further demonstrate the effectiveness of active learning, as well as value of negative feedback in this task scenario.

Retrieval models I

CRTER: using cross terms to enhance probabilistic information retrieval BIBAFull-Text 155-164
  Jiashu Zhao; Jimmy Xiangji Huang; Ben He
Term proximity retrieval rewards a document where the matched query terms occur close to each other. Although term proximity is known to be effective in many Information Retrieval (IR) applications, the within-document distribution of each individual query term and how the query terms associate with each other, are not fully considered. In this paper, we introduce a pseudo term, namely Cross Term, to model term proximity for boosting retrieval performance. An occurrence of a query term is assumed to have an impact towards its neighboring text, which gradually weakens with the increase of the distance to the place of occurrence. We use a shape function to characterize such an impact. A Cross Term occurs when two query terms appear close to each other and their impact shape functions have an intersection. We propose a Cross Term Retrieval (CRTER) model that combines the Cross Terms' information with basic probabilistic weighting models to rank the retrieved documents. Extensive experiments on standard TREC collections illustrate the effectiveness of our proposed CRTER model.
A boosting approach to improving pseudo-relevance feedback BIBAFull-Text 165-174
  Yuanhua Lv; ChengXiang Zhai; Wan Chen
Pseudo-relevance feedback has proven effective for improving the average retrieval performance. Unfortunately, many experiments have shown that although pseudo-relevance feedback helps many queries, it also often hurts many other queries, limiting its usefulness in real retrieval applications. Thus an important, yet difficult challenge is to improve the overall effectiveness of pseudo-relevance feedback without sacrificing the performance of individual queries too much. In this paper, we propose a novel learning algorithm, FeedbackBoost, based on the boosting framework to improve pseudo-relevance feedback through optimizing the combination of a set of basis feedback algorithms using a loss function defined to directly measure both robustness and effectiveness. FeedbackBoost can potentially accommodate many basis feedback methods as features in the model, making the proposed method a general optimization framework for pseudo-relevance feedback. As an application, we apply FeedbackBoost to improve pseudo feedback based on language models through combining different document weighting strategies. The experiment results demonstrate that FeedbackBoost can achieve better average precision and meanwhile dramatically reduce the number and magnitude of feedback failures as compared to three representative pseudo feedback methods and a standard learning to rank approach for pseudo feedback.
Enhancing ad-hoc relevance weighting using probability density estimation BIBAFull-Text 175-184
  Xiaofeng Zhou; Jimmy Xiangji Huang; Ben He
Classical probabilistic information retrieval (IR) models, e.g. BM25, deal with document length based on a trade-off between the Verbosity hypothesis, which assumes the independence of a document's relevance of its length, and the Scope hypothesis, which assumes the opposite. Despite the effectiveness of the classical probabilistic models, the potential relationship between document length and relevance is not fully explored to improve retrieval performance. In this paper, we conduct an in-depth study of this relationship based on the Scope hypothesis that document length does have its impact on relevance. We study a list of probability density functions and examine which of the density functions fits the best to the actual distribution of the document length. Based on the studied probability density functions, we propose a length-based BM25 relevance weighting model, called BM25L, which incorporates document length as a substantial weighting factor. Extensive experiments conducted on standard TREC collections show that our proposed BM25L markedly outperforms the original BM25 model, even if the latter is optimized.

Social media

Who should share what?: item-level social influence prediction for users and posts ranking BIBAFull-Text 185-194
  Peng Cui; Fei Wang; Shaowei Liu; Mingdong Ou; Shiqiang Yang; Lifeng Sun
People and information are two core dimensions in a social network. People sharing information (such as blogs, news, albums, etc.) is the basic behavior. In this paper, we focus on predicting item-level social influence to answer the question Who should share What, which can be extended into two information retrieval scenarios: (1) Users ranking: given an item, who should share it so that its diffusion range can be maximized in a social network; (2) Web posts ranking: given a user, what should she share to maximize her influence among her friends. We formulate the social influence prediction problem as the estimation of a user-post matrix, in which each entry represents the strength of influence of a user given a web post. We propose a Hybrid Factor Non-Negative Matrix Factorization (HF-NMF) approach for item-level social influence modeling, and devise an efficient projected gradient method to solve the HF-NMF problem. Intensive experiments are conducted and demonstrate the advantages and characteristics of the proposed method.
Mining tags using social endorsement networks BIBAFull-Text 195-204
  Theodoros Lappas; Kunal Punera; Tamas Sarlos
Entities on social systems, such as users on Twitter, and images on Flickr, are at the core of many interesting applications: they can be ranked in search results, recommended to users, or used in contextual advertising. Such applications assume knowledge of an entity's nature and characteristic attributes. An effective way to encode such knowledge is in the form of tags. An untagged entity is practically inaccessible, since it is hard to retrieve or interact with. To address this, some platforms allow users to manually tag entities. However, while such tags can be informative, they can oftentimes be inadequate, trivial, ambiguous, or even plain false. Numerous automated tagging methods have been proposed to address these issues. However, most of them require pre-existing high-quality tags or descriptive texts for every entity that needs to be tagged. In our work, we propose a method based on social endorsements that is free from such constraints.
   Virtually every major social networking platform allows users to endorse entities that they find appealing. Examples include "following" Twitter users or "favoriting" Flickr photos. These endorsements are abundant and directly capture the preferences of users. In this paper, we pose and solve the problem of using the underlying social endorsement network to extract useful tags for entities in a social system. Our work leverages techniques from topic modeling to capture the interests of users and then uses them to extract relevant and descriptive tags for the entities they endorse. We perform an extensive evaluation of our proposed approach on real large-scale datasets from both Twitter and Flickr, and show that it significantly outperforms meaningful and competitive baselines.
Crowdsourcing for book search evaluation: impact of hit design on comparative system ranking BIBAFull-Text 205-214
  Gabriella Kazai; Jaap Kamps; Marijn Koolen; Natasa Milic-Frayling
The evaluation of information retrieval (IR) systems over special collections, such as large book repositories, is out of reach of traditional methods that rely upon editorial relevance judgments. Increasingly, the use of crowdsourcing to collect relevance labels has been regarded as a viable alternative that scales with modest costs. However, crowdsourcing suffers from undesirable worker practices and low quality contributions. In this paper we investigate the design and implementation of effective crowdsourcing tasks in the context of book search evaluation. We observe the impact of aspects of the Human Intelligence Task (HIT) design on the quality of relevance labels provided by the crowd. We assess the output in terms of label agreement with a gold standard data set and observe the effect of the crowdsourced relevance judgments on the resulting system rankings. This enables us to observe the effect of crowdsourcing on the entire IR evaluation process. Using the test set and experimental runs from the INEX 2010 Book Track, we find that varying the HIT design, and the pooling and document ordering strategies leads to considerable differences in agreement with the gold set labels. We then observe the impact of the crowdsourced relevance label sets on the relative system rankings using four IR performance metrics. System rankings based on MAP and Bpref remain less affected by different label sets while the Precision@10 and nDCG@10 lead to dramatically different system rankings, especially for labels acquired from HITs with weaker quality controls. Overall, we find that crowdsourcing can be an effective tool for the evaluation of IR systems, provided that care is taken when designing the HITs.

Content analysis

A site oriented method for segmenting web pages BIBAFull-Text 215-224
  David Fernandes; Edleno Silva de Moura; Altigran Soares da Silva; Berthier Ribeiro-Neto; Edisson Braga
Information about how to segment a Web page can be used nowadays by applications such as segment aware Web search, classification and link analysis. In this research, we propose a fully automatic method for page segmentation and evaluate its application through experiments with four separate Web sites. While the method may be used in other applications, our main focus in this article is to use it as input to segment aware Web search systems. Our results indicate that the proposed method produces better segmentation results when compared to the best segmentation method we found in literature. Further, when applied as input to a segment aware Web search method, it produces results close to those produced when using a manual page segmentation method.
Composite hashing with multiple information sources BIBAFull-Text 225-234
  Dan Zhang; Fei Wang; Luo Si
Similarity search applications with a large amount of text and image data demands an efficient and effective solution. One useful strategy is to represent the examples in databases as compact binary codes through semantic hashing, which has attracted much attention due to its fast query/search speed and drastically reduced storage requirement. All of the current semantic hashing methods only deal with the case when each example is represented by one type of features. However, examples are often described from several different information sources in many real world applications. For example, the characteristics of a webpage can be derived from both its content part and its associated links.
   To address the problem of learning good hashing codes in this scenario, we propose a novel research problem -- Composite Hashing with Multiple Information Sources (CHMIS). The focus of the new research problem is to design an algorithm for incorporating the features from different information sources into the binary hashing codes efficiently and effectively. In particular, we propose an algorithm CHMIS-AW (CHMIS with Adjusted Weights) for learning the codes. The proposed algorithm integrates information from several different sources into the binary hashing codes by adjusting the weights on each individual source for maximizing the coding performance, and enables fast conversion from query examples to their binary hashing codes. Experimental results on five different datasets demonstrate the superior performance of the proposed method against several other state-of-the-art semantic hashing techniques.
Detecting outlier sections in us congressional legislation BIBAFull-Text 235-244
  Elif Aktolga; Irene Ros; Yannick Assogba
Reading congressional legislation, also known as bills, is often tedious because bills tend to be long and written in complex language. In IBM Many Bills, an interactive web-based visualization of legislation, users of different backgrounds can browse bills and quickly explore parts that are of interest to them. One task users have is to be able to locate sections that don't seem to fit with the overall topic of the bill. In this paper, we present novel techniques to determine which sections within a bill are likely to be outliers by employing approaches from information retrieval. The most promising techniques first detect the most topically relevant parts of a bill by ranking its sections, followed by a comparison between these topically relevant parts and the remaining sections in the bill. To compare sections we use various dissimilarity metrics based on Kullback-Leibler Divergence. The results indicate that these techniques are more successful than a classification based approach. Finally, we analyze how the dissimilarity metrics succeed in discriminating between sections that are strong outliers versus those that are 'milder' outliers.
DOM based content extraction via text density BIBAFull-Text 245-254
  Fei Sun; Dandan Song; Lejian Liao
In addition to the main content, most web pages also contain navigation panels, advertisements and copyright and disclaimer notices. This additional content, which is also known as noise, is typically not related to the main subject and may hamper the performance of web data mining, and hence needs to be removed properly. In this paper, we present Content Extraction via Text Density (CETD) a fast, accurate and general method for extracting content from diverse web pages, and using DOM (Document Object Model) node text density to preserve the original structure. For this purpose, we introduce two concepts to measure the importance of nodes: Text Density and Composite Text Density. In order to extract content intact, we propose a technique called DensitySum to replace Data Smoothing. The approach was evaluated with the CleanEval benchmark and with randomly selected pages from well-known websites, where various web domains and styles are tested. The average F1-scores with our method were 8.79% higher than the best scores among several alternative methods.

Web IR

Social context summarization BIBAFull-Text 255-264
  Zi Yang; Keke Cai; Jie Tang; Li Zhang; Zhong Su; Juanzi Li
We study a novel problem of social context summarization for Web documents. Traditional summarization research has focused on extracting informative sentences from standard documents. With the rapid growth of online social networks, abundant user generated content (e.g., comments) associated with the standard documents is available. Which parts in a document are social users really caring about? How can we generate summaries for standard documents by considering both the informativeness of sentences and interests of social users? This paper explores such an approach by modeling Web documents and social contexts into a unified framework. We propose a dual wing factor graph (DWFG) model, which utilizes the mutual reinforcement between Web documents and their associated social contexts to generate summaries. An efficient algorithm is designed to learn the proposed factor graph model.
   Experimental results on a Twitter data set validate the effectiveness of the proposed model. By leveraging the social context information, our approach obtains significant improvement (averagely +5.0%-17.3%) over several alternative methods (CRF, SVM, LR, PR, and DocLead) on the performance of summarization.
Probabilistic factor models for web site recommendation BIBAFull-Text 265-274
  Hao Ma; Chao Liu; Irwin King; Michael R. Lyu
Due to the prevalence of personalization and information filtering applications, modeling users' interests on the Web has become increasingly important during the past few years. In this paper, aiming at providing accurate personalized Web site recommendations for Web users, we propose a novel probabilistic factor model based on dimensionality reduction techniques. We also extend the proposed method to collective probabilistic factor modeling, which further improves model performance by incorporating heterogeneous data sources. The proposed method is general, and can be applied to not only Web site recommendations, but also a wide range of Web applications, including behavioral targeting, sponsored search, etc. The experimental analysis on Web site recommendation shows that our method outperforms other traditional recommendation approaches. Moreover, the complexity analysis indicates that our approach can be applied to very large datasets since it scales linearly with the number of observations.
Efficiently collecting relevance information from clickthroughs for web retrieval system evaluation BIBAFull-Text 275-284
  Jing He; Wayne Xin Zhao; Baihan Shu; Xiaoming Li; Hongfei Yan
Various click models have been recently proposed as a principled approach to infer the relevance of documents from the clickthrough data. The inferred document relevance is potentially useful in evaluating the Web retrieval systems. In practice, it generally requires to acquire the accurate evaluation results within minimal users' query submissions. This problem is important for speeding up search engine development and evaluation cycle and acquiring reliable evaluation results on tail queries. In this paper, we propose a reordering framework for efficient evaluation problem in the context of clickthrough based Web retrieval evaluation. The main idea is to move up the documents that contribute more for the evaluation task. In this framework, we propose four intuitions and formulate them as an optimization problem. Both user study and TREC data based experiments validate that the reordering framework results in much fewer query submissions to get accurate evaluation results with only a little harm to the users' utility.
Unsupervised query segmentation using clickthrough for information retrieval BIBAFull-Text 285-294
  Yanen Li; Bo-Jun Paul Hsu; ChengXiang Zhai; Kuansan Wang
Query segmentation is an important task toward understanding queries accurately, which is essential for improving search results. Existing segmentation models either use labeled data to predict the segmentation boundaries, for which the training data is expensive to collect, or employ unsupervised strategy based on a large text corpus, which might be inaccurate because of the lack of relevant information. In this paper, we propose a probabilistic model to exploit clickthrough data for query segmentation, where the model parameters are estimated via an efficient EM algorithm. We further study how to properly interpret the segmentation results and utilize them to improve retrieval accuracy. Specifically, we propose an integrated language model based on the standard bigram language model to exploit the probabilistic structure obtained through query segmentation. Experiment results on two datasets show that our segmentation model outperforms existing segmentation models. Furthermore, extensive experiments on a large retrieval dataset reveals that the results of query segmentation can be leveraged to improve retrieval relevance by using the proposed integrated language model.

Collaborative filtering I

Collaborative competitive filtering: learning recommender using context of user choice BIBAFull-Text 295-304
  Shuang-Hong Yang; Bo Long; Alexander J. Smola; Hongyuan Zha; Zhaohui Zheng
While a user's preference is directly reflected in the interactive choice process between her and the recommender, this wealth of information was not fully exploited for learning recommender models. In particular, existing collaborative filtering (CF) approaches take into account only the binary events of user actions but totally disregard the contexts in which users' decisions are made. In this paper, we propose Collaborative Competitive Filtering (CCF), a framework for learning user preferences by modeling the choice process in recommender systems. CCF employs a multiplicative latent factor model to characterize the dyadic utility function. But unlike CF, CCF models the user behavior of choices by encoding a local competition effect. In this way, CCF allows us to leverage dyadic data that was previously lumped together with missing data in existing CF models. We present two formulations and an efficient large scale optimization algorithm. Experiments on three real-world recommendation data sets demonstrate that CCF significantly outperforms standard CF approaches in both offline and online evaluations.
CLR: a collaborative location recommendation framework based on co-clustering BIBAFull-Text 305-314
  Kenneth Wai-Ting Leung; Dik Lun Lee; Wang-Chien Lee
GPS data tracked on mobile devices contains rich information about human activities and preferences. In this paper, GPS data is used in location-based services (LBSs) to provide collaborative location recommendations. We observe that most existing LBSs provide location recommendations by clustering the User-Location matrix. Since the User-Location matrix created based on GPS data is huge, there are two major problems with these methods. First, the number of similar locations that need to be considered in computing the recommendations can be numerous. As a result, the identification of truly relevant locations from numerous candidates is challenging. Second, the clustering process on large matrix is time consuming. Thus, when new GPS data arrives, complete re-clustering of the whole matrix is infeasible. To tackle these two problems, we propose the Collaborative Location Recommendation (CLR) framework for location recommendation. By considering activities (i.e., temporal preferences) and different user classes (i.e., Pattern Users, Normal Users, and Travelers) in the recommendation process, CLR is capable of generating more precise and refined recommendations to the users compared to the existing methods. Moreover, CLR employs a dynamic clustering algorithm CADC to cluster the trajectory data into groups of similar users, similar activities and similar locations efficiently by supporting incremental update of the groups when new GPS trajectory data arrives. We evaluate CLR with a real-world GPS dataset, and confirm that the CLR framework provides more accurate location recommendations compared to the existing methods.
Functional matrix factorizations for cold-start recommendation BIBAFull-Text 315-324
  Ke Zhou; Shuang-Hong Yang; Hongyuan Zha
A key challenge in recommender system research is how to effectively profile new users, a problem generally known as cold-start recommendation. Recently the idea of progressively querying user responses through an initial interview process has been proposed as a useful new user preference elicitation strategy. In this paper, we present functional matrix factorization (fMF), a novel cold-start recommendation method that solves the problem of initial interview construction within the context of learning user and item profiles. Specifically, fMF constructs a decision tree for the initial interview with each node being an interview question, enabling the recommender to query a user adaptively according to her prior responses. More importantly, we associate latent profiles for each node of the tree -- in effect restricting the latent profiles to be a function of possible answers to the interview questions -- which allows the profiles to be gradually refined through the interview process based on user responses. We develop an iterative optimization algorithm that alternates between decision tree construction and latent profiles extraction as well as a regularization scheme that takes into account of the tree structure. Experimental results on three benchmark recommendation data sets demonstrate that the proposed fMF algorithm significantly outperforms existing methods for cold-start recommendation.
Exploiting geographical influence for collaborative point-of-interest recommendation BIBAFull-Text 325-334
  Mao Ye; Peifeng Yin; Wang-Chien Lee; Dik-Lun Lee
In this paper, we aim to provide a point-of-interests (POI) recommendation service for the rapid growing location-based social networks (LBSNs), e.g., Foursquare, Whrrl, etc. Our idea is to explore user preference, social influence and geographical influence for POI recommendations. In addition to deriving user preference based on user-based collaborative filtering and exploring social influence from friends, we put a special emphasis on geographical influence due to the spatial clustering phenomenon exhibited in user check-in activities of LBSNs. We argue that the geographical influence among POIs plays an important role in user check-in behaviors and model it by power law distribution. Accordingly, we develop a collaborative recommendation algorithm based on geographical influence based on naive Bayesian. Furthermore, we propose a unified POI recommendation framework, which fuses user preference to a POI with social influence and geographical influence. Finally, we conduct a comprehensive performance evaluation over two large-scale datasets collected from Foursquare and Whrrl. Experimental results with these real datasets show that the unified collaborative recommendation approach significantly outperforms a wide spectrum of alternative recommendation approaches.

Users II

Why searchers switch: understanding and predicting engine switching rationales BIBAFull-Text 335-344
  Qi Guo; Ryen W. White; Yunqiao Zhang; Blake Anderson; Susan T. Dumais
Search engine switching is the voluntary transition between Web search engines. Engine switching can occur for a number of reasons, including user dissatisfaction with search results, a desire for broader topic coverage or verification, user preferences, or even unintentionally. An improved understanding of switching rationales allows search providers to tailor the search experience according to the different causes. In this paper we study the reasons behind search engine switching within a session. We address the challenge of identifying switching rationales by designing and implementing client-side instrumentation to acquire in-situ feedbacks from users. Using this feedback, we investigate in detail the reasons that users switch engines within a session. We also study the relationship between implicit behavioral signals and the switching causes, and develop and evaluate models to predict the reasons for switching. In addition, we collect editorial judgments of switching rationales by third-party judges and show that we can recover switching causes a posteriori. Our findings provide valuable insights into why users switch search engines in a session and demonstrate the relationship between search behavior and switching motivations. The findings also reveal sufficient behavioral consistency to afford accurate prediction of switching rationale, which can be used to dynamically adapt the search experience and derive more accurate competitive metrics.
Find it if you can: a game for modeling different types of web search success using interaction data BIBAFull-Text 345-354
  Mikhail Ageev; Qi Guo; Dmitry Lagun; Eugene Agichtein
A better understanding of strategies and behavior of successful searchers is crucial for improving the experience of all searchers. However, research of search behavior has been struggling with the tension between the relatively small-scale, but controlled lab studies, and the large-scale log-based studies where the searcher intent and many other important factors have to be inferred. We present our solution for performing controlled, yet realistic, scalable, and reproducible studies of searcher behavior. We focus on difficult informational tasks, which tend to frustrate many users of the current web search technology. First, we propose a principled formalization of different types of "success" for informational search, which encapsulate and sharpen previously proposed models. Second, we present a scalable game-like infrastructure for crowdsourcing search behavior studies, specifically targeted towards capturing and evaluating successful search strategies on informational tasks with known intent. Third, we report our analysis of search success using these data, which confirm and extends previous findings. Finally, we demonstrate that our model can predict search success more effectively than the existing state-of-the-art methods, on both our data and on a different set of log data collected from regular search engine sessions. Together, our search success models, the data collection infrastructure, and the associated behavior analysis techniques, significantly advance the study of success in web search.
Measuring improvement in user search performance resulting from optimal search tips BIBAFull-Text 355-364
  Neema Moraveji; Daniel Russell; Jacob Bien; David Mease
Web search performance can be improved by either improving the search engine itself or by educating the user to search more efficiently. There is a large amount of literature describing techniques for measuring the former; whereas, improvements resulting from the latter are more difficult to quantify. In this paper we demonstrate an experimental methodology that proves to successfully quantify improvements from user education. The user education in our study is realized in the form of tactical search feature tips that expand user awareness of task-relevant tools and features of the search application. Initially, these tips are presented in an idealized situation: each tip is shown at the same time as the study participants are given a task that is constructed to benefit from the specific tip. However, we also present a follow-up study roughly one week later in which the search tips are no longer presented but the study participants who previously were shown search tips still demonstrate improved search efficiency compared to the control group. This research has implications for search user interface designers and the study of information retrieval systems.
ViewSer: enabling large-scale remote user studies of web search examination and interaction BIBAFull-Text 365-374
  Dmitry Lagun; Eugene Agichtein
Web search behaviour studies, including eye-tracking studies of search result examination, have resulted in numerous insights to improve search result quality and presentation. Yet, eye tracking studies have been restricted in scale, due to the expense and the effort required. Furthermore, as the reach of the Web expands, it becomes increasingly important to understand how searchers around the world see and interact with the search results. To address both challenges, we introduce ViewSer, a novel methodology for performing web search examination studies remotely, at scale, and without requiring eye-tracking equipment. ViewSer operates by automatically modifying the appearance of a search engine result page, to clearly show one search result at a time as if through a "viewport", while partially blurring the rest and allowing the participant to move the viewport naturally with a computer mouse or trackpad. Remarkably, the resulting result viewing and clickthrough patterns agree closely with unrestricted viewing of results, as measured by eye-tracking equipment, validated by a study with over 100 participants. We also explore applications of ViewSer to practical search tasks, such as analyzing the search result summary (snippet) attractiveness, result re-ranking, and evaluating snippet quality. These experiments could have only be done previously by tracking the eye movements for a small number of subjects in the lab. In contrast, our study was performed with over 100 participants, allowing us to reproduce and extend previous findings, establishing ViewSer as a valuable tool for large-scale search behavior experiments.

Query analysis II

CrowdLogging: distributed, private, and anonymous search logging BIBAFull-Text 375-384
  Henry Allen Feild; James Allan; Joshua Glatt
We describe CrowdLogging, an approach for distributed search log collection, storage, and mining, with the dual goals of preserving privacy and making the mined information broadly available. Most search log mining approaches and most privacy enhancing schemes have focused on centralized search logs and methods for disseminating them to third parties. In our approach, a user's search log is encrypted and shared in such a way that (a) the source of a search behavior artifact, such as a query, is unknown and (b) extremely rare artifacts -- that is, artifacts more likely to contain private information -- are not revealed. The approach works with any search behavior artifact that can be extracted from a search log, including queries, query reformulations, and query-click pairs. In this work, we: (1) present a distributed search log collection, storage, and mining framework; (2) compare several privacy policies, including differential privacy, showing the trade-offs between strong guarantees and the utility of the released data; (3) demonstrate the impact of our approach using two existing research query logs; and (4) describe a pilot study for which we implemented a version of the framework.
Out of sight, not out of mind: on the effect of social and physical detachment on information need BIBAFull-Text 385-394
  Elad Yom-Tov; Fernando Diaz
The information needs of users and the documents which answer it are frequently contingent on the different characteristics of users. This is especially evident during natural disasters, such as earthquakes and violent weather incidents, which create a strong transient information need. In this paper we investigate how the information need of users is affected by their physical detachment, as estimated by their physical location in relation to that of the event, and by their social detachment, as quantified by the number of their acquaintances who may be affected by the event. Drawing on large-scale data from three major events, we show that social and physical detachment levels of users are a major influence on their information needs, as manifested by their search engine queries. We demonstrate how knowing social and physical detachment levels can assist in improving retrieval for two applications: identifying search queries related to events and ranking results in response to event-related queries. We find that the average precision in identifying relevant search queries improves by approximately 18%, and that the average precision of ranking that uses detachment information improves by 10%.
Scalable multi-dimensional user intent identification using tree structured distributions BIBAFull-Text 395-404
  Vinay Jethava; Liliana Calderón-Benavides; Ricardo Baeza-Yates; Chiranjib Bhattacharyya; Devdatt Dubhashi
The problem of identifying user intent has received considerable attention in recent years, particularly in the context of improving the search experience via query contextualization. Intent can be characterized by multiple dimensions, which are often not observed from query words alone. Accurate identification of Intent from query words remains a challenging problem primarily because it is extremely difficult to discover these dimensions. The problem is often significantly compounded due to lack of representative training sample. We present a generic, extensible framework for learning the multi-dimensional representation of user intent from the query words. The approach models the latent relationships between facets using tree structured distribution which leads to an efficient and convergent algorithm, FastQ, for identifying the multi-faceted intent of users based on just the query words. We also incorporated WordNet to extend the system capabilities to queries which contain words that do not appear in the training data. Empirical results show that FastQ yields accurate identification of intent when compared to a gold standard.
Social annotation in query expansion: a machine learning approach BIBAFull-Text 405-414
  Yuan Lin; Hongfei Lin; Song Jin; Zheng Ye
Automatic query expansion technologies have been proven to be effective in many information retrieval tasks. Most existing approaches are based on the assumption that the most informative terms in top-retrieved documents can be viewed as context of the query and thus can be used for query expansion. One problem with these approaches is that some of the expansion terms extracted from feedback documents are irrelevant to the query, and thus may hurt the retrieval performance. In social annotations, users provide different keywords describing the respective Web pages from various aspects. These features may be used to boost IR performance. However, to date, the potential of social annotation for this task has been largely unexplored. In this paper, we explore the possibility and potential of social annotation as a new resource for extracting useful expansion terms. In particular, we propose a term ranking approach based on social annotation resource. The proposed approach consists of two phases: (1) in the first phase, we propose a term-dependency method to choose the most likely expansion terms; (2) in the second phase, we develop a machine learning method for term ranking, which is learnt from the statistics of the candidate expansion terms, using ListNet. Experimental results on three TREC test collections show that the retrieval performance can be improved when the term ranking method is used. In addition, we also demonstrate that terms selected by the term-dependency method from social annotation resources are beneficial to improve the retrieval performance.

Communities

Predicting web searcher satisfaction with existing community-based answers BIBAFull-Text 415-424
  Qiaoling Liu; Eugene Agichtein; Gideon Dror; Evgeniy Gabrilovich; Yoelle Maarek; Dan Pelleg; Idan Szpektor
Community-based Question Answering (CQA) sites, such as Yahoo! Answers, Baidu Knows, Naver, and Quora, have been rapidly growing in popularity. The resulting archives of posted answers to questions, in Yahoo! Answers alone, already exceed in size 1 billion, and are aggressively indexed by web search engines. In fact, a large number of search engine users benefit from these archives, by finding existing answers that address their own queries. This scenario poses new challenges and opportunities for both search engines and CQA sites. To this end, we formulate a new problem of predicting the satisfaction of web searchers with CQA answers. We analyze a large number of web searches that result in a visit to a popular CQA site, and identify unique characteristics of searcher satisfaction in this setting, namely, the effects of query clarity, query-to-question match, and answer quality. We then propose and evaluate several approaches to predicting searcher satisfaction that exploit these characteristics. To the best of our knowledge, this is the first attempt to predict and validate the usefulness of CQA archives for external searchers, rather than for the original askers. Our results suggest promising directions for improving and exploiting community question answering services in pursuit of satisfying even more Web search queries.
Competition-based user expertise score estimation BIBAFull-Text 425-434
  Jing Liu; Young-In Song; Chin-Yew Lin
In this paper, we consider the problem of estimating the relative expertise score of users in community question and answering services (CQA). Previous approaches typically only utilize the explicit question answering relationship between askers and answerers and apply link analysis to address this problem. The implicit pairwise comparison between two users that is implied in the best answer selection is ignored. Given a question and answering thread, it's likely that the expertise score of the best answerer is higher than the asker's and all other non-best answerers'. The goal of this paper is to explore such pairwise comparisons inferred from best answer selections to estimate the relative expertise scores of users. Formally, we treat each pairwise comparison between two users as a two-player competition with one winner and one loser. Two competition models are proposed to estimate user expertise from pairwise comparisons. Using the NTCIR-8 CQA task data with 3 million questions and introducing answer quality prediction based evaluation metrics, the experimental results show that the pairwise comparison based competition model significantly outperforms link analysis based approaches (PageRank and HITS) and pointwise approaches (number of best answers and best answer ratio) for estimating the expertise of active users. Furthermore, it's shown that pairwise comparison based competition models have better discriminative power than other methods. It's also found that answer quality (best answer) is an important factor to estimate user expertise.
Learning online discussion structures by conditional random fields BIBAFull-Text 435-444
  Hongning Wang; Chi Wang; ChengXiang Zhai; Jiawei Han
Online forum discussions are emerging as valuable information repository, where knowledge is accumulated by the interaction among users, leading to multiple threads with structures. Such replying structure in each thread conveys important information about the discussion content. Unfortunately, not all the online forum sites would explicitly record such replying relationship, making it hard to for both users and computers to digest the information buried in a thread discussion.
   In this paper, we propose a probabilistic model in the Conditional Random Fields framework to predict the replying structure for a threaded online discussion. Different from previous thread reconstruction methods, most of which fail to consider dependency between the posts, we cast the problem as a supervised structure learning problem to incorporate the features describing the structural dependency among the discussion content and learn their relationship. Experiment results on three different online forums show that the proposed method can well capture the replying structures in online discussion threads, and multiple tasks such as forum search and question answering can benefit from the reconstructed replying structures.
Mining topics on participations for community discovery BIBAFull-Text 445-454
  Guoqing Zheng; Jinwen Guo; Lichun Yang; Shengliang Xu; Shenghua Bao; Zhong Su; Dingyi Han; Yong Yu
Community discovery on large-scale linked document corpora has been a hot research topic for decades. There are two types of links. The first one, which we call d2d-link, indicates connectiveness among different documents, such as blog references and research paper citations. The other one, which we call u2u-link, represents co-occurrences or simultaneous participations of different users in one document and typically each document from u2u-link corpus has more than one user/author. Examples of u2u-link data covers email archives and research paper co-authorship networks. Community discovery in d2d-link data has achieved much success, while methods for that in u2u-link data either make no use of the textual content of the documents or make oversimplified assumptions about the users and the textual content. In this paper we propose a general approach of community discovery for u2u-link data, i.e., multiple user data, by placing topical variables on multiple authors' participations in documents. Experiments on a research proceeding co-authorship corpus and a New York Times news corpus show the effectiveness of our model.

Classification

Authorship classification: a discriminative syntactic tree mining approach BIBAFull-Text 455-464
  Sangkyum Kim; Hyungsul Kim; Tim Weninger; Jiawei Han; Hyun Duk Kim
In the past, there have been dozens of studies on automatic authorship classification, and many of these studies concluded that the writing style is one of the best indicators for original authorship. From among the hundreds of features which were developed, syntactic features were best able to reflect an author's writing style. However, due to the high computational complexity for extracting and computing syntactic features, only simple variations of basic syntactic features such as function words, POS (Part of Speech) tags, and rewrite rules were considered. In this paper, we propose a new feature set of k-embedded-edge subtree patterns that holds more syntactic information than previous feature sets. We also propose a novel approach to directly mining them from a given set of syntactic trees. We show that this approach reduces the computational burden of using complex syntactic structures as the feature set. Comprehensive experiments on real-world datasets demonstrate that our approach is reliable and more accurate than previous studies.
On theme location discovery for travelogue services BIBAFull-Text 465-474
  Mao Ye; Rong Xiao; Wang-Chien Lee; Xing Xie
In this paper, we aim to develop a travelogue service that discovers and conveys various travelogue digests, in form of theme locations, geographical scope, traveling trajectory and location snippet, to users. In this service, theme locations in a travelogue are the core information to discover. Thus we aim to address the problem of theme location discovery to enable the above travelogue services. Due to the inherent ambiguity of location relevance, we perform location relevance mining (LRM) in two complementary angles, relevance classification and relevance ranking, to provide comprehensive understanding of locations. Furthermore, we explore the textual (e.g., surrounding words) and geographical (e.g., geographical relationship among locations) features of locations to develop a co-training model for enhancement of classification performance. Built upon the mining result of LRM, we develop a series of techniques for provisioning of the aforementioned travelogue digests in our travelogue system. Finally, we conduct comprehensive experiments on collected travelogues to evaluate the performance of our location relevance mining techniques and demonstrate the effectiveness of the travelogue service.
Effective sentiment stream analysis with self-augmenting training and demand-driven projection BIBAFull-Text 475-484
  Ismael Santana Silva; Janaína Gomide; Adriano Veloso; Wagner, Jr. Meira; Renato Ferreira
How do we analyze sentiments over a set of opinionated Twitter messages? This issue has been widely studied in recent years, with a prominent approach being based on the application of classification techniques. Basically, messages are classified according to the implicit attitude of the writer with respect to a query term. A major concern, however, is that Twitter (and other media channels) follows the data stream model, and thus the classifier must operate with limited resources, including labeled data for training classification models. This imposes serious challenges for current classification techniques, since they need to be constantly fed with fresh training messages, in order to track sentiment drift and to provide up-to-date sentiment analysis.
   We propose solutions to this problem. The heart of our approach is a training augmentation procedure which takes as input a small training seed, and then it automatically incorporates new relevant messages to the training data. Classification models are produced on-the-fly using association rules, which are kept up-to-date in an incremental fashion, so that at any given time the model properly reflects the sentiments in the event being analyzed. In order to track sentiment drift, training messages are projected on a demand driven basis, according to the content of the message being classified. Projecting the training data offers a series of advantages, including the ability to quickly detect trending information emerging in the stream. We performed the analysis of major events in 2010, and we show that the prediction performance remains about the same, or even increases, as the stream passes and new training messages are acquired. This result holds for different languages, even in cases where sentiment distribution changes over time, or in cases where the initial training seed is rather small. We derive lower-bounds for prediction performance, and we show that our approach is extremely effective under diverse learning scenarios, providing gains that range from 7% to 58%.

Retrieval models II

Hypergeometric language models for republished article finding BIBAFull-Text 485-494
  Manos Tsagkias; Maarten de Rijke; Wouter Weerkamp
Republished article finding is the task of identifying instances of articles that have been published in one source and republished more or less verbatim in another source, which is often a social media source. We address this task as an ad hoc retrieval problem, using the source article as a query. Our approach is based on language modeling. We revisit the assumptions underlying the unigram language model taking into account the fact that in our setup queries are as long as complete news articles. We argue that in this case, the underlying generative assumption of sampling words from a document with replacement, i.e., the multinomial modeling of documents, produces less accurate query likelihood estimates.
   To make up for this discrepancy, we consider distributions that emerge from sampling without replacement: the central and non-central hypergeometric distributions. We present two retrieval models that build on top of these distributions: a log odds model and a Bayesian model where document parameters are estimated using the Dirichlet compound multinomial distribution.
   We analyse the behavior of our new models using a corpus of news articles and blog posts and find that for the task of republished article finding, where we deal with queries whose length approaches the length of the documents to be retrieved, models based on distributions associated with sampling without replacement outperform traditional models based on multinomial distributions.
Estimation methods for ranking recent information BIBAFull-Text 495-504
  Miles Efron; Gene Golovchinsky
Temporal aspects of documents can impact relevance for certain kinds of queries. In this paper, we build on earlier work of modeling temporal information. We propose an extension to the Query Likelihood Model that incorporates query-specific information to estimate rate parameters, and we introduce a temporal factor into language model smoothing and query expansion using pseudo-relevance feedback. We evaluate these extensions using a Twitter corpus and two newspaper article collections. Results suggest that, compared to prior approaches, our models are more effective at capturing the temporal variability of relevance associated with some topics.
Query by document via a decomposition-based two-level retrieval approach BIBAFull-Text 505-514
  Linkai Weng; Zhiwei Li; Rui Cai; Yaoxue Zhang; Yuezhi Zhou; Laurence T. Yang; Lei Zhang
Retrieving similar documents from a large-scale text corpus according to a given document is a fundamental technique for many applications. However, most of existing indexing techniques have difficulties to address this problem due to special properties of a document query, e.g. high dimensionality, sparse representation and semantic concern. Towards addressing this problem, we propose a two-level retrieval solution based on a document decomposition idea. A document is decomposed to a compact vector and a few document specific keywords by a dimension reduction approach. The compact vector embodies the major semantics of a document, and the document specific keywords complement the discriminative power lost in dimension reduction process. We adopt locality sensitive hashing (LSH) to index the compact vectors, which guarantees to quickly find a set of related documents according to the vector of a query document. Then we re-rank documents in this set by their document specific keywords. In experiments, we obtained promising results on various datasets in terms of both accuracy and performance. We demonstrated that this solution is able to index large-scale corpus for efficient similarity-based document retrieval.

Image search

Integrating hierarchical feature selection and classifier training for multi-label image annotation BIBAFull-Text 515-524
  Cheng Jin; Chunlei Yang
It is well accepted that using high-dimensional multi-modal visual features for image content representation and classifier training may achieve more sufficient characterization of the diverse visual properties of the images and further result in higher discrimination power of the classifiers. However, training the classifiers in a high-dimensional multi-modal feature space requires a large number of labeled training images, which will further result in the problem of curse of dimensionality. To tackle this problem, a hierarchical feature subset selection algorithm is proposed to enable more accurate image classification, where the processes for feature selection and classifier training are seamlessly integrated in a single framework. First, a feature hierarchy (i.e., concept tree for automatic feature space partition and organization) is used to automatically partition high-dimensional heterogeneous multi-modal visual features into multiple low-dimensional homogeneous single-modal feature subsets according to their certain physical meanings and each of them is used to characterize one certain type of the diverse visual properties of the images. Second, principal component analysis (PCA) is performed on each homogeneous singlemodal feature subset to select the most representative feature dimensions and a weak classifier is learned simultaneously. After the weak classifiers and their representative feature dimensions are available for all these homogeneous single-modal feature subsets, they are combined to generate an ensemble image classifier and achieve hierarchical feature subset selection. Our experiments on a specific domain of natural images have also obtained very positive results.
Efficient manifold ranking for image retrieval BIBAFull-Text 525-534
  Bin Xu; Jiajun Bu; Chun Chen; Deng Cai; Xiaofei He; Wei Liu; Jiebo Luo
Manifold Ranking (MR), a graph-based ranking algorithm, has been widely applied in information retrieval and shown to have excellent performance and feasibility on a variety of data types. Particularly, it has been successfully applied to content-based image retrieval, because of its outstanding ability to discover underlying geometrical structure of the given image database. However, manifold ranking is computationally very expensive, both in graph construction and ranking computation stages, which significantly limits its applicability to very large data sets. In this paper, we extend the original manifold ranking algorithm and propose a new framework named Efficient Manifold Ranking (EMR). We aim to address the shortcomings of MR from two perspectives: scalable graph construction and efficient computation. Specifically, we build an anchor graph on the data set instead of the traditional k-nearest neighbor graph, and design a new form of adjacency matrix utilized to speed up the ranking computation. The experimental results on a real world image database demonstrate the effectiveness and efficiency of our proposed method. With a comparable performance to the original manifold ranking, our method significantly reduces the computational time, makes it a promising method to large scale real world retrieval problems.
Mining weakly labeled web facial images for search-based face annotation BIBAFull-Text 535-544
  Dayong Wang; Steven C. H. Hoi; Ying He
In this paper, we investigate a search-based face annotation framework by mining weakly labeled facial images that are freely available on the internet. A key component of such a search-based annotation paradigm is to build a database of facial images with accurate labels. This is however challenging since facial images on the WWW are often noisy and incomplete. To improve the label quality of raw web facial images, we propose an effective Unsupervised Label Refinement (ULR) approach for refining the labels of web facial images by exploring machine learning techniques. We develop effective optimization algorithms to solve the large-scale learning tasks efficiently, and conduct an extensive empirical study on a web facial image database with 400 persons and 40,000 web facial images. Encouraging results showed that the proposed ULR technique can significantly boost the performance of the promising search-based face annotation scheme.

Indexing

Temporal index sharding for space-time efficiency in archive search BIBAFull-Text 545-554
  Avishek Anand; Srikanta Bedathur; Klaus Berberich; Ralf Schenkel
Time-travel queries that couple temporal constraints with keyword queries are useful in searching large-scale archives of time-evolving content such as the web archives or wikis. Typical approaches for efficient evaluation of these queries involve slicing either the entire collection [20] or individual index lists [10] along the time-axis. Both these methods are not satisfactory since they sacrifice compactness of index for processing efficiency making them either too big or, otherwise, too slow.
   We present a novel index organization scheme that shards each index list with almost zero increase in index size but still minimizes the cost of reading index entries during query processing. Based on the optimal sharding thus obtained, we develop a practically efficient sharding that takes into account the different costs of random and sequential accesses. Our algorithm merges shards from the optimal solution to allow for a few extra sequential accesses while gaining significantly by reducing the number of random accesses. We empirically establish the effectiveness of our sharding scheme with experiments over the revision history of the English Wikipedia between 2001-2005 (approx 700 GB) and an archive of U.K. governmental web sites (approx 400 GB). Our results demonstrate the feasibility of faster time-travel query processing with no space overhead.
Inverted indexes for phrases and strings BIBAFull-Text 555-564
  Manish Patil; Sharma V. Thankachan; Rahul Shah; Wing-Kai Hon; Jeffrey Scott Vitter; Sabrina Chandrasekaran
Inverted indexes are the most fundamental and widely used data structures in information retrieval. For each unique word occurring in a document collection, the inverted index stores a list of the documents in which this word occurs. Compression techniques are often applied to further reduce the space requirement of these lists. However, the index has a shortcoming, in that only predefined pattern queries can be supported efficiently. In terms of string documents where word boundaries are undefined, if we have to index all the substrings of a given document, then the storage quickly becomes quadratic in the data size. Also, if we want to apply the same type of indexes for querying phrases or sequence of words, then the inverted index will end up storing redundant information. In this paper, we show the first set of inverted indexes which work naturally for strings as well as phrase searching. The central idea is to exclude document d in the inverted list of a string P if every occurrence of P in d is subsumed by another string of which P is a prefix. With this we show that our space utilization is close to the optimal. Techniques from succinct data structures are deployed to achieve compression while allowing fast access in terms of frequency and document id based retrieval. Compression and speed trade-offs are evaluated for different variants of the proposed index. For phrase searching, we show that our indexes compare favorably against a typical inverted index deploying position-wise intersections. We also show efficient top-k based retrieval under relevance metrics like frequency and tf-idf.
Faster temporal range queries over versioned text BIBAFull-Text 565-574
  Jinru He; Torsten Suel
Versioned textual collections are collections that retain multiple versions of a document as it evolves over time. Important large-scale examples are Wikipedia and the web collection of the Internet Archive. Search queries over such collections often use keywords as well as temporal constraints, most commonly a time range of interest. In this paper, we study how to support such temporal range queries over versioned text. Our goal is to process these queries faster than the corresponding keyword-only queries, by exploiting the additional constraint. A simple approach might partition the index into different time ranges, and then access only the relevant parts. However, specialized inverted index compression techniques are crucial for large versioned collections, and a naive partitioning can negatively affect index size and query throughput. We show how to achieve high query throughput by using smart index partitioning techniques that take index compression into account. Experiments on over 85 million versions of Wikipedia articles show that queries can be executed in a few milliseconds on memory-based index structures, and only slightly more time on disk-based structures. We also show how to efficiently support the recently proposed stable top-k search primitive on top of our schemes.
Indexing strategies for graceful degradation of search quality BIBAFull-Text 575-584
  Shuai Ding; Sreenivas Gollapudi; Samuel Ieong; Krishnaram Kenthapadi; Alexandros Ntoulas
Large web search engines process billions of queries each day over tens of billions of documents with often very stringent requirements for a user's search experience, in particular, low latency and highly relevant search results. Index generation and serving are key to satisfying both these requirements. For example, the load to search engines can vary drastically when popular events happen around the world. In the case when the load is exceeding what the search engine can serve, queries will get dropped. This results in an un-graceful degradation in search quality. Another example that could increase the query load and affect the user's search experience are ambiguous queries which often result in the execution of multiple query alterations in the back end.
   In this paper, we look into the problem of designing robust indexing strategies, i.e. strategies that allow for a graceful degradation of search quality in both the above scenarios. We study the problems of index generation and serving using the notions of document allocation, server selection, and document replication. We explore the space of efficient algorithms for these problems and empirically corroborate with existing theory that it is hard to optimally solve the allocation and selection problems without any replication. We propose a greedy replication algorithm and study its performance under different choices of allocation and selection. Further, we show hat under random selection and allocation, our algorithm is optimal.

Web queries

Incremental diversification for very large sets: a streaming-based approach BIBAFull-Text 585-594
  Enrico Minack; Wolf Siberski; Wolfgang Nejdl
Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set is NP-hard for the usual objectives. Existing greedy diversification algorithms require random access to the input set, rendering them impractical in the context of large result sets or continuous data.
   To solve this issue, we present a novel diversification approach which treats the input as a stream and processes each element in an incremental fashion, maintaining a near-optimal diverse set at any point in the stream. Our approach exhibits a linear computation and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets, we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.
Intent-aware search result diversification BIBAFull-Text 595-604
  Rodrygo L. T. Santos; Craig Macdonald; Iadh Ounis
Search result diversification has gained momentum as a way to tackle ambiguous queries. An effective approach to this problem is to explicitly model the possible aspects underlying a query, in order to maximise the estimated relevance of the retrieved documents with respect to the different aspects. However, such aspects themselves may represent information needs with rather distinct intents (e.g., informational or navigational). Hence, a diverse ranking could benefit from applying intent-aware retrieval models when estimating the relevance of documents to different aspects. In this paper, we propose to diversify the results retrieved for a given query, by learning the appropriateness of different retrieval models for each of the aspects underlying this query. Thorough experiments within the evaluation framework provided by the diversity task of the TREC 2009 and 2010 Web tracks show that the proposed approach can significantly improve state-of-the-art diversification approaches.
Parameterized concept weighting in verbose queries BIBAFull-Text 605-614
  Michael Bendersky; Donald Metzler; W. Bruce Croft
The majority of the current information retrieval models weight the query concepts (e.g., terms or phrases) in an unsupervised manner, based solely on the collection statistics. In this paper, we go beyond the unsupervised estimation of concept weights, and propose a parameterized concept weighting model. In our model, the weight of each query concept is determined using a parameterized combination of diverse importance features. Unlike the existing supervised ranking methods, our model learns importance weights not only for the explicit query concepts, but also for the latent concepts that are associated with the query through pseudo-relevance feedback. The experimental results on both newswire and web TREC corpora show that our model consistently and significantly outperforms a wide range of state-of-the-art retrieval models. In addition, our model significantly reduces the number of latent concepts used for query expansion compared to the non-parameterized pseudo-relevance feedback based models.
UPS: efficient privacy protection in personalized web search BIBAFull-Text 615-624
  Gang Chen; He Bai; Lidan Shou; Ke Chen; Yunjun Gao
In recent years, personalized web search (PWS) has demonstrated effectiveness in improving the quality of search service on the Internet. Unfortunately, the need for collecting private information in PWS has become a major barrier for its wide proliferation. We study privacy protection in PWS engines which capture personalities in user profiles. We propose a PWS framework called UPS that can generalize profiles in for each query according to user-specified privacy requirements. Two predictive metrics are proposed to evaluate the privacy breach risk and the query utility for hierarchical user profile. We develop two simple but effective generalization algorithms for user profiles allowing for query-level customization using our proposed metrics. We also provide an online prediction mechanism based on query utility for deciding whether to personalize a query in UPS. Extensive experiments demonstrate the efficiency and effectiveness of our framework.

Collaborative filtering II

Handling data sparsity in collaborative filtering using emotion and semantic based features BIBAFull-Text 625-634
  Yashar Moshfeghi; Benjamin Piwowarski; Joemon M. Jose
Collaborative filtering (CF) aims to recommend items based on prior user interaction. Despite their success, CF techniques do not handle data sparsity well, especially in the case of the cold start problem where there is no past rating for an item. In this paper, we provide a framework, which is able to tackle such issues by considering item-related emotions and semantic data. In order to predict the rating of an item for a given user, this framework relies on an extension of Latent Dirichlet Allocation, and on gradient boosted trees for the final prediction. We apply this framework to movie recommendation and consider two emotion spaces extracted from the movie plot summary and the reviews, and three semantic spaces: actor, director, and genre. Experiments with the 100K and 1M MovieLens datasets show that including emotion and semantic information significantly improves the accuracy of prediction and improves upon the state-of-the-art CF techniques. We also analyse the importance of each feature space and describe some uncovered latent groups.
Fast context-aware recommendations with factorization machines BIBAFull-Text 635-644
  Steffen Rendle; Zeno Gantner; Christoph Freudenthaler; Lars Schmidt-Thieme
The situation in which a choice is made is an important information for recommender systems. Context-aware recommenders take this information into account to make predictions. So far, the best performing method for context-aware rating prediction in terms of predictive accuracy is Multiverse Recommendation based on the Tucker tensor factorization model. However this method has two drawbacks: (1) its model complexity is exponential in the number of context variables and polynomial in the size of the factorization and (2) it only works for categorical context variables. On the other hand there is a large variety of fast but specialized recommender methods which lack the generality of context-aware methods.
   We propose to apply Factorization Machines (FMs) to model contextual information and to provide context-aware rating predictions. This approach results in fast context-aware recommendations because the model equation of FMs can be computed in linear time both in the number of context variables and the factorization size. For learning FMs, we develop an iterative optimization method that analytically finds the least-square solution for one parameter given the other ones. Finally, we show empirically that our approach outperforms Multiverse Recommendation in prediction quality and runtime.
Filtering semi-structured documents based on faceted feedback BIBAFull-Text 645-654
  Lanbo Zhang; Yi Zhang; Qianli Xing
Existing adaptive filtering systems learn user profiles based on users' relevance judgments on documents. In some cases, users have some prior knowledge about what features are important for a document to be relevant. For example, a Spanish speaker may only want news written in Spanish, and thus a relevant document should contain the feature "Language: Spanish"; a researcher working on HIV knows an article with the medical subject "Subject: AIDS" is very likely to be interesting to him/her.
   Semi-structured documents with rich faceted metadata are increasingly prevalent over the Internet. Motivated by the commonly used faceted search interface in e-commerce, we study whether users' prior knowledge about faceted features could be exploited for filtering semi-structured documents. We envision two faceted feedback solicitation mechanisms, and propose a novel user profile learning algorithm that can incorporate user feedback on features. To evaluate the proposed work, we use two data sets from the TREC filtering track, and conduct a user study on Amazon Mechanical Turk. Our experimental results show that user feedback on faceted features is useful for filtering. The new user profile learning algorithm can effectively learn from user feedback on faceted features and performs better than several other methods adapted from the feature-based feedback techniques proposed for retrieval and text classification tasks in previous work.
Learning relevance from heterogeneous social network and its application in online targeting BIBAFull-Text 655-664
  Chi Wang; Rajat Raina; David Fong; Ding Zhou; Jiawei Han; Greg Badros
The rise of social networking services in recent years presents new research challenges for matching users with interesting content. While the content-rich nature of these social networks offers many cues on "interests" of a user such as text in user-generated content, the links in the network, and user demographic information, there is a lack of successful methods for combining such heterogeneous data to model interest and relevance. This paper proposes a new method for modeling user interest from heterogeneous data sources with distinct but unknown importance. The model leverages links in the social graph by integrating the conceptual representation of a user's linked objects. The proposed method seeks a scalable relevance model of user interest, that can be discriminatively optimized for various relevance-centric problems, such as Internet advertisement selection, recommendation, and web search personalization. We apply our algorithm to the task of selecting relevant ads for users on Facebook's social network. We demonstrate that our algorithm can be scaled to work with historical data for all users, and learns interesting associations between concept classes automatically. We also show that using the learnt user model to predict the relevance of an ad is the single most important signal in our ranking system for new ads (with no historical clickthrough data), and overall leads to an improvement in the accuracy of the clickthrough rate prediction, a key problem in online advertising.

Latent semantic analysis

ILDA: interdependent LDA model for learning latent aspects and their ratings from online product reviews BIBAFull-Text 665-674
  Samaneh Moghaddam; Martin Ester
Today, more and more product reviews become available on the Internet, e.g., product review forums, discussion groups, and Blogs. However, it is almost impossible for a customer to read all of the different and possibly even contradictory opinions and make an informed decision. Therefore, mining online reviews (opinion mining) has emerged as an interesting new research direction. Extracting aspects and the corresponding ratings is an important challenge in opinion mining. An aspect is an attribute or component of a product, e.g. 'screen' for a digital camera. It is common that reviewers use different words to describe an aspect (e.g. 'LCD', 'display', 'screen'). A rating is an intended interpretation of the user satisfaction in terms of numerical values. Reviewers usually express the rating of an aspect by a set of sentiments, e.g. 'blurry screen'. In this paper we present three probabilistic graphical models which aim to extract aspects and corresponding ratings of products from online reviews. The first two models extend standard PLSI and LDA to generate a rated aspect summary of product reviews. As our main contribution, we introduce Interdependent Latent Dirichlet Allocation (ILDA) model. This model is more natural for our task since the underlying probabilistic assumptions (interdependency between aspects and ratings) are appropriate for our problem domain. We conduct experiments on a real life dataset, Epinions.com, demonstrating the improved effectiveness of the ILDA model in terms of the likelihood of a held-out test set, and the accuracy of aspects and aspect ratings.
Clickthrough-based latent semantic models for web search BIBAFull-Text 675-684
  Jianfeng Gao; Kristina Toutanova; Wen-tau Yih
This paper presents two new document ranking models for Web search based upon the methods of semantic representation and the statistical translation-based approach to information retrieval (IR). Assuming that a query is parallel to the titles of the documents clicked on for that query, large amounts of query-title pairs are constructed from clickthrough data; two latent semantic models are learned from this data. One is a bilingual topic model within the language modeling framework. It ranks documents for a query by the likelihood of the query being a semantics-based translation of the documents. The semantic representation is language independent and learned from query-title pairs, with the assumption that a query and its paired titles share the same distribution over semantic topics. The other is a discriminative projection model within the vector space modeling framework. Unlike Latent Semantic Analysis and its variants, the projection matrix in our model, which is used to map from term vectors into semantic space, is learned discriminatively such that the distance between a query and its paired title, both represented as vectors in the projected semantic space, is smaller than that between the query and the titles of other documents which have no clicks for that query. These models are evaluated on the Web search task using a real world data set. Results show that they significantly outperform their corresponding baseline models, which are state-of-the-art.
Regularized latent semantic indexing BIBAFull-Text 685-694
  Quan Wang; Jun Xu; Hang Li; Nick Craswell
Topic modeling can boost the performance of information retrieval, but its real-world application is limited due to scalability issues. Scaling to larger document collections via parallelization is an active area of research, but most solutions require drastic steps such as vastly reducing input vocabulary. We introduce Regularized Latent Semantic Indexing (RLSI), a new method which is designed for parallelization. It is as effective as existing topic models, and scales to larger datasets without reducing input vocabulary. RLSI formalizes topic modeling as a problem of minimizing a quadratic loss function regularized by l1 and/or l2; norm. This formulation allows the learning process to be decomposed into multiple sub-optimization problems which can be optimized in parallel, for example via MapReduce. We particularly propose adopting l1 norm on topics and l2 norm on document representations, to create a model with compact and readable topics and useful for retrieval. Relevance ranking experiments on three TREC datasets show that RLSI performs better than LSI, PLSI, and LDA, and the improvements are sometimes statistically significant. Experiments on a web dataset, containing about 1.6 million documents and 7 million terms, demonstrate a similar boost in performance on a larger corpus and vocabulary than in previous studies.

Multimedia IR

Multimedia answering: enriching text QA with media information BIBAFull-Text 695-704
  Liqiang Nie; Meng Wang; Zhengjun Zha; Guangda Li; Tat-Seng Chua
Existing community question-answering forums usually provide only textual answers. However, for many questions, pure texts cannot provide intuitive information, while image or video contents are more appropriate. In this paper, we introduce a scheme that is able to enrich text answers with image and video information. Our scheme investigates a rich set of techniques including question/answer classification, query generation, image and video search reranking, etc. Given a question and the community-contributed answer, our approach is able to determine which type of media information should be added, and then automatically collects data from Internet to enrich the textual answer. Different from some efforts that attempt to directly answer questions with image and video data, our approach is built based on the community-contributed textual answers and thus it is more feasible and able to deal with more complex questions. We have conducted empirical study on more than 3,000 QA pairs and the results demonstrate the effectiveness of our approach.
Enhancing multi-label music genre classification through ensemble techniques BIBAFull-Text 705-714
  Chris Sanden; John Z. Zhang
In the field of Music Information Retrieval (MIR), multi-label genre classification is the problem of assigning one or more genre labels to a music piece. In this work, we propose a set of ensemble techniques, which are specific to the task of multi-label genre classification. Our goal is to enhance classification performance by combining multiple classifiers. In addition, we also investigate some existing ensemble techniques from machine learning. The effectiveness of these techniques is demonstrated through a set of empirical experiments and various related issues are discussed. To the best of our knowledge, there has been limited work on applying ensemble techniques to multi-label genre classification in the literature and we consider the results in this work as our initial efforts toward this end. The significance of our work has two folds: (1) proposing a set of ensemble techniques specific to music genre classification and (2) shedding light on further research along this direction.
Picasso -- to sing, you must close your eyes and draw BIBAFull-Text 715-724
  Aleksandar Stupar; Sebastian Michel
We study the problem of automatically assigning appropriate music pieces to a picture or, in general, series of pictures. This task, commonly referred to as soundtrack suggestion, is non-trivial as it requires a lot of human attention and a good deal of experience, with master pieces distinguished, e.g., with the Academy Award for Best Original Score. We put forward PICASSO to solve this task in a fully automated way. PICASSO makes use of genuine samples obtained from first-class contemporary movies. Hence, the training set can be arbitrarily large and is also inexpensive to obtain but still provides an excellent source of information. At query time, PICASSO employs a three-level algorithm. First, it selects for a given query image a ranking of the most similar screenshots taken, and subsequently, selects for each screenshot the most similar songs to the music played in the movie when the screenshot was taken. Last, it issues a top-K aggregation algorithm to find the overall best suitable songs available. We have created a large training set consisting of over 40,000 image/soundtrack samples obtained from 28 movies and evaluated the suitability of PICASSO by means of a user study.

Summarization

Enhanced results for web search BIBAFull-Text 725-734
  Kevin Haas; Peter Mika; Paul Tarjan; Roi Blanco
"Ten blue links" have defined web search results for the last fifteen years -- snippets of text combined with document titles and URLs. In this paper, we establish the notion of enhanced search results that extend web search results to include multimedia objects such as images and video, intent-specific key value pairs, and elements that allow the user to interact with the contents of a web page directly from the search results page. We show that users express a preference for enhanced results both explicitly, and when observed in their search behavior. We also demonstrate the effectiveness of enhanced results in helping users to assess the relevance of search results. Lastly, we show that we can efficiently generate enhanced results to cover a significant fraction of search result pages.
Summarizing the differences in multilingual news BIBAFull-Text 735-744
  Xiaojun Wan; Houping Jia; Shanshan Huang; Jianguo Xiao
There usually exist many news articles written in different languages about a hot news event. The news articles in different languages are written in different ways to reflect different standpoints. For example, the Chinese news agencies and the Western news agencies have published many articles to report the same news of "Liu Xiaobo's Nobel Prize" in Chinese and English languages, respectively. The Chinese news articles and the English news articles share something about the news fact in common, but they focus on different aspects in order to reflect different standpoints about the event. In this paper, we investigate the task of multilingual news summarization for the purpose of finding and summarizing the major differences between the news articles about the same event in the Chinese and English languages. We propose a novel constrained co-ranking (C-CoRank) method for addressing this special task. The C-CoRank method adds the constraints between the difference score and the common score of each sentence to the co-ranking process. Evaluation results on the manually labeled test set with 15 news topics show the effectiveness of our proposed method, and the constrained co-ranking method can outperform a few baselines and the typical co-ranking method.
Evolutionary timeline summarization: a balanced optimization framework via iterative substitution BIBAFull-Text 745-754
  Rui Yan; Xiaojun Wan; Jahna Otterbacher; Liang Kong; Xiaoming Li; Yan Zhang
Classic news summarization plays an important role with the exponential document growth on the Web. Many approaches are proposed to generate summaries but seldom simultaneously consider evolutionary characteristics of news plus to traditional summary elements. Therefore, we present a novel framework for the web mining problem named Evolutionary Timeline Summarization (ETS). Given the massive collection of time-stamped web documents related to a general news query, ETS aims to return the evolution trajectory along the timeline, consisting of individual but correlated summaries of each date, emphasizing relevance, coverage, coherence and cross-date diversity. ETS greatly facilitates fast news browsing and knowledge comprehension and hence is a necessity. We formally formulate the task as an optimization problem via iterative substitution from a set of sentences to a subset of sentences that satisfies the above requirements, balancing coherence/diversity measurement and local/global summary quality. The optimized substitution is iteratively conducted by incorporating several constraints until convergence. We develop experimental systems to evaluate on 6 instinctively different datasets which amount to 10251 documents. Performance comparisons between different system-generated timelines and manually created ones by human editors demonstrate the effectiveness of our proposed framework in terms of ROUGE metrics.

Vertical & entity search

Ranking related news predictions BIBAFull-Text 755-764
  Nattiya Kanhabua; Roi Blanco; Michael Matthews
We estimate that nearly one third of news articles contain references to future events. While this information can prove crucial to understanding news stories and how events will develop for a given topic, there is currently no easy way to access this information. We propose a new task to address the problem of retrieving and ranking sentences that contain mentions to future events, which we call ranking related news predictions. In this paper, we formally define this task and propose a learning to rank approach based on 4 classes of features: term similarity, entity-based similarity, topic similarity, and temporal similarity. Through extensive evaluations using a corpus consisting of 1.8 millions news articles and 6,000 manually judged relevance pairs, we show that our approach is able to retrieve a significant number of relevant predictions related to a given topic.
Collective entity linking in web text: a graph-based method BIBAFull-Text 765-774
  Xianpei Han; Le Sun; Jun Zhao
Entity Linking (EL) is the task of linking name mentions in Web text with their referent entities in a knowledge base. Traditional EL methods usually link name mentions in a document by assuming them to be independent. However, there is often additional interdependence between different EL decisions, i.e., the entities in the same document should be semantically related to each other. In these cases, Collective Entity Linking, in which the name mentions in the same document are linked jointly by exploiting the interdependence between them, can improve the entity linking accuracy.
   This paper proposes a graph-based collective EL method, which can model and exploit the global interdependence between different EL decisions. Specifically, we first propose a graph-based representation, called Referent Graph, which can model the global interdependence between different EL decisions. Then we propose a collective inference algorithm, which can jointly infer the referent entities of all name mentions by exploiting the interdependence captured in Referent Graph. The key benefit of our method comes from: 1) The global interdependence model of EL decisions; 2) The purely collective nature of the inference algorithm, in which evidence for related EL decisions can be reinforced into high-probability decisions. Experimental results show that our method can achieve significant performance improvement over the traditional EL methods.
From one tree to a forest: a unified solution for structured web data extraction BIBAFull-Text 775-784
  Qiang Hao; Rui Cai; Yanwei Pang; Lei Zhang
Structured data, in the form of entities and associated attributes, has been a rich web resource for search engines and knowledge databases. To efficiently extract structured data from enormous websites in various verticals (e.g., books, restaurants), much research effort has been attracted, but most existing approaches either require considerable human effort or rely on strong features that lack of flexibility. We consider an ambitious scenario -- can we build a system that (1) is general enough to handle any vertical without re-implementation and (2) requires only one labeled example site from each vertical for training to automatically deal with other sites in the same vertical? In this paper, we propose a unified solution to demonstrate the feasibility of this scenario. Specifically, we design a set of weak but general features to characterize vertical knowledge (including attribute-specific semantics and inter-attribute layout relationships). Such features can be adopted in various verticals without redesign; meanwhile, they are weak enough to avoid overfitting of the learnt knowledge to seed sites. Given a new unseen site, the learnt knowledge is first applied to identify page-level candidate attribute values, while inevitably involve false positives. To remove noise, site-level information of the new site is then exploited to boost up the true values. The site-level information is derived in an unsupervised manner, without harm to the applicability of the solution. Promising experimental performance on 80 websites in 8 distinct verticals demonstrated the feasibility and flexibility of the proposed solution.
Improving local search ranking through external logs BIBAFull-Text 785-794
  Klaus Berberich; Arnd Christian König; Dimitrios Lymberopoulos; Peixiang Zhao
The signals used for ranking in local search are very different from web search: in addition to (textual) relevance, measures of (geographic) distance between the user and the search result, as well as measures of popularity of the result are important for effective ranking. Depending on the query and search result, different ways to quantify these factors exist -- for example, it is possible to use customer ratings to quantify the popularity of restaurants, whereas different measures are more appropriate for other types of businesses. Hence, our approach is to capture the different notions of distance/popularity relevant via a number of external data sources (e.g., logs of customer ratings, driving-direction requests, or site accesses).
   In this paper we will describe the relevant signal contained in a number of such data sources in detail and present methods to integrate these external data sources into the feature generation for local search ranking. In particular, we propose novel backoff methods to alleviate the impact of skew, noise or incomplete data in these logs in a systematic manner. We evaluate our techniques on both human-judged relevance data as well as click-through data from a commercial local search engine.

Query suggestions

Query suggestions in the absence of query logs BIBAFull-Text 795-804
  Sumit Bhatia; Debapriyo Majumdar; Prasenjit Mitra
After an end-user has partially input a query, intelligent search engines can suggest possible completions of the partial query to help end-users quickly express their information needs. All major web-search engines and most proposed methods that suggest queries rely on search engine query logs to determine possible query suggestions. However, for customized search systems in the enterprise domain, intranet search, or personalized search such as email or desktop search or for infrequent queries, query logs are either not available or the user base and the number of past user queries is too small to learn appropriate models. We propose a probabilistic mechanism for generating query suggestions from the corpus without using query logs. We utilize the document corpus to extract a set of candidate phrases. As soon as a user starts typing a query, phrases that are highly correlated with the partial user query are selected as completions of the partial query and are offered as query suggestions. Our proposed approach is tested on a variety of datasets and is compared with state-of-the-art approaches. The experimental results clearly demonstrate the effectiveness of our approach in suggesting queries with higher quality.
Synthesizing high utility suggestions for rare web search queries BIBAFull-Text 805-814
  Alpa Jain; Umut Ozertem; Emre Velipasaoglu
Search engines are continuously looking into methods to alleviate users' effort in finding desired information. For this, all major search engines employ query suggestions methods to facilitate effective query formulation and reformulation. Providing high quality query suggestions is a critical task for search engines and so far most research efforts have focused on tapping various information available in search query logs to identify potential suggestions. By relying on this single source of information, suggestion providing systems often restrict themselves to only previously observed query sessions. Therefore, a critical challenge faced by query suggestions provision mechanism is that of coverage, i.e., the number of unique queries for which users are provided with suggestions, while keeping the suggestion quality high. To address this problem, we propose a novel way of generating suggestions for user search queries by moving beyond the dependency on search query logs and providing synthetic suggestions for web search queries. The key challenges in providing synthetic suggestions include identifying important concepts in a query and systematically exploring related concepts while ensuring that the resulting suggestions are relevant to the user query and of high utility. We present an end-to-end system to generate synthetic suggestions that builds upon novel query-level operations and combines information available from various textual sources. We evaluate our suggestion system over a large-scale real-world dataset of query logs and show that our methods increase the coverage of query-suggestion pairs by up to 39% without compromising the quality or the utility of the suggestions.
Post-ranking query suggestion by diversifying search results BIBAFull-Text 815-824
  Yang Song; Dengyong Zhou; Li-wei He
Query suggestion refers to the process of suggesting related queries to search engine users. Most existing researches have focused on improving the relevance of suggested queries. In this paper, we introduce the concept of diversifying the content of the search results from suggested queries while keeping the suggestion relevant. Our framework first retrieves a set of query candidates from search engine logs using random walk and other techniques. We then re-rank the suggested queries by ranking them in the order which maximizes the diversification function that measures the difference between the original search results and the results from suggested queries. The diversification function we proposed includes features like ODP category, URL and domain similarity and so on. One important outcome from our research which contradicts with most existing researches is that, with the increase of suggestion relevance, the similarity between the queries actually decreases. Experiments are conducted on a large set of human-labeled data, which is randomly sampled from a commercial search engine's log. Results indicate that the post-ranking framework significantly improves the relevance of suggested queries by comparing to existing models.
Automatic boolean query suggestion for professional search BIBAFull-Text 825-834
  Youngho Kim; Jangwon Seo; W. Bruce Croft
In professional search environments, such as patent search or legal search, search tasks have unique characteristics: 1) users interactively issue several queries for a topic, and 2) users are willing to examine many retrieval results, i.e., there is typically an emphasis on recall. Recent surveys have also verified that professional searchers continue to have a strong preference for Boolean queries because they provide a record of what documents were searched. To support this type of professional search, we propose a novel Boolean query suggestion technique. Specifically, we generate Boolean queries by exploiting decision trees learned from pseudo-labeled documents and rank the suggested queries using query quality predictors. We evaluate our algorithm in simulated patent and medical search environments. Compared with a recent effective query generation system, we demonstrate that our technique is effective and general.

Linguistic analysis

Improved video categorization from text metadata and user comments BIBAFull-Text 835-842
  Katja Filippova; Keith B. Hall
We consider the task of assigning categories (e.g., howto/cooking, sports/basketball, pet/dogs) to YouTube videos from video and text signals. We show that two complementary views on the data -- from the video and text perspectives -- complement each other and refine predictions. The contributions of the paper are threefold: (1) we show that a text-based classifier trained on imperfect predictions of the weakly supervised video content-based classifier is not redundant; (2) we demonstrate that a simple model which combines the predictions made by the two classifiers outperforms each of them taken independently; (3) we analyse such sources of text information as video title, description, user tags and viewers' comments and show that each of them provides valuable clues to the topic of the video.
Multifaceted toponym recognition for streaming news BIBAFull-Text 843-852
  Michael D. Lieberman; Hanan Samet
News sources on the Web generate constant streams of information, describing many aspects of the events that shape our world. In particular, geography plays a key role in the news, and enabling geographic retrieval of news articles involves recognizing the textual references to geographic locations (called toponyms) present in the articles, which can be difficult due to ambiguity in natural language. Toponym recognition in news is often accomplished with algorithms designed and tested around small corpora of news articles, but these static collections do not reflect the streaming nature of online news, as evidenced by poor performance in tests. In contrast, a method for toponym recognition is presented that is tuned for streaming news by leveraging a wide variety of recognition components, both rule-based and statistical. An evaluation of this method shows that it outperforms two prominent toponym recognition systems when tested on large datasets of streaming news, indicating its suitability for this domain.
Enriching document representation via translation for improved monolingual information retrieval BIBAFull-Text 853-862
  Seung-Hoon Na; Hwee Tou Ng
Word ambiguity and vocabulary mismatch are critical problems in information retrieval. To deal with these problems, this paper proposes the use of translated words to enrich document representation, going beyond the words in the original source language to represent a document. In our approach, each original document is automatically translated into an auxiliary language, and the resulting translated document serves as a semantically enhanced representation for supplementing the original bag of words. The core of our translation representation is the expected term frequency of a word in a translated document, which is calculated by averaging the term frequencies over all possible translations, rather than focusing on the 1-best translation only. To achieve better efficiency of translation, we do not rely on full-fledged machine translation, but instead use monotonic translation by removing the time-consuming reordering component. Experiments carried out on standard TREC test collections show that our proposed translation representation leads to statistically significant improvements over using only the original language of the document collection.
A novel corpus-based stemming algorithm using co-occurrence statistics BIBAFull-Text 863-872
  Jiaul H. Paik; Dipasree Pal; Swapan K. Parui
We present a stemming algorithm for text retrieval. The algorithm uses the statistics collected on the basis of certain corpus analysis based on the co-occurrence between two word variants. We use a very simple co-occurrence measure that reflects how often a pair of word variants occurs in a document as well as in the whole corpus. A graph is formed where the word variants are the nodes and two word variants form an edge if they co-occur. On the basis of the co-occurrence measure, a certain edge strength is defined for each of the edges. Finally, on the basis of the edge strengths, we propose a partition algorithm that groups the word variants based on their strongest neighbors, that is, the neighbors with largest strengths.
   Our stemming algorithm has two static parameters and does not use any other information except the co-occurrence statistics from the corpus. The experiments on TREC, CLEF and FIRE data consisting of four European and two Asian languages show a significant improvement over no-stem strategy on all the languages. Also, the proposed algorithm significantly outperforms a number of strong stemmers including the rule-based ones on a number of languages. For highly inflectional languages, a relative improvement of about 50% is obtained compared to un-normalized words and a relative improvement ranging from 5% to 16% is obtained compared to the rule based stemmer for the concerned language.

Clustering

Document clustering with Universum BIBAFull-Text 873-882
  Dan Zhang; Jingdong Wang; Luo Si
Document clustering is a popular research topic, which aims to partition documents into groups of similar objects (i.e., clusters), and has been widely used in many applications such as automatic topic extraction, document organization and filtering. As a recently proposed concept, Universum is a collection of "non-examples" that do not belong to any concept/cluster of interest. This paper proposes a novel document clustering technique -- Document Clustering with Universum, which utilizes the Universum examples to improve the clustering performance. The intuition is that the Universum examples can serve as supervised information and help improve the performance of clustering, since they are known not belonging to any meaningful concepts/clusters in the target domain. In particular, a maximum margin clustering method is proposed to model both target examples and Universum examples for clustering. An extensive set of experiments is conducted to demonstrate the effectiveness and efficiency of the proposed algorithm.
Identifying points of interest by self-tuning clustering BIBAFull-Text 883-892
  Yiyang Yang; Zhiguo Gong; Leong Hou U
Deducing trip related information from web-scale datasets has received very large amounts of attention recently. Identifying points of interest (POIs) in geo-tagged photos is one of these problems. The problem can be viewed as a standard clustering problem of partitioning two dimensional objects. In this work, we study spectral clustering which is the first attempt for the POIs identification. However, there is no unified approach to assign the clustering parameters; especially the features of POIs are immensely varying in different metropolitans and locations. To address this, we are intent to study a self-tuning technique which can properly assign the parameters for the clustering needed.
   Besides geographical information, web photos inherently store rich information. These information are mutually influenced each others and should be taken into trip related mining tasks. To address this, we study reinforcement which constructs the relationship over multiple sources by iterative learning. At last, we thoroughly demonstrate our findings by web scale datasets collected from Flickr.
Cluster-based fusion of retrieved lists BIBAFull-Text 893-902
  Anna Khudyak Kozorovitsky; Oren Kurland
Methods for fusing document lists that were retrieved in response to a query often use retrieval scores (or ranks) of documents in the lists. We present a novel probabilistic fusion approach that utilizes an additional source of rich information, namely, inter-document similarities. Specifically, our model integrates information induced from clusters of similar documents created across the lists with that produced by some fusion method that relies on retrieval scores (ranks). Empirical evaluation shows that our approach is highly effective for fusion. For example, the performance of our model is consistently better than that of the standard (effective) fusion method that it integrates. The performance also transcends that of standard fusion of re-ranked lists, where list re-ranking is based on clusters created from documents in the list.

Effectiveness

System effectiveness, user models, and user utility: a conceptual framework for investigation BIBAFull-Text 903-912
  Ben Carterette
There is great interest in producing effectiveness measures that model user behavior in order to better model the utility of a system to its users. These measures are often formulated as a sum over the product of a discount function of ranks and a gain function mapping relevance assessments to numeric utility values. We develop a conceptual framework for analyzing such effectiveness measures based on classifying members of this broad family of measures into four distinct families, each of which reflects a different notion of system utility. Within this framework we can hypothesize about the properties that such a measure should have and test those hypotheses against user and system data. Along the way we present a collection of novel results about specific measures and relationships between them.
Evaluating the synergic effect of collaboration in information seeking BIBAFull-Text 913-922
  Chirag Shah; Roberto González-Ibáñez
It is typically expected that when people work together, they can often accomplish goals that are difficult or even impossible for individuals. We consider this notion of the group achieving more than the sum of all individuals' achievements to be the synergic effect in collaboration. Similar expectation exists for people working in collaboration for information seeking tasks. We, however, lack a methodology and appropriate evaluation metrics for studying and measuring the synergic effect. In this paper we demonstrate how to evaluate this effect and discuss what it means to various collaborative information seeking (CIS) situations. We present a user study with four different conditions: single user, pair of users at the same computer, pair of users at different computers and co-located, and pair of users remotely located. Each of these individuals or pairs was given the same task of information seeking and usage for the same amount of time. We then combined the outputs of single independent users to form artificial pairs, and compared against the real pairs. Not surprisingly, participants using different computers (co-located or remotely located) were able to cover more information sources than those using a single computer (single user or a pair). But more interestingly, we found that real pairs with their own computers (co-located or remotely located) were able to cover more unique and useful information than that of the artificially created pairs. This indicates that those working in collaboration achieved something greater and better than what could be achieved by adding independent users, thus, demonstrating the synergic effect. Remotely located real teams were also able to formulate a wider range of queries than those pairs that were co-located or artificially created. This shows that the collaborators working remotely were able to achieve synergy while still being able to think and work independently. Through the experiments and measurements presented here, we have also contributed a unique methodology and an evaluation metric for CIS.
Repeatable and reliable search system evaluation using crowdsourcing BIBAFull-Text 923-932
  Roi Blanco; Harry Halpin; Daniel M. Herzig; Peter Mika; Jeffrey Pound; Henry S. Thompson; Thanh Tran Duc
The primary problem confronting any new kind of search task is how to boot-strap a reliable and repeatable evaluation campaign, and a crowd-sourcing approach provides many advantages. However, can these crowd-sourced evaluations be repeated over long periods of time in a reliable manner? To demonstrate, we investigate creating an evaluation campaign for the semantic search task of keyword-based ad-hoc object retrieval. In contrast to traditional search over web-pages, object search aims at the retrieval of information from factual assertions about real-world objects rather than searching over web-pages with textual descriptions. Using the first large-scale evaluation campaign that specifically targets the task of ad-hoc Web object retrieval over a number of deployed systems, we demonstrate that crowd-sourced evaluation campaigns can be repeated over time and still maintain reliable results. Furthermore, we show how these results are comparable to expert judges when ranking systems and that the results hold over different evaluation and relevance metrics. This work provides empirical support for scalable, reliable, and repeatable search system evaluation using crowdsourcing.

Multilingual IR

Cross-language web page classification via dual knowledge transfer using nonnegative matrix tri-factorization BIBAFull-Text 933-942
  Hua Wang; Heng Huang; Feiping Nie; Chris Ding
The lack of sufficient labeled Web pages in many languages, especially for those uncommonly used ones, presents a great challenge to traditional supervised classification methods to achieve satisfactory Web page classification performance. To address this, we propose a novel Nonnegative Matrix Tri-factorization (NMTF) based Dual Knowledge Transfer (DKT) approach for cross-language Web page classification, which is based on the following two important observations. First, we observe that Web pages for a same topic from different languages usually share some common semantic patterns, though in different representation forms. Second, we also observe that the associations between word clusters and Web page classes are a more reliable carrier than raw words to transfer knowledge across languages. With these recognitions, we attempt to transfer knowledge from the auxiliary language, in which abundant labeled Web pages are available, to target languages, in which we want classify Web pages, through two different paths: word cluster approximations and the associations between word clusters and Web page classes. Due to the reinforcement between these two different knowledge transfer paths, our approach can achieve better classification accuracy. We evaluate the proposed approach in extensive experiments using a real world cross-language Web page data set. Promising results demonstrate the effectiveness of our approach that is consistent with our theoretical analyses.
No free lunch: brute force vs. locality-sensitive hashing for cross-lingual pairwise similarity BIBAFull-Text 943-952
  Ferhan Ture; Tamer Elsayed; Jimmy Lin
This work explores the problem of cross-lingual pairwise similarity, where the task is to extract similar pairs of documents across two different languages. Solutions to this problem are of general interest for text mining in the multi-lingual context and have specific applications in statistical machine translation. Our approach takes advantage of cross-language information retrieval (CLIR) techniques to project feature vectors from one language into another, and then uses locality-sensitive hashing (LSH) to extract similar pairs. We show that effective cross-lingual pairwise similarity requires working with similarity thresholds that are much lower than in typical monolingual applications, making the problem quite challenging. We present a parallel, scalable MapReduce implementation of the sort-based sliding window algorithm, which is compared to a brute-force approach on German and English Wikipedia collections. Our central finding can be summarized as "no free lunch": there is no single optimal solution. Instead, we characterize effectiveness-efficiency tradeoffs in the solution space, which can guide the developer to locate a desirable operating point based on application- and resource-specific constraints.
An event-centric model for multilingual document similarity BIBAFull-Text 953-962
  Jannik Strötgen; Michael Gertz; Conny Junghans
Document similarity measures play an important role in many document retrieval and exploration tasks. Over the past decades, several models and techniques have been developed to determine a ranked list of documents similar to a given query document. Interestingly, the proposed approaches typically rely on extensions to the vector space model and are rarely suited for multilingual corpora.
   In this paper, we present a novel document similarity measure that is based on events extracted from documents. An event is solely described by nearby occurrences of temporal and geographic expressions in a document's text. Thus, a document is modeled as a set of events that can be compared and ranked using temporal and geographic hierarchies. A key feature of our model is that it is term- and language-independent as temporal and geographic expressions mentioned in texts are normalized to a standard format. This also allows to determine similar documents across languages, an important feature in the context of document exploration. Our approach proves to be quite effective, including the discovery of new similarities, as our experiments using different (multilingual) corpora demonstrate.

Efficiency

Posting list intersection on multicore architectures BIBAFull-Text 963-972
  Shirish Tatikonda; B. Barla Cambazoglu; Flavio P. Junqueira
In current commercial Web search engines, queries are processed in the conjunctive mode, which requires the search engine to compute the intersection of a number of posting lists to determine the documents matching all query terms. In practice, the intersection operation takes a significant fraction of the query processing time, for some queries dominating the total query latency. Hence, efficient posting list intersection is critical for achieving short query latencies. In this work, we focus on improving the performance of posting list intersection by leveraging the compute capabilities of recent multicore systems. To this end, we consider various coarse-grained and fine-grained parallelization models for list intersection. Specifically, we present an algorithm that partitions the work associated with a given query into a number of small and independent tasks that are subsequently processed in parallel. Through a detailed empirical analysis of these alternative models, we demonstrate that exploiting parallelism at the finest-level of granularity is critical to achieve the best performance on multicore systems. On an eight-core system, the fine-grained parallelization method is able to achieve more than five times reduction in average query processing time while still exploiting the parallelism for high query throughput.
Timestamp-based result cache invalidation for web search engines BIBAFull-Text 973-982
  Sadiye Alici; Ismail Sengor Altingovde; Rifat Ozcan; Berkant Barla Cambazoglu; Özgür Ulusoy
The result cache is a vital component for efficiency of large-scale web search engines, and maintaining the freshness of cached query results is the current research challenge. As a remedy to this problem, our work proposes a new mechanism to identify queries whose cached results are stale. The basic idea behind our mechanism is to maintain and compare generation time of query results with update times of posting lists and documents to decide on staleness of query results. The proposed technique is evaluated using a Wikipedia document collection with real update information and a real-life query log. We show that our technique has good prediction accuracy, relative to a baseline based on the time-to-live mechanism. Moreover, it is easy to implement and incurs less processing overhead on the system relative to a recently proposed, more sophisticated invalidation mechanism.
Energy-price-driven query processing in multi-center web search engines BIBAFull-Text 983-992
  Enver Kayaaslan; B. Barla Cambazoglu; Roi Blanco; Flavio P. Junqueira; Cevdet Aykanat
Concurrently processing thousands of web queries, each with a response time under a fraction of a second, necessitates maintaining and operating massive data centers. For large-scale web search engines, this translates into high energy consumption and a huge electric bill. This work takes the challenge to reduce the electric bill of commercial web search engines operating on data centers that are geographically far apart. Based on the observation that energy prices and query workloads show high spatio-temporal variation, we propose a technique that dynamically shifts the query workload of a search engine between its data centers to reduce the electric bill. Experiments on real-life query workloads obtained from a commercial search engine show that significant financial savings can be achieved by this technique.
Faster top-k document retrieval using block-max indexes BIBAFull-Text 993-1002
  Shuai Ding; Torsten Suel
Large search engines process thousands of queries per second over billions of documents, making query processing a major performance bottleneck. An important class of optimization techniques called early termination achieves faster query processing by avoiding the scoring of documents that are unlikely to be in the top results. We study new algorithms for early termination that outperform previous methods. In particular, we focus on safe techniques for disjunctive queries, which return the same result as an exhaustive evaluation over the disjunction of the query terms. The current state-of-the-art methods for this case, the WAND algorithm by Broder et al. [11] and the approach of Strohman and Croft [30], achieve great benefits but still leave a large performance gap between disjunctive and (even non-early terminated) conjunctive queries. We propose a new set of algorithms by introducing a simple augmented inverted index structure called a block-max index. Essentially, this is a structure that stores the maximum impact score for each block of a compressed inverted list in uncompressed form, thus enabling us to skip large parts of the lists. We show how to integrate this structure into the WAND approach, leading to considerable performance gains. We then describe extensions to a layered index organization, and to indexes with reassigned document IDs, that achieve additional gains that narrow the gap between disjunctive and conjunctive top-k query processing.

Recommender systems

Utilizing marginal net utility for recommendation in e-commerce BIBAFull-Text 1003-1012
  Jian Wang; Yi Zhang
Traditional recommendation algorithms often select products with the highest predicted ratings to recommend. However, earlier research in economics and marketing indicates that a consumer usually makes purchase decision(s) based on the product's marginal net utility (i.e., the marginal utility minus the product price). Utility is defined as the satisfaction or pleasure user u gets when purchasing the corresponding product. A rational consumer chooses the product to purchase in order to maximize the total net utility. In contrast to the predicted rating, the marginal utility of a product depends on the user's purchase history and changes over time. According to the Law of Diminishing Marginal Utility, many products have the decreasing marginal utility with the increase of purchase count, such as cell phones, computers, and so on. Users are not likely to purchase the same or similar product again in a short time if they already purchased it before. On the other hand, some products, such as pet food, baby diapers, would be purchased again and again.
   To better match users' purchase decisions in the real world, this paper explores how to recommend products with the highest marginal net utility in e-commerce sites. Inspired by the Cobb-Douglas utility function in consumer behavior theory, we propose a novel utility-based recommendation framework. The framework can be utilized to revamp a family of existing recommendation algorithms. To demonstrate the idea, we use Singular Value Decomposition (SVD) as an example and revamp it with the framework. We evaluate the proposed algorithm on an e-commerce (shop.com) data set. The new algorithm significantly improves the base algorithm, largely due to its ability to recommend both products that are new to the user and products that the user is likely to re-purchase.
Recommending ephemeral items at web scale BIBAFull-Text 1013-1022
  Ye Chen; John F. Canny
We describe an innovative and scalable recommendation system successfully deployed at eBay. To build recommenders for long-tail marketplaces requires projection of volatile items into a persistent space of latent products. We first present a generative clustering model for collections of unstructured, heterogeneous, and ephemeral item data, under the assumption that items are generated from latent products. An item is represented as a vector of independently and distinctly distributed variables, while a latent product is characterized as a vector of probability distributions, respectively. The probability distributions are chosen as natural stochastic models for different types of data. The learning objective is to maximize the total intra-cluster coherence measured by the sum of log likelihoods of items under such a generative process. In the space of latent products, robust recommendations can then be derived using naive Bayes for ranking, from historical transactional data. Item-based recommendations are achieved by inferring latent products from unseen items. In particular, we develop a probabilistic scoring function of recommended items, which takes into account item-product membership, product purchase probability, and the important auction-end-time factor. With the holistic probabilistic measure of a prospective item purchase, one can further maximize the expected revenue and the more subjective user satisfaction as well. We evaluated the latent product clustering and recommendation ranking models using real-world e-commerce data from eBay, in both forms of offline simulation and online A/B testing. In the recent production launch, our system yielded 3-5 folds improvement over the existing production system in click-through, purchase-through and gross merchandising value; thus now driving 100% related recommendation traffic with billions of items at eBay. We believe that this work provides a practical yet principled framework for recommendation in the domains with affluent user self-input data.
A unified framework for recommendations based on quaternary semantic analysis BIBAFull-Text 1023-1032
  Chen Wei; Wynne Hsu; Mong Li Lee
Social network systems such as FaceBook and YouTube have played a significant role in capturing both explicit and implicit user preferences for different items in the form of ratings and tags. This forms a quaternary relationship among users, items, tags and ratings. Existing systems have utilized only ternary relationships such as users-items-ratings, or users-items-tags to derive their recommendations. In this paper, we show that ternary relationships are insufficient to provide accurate recommendations. Instead, we model the quaternary relationship among users, items, tags and ratings as a 4-order tensor and cast the recommendation problem as a multi-way latent semantic analysis problem. A unified framework for user recommendation, item recommendation, tag recommendation and item rating prediction is proposed. The results of extensive experiments performed on a real world dataset demonstrate that our unified framework outperforms the state-of-the-art techniques in all the four recommendation tasks.
Associative tag recommendation exploiting multiple textual features BIBAFull-Text 1033-1042
  Fabiano Belém; Eder Martins; Tatiana Pontes; Jussara Almeida; Marcos Gonçalves
This work addresses the task of recommending relevant tags to a target object by jointly exploiting three dimensions of the problem: (i) term co-occurrence with tags pre-assigned to the target object, (ii) terms extracted from multiple textual features, and (iii) several metrics of tag relevance. In particular, we propose several new heuristic methods, which extend state-of-the-art strategies by including new metrics that try to capture how accurately a candidate term describes the object's content. We also exploit two learning-to-rank (L2R) techniques, namely RankSVM and Genetic Programming, for the task of generating ranking functions that combine multiple metrics to accurately estimate the relevance of a tag to a given object. We evaluate all proposed methods in various scenarios for three popular Web 2.0 applications, namely, LastFM, YouTube and YahooVideo. We found that our new heuristics greatly outperform the methods on which they are based, producing gains in precision of up to 181%, as well as another state-of-the-art technique, with improvements in precision of up to 40% over the best baseline in any scenario. Further improvements can also be achieved with the new L2R strategies, which have the additional advantage of being quite flexible and extensible to exploit other aspects of the tag recommendation problem.

Test collections

Evaluating diversified search results using per-intent graded relevance BIBAFull-Text 1043-1052
  Tetsuya Sakai; Ruihua Song
Search queries are often ambiguous and/or underspecified. To accommodate different user needs, search result diversification has received attention in the past few years. Accordingly, several new metrics for evaluating diversification have been proposed, but their properties are little understood. We compare the properties of existing metrics given the premises that (1) queries may have multiple intents; (2) the likelihood of each intent given a query is available; and (3) graded relevance assessments are available for each intent. We compare a wide range of traditional and diversified IR metrics after adding graded relevance assessments to the TREC 2009 Web track diversity task test collection which originally had binary relevance assessments. Our primary criterion is discriminative power, which represents the reliability of a metric in an experiment. Our results show that diversified IR experiments with a given number of topics can be as reliable as traditional IR experiments with the same number of topics, provided that the right metrics are used. Moreover, we compare the intuitiveness of diversified IR metrics by closely examining the actual ranked lists from TREC. We show that a family of metrics called D#-measures have several advantages over other metrics such as α-nDCG and Intent-Aware metrics.
Evaluating multi-query sessions BIBAFull-Text 1053-1062
  Evangelos Kanoulas; Ben Carterette; Paul D. Clough; Mark Sanderson
The standard system-based evaluation paradigm has focused on assessing the performance of retrieval systems in serving the best results for a single query. Real users, however, often begin an interaction with a search engine with a sufficiently under-specified query that they will need to reformulate before they find what they are looking for. In this work we consider the problem of evaluating retrieval systems over test collections of multi-query sessions. We propose two families of measures: a model-free family that makes no assumption about the user's behavior over a session, and a model-based family with a simple model of user interactions over the session. In both cases we generalize traditional evaluation metrics such as average precision to multi-query session evaluation. We demonstrate the behavior of the proposed metrics by using the new TREC 2010 Session track collection and simulations over the TREC-9 Query track collection.
Quantifying test collection quality based on the consistency of relevance judgements BIBAFull-Text 1063-1072
  Falk Scholer; Andrew Turpin; Mark Sanderson
Relevance assessments are a key component for test collection-based evaluation of information retrieval systems. This paper reports on a feature of such collections that is used as a form of ground truth data to allow analysis of human assessment error. A wide range of test collections are retrospectively examined to determine how accurately assessors judge the relevance of documents. Our results demonstrate a high level of inconsistency across the collections studied. The level of irregularity is shown to vary across topics, with some showing a very high level of assessment error. We investigate possible influences on the error, and demonstrate that inconsistency in judging increases with time. While the level of detail in a topic specification does not appear to influence the errors that assessors make, judgements are significantly affected by the decisions made on previously seen similar documents. Assessors also display an assessment inertia. Alternate approaches to generating relevance judgements appear to reduce errors. A further investigation of the way that retrieval systems are ranked using sets of relevance judgements produced early and late in the judgement process reveals a consistent influence measured across the majority of examined test collections.
   We conclude that there is a clear value in examining, even inserting, ground truth data in test collections, and propose ways to help minimise the sources of inconsistency when creating future test collections.
Pseudo test collections for learning web search ranking functions BIBAFull-Text 1073-1082
  Nima Asadi; Donald Metzler; Tamer Elsayed; Jimmy Lin
Test collections are the primary drivers of progress in information retrieval. They provide yardsticks for assessing the effectiveness of ranking functions in an automatic, rapid, and repeatable fashion and serve as training data for learning to rank models. However, manual construction of test collections tends to be slow, labor-intensive, and expensive. This paper examines the feasibility of constructing web search test collections in a completely unsupervised manner given only a large web corpus as input. Within our proposed framework, anchor text extracted from the web graph is treated as a pseudo query log from which pseudo queries are sampled. For each pseudo query, a set of relevant and non-relevant documents are selected using a variety of web-specific features, including spam and aggregated anchor text weights. The automatically mined queries and judgments form a pseudo test collection that can be used for training ranking functions. Experiments carried out on TREC web track data show that learning to rank models trained using pseudo test collections outperform an unsupervised ranking function and are statistically indistinguishable from a model trained using manual judgments, demonstrating the usefulness of our approach in extracting reasonable quality training data "for free".

Posters presentations

Parallel learning to rank for information retrieval BIBAFull-Text 1083-1084
  Shuaiqiang Wang; Byron J. Gao; Ke Wang; Hady W. Lauw
Learning to rank represents a category of effective ranking methods for information retrieval. While the primary concern of existing research has been accuracy, learning efficiency is becoming an important issue due to the unprecedented availability of large-scale training data and the need for continuous update of ranking functions. In this paper, we investigate parallel learning to rank, targeting simultaneous improvement in accuracy and efficiency.
Learning features through feedback for blog distillation BIBAFull-Text 1085-1086
  Dehong Gao; Renxian Zhang; Wenjie Li; Yiu Keung Lau; Kam Fai Wong
The paper is focused on blogosphere research based on the TREC blog distillation task, and aims to explore unbiased and significant features automatically and efficiently. Feedback from faceted feeds is introduced to harvest relevant features and information gain is used to select discriminative features. The evaluation result shows that the selected feedback features can greatly improve the performance and adapt well to the terabyte data.
Time-based relevance models BIBAFull-Text 1087-1088
  Mostafa Keikha; Shima Gerani; Fabio Crestani
This paper addresses blog feed retrieval where the goal is to retrieve the most relevant blog feeds for a given user query. Since the retrieval unit is a blog, as a collection of posts, performing relevance feedback techniques and selecting the most appropriate documents for query expansion becomes challenging. By assuming time as an effective parameter on the blog posts content, we propose a time-based query expansion method. In this method, we select terms for expansion using most relevant days for the query, as opposed to most relevant documents. This provide us with more trustable terms for expansion. Our preliminary experiments on Blog08 collection shows that this method can outperform state of the art relevance feedback methods in blog retrieval.
Improved query performance prediction using standard deviation BIBAFull-Text 1089-1090
  Ronan Cummins; Joemon Jose; Colm O'Riordan
Query performance prediction (QPP) is an important task in information retrieval (IR). In this paper, we (1) develop a new predictor based on the standard deviation of scores in a variable length ranked list, and (2) we show that this new predictor outperforms state-of-the-art approaches without the need for tuning.
Learning to rank using query-level regression BIBAFull-Text 1091-1092
  Jiajin Wu; Zhihao Yang; Yuan Lin; Hongfei Lin; Zheng Ye; Kan Xu
In this paper, we use query-level regression as the loss function. The regression loss function has been used in pointwise methods, however pointwise methods ignore the query boundaries and treat the data equally across queries, and thus the effectiveness is limited. We show that regression is an effective loss function for learning to rank when used in query-level. We use neural network to model the ranking function and gradient descent for optimization and refer our method as ListReg. Experimental results show that ListReg significantly outperforms pointwise Regression and the state-of-the-art listwise method in most cases.
Diversifying product search results BIBAFull-Text 1093-1094
  Xiangru Chen; Haofen Wang; Xinruo Sun; Junfeng Pan; Yong Yu
In recent years, online shopping is becoming more and more popular. Users type keyword queries on product search systems to find relevant products, accessories, and even related products. However, existing product search systems always return very similar products on the first several pages instead of taking diversity into consideration. In this paper, we propose a novel approach to address the diversity issue in the context of product search. We transform search result diversification into a combination of diversifying product categories and diversifying product attribute values within each category. The two sub-problems are optimization problems which can be reduced into well-known NP-hard problems respectively. We further leverage greedy-based approximation algorithms for efficient product search results re-ranking.
Ad hoc IR: not much room for improvement BIBAFull-Text 1095-1096
  Andrew Trotman; David Keeler
Ranking function performance reached a plateau in 1994. The reason for this is investigated. First the performance of BM25 is measured as the proportion of queries satisfied on the first page of 10 results -- it performs well. The performance is then compared to human performance. They perform comparably. The conclusion is there isn't much room for ranking function improvement.
Image annotation based on recommendation model BIBAFull-Text 1097-1098
  Zijia Lin; Guiguang Ding; Jianmin Wang
In this paper, a novel approach based on recommendation model is proposed for automatic image annotation. For any to-be-annotated image, we first select some related images with tags from training dataset according to their visual similarity. And then we estimate the initial ratings for tags of the training images based on tag ranking method and construct a rating matrix. We also construct a trust matrix based on visual similarity with a k-NN strategy. Then a recommendation model is built on both matrices to rank candidate tags for the target image. The proposed approach is evaluated using two benchmark image datasets, and experimental results have indicated its effectiveness.
Utilizing minimal relevance feedback for ad hoc retrieval BIBAFull-Text 1099-1100
  Eyal Krikon; Oren Kurland
Using relevance feedback can significantly improve (ad hoc) retrieval effectiveness. Yet, if little feedback is available, effectively exploiting it is a challenge. To that end, we present a novel approach that utilizes document passages. Empirical evaluation demonstrates the merits of the approach.
Sense discrimination for physics retrieval BIBAFull-Text 1101-1102
  Christina Lioma; Alok Kothari; Hinrich Schuetze
Information Retrieval in technical domains like physics is characterised by long and precise queries, whose meaning is strongly influenced by term context and domain. We treat this as a disambiguation problem, and present initial findings of a retrieval model that posits a higher probability of relevance for documents matching disambiguated query terms. Preliminary evaluation on a real-life physics test collection shows promising performance improvement.
When documents are very long, BM25 fails! BIBAFull-Text 1103-1104
  Yuanhua Lv; ChengXiang Zhai
We reveal that the Okapi BM25 retrieval function tends to overly penalize very long documents. To address this problem, we present a simple yet effective extension of BM25, namely BM25L, which "shifts" the term frequency normalization formula to boost scores of very long documents. Our experiments show that BM25L, with the same computation cost, is more effective and robust than the standard BM25.
Location and timeliness of information sources during news events BIBAFull-Text 1105-1106
  Elad Yom-Tov; Fernando Diaz
People nowadays can obtain information on current news events through media outlets, social media, and by actively seeking information using search engines. In this paper we investigate the temporal relationship between news coverage by media outlets, social media, and query logs and show that social media frequently precedes other information sources. Additionally, we demonstrate that there is strong negative correlation between the probability for reporting of an event and the distance of the information source from the event.
What deliberately degrading search quality tells us about discount functions BIBAFull-Text 1107-1108
  Paul Thomas; Timothy Jones; David Hawking
Deliberate degradation of search results is a common tool in user experiments. We degrade high-quality search results by inserting non-relevant documents at different ranks. The effect of these manipulations, on a number of commonly-used metrics, is counter-intuitive: the discount functions implicit in P@k, MRR, NDCG, and others do not account for the true relationship between rank and value to the user. We propose an alternative, based on visibility data.
Collective topic modeling for heterogeneous networks BIBAFull-Text 1109-1110
  Hongbo Deng; Bo Zhao; Jiawei Han
In this paper, we propose a joint probabilistic topic model for simultaneously modeling the contents of multi-typed objects of a heterogeneous information network. The intuition behind our model is that different objects of the heterogeneous network share a common set of latent topics so as to adjust the multinomial distributions over topics for different objects collectively. Experimental results demonstrate the effectiveness of our approach for the tasks of topic modeling and object clustering.
Graph-cut based tag enrichment BIBAFull-Text 1111-1112
  Xueming Qian; Xian-Sheng Hua
In this paper, a graph cut based tag enrichment approach is proposed. We build a graph for each image with its initial tags. The graph is with two terminals. Nodes of the graph are full connected with each other. Min-cut/max-flow algorithm is utilized to find the relevant tags for the image. Experiments on Flickr dataset demonstrate the effectiveness of the proposed graph-cut based tag enrichment approach.
Personalized social query expansion using social bookmarking systems BIBAFull-Text 1113-1114
  Mohamed Reda Bouadjenek; Hakim Hacid; Mokrane Bouzeghoub; Johann Daigremont
We propose a new approach for social and personalized query expansion using social structures in the Web 2.0. While focusing on social tagging systems, the proposed approach considers (i) the semantic similarity between tags composing a query, (ii) a social proximity between the query and the user profile, and (iii) on the fly, a strategy for expanding user queries. The proposed approach has been evaluated using a large dataset crawled from del.icio.us.
What are the real differences of children's and adults' web search BIBAFull-Text 1115-1116
  Tatiana Gossen; Thomas Low; Andreas Nürnberger
We present first results of a logfile analysis on web search engines for children. The aim of this research is to analyse fundamental facts about how children's web search behaviour differs from that of adults. We show differences to previous results, which are often based on small lab experiments. Our large-scale analysis suggests that children search queries are more information-oriented and shorter on average. Children indeed make a lot of spelling errors and often repeat searches and revisit web pages.
Cognitive coordinating behaviors in multitasking web search BIBAFull-Text 1117-1118
  Jia Tina Du
This paper investigates how users cognitively coordinate multitasking Web search across different information search problems. The analysis suggests that (1) multitasking is a prevalent Web search behavior including both sequential multitasking (31%) and parallel multitasking (69%); (2) multitasking is performed through a task switching process; and (3) such a process is supported and underpinned by cognitive coordination mechanisms and strategy coordination.
Optimizing multimodal reranking for web image search BIBAFull-Text 1119-1120
  Hao Li; Meng Wang; Zhisheng Li; Zheng-Jun Zha; Jialie Shen
In this poster, we introduce a web image search reranking approach with exploring multiple modalities. Different from the conventional methods that build graph with one feature set for reranking, our approach integrates multiple feature sets that describe visual content from different aspects. We simultaneously integrate the learning of relevance scores, the weighting of different feature sets, the distance metric and the scaling for each feature set into a unified scheme. Experimental results on a large data set that contains more than 1,100 queries and 1 million images demonstrate the effectiveness of our approach.
Multi-layer graph-based semi-supervised learning for large-scale image datasets using MapReduce BIBAFull-Text 1121-1122
  Wen-Yu Lee; Liang-Chi Hsieh; Guan-Long Wu; Winston Hsu; Ya-Fan Su
Semi-supervised learning is to exploit the vast amount of unlabeled data in the world. This paper proposes a scalable graph-based technique leveraging the distributed computing power of the MapReduce programming model. For a higher quality of learning, the paper also presents a multi-layer learning structure to unify both visual and textual information of image data during the learning process. Experimental results show the effectiveness of the proposed methods.
Tackling class imbalance and data scarcity in literature-based gene function annotation BIBAFull-Text 1123-1124
  Mathieu Blondel; Kazuhiro Seki; Kuniaki Uehara
In recent years, a number of machine learning approaches to literature-based gene function annotation have been proposed. However, due to issues such as lack of labeled data, class imbalance and computational cost, they have usually been unable to surpass simpler approaches based on string-matching. In this paper, we propose a principled machine learning approach based on kernel classifiers.
   We show that kernels can address the task's inherent data scarcity by embedding additional knowledge and we propose a simple yet effective solution to deal with class imbalance. From experiments on the TREC Genomics Track data, our approach achieves better F1-score than two state-of-the-art approaches based on string-matching and cross-species information.
Bootstrapping subjectivity detection BIBAFull-Text 1125-1126
  Valentin Jijkoun; Maarten de Rijke
We describe a method for automatically generating subjectivity clues for a specific topic and a set of (relevant) document, evaluating it on the task of classifying sentences w.r.t. subjectivity, with improvements over previous work.
The effects of choice in routing relevance judgments BIBAFull-Text 1127-1128
  Edith Law; Paul N. Bennett; Eric Horvitz
The emergence of human computation systems, including Mechanical Turk and games with a purpose, has made it feasible to distribute relevance judgment tasks to workers over the Web. Most human computation systems assign tasks to individuals randomly, and such assignments may match workers with tasks that they may be unqualified or unmotivated to perform. We compare two groups of workers, those given a choice of queries to judge versus those who are not, in terms of their self-rated competence and their actual performance. Results show that when given a choice of task, workers choose ones for which they have greater expertise, interests, confidence, and understanding.
Statistical feature extraction for cross-language web content quality assessment BIBAFull-Text 1129-1130
  Guang-Gang Geng; Xiao-Dong Li; Li-Ming Wang; Wei Wang; Shuo Shen
Web content quality assessment is a typical static ranking problem. Heuristic content and TFIDF features based statistical systems have proven effective for Web content quality assessment. But they are all language dependent features, which are not suitable for cross-language ranking. In this paper, we fuse a series of language-independent features including hostname features, domain registration features, two-layer hyperlink analysis features and third-party Web service features to assess the Web content quality. The experiments on ECML/PKDD 2010 Discovery Challenge cross-language datasets show that the assessment is effective.
Exploiting endorsement information and social influence for item recommendation BIBAFull-Text 1131-1132
  Cheng-Te Li; Shou-De Lin; Man-Kwan Shan
Social networking services possess two features: (1) capturing the social relationships among people, represented by the social network, and (2) allowing users to express their preferences on different kinds of items (e.g. photo, celebrity, pages) through endorsing buttons, represented by a kind of endorsement bipartite graph. In this work, using such information, we propose a novel recommendation method, which leverages the viral marketing in the social network and the wisdom of crowds from endorsement network. Our recommendation consists of two parts. First, given some query terms describing user's preference, we find a set of targeted influencers who have the maximum activation probability on those nodes related to the query terms in the social network. Second, based on the derived targeted influencers as key experts, we recommend items via the endorsement network. We conduct the experiments on DBLP co-authorship social network with author-reference data as the endorsement network. The results show our method can achieve effective recommendations.
Modeling subset distributions for verbose queries BIBAFull-Text 1133-1134
  Xiaobing Xue; W. Bruce Croft
Improving verbose (or long) queries poses a new challenge for search systems. Previous techniques mainly focused on two aspects, weighting the important words or phrases and selecting the best subset query. The former does not consider how words and phrases are used in actual subset queries, while the latter ignores alternative subset queries. Recently, a novel reformulation framework has been proposed to transform the original query as a distribution of reformulated queries, which overcomes the disadvantages of previous techniques. In this paper, we apply this framework to verbose queries, where a reformulated query is specified as a subset query. Experiments on TREC collections show that the query distribution based framework outperforms the state-of-the-art techniques.
Domain expert topic familiarity and search behavior BIBAFull-Text 1135-1136
  Sarvnaz Karimi; Falk Scholer; Adam Clark; Sadegh Kharazmi
Users of information retrieval systems employ a variety of strategies when searching for information. One factor that can directly influence how searchers go about their information finding task is the level of familiarity with a search topic. We investigate how the search behavior of domain experts changes based on their previous level of familiarity with a search topic, reporting on a user study of biomedical experts searching for a range of domain-specific material. The results of our study show that topic familiarity can influence the number of queries that are employed to complete a task, the types of queries that are entered, and the overall number of query terms. Our findings suggest that biomedical search systems should enable searching through a variety of querying modes, to support the different search strategies that users were found to employ depending on their familiarity with the information that they are searching for.
Sample selection for dictionary-based corpus compression BIBAFull-Text 1137-1138
  Christopher Hoobin; Simon Puglisi; Justin Zobel
Compression of large text corpora has the potential to drastically reduce both storage requirements and per-document access costs. Adaptive methods used for general-purpose compression are ineffective for this application, and historically the most successful methods have been based on word-based dictionaries, which allow use of global properties of the text. However, these are dependent on the text complying with assumptions about content and lead to dictionaries of unpredictable size. In recent work we have described an LZ-like approach in which sampled blocks of a corpus are used as a dictionary against which the complete corpus is compressed, giving compression twice as effective than that of zlib. Here we explore how pre-processing can be used to eliminate redundancy in our sampled dictionary. Our experiments show that dictionary size can be reduced by 50% or more (less than 0.1% of the collection size) with no significant effect on compression or access speed.
Evaluating medical information retrieval BIBAFull-Text 1139-1140
  Bevan Koopman; Peter Bruza; Laurianne Sitbon; Michael Lawley
This paper presents a framework for evaluating information retrieval of medical records. We use the BLULab corpus, a large collection of real-world de-identified medical records. The collection has been hand coded by clinical terminologists using the ICD-9 medical classification system. The ICD codes are used to devise queries and relevance judgements for this collection. Results of initial test runs using a baseline IR system show that there is room for improvement in medical information retrieval. Queries and relevance judgements are made available at http://aehrc.com/med_eval
Region-based landmark discovery by crowdsourcing geo-referenced photos BIBAFull-Text 1141-1142
  Yen-Ta Huang; An-Jung Cheng; Liang-Chi Hsieh; Winston Hsu; Kuo-Wei Chang
We propose a novel model for landmark discovery that locates region-based landmarks on map in contrast to the traditional point-based landmarks. The proposed method preserves more information and automatically identifies candidate regions on map by crowdsourcing geo-referenced photos. Gaussian kernel convolution is applied to remove noises and generate detected region. We adopt F1 measure to evaluate discovered landmarks and manually check the association between tags and regions. The experiment results show that more than 90% of attractions in the selected city can be correctly located by this method.
Towards effective short text deep classification BIBAFull-Text 1143-1144
  Xinruo Sun; Haofen Wang; Yong Yu
Recently, more and more short texts (e.g., ads, tweets) appear on the Web. Classifying short texts into a large taxonomy like ODP or Wikipedia category system has become an important mining task to improve the performance of many applications such as contextual advertising and topic detection for micro-blogging. In this paper, we propose a novel multi-stage classification approach to solve the problem. First, explicit semantic analysis is used to add more features for both short texts and categories. Second, we leverage information retrieval technologies to fetch the most relevant categories for an input short text from thousands of candidates. Finally, a SVM classifier is applied on only a few selected categories to return the final answer. Our experimental results show that the proposed method achieved significant improvements on classification accuracy compared with several existing state of art approaches.
Temporal latent semantic analysis for collaboratively generated content: preliminary results BIBAFull-Text 1145-1146
  Yu Wang; Eugene Agichtein
Latent semantic analysis (LSA) has been intensively studied because of its wide application to Information Retrieval and Natural Language Processing. Yet, traditional models such as LSA only examine one (current) version of the document. However, due to the recent proliferation of collaboratively generated content such as threads in online forums, Collaborative Question Answering archives, Wikipedia, and other versioned content, the document generation process is now directly observable. In this study, we explore how this additional temporal information about the document evolution could be used to enhance the identification of latent document topics. Specifically, we propose a novel hidden-topic modeling algorithm, temporal Latent Semantic Analysis (tLSA), which elegantly extends LSA to modeling document revision history using tensor decomposition. Our experiments show that tLSA outperforms LSA on word relatedness estimation using benchmark data, and explore applications of tLSA for other tasks.
Self-adjusting hybrid recommenders based on social network analysis BIBAFull-Text 1147-1148
  Alejandro Bellogin; Pablo Castells; Ivan Cantador
Ensemble recommender systems successfully enhance recommendation accuracy by exploiting different sources of user preferences, such as ratings and social contacts. In linear ensembles, the optimal weight of each recommender strategy is commonly tuned empirically, with limited guarantee that such weights are optimal afterwards. We propose a self-adjusting hybrid recommendation approach that alleviates the social cold start situation by weighting the recommender combination dynamically at recommendation time, based on social network analysis algorithms. We show empirical results where our approach outperforms the best static combination for different hybrid recommenders.
BlogCast effect on information diffusion in a blogosphere BIBAFull-Text 1149-1150
  Sang-Wook Kim; Christos Faloutsos; Jiwoon Ha
A blog service company provides a function named BlogCast that exposes quality posts on the blog main page to vitalize a blogosphere. This paper analyzes a new type of information diffusion via BlogCast. We show that there exists a strong halo effect in a blogosphere via thorough investigation on a huge volume of blog data.
Product comparison using comparative relations BIBAFull-Text 1151-1152
  Si Li; Zheng-Jun Zha; Zhaoyan Ming; Meng Wang; Tat-Seng Chua; Jun Guo; Weiran Xu
This paper proposes a novel Product Comparison approach. The comparative relations between products are first mined from both user reviews on multiple review websites and community-based question answering pairs containing product comparison information. A unified graph model is then developed to integrate the resultant comparative relations for product comparison. Experiments on popular electronic products show that the proposed approach outperforms the state-of-the-art methods.
Collaborative cyberporn filtering with collective intelligence BIBAFull-Text 1153-1154
  Lung-Hao Lee; Hsin-Hsi Chen
This paper presents a user intent method to generate blacklists for collaborative cyberporn filtering. A novel porn detection framework that finds new pornographic web pages by mining user search behaviors is proposed. It employs users' clicks in search query logs to select the suspected web pages without extra human efforts to label data for training, and determines their categories with the help of URL host name and path information, but without web page content. We adopt an MSN porn data set to explore the effectiveness of our method. This user intent approach achieves high precision, while maintaining favorably low false positive rate. In addition, real-life filtering simulation reveals that our user intent method with its accumulative update strategy achieves 43.36% of blocking rate, while maintaining a steadily less than 7% of over-blocking rate.
Do IR models satisfy the TDC retrieval constraint BIBFull-Text 1155-1156
  Stéphane Clinchant; Eric Gaussier
On diversifying and personalizing web search BIBAFull-Text 1157-1158
  David Vallet; Pablo Castells
Diversification and personalization methods are common approaches to deal with the one-size-fits-all paradigm of Web search engines. We performed a user study with 190 subjects where we analyzed the effects of diversification and personalization methods in a Web search engine. The obtained results suggest that our proposed combination of diversification and personalization factors may be a way to overcome the notion of intrusiveness in personalized approaches.
Semantic tag recommendation using concept model BIBAFull-Text 1159-1160
  Chenliang Li; Anwitaman Datta; Aixin Sun
The common tags given by multiple users to a particular document are often semantically relevant to the document and each tag represents a specific topic. In this paper, we attempt to emulate human tagging behavior to recommend tags by considering the concepts contained in documents. Specifically, we represent each document using a few most relevant concepts contained in the document, where the concept space is derived from Wikipedia. Tags are then recommended based on the tag concept model derived from the annotated documents of each tag. Evaluated on a Delicious dataset of more than 53K documents, the proposed technique achieved comparable tag recommendation accuracy as the state-of-the-art, while yielding an order of magnitude speed-up.
Recommending interesting activity-related local entities BIBAFull-Text 1161-1162
  Jie Tang; Ryen W. White; Peter Bailey
When searching for entities with a strong local character (e.g., a museum), people may also be interested in discovering proximal activity-related entities (e.g., a café). Geographical proximity is a necessary, but not sufficient, qualifier for recommending other entities such that they are related in a useful manner (e.g., interest in a fish market does not imply interest in nearby bookshops, but interest in other produce stores is more likely). We describe and evaluate methods to identify such activity-related local entities.
Cross-corpus relevance projection BIBFull-Text 1163-1164
  Nima Asadi; Donald Metzler; Jimmy Lin
Location disambiguation for geo-tagged images BIBAFull-Text 1165-1166
  Zhu Zhu; Lidan Shou; Kuang Mao; Gang Chen
In this poster, we address the problem of location disambiguation for geotagged Web photo resources. We propose an approach for analyzing and partitioning large geotagged photo collections using geographic and semantic information. By organizing the dataset in a structural scheme, we resolve the location ambiguity and clutter problem yield by massive volume of geotagged photos.
Towards an indexing method to speed-up music retrieval BIBAFull-Text 1167-1168
  Benjamin Martin; Pierre Hanna; Matthias Robine; Pascal Ferraro
Computations in most music retrieval systems strongly depend on the size of data compared. We propose to enhance performances of a music retrieval system, namely a harmonic similarity evaluation method, by first indexing relevant parts of music pieces. The indexing algorithm represents each audio piece exclusively by its major repetition, using harmonic descriptions and string matching techniques. Evaluations are performed in the context of a state-of-the-art retrieval method, namely cover songs identification, and results highlight the success of our indexing system in keeping similar results while yielding a substantial gain in computation time.
An investigation of decompounding for cross-language patent search BIBAFull-Text 1169-1170
  Johannes Leveling; Walid Magdy; Gareth J. F. Jones
Decompounding has been found to improve information retrieval (IR) effectiveness in general domains for languages such as German or Dutch. We investigate if cross-language patent retrieval can profit from decompounding. This poses two challenges: i) There may be few resources such as parallel corpora available for training an machine translation system for a compounding language. ii) Patents have a specific writing style and vocabulary ("patentese"), which may affect the performance of decompounding and translation methods. Experiments on data from the CLEF-IP 2010 task show that decompounding patents for translation can overcome out-of-vocabulary problems (OOV) and that decompounding improves IR performance significantly for small training corpora.
Detecting seasonal queries by time-series analysis BIBAFull-Text 1171-1172
  Milad Shokouhi
Seasonal events such as Halloween and Christmas repeat every year and initiate several temporal information needs. The impact of such events on users is often reflected in search logs in form of seasonal spikes in the frequency of related queries (e.g. "Halloween costumes", "where is Santa"). Many seasonal queries such as "sigir conference" mainly target fresh pages (e.g. sigir2011.org) that have less usage data such as clicks and anchor-text compared to older alternatives (e.g.sigir2009.org). Thus, it is important for search engines to correctly identify seasonal queries and make sure that their results are temporally reordered if necessary.
   In this poster, we focus on detecting seasonal queries using time-series analysis. We demonstrate that the seasonality of a query can be determined with high accuracy according to its historical frequency distribution.
Learning to rank under tight budget constraints BIBAFull-Text 1173-1174
  Christian Pölitz; Ralf Schenkel
This paper investigates the influence of pruning feature lists to keep a given budget for the evaluation of ranking methods. We learn from a given training set how important the individual prefixes are for the ranking quality. Based on there importance we choose the best prefixes to calculate the ranking while keeping the budget.
A novel hybrid index structure for efficient text retrieval BIBAFull-Text 1175-1176
  Andreas Broschart; Ralf Schenkel
Query processing with precomputed term pair lists can improve efficiency for some queries, but suffers from the quadratic number of index lists that need to be read. We present a novel hybrid index structure that aims at decreasing the number of index lists retrieved at query processing time, trading off a reduced number of index lists for an increased number of bytes to read. Our experiments demonstrate significant cold-cache performance gains of almost 25% on standard benchmark queries.
A weighted curve fitting method for result merging in federated search BIBAFull-Text 1177-1178
  Chuan He; Dzung Hong; Luo Si
Result merging is an important step in federated search to merge the documents returned from multiple source-specific ranked lists for a user query. Previous result merging methods such as Semi-Supervised Learning (SSL) and Sample-Agglomerate Fitting Estimate (SAFE) use regression methods to estimate global document scores from document ranks in individual ranked lists. SSL relies on overlapping documents that exist in both individual ranked lists and a centralized sample database. SAFE goes a step further by using both overlapping documents with accurate rank information and documents with estimated rank information for regression. However, existing methods do not distinguish the accurate rank information from the estimated information. Furthermore, all documents are assigned equal weights in regression while intuitively, documents in the top should carry higher weights. This paper proposes a weighted curve fitting method for result merging in federated search. The new method explicitly models the importance of information from overlapping documents over non-overlapping ones. It also weights documents at different positions differently. Empirically results on two datasets clearly demonstrate the advantage of the proposed algorithm.
Effect of different docid orderings on dynamic pruning retrieval strategies BIBAFull-Text 1179-1180
  Nicola Tonellotto; Craig Macdonald; Iadh Ounis
Document-at-a-time (DAAT) dynamic pruning strategies for information retrieval systems such as MaxScore and Wand can increase querying efficiency without decreasing effectiveness. Both work on posting lists sorted by ascending document identifier (docid). The order in which docids are assigned -- and hence the order of postings in the posting lists -- is known to have a noticeable impact on posting list compression. However, the resulting impact on dynamic pruning strategies is not well understood. In this poster, we examine the impact on the efficiency of these strategies across different docid orderings, by experimenting using the TREC ClueWeb09 corpus. We find that while the number of postings scored by dynamic pruning strategies do not markedly vary for different docid orderings, the ordering still has a marked impact on mean query response time. Moreover, when docids are assigned by lexicographical URL ordering, the benefit to response time for is more pronounced for Wand than for MaxScore.
Time-based query performance predictors BIBAFull-Text 1181-1182
  Nattiya Kanhabua; Kjetil Nørvåg
Query performance prediction is aimed at predicting the retrieval effectiveness that a query will achieve with respect to a particular ranking model. In this paper, we study query performance prediction for a ranking model that explicitly incorporates the time dimension into ranking. Different time-based predictors are proposed as analogous to existing keyword-based predictors. In order to improve predicting performance, we combine different predictors using linear regression and neural networks. Extensive experiments are conducted using queries and relevance judgments obtained by crowdsourcing.
Search task difficulty: the expected vs. the reflected BIBAFull-Text 1183-1184
  Jingjing Liu; Nicholas J. Belkin
We report findings on how the user's perception of task difficulty changes before and after searching for information to solve tasks. We found that while in one type of task, the dependent task, this did not change, in another, the parallel task, it did. The findings have implications on designing systems that can provide assistance to users with their search and task solving strategies.
On the suitability of diversity metrics for learning-to-rank for diversity BIBAFull-Text 1185-1186
  Rodrygo L. T. Santos; Craig Macdonald; Iadh Ounis
An optimally diverse ranking should achieve the maximum coverage of the aspects underlying an ambiguous or underspecified query, with minimum redundancy with respect to the covered aspects. Although evaluation metrics that reward coverage and penalise redundancy provide intuitive objective functions for learning a diverse ranking, it is unclear whether they are the most effective. In this paper, we contrast the suitability of relevance and diversity metrics as objective functions for learning a diverse ranking. Our results in the context of the diversity task of the TREC 2009 and 2010 Web tracks show that diversity metrics are not necessarily better suited for guiding a learning approach. Moreover, the suitability of these metrics is compromised as they try to penalise redundancy during the learning process.
How diverse are web search results? BIBAFull-Text 1187-1188
  Rodrygo L. T. Santos; Craig Macdonald; Iadh Ounis
Search result diversification has recently gained attention as a means to tackle ambiguous queries. While query ambiguity is of particular concern for the short queries commonly observed in a Web search scenario, it is unclear how much diversity is actually promoted by Web search engines (WSEs). In this paper, we assess the diversification performance of two leading WSEs in the context of the diversity task of the TREC 2009 and 2010 Web tracks. Our results show that these WSEs perform effectively for queries with multiple interpretations, but not for those open to multiple aspects related to a single interpretation. Moreover, by deploying a state-of-the-art diversification approach based on query suggestions from these WSEs themselves, we show that their diversification performance can be further improved.
Analysis of an expert search query log BIBAFull-Text 1189-1190
  Yi Fang; Naveen Somasundaram; Luo Si; Jeongwoo Ko; Aditya P. Mathur
Expert search has made rapid progress in modeling, algorithms and evaluations in the recent years. However, there is very few work on analyzing how users interact with expert search systems. In this paper, we conduct analysis of an expert search query log. The aim is to understand the special characteristics of expert search usage. To the best of our knowledge, this is one of the earliest work on expert search query log analysis. We find that expert search users generally issue shorter queries, more common queries, and use more advanced search features, with fewer queries in a session, than general Web search users do. This study explores a new research direction in expert search by analyzing and exploiting query logs.
A model for expert finding in social networks BIBAFull-Text 1191-1192
  Elena Smirnova
Expert finding is a task of finding knowledgeable people on a given topic. State-of-the-art expertise retrieval algorithms identify matching experts based on analysis of textual content of documents experts are associated with. While powerful, these models ignore social structure that might be available. In this paper, we develop a Bayesian hierarchical model for expert finding that accounts for both social relationships and content. The model assumes that social links are determined by expertise similarity between candidates. We demonstrate the improved retrieval performance of our model over the baseline on a realistic data set.
Transductive learning over automatically detected themes for multi-document summarization BIBAFull-Text 1193-1194
  Massih-Reza Amini; Nicolas Usunier
We propose a new method for query-biased multi-document summarization, based on sentence extraction. The summary of multiple documents is created in two steps. Sentences are first clustered; where each cluster corresponds to one of the main themes present in the collection. Inside each theme, sentences are then ranked using a transductive learning-to-rank algorithm based on RankNet, in order to better identify those which are relevant to the query. The final summary contains the top-ranked sentences of each theme. Our approach is validated on DUC 2006 and DUC 2007 datasets.
Rating-based collaborative filtering combined with additional regularization BIBAFull-Text 1195-1196
  Shu Wu; Shengrui Wang
The collaborative filtering (CF) approach to recommender system has received much attention recently. However, previous work mainly focuses on improving the formula of rating prediction, e.g. by adding user and item biases, implicit feedback and time-aware factors, etc, to reach a better prediction by minimizing an objective function. However, little effort has been made on improving CF by incorporating additional regularization to the objective function. Regularization can further bound the searching range of predicted ratings. In this paper, we improve the conventional rating-based objective function by using ranking constraints as the supplementary regularization to restrict the searching of predicted ratings in smaller and more likely ranges, and develop a novel method, called RankSVD++, based on the SVD++ model. Experimental results show that RankSVD++ achieves better performance than existing main-streaming methods due to the addition of informative ranking-based regularization. The idea proposed here can also be easily incorporated to the other CF models.
Words-of-interest selection based on temporal motion coherence for video retrieval BIBAFull-Text 1197-1198
  Lei Wang; Dawei Song; Eyad Elyan
The "Bag of Visual Words" (BoW) framework has been widely used in query-by-example video retrieval to model the visual content by a set of quantized local feature descriptors. In this paper, we propose a novel technique to enhance BoW by the selection of Word-of-Interest (WoI) that utilizes the quantified temporal motion coherence of the visual words between the adjacent frames in the query example. Experiments carried out using TRECVID datasets show that our technique improves the retrieval performance of the classical BoW-based approach.
Aggregating multiple opinion evidence in proximity-based opinion retrieval BIBAFull-Text 1199-1200
  Shima Gerani; Mostafa Keikha; Fabio Crestani
Blog post opinion retrieval is the problem of ranking blog posts according to the likelihood that the post is relevant to the query and that the author was expressing an opinion about the topic (of the query). A recent study has proposed a method for finding the opinion density at query term positions in a document which uses the proximity of query term and opinion term as an indicator of their relatedness. The maximum opinion density between different query positions was used as an opinion score of the whole document. In this paper we investigate the effect of exploiting multiple opinion evidence of a document. We propose using the ordered weighted averaging (OWA) operator in order to combine the opinion score of different query positions for a final score of a document, in the proximity-based opinion retrieval system.
Enhancing mobile search using web search log data BIBAFull-Text 1201-1202
  Yoshiyuki Inagaki; Jiang Bian; Yi Chang; Motoko Maki
Mobile search is still in infancy compared with general purpose web search. With limited training data and weak relevance features, the ranking performance in mobile search is far from satisfactory. To address this problem, we propose to leverage the knowledge of Web search to enhance the ranking of mobile search. In this paper, we first develop an equivalent page conversion between web search and mobile search, then we design a few novel ranking features, generated from the click-through data in web search, for estimating the relevance of mobile search. Large scale evaluations demonstrate that the knowledge from web search is quite effective for boosting the relevance of ranking on mobile search.
Award prediction with temporal citation network analysis BIBAFull-Text 1203-1204
  Zaihan Yang; Dawei Yin; Brian D. Davison
Each year many ACM SIG communities will recognize an outstanding researcher through an award in honor of his or her profound impact and numerous research contributions. This work is the first to investigate an automated mechanism to help in selecting future award winners. We approach the problem as a researchers' expertise ranking problem, and propose a temporal probabilistic ranking model which combines content with citation network analysis. Experimental results based on real-world citation data and historical awardees indicate that some kinds of SIG awards are well-modeled by this approach.
Rating prediction using feature words extracted from customer reviews BIBAFull-Text 1205-1206
  Masanao Ochi; Makoto Okabe; Rikio Onai
We developed a simple method of improving the accuracy of rating prediction using feature words extracted from customer reviews. Many rating predictors work well for a small and dense dataset of customer reviews. However, a practical dataset tends to be large and sparse, because it often includes too many products for each customer to buy and evaluate. Data sparseness reduces prediction accuracy. To improve accuracy, we reduced the dimension of the feature vector using feature words extracted by analyzing the relationship between ratings and accompanying review comments instead of using ratings. We applied our method to the Pranking algorithm and evaluated it on a corpus of golf course reviews supplied by a Japanese e-commerce company. We found that by successfully reducing data sparseness, our method improves prediction accuracy as measured using RankLoss.
Ranking tags in resource collections BIBAFull-Text 1207-1208
  Dimitrios Skoutas; Mohammad Alrifai
We examine different tag ranking strategies for constructing tag clouds to represent collections of tagged objects. The proposed methods are based on random walk on graphs, diversification, and rank aggregation, and they are empirically evaluated on a data set of tagged images from Flickr.
Identifying similar people in professional social networks with discriminative probabilistic models BIBAFull-Text 1209-1210
  Suleyman Cetintas; Monica Rogati; Luo Si; Yi Fang
Identifying similar professionals is an important task for many core services in professional social networks. Information about users can be obtained from heterogeneous information sources, and different sources provide different insights on user similarity.
   This paper proposes a discriminative probabilistic model that identifies latent content and graph classes for people with similar profile content and social graph similarity patterns, and learns a specialized similarity model for each latent class. To the best of our knowledge, this is the first work on identifying similar professionals in professional social networks, and the first work that identifies latent classes to learn a separate similarity model for each latent class. Experiments on a real-world dataset demonstrate the effectiveness of the proposed discriminative learning model.
Intent-oriented diversity in recommender systems BIBAFull-Text 1211-1212
  Saul Vargas; Pablo Castells; David Vallet
Diversity as a relevant dimension of retrieval quality is receiving increasing attention in the Information Retrieval and Recommender Systems (RS) fields. The problem has nonetheless been approached under different views and formulations in IR and RS respectively, giving rise to different models, methodologies, and metrics, with little convergence between both fields. In this poster we explore the adaptation of diversity metrics, techniques, and principles from ad-hoc IR to the recommendation task, by introducing the notion of user profile aspect as an analogue of query intent. As a particular approach, user aspects are automatically extracted from latent item features. Empirical results support the proposed approach and provide further insights.
Disambiguating biomedical acronyms using EMIM BIBAFull-Text 1213-1214
  Nut Limsopatham; Rodrygo L. T. Santos; Craig Macdonald; Iadh Ounis
Expanding a query with acronyms or their corresponding 'long-forms' has not been shown to provide consistent improvements in the biomedical IR literature. The major open issue with expanding acronyms in a query is their inherent ambiguity, as an acronym can refer to multiple long-forms. At the same time, a long-form identified in a query can be expanded with its acronym(s); however, some of these may be also ambiguous and lead to poor retrieval performance. In this work, we propose the use of the EMIM (Expected Mutual Information Measure) between a long-form and its abbreviated acronym to measure ambiguity. We experiment with expanding both acronyms and long-forms identified in the queries from the ad hoc task of the TREC 2004 Genomics track. Our preliminary analysis shows the potential of both acronym and long-form expansions for biomedical IR.
Best document selection based on approximate utility optimization BIBAFull-Text 1215-1216
  Hungyu Henry Lin; Yi Zhang; James Davis
This poster describes an alternative approach to handling the best document selection problem. Best document selection is a common problem with many real world applications, but is not a well studied problem by itself; a simple solution would be to treat it as a ranking problem and to use existing ranking algorithms to rank all documents. We could then select only the first element of the sorted list. However, because ranking models optimize for all ranks, the model may sacrifice accuracy of the top rank for the sake of overall accuracy. This is an unnecessary trade-off.
   We begin by first defining an appropriate objective function for the domain, then create a boosting algorithm that explicitly targets this function. Based on experiments on a benchmark retrieval data set and Digg.com news commenting data set, we find that even a simple algorithm built for this specific problem gives better results than baseline algorithms that were designed for the more complicated ranking tasks.
Forecasting counts of user visits for online display advertising with probabilistic latent class models BIBAFull-Text 1217-1218
  Suleyman Cetintas; Datong Chen; Luo Si; Bin Shen; Zhanibek Datbayev
Display advertising is a multi-billion dollar industry where advertisers promote their products to users by having publishers display their advertisements on popular Web pages. An important problem in online advertising is how to forecast the number of user visits for a Web page during a particular period of time. Prior research addressed the problem by using traditional time-series forecasting techniques on historical data of user visits; (e.g., via a single regression model built for forecasting based on historical data for all Web pages) and did not fully explore the fact that different types of Web pages have different patterns of user visits.
   In this paper we propose a probabilistic latent class model to automatically learn the underlying user visit patterns among multiple Web pages. Experiments carried out on real-world data demonstrate the advantage of using latent classes in forecasting online user visits.
Knowledge effects on document selection in search results pages BIBAFull-Text 1219-1220
  Michael J. Cole; Xiangmin Zhang; Chang Liu; Nicholas J. Belkin; Jacek Gwizdka
Click through events in search results pages (SERPs) are not reliable implicit indicators of document relevance. A user's task and domain knowledge are key factors in recognition and link selection and the most useful SERP document links may be those that best match the user's domain knowledge. User study participants rated their knowledge of genomics MeSH terms before conducting 2004 TREC Genomics Track tasks. Each participant's document knowledge was represented by their knowledge of the indexing MeSH terms. Results show high, intermediate, and low domain knowledge groups had similar document selection SERP rank distributions. SERP link selection distribution varied when participant knowledge of the available documents was analyzed. High domain knowledge participants usually selected a document with the highest personal knowledge rating. Low domain knowledge participants were reasonably successful at selecting available documents of which they had the most knowledge, while intermediate knowledge participants often failed to do so. This evidence for knowledge effects on SERP link selection may contribute to understanding the potential for personalization of search results ranking based on user domain knowledge.
Learning to rank from a noisy crowd BIBAFull-Text 1221-1222
  Abhimanu Kumar; Matthew Lease
We study how to best use crowdsourced relevance judgments learning to rank [1, 7]. We integrate two lines of prior work: unreliable crowd-based binary annotation for binary classification [5, 3], and aggregating graded relevance judgments from reliable experts for ranking [7]. To model varying performance of the crowd, we simulate annotation noise with varying magnitude and distributional properties. Evaluation on three LETOR test collections reveals a striking trend contrary to prior studies: single labeling outperforms consensus methods in maximizing learner accuracy relative to annotator eýort. We also see surprising consistency of the learning curve across noise distributions, as well as greater challenge with the adversarial case for multi-class labeling.
How to count thumb-ups and thumb-downs?: an information retrieval approach to user-rating based ranking of items BIBAFull-Text 1223-1224
  Dell Zhang; Robert Mao; Haitao Li; Joanne Mao
It is a common practice among Web 2.0 services to allow users to rate items on their sites. In this paper, we first point out the flaws of the popular methods for user-rating based ranking of items, and then argue that two well-known Information Retrieval (IR) techniques, namely the Probability Ranking Principle and Statistical Language Modelling, provide a simple but effective solution to this problem.
Predicting users' domain knowledge from search behaviors BIBAFull-Text 1225-1226
  Xiangmin Zhang; Michael Cole; Nicholas Belkin
This study uses regression modeling to predict a user's domain knowledge level (DK) from implicit evidence provided by certain search behaviors. A user study (n=35) with recall-oriented search tasks in the genomic domain was conducted. A number of regression models of a person's DK, were generated using different behavior variable selection methods. The best model highlights three behavior variables as DK predictors: the number of documents saved, the average query length, and the average ranking position of documents opened. The model is validated using the split sampling method. Limitations and future research directions are discussed.
The interactive PRP for diversifying document rankings BIBAFull-Text 1227-1228
  Guido Zuccon; Leif Azzopardi; C. J. "Keith" van Rijsbergen
The assumptions underlying the Probability Ranking Principle (PRP) have led to a number of alternative approaches that cater or compensate for the PRP's limitations. In this poster we focus on the Interactive PRP (iPRP), which rejects the assumption of independence between documents made by the PRP. Although the theoretical framework of the iPRP is appealing, no instantiation has been proposed and investigated. In this poster, we propose a possible instantiation of the principle, performing the first empirical comparison of the iPRP against the PRP. For document diversification, our results show that the iPRP is significantly better than the PRP, and comparable to or better than other methods such as Modern Portfolio Theory.
Detecting success in mobile search from interaction BIBAFull-Text 1229-1230
  Qi Guo; Shuai Yuan; Eugene Agichtein
Predicting searcher success and satisfaction is a key problem in Web search, which is essential for automatic evaluating and improving search engine performance. This problem has been studied actively in the desktop search setting, but not specifically for mobile search, despite many known differences between the two modalities. As mobile devices become increasingly popular for searching the Web, improving the searcher experience on such devices is becoming crucially important. In this paper, we explore the possibility of predicting searcher success and satisfaction in mobile search with a smart phone. Specifically, we investigate client-side interaction signals, including the number of browsed pages, and touch screen-specific actions such as zooming and sliding. Exploiting this information with machine learning techniques results in nearly 80% accuracy for predicting searcher success -- significantly outperforming the previous models.
Measuring assessor accuracy: a comparison of NIST assessors and user study participants BIBAFull-Text 1231-1232
  Mark D. Smucker; Chandra Prakash Jethani
In many situations, humans judging document relevance are forced to trade-off accuracy for speed. The development of better interactive retrieval systems and relevance assessing platforms requires the measurement of assessor accuracy, but to date the subjective nature of relevance has prevented such measurement. To quantify assessor performance, we define relevance to be a group's majority opinion, and demonstrate the value of this approach by comparing the performance of NIST assessors to a group of assessors representative of participants in many information retrieval user studies. Using data collected as part of a user study with 48 participants, we found that NIST assessors discriminate between relevant and non-relevant documents better than the average participant in our study, but that NIST assessors' true positive rate is no better than that of the study participants. In addition, we found NIST assessors to be conservative in their judgment of relevance compared to the average participant.
A bipartite graph based social network splicing method for person name disambiguation BIBAFull-Text 1233-1234
  Jintao Tang; Qin Lu; Ting Wang; Ji Wang; Wenjie Li
The key issue of person name disambiguation is to discover different namesakes in massive web documents rather than simply cluster documents by using textual features. In this paper, we describe a novel person name disambiguation method based on social networks to effectively identify namesakes. The social network snippets in each document are extracted. Then, the namesakes are identified via splicing the social networks of each namesake by using the snippets as a bipartite graph. Experimental results show that our method achieves better result than the top performance of WePS-2 in identifying different namesakes.
Link formation analysis in microblogs BIBAFull-Text 1235-1236
  Dawei Yin; Liangjie Hong; Xiong Xiong; Brian D. Davison
Unlike a traditional social network service, a microblogging network like Twitter is a hybrid network, combining aspects of both social networks and information networks. Understanding the structure of such hybrid networks and to predict new links are important for many tasks such as friend recommendation, community detection, and network growth models. In this paper, by analyzing data collected over time, we find that 90% of new links are to people just two hops away and dynamics of friend acquisition are also related to users' account age. Finally, we compare two popular sampling methods which are widely used for network analysis and find that ForestFire does not preserve properties required for the link prediction task.
Evolution of web search results within years BIBAFull-Text 1237-1238
  Ismail Sengor Altingovde; Rifat Ozcan; Özgür Ulusoy
We provide a first large-scale analysis of the evolution of query results obtained from a real search engine at two distant points in time, namely, in 2007 and 2010, for a set of 630,000 real queries.
Decayed DivRank: capturing relevance, diversity and prestige in information networks BIBAFull-Text 1239-1240
  Pan Du; Jiafeng Guo; Xue-Qi Cheng
Many network-based ranking approaches have been proposed to rank objects according to different criteria, including relevance, prestige and diversity. However, existing approaches either only aim at one or two of the criteria, or handle them with additional heuristics in multiple steps. Inspired by DivRank, we propose a unified ranking model, Decayed DivRank (DDRank), to meet the three criteria simultaneously. Empirical experiments on paper citation network show that DDRank can outperform existing algorithms in capturing relevance, diversity and prestige simultaneously in ranking.
Multi-objective optimization in learning to rank BIBAFull-Text 1241-1242
  Na Dai; Milad Shokouhi; Brian D. Davison
Supervised learning to rank algorithms typically optimize for high relevance and ignore other facets of search quality, such as freshness and diversity. Prior work on multi-objective ranking trained rankers focused on using hybrid labels that combine overall quality of documents, and implicitly incorporate multiple criteria into quantifying ranking risks. However, these hybrid scores are usually generated based on heuristics without considering potential correlations between individual facets (e.g., freshness versus relevance). In this poster, we empirically demonstrate that the correlation between objective facets in multi-criteria ranking optimization may significantly influence the effectiveness of trained rankers with respect to each objective.
A large-scale study of the effect of training set characteristics over learning-to-rank algorithms BIBAFull-Text 1243-1244
  Evangelos Kanoulas; Stefan Savev; Pavel Metrikov; Virgil Pavlu; Javed Aslam
In this work we describe the results of a large-scale study on the effect of the distribution of labels across the different grades of relevance in the training set on the performance of trained ranking functions. In a controlled experiment we generate a large number of training datasets with different label distributions and employ three learning to rank algorithms over these datasets. We investigate the effect of these distributions on the accuracy of obtained ranking functions to give an insight into the manner training sets should be constructed.
Exploring term temporality for pseudo-relevance feedback BIBAFull-Text 1245-1246
  Stewart Whiting; Yashar Moshfeghi; Joemon M. Jose
As digital collections expand, the importance of the temporal aspect of information has become increasingly apparent. The aim of this paper is to investigate the effect of using long-term temporal profiles of terms in information retrieval by enhancing the term selection process of pseudo-relevance feedback (PRF). For this purpose, two temporal PRF approaches were introduced considering only temporal aspect and temporal along with textual aspect. Experiments used the AP88-89 and WSJ87-92 test collections with TREC Ad-Hoc Topics 51-100. Term temporal profiles are extracted from the Google Books n-grams dataset. The results show that the long-term temporal aspects of terms are capable of enhancing retrieval effectiveness.
MSSF: a multi-document summarization framework based on submodularity BIBAFull-Text 1247-1248
  Jingxuan Li; Lei Li; Tao Li
Multi-document summarization aims to distill the most representative information from a set of documents to generate a summary. Given a set of documents as input, most of existing multi-document summarization approaches utilize different sentence selection techniques to extract a set of sentences from the document set as the summary. The submodularity hidden in textual-unit similarity motivates us to incorporate this property into our solution to multi-document summarization tasks. In this poster, we propose a new principled and versatile framework for different multi-document summarization tasks using the submodular function [8].
SEJoin: an optimized algorithm towards efficient approximate string searches BIBAFull-Text 1249-1250
  Junfeng Zhou; Ziyang Chen; Jingrong Zhang
We investigated the problem of finding from a collection of strings those similar to a given query string based on edit distance, for which the critical operation is merging inverted lists of grams generated from the collection of strings. We present an efficient algorithm to accelerate the merging operation.
Bag-of-visual-words vs global image descriptors on two-stage multimodal retrieval BIBAFull-Text 1251-1252
  Konstantinos Zagoris; Savvas A. Chatzichristofis; Avi Arampatzis
The Bag-Of-Visual-Words (BOVW) paradigm is fast becoming a popular image representation for Content-Based Image Retrieval (CBIR), mainly because of its better retrieval effectiveness over global feature representations on collections with images being near-duplicate to queries. In this experimental study we demonstrate that this advantage of BOVW is diminished when visual diversity is enhanced by using a secondary modality, such as text, to pre-filter images.
   The TOP-SURF descriptor is evaluated against Compact Composite Descriptors on a two-stage image retrieval setup, which first uses a text modality to rank the collection and then perform CBIR only on the top-K items.
Query term ranking based on search results overlap BIBAFull-Text 1253-1254
  Wei Song; Yu Zhang; Yubin Xie; Ting Liu; Sheng Li
In this paper, we propose a method to rank and assign weights to query terms according to their impact on the topic of the query. We use Search Result Overlap Ratio (SROR) to quantify the overlap of the search results of the full query and a shorten query after removing one term. Intuitively, if the overlap is small, it indicates a big topic shift and the removed term should be discriminative and important. The SROR could be used for measuring query term importance with a search engine automatically. By this way, learning based models could be trained based on a large number of automatically labeled instances and make predictions for future queries efficiently.
Tossing coins to trim long queries BIBAFull-Text 1255-1256
  Sudip Datta; Vasudeva Varma
Verbose web queries are often descriptive in nature where a term based search engine is unable to distinguish between the essential and noisy words, which can result in a drift from the user intent. We present a randomized query reduction technique that builds on an earlier learning to rank based approach. The proposed technique randomly picks only a small set of samples, instead of the exponentially many sub-queries, thus being fast enough to be useful for web search engines, while still covering wide sub-query space.
A comparison of time-aware ranking methods BIBAFull-Text 1257-1258
  Nattiya Kanhabua; Kjetil Nørvåg
When searching a temporal document collection, e.g., news archives or blogs, the time dimension must be explicitly incorporated into a retrieval model in order to improve relevance ranking. Previous work has followed one of two main approaches: 1) a mixture model linearly combining textual similarity and temporal similarity, or 2) a probabilistic model generating a query from the textual and temporal part of a document independently. In this paper, we compare the effectiveness of different time-aware ranking methods by using a mixture model applied to all methods. Extensive evaluation is conducted using the New York Times Annotated Corpus, queries and relevance judgments obtained using the Amazon Mechanical Turk.
Learning for graphs with annotated edges BIBAFull-Text 1259-1260
  Fan Li
Automatic classification with graphs containing annotated edges is an interesting problem and has many potential applications. We present a risk minimization formulation that exploits the annotated edges for classification tasks. One major advantage of our approach compared to other methods is that the weight of each edge in the graph structures in our model, including both positive and negative weights, can be learned automatically from training data based on edge features. The empirical results show that our approach can lead to significantly improved classification performance compared to several baseline approaches.
Formulating effective questions for community-based question answering BIBAFull-Text 1261-1262
  Saori Suzuki; Shin'ichi Nakayama; Hideo Joho
Community-based Question Answering (CQA) services have become a major venue for people's information seeking on the Web. However, many studies on CQA have focused on the prediction of the best answers for a given question. This paper looks into the formulation of effective questions in the context of CQA. In particular, we looked at effect of contextual factors appended to a basic question on the performance of submitted answers. This study analysed a total of 930 answers returned in response to 266 questions that were formulated by 46 participants. The results show that adding a questionnaire's personal and social attribute to the question helped improve the perceptions of answers both in information seeking questions and opinion seeking questions.

Demonstrations

ClusteringWiki: personalized and collaborative clustering of search results BIBAFull-Text 1263-1264
  Dragos C. Anastasiu; Byron J. Gao; David Buttler
How to organize and present search results plays a critical role in the utility of search engines. Due to the unprecedented scale of the Web and diversity of search results, the common strategy of ranked lists has become increasingly inadequate, and clustering has been considered as a promising alternative. Clustering divides a long list of disparate search results into a few topic-coherent clusters, allowing the user to quickly locate relevant results by topic navigation. While many clustering algorithms have been proposed that innovate on the automatic clustering procedure, we introduce ClusteringWiki, the first prototype and framework for personalized clustering that allows direct user editing of clustering results. Through a Wiki interface, the user can edit and annotate the membership, structure and labels of clusters for a personalized presentation. In addition, the edits and annotations can be shared among users as a mass collaborative way of improving search result organization and search engine utility.
OrientSTS: spatio-temporal sequence searching in flickr BIBAFull-Text 1265-1266
  Chunjie Zhou; Dongqi Liu; Xiaofeng Meng
Nowadays, due to the increasing user requirements of efficient and personalized services, a perfect travel plan is urgently needed. However, at present it is hard for people to make a personalized traveling plan. Most of them follow other people's general travel trajectory. So only after finishing their travel, do they know which scene is their favorite, which is not, and what is the perfect order of visits. In this research we propose a novel spatio-temporal sequence (STS) searching, which mainly includes two steps.
   Firstly, we propose a novel method to detect tourist features of every scene, and its difference in different seasons. Secondly, combined with personal profile and scene features, a set of interesting scenes will be chosen and each scene has a specific weight for each user. The goal of our research is to provide the traveler with the STS, which passes through as many chosen scenes as possible with the maximum weight and the minimum distance within his travel time. We propose a method based on topic model to detect scene features, and provide two approximate algorithms to mine STS: a local optimization algorithm and a global optimization algorithm. System evaluations have been conducted and the performance results show the efficiency.
A toolkit for knowledge base population BIBAFull-Text 1267-1268
  Zheng Chen; Suzanne Tamang; Adam Lee; Heng Ji
The main goal of knowledge base population (KBP) is to distill entity information (e.g., facts of a person) from multiple unstructured and semi-structured data sources, and incorporate the information into a knowledge base (KB). In this work, we intend to release an open source KBP toolkit that is publicly available for research purposes.
iMecho: a context-aware desktop search system BIBAFull-Text 1269-1270
  Jidong Chen; Hang Guo; Wentao Wu; Wei Wang
In this demo, we present iMecho, a context-aware desktop search system to help users get more relevant results. Different from other desktop search engines, iMecho ranks results not only by the content of the query, but also the context of the query. It employs an Hidden Markov Model (HMM)-based user model, which is learned from user's activity logs, to estimate the query context when he submits the query. The results from keyword search are re-ranked by their relevances to the context with acceptable overhead.
Visualizing and querying semantic social networks BIBAFull-Text 1271-1272
  Aixin Sun; Anwitaman Datta; Ee-Peng Lim; Kuiyu Chang
We demonstrate SSNetViz that is developed for integrating, visualizing and querying heterogeneous semantic social networks obtained from multiple information sources. A semantic social network refers to a social network graph with multi-typed nodes and links. We demonstrate various innovative features of SSNetViz with social networks from three information sources covering a similar set of entities and relationships in terrorism domain.
What-you-retrieve-is-what-you-see: a preliminary cyber-physical search engine BIBAFull-Text 1273-1274
  Lidan Shou; Ke Chen; Gang Chen; Chao Zhang; Yi Ma; Xian Zhang
The cyber-physical systems (CPS) are envisioned as a class of real-time systems integrating the computing, communication and storage facilities with monitoring and control of the physical world. One interesting CPS application in the mobile Internet is to provide Web search "on the spot" regarding the physical world that a user sees, or literally WYRIWYS (What-You-Retrieve-Is-What-You-See). The objective of our work is to develop server/browser software for supporting WYRIWYS search in our prototype cyber-physical search engine. A WYRIWYS search retrieves visible Web objects and ranks them by their cyber-physical relevances (term, visual, spatial, temporal etc.). This work is distinguished from previous LWS as it provides quality Web search geared with the physical world. Therefore it suggests a very promising solution to cyber-physical Web search.
QuickView: advanced search of tweets BIBAFull-Text 1275-1276
  Xiaohua Liu; Long Jiang; Furu Wei; Ming Zhou; QuickView Team Microsoft
Tweets have become a comprehensive repository for real-time information. However, it is often hard for users to quickly get information they are interested in from tweets, owing to the sheer volume of tweets as well as their noisy and informal nature. We present QuickView, an NLP-based tweet search platform to tackle this issue. Specifically, it exploits a series of natural language processing technologies, such as tweet normalization, named entity recognition, semantic role labeling, sentiment analysis, tweet classification, to extract useful information, i.e., named entities, events, opinions, etc., from a large volume of tweets. Then, non-noisy tweets, together with the mined information, are indexed, on top of which two brand new scenarios are enabled, i.e., categorized browsing and advanced search, allowing users to effectively access either the tweets or fine-grained information they are interested in.
Personalized video: leanback online video consumption BIBAFull-Text 1277-1278
  Krishnan Ramanathan; Yogesh Sankarasubramaniam; Vidhya Govindaraju
Current user interfaces for online video consumption are mostly browser based, lean forward, require constant interaction and provide a fragmented view of the total content available. For easier consumption, the user interface and interactions need to be redesigned for less interruptive and lean back experience. In this paper, we describe Personalized Video, an application that converts the online video experience into a personalized lean back experience. It has been implemented on the Windows platform and integrated with intuitive user interactions like gesture and face recognition. It also supports group personalization for concurrent users.
GreenMeter: a tool for assessing the quality and recommending tags for web 2.0 applications BIBAFull-Text 1279-1280
  Saulo M. R. Ricci; Dilson A. Guimarães; Fabiano M. Belém; Jussara M. Almeida; Marcos A. Gonçalves; Raquel Prates
We present GreenMeter, a tool for assessing the quality and recommending tags for Web 2.0 content. Its goal is to improve tag quality and the effectiveness of various information services (e.g., search, content recommendation) that rely on tags as data sources. We demonstrate an implementation of GreenMeter for the popular Last.fm application.
JuSe: a picture dictionary query system for children BIBAFull-Text 1281-1282
  Tamara Polajnar; Richard Glassey; Leif Azzopardi
As adults we take for granted our capacity to express our information needs verbally and textually. However, young children also have preferences and information needs, but are just learning to be able to express themselves effectively. Consequently they encounter many barriers when trying to spell, type, and communicate their needs to a 'faceless' search engine text box.
   Junior Search (JuSe) is an interface that enables preschoolers and young children to search and find consumable online content (such as games for kids, videos, etc.) through adaptable picture dictionaries. Inspired by educational children's toys, rather than search engines designed for adults, JuSe incorporates a learning element by combining audio-visual and textual cues to improve written word recognition and vocabulary skills. JuSe provides an interactive learning environment that allows parents to introduce new words and concepts into the child's lexicon, as well as controlling the content and search queries.
CrowdTracker: enabling community-based real-time web monitoring BIBAFull-Text 1283-1284
  James Caverlee; Zhiyuan Cheng; Brian Eoff; Chiao-Fang Hsu; Krishna Kamath; Jeffrey McGee
CrowdTracker is a community-based web monitoring system optimized for real-time web streams like Twitter, Facebook, and Google Buzz. In this demo summary, we provide an overview of the system and architecture, and outline the demonstration plan.
The Meta-Dex Suite: generating and analyzing indexes and meta-indexes BIBAFull-Text 1285-1286
  Michael Huggett; Edie Rasmussen
Our Meta-dex software suite extracts content and index text from a corpus of PDF files, and generates a meta-index that references entries across an entire domain. We provide tools to analyze the individual and integrated indexes, and visualize entries and books within the meta-index. The suite is scalable to very large data sets.
Tulsa: web search for writing assistance BIBFull-Text 1287-1288
  Duo Ding; Xingping Jiang; Matthew R. Scott; Ming Zhou; Yong Yu
The TREC files: the (ground) truth is out there BIBAFull-Text 1289-1290
  Savvas A. Chatzichristofis; Konstantinos Zagoris; Avi Arampatzis
Traditional tools for information retrieval (IR) evaluation, such as TREC's trec_eval, have outdated command-line interfaces with many unused features, or 'switches', accumulated over the years. They are usually seen as cumbersome applications by new IR researchers, steepening the learning curve. We introduce a platform-independent application for IR evaluation with a graphical easy-to-use interface: the TREC_Files Evaluator. The application supports most of the standard measures used for evaluation in TREC, CLEF, and elsewhere, such as MAP, P10, P20, and bpref, as well as the Averaged Normalized Modified Retrieval Rank (ANMRR) proposed by MPEG for image retrieval evaluation. Additional features include a batch mode and statistical significance testing of the results against a pre-selected baseline.
A tool for comparative IR evaluation on component level BIBFull-Text 1291-1292
  Thomas Wilhelm; Jens Kürsten; Maximilian Eibl

Tutorials

Machine learning for information retrieval BIBAFull-Text 1293-1294
  Luo Si; Rong Jin
In recent years, we have witnessed successful application of machine learning techniques to a wide range of information retrieval problems, including Web search engines, recommendation systems, online advertising, etc. It is thus critical for researchers in the information retrieval community to understand the core machine learning techniques. In order to accommodate audiences with different levels of understanding of machine learning, we divide this tutorial into two sessions: the first session will focus on basic machine learning concepts and tools; in the second session, we will introduce more advanced topics in machine learning, and will present recent developments in machine learning and its application to information retrieval. Each season is self-contained.
   Session 1: Core Learning Technologies for Information Retrieval. This session of the tutorial will cover the core machine learning methods, basic optimization techniques and key information retrieval applications. In particular, it includes: 1). Core concepts in machine learning, such as supervised learning/unsupervised learning, bias and variance trade off, and probabilistic models; 2). Useful concepts and algorithms in optimization including the first and second order gradient methods, and Expectation and Maximization; 3). The application of machine learning methods to key information retrieval problems including text classification, collaborative filtering, clustering and learning to rank;
   Session 2: Emerging Learning Technologies for Information Retrieval. This session will cover more advanced machine learning techniques that have started to be utilized in information retrieval applications. In particular, it will cover: 1). Advanced Optimization Techniques including stochastic optimization and smooth minimization; 2). Emerging Learning Techniques such as Multiple-Instance Learning, Active Learning and Semi-supervised Learning.
   The tutorial will benefit a large body of audience in the information retrieval community, ranging from students who are new to machine learning to the seasoned researchers who would like to understand the recent advance in machine learning for information retrieval research. This tutorial will also benefit the practitioners who apply learning techniques to real-world information retrieval systems.
Enhancing web search by mining search and browse logs BIBAFull-Text 1295-1296
  Daxin Jiang; Jian Pei; Hang Li
Huge amounts of search log data have been accumulated in various search engines. Currently, a commercial search engine receives billions of queries and collects tera-bytes of log data on any single day. Other than search log data, browse logs can be collected by client-side browser plug-ins, which record the browse information if users' permissions are granted. Such massive amounts of search/browse log data, on the one hand, provide great opportunities to mine the wisdom of crowds and improve web search results. On the other hand, designing effective and efficient methods to clean, model, and process large scale log data also presents great challenges.
   In this tutorial, we will focus on mining search and browse log data for search engines. We will start with an introduction of search and browse log data and an overview of frequently-used data summarization in log mining. We will then elaborate how log mining applications enhance the five major components of a search engine, namely, query understanding, document understanding, query-document matching, user understanding, and monitoring and feedbacks. For each aspect, we will survey the major tasks, fundamental principles, and state-of-the-art methods. Finally, we will discuss the challenges and future trends of log data mining.
   The goal of this tutorial is to provide a systematic survey on large-scale search/browse log mining to the IR community. It may help IR researchers to get familiar with the core challenges and promising directions in log mining. At the same time, this tutorial may also serve the developers of web information retrieval systems as a comprehensive and in-depth reference to the advanced log mining techniques.
A new look at old tricks: the fertile roots of current research BIBAFull-Text 1297-1298
  Paul B. Kantor
As we face an explosion of potential new applications for the fundamental concepts and technologies of information retrieval, ranging from ad ranking to social media, from collaborative recommending to question answering systems, many researchers are spending unnecessary time reinventing ideas and relationships that are buried in the prehistory of information retrieval (which, for many researchers, means anything published before they entered graduate school). A lot of the ideas that surface as "new" in today's super-heated research environment have very firm roots in earlier developments in fields as diverse as citation analysis and pattern recognition. The purpose of this tutorial is to survey those roots, and their relation to the contemporary fruits on the tree of information retrieval, and to separate, as much as is possible in an era of increasing secrecy about methods, the problems to be solved, the algorithms for solving them, and the heuristics that are the bread and butter of a working operation. Participants will become familiar with roots in Pattern Analysis, Statistics, Information Science and other sources of key ideas that reappear in the current development of Information Retrieval as it applies to Search Engines, Social Media, and Collaborative Systems. They will be able to separate problems from algorithms, and algorithms from heuristics, in the application of these ideas to their own research and/or development activities. Course materials will be made available on a Web site two weeks prior to the tutorial. They will include links to relevant software; links to publications that will be discussed; and mechanisms for chat among the tutorial participants, before, during and after the tutorial.
Crowdsourcing for information retrieval: principles, methods, and applications BIBAFull-Text 1299-1300
  Omar Alonso; Matthew Lease
Crowdsourcing has emerged in recent years as a promising new avenue for leveraging today's digitally-connected, diverse, distributed workforce. Generally speaking, crowdsourcing describes outsourcing of tasks to a large group of people instead of assigning such tasks to an in-house employee or contractor. Crowdsourcing platforms such as Amazon Mechanical Turk and CrowdFlower have gained particular attention as active online market places for reaching and tapping into this still largely under-utilized workforce. Crowdsourcing also offers intriguing new opportunities for accomplishing different kinds of tasks or achieving broader participation than previously possible, as well as completing standard tasks more accurately in less time and at lower cost. Unlocking the potential of crowdsourcing in practice, however, requires a tri-partite understanding of principles, platforms, and best practices. We will introduce the opportunities and challenges of crowdsourcing while discussing the three issues above. This will provide a basic foundation to begin crowdsourcing in the context of one's own particular tasks.
Practical online retrieval evaluation BIBAFull-Text 1301-1302
  Filip Radlinski; Yisong Yue
Online evaluation is amongst the few evaluation techniques available to the information retrieval community that is guaranteed to reflect how users actually respond to improvements developed by the community. Broadly speaking, online evaluation refers to any evaluation of retrieval quality conducted while observing user behavior in a natural context. However, it is rarely employed outside of large commercial search engines due primarily to a perception that it is impractical at small scales. The goal of this tutorial is to familiarize information retrieval researchers with state-of-the-art techniques in evaluating information retrieval systems based on natural user clicking behavior, as well as to show how such methods can be practically deployed. In particular, our focus will be on demonstrating how the Interleaving approach and other click based techniques contrast with traditional offline evaluation, and how these online methods can be effectively used in academic-scale research. In addition to lecture notes, we will also provide sample software and code walk-throughs to showcase the ease with which Interleaving and other click-based methods can be employed by students, academics and other researchers.
Web retrieval: the role of users BIBAFull-Text 1303-1304
  Ricardo Baeza-Yates; Yoelle Maarek
Web retrieval methods have evolved through three major steps in the last decade or so. They started from standard document-centric IR in the early days of the Web, then made a major step forward by leveraging the structure of the Web, using link analysis techniques in both crawling and ranking challenges. A more recent, no less important but maybe more discrete step forward, has been to enter the user in this equation in two ways: (1) Implicitly, through the analysis of usage data captured by query logs, and session and click information in general; the goal here being to improve ranking as well as to measure user's happiness and engagement; (2) Explicitly, by offering novel interactive features; the goal here being to better answer users' needs. This half day tutorial covers the user-related challenges associated with the implicit and explicit role of users in Web retrieval. More specifically, we review and discuss challenges associated with:
  • Usage data analysis and metrics -- It is critical to monitor how users take
       advantage and interact with Web retrieval systems, as this implicit relevant
       feedback aggregated at a large scale, can provide insights on users'
       underlying intent as well as approximate quite accurately the level of
       success of a given feature. Here we have to consider not only clicks
       statistics, the sequences of queries, the time spent in a page, the number
       of actions per session, etc. This is the focus of the first part of the
       tutorial.
  • User interaction -- Given the intrinsic problems posed by the Web, the key
       challenge for the user is to conceive a good query to be submitted to the
       search system, one that leads to a manageable and relevant answer. The
       retrieval system must assist users during two key stages of interaction:
       before the query is fully expressed and after the results are returned.
       After quite some stagnation on the front-end of Web retrieval, we have seen
       numerous novel interactive features appear in the last 3 to 4 years, as the
       leading commercial search engines seem to compete for users' attention. The
       second part of the tutorial will be dedicated to explicit user interaction.
       We will introduce novel material (as compared to previous versions of this
       tutorial that were given at SIGIR'2010, WSDM'2011 and ECIR'2011) in order to
       reflect recent Web search features such as Google Instant or Yahoo! Direct
       Search. The goal of this tutorial is to teach the key principles and technologies behind the activities and challenges briefly outlined above, bring new understanding and insights to the attendees, and hopefully foster future research. A previous version of this tutorial was offered at the ACM SIGIR'2010, WSDM'2011 and ECIR'2011.
  • Information organization and retrieval with collaboratively generated content BIBAFull-Text 1307-1308
      Eugene Agichtein; Evgeniy Gabrilovich
    Proliferation of ubiquitous access to the Internet enables millions of Web users to collaborate online on a variety of activities. Many of these activities result in the construction of large repositories of knowledge, either as their primary aim (e.g., Wikipedia) or as a by-product (e.g., Yahoo! Answers). In this tutorial, we will discuss organizing and exploiting Collaboratively Generated Content (CGC) for information organization and retrieval. Specifically, we intend to cover two complementary areas of the problem: (1) using such content as a powerful enabling resource for knowledge-enriched, intelligent representations and new information retrieval algorithms, and (2) development of supporting technologies for extracting, filtering, and organizing collaboratively created content.
       The unprecedented amounts of information in CGC enable new, knowledge-rich approaches to information access, which are significantly more powerful than the conventional word-based methods. Considerable progress has been made in this direction over the last few years. Examples include explicit manipulation of human-defined concepts and their use to augment the bag of words (cf. Explicit Semantic Analysis), using large-scale taxonomies of topics from Wikipedia or the Open Directory Project to construct additional class-based features, or using Wikipedia for better word sense disambiguation.
       However, the quality and comprehensiveness of collaboratively created content vary widely, and in order for this resource to be useful, a significant amount of preprocessing, filtering, and organization is necessary. Consequently, new methods for analyzing CGC and corresponding user interactions are required to effectively harness the resulting knowledge. Thus, not only the content repositories can be used to improve IR methods, but the reverse pollination is also possible, as better information extraction methods can be used for automatically collecting more knowledge, or verifying the contributed content. This natural connection between modeling the generation process of CGC and effectively using the accumulated knowledge suggests covering both areas together in a single tutorial.
       The intended audience of the tutorial includes IR researchers and graduate students, who would like to learn about the recent advances and research opportunities in working with collaboratively generated content. The emphasis of the tutorial is on comparing the existing approaches and presenting practical techniques that IR practitioners can use in their research. We also cover open research challenges, as well as survey available resources (software tools and data) for getting started in this research field.

    Doctoral consortium

    Persistence in the ephemeral: utilizing repeat behaviors for multi-session personalized search BIBAFull-Text 1311-1312
      Sarah K. Tyler
    As the abundance of information on the Internet grows, an increasing burden is placed on the user to specify his or her query precisely in order to avoid extraneous results that may be relevant, but not useful. At the same time, users have a tendency to repeat their search behaviors, seeking the same URL (re-finding) as well as issuing the same query (re-searching). These repeated actions reveal a form of user preference that the search engine can utilize to personalize the results.
       In our approach, we personalize search results related to ongoing tasks, allowing for a different degree of strength of interest, and diversity of interest per task. We focus on high valued queries; queries that are both related to past queries and will be related to future queries given the ongoing nature of the task.
    Search engines that learn online BIBAFull-Text 1313-1314
      Katja Hofmann
    The goal of my research is to develop self-learning search engines, that can learn online, i.e., directly from interactions with actual users. Such systems can continuously adapt to user preferences throughout their lifetime, leading to better search performance in settings where expensive manual tuning is infeasible. Challenges that are addressed in my work include the development of effective online learning to rank algorithms for IR, user aspects, and evaluation.
    Query expansion based on a semantic graph model BIBAFull-Text 1315-1316
      Xue Jiang
    Query expansion is a classical topic in the field of information retrieval, which is proposed to bridge the gap between searchers' information intents and their queries. Previous researches usually expand queries based on document collections, or some external resources such as WordNet and Wikipedia [1, 2, 3, 4, 5]. However, it seems that independently using one of these resources has some defects, document collections lack semantic information of words, while WordNet and Wikipedia may not include domain-specific knowledge in certain document collection. Our work aims to combine these two kinds of resources to establish an expansion model which represents not only domain-specific information but also semantic information. In our preliminary experiments, we construct a two-layer word graph and use Random-Walk algorithm to calculate the weights of each term in pseudo-relevance feedback documents, then select the highest weighted term to expand original query. The first layer of the word graph contains terms in related documents, while the second layer contains semantic senses corresponding to these terms. These terms and semantic senses are treated as vertices of the graph and connected with each other by all possible relationships, such as mutual information and semantic similarities. We utilized mutual information, semantic similarity and uniform distribution as the weight of term-term relation, sense-sense relation and word-sense relation respectively. Though these experiments show that our expansion outperform original queries, we are troubled with some difficult problems.
       Given the framework of semantic graph model, we need more effort to find out an optimal graph to represent the relationships between terms and their semantic senses. We utilized a two-layer graph model in our preliminary research, where terms from different documents are treated equally. Maybe we can introduce the document as a third layer in future work, where we can differ the same terms in different documents according to document relevance and context.
       Then we need appropriately represent initial weights of this words, senses and relationships. Various measures for weights of terms and term relations have been proved effective in other information retrieval tasks, such as TFIDF, mutual information (MI), but there is little research on weights for semantic senses and their relations. For polysemous words, we add all of their semantic senses to the graph and assume that these senses are uniformly distributed. Actually, it is not precise for a word in a special document and query. As we know, a polysemous word may have only one or two senses in a document, and they are not uniformly distributed. Give a word, what we should do is to determine its word senses in a relevant document and estimate the distribution of these senses. Word sense disambiguation may help us in this problem. Then, there are many methods to compute word similarity according to WordNet, which we use to represent the weights of relationships between word senses. Varelas et al implemented some popular methods to compute semantic similarity by mapping terms to an ontology and examining their relationships in that ontology [4]. We also need to know which algorithm for semantic similarity is most suitable for our model.
       Additional, WordNet is suitable to calculate word similarity but not suitable to measure word relevance. The inner hyperlinks of Wikipedia could help us to calculate word relevance. We wish to find an effective way to combine the similarity measure from WordNet and relevance measure from Wikipedia, which may completely reflect word relationships.
    Descriptive modelling of text classification and its integration with other IR tasks BIBAFull-Text 1317-1318
      Miguel Martinez-Alvarez
    Nowadays, Information Retrieval (IR) systems have to deal with multiple sources of data available in different formats. Datasets can consist of complex and heterogeneous objects with relationships between them. In addition, information needs can vary wildly and they can include different tasks. As a result, the importance of flexibility in IR systems is rapidly growing. This fact is specially important in environments where the information required at different moments is very different and its utility may be contingent on timely implementation. In these cases, how quickly a new problem is solved is as important as how well you solve it.
       Current systems are usually developed for specific cases. It implies that too much engineering effort is needed to adapt them when new knowledge appears or there are changes in the requirements. Furthermore, heterogeneous and linked data present greater challenges, as well as the simultaneous application of different tasks.
       This research proposes the usage of descriptive approaches for three different purposes: the modelling of the specific task of Text Classification (TC), focusing on knowledge and complex data exploitation; the flexible application of models to different tasks; and the simultaneously application of different IR-tasks. This investigation will contribute to the long-term goal of achieving a descriptive and composable IR technology that provides a modular framework that knowledge engineers can compose into a task-specific solution. The ultimate goal is to develop a flexible framework that offers classifiers, retrieval models, information extractors, and other functions. In addition, those functional blocks could be customised to satisfy user needs.
       Descriptive approaches allow a high-level definition of algorithms which are, in some cases, as compact as mathematical formulations. One of the expected benefits is to make the implementation clearer and the knowledge transfer easier. They allow models from different tasks to be defined as modules that can be "concatenated", processing the information as a pipeline where some of the outputs of one module are the input of the following one. This combination involves minimum engineering effort due to the paradigm's "Plug & Play" capabilities offered by its functional syntax. This solution provides the flexibility needed to customise and quickly combine different IR-tasks and/or models.
       Classification is a desired candidate for being part of a flexible IR framework because it can be required in several situations for different purposes. In particular, descriptive approaches will improve its modelling with complex and heterogeneous objects. Furthermore, we aim to show how this approach allows to apply TC models for ad-hoc retrieval (and vice versa) and their simultaneous application for complex information needs.
       The main hypothesis of this research is that a seamless approach for modelling TC and its integration with other IR-tasks will provide a general framework for rapid prototyping and modelling of solutions for specific users. In addition, it will allow new complex models that take into account relationships and inference from large ontologies. The importance of flexibility for information systems and the exploitation of complex information and knowledge from heterogeneous sources are the main points for discussion. The main challenges are expressiveness and scalability. Abstraction improves flexibility and maintainability. However, it limits the modelling power. A balance between abstraction and expressiveness has to be reached. On the other hand, scalability has been traditionally a challenge for descriptive modelling. Our goal is to prove the feasibility of our approach for real-scale environments.
    Efficient and effective solutions for search engines BIBFull-Text 1319-1320
      Xiang-Fei Jia
    Modeling document scores for distributed information retrieval BIBAFull-Text 1321-1322
      Ilya Markov
    Distributed Information Retrieval (DIR), also known as Federated Search, integrates multiple searchable collections and provides direct access to them through a unified interface [3]. This is done by a centralized broker, that receives user queries, forwards them to appropriate collections and returns merged results to users.
       In practice, most of federated resources do not cooperate with a broker and do not provide neither their content nor the statistics used for retrieval. This is known as uncooperative DIR. In this case a broker creates a resource representation by sending sample queries to a collection and analyzing retrieved documents. This process is called query-based sampling. The key issue here is the following:
       1.1 How many documents have to be retrieved from a resource in order to obtain a representative sample?
       Although there have been a number of attempts to address this issue it is still not solved appropriately.
       For a given user query resources are ranked according to their similarity to the query or based on the number of relevant documents they contain. Since resource representations are usually incomplete, the similarity or the number of relevant documents cannot be calculated precisely. Resource selection algorithms proposed in the literature estimate these numbers based on incomplete samples. However these estimates are subjects to error. In practice, inaccurate estimates that have high error should be trusted less then the more accurate estimates with low error. Unfortunately none of the existing algorithms can make the calculation of the estimation errors possible. Therefore the following questions arise:
       2.1 How to estimate resource scores so that the estimation errors can be calculated?
       2.2 How to use these errors in order to improve the resource selection performance?
       Existing results merging algorithms estimate normalized document scores based on scores of documents that appear both in a sample and in a result list. The problem similar to the resource selection one arises. The normalized document scores are only the estimates and are subjects to error. Inaccurate estimates should be trusted less then the more accurate ones. Again none of the existing algorithms provide a way for calculating these errors. Thus the two question to be address on the results merging phase are similar to the resource selection ones:
       3.1 How to estimate normalized document scores so that the estimation errors can be calculated?
       3.2 How to use these errors in order to improve the results merging performance?
       In this work we address the above issues by applying score distribution models (SDM) to different phases of DIR [2]. In particular, we discuss the SDM-based resource selection technique that allows the calculation of resource score estimation errors and can be extended in order to calculate the number of documents to be sampled from each resource for a given query. We have performed initial experiments comparing the SDM-based resource selection technique to the state-of-the-art algorithms and we are currently experimenting with the SDM-based results merging method.
       We plan to apply the existing score normalization techniques from meta-search to the DIR results merging problem [1]. However, the SDM-based results merging approaches require the relevance scores to be returned together with retrieved documents. It is not yet clear how to relax this strong assumption that does not always hold in practice.
    Improving query and result list adaptation in personalized multilingual information retrieval BIBAFull-Text 1323-1324
      M. Rami Ghorab
    A general characteristic of Information Retrieval (IR) and Multilingual IR (MIR) [5] systems is that if the same query was submitted by different users, the system would yield the same results, regardless of the user. On the other hand, Adaptive Hypermedia (AH) systems operate in a personalized manner where the services are adapted to the user [1]. Personalized IR (PIR) is motivated by the success in both areas, IR and AH [4]. IR systems have the advantage of scalability and AH systems have the advantage of satisfying individual user needs. The majority of studies in PIR literature have focused on monolingual IR, and relatively little work has been done concerning multilingual IR.
       This PhD research study aims to improve personalization in MIR systems, by improving the relevance of multilingual search results with respect to the user and not just the query. The study investigates how to model different aspects of a multilingual search user. Information about users can be demographic information, such as language and country, or information about the user's search interests. This information can be gathered explicitly by asking the user to supply the required information or implicitly by inferring the information from the user's search history. The study will then investigate how to exploit the modeled user information to personalize the user's multilingual search by performing query and result list adaptation. The main research questions that are addressed in this study are: how to improve the relevance of search results with respect to individual users in PMIR and how to construct profiles that represent aspects and interests of a multilingual search user.
       So far, the work carried out for this study included: (1) a proposed framework for the delivery and evaluation of PMIR [3]; and (2) exploratory experiments with search history and collection (result) re-ranking on a dataset of multilingual search logs [2]. The next stage of experimentation will involve the investigation and development of algorithms for: (1) constructing multilingual user profiles; (2) pre-translation and post-translation query expansion based on terms from the user profile; and (3) result list re-ranking based on the user's interests, and preferred language.
       Two types of experiments will be conducted in an in-lab setting, with a group of users from different linguistic backgrounds. In the first set of experiments, users will be asked to use a baseline web search system for their daily search activities over a period of time. The baseline system will be wrapped around one of the major search engines. Interactions with the system will be logged, and part of this information will be used for training the system (constructing user profiles from text of queries and clicked documents); the other part (remaining queries) will be used for testing the effectiveness of the query adaptation and result list adaptation algorithms, where the users will be asked to provide some personal relevance judgements. In the second set of experiments, the users will be asked to use the PMIR system to fulfill a number of defined search tasks.
       Quantitative and qualitative techniques will be used to evaluate different aspects of the experiments, including: (1) retrieval effectiveness, which can be measured using standard IR metrics; (2) user's performance on search tasks, which can be measured in terms of time and number of actions needed to fulfill the tasks; (3) user profile accuracy, which can be assessed by questionnaires that indicate how well the user profile depicted the users' search interests; and (4) usability and user satisfaction, which can be assessed using standard system usability questionnaires.
    Using k-Top retrieved web snippets to date temporal implicit queries based on web content analysis BIBAFull-Text 1325-1326
      Ricardo Nuno Taborda Campos
    The World Wide Web (WWW) is a huge information network from which retrieving and organizing quality relevant content remains an open question for mostly all ambiguous queries. As an example, many queries have temporal implicit intents associated with them but they are not inferred by search engines. Inferring the user intentions and the period he has in mind, may therefore play an extremely important role in the improvement of the results. Our work goes in this direction. We aim to introduce a temporal analysis framework for analyzing documents in a temporal dimension in order to identify and understand the temporal nature of any given query, namely implicit ones. Our analysis is not based on metadata, but on the exploitation of temporal information from the content itself, particularly within web snippets, which are interesting pieces of concentrated information, where time clues, especially years, often appear. Our intention is to develop a language-independent solution and to model the degree of relationship between the terms and dates identified. This is the core part of the framework and the basis for both temporal query understanding and search results exploration, such as temporal clustering. We believe that inferring this knowledge is a very important step in the process of adding a temporal dimension to IR systems, thus disambiguating a large class of queries for which search engines continue to fail.
    Domain-specific information retrieval using recommenders BIBAFull-Text 1327-1328
      Wei Li
    The continuing increase in the volume of information available in our daily lives is creating ever greater challenges for people to find personally useful information. One approach used to addressing this problem is Personalized Information Retrieval (PIR). PIR systems collect a user's personal information from both implicit and explicit sources to build a user profile with the objective of giving retrieval results which better meet their individual user information needs than a standard Information Retrieval (IR) system. However, in many situations there may be no opportunity to learn about the specific interests of a user and build a personal model when this user is querying on a new topic, e.g. when a user visits a museum or exhibition which is unrelated to their normal search interests. Under this condition, the experiences and behaviours of other previous users, who have made similar queries, could be used to build a model of user behavior in this domain. My PhD proposes to focus on the development of new and innovative methods of domain-specific IR. My work seeks to combine recommender algorithms trained using previous search behaviours from different searchers with a standard ranked IR method to form a domain-specific IR model to improve the search effectiveness for a user entering a query without personal prior search history on this topic. The challenges for my work are: how to provide users better results; how to train and evaluate the methods proposed in my work.
    Understanding and using contextual information in recommender systems BIBFull-Text 1329-1330
      Licai Wang
    Multidimensional search result diversification: diverse search results for diverse users BIBAFull-Text 1331-1332
      Sumit Bhatia
    Hundreds of millions of people today rely on Web based Search Engines to satisfy their information needs. In order to meet the expectations of this vast and diverse user population, the search engine should present a list of results such that the probability of satisfying the average user is maximized. This leads us to the problem of Search Result Diversification. Given a user submitted query, the search engine should include results that are relevant to the user query and at the same time, diverse enough to meet the expectations of diverse user populations. However, it is not clear in what respect the results should be diversified.
       Much of the current work in diversity focuses on ambiguous and underspecified queries and tries to include results corresponding to diverse interpretations of the ambiguous query. This is not always sufficient. My analysis of a commercial web search engine's logs reveals that even for well-specified informational queries, click entropy is very high indicating that different users prefer different types of documents. Very recently, a diversification algorithm fine-tuned for such informational queries has been proposed. Further, high click entropies were also observed for a large fraction of transactional queries. One major goal of my PhD thesis will then be to identify the various possible dimensions along which the search results can be diversified. Having such an information will enhance our understanding about the expectations of an average user from the search engine. By utilizing aggregate statistics about queries, users and their interaction with the search engine for different queries, more concrete evidences about diverse user preferences as well as relative importance of different diversity dimensions can be derived.
       Once we know different diversity dimensions, the next natural question is: given a query, how can we determine the diversification requirement best suited for the query? For some queries sub-topic coverage may be more important while for others diversification with respect to document source or stylistics might be important. This problem is related to the problem of selective diversification where the goal is to identify queries for which diversification techniques should be used. However, in addition, we are also interested in identifying different diversity classes a given query belongs to. Further, for some queries it may be required to diversify along multiple diversity dimensions. In such cases, it is also important to determine the relative importance of different diversity dimensions for the given query. By utilizing past user interaction data, query level features (like query clarity, entropy, lexical features etc.) and document level features (e.g. popularity, content quality, previous click history etc.), classifiers for diversification requirements can be developed.
       Given a user query, once we know the type of diversity requirements for the user, an appropriate diversification technique is required. I would like to study the problem of simultaneously diversifying search results along multiple dimensions, as discussed above. One possible way here could be to build upon the nugget based framework introduced by Clarke et al. where we represent each document as a set of nuggets, each nugget corresponding to a diversity dimension.

    Industrial track

    Sensor-aided mobile information management and retrieval BIBAFull-Text 1333-1334
      Edward Y. Chang
    The number of "smart" mobile devices such as wireless phones and tablet computers has been rapidly growing. These mobile devices are equipped with a variety of sensors such as camera, gyroscope, accelerometer, compass, NFC, WiFi, GPS, etc. These sensors can be used to capture images and voice, detect motion patterns, and predict locations, to name just a few. This keynote depicts techniques in configuration, calibration, computation, and fusion for improving sensor performance and conserving power consumption. We also present novel mobile information management and retrieval applications that can benefit a great deal from enhanced sensor technologies.
    Predicting eBay listing conversion BIBAFull-Text 1335-1336
      Ted Tao Yuan; Zhaohui Chen; Mike Mathieson
    At eBay Market Place, listing conversion rate can be measured by number of items sold divided by number of items in a sample set. For a given item, conversion rate can also be treated as the probability of sale. By investigating eBay listings' transactional patterns, as well as item attributes and user click-through data, we developed conversion models that allow us to predict a live listing's probability of sale. In this paper, we discuss the design and implementation of such conversion models. These models are highly valuable in analysis of inventory quality and ranking. Our work reveals the uniqueness of sales-oriented search at eBay and its similarity to general web search problems.
    A large scale machine learning system for recommending heterogeneous content in social networks BIBAFull-Text 1337-1338
      Yanxin Shi; David Ye; Andrey Goder; Srinivas Narayanan
    The goal of the Facebook recommendation engine is to compare and rank heterogeneous types of content in order to find the most relevant recommendations based on user preference and page context. The challenges for such a recommendation engine include several aspects: 1) the online queries being processed are at very large scale; 2) with new content types and new user-generated content constantly added to the system, the candidate object set and underlying data distribution change rapidly; 3) different types of content usually have very distinct characteristics, which makes generic feature engineering difficult; and 4) unlike a search engine that can capture intention of users based on their search queries, our recommendation engine needs to focus more on users' profile and interests, past behaviors and current actions in order to infer their cognitive states. In this presentation, we would like to introduce an effective, scalable, online machine learning framework we developed in order to address the aforementioned challenges. We also want to discuss the insights, approaches and experiences we have accumulated during our research and development process.
    Review of MSR-Bing web scale speller challenge BIBAFull-Text 1339-1340
      Kuansan Wang; Jan Pedersen
    In this paper, we provide an overview of the MSR-Bing Web Scale Speller Challenge of 2011. We describe the motivation and outline the algorithmic and engineering challenges posed by this activity. The design and the evaluation methods are also reviewed, and the online resources that will remain publicly available to the community are also described. The Challenge will culminate in a workshop after the time of the writing where the top prize winners will publish their approaches. The main findings and the lessons learned will be summarized and shared in the Industry Track presentation accompanying this paper.
    Elsevier SIGIR 2011 application challenge abstract BIBAFull-Text 1341-1342
      Jukka Valimaki; Remko Caprio
    Elsevier SIGIR 2011 Application Challenge is an international competition that encourages software developers to create applications that run on Elsevier's SciVerse platform. The Challenge is open to all SIGIR 2011 Conference participants.