ACM Transactions on Information Systems 33

Editors: Maarten de Rijke
Standard No: ISSN 1046-8188; HF S548.125 A33
Links: Table of Contents
  1. TOIS 2015-03 Volume 33 Issue 1
  2. TOIS 2015-02 Volume 33 Issue 2
  3. TOIS 2015-03 Volume 33 Issue 3
  4. TOIS 2015-06 Volume 33 Issue 4

TOIS 2015-03 Volume 33 Issue 1

Special Issue on Contextual Search and Recommendation

Overview of the Special Issue on Contextual Search and Recommendation (Article 1e)
  Paul N. Bennett; Kevyn Collins-Thompson; Diane Kelly; Ryen W. White; Yi Zhang
We solicited articles for this special issue that describe the state of the art and emerging trends in contextual search and recommendation. Although manuscripts focusing on all areas of contextual search and recommendation were considered, we especially encouraged submissions that targeted exploratory and/or complex tasks -- in particular, representations and approaches to context that enable task-oriented search, including tasks that persist longitudinally. From 20 submissions, we selected four high-quality articles that represent current themes of research on contextual search and recommendation.
User Activity Patterns During Information Search (Article 1)
  Michael J. Cole; Chathra Hendahewa; Nicholas J. Belkin; Chirag Shah
Personalization of support for information seeking depends crucially on the information retrieval system's knowledge of the task that led the person to engage in information seeking. Users work during information search sessions to satisfy their task goals, and their activity is not random. To what degree are there patterns in the user activity during information search sessions? Do activity patterns reflect the user's situation as the user moves through the search task under the influence of his or her task goal? Do these patterns reflect aspects of different types of information-seeking tasks? Could such activity patterns identify contexts within which information seeking takes place? To investigate these questions, we model sequences of user behaviors in two independent user studies of information search sessions (N = 32 users, 128 sessions, and N = 40 users, 160 sessions). Two representations of user activity patterns are used. One is based on the sequences of page use; the other is based on a cognitive representation of information acquisition derived from eye movement patterns in service of the reading process. One of the user studies considered journalism work tasks; the other concerned background research in genomics using search tasks taken from the TREC Genomics Track. The search tasks differed in basic dimensions of complexity, specificity, and the type of information product (intellectual or factual) needed to achieve the overall task goal. The results show that similar patterns of user activity are observed at both the cognitive and page use levels. The activity patterns at both representation layers are able to distinguish between task types in similar ways and, to some degree, between tasks of different levels of difficulty. We explore relationships between the results and task difficulty and discuss the use of activity patterns to explore events within a search session. 
User activity patterns can be at least partially observed in server-side search logs. A focus on patterns of user activity sequences may contribute to the development of information systems that better personalize the user's search experience.
Who, Where, When, and What: A Nonparametric Bayesian Approach to Context-aware Recommendation and Search for Twitter Users (Article 2)
  Quan Yuan; Gao Cong; Kaiqi Zhao; Zongyang Ma; Aixin Sun
Micro-blogging services and location-based social networks, such as Twitter, Weibo, and Foursquare, enable users to post short messages with timestamps and geographical annotations. The rich spatial-temporal-semantic information of individuals embedded in these geo-annotated short messages provides an exciting opportunity to develop many context-aware applications in ubiquitous computing environments. Example applications include contextual recommendation and contextual search. To obtain accurate recommendations and the most relevant search results, it is important to capture users' contextual information (e.g., time and location) and to understand users' topical interests and intentions. While time and location can be readily captured by smartphones, understanding users' interests and intentions calls for effective methods in modeling user mobility behavior. Here, user mobility refers to who visits which place at what time for what activity. That is, user mobility behavior modeling must consider user (Who), spatial (Where), temporal (When), and activity (What) aspects. Unfortunately, no previous studies on user mobility behavior modeling have considered all of the four aspects jointly, which have complex interdependencies. In our preliminary study, we proposed the first solution, named W4 (short for Who, Where, When, and What), to discover user mobility behavior from the four aspects. In this article, we further enhance W4 and propose a nonparametric Bayesian model named EW4 (short for Enhanced W4). EW4 requires no parameter tuning and achieves better results than W4 in our experiments. Given some of the four aspects of a user (e.g., time), our model is able to infer information of the other aspects (e.g., location and topical words). Thus, our model has a variety of context-aware applications, particularly in contextual search and recommendation.
Experimental results on two real-world datasets show that the proposed model is effective in discovering users' spatial-temporal topics. The model also significantly outperforms state-of-the-art baselines for various tasks including location prediction for tweets and requirement-aware location recommendation.
Task-Based Information Interaction Evaluation: The Viewpoint of Program Theory (Article 3)
  Kalervo Järvelin; Pertti Vakkari; Paavo Arvola; Feza Baskaya; Anni Järvelin; Jaana Kekäläinen; Heikki Keskustalo; Sanna Kumpulainen; Miamaria Saastamoinen; Reijo Savolainen; Eero Sormunen
Evaluation is central in research and development of information retrieval (IR). In addition to designing and implementing new retrieval mechanisms, one must also show through rigorous evaluation that they are effective. A major focus in IR is IR mechanisms' capability of ranking relevant documents optimally for the users, given a query. Searching for information in practice involves searchers, however, and is highly interactive. When human searchers have been incorporated in evaluation studies, the results have often suggested that better ranking does not necessarily lead to better search task, or work task, performance. Therefore, it is not clear which system or interface features should be developed to improve the effectiveness of human task performance. In the present article, we focus on the evaluation of task-based information interaction (TBII). We give special emphasis to learning tasks to discuss TBII in more concrete terms. Information interaction is here understood as behavioral and cognitive activities related to task planning, searching information items, selecting between them, working with them, and synthesizing and reporting. These five generic activities contribute to task performance and outcome and can be supported by information systems. In an attempt toward task-based evaluation, we introduce program theory as the evaluation framework. Such evaluation can investigate whether a program consisting of TBII activities and tools works and how it works and, further, provides a causal description of program (in)effectiveness. Our goal in the present article is to structure TBII on the basis of the five generic activities and consider the evaluation of each activity using the program theory framework. Finally, we combine these activity-based program theories in an overall evaluation framework for TBII. Such an evaluation is complex due to the large number of factors affecting information interaction. 
Instead of presenting tested program theories, we illustrate how the evaluation of TBII should be accomplished using the program theory framework in the evaluation of systems and behaviors, and their interactions, comprehensively in context.
Profile-Based Summarisation for Web Site Navigation (Article 4)
  Azhar Alhindi; Udo Kruschwitz; Chris Fox; M-Dyaa Albakour
Information systems that utilise contextual information have the potential of helping a user identify relevant information more quickly and more accurately than systems that work the same for all users and contexts. Contextual information comes in a variety of types, often derived from records of past interactions between a user and the information system. It can be individual or group based. We are focusing on the latter, harnessing the search behaviour of cohorts of users, turning it into a domain model that can then be used to assist other users of the same cohort. More specifically, we aim to explore how such a domain model is best utilised for profile-biased summarisation of documents in a navigation scenario in which such summaries can be displayed as hover text as a user moves the mouse over a link. The main motivation is to help a user find relevant documents more quickly. Given the fact that the Web in general has been studied extensively already, we focus our attention on Web sites and similar document collections. Such collections can be notoriously difficult to search or explore. The process of acquiring the domain model is not a research interest here; we simply adopt a biologically inspired method that resembles the idea of ant colony optimisation. This has been shown to work well in a variety of application areas. The model can be built in a continuous learning cycle that exploits search patterns as recorded in typical query log files. Our research explores different summarisation techniques, some of which use the domain model and some that do not. We perform task-based evaluations of these different techniques -- thus of the impact of the domain model and profile-biased summarisation -- in the context of Web site navigation.

TOIS 2015-02 Volume 33 Issue 2

A Comparative Analysis of Interleaving Methods for Aggregated Search (Article 5)
  Aleksandr Chuklin; Anne Schuth; Ke Zhou; Maarten De Rijke
A result page of a modern search engine often goes beyond a simple list of "10 blue links." Many specific user needs (e.g., News, Image, Video) are addressed by so-called aggregated or vertical search solutions: specially presented documents, often retrieved from specific sources, that stand out from the regular organic Web search results. When it comes to evaluating ranking systems, such complex result layouts raise their own challenges. This is especially true for so-called interleaving methods that have arisen as an important type of online evaluation: by mixing results from two different result pages, interleaving can easily break the desired Web layout in which vertical documents are grouped together, and hence hurt the user experience.
   We conduct an analysis of different interleaving methods as applied to aggregated search engine result pages. Apart from conventional interleaving methods, we propose two vertical-aware methods: one derived from the widely used Team-Draft Interleaving method by adjusting it in such a way that it respects vertical document groupings, and another based on the recently introduced Optimized Interleaving framework. We show that our proposed methods are better at preserving the user experience than existing interleaving methods while still performing well as a tool for comparing ranking systems. For evaluating our proposed vertical-aware interleaving methods, we use real-world click data as well as simulated clicks and simulated ranking systems.
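As a point of reference for the abstract above, the following is a minimal Python sketch of standard Team-Draft Interleaving, the conventional method the authors start from. It is not the vertical-aware variant proposed in the article; the function names `team_draft_interleave` and `credit_clicks` are hypothetical and chosen for illustration only.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng=None):
    """Standard Team-Draft Interleaving of two rankings (lists of document ids).

    Returns the interleaved list and a dict mapping each document to the
    team ("A" or "B") that contributed it.
    """
    rng = rng or random.Random()
    interleaved, teams = [], {}
    picks = {"A": 0, "B": 0}
    sources = {"A": ranking_a, "B": ranking_b}
    total = len(set(ranking_a) | set(ranking_b))
    while len(interleaved) < total:
        # The team with fewer picks goes next; a coin flip breaks ties.
        if picks["A"] != picks["B"]:
            team = "A" if picks["A"] < picks["B"] else "B"
        else:
            team = "A" if rng.random() < 0.5 else "B"
        doc = next((d for d in sources[team] if d not in teams), None)
        if doc is None:
            # This ranking has nothing new to offer, so the other team picks.
            team = "B" if team == "A" else "A"
            doc = next(d for d in sources[team] if d not in teams)
        interleaved.append(doc)
        teams[doc] = team
        picks[team] += 1
    return interleaved, teams

def credit_clicks(teams, clicked_docs):
    """Count clicks per team; the team with more clicks wins the comparison."""
    wins = {"A": 0, "B": 0}
    for doc in clicked_docs:
        if doc in teams:
            wins[teams[doc]] += 1
    return wins
```

Because this plain version picks documents one at a time with no notion of layout, documents from the same vertical can end up scattered through the list, which is the layout problem the abstract describes.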
Web Query Reformulation via Joint Modeling of Latent Topic Dependency and Term Context (Article 6)
  Lidong Bing; Wai Lam; Tak-Lam Wong; Shoaib Jameel
An important way to improve users' satisfaction in Web search is to assist them in issuing more effective queries. One such approach is query reformulation, which generates new queries according to the current query issued by users. A common procedure for conducting reformulation is to generate some candidate queries first and then employ a scoring method to assess these candidates. Currently, most of the existing methods are context based. They rely heavily on the context relation of terms in the history queries and cannot detect and maintain the semantic consistency of queries. In this article, we propose a graphical model to score queries. The proposed model exploits a latent topic space, which is automatically derived from the query log, to detect semantic dependency of terms in a query and dependency among topics. Meanwhile, the graphical model also captures the term context in the history query by skip-bigram and n-gram language models. In addition, our model can be easily extended to consider users' history search interests when we conduct query reformulation for different users. In the task of candidate query generation, we investigate a social tagging data resource -- Delicious bookmark -- to generate addition and substitution patterns that are employed as supplements to the patterns generated from query log data.
TASC: A Transformation-Aware Soft Cascading Approach for Multimodal Video Copy Detection (Article 7)
  Yonghong Tian; Mengren Qian; Tiejun Huang
How to precisely and efficiently detect near-duplicate copies with complicated audiovisual transformations from a large-scale video database is a challenging task. To cope with this challenge, this article proposes a transformation-aware soft cascading (TASC) approach for multimodal video copy detection. Basically, our approach divides query videos into several categories and then, for each category, designs a transformation-aware chain to organize several detectors in a cascade structure. In each chain, efficient but simple detectors are placed in the forepart, whereas effective but complex detectors are located in the rear. To judge whether two videos are near-duplicates, a Detection-on-Copy-Units mechanism is introduced in the TASC, which bases the copy detection decision on the similarity between their most similar fractions, called copy units (CUs), rather than the video-level similarity. Following this, we propose a CU search algorithm to find a pair of CUs from two videos and a CU-based localization algorithm to find the precise locations of their copy segments, which are centered on the asserted CUs. Moreover, to address the problem that the copies and noncopies are possibly linearly inseparable in the feature space, the TASC also introduces a flexible strategy, called soft decision boundary, to replace the single threshold strategy for each detector. Its basic idea is to automatically learn two thresholds for each detector to examine the easy-to-judge copies and noncopies, respectively, and meanwhile to train a nonlinear classifier to further check those hard-to-judge ones. Extensive experiments on three benchmark datasets showed that the TASC can achieve excellent copy detection accuracy and localization precision with a very high processing efficiency.
Two-Stage Document Length Normalization for Information Retrieval (Article 8)
  Seung-Hoon Na
The standard approach for term frequency normalization is based only on the document length. However, it does not distinguish the verbosity from the scope, these being the two main factors determining the document length. Because the verbosity and scope have largely different effects on the increase in term frequency, the standard approach can easily suffer from insufficient or excessive penalization depending on the specific type of long document. To overcome these problems, this article proposes two-stage normalization by performing verbosity and scope normalization separately, and by employing different penalization functions. In verbosity normalization, each document is prenormalized by dividing the term frequency by the verbosity of the document. In scope normalization, an existing retrieval model is applied in a straightforward manner to the prenormalized document, finally leading us to formulate our proposed verbosity normalized (VN) retrieval model. Experimental results carried out on standard TREC collections demonstrate that the VN model leads to marginal but statistically significant improvements over standard retrieval models.
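The verbosity prenormalization step described in this abstract can be illustrated with a small Python sketch. The abstract does not define verbosity, so the sketch assumes it is the document length divided by the number of distinct terms (that is, the average term frequency); that definition and the function names are assumptions made for illustration, not the article's formulation.

```python
from collections import Counter

def verbosity(tokens):
    """Assumed definition: document length divided by the number of distinct terms."""
    return len(tokens) / len(set(tokens))

def prenormalize(tokens):
    """Stage one: divide each raw term frequency by the document's verbosity."""
    if not tokens:
        return {}
    v = verbosity(tokens)
    return {term: tf / v for term, tf in Counter(tokens).items()}

# A verbose document that repeats two terms: length 6, 2 distinct terms.
doc = "a a a a b b".split()
normalized = prenormalize(doc)
```

Under this assumed definition the prenormalized frequencies sum to the number of distinct terms, so the length left for the second stage (scope normalization by an existing retrieval model) reflects how many different things the document covers rather than how often it repeats them.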
Unsupervised Visual and Textual Information Fusion in CBMIR Using Graph-Based Methods (Article 9)
  Julien Ah-Pine; Gabriela Csurka; Stéphane Clinchant
Multimedia collections are more than ever growing in size and diversity. Effective multimedia retrieval systems are thus critical to access these datasets from the end-user perspective and in a scalable way. We are interested in repositories of image/text multimedia objects and we study multimodal information fusion techniques in the context of content-based multimedia information retrieval. We focus on graph-based methods, which have proven to provide state-of-the-art performances. We particularly examine two such methods: cross-media similarities and random-walk-based scores. From a theoretical viewpoint, we propose a unifying graph-based framework, which encompasses the two aforementioned approaches. Our proposal allows us to highlight the core features one should consider when using a graph-based technique for the combination of visual and textual information. We compare cross-media and random-walk-based results using three different real-world datasets. From a practical standpoint, our extended empirical analyses allow us to provide insights and guidelines about the use of graph-based methods for multimodal information fusion in content-based multimedia information retrieval.

TOIS 2015-03 Volume 33 Issue 3

Dynamic User Modeling in Social Media Systems (Article 10)
  Hongzhi Yin; Bin Cui; Ling Chen; Zhiting Hu; Xiaofang Zhou
Social media provides valuable resources to analyze user behaviors and capture user preferences. This article focuses on analyzing user behaviors in social media systems and designing a latent class statistical mixture model, named temporal context-aware mixture model (TCAM), to account for the intentions and preferences behind user behaviors. Based on the observation that the behaviors of a user in social media systems are generally influenced by intrinsic interest as well as the temporal context (e.g., the public's attention at that time), TCAM simultaneously models the topics related to users' intrinsic interests and the topics related to temporal context and then combines the influences from the two factors to model user behaviors in a unified way. Considering that users' interests are not always stable and may change over time, we extend TCAM to a dynamic temporal context-aware mixture model (DTCAM) to capture users' changing interests. To alleviate the problem of data sparsity, we exploit the social and temporal correlation information by integrating a social-temporal regularization framework into the DTCAM model. To further improve the performance of our proposed models (TCAM and DTCAM), an item-weighting scheme is proposed to enable them to favor items that better represent topics related to user interests and topics related to temporal context, respectively. Based on our proposed models, we design a temporal context-aware recommender system (TCARS). To speed up the process of producing the top-k recommendations from large-scale social media data, we develop an efficient query-processing technique to support TCARS. Extensive experiments have been conducted to evaluate the performance of our models on four real-world datasets crawled from different social media sites. 
The experimental results demonstrate the superiority of our models, compared with the state-of-the-art competitor methods, by modeling user behaviors more precisely and making more effective and efficient recommendations.
Stochastic Query Covering for Fast Approximate Document Retrieval (Article 11)
  Aris Anagnostopoulos; Luca Becchetti; Ilaria Bordino; Stefano Leonardi; Ida Mele; Piotr Sankowski
We design algorithms that, given a collection of documents and a distribution over user queries, return a small subset of the document collection in such a way that we can efficiently provide high-quality answers to user queries using only the selected subset. This approach has applications when space is a constraint or when the query-processing time increases significantly with the size of the collection. We study our algorithms through the lens of stochastic analysis and prove that even though they use only a small fraction of the entire collection, they can provide answers to most user queries, achieving a performance close to the optimal. To complement our theoretical findings, we experimentally show the versatility of our approach by considering two important cases in the context of Web search. In the first case, we favor the retrieval of documents that are relevant to the query, whereas in the second case we aim for document diversification. Both the theoretical and the experimental analysis provide strong evidence of the potential value of query covering in diverse application scenarios.
Induced Sorting Suffixes in External Memory (Article 12)
  Ge Nong; Wai Hong Chan; Sheng Qing Hu; Yi Wu
We present in this article an external memory algorithm, called disk SA-IS (DSA-IS), to exactly emulate the induced sorting algorithm SA-IS previously proposed for sorting suffixes in RAM. DSA-IS is a new disk-friendly method for sequentially retrieving the preceding character of a sorted suffix to induce the order of the preceding suffix. For a size-n string of a constant or integer alphabet, given the RAM capacity Ω((nW)^0.5), where W is the size of each I/O buffer that is large enough to amortize the overhead of each access to disk, both the CPU time and peak disk use of DSA-IS are O(n). Our experimental study shows that on average, DSA-IS achieves the best time and space results of all of the existing external memory algorithms based on the induced sorting principle.
Browsing Hierarchy Construction by Minimum Evolution (Article 13)
  Hui Yang
Hierarchies serve as browsing tools to access information in document collections. This article explores techniques to derive browsing hierarchies that can be used as an information map for task-based search. It proposes a novel minimum-evolution hierarchy construction framework that directly learns semantic distances from training data and from users to construct hierarchies. The aim is to produce globally optimized hierarchical structures by incorporating user-generated task specifications into the general learning framework. Both an automatic version of the framework and an interactive version are presented. A comparison with state-of-the-art systems and a user study jointly demonstrate that the proposed framework is highly effective.
Metrics and Algorithms for Routing Questions to User Communities (Article 14)
  Aditya Pal
An online community consists of a group of users who share a common interest, background, or experience, and their collective goal is to contribute toward the welfare of the community members. Several websites allow their users to create and manage niche communities, such as Yahoo! Groups, Facebook Groups, Google+ Circles, and WebMD Forums. These community services also exist within enterprises, such as IBM Connections. Question answering within these communities enables their members to exchange knowledge and information with other community members. However, the onus of finding the right community for question asking lies with an individual user. The overwhelming number of communities necessitates a good question routing strategy so that new questions get routed to an appropriately focused community and thus get resolved in a reasonable time frame.
   In this article, we consider the novel problem of routing a question to the right community and propose a framework for selecting and ranking the relevant communities for a question. We propose several novel features for modeling the three main entities of the system: questions, users, and communities. We propose features such as language attributes, inclination to respond, user familiarity, and difficulty of a question; based on these features, we propose similarity metrics between the routed question and the system entities. We introduce a Cutoff-Aggregation (CA) algorithm that aggregates the entity similarity within a community to compute that community's relevance. We introduce two computationally efficient k-nearest-neighbor (knn) algorithms that are a natural instantiation of the CA algorithm, and we evaluate several ranking algorithms over the aggregate similarity scores computed by the two knn algorithms. We propose clustering techniques to speed up our recommendation framework and show how pipelining can improve the model performance. We demonstrate the effectiveness of our framework on two large real-world datasets.
A General SIMD-Based Approach to Accelerating Compression Algorithms (Article 15)
  Wayne Xin Zhao; Xudong Zhang; Daniel Lemire; Dongdong Shan; Jian-Yun Nie; Hongfei Yan; Ji-Rong Wen
Compression algorithms are important for data-oriented tasks, especially in the era of "Big Data." Modern processors equipped with powerful SIMD instruction sets provide us with an opportunity for achieving better compression performance. Previous research has shown that SIMD-based optimizations can multiply decoding speeds. Following these pioneering studies, we propose a general approach to accelerate compression algorithms. By instantiating the approach, we have developed several novel integer compression algorithms, called Group-Simple, Group-Scheme, Group-AFOR, and Group-PFD, and implemented their corresponding vectorized versions. We evaluate the proposed algorithms on two public TREC datasets, a Wikipedia dataset, and a Twitter dataset. With competitive compression ratios and encoding speeds, our SIMD-based algorithms outperform state-of-the-art nonvectorized algorithms with respect to decoding speeds.

TOIS 2015-06 Volume 33 Issue 4

Understanding and Supporting Cross-Device Web Search for Exploratory Tasks with Mobile Touch Interactions (Article 16)
  Shuguang Han; Zhen Yue; Daqing He
Mobile devices enable people to look for information at the moment when their information needs are triggered. While experiencing complex information needs that require multiple search sessions, users may utilize desktop computers to fulfill information needs started on mobile devices. Under the context of mobile-to-desktop web search, this article analyzes users' behavioral patterns and compares them to the patterns in desktop-to-desktop web search. Then, we examine several approaches of using Mobile Touch Interactions (MTIs) to infer relevant content so that such content can be used for supporting subsequent search queries on desktop computers. The experimental data used in this article was collected through a user study involving 24 participants and six properly designed cross-device web search tasks. Our experimental results show that (1) users' mobile-to-desktop search behaviors do significantly differ from desktop-to-desktop search behaviors in terms of information exploration, sense-making and repeated behaviors. (2) MTIs can be employed to predict the relevance of click-through documents, but applying document-level relevant content based on the predicted relevance does not improve search performance. (3) MTIs can also be used to identify the relevant text chunks at a fine-grained subdocument level. Such relevant information can achieve better search performance than the document-level relevant content. In addition, such subdocument relevant information can be combined with document-level relevance to further improve the search performance. However, the effectiveness of these methods relies on the sufficiency of click-through documents. (4) MTIs can also be obtained from the Search Engine Results Pages (SERPs). The subdocument feedbacks inferred from this set of MTIs even outperform the MTI-based subdocument feedback from the click-through documents.
Selective Search: Efficient and Effective Search of Large Textual Collections (Article 17)
  Anagha Kulkarni; Jamie Callan
The traditional search solution for large collections divides the collection into subsets (shards), and processes the query against all shards in parallel (exhaustive search). The search cost and the computational requirements of this approach are often prohibitively high for organizations with few computational resources. This article investigates and extends an alternative: selective search, an approach that partitions the dataset based on document similarity to obtain topic-based shards, and searches only a few shards that are estimated to contain relevant documents for the query. We propose shard creation techniques that are scalable, efficient, self-reliant, and create topic-based shards with low variance in size, and high density of relevant documents.
   The experimental results demonstrate that the effectiveness of selective search is on par with that of exhaustive search, and the corresponding search costs are substantially lower with the former. Also, the majority of the queries perform as well or better with selective search. An oracle experiment that uses optimal shard ranking for a query indicates that selective search can outperform the effectiveness of exhaustive search. Comparison with a query optimization technique shows higher improvements in efficiency with selective search. The overall best efficiency is achieved when the two techniques are combined in an optimized selective search approach.
Belief Dynamics and Biases in Web Search (Article 18)
  Ryen W. White; Eric Horvitz
We investigate how beliefs about the efficacy of medical interventions are influenced by searchers' exposure to information on retrieved Web pages. We present a methodology for measuring participants' beliefs and confidence about the efficacy of treatment before, during, and after search episodes. We consider interventions studied in the Cochrane collection of meta-analyses. We extract related queries from search engine logs and consider the Cochrane assessments as ground truth. We analyze the dynamics of belief over time and show the influence of prior beliefs and confidence at the end of sessions. We present evidence for confirmation bias and for anchoring-and-adjustment during search and retrieval. Then, we build predictive models to estimate postsearch beliefs using sets of features about behavior and content. The findings provide insights about the influence of Web content on the beliefs of people and have implications for the design of search systems.
Fast Forward Index Methods for Pseudo-Relevance Feedback Retrieval (Article 19)
  Edward Kai Fung Dang; Robert Wing Pong Luk; James Allan
The inverted index is the dominant indexing method in information retrieval systems. It enables fast return of the list of all documents containing a given query term. However, for retrieval schemes involving query expansion, as in pseudo-relevance feedback (PRF), the retrieval time based on an inverted index increases linearly with the number of expansion terms. In this regard, we have examined the use of a forward index, which consists of the mapping of each document to its constituent terms. We propose a novel forward index-based reranking scheme to shorten the PRF retrieval time. In our method, a first retrieval of the original query is performed using an inverted index, and then a forward index is employed for the PRF part. We have studied several new forward indexes, including using a novel spstring data structure and the weighted variable bit-block compression (wvbc) signature. With modern hardware such as solid-state drives (SSDs) and sufficiently large main memory, forward index methods are particularly promising. We find that with the whole index stored in main memory, PRF retrieval using a spstring or wvbc forward index excels in time efficiency over an inverted index, being able to obtain the same levels of performance measures at shorter times.
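The two-step scheme in this abstract (first retrieval through an inverted index, then reranking through a forward index) can be sketched in Python as follows. This is a toy illustration under simplifying assumptions: scores are plain term-frequency sums, the expansion weights are supplied by the caller, and the index structures are ordinary dictionaries rather than the spstring or wvbc structures studied in the article. All function names are hypothetical.

```python
from collections import Counter, defaultdict

def build_indexes(docs):
    """Build an inverted index (term -> doc ids) and a forward index (doc id -> term counts)."""
    inverted, forward = defaultdict(set), {}
    for doc_id, text in docs.items():
        terms = text.lower().split()
        forward[doc_id] = Counter(terms)
        for term in terms:
            inverted[term].add(doc_id)
    return inverted, forward

def first_pass(inverted, forward, query_terms, k):
    """Retrieve the top-k documents for the original query via the inverted index."""
    candidates = set().union(*(inverted.get(t, set()) for t in query_terms))
    scored = [(sum(forward[d][t] for t in query_terms), d) for d in candidates]
    # Sort by descending score, then by document id for a deterministic order.
    return [d for _, d in sorted(scored, key=lambda pair: (-pair[0], pair[1]))][:k]

def rerank_with_forward_index(forward, candidates, expanded_weights):
    """Rescore only the candidates by looking up each document's own terms."""
    scored = [(sum(w * forward[d][t] for t, w in expanded_weights.items()), d)
              for d in candidates]
    return [d for _, d in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]
```

The point of the design is that the second step touches only the candidate documents from the first pass: adding expansion terms adds per-document lookups, not additional posting-list traversals over the whole collection.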
The Query Change Model: Modeling Session Search as a Markov Decision Process (Article 20)
  Hui Yang; Dongyi Guan; Sicong Zhang
Modern information retrieval (IR) systems exhibit user dynamics through interactivity. These dynamic aspects of IR, including changes found in data, users, and systems, are increasingly being utilized in search engines. Session search is one such IR task -- document retrieval within a session. During a session, a user constantly modifies queries to find documents that fulfill an information need. Existing IR techniques for assisting the user in this task are limited in their ability to optimize over changes, learn with a minimal computational footprint, and be responsive. This article proposes a novel query change retrieval model (QCM), which uses syntactic editing changes between consecutive queries, as well as the relationship between query changes and previously retrieved documents, to enhance session search. We propose modeling session search as a Markov decision process (MDP). We consider two agents in this MDP: the user agent and the search engine agent. The user agent's actions are query changes that we observe, and the search engine agent's actions are term weight adjustments as proposed in this work. We also investigate multiple query aggregation schemes and their effectiveness on session search. Experiments show that our approach is highly effective and outperforms top session search systems in TREC 2011 and TREC 2012.
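To make the notion of syntactic editing changes between consecutive queries concrete, here is a minimal Python sketch that splits a query change into kept, added, and removed terms. Treating the change as a term-level set difference is an assumption made for illustration; the abstract does not spell out the exact decomposition, and the function name `query_change` is hypothetical.

```python
def query_change(previous_query, current_query):
    """Split the change between two consecutive queries into term groups."""
    prev = previous_query.lower().split()
    curr = current_query.lower().split()
    prev_set, curr_set = set(prev), set(curr)
    return {
        "kept": [t for t in curr if t in prev_set],        # terms carried over
        "added": [t for t in curr if t not in prev_set],   # new terms
        "removed": [t for t in prev if t not in curr_set], # dropped terms
    }

change = query_change("pocono mountains park", "pocono mountains camelbeach")
```

In a model of the kind the abstract describes, a search engine agent could then raise or lower the weights of terms in each group; the weighting rules themselves are not given in the abstract and are not sketched here.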
A Pólya Urn Document Language Model for Improved Information Retrieval (Article 21)
  Ronan Cummins; Jiaul H. Paik; Yuanhua Lv
The multinomial language model has been one of the most effective models of retrieval for more than a decade. However, the multinomial distribution does not model one important linguistic phenomenon relating to term dependency -- that is, the tendency of a term to repeat itself within a document (i.e., word burstiness). In this article, we model document generation as a random process with reinforcement (a multivariate Pólya process) and develop a Dirichlet compound multinomial language model that captures word burstiness directly.
   We show that the new reinforced language model can be computed as efficiently as current retrieval models, and with experiments on an extensive set of TREC collections, we show that it significantly outperforms the state-of-the-art language model for a number of standard effectiveness metrics. Experiments also show that the tuning parameter in the proposed model is more robust than that in the multinomial language model. Furthermore, we develop a constraint for the verbosity hypothesis and show that the proposed model adheres to the constraint. Finally, we show that the new language model essentially introduces a measure closely related to idf, which gives theoretical justification for combining the term and document event spaces in tf-idf type schemes.
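The "random process with reinforcement" mentioned in this abstract can be illustrated with a short Python simulation of a multivariate Pólya urn. This is a generic sketch of the urn process, not the article's retrieval model: the starting counts, the reinforcement amount, and the function name `polya_urn_sample` are assumptions chosen for illustration.

```python
import random

def polya_urn_sample(initial_counts, draws, rng, reinforcement=1):
    """Draw terms from an urn, returning each drawn term plus extra copies of it.

    Every draw makes the drawn term more likely to be drawn again, which
    produces the bursty repetition that a plain multinomial cannot model.
    """
    urn = dict(initial_counts)
    sampled = []
    for _ in range(draws):
        terms = list(urn)
        weights = [urn[t] for t in terms]
        term = rng.choices(terms, weights=weights, k=1)[0]
        sampled.append(term)
        urn[term] += reinforcement  # reinforce the term that was just drawn
    return sampled, urn

sampled, urn = polya_urn_sample({"a": 1, "b": 1, "c": 1}, 20, random.Random(1))
```

With `reinforcement=0` the same function reduces to ordinary multinomial sampling from fixed proportions, which makes the contrast between the two document models easy to see in a simulation.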