HCI Bibliography Home | HCI Conferences | IR Archive | Detailed Records | RefWorks | EndNote | Hide Abstracts
IR Tables of Contents: 9697989900010203040506070809101112131415

Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Fullname:Proceedings of the 29th International ACM SIGIR Conference on Research and Development in Information Retrieval
Editors:Susan Dumais; Efthimis N. Efthimiadis; David Hawking; Kalervo Jarvelin
Location:Seattle, Washington, USA
Dates:2006-Aug-06 to 2006-Aug-11
Standard No:ISBN 1-59593-369-7; ACM Order Number: 606060; ACM DL: Table of Contents hcibib: IR06
  1. Keynote
  2. User behavior and modeling
  3. Handling messages and finding experts
  4. Speech and music
  5. Web 1: exploiting graph structure
  6. Semantics
  7. Fusion and spam
  8. Relevance feedback
  9. Formal models
  10. Cross language
  11. Keynote
  12. Question and answering
  13. Machine learning
  14. Evaluation 1: user models and test collections
  15. Web 2
  16. Distributed IR
  17. Efficiency
  18. Keynote
  19. Queries
  20. Clustering
  21. The first page of results
  22. Users: clarification, feedback, and browsing
  23. Classification and machine learning
  24. Recommendation: use and abuse
  25. Evaluation 2
  26. Web IR: current topics
  27. Summarization: multidocuments and new applications
  28. Posters
  29. Demos


Quantum haystacks BIBAFull-Text 1-2
  C. J. 'Keith' van Rijsbergen
This acceptance talk is a curious mixture of personal history and developing ideas in the context of the growing field of IR covering several decades. I want to concentrate on models and theories, interpreted loosely, and try and give an insight into where I have got to in my thinking, where the ideas came from, and where I believe we are going.
   In the last few years I have been working on the development of what might be coined as a design language for IR. It takes its inspiration from Quantum Mechanics, but by analogy only. The mathematical objects represent documents; these objects might be vectors (or density operators) in an n-dimensional vector space (usually a Hilbert space). A request for information, or a query, is taken as an observable and is represented as a linear operator on the space. Linear operators can be expressed as matrices. Such an operator, Hermitian, has a set of eigenvectors forming a basis for the space; which we interpret as a point of view or perspective from which to understand the space. Thus any document-vector can be located with respect to the basis, and we can calculate an inner product between such a vector and any basis vector, which may be interpreted as a probability of relevance. The probability of observing any given eigenvector is now given by the square of that inner product assuming all vectors are normalised. Hence we connect the probability of observation to the geometry of the space. Furthermore, the subspaces of the space make up a lattice structure which is equivalent to a logic. This makes up the entire mathematical structure, and the language for handling this structure is linear algebra: vectors, matrices, projections, inner-products, neatly captured by the Dirac notation used in quantum mechanics. Our probability is slightly different from classical probability, the same for logic; we end up with quantum logic and quantum probability.
   A commitment to this kind of mathematical structure, with which to model objects and processes in IR, depends on two critical assumptions.

User behavior and modeling

Learning user interaction models for predicting web search result preferences BIBAFull-Text 3-10
  Eugene Agichtein; Eric Brill; Susan Dumais; Robert Ragno
Evaluating user preferences of web search results is crucial for search engine development, deployment, and maintenance. We present a real-world study of modeling the behavior of web search users to predict web search result preferences. Accurate modeling and interpretation of user behavior has important applications to ranking, click spam detection, web search personalization, and other tasks. Our key insight to improving robustness of interpreting implicit feedback is to model query-dependent deviations from the expected "noisy" user behavior. We show that our model of clickthrough interpretation improves prediction accuracy over state-of-the-art clickthrough methods. We generalize our approach to model user behavior beyond clickthrough, which results in higher preference prediction accuracy than models based on clickthrough information alone. We report results of a large-scale experimental evaluation that show substantial improvements over published implicit feedback interpretation methods.
User performance versus precision measures for simple search tasks BIBAFull-Text 11-18
  Andrew Turpin; Falk Scholer
Several recent studies have demonstrated that the type of improvements in information retrieval system effectiveness reported in forums such as SIGIR and TREC do not translate into a benefit for users. Two of the studies used an instance recall task, and a third used a question answering task, so perhaps it is unsurprising that the precision based measures of IR system effectiveness on one-shot query evaluation do not correlate with user performance on these tasks. In this study, we evaluate two different information retrieval tasks on TREC Web-track data: a precision-based user task, measured by the length of time that users need to find a single document that is relevant to a TREC topic; and, a simple recall-based task, represented by the total number of relevant documents that users can identify within five minutes. Users employ search engines with controlled mean average precision (MAP) of between 55% and 95%. Our results show that there is no significant relationship between system effectiveness measured by MAP and the precision-based task. A significant, but weak relationship is present for the precision at one document returned metric. A weak relationship is present between MAP and the simple recall-based task.
Improving web search ranking by incorporating user behavior information BIBAFull-Text 19-26
  Eugene Agichtein; Eric Brill; Susan Dumais
We show that incorporating user behavior data can significantly improve ordering of top results in real web search setting. We examine alternatives for incorporating feedback into the ranking process and explore the contributions of user feedback compared to other common web search features. We report results of a large scale evaluation over 3,000 queries and 12 million user interactions with a popular web search engine. We show that incorporating implicit feedback can augment other features, improving the accuracy of a competitive web search ranking algorithms by as much as 31% relative to the original performance.

Handling messages and finding experts

Contextual search and name disambiguation in email using graphs BIBAFull-Text 27-34
  Einat Minkov; William W. Cohen; Andrew Y. Ng
Similarity measures for text have historically been an important tool for solving information retrieval problems. In many interesting settings, however, documents are often closely connected to other documents, as well as other non-textual objects: for instance, email messages are connected to other messages via header information. In this paper we consider extended similarity metrics for documents and other objects embedded in graphs, facilitated via a lazy graph walk. We provide a detailed instantiation of this framework for email data, where content, social networks and a timeline are integrated in a structural graph. The suggested framework is evaluated for two email-related problems: disambiguating names in email documents, and threading. We show that reranking schemes based on the graph-walk similarity measures often outperform baseline methods, and that further improvements can be obtained by use of appropriate learning methods.
Thread detection in dynamic text message streams BIBAFull-Text 35-42
  Dou Shen; Qiang Yang; Jian-Tao Sun; Zheng Chen
Text message stream is a newly emerging type of Web data which is produced in enormous quantities with the popularity of Instant Messaging and Internet Relay Chat. It is beneficial for detecting the threads contained in the text stream for various applications, including information retrieval, expert recognition and even crime prevention. Despite its importance, not much research has been conducted so far on this problem due to the characteristics of the data in which the messages are usually very short and incomplete. In this paper, we present a stringent definition of the thread detection task and our preliminary solution to it. We propose three variations of a single-pass clustering algorithm for exploiting the temporal information in the streams. An algorithm based on linguistic features is also put forward to exploit the discourse structure information. We conducted several experiments to compare our approaches with some existing algorithms on a real dataset. The results show that all three variations of the single-pass algorithm outperform the basic single-pass algorithm. Our proposed algorithm based on linguistic features improves the performance relatively by 69.5% and 9.7% when compared with the basic single-pass algorithm and the best variation algorithm in terms of F1 respectively.
Formal models for expert finding in enterprise corpora BIBAFull-Text 43-50
  Krisztian Balog; Leif Azzopardi; Maarten de Rijke
Searching an organization's document repositories for experts provides a cost effective solution for the task of expert finding. We present two general strategies to expert searching given a document collection which are formalized using generative probabilistic models. The first of these directly models an expert's knowledge based on the documents that they are associated with, whilst the second locates documents on topic, and then finds the associated expert. Forming reliable associations is crucial to the performance of expert finding systems. Consequently, in our evaluation we compare the different approaches, exploring a variety of associations along with other operational parameters (such as topicality). Using the TREC Enterprise corpora, we show that the second strategy consistently outperforms the first. A comparison against other unsupervised techniques, reveals that our second model delivers excellent performance.

Speech and music

Spoken document retrieval from call-center conversations BIBAFull-Text 51-58
  Jonathan Mamou; David Carmel; Ron Hoory
We are interested in retrieving information from conversational speech corpora, such as call-center data. This data comprises spontaneous speech conversations with low recording quality, which makes automatic speech recognition (ASR) a highly difficult task. For typical call-center data, even state-of-the-art large vocabulary continuous speech recognition systems produce a transcript with word error rate of 30% or higher. In addition to the output transcript, advanced systems provide word confusion networks (WCNs), a compact representation of word lattices associating each word hypothesis with its posterior probability. Our work exploits the information provided by WCNs in order to improve retrieval performance. In this paper, we show that the mean average precision (MAP) is improved using WCNs compared to the raw word transcripts. Finally, we analyze the effect of increasing ASR word error rate on search effectiveness. We show that MAP is still reasonable even under extremely high error rate.
Towards efficient automated singer identification in large music databases BIBAFull-Text 59-66
  Jialie Shen; Bin Cui; John Shepherd; Kian-Lee Tan
Automated singer identification is important in organising, browsing and retrieving data in large music databases. In this paper, we propose a novel scheme, called Hybrid Singer Identifier (HSI), for automated singer recognition. HSI can effectively use multiple low-level features extracted from both vocal and non-vocal music segments to enhance the identification process with a hybrid architecture and build profiles of individual singer characteristics based on statistical mixture models. Extensive experimental results conducted on a large music database demonstrate the superiority of our method over state-of-the-art approaches.
Music structure based vector space retrieval BIBAFull-Text 67-74
  Namunu C. Maddage; Haizhou Li; Mohan S. Kankanhalli
This paper proposes a novel framework for music content indexing and retrieval. The music structure information, i.e., timing, harmony and music region content, is represented by the layers of the music structure pyramid. We begin by extracting this layered structure information. We analyze the rhythm of the music and then segment the signal proportional to the inter-beat intervals. Thus, the timing information is incorporated in the segmentation process, which we call Beat Space Segmentation. To describe Harmony Events, we propose a two-layer hierarchical approach to model the music chords. We also model the progression of instrumental and vocal content as Acoustic Events. After information extraction, we propose a vector space modeling approach which uses these events as the indexing terms. In query-by-example music retrieval, a query is represented by a vector of the statistics of the n-gram events. We then propose two effective retrieval models, a hard-indexing scheme and a soft-indexing scheme. Experiments show that the vector space modeling is effective in representing the layered music information, achieving 82.5% top-5 retrieval accuracy using 15-sec music clips as the queries. The soft-indexing outperforms hard-indexing in general.

Web 1: exploiting graph structure

AggregateRank: bringing order to web sites BIBAFull-Text 75-82
  Guang Feng; Tie-Yan Liu; Ying Wang; Ying Bao; Zhiming Ma; Xu-Dong Zhang; Wei-Ying Ma
Since the website is one of the most important organizational structures of the Web, how to effectively rank websites has been essential to many Web applications, such as Web search and crawling. In order to get the ranks of websites, researchers used to describe the inter-connectivity among websites with a so-called HostGraph in which the nodes denote websites and the edges denote linkages between websites (if and only if there are hyperlinks from the pages in one website to the pages in the other, there will be an edge between these two websites), and then adopted the random walk model in the HostGraph. However, as pointed in this paper, the random walk over such a HostGraph is not reasonable because it is not in accordance with the browsing behavior of web surfers. Therefore, the derivate rank cannot represent the true probability of visiting the corresponding website.
   In this work, we mathematically proved that the probability of visiting a website by the random web surfer should be equal to the sum of the PageRank values of the pages inside that website. Nevertheless, since the number of web pages is much larger than that of websites, it is not feasible to base the calculation of the ranks of websites on the calculation of PageRank. To tackle this problem, we proposed a novel method named AggregateRank rooted in the theory of stochastic complement, which cannot only approximate the sum of PageRank accurately, but also have a lower computational complexity than PageRank. Both theoretical analysis and experimental evaluation show that AggregateRank is a better method for ranking websites than previous methods.
Respect my authority!: HITS without hyperlinks, utilizing cluster-based language models BIBAFull-Text 83-90
  Oren Kurland; Lillian Lee
We present an approach to improving the precision of an initial document ranking wherein we utilize cluster information within a graph-based framework. The main idea is to perform reranking based on centrality within bipartite graphs of documents (on one side) and clusters (on the other side), on the premise that these are mutually reinforcing entities. Links between entities are created via consideration of language models induced from them.
   We find that our cluster-document graphs give rise to much better retrieval performance than previously proposed document-only graphs do. For example, authority-based reranking of documents via a HITS-style cluster-based approach outperforms a previously-proposed PageRank-inspired algorithm applied to solely-document graphs. Moreover, we also show that computing authority scores for clusters constitutes an effective method for identifying clusters containing a large percentage of relevant documents.
Topical link analysis for web search BIBAFull-Text 91-98
  Lan Nie; Brian D. Davison; Xiaoguang Qi
Traditional web link-based ranking schemes use a single score to measure a page's authority without concern of the community from which that authority is derived. As a result, a resource that is highly popular for one topic may dominate the results of another topic in which it is less authoritative. To address this problem, we suggest calculating a score vector for each page to distinguish the contribution from different topics, using a random walk model that probabilistically combines page topic distribution and link structure. We show how to incorporate the topical model within both PageRank and HITS without affecting the overall property and still render insight into topic-level transition. Experiments on multiple datasets indicate that our technique outperforms other ranking approaches that incorporate textual analysis.


The role of knowledge in conceptual retrieval: a study in the domain of clinical medicine BIBAFull-Text 99-106
  Jimmy Lin; Dina Demner-Fushman
Despite its intuitive appeal, the hypothesis that retrieval at the level of "concepts" should outperform purely term-based approaches remains unverified empirically. In addition, the use of "knowledge" has not consistently resulted in performance gains. After identifying possible reasons for previous negative results, we present a novel framework for "conceptual retrieval" that articulates the types of knowledge that are important for information seeking. We instantiate this general framework in the domain of clinical medicine based on the principles of evidence-based medicine (EBM). Experiments show that an EBM-based scoring algorithm dramatically outperforms a state-of-the-art baseline that employs only term statistics. Ablation studies further yield a better understanding of the performance contributions of different components. Finally, we discuss how other domains can benefit from knowledge-based approaches.
A parallel derivation of probabilistic information retrieval models BIBAFull-Text 107-114
  Thomas Roelleke; Jun Wang
This paper investigates in a stringent mathematical formalism the parallel derivation of three grand probabilistic retrieval models: binary independent retrieval (BIR), Poisson model (PM), and language modelling (LM). The investigation has been motivated by a number of questions. Firstly, though sharing the same origin, namely the probability of relevance, the models differ with respect to event spaces. How can this be captured in a consistent notation, and can we relate the event spaces? Secondly, BIR and PM are closely related, but how does LM fit in? Thirdly, how are tf-idf and probabilistic models related? The parallel investigation of the models leads to a number of formalised results:
  • 1. BIR and PM assume the collection to be a set of non-relevant documents,
        whereas LM assumes the collection to be a set of terms from relevant
  • 2. PM can be viewed as a bridge connecting BIR and LM.
  • 3. A BIR-LM equivalence explains BIR as a special LM case.
  • 4. PM explains tf-idf, and both, BIR and LM probabilities express tf-idf in a
        dual way.
  • Semantic term matching in axiomatic approaches to information retrieval BIBAFull-Text 115-122
      Hui Fang; ChengXiang Zhai
    A common limitation of many retrieval models, including the recently proposed axiomatic approaches, is that retrieval scores are solely based on exact (i.e., syntactic) matching of terms in the queries and documents, without allowing distinct but semantically related terms to match each other and contribute to the retrieval score. In this paper, we show that semantic term matching can be naturally incorporated into the axiomatic retrieval model through defining the primitive weighting function based on a semantic similarity function of terms. We define several desirable retrieval constraints for semantic term matching and use such constraints to extend the axiomatic model to directly support semantic term matching based on the mutual information of terms computed on some document set. We show that such extension can be efficiently implemented as query expansion. Experiment results on several representative data sets show that, with mutual information computed over the documents in either the target collection for retrieval or an external collection such as the Web, our semantic expansion consistently and substantially improves retrieval accuracy over the baseline axiomatic retrieval model. As a pseudo feedback method, our method also outperforms a state-of-the-art language modeling feedback method.

    Fusion and spam

    On-line spam filter fusion BIBAFull-Text 123-130
      Thomas R. Lynam; Gordon V. Cormack; David R. Cheriton
    We show that a set of independently developed spam filters may be combined in simple ways to provide substantially better filtering than any of the individual filters. The results of fifty-three spam filters evaluated at the TREC 2005 Spam Track were combined post-hoc so as to simulate the parallel on-line operation of the filters. The combined results were evaluated using the TREC methodology, yielding more than a factor of two improvement over the best filter. The simplest method -- averaging the binary classifications returned by the individual filters -- yields a remarkably good result. A new method -- averaging log-odds estimates based on the scores returned by the individual filters -- yields a somewhat better result, and provides input to SVM- and logistic-regression-based stacking methods. The stacking methods appear to provide further improvement, but only for very large corpora. Of the stacking methods, logistic regression yields the better result. Finally, we show that it is possible to select a priori small subsets of the filters that, when combined, still outperform the best individual filter by a substantial margin.
    Building bridges for web query classification BIBAFull-Text 131-138
      Dou Shen; Jian-Tao Sun; Qiang Yang; Zheng Chen
    Web query classification (QC) aims to classify Web users' queries, which are often short and ambiguous, into a set of target categories. QC has many applications including page ranking in Web search, targeted advertisement in response to queries, and personalization. In this paper, we present a novel approach for QC that outperforms the winning solution of the ACM KDDCUP 2005 competition, whose objective is to classify 800,000 real user queries. In our approach, we first build a bridging classifier on an intermediate taxonomy in an offline mode. This classifier is then used in an online mode to map user queries to the target categories via the above intermediate taxonomy. A major innovation is that by leveraging the similarity distribution over the intermediate taxonomy, we do not need to retrain a new classifier for each new set of target categories, and therefore the bridging classifier needs to be trained only once. In addition, we introduce category selection as a new method for narrowing down the scope of the intermediate taxonomy based on which we classify the queries. Category selection can improve both efficiency and effectiveness of the online classification. By combining our algorithm with the winning solution of KDDCUP 2005, we made an improvement by 9.7% and 3.8% in terms of precision and F1 respectively compared with the best results of KDDCUP 2005.
    ProbFuse: a probabilistic approach to data fusion BIBAFull-Text 139-146
      David Lillis; Fergus Toolan; Rem Collier; John Dunnion
    Data fusion is the combination of the results of independent searches on a document collection into one single output result set. It has been shown in the past that this can greatly improve retrieval effectiveness over that of the individual results.
       This paper presents probFuse, a probabilistic approach to data fusion. ProbFuse assumes that the performance of the individual input systems on a number of training queries is indicative of their future performance. The fused result set is based on probabilities of relevance calculated during this training process. Retrieval experiments using data from the TREC ad hoc collection demonstrate that probFuse achieves results superior to that of the popular CombMNZ fusion algorithm.

    Relevance feedback

    Using web-graph distance for relevance feedback in web search BIBAFull-Text 147-153
      Sergei Vassilvitskii; Eric Brill
    We study the effect of user supplied relevance feedback in improving web search results. Rather than using query refinement or document similarity measures to rerank results, we show that the web-graph distance between two documents is a robust measure of their relative relevancy. We demonstrate how the use of this metric can improve the rankings of result URLs, even when the user only rates one document in the dataset. Our research suggests that such interactive systems can significantly improve search results.
    Improving the estimation of relevance models using large external corpora BIBAFull-Text 154-161
      Fernando Diaz; Donald Metzler
    Information retrieval algorithms leverage various collection statistics to improve performance. Because these statistics are often computed on a relatively small evaluation corpus, we believe using larger, non-evaluation corpora should improve performance. Specifically, we advocate incorporating external corpora based on language modeling. We refer to this process as external expansion. When compared to traditional pseudo-relevance feedback techniques, external expansion is more stable across topics and up to 10% more effective in terms of mean average precision. Our results show that using a high quality corpus that is comparable to the evaluation corpus can be as, if not more, effective than using the web. Our results also show that external expansion outperforms simulated relevance feedback. In addition, we propose a method for predicting the extent to which external expansion will improve retrieval performance. Our new measure demonstrates positive correlation with improvements in mean average precision.
    Regularized estimation of mixture models for robust pseudo-relevance feedback BIBAFull-Text 162-169
      Tao Tao; ChengXiang Zhai
    Pseudo-relevance feedback has proven to be an effective strategy for improving retrieval accuracy in all retrieval models. However the performance of existing pseudo feedback methods is often affected significantly by some parameters, such as the number of feedback documents to use and the relative weight of original query terms; these parameters generally have to be set by trial-and-error without any guidance. In this paper, we present a more robust method for pseudo feedback based on statistical language models. Our main idea is to integrate the original query with feedback documents in a single probabilistic mixture model and regularize the estimation of the language model parameters in the model so that the information in the feedback documents can be gradually added to the original query. Unlike most existing feedback methods, our new method has no parameter to tune. Experiment results on two representative data sets show that the new method is significantly more robust than a state-of-the-art baseline language modeling approach for feedback with comparable or better retrieval accuracy.

    Formal models

    Context-sensitive semantic smoothing for the language modeling approach to genomic IR BIBAFull-Text 170-177
      Xiaohua Zhou; Xiaohua Hu; Xiaodan Zhang; Xia Lin; Il-Yeol Song
    Semantic smoothing, which incorporates synonym and sense information into the language models, is effective and potentially significant to improve retrieval performance. The implemented semantic smoothing models, such as the translation model which statistically maps document terms to query terms, and a number of works that have followed have shown good experimental results. However, these models are unable to incorporate contextual information. Thus, the resulting translation might be mixed and fairly general. To overcome this limitation, we propose a novel context-sensitive semantic smoothing method that decomposes a document or a query into a set of weighted context-sensitive topic signatures and then translate those topic signatures into query terms. In detail, we solve this problem through (1) choosing concept pairs as topic signatures and adopting an ontology-based approach to extract concept pairs; (2) estimating the translation model for each topic signature using the EM algorithm; and (3) expanding document and query models based on topic signature translations. The new smoothing method is evaluated on TREC 2004/05 Genomics Track collections and significant improvements are obtained. The MAP (mean average precision) achieves a 33.6% maximal gain over the simple language model, as well as a 7.8% gain over the language model with context-insensitive semantic smoothing.
    LDA-based document models for ad-hoc retrieval BIBAFull-Text 178-185
      Xing Wei; W. Bruce Croft
    Search algorithms incorporating some form of topic model have a long history in information retrieval. For example, cluster-based retrieval has been studied since the 60s and has recently produced good results in the language model framework. An approach to building topic models based on a formal generative model of documents, Latent Dirichlet Allocation (LDA), is heavily cited in the machine learning literature, but its feasibility and effectiveness in information retrieval is mostly unknown. In this paper, we study how to efficiently use LDA to improve ad-hoc retrieval. We propose an LDA-based document model within the language modeling framework, and evaluate it on several TREC collections. Gibbs sampling is employed to conduct approximate inference in LDA and the computational complexity is analyzed. We show that improvements over retrieval using cluster-based models can be obtained with reasonable efficiency.
    Adapting ranking SVM to document retrieval BIBAFull-Text 186-193
      Yunbo Cao; Jun Xu; Tie-Yan Liu; Hang Li; Yalou Huang; Hsiao-Wuen Hon
    The paper is concerned with applying learning to rank to document retrieval. Ranking SVM is a typical method of learning to rank. We point out that there are two factors one must consider when applying Ranking SVM, in general a "learning to rank" method, to document retrieval. First, correctly ranking documents on the top of the result list is crucial for an Information Retrieval system. One must conduct training in a way that such ranked results are accurate. Second, the number of relevant documents can vary from query to query. One must avoid training a model biased toward queries with a large number of relevant documents. Previously, when existing methods that include Ranking SVM were applied to document retrieval, none of the two factors was taken into consideration. We show it is possible to make modifications in conventional Ranking SVM, so it can be better used for document retrieval. Specifically, we modify the "Hinge Loss" function in Ranking SVM to deal with the problems described above. We employ two methods to conduct optimization on the loss function: gradient descent and quadratic programming. Experimental results show that our method, referred to as Ranking SVM for IR, can outperform the conventional Ranking SVM and other existing methods for document retrieval on two datasets.

    Cross language

    A study of statistical models for query translation: finding a good unit of translation BIBAFull-Text 194-201
      Jianfeng Gao; Jian-Yun Nie
    This paper presents a study of three statistical query translation models that use different units of translation. We begin with a review of a word-based translation model that uses co-occurrence statistics for resolving translation ambiguities. The translation selection problem is then formulated under the framework of graphic model resorting to which the modeling assumptions and limitations of the co-occurrence model are discussed, and the research of finding better translation units is motivated. Then, two other models that use larger, linguistically motivated translation units (i.e., noun phrase and dependency triple) are presented. For each model, the modeling and training methods are described in detail. All query translation models are evaluated using TREC collections. Results show that larger translation units lead to more specific models that usually achieve better translation and cross-language information retrieval results.
    Combining bidirectional translation and synonymy for cross-language information retrieval BIBAFull-Text 202-209
      Jianqiang Wang; Douglas W. Oard
    This paper introduces a general framework for the use of translation probabilities in cross-language information retrieval based on the notion that information retrieval fundamentally requires matching what the searcher means with what the author of a document meant. That perspective yields a computational formulation that provides a natural way of combining what have been known as query and document translation. Two well-recognized techniques are shown to be a special case of this model under restrictive assumptions. Cross-language search results are reported that are statistically indistinguishable from strong monolingual baselines for both French and Chinese documents.


    Social networks, incentives, and search BIBAFull-Text 210-211
      Jon Kleinberg
    The role of network structure has grown in significance over the past ten years in the field of information retrieval, stimulated to a great extent by the importance of link analysis in the development of Web search techniques [4]. This body of work has focused primarily on the network that is most clearly visible on the Web: the network of hyperlinks connecting documents to documents. But the Web has always contained a second network, less explicit but equally important, and this is the social network on its users, with latent person-to-person links encoding a variety of relationships including friendship, information exchange, and influence. Developments over the past few years -- including the emergence of social networking systems and rich social media, as well as the availability of large-scale e-mail and instant messenging datasets -- have highlighted the crucial role played by on-line social networks, and at the same time have made them much easier to uncover and analyze. There is now a considerable opportunity to exploit the information content inherent in these networks, and this prospect raises a number of interesting research challenge.
       Within this context, we focus on some recent efforts to formalize the problem of searching a social network. The goal is to capture the issues underlying a variety of related scenarios: a member of a social networking system such as MySpace seeks a piece of information that may be held by a friend of a friend [27, 28]; an employee in a large company searches his or her network of colleagues for expertise in a particular subject [9]; a node in a decentralized peer-to-peer file-sharing system queries for a file that is likely to be a small number of hops away [2, 6, 16, 17]; or a user in a distributed IR or federated search setting traverses a network of distributed resources connected by links that may not just be informational but also economic or contractual [3, 5, 7, 8, 13, 18, 21]. In their most basic forms, these scenarios have some essential features in common: a node in a network, without global knowledge, must find a short path to a desired "target" node (or to one of several possible target nodes).
       To frame the underlying problem, we go back to one of the most well-known pieces of empirical social network analysis -- Stanley Milgram's research into the small-world phenomenon, also known as the "six degrees of separation" [19, 24, 25]. The form of Milgram's experiments, in which randomly chosen starters had to forward a letter to a designated target individual, established not just that short chains connecting far-flung pairs of people are abundant in large social networks, but also that the individuals in these networks, operating with purely local information about their own friends and acquaintances, are able to actually find these chains [10]. The Milgram experiments thus constituted perhaps the earliest indication that large-scale social networks are structured to support this type of decentralized search. Within a family of random-graph models proposed by Watts and Strogatz [26], we have shown that the ability of a network to support this type of decentralized search depends in subtle ways on how its "long-range" connections are correlated with the underlying spatial or organizational structure in which it is embedded [10, 11]. Recent studies using data on communication within organizations [1] and the friendships within large on-line communities [15] have established the striking fact that real social networks closely match some of the structural features predicted by these mathematical models.
       If one looks further at the on-line settings that provide the initial motivation for these issues, there is clearly interest from many directions in their long-term economic implications -- essentially, the consequences that follow from viewing distributed information retrieval applications, peer-to-peer systems, or social-networking sites as providing marketplaces for information and services. How does the problem of decentralized search in a network change when the participants are not simply agents following a fixed algorithm, but strategic actors who make decisions in their own self-interest, and may demand compensation for taking part in a protocol? Such considerations bring us into the realm of algorithmic game theory, an active area of current research that uses game-theoretic notions to quantify the performance of systems in which the participants follow their own self-interest [20, 23] In a simple model for decentralized search in the presence of incentives, we find that performance depends crucially on both the rarity of the information and the richness of the network topology [12] -- if the network is too structurally impoverished, an enormous investment may be required to produce a path from a query to an answer.

    Question and answering

    Probabilistic model for definitional question answering BIBAFull-Text 212-219
      Kyoung-Soo Han; Young-In Song; Hae-Chang Rim
    This paper proposes a probabilistic model for definitional question answering (QA) that reflects the characteristics of the definitional question. The intention of the definitional question is to request the definition about the question target. Therefore, an answer for the definitional question should contain the content relevant to the topic of the target, and have a representation form of the definition style. Modeling the problem of definitional QA from both the topic and definition viewpoints, the proposed probabilistic model converts the task of answering the definitional questions into that of estimating the three language models: topic language model, definition language model, and general language model. The proposed model systematically combines several evidences in a probabilistic framework. Experimental results show that a definitional QA system based on the proposed probabilistic model is comparable to state-of-the-art systems.
    Answering complex questions with random walk models BIBAFull-Text 220-227
      Sanda Harabagiu; Finley Lacatusu; Andrew Hickl
    We present a novel framework for answering complex questions that relies on question decomposition. Complex questions are decomposed by a procedure that operates on a Markov chain, by following a random walk on a bipartite graph of relations established between concepts related to the topic of a complex question and subquestions derived from topic-relevant passages that manifest these relations. Decomposed questions discovered during this random walk are then submitted to a state-of-the-art Question Answering (Q/A) system in order to retrieve a set of passages that can later be merged into a comprehensive answer by a Multi-Document Summarization (MDS) system. In our evaluations, we show that access to the decompositions generated using this method can significantly enhance the relevance and comprehensiveness of summary-length answers to complex questions.
    A framework to predict the quality of answers with non-textual features BIBAFull-Text 228-235
      Jiwoon Jeon; W. Bruce Croft; Joon Ho Lee; Soyeon Park
    New types of document collections are being developed by various web services. The service providers keep track of non-textual features such as click counts. In this paper, we present a framework to use non-textual features to predict the quality of documents. We also show our quality measure can be successfully incorporated into the language modeling-based retrieval model. We test our approach on a collection of question and answer pairs gathered from a community based question answering service where people ask and answer questions. Experimental results using our quality measure show a significant improvement over our baseline.

    Machine learning

    Latent semantic analysis for multiple-type interrelated data objects BIBAFull-Text 236-243
      Xuanhui Wang; Jian-Tao Sun; Zheng Chen; ChengXiang Zhai
    Co-occurrence data is quite common in many real applications. Latent Semantic Analysis (LSA) has been successfully used to identify semantic relations in such data. However, LSA can only handle a single co-occurrence relationship between two types of objects. In practical applications, there are many cases where multiple types of objects exist and any pair of these objects could have a pairwise co-occurrence relation. All these co-occurrence relations can be exploited to alleviate data sparseness or to represent objects more meaningfully. In this paper, we propose a novel algorithm, M-LSA, which conducts latent semantic analysis by incorporating all pairwise co-occurrences among multiple types of objects. Based on the mutual reinforcement principle, M-LSA identifies the most salient concepts among the co-occurrence data and represents all the objects in a unified semantic space. M-LSA is general and we show that several variants of LSA are special cases of our algorithm. Experiment results show that M-LSA outperforms LSA on multiple applications, including collaborative filtering, text clustering, and text categorization.
    Identifying comparative sentences in text documents BIBAFull-Text 244-251
      Nitin Jindal; Bing Liu
    This paper studies the problem of identifying comparative sentences in text documents. The problem is related to but quite different from sentiment/opinion sentence identification or classification. Sentiment classification studies the problem of classifying a document or a sentence based on the subjective opinion of the author. An important application area of sentiment/opinion identification is business intelligence as a product manufacturer always wants to know consumers' opinions on its products. Comparisons on the other hand can be subjective or objective. Furthermore, a comparison is not concerned with an object in isolation. Instead, it compares the object with others. An example opinion sentence is "the sound quality of CD player X is poor". An example comparative sentence is "the sound quality of CD player X is not as good as that of CD player Y". Clearly, these two sentences give different information. Their language constructs are quite different too. Identifying comparative sentences is also useful in practice because direct comparisons are perhaps one of the most convincing ways of evaluation, which may even be more important than opinions on each individual object. This paper proposes to study the comparative sentence identification problem. It first categorizes comparative sentences into different types, and then presents a novel integrated pattern discovery and supervised learning approach to identifying comparative sentences from text documents. Experiment results using three types of documents, news articles, consumer reviews of products, and Internet forum postings, show a precision of 79% and recall of 81%. More detailed results are given in the paper.
    Tackling concept drift by temporal inductive transfer BIBAFull-Text 252-259
      George Forman
    Machine learning is the mainstay for text classification. However, even the most successful techniques are defeated by many real-world applications that have a strong time-varying component. To advance research on this challenging but important problem, we promote a natural, experimental framework-the Daily Classification Task-which can be applied to large time-based datasets, such as Reuters RCV1. In this paper we dissect concept drift into three main subtypes. We demonstrate via a novel visualization that the recurrent themes subtype is present in RCV1. This understanding led us to develop a new learning model that transfers induced knowledge through time to benefit future classifier learning tasks. The method avoids two main problems with existing work in inductive transfer: scalability and the risk of negative transfer. In empirical tests, it consistently showed more than 10 points F-measure improvement for each of four Reuters categories tested.

    Evaluation 1: user models and test collections

    Evaluation in (XML) information retrieval: expected precision-recall with user modelling (EPRUM) BIBAFull-Text 260-267
      Benjamin Piwowarski; Georges Dupret
    Standard Information Retrieval (IR) metrics assume a simple model where documents are understood as independent units. Such an assumption is not adapted to new paradigms like XML or Web IR where retrievable informations are parts of documents or sets of related documents. Moreover, classical hypotheses assumes that the user ignores the structural or logical context of document elements and hence the possibility of navigation between units. EPRUM is a generalisation of Precision-Recall (PR) that aims at allowing the user to navigate or browse in the corpus structure. Like the Cumulated Gain metrics, it is able to handle continuous valued relevance. We apply and compare EPRUM in the context of XML Retrieval -- a very active field for evaluation metrics. We also explain how EPRUM can be used in other IR paradigms.
    Minimal test collections for retrieval evaluation BIBAFull-Text 268-275
      Ben Carterette; James Allan; Ramesh Sitaraman
    Accurate estimation of information retrieval evaluation metrics such as average precision require large sets of relevance judgments. Building sets large enough for evaluation of real-world implementations is at best inefficient, at worst infeasible. In this work we link evaluation with test collection construction to gain an understanding of the minimal judging effort that must be done to have high confidence in the outcome of an evaluation. A new way of looking at average precision leads to a natural algorithm for selecting documents to judge and allows us to estimate the degree of confidence by defining a distribution over possible document judgments. A study with annotators shows that this method can be used by a small group of researchers to rank a set of systems in under three hours with 95% confidence. Information retrieval metrics such as average precision require large sets of relevance judgments to be accurately estimated. Building these sets is infeasible and often inefficient for many real-world retrieval implementations. We present a new way of looking at average precision that allows us to estimate the confidence in an evaluation based on the size of the test collection. We use this to build an algorithm for selecting the best documents to judge to have maximum confidence in an evaluation with a minimal number of relevance judgments. A study with annotators shows how the algorithm can be used by a small group of researchers to quickly rank a set of systems with 95% confidence.
    Dynamic test collections: measuring search effectiveness on the live web BIBAFull-Text 276-283
      Ian Soboroff
    Existing methods for measuring the quality of search algorithms use a static collection of documents. A set of queries and a mapping from the queries to the relevant documents allow the experimenter to see how well different search engines or engine configurations retrieve the correct answers. This methodology assumes that the document set and thus the set of relevant documents are unchanging. In this paper, we abandon the static collection requirement. We begin with a recent TREC collection created from a web crawl and analyze how the documents in that collection have changed over time. We determine how decay of the document collection affects TREC systems, and present the results of an experiment using the decayed collection to measure a live web search system. We employ novel measures of search effectiveness that are robust despite incomplete relevance information. Lastly, we propose a methodology of "collection maintenance" which supports measuring search performance both for a single system and between systems run at different points in time.

    Web 2

    Finding near-duplicate web pages: a large-scale evaluation of algorithms BIBAFull-Text 284-291
      Monika Henzinger
    Broder et al.'s [3] shingling algorithm and Charikar's [4] random projection based approach are considered "state-of-the-art" algorithms for finding near-duplicate web pages. Both algorithms were either developed at or used by popular web search engines. We compare the two algorithms on a very large scale, namely on a set of 1.6B distinct web pages. The results show that neither of the algorithms works well for finding near-duplicate pairs on the same site, while both achieve high precision for near-duplicate pairs on different sites. Since Charikar's algorithm finds more near-duplicate pairs on different sites, it achieves a better precision overall, namely 0.50 versus 0.38 for Broder et al.'s algorithm. We present a combined algorithm which achieves precision 0.79 with 79% of the recall of the other algorithms.
    Structure-driven crawler generation by example BIBAFull-Text 292-299
      Marcio L. A. Vidal; Altigran S. da Silva; Edleno S. de Moura; Joao M. B. Cavalcanti
    Many Web IR and Digital Library applications require a crawling process to collect pages with the ultimate goal of taking advantage of useful information available on Web sites. For some of these applications the criteria to determine when a page is to be present in a collection are related to the page content. However, there are situations in which the inner structure of the pages provides a better criteria to guide the crawling process than their content. In this paper, we present a structure-driven approach for generating Web crawlers that requires a minimum effort from users. The idea is to take as input a sample page and an entry point to a Web site and generate a structure-driven crawler based on navigation patterns, sequences of patterns for the links a crawler has to follow to reach the pages structurally similar to the sample page. In the experiments we have carried out, structure-driven crawlers generated by our new approach were able to collect all pages that match the samples given, including those pages added after their generation.
    Building implicit links from content for forum search BIBAFull-Text 300-307
      Gu Xu; Wei-Ying Ma
    The objective of Web forums is to create a shared space for open communications and discussions of specific topics and issues. The tremendous information behind forum sites is not fully-utilized yet. Most links between forum pages are automatically created, which means the link-based ranking algorithm cannot be applied efficiently. In this paper, we proposed a novel ranking algorithm which tries to introduce the content information into link-based methods as implicit links. The basic idea is derived from the more focused random surfer: the surfer may more likely jump to a page which is similar to what he is reading currently. In this manner, we are allowed to introduce the content similarities into the link graph as a personalization bias. Our method, named Fine-grained Rank (FGRank), can be efficiently computed based on an automatically generated topic hierarchy. Not like the topic-sensitive PageRank, our method only need to compute single PageRank score for each page. Another contribution of this paper is to present a very efficient algorithm for automatically generating topic hierarchy and map each page in a large-scale collection onto the computed hierarchy. The experimental results show that the proposed method can improve retrieval performance, and reveal that content-based link graph is also important compared with the hyper-link graph.
    Generalizing PageRank: damping functions for link-based ranking algorithms BIBAFull-Text 308-315
      Ricardo Baeza-Yates; Paolo Boldi; Carlos Castillo
    This paper introduces a family of link-based ranking algorithms that propagate page importance through links. In these algorithms there is a damping function that decreases with distance, so a direct link implies more endorsement than a link through a long path. PageRank is the most widely known ranking function of this family.
       The main objective of this paper is to determine whether this family of ranking techniques has some interest per se, and how different choices for the damping function impact on rank quality and on convergence speed. Even though our results suggest that PageRank can be approximated with other simpler forms of rankings that may be computed more efficiently, our focus is of more speculative nature, in that it aims at separating the kernel of PageRank, that is, link-based importance propagation, from the way propagation decays over paths.
       We focus on three damping functions, having linear, exponential, and hyperbolic decay on the lengths of the paths. The exponential decay corresponds to PageRank, and the other functions are new. Our presentation includes algorithms, analysis, comparisons and experiments that study their behavior under different parameters in real Web graph data.
       Among other results, we show how to calculate a linear approximation that induces a page ordering that is almost identical to PageRank's using a fixed small number of iterations; comparisons were performed using Kendall's t on large domain datasets.

    Distributed IR

    Capturing collection size for distributed non-cooperative retrieval BIBAFull-Text 316-323
      Milad Shokouhi; Justin Zobel; Falk Scholer; S. M. M. Tahaghoghi
    Modern distributed information retrieval techniques require accurate knowledge of collection size. In non-cooperative environments, where detailed collection statistics are not available, the size of the underlying collections must be estimated. While several approaches for the estimation of collection size have been proposed, their accuracy has not been thoroughly evaluated. An empirical analysis of past estimation approaches across a variety of collections demonstrates that their prediction accuracy is low. Motivated by ecological techniques for the estimation of animal populations, we propose two new approaches for the estimation of collection size. We show that our approaches are significantly more accurate that previous methods, and are more efficient in use of resources required to perform the estimation.
    Probabilistic latent query analysis for combining multiple retrieval sources BIBAFull-Text 324-331
      Rong Yan; Alexander G. Hauptmann
    Combining the output from multiple retrieval sources over the same document collection is of great importance to a number of retrieval tasks such as multimedia retrieval, web retrieval and meta-search. To merge retrieval sources adaptively according to query topics, we propose a series of new approaches called probabilistic latent query analysis (pLQA), which can associate non-identical combination weights with latent classes underlying the query space. Compared with previous query independent and query-class based combination methods, the proposed approaches have the advantage of being able to discover latent query classes automatically without using prior human knowledge, to assign one query to a mixture of query classes, and to determine the number of query classes under a model selection principle. Experimental results on two retrieval tasks, i.e., multimedia retrieval and meta-search, demonstrate that the proposed methods can uncover sensible latent classes from training data, and can achieve considerable performance gains.
    User modeling for full-text federated search in peer-to-peer networks BIBAFull-Text 332-339
      Jie Lu; Jamie Callan
    User modeling for information retrieval has mostly been studied to improve the effectiveness of information access in centralized repositories. In this paper we explore user modeling in the context of full-text federated search in peer-to-peer networks. Our approach models a user's persistent, long-term interests based on past queries, and uses the model to improve search efficiency for future queries that represent interests similar to past queries. Our approach also enables queries representing a user's transient, ad-hoc interests to be automatically recognized so that search for these queries can rely on a relatively large search radius to avoid sacrificing effectiveness for efficiency. Experimental results demonstrate that our approach can significantly improve the efficiency of full-text federated search without degrading its accuracy. Furthermore, the proposed approach does not require a large amount of training data, and is robust to a range of parameter values.
    Distributed query sampling: a quality-conscious approach BIBAFull-Text 340-347
      James Caverlee; Ling Liu; Joonsoo Bae
    We present an adaptive distributed query-sampling framework that is quality-conscious for extracting high-quality text database samples. The framework divides the query-based sampling process into an initial seed sampling phase and a quality-aware iterative sampling phase. In the second phase the sampling process is dynamically scheduled based on estimated database size and quality parameters derived during the previous sampling process. The unique characteristic of our adaptive query-based sampling framework is its self-learning and self-configuring ability based on the overall quality of all text databases under consideration. We introduce three quality-conscious sampling schemes for estimating database quality, and our initial results show that the proposed framework supports higher-quality document sampling than existing approaches.


    Load balancing for term-distributed parallel retrieval BIBAFull-Text 348-355
      Alistair Moffat; William Webber; Justin Zobel
    Large-scale web and text retrieval systems deal with amounts of data that greatly exceed the capacity of any single machine. To handle the necessary data volumes and query throughput rates, parallel systems are used, in which the document and index data are split across tightly-clustered distributed computing systems. The index data can be distributed either by document or by term. In this paper we examine methods for load balancing in term-distributed parallel architectures, and propose a suite of techniques for reducing net querying costs. In combination, the techniques we describe allow a 30% improvement in query throughput when tested on an eight-node parallel computer system.
    Hybrid index maintenance for growing text collections BIBAFull-Text 356-363
      Stefan Buttcher; Charles L. A. Clarke; Brad Lushman
    We present a new family of hybrid index maintenance strategies to be used in on-line index construction for monotonically growing text collections. These new strategies improve upon recent results for hybrid index maintenance in dynamic text retrieval systems. Like previous techniques, our new method distinguishes between short and long posting lists: While short lists are maintained using a merge strategy, long lists are kept separate and are updated in-place. This way, costly relocations of long posting lists are avoided.
       We discuss the shortcomings of previous hybrid methods and give an experimental evaluation of the new technique, showing that its index maintenance performance is superior to that of the earlier methods, especially when the amount of main memory available to the indexing system is small. We also present a complexity analysis which proves that, under a Zipfian term distribution, the asymptotical number of disk accesses performed by the best hybrid maintenance strategy is linear in the size of the text collection, implying the asymptotical optimality of the proposed strategy.
    Type less, find more: fast autocompletion search with a succinct index BIBAFull-Text 364-371
      Holger Bast; Ingmar Weber
    We consider the following full-text search autocompletion feature. Imagine a user of a search engine typing a query. Then with every letter being typed, we would like an instant display of completions of the last query word which would lead to good hits. At the same time, the best hits for any of these completions should be displayed. Known indexing data structures that apply to this problem either incur large processing times for a substantial class of queries, or they use a lot of space. We present a new indexing data structure that uses no more space than a state-of-the-art compressed inverted index, but with 10 times faster query processing times. Even on the large TREC Terabyte collection, which comprises over 25 million documents, we achieve, on a single machine and with the index on disk, average response times of one tenth of a second. We have built a full-fledged, interactive search engine that realizes the proposed autocompletion feature combined with support for proximity search, semi-structured (XML) text, subword and phrase completion, and semantic tags.
    Pruned query evaluation using pre-computed impacts BIBAFull-Text 372-379
      Vo Ngoc Anh; Alistair Moffat
    Exhaustive evaluation of ranked queries can be expensive, particularly when only a small subset of the overall ranking is required, or when queries contain common terms. This concern gives rise to techniques for dynamic query pruning, that is, methods for eliminating redundant parts of the usual exhaustive evaluation, yet still generating a demonstrably "good enough" set of answers to the query. In this work we propose new pruning methods that make use of impact-sorted indexes. Compared to exhaustive evaluation, the new methods reduce the amount of computation performed, reduce the amount of memory required for accumulators, reduce the amount of data transferred from disk, and at the same time allow performance guarantees in terms of precision and mean average precision. These strong claims are backed by experiments using the TREC Terabyte collection and queries.


    Information retrieval at Boeing: plans and successes BIBFull-Text 380-381
      Radha Radhakrishnan


    Mining dependency relations for query expansion in passage retrieval BIBAFull-Text 382-389
      Renxu Sun; Chai-Huat Ong; Tat-Seng Chua
    Classical query expansion techniques such as the local context analysis (LCA) make use of term co-occurrence statistics to incorporate additional contextual terms for enhancing passage retrieval. However, relevant contextual terms do not always co-occur frequently with the query terms and vice versa. Hence the use of such methods often brings in noise, which leads to reduced precision. Previous studies have demonstrated the importance of relationship analysis for natural language queries in passage retrieval. However, they found that without query expansion, the performance is not satisfactory for short queries. In this paper, we present two novel query expansion techniques that make use of dependency relation analysis to extract contextual terms and relations from external corpuses. The techniques are used to enhance the performance of density based and relation based passage retrieval frameworks respectively. We compare the performance of the resulting systems with LCA in a density based passage retrieval system (DBS) and a relation based system without any query expansion (RBS) using the factoid questions from the TREC-12 QA task. The results show that in terms of MRR scores, our relation based term expansion method with DBS outperforms the LCA by 9.81%, while our relation expansion method outperforms RBS by 17.49%.
    What makes a query difficult? BIBAFull-Text 390-397
      David Carmel; Elad Yom-Tov; Adam Darlow; Dan Pelleg
    This work tries to answer the question of what makes a query difficult. It addresses a novel model that captures the main components of a topic and the relationship between those components and topic difficulty. The three components of a topic are the textual expression describing the information need (the query or queries), the set of documents relevant to the topic (the Qrels), and the entire collection of documents. We show experimentally that topic difficulty strongly depends on the distances between these components. In the absence of knowledge about one of the model components, the model is still useful by approximating the missing component based on the other components. We demonstrate the applicability of the difficulty model for several uses such as predicting query difficulty, predicting the number of topic aspects expected to be covered by the search results, and analyzing the findability of a specific domain.
    On ranking the effectiveness of searches BIBAFull-Text 398-404
      Vishwa Vinay; Ingemar J. Cox; Natasa Milic-Frayling; Ken Wood
    There is a growing interest in estimating the effectiveness of search. Two approaches are typically considered: examining the search queries and examining the retrieved document sets. In this paper, we take the latter approach. We use four measures to characterize the retrieved document sets and estimate the quality of search. These measures are (i) the clustering tendency as measured by the Cox-Lewis statistic, (ii) the sensitivity to document perturbation, (iii) the sensitivity to query perturbation and (iv) the local intrinsic dimensionality. We present experimental results for the task of ranking 200 queries according to the search effectiveness over the TREC (discs 4 and 5) dataset. Our ranking of queries is compared with the ranking based on the average precision using the Kendall t statistic. The best individual estimator is the sensitivity to document perturbation and yields Kendall t of 0.521. When combined with the clustering tendency based on the Cox-Lewis statistic and the query perturbation measure, it results in Kendall t of 0.562 which to our knowledge is the highest correlation with the average precision reported to date.


    Document clustering with prior knowledge BIBAFull-Text 405-412
      Xiang Ji; Wei Xu
    Document clustering is an important tool for text analysis and is used in many different applications. We propose to incorporate prior knowledge of cluster membership for document cluster analysis and develop a novel semi-supervised document clustering model. The method models a set of documents with weighted graph in which each document is represented as a vertex, and each edge connecting a pair of vertices is weighted with the similarity value of the two corresponding documents. The prior knowledge indicates pairs of documents that known to belong to the same cluster. Then, the prior knowledge is transformed into a set of constraints. The document clustering task is accomplished by finding the best cuts of the graph under the constraints. We apply the model to the Normalized Cut method to demonstrate the idea and concept. Our experimental evaluations show that the proposed document clustering model reveals remarkable performance improvements with very limited training samples, and hence is a very effective semi-supervised classification tool.
    Text clustering with extended user feedback BIBAFull-Text 413-420
      Yifen Huang; Tom M. Mitchell
    Text clustering is most commonly treated as a fully automated task without user feedback. However, a variety of researchers have explored mixed-initiative clustering methods which allow a user to interact with and advise the clustering algorithm. This mixed-initiative approach is especially attractive for text clustering tasks where the user is trying to organize a corpus of documents into clusters for some particular purpose (e.g., clustering their email into folders that reflect various activities in which they are involved). This paper introduces a new approach to mixed-initiative clustering that handles several natural types of user feedback. We first introduce a new probabilistic generative model for text clustering (the SpeClustering model) and show that it outperforms the commonly used mixture of multinomials clustering model, even when used in fully autonomous mode with no user input. We then describe how to incorporate four distinct types of user feedback into the clustering algorithm, and provide experimental evidence showing substantial improvements in text clustering when this user feedback is incorporated.
    Near-duplicate detection by instance-level constrained clustering BIBAFull-Text 421-428
      Hui Yang; Jamie Callan
    For the task of near-duplicated document detection, both traditional fingerprinting techniques used in database community and bag-of-word comparison approaches used in information retrieval community are not sufficiently accurate. This is due to the fact that the characteristics of near-duplicated documents are different from that of both "almost-identical" documents in the data cleaning task and "relevant" documents in the search task. This paper presents an instance-level constrained clustering approach for near-duplicate detection. The framework incorporates information such as document attributes and content structure into the clustering process to form near-duplicate clusters. Gathered from several collections of public comments sent to U.S. government agencies on proposed new regulations, the experimental results demonstrate that our approach outperforms other near-duplicate detection algorithms and as about as effective as human assessors.

    The first page of results

    Less is more: probabilistic models for retrieving fewer relevant documents BIBAFull-Text 429-436
      Harr Chen; David R. Karger
    Traditionally, information retrieval systems aim to maximize the number of relevant documents returned to a user within some window of the top. For that goal, the probability ranking principle, which ranks documents in decreasing order of probability of relevance, is provably optimal. However, there are many scenarios in which that ranking does not optimize for the users information need. One example is when the user would be satisfied with some limited number of relevant documents, rather than needing all relevant documents. We show that in such a scenario, an attempt to return many relevant documents can actually reduce the chances of finding any relevant documents.
    High accuracy retrieval with multiple nested ranker BIBAFull-Text 437-444
      Irina Matveeva; Chris Burges; Timo Burkard; Andy Laucius; Leon Wong
    High precision at the top ranks has become a new focus of research in information retrieval. This paper presents the multiple nested ranker approach that improves the accuracy at the top ranks by iteratively re-ranking the top scoring documents. At each iteration, this approach uses the RankNet learning algorithm to re-rank a subset of the results. This splits the problem into smaller and easier tasks and generates a new distribution of the results to be learned by the algorithm. We evaluate this approach using different settings on a data set labeled with several degrees of relevance. We use the normalized discounted cumulative gain (NDCG) to measure the performance because it depends not only on the position but also on the relevance score of the document in the ranked list. Our experiments show that making the learning algorithm concentrate on the top scoring results improves precision at the top ten documents in terms of the NDCG score.
    Semantic search via XML fragments: a high-precision approach to IR BIBAFull-Text 445-452
      Jennifer Chu-Carroll; John Prager; Krzysztof Czuba; David Ferrucci; Pablo Duboue
    In some IR applications, it is desirable to adopt a high precision search strategy to return a small set of documents that are highly focused and relevant to the user's information need. With these applications in mind, we investigate semantic search using the XML Fragments query language on text corpora automatically pre-processed to encode semantic information useful for retrieval. We identify three XML Fragment operations that can be applied to a query to conceptualize, restrict, or relate terms in the query. We demonstrate how these operations can be used to address four different query-time semantic needs: to specify target information type, to disambiguate keywords, to specify search term context, or to relate select terms in the query. We demonstrate the effectiveness of our semantic search technology through a series of experiments using the two applications in which we embed this technology and show that it yields significant improvement in precision in the search results.

    Users: clarification, feedback, and browsing

    Elicitation of term relevance feedback: an investigation of term source and context BIBAFull-Text 453-460
      Diane Kelly; Xin Fu
    Term relevance feedback has had a long history in information retrieval. However, research on interactive term relevance feedback has yielded mixed results. In this paper, we investigate several aspects related to the elicitation of term relevance feedback: the display of document surrogates, the technique for identifying or selecting terms, and sources of expansion terms. We conduct a between subjects experiment (n=61) of three term relevance feedback interfaces using the 2005 TREC HARD collection, and evaluate each interface with respect to query length and retrieval performance. Results demonstrate that queries created with each experimental interface significantly outperformed corresponding baseline queries, even though there were no differences in performance between interface conditions. Results also demonstrate that pseudo-relevance feedback runs outperformed both baseline and experimental runs as assessed by recall-oriented measures, but that user-generated terms improved precision.
    Find-similar: similarity browsing as a search tool BIBAFull-Text 461-468
      Mark D. Smucker; James Allan
    Search systems have for some time provided users with the ability to request documents similar to a given document. Interfaces provide this feature via a link or button for each document in the search results. We call this feature find-similar or similarity browsing. We examined find-similar as a search tool, like relevance feedback, for improving retrieval performance. Our investigation focused on find-similar's document-to-document similarity, the reexamination of documents during a search, and the user's browsing pattern. Find-similar with a query-biased similarity, avoiding the reexamination of documents, and a breadth-like browsing pattern achieved a 23% increase in the arithmetic mean average precision and a 66% increase in the geometric mean average precision over our baseline retrieval. This performance matched that of a more traditionally styled iterative relevance feedback technique.
    Exploring the limits of single-iteration clarification dialogs BIBAFull-Text 469-476
      Jimmy Lin; Philip Wu; Dina Demner-Fushman; Eileen Abels
    Single-iteration clarification dialogs, as implemented in the TREC HARD track, represent an attempt to introduce interaction into ad hoc retrieval, while preserving the many benefits of large-scale evaluations. Although previous experiments have not conclusively demonstrated performance gains resulting from such interactions, it is unclear whether these findings speak to the nature of clarification dialogs, or simply the limitations of current systems. To probe the limits of such interactions, we employed a human intermediary to formulate clarification questions and exploit user responses. In addition to establishing a plausible upper bound on performance, we were also able to induce an "ontology of clarifications" to characterize human behavior. This ontology, in turn, serves as the input to a regression model that attempts to determine which types of clarification questions are most helpful. Our work can serve to inform the design of interactive systems that initiate user dialogs.

    Classification and machine learning

    Large scale semi-supervised linear SVMs BIBAFull-Text 477-484
      Vikas Sindhwani; S. Sathiya Keerthi
    Large scale learning is often realistic only in a semi-supervised setting where a small set of labeled examples is available together with a large collection of unlabeled data. In many information retrieval and data mining applications, linear classifiers are strongly preferred because of their ease of implementation, interpretability and empirical performance. In this work, we present a family of semi-supervised linear support vector classifiers that are designed to handle partially-labeled sparse datasets with possibly very large number of examples and features. At their core, our algorithms employ recently developed modified finite Newton techniques. Our contributions in this paper are as follows: (a) We provide an implementation of Transductive SVM (TSVM) that is significantly more efficient and scalable than currently used dual techniques, for linear classification problems involving large, sparse datasets. (b) We propose a variant of TSVM that involves multiple switching of labels. Experimental results show that this variant provides an order of magnitude further improvement in training efficiency. (c) We present a new algorithm for semi-supervised learning based on a Deterministic Annealing (DA) approach. This algorithm alleviates the problem of local minimum in the TSVM optimization procedure while also being computationally attractive. We conduct an empirical study on several document classification tasks which confirms the value of our methods in large scale semi-supervised settings.
    Graph-based text classification: learn from your neighbors BIBAFull-Text 485-492
      Ralitsa Angelova; Gerhard Weikum
    Automatic classification of data items, based on training samples, can be boosted by considering the neighborhood of data items in a graph structure (e.g., neighboring documents in a hyperlink environment or co-authors and their publications for bibliographic data entries). This paper presents a new method for graph-based classification, with particular emphasis on hyperlinked text documents but broader applicability. Our approach is based on iterative relaxation labeling and can be combined with either Bayesian or SVM classifiers on the feature spaces of the given data items. The graph neighborhood is taken into consideration to exploit locality patterns while at the same time avoiding overfitting. In contrast to prior work along these lines, our approach employs a number of novel techniques: dynamically inferring the link/class pattern in the graph in the run of the iterative relaxation labeling, judicious pruning of edges from the neighborhood graph based on node dissimilarities and node degrees, weighting the influence of edges based on a distance metric between the classification labels of interest and weighting edges by content similarity measures. Our techniques considerably improve the robustness and accuracy of the classification outcome, as shown in systematic experimental comparisons with previously published methods on three different real-world datasets.
    Constructing informative prior distributions from domain knowledge in text classification BIBAFull-Text 493-500
      Aynur Dayanik; David D. Lewis; David Madigan; Vladimir Menkov; Alexander Genkin
    Supervised learning approaches to text classification are in practice often required to work with small and unsystematically collected training sets. The alternative to supervised learning is usually viewed to be building classifiers by hand, using a domain expert's understanding of which features of the text are related to the class of interest. This is expensive, requires a degree of sophistication about linguistics and classification, and makes it difficult to use combinations of weak predictors. We propose instead combining domain knowledge with training examples in a Bayesian framework. Domain knowledge is used to specify a prior distribution for the parameters of a logistic regression model, and labeled training data is used to produce a posterior distribution, whose mode we take as the final classifier. We show on three text categorization data sets that this approach can rescue what would otherwise be disastrously bad training situations, producing much more effective classifiers.

    Recommendation: use and abuse

    Unifying user-based and item-based collaborative filtering approaches by similarity fusion BIBAFull-Text 501-508
      Jun Wang; Arjen P. de Vries; Marcel J. T. Reinders
    Memory-based methods for collaborative filtering predict new ratings by averaging (weighted) ratings between, respectively, pairs of similar users or items. In practice, a large number of ratings from similar users or similar items are not available, due to the sparsity inherent to rating data. Consequently, prediction quality can be poor. This paper re-formulates the memory-based collaborative filtering problem in a generative probabilistic framework, treating individual user-item ratings as predictors of missing ratings. The final rating is estimated by fusing predictions from three sources: predictions based on ratings of the same item by other users, predictions based on different item ratings made by the same user, and, third, ratings predicted based on data from other but similar users rating other but similar items. Existing user-based and item-based approaches correspond to the two simple cases of our framework. The complete model is however more robust to data sparsity, because the different types of ratings are used in concert, while additional ratings from similar users towards similar items are employed as a background model to smooth the predictions. Experiments demonstrate that the proposed methods are indeed more robust against data sparsity and give better recommendations.
    Personalized recommendation driven by information flow BIBAFull-Text 509-516
      Xiaodan Song; Belle L. Tseng; Ching-Yung Lin; Ming-Ting Sun
    We propose that the information access behavior of a group of people can be modeled as an information flow issue, in which people intentionally or unintentionally influence and inspire each other, thus creating an interest in retrieving or getting a specific kind of information or product. Information flow models how information is propagated in a social network. It can be a real social network where interactions between people reside; it can be, moreover, a virtual social network in that people only influence each other unintentionally, for instance, through collaborative filtering. We leverage users' access patterns to model information flow and generate effective personalized recommendations. First, an early adoption based information flow (EABIF) network describes the influential relationships between people. Second, based on the fact that adoption is typically category specific, we propose a topic-sensitive EABIF (TEABIF) network, in which access patterns are clustered with respect to the categories. Once an item has been accessed by early adopters, personalized recommendations are achieved by estimating whom the information will be propagated to with high probabilities. In our experiments with an online document recommendation system, the results demonstrate that the EABIF and the TEABIF can respectively achieve an improved (precision, recall) of (91.0%, 87.1%) and (108.5%, 112.8%) compared to traditional collaborative filtering, given an early adopter exists.
    Analysis of a low-dimensional linear model under recommendation attacks BIBAFull-Text 517-524
      Sheng Zhang; Yi Ouyang; James Ford; Fillia Makedon
    Collaborative filtering techniques have become popular in the past decade as an effective way to help people deal with information overload. Recent research has identified significant vulnerabilities in collaborative filtering techniques. Shilling attacks, in which attackers introduce biased ratings to influence recommendation systems, have been shown to be effective against memory-based collaborative filtering algorithms. We examine the effectiveness of two popular shilling attacks (the random attack and the average attack) on a model-based algorithm that uses Singular Value Decomposition (SVD) to learn a low-dimensional linear model. Our results show that the SVD-based algorithm is much more resistant to shilling attacks than memory-based algorithms. Furthermore, we develop an attack detection method directly built on the SVD-based algorithm and show that this method detects random shilling attacks with high detection rates and very low false alarm rates.

    Evaluation 2

    Evaluating evaluation metrics based on the bootstrap BIBAFull-Text 525-532
      Tetsuya Sakai
    This paper describes how the Bootstrap approach to statistics can be applied to the evaluation of IR effectiveness metrics. First, we argue that Bootstrap Hypothesis Tests deserve more attention from the IR community, as they are based on fewer assumptions than traditional statistical significance tests. We then describe straightforward methods for comparing the sensitivity of IR metrics based on Bootstrap Hypothesis Tests. Unlike the heuristics-based "swap" method proposed by Voorhees and Buckley, our method estimates the performance difference required to achieve a given significance level directly from Bootstrap Hypothesis Test results. In addition, we describe a simple way of examining the accuracy of rank correlation between two metrics based on the Bootstrap Estimate of Standard Error. We demonstrate the usefulness of our methods using test collections and runs from the NTCIR CLIR track for comparing seven IR metrics, including those that can handle graded relevance and those based on the Geometric Mean.
    Statistical precision of information retrieval evaluation BIBAFull-Text 533-540
      Gordon V. Cormack; Thomas R. Lynam
    We introduce and validate bootstrap techniques to compute confidence intervals that quantify the effect of test-collection variability on average precision (AP) and mean average precision (MAP) IR effectiveness measures. We consider the test collection in IR evaluation to be a representative of a population of materially similar collections, whose documents are drawn from an infinite pool with similar characteristics. Our model accurately predicts the degree of concordance between system results on randomly selected halves of the TREC-6 ad hoc corpus. We advance a framework for statistical evaluation that uses the same general framework to model other sources of chance variation as a source of input for meta-analysis techniques.
    A statistical method for system evaluation using incomplete judgments BIBAFull-Text 541-548
      Javed A. Aslam; Virgil Pavlu; Emine Yilmaz
    We consider the problem of large-scale retrieval evaluation, and we propose a statistical method for evaluating retrieval systems using incomplete judgments. Unlike existing techniques that (1) rely on effectively complete, and thus prohibitively expensive, relevance judgment sets, (2) produce biased estimates of standard performance measures, or (3) produce estimates of non-standard measures thought to be correlated with these standard measures, our proposed statistical technique produces unbiased estimates of the standard measures themselves.
       Our proposed technique is based on random sampling. While our estimates are unbiased by statistical design, their variance is dependent on the sampling distribution employed; as such, we derive a sampling distribution likely to yield low variance estimates. We test our proposed technique using benchmark TREC data, demonstrating that a sampling pool derived from a set of runs can be used to efficiently and effectively evaluate those runs. We further show that these sampling pools generalize well to unseen runs. Our experiments indicate that highly accurate estimates of standard performance measures can be obtained using a number of relevance judgments as small as 4% of the typical TREC-style judgment pool.

    Web IR: current topics

    Learning to advertise BIBAFull-Text 549-556
      Anisio Lacerda; Marco Cristo; Marcos Andre Concalves; Weiguo Fan; Nivio Ziviani; Berthier Ribeiro-Neto
    Content-targeted advertising, the task of automatically associating ads to a Web page, constitutes a key Web monetization strategy nowadays. Further, it introduces new challenging technical problems and raises interesting questions. For instance, how to design ranking functions able to satisfy conflicting goals such as selecting advertisements (ads) that are relevant to the users and suitable and profitable to the publishers and advertisers? In this paper we propose a new framework for associating ads with web pages based on Genetic Programming (GP). Our GP method aims at learning functions that select the most appropriate ads, given the contents of a Web page. These ranking functions are designed to optimize overall precision and minimize the number of misplacements. By using a real ad collection and web pages from a newspaper, we obtained a gain over a state-of-the-art baseline method of 61.7% in average precision. Further, by evolving individuals to provide good ranking estimations, GP was able to discover ranking functions that are very effective in placing ads in web pages while avoiding irrelevant ones.
    Getting work done on the web: supporting transactional queries BIBAFull-Text 557-564
      Yunyao Li; Rajasekar Krishnamurthy; Shivakumar Vaithyanathan; H. V. Jagadish
    Many searches on the web have a transactional intent. We argue that pages satisfying transactional needs can be distinguished from the more common pages that have some information and links, but cannot be used to execute a transaction. Based on this hypothesis, we provide a recipe for constructing a transaction annotator. By constructing an annotator with one corpus and then demonstrating its classification performance on another, we establish its robustness. Finally, we show experimentally that a search procedure that exploits such pre-annotation greatly outperforms traditional search for retrieving transactional pages.
    You are what you say: privacy risks of public mentions BIBAFull-Text 565-572
      Dan Frankowski; Dan Cosley; Shilad Sen; Loren Terveen; John Riedl
    In today's data-rich networked world, people express many aspects of their lives online. It is common to segregate different aspects in different places: you might write opinionated rants about movies in your blog under a pseudonym while participating in a forum or web site for scholarly discussion of medical ethics under your real name. However, it may be possible to link these separate identities, because the movies, journal articles, or authors you mention are from a sparse relation space whose properties (e.g., many items related to by only a few users) allow re-identification. This re-identification violates people's intentions to separate aspects of their life and can have negative consequences; it also may allow other privacy violations, such as obtaining a stronger identifier like name and address.
       This paper examines this general problem in a specific setting: re-identification of users from a public web movie forum in a private movie ratings dataset. We present three major results. First, we develop algorithms that can re-identify a large proportion of public users in a sparse relation space. Second, we evaluate whether private dataset owners can protect user privacy by hiding data; we show that this requires extensive and undesirable changes to the dataset, making it impractical. Third, we evaluate two methods for users in a public forum to protect their own privacy, suppression and misdirection. Suppression doesn't work here either. However, we show that a simple misdirection strategy works well: mention a few popular items that you haven't rated.

    Summarization: multidocuments and new applications

    A compositional context sensitive multi-document summarizer: exploring the factors that influence summarization BIBAFull-Text 573-580
      Ani Nenkova; Lucy Vanderwende; Kathleen McKeown
    The usual approach for automatic summarization is sentence extraction, where key sentences from the input documents are selected based on a suite of features. While word frequency often is used as a feature in summarization, its impact on system performance has not been isolated. In this paper, we study the contribution to summarization of three factors related to frequency: content word frequency, composition functions for estimating sentence importance from word frequency, and adjustment of frequency weights based on context. We carry out our analysis using datasets from the Document Understanding Conferences, studying not only the impact of these features on automatic summarizers, but also their role in human summarization. Our research shows that a frequency based summarizer can achieve performance comparable to that of state-of-the-art systems, but only with a good composition function; context sensitivity improves performance and significantly reduces repetition.
    Information graphics: an untapped resource for digital libraries BIBAFull-Text 581-588
      Sandra Carberry; Stephanie Elzer; Seniz Demir
    Information graphics are non-pictorial graphics such as bar charts and line graphs that depict attributes of entities and relations among entities. Most information graphics appearing in popular media have a communicative goal or intended message; consequently, information graphics constitute a form of language. This paper argues that information graphics are a valuable knowledge resource that should be retrievable from a digital library and that such graphics should be taken into account when summarizing a multimodal document for subsequent indexing and retrieval. But to accomplish this, the information graphic must be understood and its message recognized. The paper presents our Bayesian system for recognizing the primary message of one kind of information graphic (simple bar charts) and discusses the potential role of an information graphic's message in indexing graphics and summarizing multimodal documents.
    News to go: hierarchical text summarization for mobile devices BIBAFull-Text 589-596
      Jahna Otterbacher; Dragomir Radev; Omer Kareem
    We present an evaluation of a novel hierarchical text summarization method that allows users to view summaries of Web documents from small, mobile devices. Unlike previous approaches, ours does not require the documents to be in HTML since it infers a hierarchical structure automatically. Currently, the method is used to summarize news articles sent to a Web mail account in plain text format. Subjects used a Web-enabled mobile phone emulator to access the account's inbox and view the summarized news articles. They then used the summaries to complete several information-seeking tasks, which involved answering factual questions about the stories. In comparing the hierarchical text summary setting to that in which subjects were given the full text articles, there was no significant difference in task accuracy or the time taken to complete the task. However, in the hierarchical summarization setting, the number of bytes transferred per user request is less than half that of the full text case. Finally, in comparing the new method to three other summarization methods, subjects achieved significantly better accuracy on the tasks when using hierarchical summaries.


    Clustering of search results using temporal attributes BIBAFull-Text 597-598
      Omar Alonso; Michael Gertz
    Clustering of search results is an important feature in many of today's information retrieval applications. The notion of hit list clustering appears in Web search engines and enterprise search engines as a mechanism that allows users to further explore the coverage of a query. However, there has been little work on exposing temporal attributes for constructing and presentation of clusters. These attributes appear in documents as part of the textual content, e.g., as a date and time token or as a temporal reference in a sentence. In this paper, we outline a model and describe a prototype that shows the main ideas.
    A complex document information processing prototype BIBAFull-Text 599-600
      S. Argamon; G. Agam; O. Frieder; D. Grossman; D. Lewis; G. Sohn; K. Voorhees
    We developed a prototype for integrated retrieval and aggregation of diverse information contained in scanned paper documents. Such complex document information processing combines several forms of image processing together with textual/linguistic processing to enable effective analysis of complex document collections, a necessity for a wide range of applications. This is the first system to attempt integrated retrieval from complex documents; we report its current capabilities.
    Inferring document relevance via average precision BIBAFull-Text 601-602
      Javed A. Aslam; Emine Yilmaz
    We consider the problem of evaluating retrieval systems using a limited number of relevance judgments. Recent work has demonstrated that one can accurately estimate average precision via a judged pool corresponding to a relatively small random sample of documents. In this work, we demonstrate that given values or estimates of average precision, one can accurately infer the relevances of unjudged documents. Combined, we thus show how one can efficiently and accurately infer a large judged pool from a relatively small number of judged documents, thus permitting accurate and efficient retrieval evaluation on a large scale.
    Automatic construction of known-item finding test beds BIBKFull-Text 603-604
      Leif Azzopardi; Maarten de Rijke
    Keywords: evaluation, simulation, test collection formation
    Adaptive query-based sampling for distributed IR BIBKFull-Text 605-606
      Leif Azzopardi; Mark Baillie; Fabio Crestani
    Keywords: distributed Information retrieval, efficiency, query-based sampling, selection accuracy
    PENG: integrated search of distributed news archives BIBKFull-Text 607-608
      Mark Baillie; Fabio Crestani; Monica Landoni
    Keywords: news search, personalisation
    Examining assessor attributes at HARD 2005 BIBKFull-Text 609-610
      Mark Baillie; Ian Ruthven
    Keywords: clarification forms, query expansion
    User expectations from XML element retrieval BIBAFull-Text 611-612
      Stamatina Betsi; Mounia Lalmas; Anastasios Tombros; Theodora Tsikrika
    The primary aim of XML element retrieval is to return to users XML elements, rather than whole documents. This poster describes a small study, in which we elicited users' expectations, i.e. their anticipated experience, when interacting with an XML retrieval system, as compared to a traditional 'flat' document retrieval system.
    Theoretical benchmarks of XML retrieval BIBAFull-Text 613-614
      Tobias Blanke; Mounia Lalmas
    This poster investigates the use of theoretical benchmarks to describe the matching functions of XML retrieval systems and the properties of specificity and exhaustivity in XML retrieval. Theoretical benchmarks concern the formal representation of qualitative properties of IR models. To this end, Situation Theory framework for the meta-evaluation of XML retrieval is presented.
    Question classification with log-linear models BIBAFull-Text 615-616
      Phil Blunsom; Krystle Kocik; James R. Curran
    Question classification has become a crucial step in modern question answering systems. Previous work has demonstrated the effectiveness of statistical machine learning approaches to this problem. This paper presents a new approach to building a question classifier using log-linear models. Evidence from a rich and diverse set of syntactic and semantic features is evaluated, as well as approaches which exploit the hierarchical structure of the question classes.
    Community-based snippet-indexes for pseudo-anonymous personalization in web search BIBAFull-Text 617-618
      Oisin Boydell; Barry Smyth
    We describe and evaluate an approach to personalizing Web search that involves post-processing the results returned by some underlying search engine so that they reflect the interests of a community of like-minded searchers.
       To do this we leverage the search experiences of the community by mining the title and snippet texts of results that have been selected by community members in response to their queries. Our approach seeks to build a community-based snippet index that reflects the evolving interests of a group of searchers. This index is then used to re-rank the results returned by the underlying search engine by boosting the ranking of key results that have been freqently selected for similar queries by community members in the past.
    Bias and the limits of pooling BIBAFull-Text 619-620
      Chris Buckley; Darrin Dimmick; Ian Soboroff; Ellen Voorhees
    Modern retrieval test collections are built through a process called pooling in which only a sample of the entire document set is judged for each topic. The idea behind pooling is to find enough relevant documents such that when unjudged documents are assumed to be nonrelevant the resulting judgment set is sufficiently complete and unbiased. As document sets grow larger, a constant-size pool represents an increasingly small percentage of the document set, and at some point the assumption of approximately complete judgments must become invalid.
       This paper demonstrates that the AQUAINT 2005 test collection exhibits bias caused by pools that were too shallow for the document set size despite having many diverse runs contribute to the pools. The existing judgment set favors relevant documents that contain topic title words even though relevant documents containing few topic title words are known to exist in the document set. The paper concludes with suggested modifications to traditional pooling and evaluation methodology that may allow very large reusable test collections to be built.
    Term proximity scoring for ad-hoc retrieval on very large text collections BIBAFull-Text 621-622
      Stefan Buttcher; Charles L. A. Clarke; Brad Lushman
    We propose an integration of term proximity scoring into Okapi BM25. The relative retrieval effectiveness of our retrieval method, compared to pure BM25, varies from collection to collection.
       We present an experimental evaluation of our method and show that the gains achieved over BM25 as the size of the underlying text collection increases. We also show that for stemmed queries the impact of term proximity scoring is larger than for unstemmed queries.
    An exploratory web log study of multitasking BIBAFull-Text 623-624
      Nikolai (Nick) Buzikashvili
    The Web search multitasking study based on automatic task session detection procedure is described. The results of the study: 1) multitasking is very rare, 2) it usually covers only 2 task sessions, 3) it is frequently formed into a temporal inclusion of an interrupting task session into the interrupted session, 4) the quantitative characteristics of multitasking greatly differ from the characteristics of sequential execution of one and several tasks. A searcher minimizes task switching costs: he avoids multitasking and while multitasking he uses cheapest manner of task switching.
    Tensor space model for document analysis BIBAFull-Text 625-626
      Deng Cai; Xiaofei He; Jiawei Han
    Vector Space Model (VSM) has been at the core of information retrieval for the past decades. VSM considers the documents as vectors in high dimensional space.
       In such a vector space, techniques like Latent Semantic Indexing (LSI), Support Vector Machines (SVM), Naive Bayes, etc., can be then applied for indexing and classification. However, in some cases, the dimensionality of the document space might be extremely large, which makes these techniques infeasible due to the curse of dimensionality. In this paper, we propose a novel Tensor Space Model for document analysis. We represent documents as the second order tensors, or matrices. Correspondingly, a novel indexing algorithm called Tensor Latent Semantic Indexing (TensorLSI) is developed in the tensor space. Our theoretical analysis shows that TensorLSI is much more computationally efficient than the conventional Latent Semantic Indexing, which makes it applicable for extremely large scale data set. Several experimental results on standard document data sets demonstrate the efficiency and effectiveness of our algorithm.
    First large-scale information retrieval experiments on Turkish texts BIBAFull-Text 627-628
      Fazli Can; Seyit Kocberber; Erman Balcik; Cihan Kaynak; H. Cagdas Ocalan; Onur M. Vursavas
    We present the results of the first large-scale Turkish information retrieval experiments performed on a TREC-like test collection. The test bed, which has been created for this study, contains 95.5 million words, 408,305 documents, 72 ad hoc queries and has a size of about 800MB. All documents come from the Turkish newspaper Milliyet. We implement and apply simple to sophisticated stemmers and various query-document matching functions and show that truncating words at a prefix length of 5 creates an effective retrieval environment in Turkish. However, a lemmatizer-based stemmer provides significantly better effectiveness over a variety of matching functions.
    Learning a ranking from pairwise preferences BIBAFull-Text 629-630
      Ben Carterette; Desislava Petkova
    We introduce a novel approach to combining rankings from multiple retrieval systems. We use a logistic regression model or an SVM to learn a ranking from pairwise document preferences. Our approach requires no training data or relevance scores, and outperforms a popular voting algorithm.
    Automated performance assessment in interactive QA BIBAFull-Text 631-632
      Joyce Y. Chai; Tyler Baldwin; Chen Zhang
    In interactive question answering (QA), users and systems take turns to ask questions and provide answers. In such an interactive setting, user questions largely depend on the answers provided by the system. One question is whether user follow-up questions can provide feedback for the system to automatically assess its performance (e.g., assess whether a correct answer is delivered). This self-awareness can make QA systems more intelligent for information seeking, for example, by adapting better strategies to cope with problematic situations. Therefore, this paper describes our initial investigation in addressing this problem. Our results indicate that interaction context can provide useful cues for automated performance assessment in interactive QA.
    Stylistic text segmentation BIBAFull-Text 633-634
      Paul J. Chase; Shlomo Argamon
    This paper focuses on a method for the stylistic segmentation of text documents. Our technique involves mapping the change in a feature throughout a text. We use the linguistic features of conjunction and modality, through taxonomies from Systemic Functional Linguistics. This segmentation has applications in automated summarization, particularly of large documents.
    On hierarchical web catalog integration with conceptual relationships in thesaurus BIBAFull-Text 635-636
      Ing-Xiang Chen; Jui-Chi Ho; Cheng-Zen Yang
    Web catalog integration is an interesting problem in current digital content management. Past studies have shown that using a flattened structure with auxiliary information extracted from the source catalog can improve the integration results. However, the nature of a flattened structure ignores the hierarchical relationships, and thus the performance improvement of catalog integration may be reduced. In this paper, we propose an enhanced hierarchical catalog integration (EHCI) approach with conceptual thesauri extracted from the source catalog. The results show that our enhanced hierarchical integration approach effectively boosts the accuracy of hierarchical catalog integration.
    Rpref: a generalization of Bpref towards graded relevance judgments BIBAFull-Text 637-638
      Jan De Beer; Marie-Francine Moens
    We present rpref, our generalization of the bpref evaluation metric for assessing the quality of search engine results, given graded rather than binary user relevance judgments.
    A new web page summarization method BIBAFull-Text 639-640
      Qian Diao; Jiulong Shan
    In this paper, we present a novel multi-webpage summarization algorithm. It adds the graph based ranking algorithm into the framework of Maximum Marginal Relevance (MMR) method, to not only capture the main topic of the web pages but also eliminate the redundancy existing in the sentences of the summary result. The experiment result indicates that the new approach has the better performance than the previous methods.
    NMF and PLSI: equivalence and a hybrid algorithm BIBAFull-Text 641-642
      Chris Ding; Tao Li; Wei Peng
    In this paper, we show that PLSI and NMF optimize the same objective function, although PLSI and NMF are different algorithms as verified by experiments. In addition, we also propose a new hybrid method that runs PLSI and NMF alternatively to achieve better solutions.
    Using historical data to enhance rank aggregation BIBAFull-Text 643-644
      Miriam Fernandez; David Vallet; Pablo Castells
    Rank aggregation is a pervading operation in IR technology. We hypothesize that the performance of score-based aggregation may be affected by artificial, usually meaningless deviations consistently occurring in the input score distributions, which distort the combined result when the individual biases differ from each other. We propose a score-based rank aggregation model where the source scores are normalized to a common distribution before being combined. Early experiments on available data from several TREC collections are shown to support our proposal.
    Enterprise search behaviour of software engineers BIBAFull-Text 645-646
      Luanne Freund; Elaine G. Toms
    Technical professionals spend 25% of their time at work searching for information, and have specialized information needs that are not well-served by generic enterprise search tools. In this study, we investigated how a group of software engineers use a workplace search system. We identify patterns of search behaviour specific to this group and distinct from general web and intranet search patterns, and make design recommendations for search systems that will better serve the needs of this group.
    Evaluating sources of query expansion terms BIBAFull-Text 647-648
      Xin Fu; Diane Kelly
    This study investigates the effectiveness of retrieval systems and human users in generating terms for query expansion. We compare three sources of terms: system generated terms, terms users select from top-ranked sentences, and user generated terms. Results demonstrate that overall the system generated more effective expansion terms than users, but that users' selection of terms improved precision at the top of the retrieved document list.
    Comparing two blind relevance feedback techniques BIBKFull-Text 649-650
      Daqing He; Yefei Peng
    Keywords: automatic query expansion, blind relevance feedback, comparison
    Information retrieval with commonsense knowledge BIBAFull-Text 651-652
      Ming-Hung Hsu; Hsin-Hsi Chen
    This paper employs ConceptNet, which covers a rich set of commonsense concepts, to retrieve images with text descriptions by focusing on spatial relationships. Evaluation on test data of the 2005 ImageCLEF shows that integrating commonsense knowledge in information retrieval is feasible.
    Refining hierarchical taxonomy structure via semi-supervised learning BIBKFull-Text 653-654
      Ruizhang Huang; Zhigang Zhang; Wai Lam
    Keywords: metric learning, semi-supervised learning
    Quantative analysis of the impact of judging inconsistency on the performance of relevance feedback BIBAFull-Text 655-656
      Xiangyu Jin; James French; Jonathan Michel
    Practical constrains of user interfaces make the user's judgment (during the feedback loop) deviate from real thoughts (when the full document is read). This is often overlooked in evaluation of relevance feedback.
       This paper quantitatively analyze the impact of judging inconsistency on the performance of relevance feedback.
    Swordfish: an unsupervised Ngram based approach to morphological analysis BIBAFull-Text 657-658
      Chris Jordan; John Healy; Vlado Keselj
    Extracting morphemes from words is a nontrivial task. Rule based stemming approaches such as Porter's algorithm have encountered some success, however they are restricted by their ability to identify a limited number of affixes and are language dependent. When dealing with languages with many affixes, rule based approaches generally require many more rules to deal with all the possible word forms. Deriving these rules requires a larger effort on the part of linguists and in some instances can be simply impractical. We propose an unsupervised ngram based approach, named Swordfish. Using ngram probabilities in the corpus, possible morphemes are identified. We look at two possible methods for identifying candidate morphemes, one using joint probabilities between two ngrams, and the second based on log odds between prefix probabilities. Initial results indicate the joint probability approach to be better for English while the prefix ratio approach is better for Finnish and Turkish.
    Authorship attribution with thousands of candidate authors BIBAFull-Text 659-660
      Moshe Koppel; Jonathan Schler; Shlomo Argamon; Eran Messeri
    In this paper, we use a blog corpus to demonstrate that we can often identify the author of an anonymous text even where there are many thousands of candidate authors. Our approach combines standard information retrieval methods with a text categorization meta-learning scheme that determines when to even venture a guess.
    Simple questions to improve pseudo-relevance feedback results BIBAFull-Text 661-662
      Giridhar Kumaran; James Allan
    We explore interactive methods to further improve the performance of pseudo-relevance feedback. Studies [4] suggest that new methods for tackling difficult queries are required. Our approach is to gather more information about the query from the user by asking her simple questions. The equally simple responses are used to modify the original query. Our experiments using the TREC Robust Track queries show that we can obtain a significant improvement in mean average precision averaging around 5% over pseudo-relevance feedback. This improvement is also spread across more queries compared to ordinary pseudo-relevance feedback, as suggested by geometric mean average precision.
    Is XML retrieval meaningful to users?: searcher preferences for full documents vs. elements BIBAFull-Text 663-664
      Birger Larsen; Anastasios Tombros; Saadia Malik
    The aim of this study is to investigate whether element retrieval (as opposed to full-text retrieval) is meaningful and useful for searchers when carrying out information-seeking tasks. Our results suggest that searchers find the structural breakdown of documents useful when browsing within retrieved documents, and provide support for the usefulness of element retrieval in interactive settings.
    Building a test collection for complex document information processing BIBAFull-Text 665-666
      D. Lewis; G. Agam; S. Argamon; O. Frieder; D. Grossman; J. Heard
    Research and development of information access technology for scanned paper documents has been hampered by the lack of public test collections of realistic scope and complexity. As part of a project to create a prototype system for search and mining of masses of document images, we are assembling a 1.5 terabyte dataset to support evaluation of both end-to-end complex document information processing (CDIP) tasks (e.g., text retrieval and data mining) as well as component technologies such as optical character recognition (OCR), document structure analysis, signature matching, and authorship attribution.
    Enhancing topic tracking with temporal information BIBAFull-Text 667-668
      Baoli Li; Wenjie Li; Qin Lu
    In this paper, we propose a new strategy with time granularity reasoning for utilizing temporal information in topic tracking. Compared with previous ones, our work has four distinguished characteristics. Firstly, we try to determine a set of topic times for a target topic from the given on-topic stories. It helps to avoid the negative influence from other irrelevant times. Secondly, we take into account time granularity variance when deciding whether a coreference relationship exists between two times. Thirdly, both publication time and times presented in texts are considered. Finally, as time is only one attribute of a topic, we increase the similarity between a story and a target topic only when they are related not only temporally but also semantically. Experiments on two TDT corpora show that our method makes good use of temporal information in news stories.
    A comparative study of the effect of search feature design on user experience in digital libraries (DLs) BIBAFull-Text 669-670
      Yuelin Li; Xiangmin Zhang; Ying Zhang; Jingjing Liu
    This study investigates the impact of different search feature designs in DLs on user search experience. The results indicate that the impact is significant in terms of the number of queries issued, search steps, zero-hits pages returned, and search errors.
    Representing clusters for retrieval BIBKFull-Text 671-672
      Xiaoyong Liu; W. Bruce Croft
    Keywords: cluster representation, cluster-based retrieval
    One-sided measures for evaluating ranked retrieval effectiveness with spontaneous conversational speech BIBAFull-Text 673-674
      Baolong Liu; Douglas W. Oard
    Early speech retrieval experiments focused on news broadcasts, for which adequate Automatic Speech Recognition (ASR) accuracy could be obtained. Like newspapers, news broadcasts are a manually selected and arranged set of stories. Evaluation designs reflected that, using known story boundaries as a basis for evaluation. Substantial advances in ASR accuracy now make it possible to build search systems for some types of spontaneous conversational speech, but present evaluation designs continue to rely on known topic boundaries that are no longer well matched to the nature of the materials. We propose a new class of measures for speech retrieval based on manual annotation of points at which a user with specific topical interests would wish replay to begin.
    Combining fields in known-item email search BIBAFull-Text 675-676
      Craig Macdonald; Iadh Ounis
    Emails are examples of structured documents with various fields. These fields can be exploited to enhance the retrieval effectiveness of an Information Retrieval (IR) system that mailing list archives. In recent experiments of the TREC2005 Enterprise track, various fields were applied to varying degrees of success by the participants. In his work, using a field-based weighting model, we investigate the retrieval performance attainable by each field, and examine when fields evidence should be combined or not.
    Improving QA retrieval using document priors BIBAFull-Text 677-678
      James Mayfield; Paul McNamee
    We present a simple way to improve document retrieval for question answering systems. The method biases the retrieval system toward documents that contain words that have appeared in other documents containing answers to the same type of question. The method works with virtually any retrieval system, and exhibits a statistically significant performance improvement over a strong baseline.
    Content-based video retrieval: does video's semantic visual feature matter? BIBAFull-Text 679-680
      Xiangming Mu
    A new shot level video browsing method based on semantic visual features (e.g., car, mountain, and fire) is proposed to facilitate content-based retrieval. The video's binary semantic feature vector is utilized to calculate the score of similarity between two shot keyframes. The score is then used to browse the "similar" keyframes in terms of semantic visual features. A pilot user study was conducted to better understand users' behaviors in video retrieval context. Three video retrieval and browsing systems are compared: temporal neighbor, semantic visual feature, and fused browsing system. The initial results indicated that the semantic visual feature browsing was effective and efficient for Visual Centric tasks, but not for Non-visual Centric tasks.
    Action modeling: language models that predict query behavior BIBAFull-Text 681-682
      G. Craig Murray; Jimmy Lin; Abdur Chowdhury
    We present a novel language modeling approach to capturing the query reformulation behavior of Web search users. Based on a framework that categorizes eight different types of "user moves" (adding/removing query terms, etc.), we treat search sessions as sequence data and build n-gram language models to capture user behavior. We evaluated our models in a prediction task. The results suggest that useful patterns of activity can be extracted from user histories. Furthermore, by examining prediction performance under different order n-gram models, we gained insight into the amount of history/context that is associated with different types of user actions. Our work serves as the basis for more refined user models.
    A method of rating the credibility of news documents on the web BIBAFull-Text 683-684
      Ryosuke Nagura; Yohei Seki; Noriko Kando; Masaki Aono
    We propose a method to rate the credibility of news articles using three clues: (1) commonality of the contents of articles among different news publishers; (2) numerical agreement versus contradiction of numerical values reported in the articles; and (3) objectivity based on subjective speculative phrases and news sources. We tested this method on news stories taken from seven different news sites on the Web. The average agreement between the system-produced "credibility" and the manual judgments of three human assessors on the 52 sample articles was 69.1%. The limitations of the current approach and future directions are discussed.
    An analysis of the coupling between training set and neighborhood sizes for the kNN classifier BIBAFull-Text 685-686
      J. Scott Olsson
    We consider the relationship between training set size and the parameter k for the k-Nearest Neighbors (kNN) classifier. When few examples are available, we observe that accuracy is sensitive to k and that best k tends to increase with training size. We explore the subsequent risk that k tuned on partitions will be suboptimal after aggregation and re-training. This risk is found to be most severe when little data is available. For larger training sizes, accuracy becomes increasingly stable with respect to k and the risk decreases.
    Fact-focused novelty detection: a feasibility study BIBAFull-Text 687-688
      Jahna Otterbacher; Dragomir Radev
    Methods for detecting sentences in an input document set, which are both relevant and novel with respect to an information need, would be of direct benefit to many systems, such as extractive text summarizers. However, satisfactory levels of agreement between judges performing this task manually have yet to demonstrated, leaving researchers to conclude that the task is too subjective. In previous experiments, judges were asked to first identify sentences that are relevant to a general topic, and then to eliminate sentences from the list that do not contain new information. Currently, a new task is proposed, in which annotators perform the same procedure, but within the context of a specific, factual information need. In the experiment, satisfactory levels of agreement between independent annotators were achieved on the first step of identifying sentences containing relevant information relevant. However, the results indicate that judges do not agree on which sentences contain novel information.
    Unity: relevance feedback using user query logs BIBAFull-Text 689-690
      Jignashu Parikh; Shyam Kapur
    The exponential growth of the Web and the increasing ability of web search engines to index data have led to a problem of plenty. The number of results returned per query is typically in the order of millions of documents for many common queries. Although there is the benefit of added coverage for every query, the problem of ranking these documents and giving the best results gets worse. The problem is even more difficult in case of temporal and ambiguous queries. We try to address this problem using feedback from user query logs. We leverage a technology called Units for generating query refinements which are shown as Also try queries on Yahoo! Search. We consider these refinements as sub-concepts which help define user intent and use them to improve search relevance. The results obtained via live testing on Yahoo! Search are encouraging.
    Improving personalized web search using result diversification BIBAFull-Text 691-692
      Filip Radlinski; Susan Dumais
    We present and evaluate methods for diversifying search results to improve personalized web search. A common personalization approach involves reranking the top N search results such that documents likely to be preferred by the user are presented higher. The usefulness of reranking is limited in part by the number and diversity of results considered. We propose three methods to increase the diversity of the top results and evaluate the effectiveness of these methods.
    Using small XML elements to support relevance BIBAFull-Text 693-694
      Georgina Ramirez; Thijs Westerveld; Arjen P. de Vries
    Small XML elements are often estimated relevant by the retrieval model but they are not desirable retrieval units. This paper presents a generic model that exploits the information obtained from small elements. We identify relationships between small and relevant elements and use this linking information to reinforce the relevance of other elements before removing the small ones. Our experiments using the INEX testbed show the effectiveness of our approach.
    Give me just one highly relevant document: P-measure BIBAFull-Text 695-696
      Tetsuya Sakai
    We introduce an evaluation metric called P-measure for the task of retrieving one highly relevant document. It models user behaviour in practical tasks such as known-item search, and is more stable and sensitive than Reciprocal Rank which cannot handle graded relevance.
    Feature diversity in cluster ensembles for robust document clustering BIBAFull-Text 697-698
      Xavier Sevillano; German Cobo; Francesc Alias; Joan Claudi Socoro
    The performance of document clustering systems depends on employing optimal text representations, which are not only difficult to determine beforehand, but also may vary from one clustering problem to another. As a first step towards building robust document clusterers, a strategy based on feature diversity and cluster ensembles is presented in this work. Experiments conducted on a binary clustering problem show that our method is robust to near-optimal model order selection and able to detect constructive interactions between different document representations in the test bed.
    Lightening the load of document smoothing for better language modeling retrieval BIBAFull-Text 699-700
      Mark D. Smucker; James Allan
    We hypothesized that language modeling retrieval would improve if we reduced the need for document smoothing to provide an inverse document frequency (IDF) like effect. We created inverse collection frequency (ICF) weighted query models as a tool to partially separate the IDF-like role from document smoothing. Compared to maximum likelihood estimated (MLE) queries, the ICF weighted queries achieved a 6.4% improvement in mean average precision on description queries. The ICF weighted queries performed better with less document smoothing than that required by MLE queries. Language modeling retrieval may benefit from a means to separately incorporate an IDF-like behavior outside of document smoothing.
    The effect of OCR errors on stylistic text classification BIBAFull-Text 701-702
      Sterling Stuart Stein; Shlomo Argamon; Ophir Frieder
    Recently, interest is growing in non-topical text classification tasks such as genre classification, sentiment analysis, and authorship profiling. We study to what extent OCR errors affect stylistic text classification from scanned documents. We find that even a relatively high level of errors in the OCRed documents does not substantially affect stylistic classification accuracy.
    History repeats itself: repeat queries in Yahoo's logs BIBAFull-Text 703-704
      Jaime Teevan; Eytan Adar; Rosie Jones; Michael Potts
    Thanks to the ubiquity of the Internet search engine search box, users have come to depend on search engines both to find and re-find information. However, re-finding behavior has not been significantly addressed. Here we look at re-finding queries issued to the Yahoo! search engine by 114 users over a year.
    Early precision measures: implications from the downside of blind feedback BIBAFull-Text 705-706
      Stephen Tomlinson
    We report the statistically significant mean impacts of blind feedback, as implemented by 7 participants for the 2003 Reliable Information Access (RIA) Workshop, on 30 retrieval measures, including several primary recall measures not originally reported. We find that blind feedback was detrimental to measures focused on the first relevant item even when it boosted "early precision" measures such as mean Precision@10, implying that the conventional reporting of ad hoc precision needs enhancement.
    An experimental study on automatically labeling hierarchical clusters using statistical features BIBKFull-Text 707-708
      Pucktada Treeratpituk; Jamie Callan
    Keywords: cluster labeling, document hierarchy
    Strict and vague interpretation of XML-retrieval queries BIBAFull-Text 709-710
      Andrew Trotman; Mounia Lalmas
    Structural hints in XML-retrieval queries can be used to specify both the granularity of the search result (the target element) and where in a document to search (support elements). These hints might be interpreted either strictly or vaguely, but does it matter if an XML search engine interprets these in one way and the user in another? The performance of all runs submitted to INEX 2005 content and structure (CAS) tasks were measured for each of four different interpretations of CAS. Runs that perform well for one interpretation of target elements do so regardless of the interpretation of support elements; but how to interpret the target element does matter. This suggests that to perform well on all CAS queries it is necessary to know how the target structure specification should be interpreted. We extend the NEXI query language to include this, and hypothesize that using this will increase the overall performance of search engines.
    Why structural hints in queries do not help XML-retrieval BIBAFull-Text 711-712
      Andrew Trotman; Mounia Lalmas
    For many years it has been commonly held that a user who adds structural "hints" to a query will improve precision in an element retrieval search. At INEX 2005 we conducted an experiment to test this assumption. We present the unexpected result that structural hints in queries do not improve precision. An analysis of the topics and the judgments suggests that this is because users are particularly bad at giving structural hints.
    Searching the web using composed pages BIBFull-Text 713-714
      Ramakrishna Varadarajan; Vagelis Hristidis; Tao Li
    A study of real-time query expansion effectiveness BIBAFull-Text 715-716
      Ryen W. White; Gary Marchionini
    In this poster, we describe the study of an interface technique that provides a list of suggested additional query terms as a searcher types a search query, in effect offering interactive query expansion (IQE) options while the query is formulated. Analysis of the results shows that offering IQE during query formulation leads to better quality initial queries, and an increased uptake of query expansion. These findings have implications for how IQE should be offered in retrieval interfaces.
    A graph-based framework for relation propagation and its application to multi-label learning BIBAFull-Text 717-718
      Ming Wu; Rong Jin
    Label propagation exploits the structure of the unlabeled documents by propagating the label information of the training documents to the unlabeled documents. The limitation with the existing label propagation approaches is that they can only deal with a single type of objects. We propose a framework, named "relation propagation", that allows for information propagated among multiple types of objects. Empirical studies with multi-label text categorization showed that the proposed algorithm is more effective than several semi-supervised learning algorithms in that it is capable of exploring the correlation among different categories and the structure of unlabeled documents simultaneously.
    Measuring similarity of semi-structured documents with context weights BIBAFull-Text 719-720
      Christopher C. Yang; Nan Liu
    In this work, we study similarity measures for text-centric XML documents based on an extended vector space model, which considers both document content and structure. Experimental results based on a benchmark showed superior performance of the proposed measure over the baseline which ignores structural knowledge of XML documents.
    Incorporating query difference for learning retrieval functions in information retrieval BIBAFull-Text 721-722
      Hongyuan Zha; Zhaohui Zheng; Haoying Fu; Gordon Sun
    We discuss information retrieval methods that aim at serving a diverse stream of user queries. We propose methods that emphasize the importance of taking into consideration of query difference in learning effective retrieval functions. We formulate the problem as a multi-task learning problem using a risk minimization framework. In particular, we show how to calibrate the empirical risk to incorporate query difference in terms of introducing nuisance parameters in the statistical models, and we also propose an alternating optimization method to simultaneously learn the retrieval function and the nuisance parameters. We illustrate the effectiveness of the proposed methods using modeling data extracted from a commercial search engine.
    Concept-based biomedical text retrieval BIBAFull-Text 723-724
      Ming Zhong; Xiangji Huang
    One challenging problem for biomedical text retrieval is to find accurate synonyms or name variants for biomedical entities. In this paper, we propose a new concept-based approach to tackle this problem. In this approach, a set of concepts instead of keywords will be extracted from a query first. Then these concepts will be used for retrieval purpose. The experiment results show that the proposed approach can boost the retrieval performance and it generates very good results on 2005 TREC Genomics data sets.


    The TIJAH XML information retrieval system BIBKFull-Text 725
      Henk Ernst Blok; Vojkan Mihajlovic; Georgina Ramirez; Thijs Westerveld; Djoerd Hiemstra; Arjen P. de Vries
    Keywords: XML, databases, information retrieval, region algebra, structured documents
    A location annotation system for personal photos BIBKFull-Text 726
      Chufeng Chen; Michael Oakes; John Tait
    Keywords: annotation, browsing, clustering
    Appraisal navigator BIBAFull-Text 727
      Navendu Garg; Kenneth Bloom; Shlomo Argamon
    Much interesting text on the web consists largely of opinionated or evaluative text, as opposed to directly informative text. The new field of 'sentiment analysis' seeks to characterize such aspects of natural language text, as opposed to just the bare facts. We suggest that 'appraisal expression extraction' should be viewed as a fundamental task for sentiment analysis. We define an 'appraisal expression' to be a piece of text expressing some evaluative stance towards a particular object. The task is to find these elements and characterize the type and orientation (positive or negative) of the evaluative stance, as well as its target and possibly its source. Potential applications of these methods include new approaches to the now-traditional tasks of sentiment classification and pinion mining, as well as possibly for adversarial textual analysis and intention detection for intelligence applications.
    A platform for Okapi-based contextual information retrieval BIBAFull-Text 728
      Xiangji Huang; Miao Wen; Aijun An; Yan-Rui Huang
    We present an extensible java-based platform for contextual retrieval based on the probabilistic information retrieval model. Modules for dual indexes, relevance feedback with blind or machine learning approaches and query expansion with context are integrated into the Okapi system to deal with the contextual information. This platform allows easy extension to include other types of contextual information.
    Project contexts to situate personal information BIBAFull-Text 729
      William Jones; Harry Bruce; Austin Foxley
    The Personal Project Planner prototype works as an extension to the file manager to provide people with rich-text overlays to their information (folders, files and also email, web pages, notes). Rich-text, document-like project plans can be created which then provide a context in which to create or reference the email messages, electronic documents, web pages, etc. that are needed to complete the plan. The user can later locate an information item such as an email message with reference to the plan (e.g., as an alternative to a mostly context-free search through the inbox or sent mail). The Planner explores a possibility that an effective organization of project-related information can emerge as a natural by-product of efforts to plan and structure the project.
    Cheshire3: retrieving from tera-scale grid-based digital libraries BIBFull-Text 730
      Ray R. Larson; Robert Sanderson
    DeWild: a tool for searching the web using wild cards BIBKFull-Text 731
      Haobin Li; Davood Rafiei
    Keywords: DeWild, data extraction, web search
    Searching for expertise using the terrier platform BIBKFull-Text 732
      Craig Macdonald; Iadh Ounis
    Keywords: expert finding, expert search information retrieval, expertise modelling, terrier
    DiLight: an ontology-based information access system for e-learning environments BIBKFull-Text 733
      Ming Mao; Yefei Peng; Daqing He
    Keywords: DiLight, e-Learn, information access, ontology
    Supporting semantic visual feature browsing in content-based video retrieval BIBAFull-Text 734
      Xiangming Mu
    A new shot level video retrieval system that supports semantic visual features (e.g., car, mountain, and fire) browsing is developed to facilitate content-based retrieval. The video's binary semantic feature vector is utilized to calculate the score of similarity between two shot keyframes. The score is then used to browse the "similar" keyframes in terms of semantic visual features.
    MathFind: a math-aware search engine BIBKFull-Text 735
      Rajesh Munavalli; Robert Miner
    Keywords: MathML, equation editor, math search