
Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Fullname: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Editors: Charles L. A. Clarke; Norbert Fuhr; Noriko Kando; Wessel Kraaij; Arjen P. de Vries
Location: Amsterdam, The Netherlands
Dates: 2007-Jul-23 to 2007-Jul-27
Publisher: ACM
Standard No: ISBN: 978-1-59593-597-7; ISBN: 1-59593-597-5; ACM Order Number: 606070
Papers: 221
Pages: 928
  1. Keynote
  2. Personalization
  3. Routing and filtering
  4. Evaluation I
  5. Classification and clustering
  6. Image retrieval
  7. Summaries
  8. Users and the web
  9. Managing memory
  10. Topic detection and tracking
  11. Web IR I
  12. Interaction
  13. Learning to rank I
  14. Formal models
  15. Question answering
  16. Evaluation II
  17. Learning to rank II
  18. Spam spam spam
  19. Music retrieval
  20. Multi-lingual IR
  21. Link analysis
  22. Collection representation in distributed IR
  23. Index structures
  24. Web IR II
  25. Evaluation III
  26. Combination and fusion
  27. Spoken document retrieval
  28. Domain specific NLP
  29. Query processing strategies
  30. Posters
  31. Demonstrations
  32. Doctoral consortium

Keynote

Strategy follows technology (p. 1)
  Edwin van Huis
In strategic management there has been a long-running debate. As early as 1962, Alfred Chandler stated: structure follows strategy. In the 1980s, Michael Porter modified Chandler's dictum about structure following strategy by introducing a second level of structure: organizational structure follows strategy, which in turn follows structure. So the question became: what leads what?
   Technology has, in this debate, been seen either as part of the structure of the organisation itself, or as part of the development of the environment in which the organisation tries to survive by adapting. The notion that technological advancement can also change the paradigm of organisational strategy development is new. This has mainly to do with the impact of technological change on the workflow and procedures of organisations: never before has it been as profound as it is today.
   Technological change affects us on different levels of our strategic development. I will give three examples of changes that are occurring or have occurred in "Sound and Vision".
   The first is the introduction of RFID transmitters in admission rings for the Sound and Vision experience. The second is the setup of a back office media asset management, storage and distribution structure for the Public Broadcasters. The third is the development of the archive towards becoming a Media-Application Service Provider.
2007 Athena Lecturer Award introduction (p. 2)
  Karen Spärck Jones
Karen was aware of the severity of her illness. Within days of being informed of the Athena Award outcome and of the award of the BCS Lovelace Medal (she received both notifications on the same day in late February), Karen set about creating a video record of her acceptance presentation. Karen passed away on 4 April 2007, and was unable to realize her goal of attending SIGIR 2007 and accepting the Athena Lecturer Award in person. Please join with us as we remember the rich life and many contributions of Karen Spärck Jones -- scholar, academic, and friend.
Further Information:
  • http://www.cl.cam.ac.uk/~ksj21/
  • http://en.wikipedia.org/wiki/Karen_Sparck_Jones
  • http://campus.acm.org/public/pressroom/press_releases/3_2007/athena2007.cfm
  • http://www.cl.cam.ac.uk/misc/obituaries/sparck-jones/
  • Natural language and the information layer (pp. 3-6)
      Karen Spärck Jones
    This talk is in response to two Awards: the Association for Computing Machinery's Athena Award, given by the ACM's Committee on Women, on the nomination of the ACM Special Interest Group on Information Retrieval; and the British Computer Society's Lovelace Medal. It is a very great honour to have been given these awards. I would like to say how much I appreciate this recognition. Thank you, ACM and BCS. I would particularly like to say, and I hope the ACM will not take this amiss, how I appreciate being the first woman to be awarded the BCS Lovelace Medal. The awards carry the opportunity to give a lecture with them. I deeply regret not being able to do this live and in a way to suit each specifically. But I hope the single video based on this talk will go a little way as a substitute for two proper lectures.
       My talk has three parts: on the first phase of natural language processing research and its lessons; on subsequent developments up to the present and their lessons; and on where we are now and what I think are the wider implications for the future.

    Personalization

     Personalized query expansion for the web (pp. 7-14)
      Paul-Alexandru Chirita; Claudiu S. Firan; Wolfgang Nejdl
     The inherent ambiguity of short keyword queries demands enhanced methods for Web retrieval. In this paper we propose to improve such Web queries by expanding them with terms collected from each user's Personal Information Repository, thus implicitly personalizing the search output. We introduce five broad techniques for generating the additional query keywords by analyzing user data at increasing granularity levels, ranging from term and compound level analysis up to global co-occurrence statistics, as well as the use of external thesauri. Our extensive empirical analysis under four different scenarios shows some of these approaches to perform very well, especially on ambiguous queries, producing a very strong increase in the quality of the output rankings. Subsequently, we move this personalized search framework one step further and propose to make the expansion process adaptive to various features of each query. A separate set of experiments indicates that the adaptive algorithms bring an additional statistically significant improvement over the best static expansion approach.
     Using query contexts in information retrieval (pp. 15-22)
      Jing Bai; Jian-Yun Nie; Guihong Cao; Hugues Bouchard
     The user query is one element that specifies an information need, but it is not the only one. Studies in the literature have found many contextual factors that strongly influence the interpretation of a query. Recent studies have tried to consider the user's interests by creating a user profile. However, a single profile for a user may not be sufficient for the variety of that user's queries. In this study, we propose to use query-specific contexts instead of user-centric ones, including context around the query and context within the query. The former specifies the environment of a query, such as the domain of interest, while the latter refers to context words within the query itself, which are particularly useful for selecting relevant term relations. In this paper, both types of context are integrated into an IR model based on language modeling. Our experiments on several TREC collections show that each of the context factors brings significant improvements in retrieval effectiveness.
     Towards task-based personal information management evaluations (pp. 23-30)
      David Elsweiler; Ian Ruthven
     Personal Information Management (PIM) is a rapidly growing area of research concerned with how people store, manage and re-find information. A feature of PIM research is that many systems have been designed to help users manage and re-find information, but very few have been evaluated. This has been noted by several scholars and explained by the difficulties involved in performing PIM evaluations. The difficulties include the fact that people re-find information from within unique personal collections, that researchers know little about the tasks that cause people to re-find information, and that personal information raises numerous privacy issues. In this paper we aim to facilitate PIM evaluations by addressing each of these difficulties. In the first part, we present a diary study of information re-finding tasks. The study examines the kinds of tasks that require users to re-find information and produces a taxonomy of re-finding tasks for email messages and web pages. In the second part, we propose a task-based evaluation methodology based on our findings and examine the feasibility of the approach using two different methods of task creation.

    Routing and filtering

     Utility-based information distillation over temporally sequenced documents (pp. 31-38)
      Yiming Yang; Abhimanyu Lad; Ni Lao; Abhay Harpale; Bryan Kisiel; Monica Rogati
    This paper examines a new approach to information distillation over temporally ordered documents, and proposes a novel evaluation scheme for such a framework. It combines the strengths of and extends beyond conventional adaptive filtering, novelty detection and non-redundant passage ranking with respect to long-lasting information needs ("tasks" with multiple queries). Our approach supports fine-grained user feedback via highlighting of arbitrary spans of text, and leverages such information for utility optimization in adaptive settings. For our experiments, we defined hypothetical tasks based on news events in the TDT4 corpus, with multiple queries per task. Answer keys (nuggets) were generated for each query and a semi-automatic procedure was used for acquiring rules that allow automatically matching nuggets against system responses. We also propose an extension of the NDCG metric for assessing the utility of ranked passages as a combination of relevance and novelty. Our results show encouraging utility enhancements using the new approach, compared to the baseline systems without incremental learning or the novelty detection components.
     Effective missing data prediction for collaborative filtering (pp. 39-46)
      Hao Ma; Irwin King; Michael R. Lyu
     Memory-based collaborative filtering algorithms have been widely adopted in many popular recommender systems, although these approaches all suffer from data sparsity and poor prediction quality. Usually, the user-item matrix is quite sparse, which directly leads to inaccurate recommendations. This paper focuses on two crucial factors in memory-based collaborative filtering: (1) similarity computation between users or items, and (2) missing data prediction. First, we enhance the Pearson Correlation Coefficient (PCC) algorithm by adding one parameter, which overcomes the potential decrease in accuracy when computing the similarity of users or items. Second, we propose an effective missing data prediction algorithm that takes information about both users and items into account. In this algorithm, we set similarity thresholds for users and items respectively, and the prediction algorithm determines whether or not to predict the missing data. We also address how to predict the missing data by employing a combination of user and item information. Finally, empirical studies on the MovieLens dataset show that our newly proposed method outperforms other state-of-the-art collaborative filtering algorithms and is more robust against data sparsity.
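The enhanced similarity described above can be sketched as follows. This is an illustrative reading only: the damping form and the parameter name `gamma` are assumptions, not the authors' exact formulation.

```python
import math

def pcc_similarity(ratings_a, ratings_b, gamma=25):
    """Pearson correlation between two users' ratings, damped by a
    significance-weighting parameter when few items are co-rated."""
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den = (math.sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
           * math.sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common)))
    if den == 0:
        return 0.0
    # Damp the similarity when the co-rated set is small; this is one common
    # way to add a single corrective parameter to plain PCC.
    return (min(len(common), gamma) / gamma) * (num / den)
```

Without the damping term, two users who co-rate only a handful of items can receive a spuriously perfect similarity; the `min(n, gamma)/gamma` factor shrinks such estimates toward zero.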
     Efficient Bayesian hierarchical user modeling for recommendation system (pp. 47-54)
      Yi Zhang; Jonathan Koren
    A content-based personalized recommendation system learns user specific profiles from user feedback so that it can deliver information tailored to each individual user's interest. A system serving millions of users can learn a better user profile for a new user, or a user with little feedback, by borrowing information from other users through the use of a Bayesian hierarchical model. Learning the model parameters to optimize the joint data likelihood from millions of users is very computationally expensive. The commonly used EM algorithm converges very slowly due to the sparseness of the data in IR applications. This paper proposes a new fast learning technique to learn a large number of individual user profiles. The efficacy and efficiency of the proposed algorithm are justified by theory and demonstrated on actual user data from Netflix and MovieLens.

    Evaluation I

     Robust test collections for retrieval evaluation (pp. 55-62)
      Ben Carterette
    Low-cost methods for acquiring relevance judgments can be a boon to researchers who need to evaluate new retrieval tasks or topics but do not have the resources to make thousands of judgments. While these judgments are very useful for a one-time evaluation, it is not clear that they can be trusted when re-used to evaluate new systems. In this work, we formally define what it means for judgments to be reusable: the confidence in an evaluation of new systems can be accurately assessed from an existing set of relevance judgments. We then present a method for augmenting a set of relevance judgments with relevance estimates that require no additional assessor effort. Using this method practically guarantees reusability: with as few as five judgments per topic taken from only two systems, we can reliably evaluate a larger set of ten systems. Even the smallest sets of judgments can be useful for evaluation of new systems.
     Reliable information retrieval evaluation with incomplete and biased judgements (pp. 63-70)
      Stefan Büttcher; Charles L. A. Clarke; Peter C. K. Yeung; Ian Soboroff
    Information retrieval evaluation based on the pooling method is inherently biased against systems that did not contribute to the pool of judged documents. This may distort the results obtained about the relative quality of the systems evaluated and thus lead to incorrect conclusions about the performance of a particular ranking technique.
       We examine the magnitude of this effect and explore how it can be countered by automatically building an unbiased set of judgements from the original, biased judgements obtained through pooling. We compare the performance of this method with other approaches to the problem of incomplete judgements, such as bpref, and show that the proposed method leads to higher evaluation accuracy, especially if the set of manual judgements is rich in documents, but highly biased against some systems.
     Alternatives to Bpref (pp. 71-78)
      Tetsuya Sakai
    Recently, a number of TREC tracks have adopted a retrieval effectiveness metric called bpref which has been designed for evaluation environments with incomplete relevance data. A graded-relevance version of this metric called rpref has also been proposed. However, we show that the application of Q-measure, normalised Discounted Cumulative Gain (nDCG) or Average Precision (AveP) to condensed lists, obtained by filtering out all unjudged documents from the original ranked lists, is actually a better solution to the incompleteness problem than bpref. Furthermore, we show that the use of graded relevance boosts the robustness of IR evaluation to incompleteness and therefore that Q-measure and nDCG based on condensed lists are the best choices. To this end, we use four graded-relevance test collections from NTCIR to compare ten different IR metrics in terms of system ranking stability and pairwise discriminative power.
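The condensed-list idea is simple to sketch: unjudged documents are removed from the ranking before a standard metric is applied. A minimal illustration for nDCG follows; the logarithmic discount and linear gain used here are common defaults, not necessarily the exact formulation in the paper.

```python
import math

def condensed_ndcg(ranked, qrels, k=10):
    """nDCG over a condensed list: unjudged documents are filtered out of
    the ranking before scoring. `qrels` maps doc id -> graded relevance."""
    condensed = [d for d in ranked if d in qrels]        # drop unjudged docs
    gains = [qrels[d] for d in condensed[:k]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

Note how a system penalized by bpref for ranking unjudged documents highly is unaffected here: only the relative order of the judged documents matters.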

    Classification and clustering

     An interactive algorithm for asking and incorporating feature feedback into support vector machines (pp. 79-86)
      Hema Raghavan; James Allan
    Standard machine learning techniques typically require ample training data in the form of labeled instances. In many situations it may be too tedious or costly to obtain sufficient labeled data for adequate classifier performance. However, in text classification, humans can easily guess the relevance of features, that is, words that are indicative of a topic, thereby enabling the classifier to focus its feature weights more appropriately in the absence of sufficient labeled data. We will describe an algorithm for tandem learning that begins with a couple of labeled instances, and then at each iteration recommends features and instances for a human to label. Tandem learning using an "oracle" results in much better performance than learning on only features or only instances. We find that humans can emulate the oracle to an extent that results in performance (accuracy) comparable to that of the oracle. Our unique experimental design helps factor out system error from human error, leading to a better understanding of when and why interactive feature selection works.
     Learn from web search logs to organize search results (pp. 87-94)
      Xuanhui Wang; ChengXiang Zhai
     Effective organization of search results is critical for improving the utility of any search engine. Clustering search results is an effective way to organize them, allowing a user to navigate to relevant documents quickly. However, two deficiencies of this approach keep it from always working well: (1) the clusters discovered do not necessarily correspond to the interesting aspects of a topic from the user's perspective; and (2) the cluster labels generated are not informative enough to allow a user to identify the right cluster. In this paper, we propose to address these two deficiencies by (1) learning "interesting aspects" of a topic from Web search logs and organizing search results accordingly; and (2) generating more meaningful cluster labels using past query words entered by users. We evaluate our proposed method on commercial search engine log data. Compared with traditional methods of clustering search results, our method gives better result organization and more meaningful labels.
     Regularized clustering for documents (pp. 95-102)
      Fei Wang; Changshui Zhang; Tao Li
     In recent years, document clustering has been receiving more and more attention as an important and fundamental technique for unsupervised document organization, automatic topic extraction, and fast information retrieval or filtering. In this paper, we propose a novel method for clustering documents using regularization. Unlike traditional globally regularized clustering methods, our method first constructs a local regularized linear label predictor for each document vector, and then combines all those local regularizers with a global smoothness regularizer. We therefore call our algorithm Clustering with Local and Global Regularization (CLGR). We show that the cluster memberships of the documents can be obtained by eigenvalue decomposition of a sparse symmetric matrix, which can be efficiently computed by iterative methods. Finally, experimental evaluations on several datasets are presented to show the superiority of CLGR over traditional document clustering methods.

    Image retrieval

     Towards automatic extraction of event and place semantics from Flickr tags (pp. 103-110)
      Tye Rattenbury; Nathaniel Good; Mor Naaman
    We describe an approach for extracting semantics of tags, unstructured text-labels assigned to resources on the Web, based on each tag's usage patterns. In particular, we focus on the problem of extracting place and event semantics for tags that are assigned to photos on Flickr, a popular photo sharing website that supports time and location (latitude/longitude) metadata. We analyze two methods inspired by well-known burst-analysis techniques and one novel method: Scale-structure Identification. We evaluate the methods on a subset of Flickr data, and show that our Scale-structure Identification method outperforms the existing techniques. The approach and methods described in this work can be used in other domains such as geo-annotated web pages, where text terms can be extracted and associated with usage patterns.
     Hierarchical classification for automatic image annotation (pp. 111-118)
      Jianping Fan; Yuli Gao; Hangzai Luo
     In this paper, a hierarchical classification framework is proposed for bridging the semantic gap effectively and achieving multi-level image annotation automatically. First, the semantic gap between low-level computable visual features and users' real information needs is partitioned into four smaller gaps, and multiple approaches are proposed to bridge these smaller gaps more effectively. To learn more reliable contextual relationships between the atomic image concepts and the co-appearances of salient objects, a multi-modal boosting algorithm is proposed. To enable hierarchical image classification and avoid inter-level error transmission, a hierarchical boosting algorithm is proposed, incorporating concept ontology and multi-task learning to achieve hierarchical image classifier training with automatic error recovery. To bridge the gap between computable image concepts and users' real information needs, a novel hyperbolic visualization framework is seamlessly incorporated to enable intuitive query specification and evaluation by giving users a good global view of large-scale image collections. Our experiments on large-scale image databases have obtained very positive results.
     Laplacian optimal design for image retrieval (pp. 119-126)
      Xiaofei He; Wanli Min; Deng Cai; Kun Zhou
     Relevance feedback is a powerful technique to enhance Content-Based Image Retrieval (CBIR) performance. It solicits the user's relevance judgments on the retrieved images returned by the CBIR systems. The user's labeling is then used to learn a classifier to distinguish between relevant and irrelevant images. However, the top returned images may not be the most informative ones. The challenge is thus to determine which unlabeled images would be the most informative (i.e., improve the classifier the most) if they were labeled and used as training samples. In this paper, we propose a novel active learning algorithm, called Laplacian Optimal Design (LOD), for relevance feedback image retrieval. Our algorithm is based on a regression model which minimizes the least square error on the measured (or, labeled) images and simultaneously preserves the local geometrical structure of the image space. Specifically, we assume that if two images are sufficiently close to each other, then their measurements (or, labels) are close as well. By constructing a nearest neighbor graph, the geometrical structure of the image space can be described by the graph Laplacian. We discuss how results from the field of optimal experimental design may be used to guide our selection of a subset of images which gives us the greatest amount of information. Experimental results on the Corel database suggest that the proposed approach achieves higher precision in relevance feedback image retrieval.
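The graph Laplacian that encodes the local geometry of the image space can be sketched as follows. This is a generic unnormalized Laplacian over a symmetrized k-nearest-neighbor graph with 0/1 edge weights; the neighborhood size and weighting scheme are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def knn_laplacian(X, k=2):
    """Unnormalized graph Laplacian L = D - W of a symmetrized
    k-nearest-neighbor graph over the rows of X (assumes distinct rows)."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]                # index 0 is the point itself
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)                               # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    return D - W
```

The resulting `L` is symmetric and positive semi-definite, so a quadratic form `f.T @ L @ f` penalizes label functions that differ across neighboring images, which is the smoothness intuition the abstract describes.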

    Summaries

     Fast generation of result snippets in web search (pp. 127-134)
      Andrew Turpin; Yohannes Tsegay; David Hawking; Hugh E. Williams
     The presentation of query biased document snippets as part of results pages presented by search engines has become an expectation of search engine users. In this paper we explore the algorithms and data structures required as part of a search engine to allow efficient generation of query biased snippets. We begin by proposing and analysing a document compression method that reduces snippet generation time by 58% over a baseline using the zlib compression library. These experiments reveal that finding documents on secondary storage dominates the total cost of generating snippets, and so caching documents in RAM is essential for a fast snippet generation process. Using simulation, we examine snippet generation performance for different size RAM caches. Finally, we propose and analyse document reordering and compaction, revealing a scheme that increases the number of document cache hits with only a marginal effect on snippet quality. This scheme effectively doubles the number of documents that can fit in a fixed size cache.
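The general idea of per-document compression against a shared vocabulary can be illustrated with zlib's preset-dictionary feature, which lets short documents borrow a collection-wide dictionary of common terms. This is only a sketch of the concept; the paper's actual compression scheme is its own design, not this one.

```python
import zlib

def compress_with_dict(doc: bytes, zdict: bytes) -> bytes:
    """Compress one document against a preset dictionary of terms that are
    common across the collection, so even short documents compress well."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(doc) + c.flush()

def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Inverse of compress_with_dict; the same dictionary must be supplied."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()
```

Because each document decompresses independently, a snippet generator can fetch and decode exactly one cached blob per result, which is what makes RAM caching of compressed documents effective.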
     The influence of caption features on clickthrough patterns in web search (pp. 135-142)
      Charles L. A. Clarke; Eugene Agichtein; Susan Dumais; Ryen W. White
     Web search engines present lists of captions, comprising title, snippet, and URL, to help users decide which search results to visit. Understanding the influence of features of these captions on Web search behavior may help validate algorithms and guidelines for their improved generation. In this paper we develop a methodology to use clickthrough logs from a commercial search engine to study user behavior when interacting with search result captions. The findings of our study suggest that relatively simple caption features, such as the presence of all query terms, the readability of the snippet, and the length of the URL shown in the caption, can significantly influence users' Web search behavior.
     CollabSum: exploiting multiple document clustering for collaborative single document summarizations (pp. 143-150)
      Xiaojun Wan; Jianwu Yang
     Almost all existing methods conduct the summarization task for each document separately, without interaction between documents, under the assumption that the documents are independent of each other. This paper proposes a novel framework called CollabSum for collaborative single-document summarization that makes use of the mutual influences of multiple documents within a cluster context. In this study, CollabSum is implemented by first employing a clustering algorithm to obtain appropriate document clusters and then exploiting a graph-ranking based algorithm for collaborative document summarization within each cluster. Both within-document and cross-document relationships between sentences are incorporated in the algorithm. Experiments on the DUC2001 and DUC2002 datasets demonstrate the encouraging performance of the proposed approach. Different clustering algorithms have been investigated, and we find that summarization performance depends positively on the quality of the document clusters.

    Users and the web

     Information re-retrieval: repeat queries in Yahoo's logs (pp. 151-158)
      Jaime Teevan; Eytan Adar; Rosie Jones; Michael A. S. Potts
    People often repeat Web searches, both to find new information on topics they have previously explored and to re-find information they have seen in the past. The query associated with a repeat search may differ from the initial query but can nonetheless lead to clicks on the same results. This paper explores repeat search behavior through the analysis of a one-year Web query log of 114 anonymous users and a separate controlled survey of an additional 119 volunteers. Our study demonstrates that as many as 40% of all queries are re-finding queries. Re-finding appears to be an important behavior for search engines to explicitly support, and we explore how this can be done. We demonstrate that changes to search engine results can hinder re-finding, and provide a way to automatically detect repeat searches and predict repeat clicks.
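A rough sketch of detecting repeat searches in a log follows. The log format and the order-insensitive query normalization are illustrative assumptions, not the detection method used in the study.

```python
def find_repeat_queries(log):
    """Flag events whose normalized (user, query) was seen before -- a crude
    proxy for re-finding. `log` is a time-ordered list of
    (user, query, clicked_url) tuples (a hypothetical format)."""
    seen = {}
    repeats = []
    for t, (user, query, click) in enumerate(log):
        # Lowercase and sort terms so "pie apple" matches "apple pie".
        key = (user, " ".join(sorted(query.lower().split())))
        if key in seen:
            repeats.append((t, query, click == seen[key]))  # same click again?
        else:
            seen[key] = click
    return repeats
```

The third field of each flagged event records whether the repeat led to the same click, the signal the abstract highlights: a differently worded query can still be a re-finding query if it lands on the same result.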
     Studying the use of popular destinations to enhance web search interaction (pp. 159-166)
      Ryen W. White; Mikhail Bilenko; Silviu Cucerzan
    We present a novel Web search interaction feature which, for a given query, provides links to websites frequently visited by other users with similar information needs. These popular destinations complement traditional search results, allowing direct navigation to authoritative resources for the query topic. Destinations are identified using the history of search and browsing behavior of many users over an extended time period, whose collective behavior provides a basis for computing source authority. We describe a user study which compared the suggestion of destinations with the previously proposed suggestion of related queries, as well as with traditional, unaided Web search. Results show that search enhanced by destination suggestions outperforms other systems for exploratory tasks, with best performance obtained from mining past user behavior at query-level granularity.
     Neighborhood restrictions in geographic IR (pp. 167-174)
      Steven Schockaert; Martine De Cock
     Geographic information retrieval (GIR) systems allow users to specify a geographic context, in addition to a more traditional query, enabling the system to pinpoint interesting search results whose relevancy is location-dependent. In particular, local search services have become a widely used mechanism to find businesses, such as hotels, restaurants, and shops, which satisfy a geographical restriction. Unfortunately, many useful types of geographic restrictions are currently not supported in these systems, including restrictions that specify the neighborhood in which the business should be located. As the boundaries of city neighborhoods are not readily available, automated techniques to construct representations of the spatial extent of neighborhoods are required to support this kind of restriction. In this paper, we propose such a technique, using fuzzy footprints to cope with the inherent vagueness of most neighborhood boundaries, and we provide experimental results that demonstrate the potential of our technique in a local search setting.

    Managing memory

     Efficient document retrieval in main memory (pp. 175-182)
      Trevor Strohman; W. Bruce Croft
    Disk access performance is a major bottleneck in traditional information retrieval systems. Compared to system memory, disk bandwidth is poor, and seek times are worse.
       We circumvent this problem by considering query evaluation strategies in main memory. We show how new accumulator trimming techniques combined with inverted list skipping can produce extremely high performance retrieval systems without resorting to methods that may harm effectiveness.
       We evaluate our techniques using Galago, a new retrieval system designed for efficient query processing. Our system achieves a 69% improvement in query throughput over previous methods.
     The impact of caching on search engines (pp. 183-190)
      Ricardo Baeza-Yates; Aristides Gionis; Flavio Junqueira; Vanessa Murdock; Vassilis Plachouras; Fabrizio Silvestri
     In this paper we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs. caching posting lists. Using a query log spanning a whole year, we explore the limitations of caching and demonstrate that caching posting lists can achieve higher hit rates than caching query answers. We propose a new algorithm for static caching of posting lists, which outperforms previous methods. We also study the problem of finding the optimal way to split the static cache between answers and posting lists. Finally, we measure how changes in the query log affect the effectiveness of static caching, given our observation that the distribution of the queries changes slowly over time. Our results and observations are applicable to different levels of the data-access hierarchy, for instance, for a memory/disk layer or a broker/remote server layer.
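A static posting-list cache in the spirit described above might rank terms by the ratio of query-log frequency to list size and greedily fill a byte budget. This is a sketch under assumed inputs, not the authors' algorithm; the input format is hypothetical.

```python
def select_static_cache(lists, budget):
    """Greedy static selection of posting lists to cache.
    `lists` maps term -> (query_log_freq, list_size_in_bytes);
    terms are ranked by frequency-to-size ratio, then packed into `budget`."""
    ranked = sorted(lists, key=lambda t: lists[t][0] / lists[t][1], reverse=True)
    cache, used = [], 0
    for term in ranked:
        size = lists[term][1]
        if used + size <= budget:
            cache.append(term)
            used += size
    return cache
```

The frequency-to-size ratio captures the knapsack-style trade-off the abstract alludes to: a term queried often but with a short posting list yields more cache hits per byte than a popular term with a huge list.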
     Pruning policies for two-tiered inverted index with correctness guarantee (pp. 191-198)
      Alexandros Ntoulas; Junghoo Cho
     Web search engines maintain large-scale inverted indexes which are queried thousands of times per second by users eager for information. In order to cope with vast query loads, search engines prune their index to keep documents that are likely to be returned as top results, and use this pruned index to compute the first batches of results. While this approach can improve performance by reducing the size of the index, if we compute the top results only from the pruned index we may see a significant degradation in result quality: a document that should be in the top results but was not included in the pruned index will appear only behind the results computed from the pruned index. Given the fierce competition in the online search market, this phenomenon is clearly undesirable.
       In this paper, we study how we can avoid any degradation of result quality due to the pruning-based performance optimization, while still realizing most of its benefit. Our contribution is a number of modifications in the pruning techniques for creating the pruned index and a new result computation algorithm that guarantees that the top-matching pages are always placed at the top search results, even though we are computing the first batch from the pruned index most of the time. We also show how to determine the optimal size of a pruned index and we experimentally evaluate our algorithms on a collection of 130 million Web pages.

    Topic detection and tracking

    Topic segmentation with shared topic detection and alignment of multiple documents BIBAFull-Text 199-206
      Bingjun Sun; Prasenjit Mitra; C. Lee Giles; John Yen; Hongyuan Zha
    Topic detection and tracking and topic segmentation play an important role in capturing the local and sequential information of documents. Previous work in this area usually focuses on single documents, although similar multiple documents are available in many domains. In this paper, we introduce a novel unsupervised method for shared topic detection and topic segmentation of multiple similar documents based on mutual information (MI) and weighted mutual information (WMI), which combines MI with term weights. The basic idea is that the optimal segmentation maximizes MI (or WMI). Our approach can detect shared topics among documents. It can find the optimal boundaries in a document, and align segments among documents at the same time. It can also handle single-document segmentation as a special case of multi-document segmentation and alignment. Our methods can identify and strengthen cue terms that can be used for segmentation, and partially remove stop words by using term weights based on entropy learned from multiple documents. Our experimental results show that our algorithm works well for the tasks of single-document segmentation, shared topic detection, and multi-document segmentation. Utilizing information from multiple documents can tremendously improve the performance of topic segmentation, and using WMI is even better than using MI for multi-document segmentation.
    Analyzing feature trajectories for event detection BIBAFull-Text 207-214
      Qi He; Kuiyu Chang; Ee-Peng Lim
    We consider the problem of analyzing word trajectories in both time and frequency domains, with the specific goal of identifying important and less-reported, periodic and aperiodic words. A set of words with identical trends can be grouped together to reconstruct an event in a completely unsupervised manner. The document frequency of each word across time is treated like a time series, where each element is the document frequency -- inverse document frequency (DFIDF) score at one time point. In this paper, we 1) apply spectral analysis to categorize features by event characteristics: important and less-reported, periodic and aperiodic; 2) model aperiodic features with a Gaussian density and periodic features with Gaussian mixture densities, and subsequently detect each feature's burst by a truncated-Gaussian approach; 3) propose an unsupervised greedy event detection algorithm to detect both aperiodic and periodic events. All of the above methods can be applied to time series data in general. We extensively evaluated our methods on the 1-year Reuters News Corpus [3] and showed that they were able to uncover meaningful aperiodic and periodic events.
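The spectral-analysis step above (deciding whether a word's DFIDF trajectory is periodic, and at what period) can be illustrated with a toy periodogram; the brute-force DFT scan and names below are a simplified sketch of the general idea, not the paper's method:

```python
# Sketch: estimate the dominant period of a DFIDF-like time series by scanning
# a discrete Fourier transform for the frequency with the most power.
import math

def dominant_period(series):
    """Return the dominant period of a time series via a brute-force DFT scan."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):  # skip the DC component
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k  # period in time units

# A trajectory that oscillates with period 8 over 16 time points.
sig = [math.sin(2 * math.pi * t / 8) for t in range(16)]
```

An aperiodic (one-burst) trajectory would instead put its power at the lowest frequencies, which is one way to separate the two feature classes.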
    New event detection based on indexing-tree and named entity BIBAFull-Text 215-222
      Kuo Zhang; Juan Zi; Li Gang Wu
    New Event Detection (NED) aims at detecting, from one or more streams of news stories, those stories that report a new event (i.e., one not reported previously). With the overwhelming volume of news available today, there is an increasing need for NED systems that can detect new events more efficiently and accurately. In this paper we propose a new NED model that speeds up the NED task by dynamically using a news indexing-tree. Moreover, based on the observation that terms of different types affect the NED task differently, we propose two term-reweighting approaches to improve NED accuracy. In the first approach, we adjust term weights dynamically based on previous story clusters; in the second, we employ statistics on training data to learn a named-entity reweighting model for each class of stories. Experimental results on two Linguistic Data Consortium (LDC) datasets, TDT2 and TDT3, show that the proposed model can significantly improve both the efficiency and the accuracy of the NED task, compared to the baseline system and other existing systems.

    Web IR I

    Multiple-signal duplicate detection for search evaluation BIBAFull-Text 223-230
      Scott Huffman; April Lehman; Alexei Stolboushkin; Howard Wong-Toi; Fan Yang; Hein Roehrig
    We consider the problem of duplicate document detection for search evaluation. Given a query and a small number of web results for that query, we show how to detect duplicate web documents with precision 0.91 and recall 0.77. In contrast, Charikar's algorithm, designed for duplicate detection in an indexing pipeline, achieves precision 0.91 but with a recall of 0.58. Our improvement in recall while maintaining high precision comes from combining three ideas. First, because we are only concerned with duplicate detection among results for the same query, the number of pairwise comparisons is small. Therefore we can afford to compute multiple pairwise signals for each pair of documents. A model learned with standard machine-learning techniques improves recall to 0.68 with precision 0.90. Second, most duplicate detection has focused on text analysis of the HTML contents of a document. In some web pages the HTML is not a good indicator of the final contents of the page. We use extended fetching techniques to fill in frames and execute JavaScript. Including signals based on our richer fetches further improves the recall to 0.75 and the precision to 0.91. Finally, we also explore using signals based on the query. Comparing contextual snippets based on the richer fetches improves the recall to 0.77. We show that the overall accuracy of this final model approaches that of human judges.
    Robust classification of rare queries using web knowledge BIBAFull-Text 231-238
      Andrei Z. Broder; Marcus Fontoura; Evgeniy Gabrilovich; Amruta Joshi; Vanja Josifovski; Tong Zhang
    We propose a methodology for building a practical robust query classification system that can identify thousands of query classes with reasonable accuracy, while dealing in real-time with the query volume of a commercial web search engine. We use a blind feedback technique: given a query, we determine its topic by classifying the web search results retrieved by the query. Motivated by the needs of search advertising, we primarily focus on rare queries, which are the hardest from the point of view of machine learning, yet in aggregation account for a considerable fraction of search engine traffic. Empirical evaluation confirms that our methodology yields a considerably higher classification accuracy than previously reported. We believe that the proposed methodology will lead to better matching of online ads to rare queries and overall to a better user experience.
    Random walks on the click graph BIBAFull-Text 239-246
      Nick Craswell; Martin Szummer
    Search engines can record which documents were clicked for which query, and use these query-document pairs as "soft" relevance judgments. However, compared to the true judgments, click logs give noisy and sparse relevance information. We apply a Markov random walk model to a large click log, producing a probabilistic ranking of documents for a given query. A key advantage of the model is its ability to retrieve relevant documents that have not yet been clicked for that query and rank those effectively. We conduct experiments on click logs from image search, comparing our ("backward") random walk model to a different ("forward") random walk, varying parameters such as walk length and self-transition probability. The most effective combination is a long backward walk with high self-transition probability.
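The click-graph model above can be sketched as a short random walk over a bipartite query-document graph with a self-transition probability; the transition scheme below is a simplified illustration of the general idea (the paper's "backward" walk is parameterized differently):

```python
# Sketch: random walk on a click graph. Edges are query-document click counts;
# at each step the walker stays put with probability self_prob, otherwise it
# moves to a neighbor in proportion to clicks.

def walk(clicks, start, steps=3, self_prob=0.9):
    """clicks: node -> {neighbor: click count}; returns node -> probability."""
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = {}
        for node, p in dist.items():
            nxt[node] = nxt.get(node, 0.0) + p * self_prob  # stay put
            total = sum(clicks[node].values())
            for nb, c in clicks[node].items():
                nxt[nb] = nxt.get(nb, 0.0) + p * (1 - self_prob) * c / total
        dist = nxt
    return dist

# Tiny click log: q1 clicked d1 twice and d2 once; d2 was also clicked for q2.
clicks = {"q1": {"d1": 2, "d2": 1}, "q2": {"d2": 1},
          "d1": {"q1": 2}, "d2": {"q1": 1, "q2": 1}}
dist = walk(clicks, "q1", steps=2)
```

Even this toy walk can put probability mass on documents never clicked for the start query, which is the key advantage the abstract highlights.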

    Interaction

    Supporting multiple information-seeking strategies in a single system framework BIBAFull-Text 247-254
      Xiaojun Yuan; Nicholas J. Belkin
    This paper reports on an experiment comparing the retrieval effectiveness of an interactive information retrieval (IIR) system which adapts to support different information seeking strategies, with that of a standard baseline IIR system. The experiment, with 32 subjects each searching on 8 different topics, indicates that using the integrated IIR system resulted in significantly better performance, including user satisfaction with search results, significantly more effective interaction, and significantly better usability than using the baseline system.
    Investigating the querying and browsing behavior of advanced search engine users BIBAFull-Text 255-262
      Ryen W. White; Dan Morris
    One way to help all users of commercial Web search engines be more successful in their searches is to better understand what those users with greater search expertise are doing, and use this knowledge to benefit everyone. In this paper we study the interaction logs of advanced search engine users (and those not so advanced) to better understand how these user groups search. The results show that there are marked differences in the queries, result clicks, post-query browsing, and search success of users we classify as advanced (based on their use of query operators), relative to those classified as non-advanced. Our findings have implications for how advanced users should be supported during their searches, and how their interactions could be used to help searchers of all experience levels find more relevant information and learn improved searching strategies.
    Term feedback for information retrieval with language models BIBAFull-Text 263-270
      Bin Tan; Atulya Velivelli; Hui Fang; ChengXiang Zhai
    In this paper we study term-based feedback for information retrieval in the language modeling approach. With term feedback a user directly judges the relevance of individual terms without interaction with feedback documents, taking full control of the query expansion process. We propose a cluster-based method for selecting terms to present to the user for judgment, as well as effective algorithms for constructing refined query language models from user term feedback. Our algorithms are shown to bring significant improvement in retrieval accuracy over a non-feedback baseline, and achieve comparable performance to relevance feedback. They are helpful even when there are no relevant documents in the top.

    Learning to rank I

    A support vector method for optimizing average precision BIBAFull-Text 271-278
      Yisong Yue; Thomas Finley; Filip Radlinski; Thorsten Joachims
    Machine learning is commonly used to improve ranked retrieval systems. Due to computational difficulties, few learning techniques have been developed to directly optimize for mean average precision (MAP), despite its widespread use in evaluating such systems. Existing approaches optimizing MAP either do not find a globally optimal solution, or are computationally expensive. In contrast, we present a general SVM learning algorithm that efficiently finds a globally optimal solution to a straightforward relaxation of MAP. We evaluate our approach using the TREC 9 and TREC 10 Web Track corpora (WT10g), comparing against SVMs optimized for accuracy and ROCArea. In most cases we show our method to produce statistically significant improvements in MAP scores.
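For reference, the measure being optimized above, average precision, is the mean of the precision values at the ranks of the relevant documents (MAP is its mean over queries); a minimal sketch:

```python
# Sketch: average precision for one ranked list of binary relevance labels.

def average_precision(ranked_rel):
    """ranked_rel: 0/1 relevance labels in ranked order."""
    hits, total = 0, 0.0
    for i, rel in enumerate(ranked_rel, start=1):
        if rel:
            hits += 1
            total += hits / i  # precision at this relevant document's rank
    return total / hits if hits else 0.0
```

The discontinuity of this measure in the document scores is exactly what makes direct optimization hard, motivating the relaxation described above.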
    Ranking with multiple hyperplanes BIBAFull-Text 279-286
      Tao Qin; Xu-Dong Zhang; De-Sheng Wang; Tie-Yan Liu; Wei Lai; Hang Li
    The central problem for many applications in Information Retrieval is ranking, and learning to rank is considered a promising approach for addressing the issue. Ranking SVM, for example, is a state-of-the-art method for learning to rank and has been empirically demonstrated to be effective. In this paper, we study the issue of learning to rank, particularly the approach of using SVM techniques to perform the task. We point out that although Ranking SVM is advantageous, it still has shortcomings. Ranking SVM employs a single hyperplane in the feature space as the model for ranking, which is too simple to tackle complex ranking problems. Furthermore, the training of Ranking SVM is also computationally costly. In this paper, we look at an alternative approach to Ranking SVM, which we call "Multiple Hyperplane Ranker" (MHR), and make comparisons between the two approaches. MHR takes a divide-and-conquer strategy. It employs multiple hyperplanes to rank instances and finally aggregates the ranking results given by the hyperplanes. MHR contains Ranking SVM as a special case, and MHR can overcome the shortcomings from which Ranking SVM suffers. Experimental results on two information retrieval datasets show that MHR can outperform Ranking SVM in ranking.
    A regression framework for learning ranking functions using relative relevance judgments BIBAFull-Text 287-294
      Zhaohui Zheng; Keke Chen; Gordon Sun; Hongyuan Zha
    Effective ranking functions are an essential part of commercial search engines. We focus on developing a regression framework for learning ranking functions for improving relevance of search engines serving diverse streams of user queries. We explore supervised learning methodology from machine learning, and we distinguish two types of relevance judgments used as the training data: 1) absolute relevance judgments arising from explicit labeling of search results; and 2) relative relevance judgments extracted from user click throughs of search results or converted from the absolute relevance judgments. We propose a novel optimization framework emphasizing the use of relative relevance judgments. The main contribution is the development of an algorithm based on regression that can be applied to objective functions involving preference data, i.e., data indicating that a document is more relevant than another with respect to a query. Experiments are carried out using datasets obtained from a commercial search engine. Our results show significant improvements of our proposed methods over some existing methods.

    Formal models

    An exploration of proximity measures in information retrieval BIBAFull-Text 295-302
      Tao Tao; ChengXiang Zhai
    In most existing retrieval models, documents are scored primarily based on various kinds of term statistics such as within-document frequencies, inverse document frequencies, and document lengths. Intuitively, the proximity of matched query terms in a document can also be exploited to promote scores of documents in which the matched query terms are close to each other. Such a proximity heuristic, however, has been largely under-explored in the literature; it is unclear how we can model proximity and incorporate a proximity measure into an existing retrieval model. In this paper, we systematically explore the query term proximity heuristic. Specifically, we propose and study the effectiveness of five different proximity measures, each modeling proximity from a different perspective. We then design two heuristic constraints and use them to guide us in incorporating the proposed proximity measures into an existing retrieval model. Experiments on five standard TREC test collections show that one of the proposed proximity measures is indeed highly correlated with document relevance, and by incorporating it into the KL-divergence language model and the Okapi BM25 model, we can significantly improve retrieval performance.
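One of the simplest proximity measures of the kind studied above is the minimum distance between occurrences of two distinct matched query terms; the sketch below reproduces only this general idea (the paper proposes and evaluates five measures):

```python
# Sketch: minimum pairwise distance between positions of distinct matched
# query terms in a document. Smaller values suggest tighter term proximity.

def min_pair_distance(doc_tokens, query_terms):
    positions = {}
    for i, tok in enumerate(doc_tokens):
        if tok in query_terms:
            positions.setdefault(tok, []).append(i)
    terms = list(positions)
    best = None
    for a in range(len(terms)):
        for b in range(a + 1, len(terms)):
            for i in positions[terms[a]]:
                for j in positions[terms[b]]:
                    d = abs(i - j)
                    if best is None or d < best:
                        best = d
    return best  # None if fewer than two distinct query terms match

doc = "the quick brown fox jumps over the lazy dog".split()
d = min_pair_distance(doc, {"quick", "lazy"})
```

A retrieval model can then add a decreasing function of this distance to the base score, which is the kind of incorporation the heuristic constraints above are meant to guide.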
    Estimation and use of uncertainty in pseudo-relevance feedback BIBAFull-Text 303-310
      Kevyn Collins-Thompson; Jamie Callan
    Existing pseudo-relevance feedback methods typically perform averaging over the top-retrieved documents, but ignore an important statistical dimension: the risk or variance associated with either the individual document models, or their combination. Treating the baseline feedback method as a black box, and the output feedback model as a random variable, we estimate a posterior distribution for the feed-back model by resampling a given query's top-retrieved documents, using the posterior mean or mode as the enhanced feedback model. We then perform model combination over several enhanced models, each based on a slightly modified query sampled from the original query. We find that resampling documents helps increase individual feedback model precision by removing noise terms, while sampling from the query improves robustness (worst-case performance) by emphasizing terms related to multiple query aspects. The result is a meta-feedback algorithm that is both more robust and more precise than the original strong baseline method.
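The document-resampling step above can be illustrated with a toy bootstrap: build a feedback model from each resample of the top-retrieved documents and average them, so the weights of unstable noise terms shrink. The term-weight representation here is a stand-in assumption, not the paper's estimator:

```python
# Sketch: bootstrap the top-retrieved documents and average the per-resample
# feedback models. Terms that appear only in occasional documents (noise)
# receive lower averaged weight than terms stable across resamples.
import random

def bootstrap_feedback(doc_term_weights, rounds=50, seed=0):
    """doc_term_weights: list of {term: weight} for the top-retrieved documents."""
    rng = random.Random(seed)
    n = len(doc_term_weights)
    acc = {}
    for _ in range(rounds):
        sample = [doc_term_weights[rng.randrange(n)] for _ in range(n)]
        for d in sample:
            for t, w in d.items():
                # average across docs in the resample, then across rounds
                acc[t] = acc.get(t, 0.0) + w / n / rounds
    return acc

docs = [{"retrieval": 0.5, "noise": 0.9},
        {"retrieval": 0.6},
        {"retrieval": 0.4}]
model = bootstrap_feedback(docs)
```

The paper additionally samples variants of the query itself and combines the resulting models, which this sketch omits.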
    Latent concept expansion using markov random fields BIBAFull-Text 311-318
      Donald Metzler; W. Bruce Croft
    Query expansion, in the form of pseudo-relevance feedback or relevance feedback, is a common technique used to improve retrieval effectiveness. Most previous approaches have ignored important issues, such as the role of features and the importance of modeling term dependencies. In this paper, we propose a robust query expansion technique based on the Markov random field model for information retrieval. The technique, called latent concept expansion, provides a mechanism for modeling term dependencies during expansion. Furthermore, the use of arbitrary features within the model provides a powerful framework for going beyond simple term occurrence features that are implicitly used by most other expansion techniques. We evaluate our technique against relevance models, a state-of-the-art language modeling query expansion technique. Our model demonstrates consistent and significant improvements in retrieval effectiveness across several TREC data sets. We also describe how our technique can be used to generate meaningful multi-term concepts for tasks such as query suggestion/reformulation.
    A study of Poisson query generation model for information retrieval BIBAFull-Text 319-326
      Qiaozhu Mei; Hui Fang; ChengXiang Zhai
    Many variants of language models have been proposed for information retrieval. Most existing models are based on the multinomial distribution and score documents by the query likelihood computed from a query generation probabilistic model. In this paper, we propose and study a new family of query generation models based on the Poisson distribution. We show that, in their simplest forms, the new family of models and the existing multinomial models are equivalent; with different smoothing methods, however, the two families behave differently. We show that the Poisson model has several advantages, including naturally accommodating per-term smoothing and modeling an accurate background more efficiently. We present several variants of the new model corresponding to different smoothing methods, and evaluate them on four representative TREC test collections. The results show that while the basic models perform comparably, the Poisson model can outperform the multinomial model with per-term smoothing. The performance can be further improved with two-stage smoothing.
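The Poisson query-generation idea can be sketched as a per-term smoothed Poisson log-likelihood; the Jelinek-Mercer-style smoothing and function below are illustrative simplifications, not the paper's exact formulation:

```python
# Sketch: score a query against a document by modeling each query term's count
# as Poisson with a rate smoothed between document and collection statistics.
import math

def poisson_query_loglik(query_counts, doc_counts, doc_len,
                         coll_counts, coll_len, lam=0.5):
    """Log-likelihood of the query counts under smoothed per-term Poisson rates."""
    qlen = sum(query_counts.values())
    score = 0.0
    for w, c in query_counts.items():
        rate = lam * doc_counts.get(w, 0) / doc_len \
             + (1 - lam) * coll_counts.get(w, 0) / coll_len
        mean = rate * qlen  # expected count of w in a query of this length
        score += c * math.log(mean) - mean - math.lgamma(c + 1)
    return score

# A document containing the query term should score higher than one without it.
a = poisson_query_loglik({"ir": 1}, {"ir": 3}, 10, {"ir": 10}, 1000)
b = poisson_query_loglik({"ir": 1}, {}, 10, {"ir": 10}, 1000)
```

Because the rate is set per term, nothing forces a single shared smoothing parameter across the vocabulary, which is the "per-term smoothing" flexibility the abstract emphasizes.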

    Question answering

    Deconstructing nuggets: the stability and reliability of complex question answering evaluation BIBAFull-Text 327-334
      Jimmy Lin; Pengyi Zhang
    A methodology based on "information nuggets" has recently emerged as the de facto standard by which answers to complex questions are evaluated. After several implementations in the TREC question answering tracks, the community has gained a better understanding of its many characteristics. This paper focuses on one particular aspect of the evaluation: the human assignment of nuggets to answer strings, which serves as the basis of the F-score computation. As a byproduct of the TREC 2006 ciQA task, identical answer strings were independently evaluated twice, which allowed us to assess the consistency of human judgments. Based on these results, we explored simulations of assessor behavior that provide a method to quantify scoring variations. Understanding these variations in turn lets researchers be more confident in their comparisons of systems.
    Interesting nuggets and their impact on definitional question answering BIBAFull-Text 335-342
      Kian-Wei Kor; Tat-Seng Chua
    Current approaches to identifying definitional sentences in the context of Question Answering mainly involve the use of linguistic or syntactic patterns to identify informative nuggets. This is insufficient as they do not address the novelty factor that a definitional nugget must also possess. This paper proposes to address the deficiency by building a "Human Interest Model" from external knowledge. It is hoped that such a model will allow the computation of human interest in the sentence with respect to the topic. We compare and contrast our model with current definitional question answering models to show that interestingness plays an important factor in definitional question answering.
    A probabilistic graphical model for joint answer ranking in question answering BIBAFull-Text 343-350
      Jeongwoo Ko; Eric Nyberg; Luo Si
    Graphical models have been applied to various information retrieval and natural language processing tasks in the recent literature. In this paper, we apply a probabilistic graphical model for answer ranking in question answering. This model estimates the joint probability of correctness of all answer candidates, from which the probability of correctness of an individual candidate can be inferred. The joint prediction model can estimate both the correctness of individual answers as well as their correlations, which enables a list of accurate and comprehensive answers. This model was compared with a logistic regression model which directly estimates the probability of correctness of each individual answer candidate. An extensive set of empirical results based on TREC questions demonstrates the effectiveness of the joint model for answer ranking. Furthermore, we combine the joint model with the logistic regression model to improve the efficiency and accuracy of answer ranking.
    Structured retrieval for question answering BIBAFull-Text 351-358
      Matthew W. Bilotti; Paul Ogilvie; Jamie Callan; Eric Nyberg
    Bag-of-words retrieval is popular among Question Answering (QA) system developers, but it does not support constraint checking and ranking on the linguistic and semantic information of interest to the QA system. We present an approach to retrieval for QA, applying structured retrieval techniques to the types of text annotations that QA systems use. We demonstrate that the structured approach can retrieve more relevant results, more highly ranked, compared with bag-of-words, on a sentence retrieval task. We also characterize the extent to which structured retrieval effectiveness depends on the quality of the annotations.

    Evaluation II

    On the robustness of relevance measures with incomplete judgments BIBAFull-Text 359-366
      Tanuja Bompada; Chi-Chao Chang; John Chen; Ravi Kumar; Rajesh Shenoy
    We investigate the robustness of three widely used IR relevance measures for large data collections with incomplete judgments. The relevance measures we consider are the bpref measure introduced by Buckley and Voorhees [7], the inferred average precision (infAP) introduced by Aslam and Yilmaz [4], and the normalized discounted cumulative gain (NDCG) measure introduced by Järvelin and Kekäläinen [8]. Our main results show that NDCG consistently performs better than both bpref and infAP. The experiments are performed on standard TREC datasets, under different levels of incompleteness of judgments, and using two different evaluation methods, namely, Kendall correlation between system rankings and pairwise statistical significance testing; the latter may be of independent interest.
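For reference, NDCG, one of the three measures compared above, normalizes a log-discounted cumulative gain by its ideal value; a minimal sketch (this uses the plain linear-gain form, and gain functions vary across papers):

```python
# Sketch: NDCG over a list of graded relevance gains in ranked order.
import math

def dcg(gains):
    # rank r gets discount log2(r + 1); here i is 0-based, so log2(i + 2)
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains):
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

Because NDCG only discounts judged documents by rank, it degrades gracefully as judgments become incomplete, which is consistent with the robustness result above.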
    Test theory for assessing IR test collections BIBAFull-Text 367-374
      David Bodoff; Pu Li
    How good is an IR test collection? A series of papers in recent years has addressed the question by empirically enumerating the consistency of performance comparisons using alternate subsets of the collection. In this paper we propose using Test Theory, which is based on analysis of variance and is specifically designed to assess test collections. Using the method, we not only can measure test reliability after the fact, but we can estimate the test collection's reliability before it is even built or used. We can also determine an optimal allocation of resources before the fact, e.g. whether to invest in more judges or queries. The method, which is in widespread use in the field of educational testing, complements data-driven approaches to assessing test collections. Whereas the data-driven method focuses on test results, test theory focuses on test designs. It offers unique practical results, as well as insights about the variety and implications of alternative test designs.
    Strategic system comparisons via targeted relevance judgments BIBAFull-Text 375-382
      Alistair Moffat; William Webber; Justin Zobel
    Relevance judgments are used to compare text retrieval systems. Given a collection of documents and queries, and a set of systems being compared, a standard approach to forming judgments is to manually examine all documents that are highly ranked by any of the systems. However, not all of these relevance judgments provide the same benefit to the final result, particularly if the aim is to identify which systems are best, rather than to fully order them. In this paper we propose new experimental methodologies that can significantly reduce the volume of judgments required in system comparisons. Using rank-biased precision, a recently proposed effectiveness measure, we show that judging around 200 documents for each of 50 queries in a TREC-scale system evaluation containing over 100 runs is sufficient to identify the best systems.
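Rank-biased precision, the effectiveness measure used above, has a simple closed form: the expected utility of a user who inspects each next result with persistence probability p. A sketch assuming binary relevance:

```python
# Sketch: rank-biased precision (RBP) with persistence parameter p.

def rbp(rels, p=0.95):
    """rels: 0/1 relevance labels by rank (rank 1 first)."""
    return (1 - p) * sum(r * p ** i for i, r in enumerate(rels))
```

The geometric weights mean unjudged deep ranks contribute a bounded residual, which is what makes the measure attractive when judging only a few hundred documents per query.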

    Learning to rank II

    FRank: a ranking method with fidelity loss BIBAFull-Text 383-390
      Ming-Feng Tsai; Tie-Yan Liu; Tao Qin; Hsin-Hsi Chen; Wei-Ying Ma
    The ranking problem is becoming important in many fields, especially in information retrieval (IR). Many machine learning techniques have been proposed for the ranking problem, such as RankSVM, RankBoost, and RankNet. Among them, RankNet, which is based on a probabilistic ranking framework, has led to promising results and has been applied to a commercial Web search engine. In this paper we conduct further study on the probabilistic ranking framework and provide a novel loss function, named fidelity loss, for measuring the loss of ranking. The fidelity loss not only inherits effective properties of the probabilistic ranking framework in RankNet, but also possesses new properties that are helpful for ranking: the fidelity loss can attain zero for each document pair, and it has a finite upper bound, which is necessary for conducting query-level normalization. We also propose an algorithm named FRank, based on a generalized additive model, for minimizing the fidelity loss and learning an effective ranking function. We evaluated the proposed algorithm on two datasets: a TREC dataset and a real Web search dataset. The experimental results show that the proposed FRank algorithm outperforms other learning-based ranking methods on both the conventional IR problem and Web search.
    AdaRank: a boosting algorithm for information retrieval BIBAFull-Text 391-398
      Jun Xu; Hang Li
    In this paper we address the issue of learning to rank for document retrieval. In the task, a model is automatically created with some training data and then is utilized for ranking of documents. The goodness of a model is usually evaluated with performance measures such as MAP (Mean Average Precision) and NDCG (Normalized Discounted Cumulative Gain). Ideally a learning algorithm would train a ranking model that could directly optimize the performance measures with respect to the training data. Existing methods, however, are only able to train ranking models by minimizing loss functions loosely related to the performance measures. For example, Ranking SVM and RankBoost train ranking models by minimizing classification errors on instance pairs. To deal with the problem, we propose a novel learning algorithm within the framework of boosting, which can minimize a loss function directly defined on the performance measures. Our algorithm, referred to as AdaRank, repeatedly constructs 'weak rankers' on the basis of reweighted training data and finally linearly combines the weak rankers for making ranking predictions. We prove that the training process of AdaRank is exactly that of enhancing the performance measure used. Experimental results on four benchmark datasets show that AdaRank significantly outperforms the baseline methods of BM25, Ranking SVM, and RankBoost.
    A combined component approach for finding collection-adapted ranking functions based on genetic programming BIBAFull-Text 399-406
      Humberto Mossri de Almeida; Marcos André Gonçalves; Marco Cristo; Pável Calado
    In this paper, we propose a new method to discover collection-adapted ranking functions based on Genetic Programming (GP). Our Combined Component Approach (CCA) is based on the combination of several term-weighting components (i.e., term frequency, collection frequency, normalization) extracted from well-known ranking functions. In contrast to related work, the GP terminals in our CCA are not based on simple statistical information of a document collection, but on meaningful, effective, and proven components. Experimental results show that our approach was able to outperform standard TF-IDF, BM25 and another GP-based approach on two different collections. CCA obtained improvements in mean average precision of up to 40.87% for the TREC-8 collection, and 24.85% for the WBR99 collection (a large Brazilian Web collection), over the baseline functions. The CCA evolution process was also able to reduce the over-training commonly found in machine learning methods, especially genetic programming, and to converge faster than the other GP-based approach used for comparison.
    Feature selection for ranking BIBAFull-Text 407-414
      Xiubo Geng; Tie-Yan Liu; Tao Qin; Hang Li
    Ranking is a very important topic in information retrieval. While algorithms for learning ranking models have been intensively studied, this is not the case for feature selection, despite its importance. The reality is that many feature selection methods used in classification are directly applied to ranking. We argue that because of the striking differences between ranking and classification, it is better to develop different feature selection methods for ranking. To this end, we propose a new feature selection method in this paper. Specifically, for each feature we use its value to rank the training instances, and define the ranking accuracy, in terms of a performance measure or a loss function, as the importance of the feature. We also define the correlation between the ranking results of two features as the similarity between them. Based on these definitions, we formulate feature selection as an optimization problem whose goal is to find the features with maximum total importance scores and minimum total similarity scores. We also demonstrate how to solve the optimization problem in an efficient way. We have tested the effectiveness of our feature selection method on two information retrieval datasets and with two ranking models. Experimental results show that our method can outperform traditional feature selection methods for the ranking task.
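The optimization above (maximize total importance, minimize total similarity) admits a simple greedy approximation; the inputs, penalty weighting, and tie handling below are illustrative assumptions, not the paper's exact solution method:

```python
# Sketch: greedily select features with high ranking importance, penalized by
# similarity to features already selected.

def select_features(importance, similarity, k, penalty=1.0):
    """importance: feature -> score; similarity: {frozenset({f, g}): value in [0, 1]}."""
    chosen = []
    candidates = set(importance)
    while candidates and len(chosen) < k:
        def gain(f):
            sim = sum(similarity.get(frozenset((f, c)), 0.0) for c in chosen)
            return importance[f] - penalty * sim
        best = max(candidates, key=gain)
        chosen.append(best)
        candidates.remove(best)
    return chosen

# bm25 and tf rank documents very similarly, so only one of them is kept.
imp = {"bm25": 0.9, "tf": 0.8, "pagerank": 0.5}
sim = {frozenset(("bm25", "tf")): 0.9}
picked = select_features(imp, sim, k=2)
```

In the paper's setting, importance would come from a ranking measure or loss over the training data, and similarity from the agreement (e.g., rank correlation) between the two features' induced rankings.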

    Spam spam spam

    Relaxed online SVMs for spam filtering BIBAFull-Text 415-422
      D. Sculley; Gabriel M. Wachman
    Spam is a key problem in electronic communication, including large-scale email systems and the growing number of blogs. Content-based filtering is one reliable method of combating this threat in its various forms, but some academic researchers and industrial practitioners disagree on how best to filter spam. The former have advocated the use of Support Vector Machines (SVMs) for content-based filtering, as this machine learning methodology gives state-of-the-art performance for text classification. However, similar performance gains have yet to be demonstrated for online spam filtering. Additionally, practitioners cite the high cost of SVMs as reason to prefer faster (if less statistically robust) Bayesian methods. In this paper, we offer a resolution to this controversy. First, we show that online SVMs indeed give state-of-the-art classification performance on online spam filtering on large benchmark data sets. Second, we show that nearly equivalent performance may be achieved by a Relaxed Online SVM (ROSVM) at greatly reduced computational cost. Our results are experimentally verified on email spam, blog spam, and splog detection tasks.
    Know your neighbors: web spam detection using the web topology BIBAFull-Text 423-430
      Carlos Castillo; Debora Donato; Aristides Gionis; Vanessa Murdock; Fabrizio Silvestri
    Web spam can significantly deteriorate the quality of search engine results. Thus there is a large incentive for commercial search engines to detect spam pages efficiently and accurately. In this paper we present a spam detection system that combines link-based and content-based features, and uses the topology of the Web graph by exploiting the link dependencies among the Web pages. We find that linked hosts tend to belong to the same class: either both are spam or both are non-spam. We demonstrate three methods of incorporating the Web graph topology into the predictions obtained by our base classifier: (i) clustering the host graph, and assigning the label of all hosts in the cluster by majority vote, (ii) propagating the predicted labels to neighboring hosts, and (iii) using the predicted labels of neighboring hosts as new features and retraining the classifier. The result is an accurate system for detecting Web spam, tested on a large and public dataset, using algorithms that can be applied in practice to large-scale Web data.
    DiffusionRank: a possible penicillin for web spamming BIBAFull-Text 431-438
      Haixuan Yang; Irwin King; Michael R. Lyu
    While the PageRank algorithm has proven to be very effective for ranking Web pages, the rank scores of Web pages can be manipulated. To handle the manipulation problem and to cast new insight on the Web structure, we propose a ranking algorithm called DiffusionRank. DiffusionRank is motivated by the heat diffusion phenomenon, which can be connected to Web ranking because activity flow on the Web can be imagined as heat flow, the link from one page to another can be treated as a pipe of an air-conditioner, and heat flow can embody the structure of the underlying Web graph. Theoretically we show that DiffusionRank can serve as a generalization of PageRank when the thermal coefficient γ tends to infinity: in that case 1/γ = 0, and DiffusionRank (i.e., PageRank) has a low ability of anti-manipulation. When γ = 0, DiffusionRank obtains the highest ability of anti-manipulation, but in that case the Web structure is completely ignored. Consequently, γ is an interesting factor that can control the balance between the ability of preserving the original Web and the ability of reducing the effect of manipulation. It is found empirically that, when γ = 1, DiffusionRank has a Penicillin-like effect on link manipulation. Moreover, DiffusionRank can be employed to find group-to-group relations on the Web, to divide the Web graph into several parts, and to find link communities. Experimental results show that the DiffusionRank algorithm achieves the above-mentioned advantages as expected.

    Music retrieval

    Towards musical query-by-semantic-description using the CAL500 data set BIBAFull-Text 439-446
      Douglas Turnbull; Luke Barrington; David Torres; Gert Lanckriet
    Query-by-semantic-description (QBSD) is a natural paradigm for retrieving content from large databases of music. A major impediment to the development of good QBSD systems for music information retrieval has been the lack of a cleanly-labeled, publicly-available, heterogeneous data set of songs and associated annotations. We have collected the Computer Audition Lab 500-song (CAL500) data set by having humans listen to and annotate songs using a survey designed to capture 'semantic associations' between music and words. We adapt the supervised multi-class labeling (SML) model, which has shown good performance on the task of image retrieval, and use the CAL500 data to learn a model for music retrieval. The model parameters are estimated using the weighted mixture hierarchies expectation-maximization algorithm which has been specifically designed to handle real-valued semantic association between words and songs, rather than binary class labels. The output of the SML model, a vector of class-conditional probabilities, can be interpreted as a semantic multinomial distribution over a vocabulary. By also representing a semantic query as a query multinomial distribution, we can quickly rank order the songs in a database based on the Kullback-Leibler divergence between the query multinomial and each song's semantic multinomial. Qualitative and quantitative results demonstrate that our SML model can both annotate a novel song with meaningful words and retrieve relevant songs given a multi-word, text-based query.
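    The ranking step described in the abstract above, ordering songs by the Kullback-Leibler divergence between a query multinomial and each song's semantic multinomial, can be sketched as follows. The vocabulary and the distributions here are hypothetical; in the paper both are estimated from the CAL500 annotations.

```python
import math

def kl_divergence(q, p):
    """KL(q || p) for two discrete distributions over the same vocabulary.
    Assumes p has no zero entries (in practice both would be smoothed)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def rank_songs(query_multinomial, song_multinomials):
    """Return song ids sorted by increasing KL divergence (best match first)."""
    return sorted(song_multinomials,
                  key=lambda sid: kl_divergence(query_multinomial,
                                                song_multinomials[sid]))

# Hypothetical 3-word vocabulary, e.g. ("mellow", "guitar", "fast")
query = [0.6, 0.3, 0.1]
songs = {
    "song_a": [0.55, 0.35, 0.10],   # semantically close to the query
    "song_b": [0.10, 0.10, 0.80],   # semantically far from the query
}
print(rank_songs(query, songs))     # song_a should precede song_b
```

    A lower divergence means the song's semantic profile is a better fit for the query, so the sort is ascending.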
    A music search engine built upon audio-based and web-based similarity measures BIBAFull-Text 447-454
      Peter Knees; Tim Pohle; Markus Schedl; Gerhard Widmer
    An approach is presented to automatically build a search engine for large-scale music collections that can be queried through natural language. While existing approaches depend on explicit manual annotations and meta-data assigned to the individual audio pieces, we automatically derive descriptions by making use of methods from Web Retrieval and Music Information Retrieval. Based on the ID3 tags of a collection of mp3 files, we retrieve relevant Web pages via Google queries and use the contents of these pages to characterize the music pieces and represent them by term vectors. By incorporating complementary information about acoustic similarity we are able to both reduce the dimensionality of the vector space and improve the performance of retrieval, i.e. the quality of the results. Furthermore, the usage of audio similarity allows us to also characterize audio pieces when there is no associated information found on the Web.

    Multi-lingual IR

    Building simulated queries for known-item topics: an analysis using six European languages BIBAFull-Text 455-462
      Leif Azzopardi; Maarten de Rijke; Krisztian Balog
    There has been increased interest in the use of simulated queries for evaluation and estimation purposes in Information Retrieval. However, there are still many unaddressed issues regarding their usage and impact on evaluation because their quality, in terms of retrieval performance, is unlike real queries. In this paper, we focus on methods for building simulated known-item topics and explore their quality against real known-item topics. Using existing generation models as our starting point, we explore factors which may influence the generation of the known-item topic. Informed by this detailed analysis (on six European languages) we propose a model with improved document and term selection properties, showing that simulated known-item topics can be generated that are comparable to real known-item topics. This is a significant step towards validating the potential usefulness of simulated queries: for evaluation purposes, and because building models of querying behavior provides a deeper insight into the querying process so that better retrieval mechanisms can be developed to support the user.
    Cross-lingual query suggestion using query logs of different languages BIBAFull-Text 463-470
      Wei Gao; Cheng Niu; Jian-Yun Nie; Ming Zhou; Jian Hu; Kam-Fai Wong; Hsiao-Wuen Hon
    Query suggestion aims to suggest relevant queries for a given query, helping users better specify their information needs. Previously, the suggested terms have mostly been in the same language as the input query. In this paper, we extend this to cross-lingual query suggestion (CLQS): for a query in one language, we suggest similar or relevant queries in other languages. This is very important in scenarios such as cross-language information retrieval (CLIR) and cross-lingual keyword bidding for search engine advertisement. Instead of relying on existing query translation technologies for CLQS, we present an effective means to map the input query of one language to queries of the other language in the query log. Important monolingual and cross-lingual information, such as word translation relations and word co-occurrence statistics, is used to estimate the cross-lingual query similarity with a discriminative model. Benchmarks show that the resulting CLQS system significantly outperforms a baseline system based on dictionary-based query translation. In addition, the resulting CLQS is tested on French-to-English CLIR tasks with TREC collections. The results demonstrate higher effectiveness than traditional query translation methods.

    Link analysis

    Hits on the web: how does it compare? BIBAFull-Text 471-478
      Marc A. Najork; Hugo Zaragoza; Michael J. Taylor
    This paper describes a large-scale evaluation of the effectiveness of HITS in comparison with other link-based ranking algorithms, when used in combination with a state-of-the-art text retrieval algorithm exploiting anchor text. We quantified their effectiveness using three common performance measures: mean reciprocal rank, mean average precision, and normalized discounted cumulative gain. The evaluation is based on two large data sets: a breadth-first search crawl of 463 million web pages containing 17.6 billion hyperlinks and referencing 2.9 billion distinct URLs; and a set of 28,043 queries sampled from a query log, each query having on average 2,383 results, about 17 of which were labeled by judges. We found that HITS outperforms PageRank, but is about as effective as web-page in-degree. The same holds true when any of the link-based features are combined with the text retrieval algorithm. Finally, we studied the relationship between query specificity and the effectiveness of selected features, and found that link-based features perform better for general queries, whereas BM25F performs better for specific queries.
    Hits hits TREC: exploring IR evaluation results with network analysis BIBAFull-Text 479-486
      Stefano Mizzaro; Stephen Robertson
    We propose a novel method of analysing data gathered from TREC or similar information retrieval evaluation experiments. We define two normalized versions of average precision, which we use to construct a weighted bipartite graph of TREC systems and topics. We analyze the meaning of well known -- and somewhat generalized -- indicators from social network analysis on the Systems-Topics graph. We apply this method to an analysis of TREC 8 data; among the results, we find that authority measures system performance, that hubness of topics reveals that some topics are better than others at distinguishing more or less effective systems, that with current measures a system that wants to be effective in TREC needs to be effective on easy topics, and that by using different effectiveness measures this is no longer the case.
    Combining content and link for classification using matrix factorization BIBAFull-Text 487-494
      Shenghuo Zhu; Kai Yu; Yun Chi; Yihong Gong
    The world wide web contains rich textual content interconnected via complex hyperlinks. This huge database violates the assumption, held by most conventional statistical methods, that each web page is an independent and identical sample. It is thus difficult to apply traditional mining or learning methods to web mining problems, e.g., web page classification, while exploiting both the content and the link structure. Research in this direction has recently received considerable attention but is still at an early stage. Though a few methods exploit both the link structure and the content information, some of them combine only the authority information with the content information, while others first decompose the link structure into hub and authority features and then apply these as additional document features. This paper aims to design an algorithm, practically attractive for its great simplicity, that exploits both the content and linkage information by carrying out a joint factorization of the linkage adjacency matrix and the document-term matrix, deriving a new representation for web pages in a low-dimensional factor space without explicitly separating them into content, hub, or authority factors. Further analysis can be performed based on the compact representation of web pages. In the experiments, the proposed method is compared with state-of-the-art methods and demonstrates excellent accuracy in hypertext classification on the WebKB and Cora benchmarks.

    Collection representation in distributed IR

    Federated text retrieval from uncooperative overlapped collections BIBAFull-Text 495-502
      Milad Shokouhi; Justin Zobel
    In federated text retrieval systems, the query is sent to multiple collections at the same time. The results returned by the collections are gathered and ranked by a central broker that presents them to the user. It is usually assumed that the collections have little overlap. However, in practice collections may share many common documents as either exact or near duplicates, potentially leading to high numbers of duplicates in the final results. Considering the natural bandwidth restrictions and efficiency issues of federated search, sending queries to redundant collections leads to unnecessary costs. We propose a novel method for estimating the rate of overlap among collections based on sampling. Then, using the estimated overlap statistics, we propose two collection selection methods that aim to maximize the number of unique relevant documents in the final results. We show experimentally that, although our estimates of overlap are not exact, our suggested techniques can significantly improve search effectiveness when collections overlap.
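    A toy illustration of sampling-based overlap estimation, assuming that uncooperative collections can only be observed through sampled documents and that documents can be fingerprinted (e.g., hashed) for duplicate detection. The paper's actual estimator, and its scaling from samples to full collections, are more involved than this sketch.

```python
def sample_overlap_rate(sample_a, sample_b, fingerprint=hash):
    """Fraction of documents in sample_a whose fingerprint also occurs in
    sample_b: a crude proxy for the overlap rate between two collections
    from which the samples were drawn."""
    if not sample_a:
        return 0.0
    fingerprints_b = {fingerprint(doc) for doc in sample_b}
    hits = sum(1 for doc in sample_a if fingerprint(doc) in fingerprints_b)
    return hits / len(sample_a)
```

    A broker could compute this for every pair of sampled collections and then avoid forwarding queries to collections whose estimated overlap with an already-selected collection is high.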
    Evaluating sampling methods for uncooperative collections BIBAFull-Text 503-510
      Paul Thomas; David Hawking
    Many server selection methods suitable for distributed information retrieval applications rely, in the absence of cooperation, on the availability of unbiased samples of documents from the constituent collections. We describe a number of sampling methods which depend only on the normal query-response mechanism of the applicable search facilities. We evaluate these methods on a number of collections typical of a personal metasearch application. Results demonstrate that biases exist for all methods, particularly toward longer documents, and that in some cases these biases can be reduced but not eliminated by choice of parameters.
       We also introduce a new sampling technique, "multiple queries", which produces samples of similar quality to the best current techniques but with significantly reduced cost.
    Updating collection representations for federated search BIBAFull-Text 511-518
      Milad Shokouhi; Mark Baillie; Leif Azzopardi
    To facilitate the search for relevant information across a set of online distributed collections, a federated information retrieval system typically represents each collection, centrally, by a set of vocabularies or sampled documents. Accurate retrieval therefore depends on how precisely each representation reflects the underlying content stored in that collection. As collections evolve over time, collection representations should also be updated to reflect any change; however, no solution for this has yet been proposed. In this study we examine the implications of out-of-date representation sets for retrieval accuracy, and propose three different policies for managing the necessary updates. Each policy is evaluated on a testbed of forty-four dynamic collections over an eight-week period. Our findings show that out-of-date representations significantly degrade performance over time; however, adopting a suitable update policy can minimise this problem.

    Index structures

    A time machine for text search BIBAFull-Text 519-526
      Klaus Berberich; Srikanta Bedathur; Thomas Neumann; Gerhard Weikum
    Text search over temporally versioned document collections such as web archives has received little attention as a research problem. As a consequence, there is no scalable and principled solution to search such a collection as of a specified time. In this work, we address this shortcoming and propose an efficient solution for time-travel text search by extending the inverted file index to make it ready for temporal search. We introduce approximate temporal coalescing as a tunable method to reduce the index size without significantly affecting the quality of results. In order to further improve the performance of time-travel queries, we introduce two principled techniques to trade off index size for its performance. These techniques can be formulated as optimization problems that can be solved to near-optimality. Finally, our approach is evaluated in a comprehensive series of experiments on two large-scale real-world datasets. Results unequivocally show that our methods make it possible to build an efficient "time machine" scalable to large versioned text collections.
    Principles of hash-based text retrieval BIBAFull-Text 527-534
      Benno Stein
    Hash-based similarity search reduces a continuous similarity relation to the binary concept "similar or not similar": two feature vectors are considered similar if they are mapped onto the same hash key. In runtime performance this principle is unequaled, while at the same time being unaffected by dimensionality concerns. Similarity hashing is applied with great success for near similarity search in large document collections, and it is considered a key technology for near-duplicate detection and plagiarism analysis. This paper reveals the design principles behind hash-based search methods and presents them in a unified way. We introduce new stress statistics that are suited to analyzing the performance of hash-based search methods, and we explain the rationale of their effectiveness. Based on these insights, we show how optimum hash functions for similarity search can be derived. We also present new results of a comparative study between different hash-based search methods.
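    One well-known concrete instance of the "same hash key implies similar" principle is random-hyperplane (cosine) hashing. The sketch below is purely illustrative and is not claimed to be the hash family analyzed in the paper: vectors with high cosine similarity fall on the same side of most random hyperplanes and therefore tend to receive the same key.

```python
import random

def make_hyperplanes(dim, k, seed=0):
    """k random hyperplanes in R^dim; together they define a k-bit hash key."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(k)]

def hash_key(vec, planes):
    """One bit per hyperplane, recording which side of the plane the vector
    falls on. Similar vectors agree on most (often all) bits."""
    bits = []
    for plane in planes:
        dot = sum(w * v for w, v in zip(plane, vec))
        bits.append('1' if dot >= 0 else '0')
    return ''.join(bits)

planes = make_hyperplanes(dim=3, k=8)
a = [1.0, 0.90, 0.10]
b = [1.0, 0.92, 0.12]   # nearly parallel to a, so keys usually collide
```

    Increasing k makes collisions rarer but more selective; practical systems tune k (and use several independent hash tables) to balance recall against candidate-set size.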
    Compressed permuterm index BIBAFull-Text 535-542
      Paolo Ferragina; Rossano Venturini
    Recently, [Manning et al., 2007] revived the Permuterm index of Garfield (1976) as a time-efficient and elegant solution to the string dictionary problem in which pattern queries may include one wild-card symbol (the so-called Tolerant Retrieval problem). Unfortunately, the Permuterm index is space inefficient because it quadruples the dictionary size. In this paper we propose the Compressed Permuterm Index, which solves the Tolerant Retrieval problem in optimal query time, i.e. time proportional to the length of the searched pattern, and in space close to the k-th order empirical entropy of the indexed dictionary. Our index can also be used to solve more sophisticated queries which involve several wild-card symbols, or which require prefix-matching multiple fields in a database of records.
       The result is based on an elegant variant of the Burrows-Wheeler Transform defined on a dictionary of strings of variable length, which makes it easy to adapt known compressed indexes [Makinen-Navarro, 2007] to the Tolerant Retrieval problem. Experiments show that our index supports fast queries within a space occupancy close to that achievable by compressing the string dictionary via gzip, bzip or ppmdi. This improves known approaches based on front-coding by more than 50% in absolute space occupancy, while still guaranteeing comparable query time.
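    The underlying (uncompressed) permuterm idea from Garfield and Manning et al. can be sketched as follows: every rotation of term+'$' is indexed, and a single-wildcard query X*Y is answered by prefix-searching the rotation Y$X. The compression machinery that is the paper's actual contribution is not attempted here.

```python
def permuterm_rotations(term):
    """All rotations of term + '$' end marker."""
    t = term + '$'
    return [t[i:] + t[:i] for i in range(len(t))]

def build_index(dictionary):
    """Map every rotation back to its source term."""
    index = {}
    for term in dictionary:
        for rotation in permuterm_rotations(term):
            index[rotation] = term
    return index

def wildcard_lookup(index, pattern):
    """Answer a single-wildcard query X*Y by rotating it to Y$X and
    prefix-matching against the stored rotations."""
    head, tail = pattern.split('*')
    prefix = tail + '$' + head
    return sorted({t for rot, t in index.items() if rot.startswith(prefix)})

idx = build_index(["hello", "help", "halo"])
print(wildcard_lookup(idx, "hel*"))   # ['hello', 'help']
print(wildcard_lookup(idx, "h*lo"))   # ['halo', 'hello']
```

    The space blow-up the paper attacks is visible here: a term of length L contributes L+1 rotations, so the rotated dictionary is several times the size of the original.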

    Web IR II

    Query performance prediction in web search environments BIBAFull-Text 543-550
      Yun Zhou; W. Bruce Croft
    Current prediction techniques, which are generally designed for content-based queries and are typically evaluated on relatively homogeneous test collections of small size, face serious challenges in web search environments where collections are significantly more heterogeneous and different types of retrieval tasks exist. In this paper, we present three techniques to address these challenges. We focus on performance prediction for two types of queries in web search environments: content-based and Named-Page finding. Our evaluation is mainly performed on the GOV2 collection. In addition to evaluating our models for the two types of queries separately, we consider a more challenging and realistic situation in which the two types of queries are mixed together without prior information on query types. To assist prediction under the mixed-query situation, a novel query classifier is adopted. Results show that our prediction of web query performance is substantially more accurate than the current state-of-the-art prediction techniques. Consequently, our paper provides a practical approach to performance prediction in real-world web settings.
    Broad expertise retrieval in sparse data environments BIBAFull-Text 551-558
      Krisztian Balog; Toine Bogers; Leif Azzopardi; Maarten de Rijke; Antal van den Bosch
    Expertise retrieval has been largely unexplored on data other than the W3C collection. At the same time, many intranets of universities and other knowledge-intensive organisations offer examples of relatively small but clean multilingual expertise data, covering broad ranges of expertise areas. We first present two main expertise retrieval tasks, along with a set of baseline approaches based on generative language modeling, aimed at finding expertise relations between topics and people. For our experimental evaluation, we introduce (and release) a new test set based on a crawl of a university site. Using this test set, we conduct two series of experiments. The first is aimed at determining the effectiveness of baseline expertise retrieval methods applied to the new test set. The second is aimed at assessing refined models that exploit characteristic features of the new test set, such as the organizational structure of the university, and the hierarchical structure of the topics in the test set. Expertise retrieval models are shown to be robust with respect to environments smaller than the W3C collection, and current techniques appear to be generalizable to other settings.
    A semantic approach to contextual advertising BIBAFull-Text 559-566
      Andrei Broder; Marcus Fontoura; Vanja Josifovski; Lance Riedel
    Contextual advertising or Context Match (CM) refers to the placement of commercial textual advertisements within the content of a generic web page, while Sponsored Search (SS) advertising consists of placing ads on result pages from a web search engine, with ads driven by the originating query. In CM there is usually an intermediary commercial ad-network entity in charge of optimizing the ad selection with the twin goals of increasing revenue (shared between the publisher and the ad-network) and improving the user experience. With these goals in mind it is preferable to have ads relevant to the page content, rather than generic ads. The SS market developed more quickly than the CM market, and most textual ads are still characterized by "bid phrases" representing those queries where the advertisers would like to have their ad displayed. Hence, the first technologies for CM have relied on previous solutions for SS, by simply extracting one or more phrases from the given page content, and displaying ads corresponding to searches on these phrases, in a purely syntactic approach. However, due to the vagaries of phrase extraction, and the lack of context, this approach leads to many irrelevant ads. To overcome this problem, we propose a system for contextual ad matching based on a combination of semantic and syntactic features.

    Evaluation III

    How well does result relevance predict session satisfaction? BIBAFull-Text 567-574
      Scott B. Huffman; Michael Hochster
    Per-query relevance measures provide standardized, repeatable measurements of search result quality, but they ignore much of what users actually experience in a full search session. This paper examines how well we can approximate a user's ultimate session-level satisfaction using a simple relevance metric. We find that this relationship is surprisingly strong. By incorporating additional properties of the query itself, we construct a model which predicts user satisfaction even more accurately than relevance alone.
    A new approach for evaluating query expansion: query-document term mismatch BIBAFull-Text 575-582
      Tonya Custis; Khalid Al-Kofahi
    The effectiveness of information retrieval (IR) systems is influenced by the degree of term overlap between user queries and relevant documents. Query-document term mismatch, whether partial or total, is a fact that must be dealt with by IR systems. Query Expansion (QE) is one method for dealing with term mismatch. IR systems implementing query expansion are typically evaluated by executing each query twice, with and without query expansion, and then comparing the two result sets. While this measures an overall change in performance, it does not directly measure the effectiveness of IR systems in overcoming the inherent issue of term mismatch between the query and relevant documents, nor does it provide any insight into how such systems would behave in the presence of query-document term mismatch. In this paper, we propose a new approach for evaluating query expansion techniques. The proposed approach is attractive because it provides an estimate of system performance under varying degrees of query-document term mismatch, it makes use of readily available test collections, and it does not require any additional relevance judgments or any form of manual processing.
    Performance prediction using spatial autocorrelation BIBAFull-Text 583-590
      Fernando Diaz
    Evaluation of information retrieval systems is one of the core tasks in information retrieval. Problems include the inability to exhaustively label all documents for a topic, generalizability from a small number of topics, and incorporating the variability of retrieval systems. Previous work addresses the evaluation of systems, the ranking of queries by difficulty, and the ranking of individual retrievals by performance. Approaches exist for the case of few and even no relevance judgments. Our focus is on zero-judgment performance prediction of individual retrievals. One common shortcoming of previous techniques is the assumption of uncorrelated document scores and judgments. If documents are embedded in a high-dimensional space (as they often are), we can apply techniques from spatial data analysis to detect correlations between document scores. We find that low correlation between the scores of topically close documents often implies poor retrieval performance. When compared to a state-of-the-art baseline, we demonstrate that the spatial analysis of retrieval scores provides significantly better prediction performance. These new predictors can also be incorporated with classic predictors to improve performance further. We also describe the first large-scale experiment to evaluate zero-judgment performance prediction for a massive number of retrieval systems over a variety of collections in several languages.
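    Spatial autocorrelation is commonly quantified with Moran's I, and a sketch over retrieval scores with a document-similarity weight matrix is shown below. This illustrates the general statistic only; it is not claimed to be the paper's exact predictor. A value near +1 means topically close documents receive similar scores, which, per the abstract, tends to accompany good retrieval performance.

```python
def morans_i(scores, weights):
    """Moran's I spatial autocorrelation of retrieval scores.
    weights[i][j] is the similarity between documents i and j
    (the diagonal is ignored). Returns a value in roughly [-1, 1]."""
    n = len(scores)
    mean = sum(scores) / n
    dev = [s - mean for s in scores]
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n) if i != j)
    w_sum = sum(weights[i][j]
                for i in range(n) for j in range(n) if i != j)
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)
```

    With a neighborhood matrix linking documents 0-1 and 2-3, the scores [1, 1, 5, 5] give I = 1 (neighbors agree), while [1, 5, 1, 5] give I = -1 (neighbors disagree).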

    Combination and fusion

    An outranking approach for rank aggregation in information retrieval BIBAFull-Text 591-598
      Mohamed Farah; Daniel Vanderpooten
    Research in Information Retrieval usually shows performance improvement when many sources of evidence are combined to produce a ranking of documents (e.g., texts, pictures, sounds). In this paper, we focus on the rank aggregation problem, also called the data fusion problem, where rankings of documents, searched in the same collection and provided by multiple methods, are combined to produce a new ranking. In this context, we propose a rank aggregation method within a multiple criteria framework, using aggregation mechanisms based on decision rules that identify positive and negative reasons for judging whether a document should get a better rank than another. We show that the proposed method deals well with the distinctive features of Information Retrieval. Experimental results are reported showing that the suggested method performs better than the well-known CombSUM and CombMNZ operators.
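    The CombSUM and CombMNZ baselines mentioned above are standard score-fusion operators and can be sketched directly; scores are assumed to be already normalized to a common range across the input runs.

```python
def comb_sum(run_scores):
    """CombSUM: sum of (normalized) scores a document receives across runs.
    run_scores: one dict per input ranking, mapping doc id -> score."""
    fused = {}
    for run in run_scores:
        for doc, score in run.items():
            fused[doc] = fused.get(doc, 0.0) + score
    return fused

def comb_mnz(run_scores):
    """CombMNZ: CombSUM multiplied by the number of runs retrieving the doc,
    rewarding documents returned by several systems."""
    sums = comb_sum(run_scores)
    counts = {}
    for run in run_scores:
        for doc in run:
            counts[doc] = counts.get(doc, 0) + 1
    return {doc: score * counts[doc] for doc, score in sums.items()}

runs = [{"d1": 0.9, "d2": 0.4}, {"d1": 0.8, "d3": 0.7}]
print(comb_mnz(runs))   # d1: (0.9 + 0.8) * 2 = 3.4; d2: 0.4; d3: 0.7
```

    The final fused ranking simply sorts documents by the fused score in decreasing order.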
    Enhancing relevance scoring with chronological term rank BIBAFull-Text 599-606
      Adam D. Troy; Guo-Qiang Zhang
    We introduce a new relevance scoring technique that enhances existing relevance scoring schemes with term position information. This technique uses chronological term rank (CTR), which captures the positions of terms as they occur in the sequence of words in a document. CTR is both conceptually and computationally simple when compared to other approaches that use document structure information, such as term proximity, term order and document features. CTR works well when paired with Okapi BM25. We evaluate the performance of various combinations of CTR with Okapi BM25 in order to identify the most effective formula. We then compare the performance of the selected approach against the performance of existing methods such as Okapi BM25, pivoted length normalization and language models. Significant improvements are seen consistently across a variety of TREC data and topic sets, measured by the major retrieval performance metrics. This appears to be the first use of this statistic for relevance scoring, and greater retrieval improvements are likely possible with chronological-term-rank-enhanced methods in future work.
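    Read literally, CTR is roughly the position at which a term first occurs in the document's word sequence; the bare statistic can be sketched as below. The paper's exact definition, and the formula combining it with Okapi BM25, may differ from this reading.

```python
def chronological_term_rank(doc_terms, term):
    """1-based position of the term's first occurrence in the document's
    word sequence, or None if the term does not occur at all."""
    for pos, t in enumerate(doc_terms, start=1):
        if t == term:
            return pos
    return None

doc = "the cat sat on the mat".split()
print(chronological_term_rank(doc, "sat"))   # 3
print(chronological_term_rank(doc, "mat"))   # 6
```

    The intuition is that query terms appearing early in a document (title, lead paragraph) are stronger relevance signals, so a scoring formula can boost documents with small CTR values for the query terms.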
    ARSA: a sentiment-aware model for predicting sales performance using blogs BIBAFull-Text 607-614
      Yang Liu; Xiangji Huang; Aijun An; Xiaohui Yu
    Due to their high popularity, Weblogs (or blogs for short) present a wealth of information that can be very helpful in assessing the general public's sentiments and opinions. In this paper, we study the problem of mining sentiment information from blogs and investigate ways to use such information for predicting product sales performance. Based on an analysis of the complex nature of sentiments, we propose Sentiment PLSA (S-PLSA), in which a blog entry is viewed as a document generated by a number of hidden sentiment factors. Training an S-PLSA model on the blog data enables us to obtain a succinct summary of the sentiment information embedded in the blogs. We then present ARSA, an autoregressive sentiment-aware model, to utilize the sentiment information captured by S-PLSA for predicting product sales performance. Extensive experiments were conducted on a movie data set. We compare ARSA with alternative models that do not take into account the sentiment information, as well as a model with a different feature selection method. Experiments confirm the effectiveness and superiority of the proposed approach.

    Spoken document retrieval

    Vocabulary independent spoken term detection BIBAFull-Text 615-622
      Jonathan Mamou; Bhuvana Ramabhadran; Olivier Siohan
    We are interested in retrieving information from speech data such as broadcast news, telephone conversations and roundtable meetings. Today, most systems use large vocabulary continuous speech recognition tools to produce word transcripts; the transcripts are indexed and query terms are retrieved from the index. However, query terms that are not part of the recognizer's vocabulary cannot be retrieved, and the recall of the search is affected. In addition to the output word transcript, advanced systems also provide phonetic transcripts, against which query terms can be matched phonetically. Such phonetic transcripts suffer from lower accuracy and cannot serve as an alternative to word transcripts.
       We present a vocabulary independent system that can handle arbitrary queries, exploiting the information provided by having both word transcripts and phonetic transcripts. A speech recognizer generates word confusion networks and phonetic lattices. The transcripts are indexed for query processing and ranking purposes.
       The value of the proposed method is demonstrated by the relatively high performance of our system, which received the highest overall ranking for US English speech data in the recent NIST Spoken Term Detection evaluation.
    Improving text classification for oral history archives with temporal domain knowledge BIBAFull-Text 623-630
      J. Scott Olsson; Douglas W. Oard
    This paper describes two new techniques for increasing the accuracy of topic label assignment to conversational speech from oral history interviews using supervised machine learning in conjunction with automatic speech recognition. The first, time-shifted classification, leverages local sequence information from the order in which the story is told. The second, temporal label weighting, takes the complementary perspective by using the position within an interview to bias label assignment probabilities. These methods, when used in combination, yield between 6% and 15% relative improvements in classification accuracy using a clipped R-precision measure that models the utility of label sets as segment summaries in interactive speech retrieval applications.
    Indexing confusion networks for morph-based spoken document retrieval BIBAFull-Text 631-638
      Ville T. Turunen; Mikko Kurimo
    In this paper, we investigate methods for improving the performance of morph-based spoken document retrieval in Finnish by extracting relevant index terms from confusion networks. Our approach uses morpheme-like subword units ("morphs") for recognition and indexing. This alleviates the problem of out-of-vocabulary words, especially with inflectional languages like Finnish. Confusion networks offer a convenient representation of alternative recognition candidates by aligning mutually exclusive terms and by giving the posterior probability of each term. The rank of the competing terms and their posterior probability is used to estimate term frequency for indexing. Comparing against 1-best recognizer transcripts, we show that retrieval effectiveness is significantly improved. Finally, the effect of pruning in recognition is analyzed, showing that when recognition speed is increased, the reduction in retrieval performance due to the increase in the 1-best error rate can be compensated by using confusion networks.

    Domain specific NLP

    Context sensitive stemming for web search BIBAFull-Text 639-646
      Fuchun Peng; Nawaaz Ahmed; Xin Li; Yumao Lu
    Traditionally, stemming has been applied to Information Retrieval tasks by transforming words in documents to their root form before indexing, and applying a similar transformation to query terms. Although it increases recall, this naive strategy does not work well for Web Search since it lowers precision and requires a significant amount of additional computation.
       In this paper, we propose a context sensitive stemming method that addresses these two issues. Two unique properties make our approach feasible for Web Search. First, based on statistical language modeling, we perform context sensitive analysis on the query side. We accurately predict which of a query term's morphological variants are useful for expansion before submitting the query to the search engine. This dramatically reduces the number of bad expansions, which in turn reduces the cost of additional computation and improves precision at the same time. Second, our approach performs context sensitive document matching for those expanded variants. This conservative strategy serves as a safeguard against spurious stemming, and it turns out to be very important for improving precision. Using word pluralization handling as an example of our stemming approach, our experiments on a major Web search engine show that, by stemming only 29% of the query traffic, we can improve relevance as measured by average Discounted Cumulative Gain (DCG5) by 6.1% on these queries and by 1.8% over all query traffic.
    Detecting, categorizing and clustering entity mentions in Chinese text BIBAFull-Text 647-654
      Wenjie Li; Donglei Qian; Qin Lu; Chunfa Yuan
    The work presented in this paper is motivated by the practical need for content extraction, and by the data source and evaluation benchmark available from the ACE program. The Chinese Entity Detection and Recognition (EDR) task is of particular interest to us. This task presents several language-independent and language-dependent challenges, e.g. those arising from the complexity of the extraction targets and from the problem of word segmentation. In this paper, we propose a novel solution that alleviates the problems specific to this task. Mention detection takes advantage of machine learning approaches and character-based models. It handles different types of entity mentions and different constituent units (i.e. extents and heads) separately. Mentions referring to the same entity are linked together by pairwise clustering algorithms that integrate most-specific-first and closest-first rules. The types of mentions and entities are determined by head-driven classification approaches. The implemented system achieves an ACE value of 66.1 when evaluated on the EDR 2005 Chinese corpus, which is among the top-tier results. Alternative approaches to mention detection and clustering are also discussed and analyzed.
    Knowledge-intensive conceptual retrieval and passage extraction of biomedical literature BIBAFull-Text 655-662
      Wei Zhou; Clement Yu; Neil Smalheiser; Vetle Torvik; Jie Hong
    This paper presents a study of incorporating domain-specific knowledge (i.e., information about concepts and relationships between concepts in a certain domain) in an information retrieval (IR) system to improve its effectiveness in retrieving biomedical literature. The effects of different types of domain-specific knowledge in performance contribution are examined. Based on the TREC platform, we show that appropriate use of domain-specific knowledge in a proposed conceptual retrieval model yields about 23% improvement over the best reported result in passage retrieval in the Genomics Track of TREC 2006.

    Query processing strategies

    Heavy-tailed distributions and multi-keyword queries BIBAFull-Text 663-670
      Surajit Chaudhuri; Kenneth Church; Arnd Christian König; Liying Sui
    Intersecting inverted indexes is a fundamental operation for many applications in information retrieval and databases. Efficient indexing for this operation is known to be a hard problem for arbitrary data distributions. However, text corpora used in Information Retrieval applications often have convenient power-law constraints (also known as Zipf's Law and long tails) that allow us to materialize carefully chosen combinations of multi-keyword indexes, which significantly improve worst-case performance without requiring excessive storage. These multi-keyword indexes limit the number of postings accessed when computing arbitrary index intersections. Our evaluation on an e-commerce collection of 20 million products shows that the indexes of up to four arbitrary keywords can be intersected while accessing less than 20% of the postings in the largest single-keyword index.
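       The core operation the abstract describes can be illustrated with a small self-contained sketch: merge-intersecting sorted posting lists, with a lookup into a materialized multi-keyword index when one is available. The lookup structure, function names and toy postings here are illustrative assumptions; the paper's actual materialization policy, guided by power-law term statistics, is not reproduced.

```python
def intersect(a, b):
    """Merge-intersect two sorted posting lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def query(terms, index, pair_index):
    """Answer a multi-keyword query, preferring a materialized
    multi-keyword posting list when one exists; otherwise fall back
    to pairwise merging of single-keyword lists."""
    key = tuple(sorted(terms))
    if key in pair_index:                 # precomputed combination
        return pair_index[key]
    result = index[terms[0]]
    for t in terms[1:]:                   # pairwise merge fallback
        result = intersect(result, index[t])
    return result

# Hypothetical single-keyword and pair indexes over document ids.
index = {"cheap": [1, 2, 3, 5, 8], "camera": [2, 3, 5, 9], "red": [3, 9]}
pair_index = {("camera", "cheap"): [2, 3, 5]}
print(query(["cheap", "camera"], index, pair_index))         # → [2, 3, 5]
print(query(["cheap", "camera", "red"], index, pair_index))  # → [3]
```

       The point of materializing frequent combinations is that the query touches one short precomputed list instead of the long single-keyword lists.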
    ESTER: efficient search on text, entities, and relations BIBAFull-Text 671-678
      Holger Bast; Alexandru Chitea; Fabian Suchanek; Ingmar Weber
    We present ESTER, a modular and highly efficient system for combined full-text and ontology search. ESTER builds on a query engine that supports two basic operations: prefix search and join. Both of these can be implemented very efficiently with a compact index, yet in combination provide powerful querying capabilities. We show how ESTER can answer basic SPARQL graph-pattern queries on the ontology by reducing them to a small number of these two basic operations. ESTER further supports a natural blend of such semantic queries with ordinary full-text queries. Moreover, the prefix search operation allows for a fully interactive and proactive user interface, which after every keystroke suggests to the user possible semantic interpretations of his or her query, and speculatively executes the most likely of these interpretations. As a proof of concept, we applied ESTER to the English Wikipedia, which contains about 3 million documents, combined with the recent YAGO ontology, which contains about 2.5 million facts. For a variety of complex queries, ESTER achieves worst-case query processing times of a fraction of a second, on a single machine, with an index size of about 4 GB.
    Web text retrieval with a P2P query-driven index BIBAFull-Text 679-686
      Gleb Skobeltsyn; Toan Luu; Ivana Podnar Zarko; Martin Rajman; Karl Aberer
    In this paper, we present a query-driven indexing/retrieval strategy for efficient full text retrieval from large document collections distributed within a structured P2P network. Our indexing strategy is based on two important properties: (1) the generated distributed index stores posting lists for carefully chosen indexing term combinations, and (2) the posting lists containing too many document references are truncated to a bounded number of their top-ranked elements. These two properties guarantee acceptable storage and bandwidth requirements, essentially because the number of indexing term combinations remains scalable and the transmitted posting lists never exceed a constant size. However, as the number of generated term combinations can still become quite large, we also use term statistics extracted from available query logs to index only such combinations that are frequently present in user queries. Thus, by avoiding the generation of superfluous indexing term combinations, we achieve an additional substantial reduction in bandwidth and storage consumption. As a result, the generated distributed index corresponds to a constantly evolving query-driven indexing structure that efficiently follows current information needs of the users. More precisely, our theoretical analysis and experimental results indicate that, at the price of a marginal loss in retrieval quality for rare queries, the generated index size and network traffic remain manageable even for web-size document collections. Furthermore, our experiments show that at the same time the achieved retrieval quality is fully comparable to the one obtained with a state-of-the-art centralized query engine.

    Posters

    Using gradient descent to optimize language modeling smoothing parameters BIBKFull-Text 687-688
      Donald Metzler
    Keywords: gradient descent, language modeling, parameter estimation
    Locality discriminating indexing for document classification BIBAFull-Text 689-690
      Jiani Hu; Weihong Deng; Jun Guo; Weiran Xu
    This paper introduces a locality discriminating indexing (LDI) algorithm for document classification. Based on the hypothesis that samples from different classes reside in class-specific manifold structures, LDI seeks a projection that best preserves the within-class local structures while suppressing the between-class overlap. Comparative experiments show that the proposed method is able to derive compact discriminating document representations for classification.
    Management of keyword variation with frequency based generation of word forms in IR BIBAFull-Text 691-692
      Kimmo Kettunen
    This paper presents a new management method for morphological variation of keywords. The method, called FCG (Frequent Case Generation), is based on the skewed distributions of word forms in natural languages and is suitable for languages that either have a fair amount of morphological variation or are morphologically very rich. The proposed method has so far been evaluated with four languages, Finnish, Swedish, German and Russian, which show varying degrees of morphological complexity.
    OMES: a new evaluation strategy using optimal matching for document clustering BIBAFull-Text 693-694
      Xiaojun Wan
    Existing measures for evaluating clustering results (e.g. F-measure) have the limitation of overestimating cluster quality because they usually adopt a greedy matching between classes (reference clusters) and clusters (system clusters) that allows multiple classes to correspond to the same cluster, which is in fact only a locally optimal solution. This paper proposes a new evaluation strategy that overcomes this limitation by using optimal matching from graph theory. A weighted bipartite graph is built with classes and clusters as two disjoint sets of vertices, and the edge weight between any class and any cluster is computed using a basic metric. The total weight of the optimal matching in the graph is then acquired and used to evaluate the quality of the clusters. The optimal matching allows only one-to-one matching between classes and clusters, and a globally optimal solution can be achieved. A preliminary study is performed to demonstrate the effectiveness of the proposed evaluation strategy.
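       The optimal-matching idea can be sketched in a few lines of stdlib Python. This is not the paper's exact formulation: the F-measure edge weight, the brute-force search over permutations, the normalization by the number of reference classes and the toy data are all illustrative choices, and the sketch assumes there are no more classes than clusters.

```python
from itertools import permutations

def f_measure(class_items, cluster_items):
    """Basic edge weight: F1 of the overlap between a class and a cluster."""
    overlap = len(class_items & cluster_items)
    if overlap == 0:
        return 0.0
    p = overlap / len(cluster_items)   # precision
    r = overlap / len(class_items)     # recall
    return 2 * p * r / (p + r)

def omes_score(classes, clusters):
    """Total weight of the optimal one-to-one matching between classes
    (reference clusters) and clusters (system clusters), found here by
    brute force over permutations -- fine for small illustrative inputs."""
    cls = list(classes.values())
    clu = list(clusters.values())
    k = min(len(cls), len(clu))        # assumes len(cls) <= len(clu)
    best = 0.0
    for perm in permutations(range(len(clu)), k):
        total = sum(f_measure(cls[i], clu[perm[i]]) for i in range(k))
        best = max(best, total)
    return best / len(cls)             # normalize by number of classes

classes  = {"A": {1, 2, 3}, "B": {4, 5, 6}}
clusters = {"x": {1, 2, 3}, "y": {4, 5}, "z": {6}}
print(round(omes_score(classes, clusters), 3))   # prints 0.9
```

       Because each class may be matched to at most one cluster, a system that splits class B across clusters y and z is only credited for its single best match, unlike under greedy matching.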
    Revisiting the dependence language model for information retrieval BIBAFull-Text 695-696
      Loïc Maisonnasse; Eric Gaussier; Jean-Pierre Chevallet
    In this paper, we revisit the dependence language model for information retrieval proposed in [1], and show that this model is deficient from a theoretical point of view. We then propose a new model, well founded theoretically, for integrating dependencies between terms in the language model.
       This new model is simpler, yet more general, than the one proposed in [1], and yields similar results in our experiments, on both syntactic and semantic dependencies.
    Quantify query ambiguity using ODP metadata BIBAFull-Text 697-698
      Guang Qiu; Kangmiao Liu; Jiajun Bu; Chun Chen; Zhiming Kang
    Query ambiguity prevents existing retrieval systems from returning reasonable results for every query. As much work has already been done on resolving ambiguity, vague queries could be handled with the corresponding approaches if they could be identified in advance. Quantifying the degree of (lack of) ambiguity lays the groundwork for this identification. In this poster, we propose such a measure using query topics based on the topic structure selected from the Open Directory Project (ODP) taxonomy. We introduce a clarity score to quantify the lack of ambiguity on data sets constructed from the TREC collections, and rank correlation tests demonstrate a strong positive association between clarity scores and retrieval precision for queries.
    Combining error-correcting output codes and model-refinement for text categorization BIBAFull-Text 699-700
      Songbo Tan; Yuefen Wang
    In this work, we explore the use of error-correcting output codes (ECOC) to enhance the performance of the centroid text classifier. The framework decomposes one multi-class problem into multiple binary problems and then learns the individual binary classification problems with a centroid classifier. However, this kind of decomposition introduces considerable bias for the centroid classifier, which results in a noticeable degradation of performance. To address this issue, we use Model-Refinement to adjust this bias.
    User-oriented text segmentation evaluation measure BIBAFull-Text 701-702
      Martin Franz; J. Scott McCarley; Jian-Ming Xu
    The paper describes a user oriented performance evaluation measure for text segmentation. Experiments show that the proposed measure differentiates well between error distributions with varying user impact.
    Story segmentation of broadcast news in Arabic, Chinese and English using multi-window features BIBAFull-Text 703-704
      Martin Franz; Jian-Ming Xu
    The paper describes a maximum entropy based story segmentation system for Arabic, Chinese and English. In experiments with broadcast news data from TDT-3, TDT-4, and corpora collected in the DARPA GALE project we obtain a substantial performance gain using multiple overlapping windows for text-based features.
    Recommending citations for academic papers BIBAFull-Text 705-706
      Trevor Strohman; W. Bruce Croft; David Jensen
    We approach the problem of academic literature search by considering an unpublished manuscript as a query to a search system. We use the text of previous literature as well as the citation graph that connects it to find relevant related material. We evaluate our technique with manual and automatic evaluation methods, and find an order of magnitude improvement in mean average precision as compared to a text similarity baseline.
    Exploration of the tradeoff between effectiveness and efficiency for results merging in federated search BIBAFull-Text 707-708
      Suleyman Cetintas; Luo Si
    Federated search is the task of retrieving relevant documents from different information resources. One of the main research problems in federated search is to combine the results from different sources into a single ranked list. Recent work proposed a regression based method to download some documents from each ranked list of the different sources, calculated comparable scores for the documents and estimated mapping functions that transform source-specific scores into comparable scores. Experiments have shown that downloading more documents improves the accuracy of results merging. However downloading more documents increases the computation and communication costs.
       This paper proposes a utility based optimization method that enables the system to automatically decide on the desired number of training documents to download according to the user's need for effectiveness and efficiency.
    Understanding the relationship of information need specificity to search query length BIBAFull-Text 709-710
      Nina Phan; Peter Bailey; Ross Wilkinson
    When searching, people's information needs flow through to expressing an information retrieval request posed to a search engine. We hypothesise that the degree of specificity of an IR request might correspond to the length of a search query. Our results show a strong correlation between decreasing query length and increasing broadness or generality of the IR request. We found an average cross-over point of specificity from broad to narrow of 3 words in the query. These results have implications for search engines in responding to queries of differing lengths.
    An effective snippet generation method using the pseudo relevance feedback technique BIBAFull-Text 711-712
      Youngjoong Ko; Hongkuk An; Jungyun Seo
    A (page or Web) snippet is a document excerpt that allows a user to judge whether a document is indeed relevant without accessing it. This paper proposes an effective snippet generation method. The pseudo relevance feedback technique and text summarization techniques are applied to extract salient sentences for generating good-quality snippets. In the experimental results, the proposed method showed much better performance than other methods, including Google and Naver.
    Probability ranking principle via optimal expected rank BIBAFull-Text 713-714
      H. C. Wu; Robert W. P. Luk; K. F. Wong
    This paper presents a new perspective of the probability ranking principle (PRP) by defining retrieval effectiveness in terms of our novel expected rank measure of a set of documents for a particular query. This perspective is based on preserving decision preferences, and it imposes weaker conditions on PRP than the utility-theoretic perspective of PRP.
    Combining term-based and event-based matching for question answering BIBAFull-Text 715-716
      Michael Wiegand; Jochen L. Leidner; Dietrich Klakow
    In question answering, two main kinds of matching methods for finding answer sentences for a question are term-based approaches -- which are simple, efficient, effective, and yield high recall -- and event-based approaches that take syntactic and semantic information into account. The latter often sacrifice recall for increased precision, but actually capture the meaning of the events denoted by the textual units of a passage or sentence. We propose a robust, data-driven method that learns the mapping between questions and answers using logistic regression and show that combining term-based and event-based approaches significantly outperforms the individual methods.
    Confluence: enhancing contextual desktop search BIBAFull-Text 717-718
      Karl Anders Gyllstrom; Craig Soules; Alistair Veitch
    We present Confluence, an enhancement to contextual desktop file search that extracts conceptual relationships between files from their temporal access patterns in the file system. A limitation of a purely file-based approach is that as file operations are increasingly abstracted by applications, their correlation to a user's activity weakens, which reduces the applicability of their temporal patterns. To deal with this problem, we augment the file event stream with a stream of window focus events from the UI layer. We present three algorithms that analyze this new stream, extracting the user's task information, which informs the existing search algorithms. We present results and conclusions from a preliminary user study of Confluence.
    Estimating the value of automatic disambiguation BIBAFull-Text 719-720
      Paul Thomas; Tom Rowlands
    A common motivation for personalised search systems is the ability to disambiguate queries based on some knowledge of a user's interests. An analysis of log files from three search providers, covering a range of scenarios, suggests that this sort of disambiguation would be of marginal use for more specialised providers but may be of use for whole-of-Web search.
    A generic framework for machine transliteration BIBKFull-Text 721-722
      A. Kumaran; Tobias Kellner
    Keywords: crosslingual information retrieval, machine transliteration
    Where to start reading a textual XML document? BIBAFull-Text 723-724
      Jaap Kamps; Marijn Koolen; Mounia Lalmas
    In structured information retrieval, the aim is to exploit document structure to retrieve relevant components, allowing the user to go straight to the relevant material. This paper looks at the so-called best entry points (BEPs), which are intended to give the user the best starting point to access the relevant information in the document. We examine the relationship between BEPs and relevant components in the INEX 2006 ad hoc assessments. Our main findings are the following: First, although documents are short, assessors often choose the best entry point some distance from the start of the document. Second, many of the best entry points coincide with the first relevant character in relevant documents, showing a strong relation between the BEP and relevant text. Third, we find browsing BEPs in articles with a single relevant passage, and container BEPs or context BEPs in articles with multiple relevant passages.
    Novelty detection using local context analysis BIBKFull-Text 725-726
      Ronald T. Fernández; David E. Losada
    Keywords: local context analysis, novelty detection
    Intra-assessor consistency in question answering BIBAFull-Text 727-728
      Ian Ruthven; Leif Azzopardi; Mark Baillie; Ralf Bierig; Emma Nicol; Simon Sweeney; Murat Yakici
    In this paper we investigate the consistency of answer assessment in a complex question answering task examining features of assessor consistency, types of answers and question type.
    Towards robust query expansion: model selection in the language modeling framework BIBAFull-Text 729-730
      Mattan Winaver; Oren Kurland; Carmel Domshlak
    We propose a language-model-based approach for addressing the performance robustness problem -- with respect to free-parameters' values -- of pseudo-feedback-based query-expansion methods. Given a query, we create a set of language models representing different forms of its expansion by varying the parameters' values of some expansion method; then, we select a single model using criteria originally proposed for evaluating the performance of using the original query, or for deciding whether to employ expansion at all. Experimental results show that these criteria are highly effective in selecting relevance language models that are not only significantly more effective than poor performing ones, but that also yield performance that is almost indistinguishable from that of manually optimized relevance models.
    Automatic classification of web pages into bookmark categories BIBAFull-Text 731-732
      Chris Staff; Ian Bugeja
    We describe a technique to automatically classify a web page into an existing bookmark category to help a user to bookmark a page. HyperBK compares a bag-of-words representation of the page to descriptions of categories in the user's bookmark file. Unlike default web browser dialog boxes in which the user may be presented with the category into which he or she saved the last bookmarked file, HyperBK also offers the category most similar to the page being bookmarked. The user can also opt to create a new category; or save the page elsewhere. In an evaluation, the user's preferred category was offered on average 61% of the time.
    What emotions do news articles trigger in their readers? BIBAFull-Text 733-734
      Kevin Hsin-Yih Lin; Changhua Yang; Hsin-Hsi Chen
    We study the classification of news articles into the emotions they evoke in their readers. Our work differs from previous studies, which focused on classifying documents into their authors' emotions rather than their readers'. We use various combinations of feature sets to find the best combination for identifying the emotional influences of news articles on readers.
    Evaluating discourse-based answer extraction for why-question answering BIBKFull-Text 735-736
      Suzan Verberne; Lou Boves; Nelleke Oostdijk; Peter-Arno Coppen
    Keywords: RST, answer extraction, discourse annotation, why-questions
    Topic segmentation using weighted lexical links (WLL) BIBAFull-Text 737-738
      Laurianne Sitbon; Patrice Bellot
    This paper presents two new lexical-chain approaches to topic segmentation, using weighted lexical chains (WLC) or weighted lexical links (WLL) between repeated occurrences of lemmas along the text. The main advantage of these new approaches is the suppression of the empirical parameter called hiatus in lexical chain processing. An evaluation according to the WindowDiff measure on a large automatically built corpus shows slight improvements for WLL compared with state-of-the-art methods based on lexical chains.
    Lexical analysis for modeling web query reformulation BIBAFull-Text 739-740
      Alessandro Bozzon; Paul-Alexandru Chirita; Claudiu S. Firan; Wolfgang Nejdl
    Modeling Web query reformulation processes is still an unsolved problem. In this paper we argue that lexical analysis is highly beneficial for this purpose. We propose to use the variation in Query Clarity, as well as Part-Of-Speech pattern transitions, as indicators of users' search actions. Experiments with a log of 2.4 million queries showed our techniques to be more flexible than current approaches, while also providing interesting insights into users' Web behavioral patterns.
    Bridging the digital divide: understanding information access practices in an Indian village community BIBAFull-Text 741-742
      Mounia Lalmas; Ramnath Bhat; Maxine Frank; David Frohlich; Matt Jones
    For digital library and information retrieval technologies to provide solutions for bridging the digital divide in developing countries, we need to understand the information access practices of remote and often poor communities in these countries. We must understand the information needs of these communities, and the best means to provide them access to relevant information. To this end, we investigated the current information access practices in an Indian village.
    BordaConsensus: a new consensus function for soft cluster ensembles BIBAFull-Text 743-744
      Xavier Sevillano; Francesc Alías; Joan Claudi Socoró
    Consensus clustering is the task of deriving a single labeling by applying a consensus function on a cluster ensemble. This work introduces BordaConsensus, a new consensus function for soft cluster ensembles based on the Borda voting scheme. In contrast to classic, hard consensus functions that operate on labelings, our proposal considers cluster membership information, thus being able to tackle multiclass clustering problems. Initial small scale experiments reveal that, compared to state-of-the-art consensus functions, BordaConsensus constitutes a good performance vs. complexity trade-off.
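       As a rough illustration of the Borda voting scheme underlying this idea, the sketch below applies a plain Borda count to soft cluster memberships: each clusterer ranks the clusters for an object by membership probability, and points are summed across the ensemble. The function name, point scheme and toy probabilities are illustrative assumptions, not the paper's BordaConsensus definition.

```python
def borda_consensus(soft_memberships):
    """Combine soft cluster memberships from several clusterers via a
    Borda count: for each object, every clusterer's ranking awards a
    cluster (k - position - 1) points, and the object joins the
    cluster with the most points overall."""
    labels = []
    for per_object in soft_memberships:      # one entry per object
        k = len(per_object[0])               # number of clusters
        points = [0] * k
        for probs in per_object:             # one ranking per clusterer
            ranking = sorted(range(k), key=lambda c: -probs[c])
            for pos, c in enumerate(ranking):
                points[c] += k - pos - 1
        labels.append(max(range(k), key=lambda c: points[c]))
    return labels

# Two clusterers, two objects, three clusters (illustrative probabilities).
ensemble = [
    [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]],   # object 0
    [[0.1, 0.2, 0.7], [0.2, 0.1, 0.7]],   # object 1
]
print(borda_consensus(ensemble))   # → [0, 2]
```

       Because the vote operates on full membership rankings rather than hard labels, no label-correspondence step between clusterers is needed, which is what lets a consensus function of this kind use the soft membership information directly.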
    A flexible retrieval system of shapes in binary images BIBAFull-Text 745-746
      Gloria Bordogna; Luca Ghilardi; Simone Milesi; Marco Pagani
    This poster overviews the main characteristics of a flexible retrieval system for shapes present in binary images and discusses some evaluation results. The system applies multiple indexing criteria to the shapes, synthesizing distinct characteristics such as global features of the object contour (Fourier Coefficients), boundary irregularities (Multifractal Spectrum), and the presence of concavities and convexities on the boundary (Contour Scale Space distribution). The system is flexible since it allows the retrieval function to be customized to fit an application's needs. The query is a binary image containing the desired shape together with a set of parameters specifying the relative importance of the shape characteristics to be taken into account when evaluating the relevance of the retrieved shapes. The retrieval function is then defined as a flexible multicriteria fusion function producing ranked results. The evaluation experiments showed that the system can be suited to different retrieval purposes, and that in general the combination of the distinct shape indexing criteria increases both Recall and Precision with respect to any single indexing criterion alone.
    Semantic text classification of disease reporting BIBAFull-Text 747-748
      Yi Zhang; Bing Liu
    Traditional text classification studied in the IR literature is mainly based on topics. That is, each class or category represents a particular topic, e.g., sports, politics or sciences. However, many real-world text classification problems require more refined classification based on some semantic aspects. For example, in a set of documents about a particular disease, some documents may report the outbreak of the disease, some may describe how to cure the disease, some may discuss how to prevent the disease, and yet some others may include all the above information. To classify text at this semantic level, the traditional "bag of words" model is no longer sufficient. In this paper, we report a text classification study at the semantic level and show that sentence semantic and structure features are very useful for such kind of classification. Our experimental results based on a disease outbreak dataset demonstrated the effectiveness of the proposed approach.
    Evaluating relevant in context: document retrieval with a twist BIBAFull-Text 749-750
      Jaap Kamps; Mounia Lalmas; Jovan Pehcevski
    The Relevant in Context retrieval task is document or article retrieval with a twist, where not only the relevant articles should be retrieved but also the relevant information within each article (captured by a set of XML elements) should be correctly identified. Our main research question is: how to evaluate the Relevant in Context task? We propose a generalized average precision measure that meets two main requirements: i) the score reflects the ranked list of articles inherent in the result list, and at the same time ii) the score also reflects how well the retrieved information per article (i.e., the set of elements) corresponds to the relevant information. The resulting measure was used at INEX 2006.
    IDF revisited: a simple new derivation within the Robertson-Spärck Jones probabilistic model BIBAFull-Text 751-752
      Lillian Lee
    There have been a number of prior attempts to theoretically justify the effectiveness of the inverse document frequency (IDF). Those that take as their starting point Robertson and Sparck Jones's probabilistic model are based on strong or complex assumptions. We show that a more intuitively plausible assumption suffices. Moreover, the new assumption, while conceptually very simple, provides a solution to an estimation problem that had been deemed intractable by Robertson and Walker (1997).
    Validity and power of t-test for comparing MAP and GMAP BIBAFull-Text 753-754
      Gordon V. Cormack; Thomas R. Lynam
    We examine the validity and power of the t-test, Wilcoxon test, and sign test in determining whether or not the difference in performance between two IR systems is significant. Empirical tests conducted on subsets of the TREC 2004 Robust Retrieval collection indicate that the p-values computed by these tests for the difference in mean average precision (MAP) between two systems are very accurate for a wide range of sample sizes and significance estimates. Similarly, these tests have good power, with the t-test proving superior overall. The t-test is also valid for comparing geometric mean average precision (GMAP), exhibiting slightly superior accuracy and slightly inferior power compared with MAP comparison.
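    As a minimal illustration of the three tests compared in this poster (a sketch, not the authors' code; `scipy` is assumed available, and the sign test is realized as a binomial test on the positive per-topic differences):

```python
import numpy as np
from scipy import stats

def paired_significance(ap_a, ap_b):
    """p-values of the paired t-test, Wilcoxon signed-rank test, and
    sign test over per-topic average precision of two IR systems."""
    ap_a, ap_b = np.asarray(ap_a), np.asarray(ap_b)
    diff = ap_a - ap_b
    t_p = stats.ttest_rel(ap_a, ap_b).pvalue
    w_p = stats.wilcoxon(ap_a, ap_b).pvalue
    # sign test: binomial test on the count of positive differences
    nonzero = diff[diff != 0]
    s_p = stats.binomtest(int((nonzero > 0).sum()), len(nonzero), 0.5).pvalue
    return t_p, w_p, s_p
```

    For GMAP, the same machinery can be applied to log-transformed average precision values.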
    Model-averaged latent semantic indexing BIBAFull-Text 755-756
      Miles Efron
    This poster introduces a novel approach to information retrieval that uses statistical model averaging to improve latent semantic indexing (LSI). Instead of choosing a single dimensionality k for LSI, we propose using several models of differing dimensionality to inform retrieval. To manage this ensemble we weight each model's contribution to an extent inversely proportional to its AIC (Akaike information criterion). Thus each model contributes proportionally to its expected Kullback-Leibler divergence from the distribution that generated the data. We present results on three standard IR test collections, demonstrating significant improvement over both the traditional vector space model and single-model LSI.
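    The model-averaging idea can be sketched roughly as follows (illustrative only: the weighting here takes "inversely proportional to AIC" literally, and the AIC values are assumed given, since computing them is model-specific and not detailed in the abstract):

```python
import numpy as np

def model_averaged_scores(term_doc, query, ks, aic):
    """Combine LSI cosine scores from several dimensionalities k,
    weighting each model inversely proportionally to its (given)
    AIC value.  term_doc: terms x docs matrix; query: term vector."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    weights = np.array([1.0 / aic[k] for k in ks])
    weights /= weights.sum()
    combined = np.zeros(term_doc.shape[1])
    for w, k in zip(weights, ks):
        q_k = query @ U[:, :k]              # query in k-dim latent space
        d_k = (np.diag(s[:k]) @ Vt[:k]).T   # documents in latent space
        sims = d_k @ q_k / (np.linalg.norm(d_k, axis=1)
                            * np.linalg.norm(q_k) + 1e-12)
        combined += w * sims
    return combined
```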
    Characterizing the value of personalizing search BIBAFull-Text 757-758
      Jaime Teevan; Susan T. Dumais; Eric Horvitz
    We investigate the diverse goals that people have when they issue the same query to a search engine, and the ability of current search engines to address such diversity. We quantify the potential value of personalizing search results based on this analysis. Great variance was found in the results that different individuals rated as relevant for the same query -- even when the same information goal was expressed. Our analysis suggests that while search engines do a good job of ranking results to maximize global happiness, they do not do a very good job for specific individuals.
    Improving retrieval accuracy by weighting document types with clickthrough data BIBAFull-Text 759-760
      Peter C. K. Yeung; Charles L. A. Clarke; Stefan Büttcher
    For enterprise search, there exists a relationship between work task and document type that can be used to refine search results. In this poster, we adapt the popular Okapi BM25 scoring function to weight term frequency based on the relevance of a document type to a work task. Also, we use the click frequency for each task-type pair to estimate a realistic weight. Using the W3C collection from the TREC Enterprise track for evaluation, our approach leads to significant improvements in search precision.
    Protecting source privacy in federated search BIBAFull-Text 761-762
      Wei Jiang; Luo Si; Jing Li
    Many information sources contain information that can only be accessed through their own dedicated search engines. Federated search provides search solutions for this type of hidden information, which cannot be searched by conventional search engines. In many scenarios of federated search, such as search among health care providers or among intelligence agencies, an individual information source does not want to disclose the source of the search results to users or other sources. Therefore, this paper proposes a two-step federated search protocol that protects the privacy of information sources. As far as we know, this is the first attempt to address the research problem of protecting source privacy in federated text search.
    Applying ranking SVM in query relaxation BIBAFull-Text 763-764
      Ciya Liao; Thomas Chang
    We propose an approach called QRRS (Query Relaxative Ranking SVM) that divides a ranking function into different relaxation steps, so that only cheap features are used in the Ranking SVM of the early steps, for query efficiency. We show that search quality under this approach is improved compared to conventional Ranking SVM.
    Learning to rank collections BIBAFull-Text 765-766
      Jingfang Xu; Xing Li
    Collection selection, i.e., ranking collections according to a user query, is crucial in distributed search. However, few features are used to rank collections in current collection selection methods, while hundreds of features are exploited to rank web pages in web search. This lack of features limits the efficiency of collection selection in distributed search. In this paper, we exploit some new features and learn to rank collections with them, using SVM and Ranking SVM respectively. Experimental results show that our features are beneficial to collection selection, and that the learned ranking functions outperform the classical CORI algorithm.
    VideoReach: an online video recommendation system BIBAFull-Text 767-768
      Tao Mei; Bo Yang; Xian-Sheng Hua; Linjun Yang; Shi-Qiang Yang; Shipeng Li
    This paper presents a novel online video recommendation system called VideoReach, which reduces users' effort in finding the most relevant videos given their current viewing, without requiring the extensive collection of user profiles needed by traditional recommenders. In this system, video recommendation is formulated as finding a list of relevant videos in terms of multimodal relevance (i.e. textual, visual, and aural relevance) and user click-through. Since different videos have different intra-weights of relevance within an individual modality and inter-weights among different modalities, we adopt relevance feedback to automatically find optimal weights from user click-through, as well as an attention fusion function to fuse multimodal relevance. We use 20 clips as the representative test videos, retrieved by the top 10 queries from more than 13k online videos, and report superior performance compared with an existing video site.
    Modelling epistemic uncertainty in IR evaluation BIBAFull-Text 769-770
      Murat Yakici; Mark Baillie; Ian Ruthven; Fabio Crestani
    Modern information retrieval (IR) test collections violate the completeness assumption of the Cranfield paradigm. In order to maximise the available resources, only a sample of documents (i.e. the pool) is judged for relevance by human assessors. The subsequent evaluation protocol does not make any distinction between assessed and unassessed documents, as documents that are not in the pool are assumed to be not relevant for the topic. This is beneficial from a practical point of view, as the relative performance can be compared with confidence if the experimental conditions are fair for all systems. However, given the incompleteness of relevance assessments, two forms of uncertainty emerge during evaluation. The first is aleatory uncertainty, which refers to variation in system performance across the topic set and is often addressed through the use of statistical significance tests. The second is epistemic uncertainty, which refers to the amount of knowledge (or ignorance) we have about the estimate of a system's performance. Epistemic uncertainty is a consequence of incompleteness and is not addressed by the current evaluation protocol. In this study, we present a first attempt at modelling both the aleatory and epistemic uncertainty associated with IR evaluation. We aim to account for both the variability associated with system performance and the amount of knowledge known about the performance estimate.
    On the importance of preserving the part-order in shape retrieval BIBAFull-Text 771-772
      Arne Schuldt; Björn Gottfried; Ole Osterhagen; Otthein Herzog
    This paper discusses the importance of part-order-preservation in shape matching. A part descriptor is introduced that supports both preserving and abandoning the order of parts. The evaluation shows that retrieval results are improved by almost 38% if the original ordering is preserved.
    The relationship between IR effectiveness measures and user satisfaction BIBAFull-Text 773-774
      Azzah Al-Maskari; Mark Sanderson; Paul Clough
    This paper presents an experimental study of users assessing the quality of Google web search results. In particular we look at how users' satisfaction correlates with the effectiveness of Google as quantified by IR measures such as precision and the suite of Cumulative Gain measures (CG, DCG, NDCG). Results indicate strong correlation between users' satisfaction, CG and precision, moderate correlation with DCG, with perhaps surprisingly negligible correlation with NDCG. The reasons for the low correlation with NDCG are examined.
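    For reference, the gain measures involved can be sketched as follows (one common formulation using a log2 discount; the exact parameterization used in the study may differ):

```python
import math

def cg(rels):
    """Cumulative gain: sum of graded relevance down the ranking."""
    return sum(rels)

def dcg(rels):
    """Discounted cumulative gain with a log2 rank discount
    (the literature also uses 2**rel - 1 gains)."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(rels):
    """DCG normalized by the ideal (relevance-sorted) ranking."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0
```

    NDCG's normalization by the ideal ranking, which varies per query, is one plausible reason its behaviour can diverge from users' absolute satisfaction judgments.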
    A multi-criteria content-based filtering system BIBAFull-Text 775-776
      Gabriella Pasi; Gloria Bordogna; Robert Villa
    In this paper we present a novel filtering system, based on a new model which reshapes the aims of content-based filtering. The filtering system has been developed within the EC project PENG, aimed at providing news professionals, such as journalists, with a system supporting both filtering and retrieval capabilities. In particular, we suggest that in tackling the problem of information overload, it is necessary for filtering systems to take into account multiple aspects of incoming documents in order to estimate their relevance to a user's profile, and in order to help users better understand documents, as distinct from solely attempting to either select relevant material from a stream or block inappropriate material. To this end, a filtering model based on multiple criteria has been defined, building on ideas gleaned during the project requirements stage. The filtering model is briefly described in this paper.
    Boosting static pruning of inverted files BIBAFull-Text 777-778
      Roi Blanco; Alvaro Barreiro
    This paper revisits the static term-based pruning technique presented by Carmel et al. (SIGIR 2001) for ad-hoc retrieval, addressing several aspects of its algorithmic design not previously taken into account. Although the original technique is able to retain precision when a considerable part of the inverted file is removed, we show that it is possible to improve precision in some scenarios if certain key design features are properly selected.
    Resource monitoring in information extraction BIBAFull-Text 779-780
      Jochen L. Leidner
    It is often argued that in information extraction (IE), certain machine learning (ML) approaches save development time over others, or that certain ML methods (e.g. Active Learning) require less training data than others, thus saving development cost. However, such development cost claims are not normally backed up by controlled studies showing that such savings actually occur. This situation in language engineering contrasts with software engineering in general, where many studies investigating system development cost have been carried out. We argue for the need for controlled studies that measure actual system development time in language engineering. To this end, we carry out an experiment in resource monitoring for an IE task: three named entity taggers for the same "surprise" domain are developed in parallel, using competing methods. Their human development time is accounted for using a logging facility.
       We report development cost results for parallel implementations of a named entity tagger and present a breakdown of the development time for the three alternative methods. We are not aware of detailed previous parallel studies that detail how system development time is spent when creating a named entity tagger.
    The DILIGENT framework for distributed information retrieval BIBKFull-Text 781-782
      Fabio Simeoni; Fabio Crestani; Ralf Bierig
    Keywords: GRID, digital libraries, distributed information retrieval, service oriented architecture
    Varying approaches to topical web query classification BIBAFull-Text 783-784
      Steven M. Beitzel; Eric C. Jensen; Abdur Chowdhury; Ophir Frieder
    Topical classification of web queries has drawn recent interest because of the promise it offers in improving retrieval effectiveness and efficiency. However, much of this promise depends on whether classification is performed before or after the query is used to retrieve documents. We examine two previously unaddressed issues in query classification: pre versus post-retrieval classification effectiveness and the effect of training explicitly from classified queries versus bridging a classifier trained using a document taxonomy. Bridging classifiers map the categories of a document taxonomy onto those of a query classification problem to provide sufficient training data. We find that training classifiers explicitly from manually classified queries outperforms the bridged classifier by 48% in F1 score. Also, a pre-retrieval classifier using only the query terms performs merely 11% worse than the bridged classifier which requires snippets from retrieved documents.
    A comparison of pooled and sampled relevance judgments BIBAFull-Text 785-786
      Ian Soboroff
    Test collections are most useful when they are reusable, that is, when they can be reliably used to rank systems that did not contribute to the pools. Pooled relevance judgments for very large collections may not be reusable for two reasons: they will be very sparse and not sufficiently complete, and they may be biased in the sense that they will unfairly rank some class of systems. The TREC 2006 terabyte track judged both a pool and a deep random sample in order to measure the effects of sparseness and bias.
    Clustering short texts using Wikipedia BIBAFull-Text 787-788
      Somnath Banerjee; Krishnan Ramanathan; Ajay Gupta
    Subscribers to popular news or blog feeds (RSS/Atom) often face the problem of information overload, as these feed sources usually deliver a large number of items periodically. One solution to this problem is to cluster similar items in the feed reader to make the information more manageable for a user. Clustering items at the feed reader end is a challenging task, as usually only a small part of the actual article is received through the feed. In this paper, we propose a method for improving the accuracy of clustering short texts by enriching their representation with additional features from Wikipedia. Empirical results indicate that this enriched representation of text items can substantially improve the clustering accuracy when compared to the conventional bag-of-words representation.
    Estimating collection size with logistic regression BIBAFull-Text 789-790
      Jingfang Xu; Sheng Wu; Xing Li
    Collection size is an important feature for representing the content summary of a collection, and plays a vital role in collection selection for distributed search. In uncooperative environments, collection size estimation algorithms are adopted to estimate the sizes of collections through their search interfaces. This paper proposes the heterogeneous capture (HC) algorithm, in which the capture probabilities of documents are modeled with logistic regression. With heterogeneous capture probabilities, the HC algorithm estimates collection size through conditional maximum likelihood. Experimental results on real web data show that the HC algorithm outperforms both the multiple capture-recapture and capture history algorithms.
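    As background, the classic two-sample capture-recapture baseline can be sketched as below (illustrative; the HC algorithm itself replaces this uniform-capture assumption with per-document capture probabilities fitted by logistic regression):

```python
def lincoln_petersen(sample1, sample2):
    """Estimate collection size N from two random document samples
    obtained by probing a search interface: N ~ n1 * n2 / m, where
    m is the number of documents captured in both samples."""
    n1, n2 = len(sample1), len(sample2)
    m = len(set(sample1) & set(sample2))
    if m == 0:
        raise ValueError("no overlap between samples; cannot estimate")
    return n1 * n2 / m
```

    The estimator is biased when capture probabilities vary across documents (e.g., highly ranked documents are sampled more often), which is exactly the heterogeneity HC is designed to model.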
    Selection and ranking of text from highly imperfect transcripts for retrieval of video content BIBAFull-Text 791-792
      Alexander Haubold
    In the domain of video content retrieval, we present an approach for selecting words and phrases from highly imperfect automatically generated transcripts. Extracted terms are ranked according to their descriptiveness and presented to the user in a multimedia browser interface. We use sense querying from the WordNet lexical database for our method of text selection and ranking. Evaluation of 679 video summarization tasks from 442 users shows that the method of ranking and emphasizing terms according to descriptiveness results in higher accuracy responses in less time compared to the baseline of no ranking.
    Enhancing patent retrieval by citation analysis BIBAFull-Text 793-794
      Atsushi Fujii
    This paper proposes a method to combine text-based and citation-based retrieval methods in invalidity patent search. Using the NTCIR-6 test collection, which includes eight years of USPTO patents, we show the effectiveness of our method experimentally.
    MRF based approach for sentence retrieval BIBAFull-Text 795-796
      Keke Cai; Chun Chen; Kangmiao Liu; Jiajun Bu; Peng Huang
    This poster focuses on the study of term context dependence in the application of sentence retrieval. Based on Markov Random Fields (MRF), three forms of dependence among query terms are considered. Under different assumptions about the term dependence relationship, three feature functions are defined, with the purpose of utilizing association features between query terms within a sentence to evaluate the relevance of that sentence. Experimental results demonstrate the effectiveness of the proposed retrieval models in improving the performance of sentence retrieval.
    Improving weak ad-hoc queries using Wikipedia as external corpus BIBAFull-Text 797-798
      Yinghao Li; Wing Pong Robert Luk; Kei Shiu Edward Ho; Fu Lai Korris Chung
    In an ad-hoc retrieval task, the query is usually short and the user expects to find the relevant documents in the first few result pages. We explored the possibility of using Wikipedia's articles as an external corpus to expand ad-hoc queries. Results show promising improvements on measures that emphasize weak queries.
    Fine-grained named entity recognition and relation extraction for question answering BIBKFull-Text 799-800
      Changki Lee; Yi-Gyu Hwang; Myung-Gil Jang
    Keywords: fine-grained named entity recognition, question answering, relation extraction
    World knowledge in broad-coverage information filtering BIBKFull-Text 801-802
      Bennett A. Hagedorn; Massimiliano Ciaramita; Jordi Atserias
    Keywords: NLP, financial news filtering, machine learning, world knowledge
    The influence of basic tokenization on biomedical document retrieval BIBAFull-Text 803-804
      Dolf Trieschnigg; Wessel Kraaij; Franciska de Jong
    Tokenization is a fundamental preprocessing step in Information Retrieval systems in which text is turned into index terms. This paper quantifies and compares the influence of various simple tokenization techniques on document retrieval effectiveness in two domains: biomedicine and news. As expected, biomedical retrieval is more sensitive to small changes in the tokenization method. The tokenization strategy can make the difference between a mediocre and well performing IR system, especially in the biomedical domain.
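    Two of the kinds of simple tokenization strategies such a comparison might cover (hypothetical variants chosen for illustration, not the paper's exact set):

```python
import re

def tokenize_simple(text):
    """Lowercase and split on any non-alphanumeric character."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def tokenize_keep_hyphens(text):
    """Like tokenize_simple, but keep hyphenated terms together,
    which matters for compound biomedical names."""
    return [t for t in re.split(r"[^a-z0-9-]+", text.lower()) if t]
```

    For a biomedical term like MIP-1alpha, the first variant yields ['mip', '1alpha'] while the second keeps 'mip-1alpha' intact; such small choices change which index terms a query can match, which is why the biomedical domain is more sensitive to them.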
    Using clustering to enhance text classification BIBAFull-Text 805-806
      Antonia Kyriakopoulou; Theodore Kalamboukis
    This paper addresses the problem of learning to classify texts by exploiting information derived from clustering both training and testing sets. The incorporation of knowledge resulting from clustering into the feature space representation of the texts is expected to boost the performance of a classifier. Experiments conducted on several widely used datasets demonstrate the effectiveness of the proposed algorithm especially for small training sets.
    A fact/opinion classifier for news articles BIBAFull-Text 807-808
      Adam Stepinski; Vibhu Mittal
    Many online news/blog aggregators like Google, Yahoo and MSN allow users to browse/search many hundreds of news sources. This results in dozens, often hundreds, of stories about the same event. While the news aggregators cluster these stories, allowing the user to efficiently scan the major news items at any given time, they do not currently allow alternative browsing mechanisms within the clusters. Furthermore, their intra-cluster ranking mechanisms are often based on a notion of authority/popularity of the source. In many cases, this leads to the classic power law phenomenon -- the popular stories/sources are the ones that are already popular/authoritative, thus reinforcing one dominant viewpoint. Ideally, these aggregators would exploit the availability of the tremendous number of sources to identify the various dominant threads or viewpoints about a story and highlight these threads for the users. This paper presents an initial limited approach to such an interface: it classifies articles into two categories: fact and opinion. We show that the combination of (i) a classifier trained on a small (140K) training set of editorials/reports and (ii) an interactive user interface that ameliorates classification errors by re-ordering the presentation can be effective in highlighting different underlying viewpoints in a story-cluster. We briefly discuss the classifier used here, the training set and the UI and report on some initial anecdotal user feedback and evaluation.
    Matching resumes and jobs based on relevance models BIBAFull-Text 809-810
      Xing Yi; James Allan; W. Bruce Croft
    We investigate the difficult problem of matching semi-structured resumes and jobs in a large scale real-world collection. We compare standard approaches to Structured Relevance Models (SRM), an extension of relevance-based language model for modeling and retrieving semi-structured documents. Preliminary experiments show that the SRM approach achieved promising performance and performed better than typical unstructured relevance models.
    The utility of linguistic rules in opinion mining BIBAFull-Text 811-812
      Xiaowen Ding; Bing Liu
    Online product reviews are one of the important opinion sources on the Web. This paper studies the problem of determining the semantic orientations (positive or negative) of opinions expressed on product features in reviews. Most existing approaches use a set of opinion words for the purpose. However, the semantic orientations of many words are context dependent. In this paper, we propose to use some linguistic rules to deal with the problem together with a new opinion aggregation function. Extensive experiments show that these rules and the function are highly effective. A system, called Opinion Observer, has also been built.
    A comparison of sentence retrieval techniques BIBAFull-Text 813-814
      Niranjan Balasubramanian; James Allan; W. Bruce Croft
    Identifying redundant information in sentences is useful for several applications such as summarization, document provenance, detecting text reuse and novelty detection. The task of identifying redundant information in sentences is defined as follows: Given a query sentence the task is to retrieve sentences from a given collection that express all or some subset of the information present in the query sentence. Sentence retrieval techniques rank sentences based on some measure of their similarity to a query. The effectiveness of such techniques depends on the similarity measure used to rank sentences. An effective retrieval model should be able to handle low word overlap between query and candidate sentences and go beyond just word overlap. Simple language modeling techniques like query likelihood retrieval have outperformed TF-IDF and word overlap based methods for ranking sentences. In this paper, we compare the performance of sentence retrieval using different language modeling techniques for the problem of identifying redundant information.
    High-dimensional visual vocabularies for image retrieval BIBAFull-Text 815-816
      Joao Magalhaes; Stefan Rueger
    In this paper we formulate image retrieval by text query as a vector space classification problem. This is achieved by creating a high-dimensional visual vocabulary that represents the image documents in great detail. We show how the representation of these image documents enables the application of well known text retrieval techniques such as Rocchio tf-idf and naive Bayes to the semantic image retrieval problem. We tested these methods on a Corel images subset and achieve state-of-the-art retrieval performance using the proposed methods.
    A web page topic segmentation algorithm based on visual criteria and content layout BIBAFull-Text 817-818
      Idir Chibane; Bich-Lien Doan
    This paper presents experiments using an algorithm of web page topic segmentation that show significant precision improvement in the retrieval of documents issued from the Web track corpus of TREC 2001. Instead of processing the whole document, a web page is segmented into different semantic blocks according to visual criteria (such as horizontal lines, colors) and structural tags (such as headings <H1>~<H6>, paragraph <P>). We conclude that combining visual and content layout criteria gives the best results for increasing the precision: the ranking of the page is calculated for relevant segments of pages resulting from the segmentation algorithm.
    Document clustering: an optimization problem BIBAFull-Text 819-820
      Ao Feng
    Clustering algorithms have been widely used in information retrieval applications. However, it is difficult to define an objective "best" result. This article analyzes some document clustering algorithms and illustrates that they are equivalent to the optimization problem of some global functions. Experiments show their good performance, but there are still counter-examples where they fail to return the optimal solution. We argue that Monte-Carlo algorithms in the global optimization framework have the potential to find better solutions than traditional clustering, and they are able to handle more complex structures.
    Finding similar experts BIBAFull-Text 821-822
      Krisztian Balog; Maarten de Rijke
    The task of finding people who are experts on a topic has recently received increased attention. We introduce a different expert finding task for which a small number of example experts is given (instead of a natural language query), and the system's task is to return similar experts. We define, compare, and evaluate a number of ways of representing experts, and investigate how the size of the initial example set affects performance. We show that more fine-grained representations of candidates result in higher performance, and larger sample sets as input lead to improved precision.
    Active learning for class imbalance problem BIBAFull-Text 823-824
      Seyda Ertekin; Jian Huang; C. Lee Giles
    The class imbalance problem has been known to hinder the learning performance of classification algorithms. Various real-world classification tasks such as text categorization suffer from this phenomenon. We demonstrate that active learning is capable of solving the problem.
    Strategies for retrieving plagiarized documents BIBAFull-Text 825-826
      Benno Stein; Sven Meyer zu Eissen; Martin Potthast
    For the identification of plagiarized passages in large document collections we present retrieval strategies which rely on stochastic sampling and chunk indexes. Using the entire Wikipedia corpus we compile n-gram indexes and compare them to a new kind of fingerprint index in a plagiarism analysis use case. Our index provides an analysis speed-up by a factor of 1.5 and is an order of magnitude smaller, while being equivalent in terms of precision and recall.
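    The n-gram chunk index that the fingerprint index is compared against can be sketched roughly as follows (a toy illustration, not the authors' implementation):

```python
def ngrams(text, n=3):
    """Word n-grams (shingles) of a document."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def build_index(docs, n=3):
    """Chunk index: each shingle maps to the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for g in ngrams(text, n):
            index.setdefault(g, set()).add(doc_id)
    return index

def candidate_sources(suspicious_text, index, n=3):
    """Candidate source documents, ranked by shared shingle count."""
    hits = {}
    for g in ngrams(suspicious_text, n):
        for d in index.get(g, ()):
            hits[d] = hits.get(d, 0) + 1
    return sorted(hits, key=hits.get, reverse=True)
```

    A fingerprint index replaces the exact shingle sets with compact hashed sketches, which is where the reported space savings come from.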
    Generative modeling of persons and documents for expert search BIBAFull-Text 827-828
      Pavel Serdyukov; Djoerd Hiemstra; Maarten Fokkinga; Peter M. G. Apers
    In this paper we address the task of automatically finding an expert within an organization, known as the expert search problem. We present a theoretically-based probabilistic algorithm which models retrieved documents as mixtures of expert candidate language models. Experiments show that our approach outperforms existing theoretically sound solutions.
    Random walk term weighting for information retrieval BIBAFull-Text 829-830
      Roi Blanco; Christina Lioma
    We present a way of estimating term weights for Information Retrieval (IR), using term co-occurrence as a measure of dependency between terms.
       We use the random walk graph-based ranking algorithm on a graph that encodes terms and co-occurrence dependencies in text, from which we derive term weights that represent a quantification of how a term contributes to its context. Evaluation on two TREC collections and 350 topics shows that the random walk-based term weights perform at least comparably to the traditional tf-idf term weighting, while they outperform it when the distance between co-occurring terms is between 6 and 30 terms.
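    A minimal sketch of the random walk weighting idea (a PageRank-style iteration over a term co-occurrence graph; the paper's exact graph construction and damping may differ):

```python
import numpy as np

def random_walk_weights(adj, d=0.85, iters=50):
    """Stationary random-walk scores over a term co-occurrence graph.
    adj[i][j] > 0 if terms i and j co-occur within a window; the
    resulting scores serve as term weights."""
    A = np.asarray(adj, dtype=float)
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)
    out[out == 0] = 1.0          # guard against isolated terms
    P = A / out                  # row-stochastic transition matrix
    w = np.full(n, 1.0 / n)
    for _ in range(iters):
        w = (1 - d) / n + d * (P.T @ w)
    return w
```

    Intuitively, a term that co-occurs with many well-connected terms accumulates weight, which is how the walk quantifies a term's contribution to its context.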
    Comparing query logs and pseudo-relevance feedback for web-search query refinement BIBAFull-Text 831-832
      Ryen W. White; Charles L. A. Clarke; Silviu Cucerzan
    Query logs and pseudo-relevance feedback (PRF) offer ways in which terms to refine Web searchers' queries can be selected, offered to searchers, and used to improve search effectiveness. In this poster we present a study of these techniques that aims to characterize the degree of similarity between them across a set of test queries, and the same set broken out by query type. The results suggest that: (i) similarity increases with the amount of evidence provided to the PRF algorithm, (ii) similarity is higher when titles/snippets are used for PRF than full-text, and (iii) similarity is higher for navigational than informational queries. The findings have implications for the combined usage of query logs and PRF in generating query refinement alternatives.
    Automatic extension of non-english wordnets BIBKFull-Text 833-834
      Katja Hofmann; Erik Tjong Kim Sang
    Keywords: dependency patterns, hypernym extraction, lexical acquisition
    First experiments searching spontaneous Czech speech BIBKFull-Text 835-836
      Pavel Ircing; Douglas W. Oard; Jan Hoidekr
    Keywords: speech retrieval, spontaneous speech
    Power and bias of subset pooling strategies BIBAFull-Text 837-838
      Gordon V. Cormack; Thomas R. Lynam
    We define a method to estimate the random and systematic errors resulting from incomplete relevance assessments.
       Mean Average Precision (MAP) computed over a large number of topics with a shallow assessment pool substantially outperforms -- for the same adjudication effort -- MAP computed over fewer topics with deeper pools, and P@k computed with pools of the same depth. Move-to-front pooling, previously reported to yield substantially better rank correlation, yields similar power and lower bias compared to fixed-depth pooling.
    Problems with Kendall's tau BIBAFull-Text 839-840
      Mark Sanderson; Ian Soboroff
    This poster describes a potential problem with a widely used measure in Information Retrieval research: Kendall's tau rank correlation coefficient. The coefficient is best known for its use in determining the similarity of test collections when ranking sets of retrieval runs. Threshold values for the coefficient have been defined and used in a number of published studies in information retrieval. However, this poster presents results showing that basing decisions on such thresholds is not as reliable as has been assumed.
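For readers unfamiliar with the coefficient, Kendall's tau (in its simplest, tie-free form, tau-a) compares two rankings of the same n items by counting concordant and discordant pairs. A minimal sketch:

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings of the same items.

    `rank_a` and `rank_b` map each item to its rank position.
    Returns a value in [-1, 1]; 1 means identical orderings,
    -1 means exactly reversed orderings. Ties are not handled.
    """
    items = list(rank_a)
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        # The pair is concordant if both rankings order x and y the same way.
        s = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(items)
    return (concordant - discordant) / (n * (n - 1) / 2)
```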
    Opinion holder extraction from author and authority viewpoints BIBAFull-Text 841-842
      Yohei Seki
    Opinion holder extraction research is important for discriminating between opinions that are viewed from different perspectives. In this paper, we describe our experience of participation in the NTCIR-6 Opinion Analysis Pilot Task by focusing on opinion holder extraction results in Japanese and English. Our approach to opinion holder extraction was based on the discrimination between author and authority viewpoints in opinionated sentences, and the evaluation results were fair with respect to the Japanese documents.
    Incorporating term dependency in the DFR framework BIBAFull-Text 843-844
      Jie Peng; Craig Macdonald; Ben He; Vassilis Plachouras; Iadh Ounis
    Term dependency, or co-occurrence, has been studied in language modelling, for instance by Metzler & Croft who showed that retrieval performance could be significantly enhanced using term dependency information. In this work, we show how term dependency can be modelled within the Divergence From Randomness (DFR) framework. We evaluate our term dependency model on the two ad hoc retrieval tasks using the TREC .GOV2 Terabyte collection. Furthermore, we examine the effect of varying the term dependency window size on the retrieval performance of the proposed model. Our experiments show that term dependency can indeed be successfully incorporated within the DFR framework.
    Hits on question answer portals: exploration of link analysis for author ranking BIBAFull-Text 845-846
      Pawel Jurczyk; Eugene Agichtein
    Question-Answer portals such as Naver and Yahoo! Answers are growing in popularity. However, the quality of answers is uneven: while some users consistently provide good answers, many others often provide poor ones. Hence, estimating the authority, or the expected quality, of users is a crucial task for this emerging domain, with potential applications to answer ranking and to incentive mechanism design. We adapt a powerful link analysis methodology from the web domain as a first step towards estimating authority in Question-Answer portals. Our experimental results over more than 3 million answers from Yahoo! Answers are promising, and warrant further exploration along the lines outlined in this poster.
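Per the title, the adapted link analysis methodology is HITS. As a hedged illustration (the paper's exact graph construction is not given in the abstract), hub and authority scores could be computed over an asker-answerer graph like this:

```python
def hits(edges, iters=50):
    """HITS over a directed graph given as a list of (asker, answerer) edges.

    In a QA portal, an edge u -> v means u's question was answered by v,
    so good answerers accumulate authority and active askers accumulate
    hub score.
    """
    nodes = {u for e in edges for u in e}
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # Authority: sum of hub scores of pages (users) pointing at you.
        auth = {v: sum(hub[u] for u, w in edges if w == v) for v in nodes}
        norm = sum(a * a for a in auth.values()) ** 0.5 or 1.0
        auth = {v: a / norm for v, a in auth.items()}
        # Hub: sum of authority scores of pages (users) you point at.
        hub = {u: sum(auth[w] for v, w in edges if v == u) for u in nodes}
        norm = sum(h * h for h in hub.values()) ** 0.5 or 1.0
        hub = {u: h / norm for u, h in hub.items()}
    return hub, auth
```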
    Heads and tails: studies of web search with common and rare queries BIBAFull-Text 847-848
      Doug Downey; Susan Dumais; Eric Horvitz
    A large fraction of queries submitted to Web search engines occur very infrequently. We describe search log studies aimed at elucidating behaviors associated with rare and common queries. We present several analyses and discuss research directions.
    Dimensionality reduction for dimension-specific search BIBAFull-Text 849-850
      Zi Huang; Hengtao Shen; Xiaofang Zhou; Dawei Song; Stefan Rüger
    Dimensionality reduction plays an important role in efficient similarity search, which is often based on k-nearest neighbor (k-NN) queries over a high-dimensional feature space. In this paper, we introduce a novel type of k-NN query, namely conditional k-NN (ck-NN), which considers dimension-specific constraints in addition to the inter-point distances. However, existing dimensionality reduction methods are not applicable to this new type of query. We propose a novel Mean-Std (standard deviation) guided Dimensionality Reduction (MSDR) to support a pruning-based efficient ck-NN query processing strategy. Our preliminary experimental results on 3D protein structure data demonstrate that the MSDR method is promising.
    An effective method for finding best entry points in semi-structured documents BIBAFull-Text 851-852
      Eugen Popovici; Pierre-François Marteau; Gildas Ménier
    Focused structured document retrieval employs the concept of the best entry point (BEP), which is intended to provide an optimal starting point from which users can browse to relevant document components [4]. In this paper we describe and evaluate a method for finding BEPs in XML documents. Experiments conducted within the framework of the INEX 2006 evaluation campaign on the Wikipedia XML collection [2] showed the effectiveness of the proposed approach.
    Query rewriting using active learning for sponsored search BIBAFull-Text 853-854
      Wei Vivian Zhang; Xiaofei He; Benjamin Rey; Rosie Jones
    Sponsored search is a major revenue source for search companies. Web searchers can issue any queries, while advertisement keywords are limited. Query rewriting techniques effectively match user queries with relevant advertisement keywords, thus increasing the amount of web advertisements available. The match relevance is critical for clicks. In this study, we aim to improve query rewriting relevance. For this purpose, we use an active learning algorithm called Transductive Experimental Design to select the most informative samples to train the query rewriting relevance model. Experiments show that this approach significantly improves model accuracy and rewriting relevance.
    An analysis of peer-to-peer file-sharing system queries BIBAFull-Text 855-856
      Linh Thai Nguyen; Dongmei Jia; Wai Gen Yee; Ophir Frieder
    Many studies focus on the Web, yet few focus on peer-to-peer file-sharing system queries despite their massive scale in terms of Internet traffic. We analyzed several million queries collected on the Gnutella network and differentiated our findings from those of Web queries.
    Investigating the relevance of sponsored results for web ecommerce queries BIBAFull-Text 857-858
      Bernard J. Jansen
    Are sponsored links, the primary business model for Web search engines, providing Web consumers with relevant results? This research addresses this issue by investigating the relevance of sponsored and non-sponsored links for ecommerce queries from the major search engines. The results show that average relevance ratings for sponsored and non-sponsored links are virtually the same, although the relevance ratings for sponsored links are statistically higher. We used 108 ecommerce queries and 8,256 retrieved links for these queries from three major Web search engines, Google, MSN, and Yahoo!. We present the implications for Web search engines and sponsored search as a long-term business model as well as a mechanism for finding relevant information for searchers.
    Viewing online searching within a learning paradigm BIBAFull-Text 859-860
      Bernard J. Jansen; Brian Smith; Danielle L. Booth
    In this research, we investigate whether one can model online searching as a learning paradigm. We examined the searching characteristics of 41 participants engaged in 246 searching tasks. We classified the searching tasks according to Anderson and Krathwohl's taxonomy, an updated version of Bloom's taxonomy that categorizes cognitive learning into six levels. Research results show that Applying takes the most searching effort as measured by queries per session and specific topics searched per session. The categories of Remembering and Understanding, which are lower-order learning levels, exhibit searching characteristics similar to the higher-order categories of Evaluating and Creating. It seems that searchers rely primarily on their internal knowledge and use searching mainly for fact checking and verification when engaged in Evaluating and Creating. The implication is that the commonly held notion of Web searchers having simple information goals may not be correct. We discuss the implications for Web searching, including designing interfaces to support exploration and alternate views.
    More efficient parallel computation of PageRank BIBKFull-Text 861-862
      John R. Wicks; Amy Greenwald
    Keywords: PageRank, power iteration, web graph
    Using similarity links as shortcuts to relevant web pages BIBAFull-Text 863-864
      Mark D. Smucker; James Allan
    Successful navigation from a relevant web page to other relevant pages depends on the page linking to other relevant pages. We measured the distance to travel from relevant page to relevant page and found a bimodal distribution of distances peaking at 4 and 15 hops. In an attempt to make it easier to navigate among relevant pages, we added content similarity links to pages. With these additional links, significantly more relevant documents were close to each other. A browser plug-in or other tool that provides links to pages similar to a given page should increase the ability of web users to find relevant pages via navigation.
    Fast exact maximum likelihood estimation for mixture of language models BIBAFull-Text 865-866
      Yi Zhang; Wei Xu
    A common language modeling approach assumes the data D is generated from a mixture of several language models. The EM algorithm is usually used to find the maximum likelihood estimate of one unknown mixture component, given the mixture weights and the other language models. In this paper, we provide an efficient algorithm of O(k) complexity to find the exact solution, where k is the number of words that occur at least once in D. Another merit is that the probabilities of many words are exactly zero, which means that the mixture language model also serves as a feature selection technique.
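The abstract does not spell out the exact O(k) algorithm; as a point of comparison, here is the standard EM iteration it replaces, for estimating one unknown component p of the mixture lam * p + (1 - lam) * background, with the weight lam and the background model fixed. Names and the uniform initialization are illustrative.

```python
def em_mixture_component(counts, background, lam, iters=100):
    """EM baseline for the setting in the abstract: word counts are
    generated by lam * p + (1 - lam) * background, with `background`
    and `lam` fixed and the component p unknown.

    Returns the estimated component p as a dict over the vocabulary.
    """
    words = list(counts)
    p = {w: 1.0 / len(words) for w in words}  # uniform start
    for _ in range(iters):
        # E-step: posterior that each occurrence of w came from p.
        post = {w: lam * p[w] / (lam * p[w] + (1 - lam) * background[w])
                for w in words}
        # M-step: re-estimate p from the expected counts.
        total = sum(counts[w] * post[w] for w in words)
        p = {w: counts[w] * post[w] / total for w in words}
    return p
```

Words that are well explained by the background model drift toward zero probability in p, which is the feature-selection effect the abstract mentions (the exact algorithm makes them exactly zero).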
    TimedTextRank: adding the temporal dimension to multi-document summarization BIBAFull-Text 867-868
      Xiaojun Wan
    Graph-ranking based algorithms (e.g. TextRank) have been proposed for multi-document summarization in recent years. However, these algorithms miss an important dimension, the temporal dimension, for summarizing evolving topics. For an evolving topic, recent documents are usually more important than earlier documents because they contain more novel information, and a novelty-oriented summary is more appropriate to reflect the changing topic. We propose the TimedTextRank algorithm, which incorporates the temporal information of documents into graph-based ranking. A preliminary study demonstrates the effectiveness of the proposed TimedTextRank algorithm for dynamic multi-document summarization.
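One plausible way to realize the temporal dimension, sketched under the assumption of an exponential recency decay (the paper's actual weighting scheme is not given in the abstract), is to scale each sentence's incoming rank by the freshness of its source document:

```python
import math

def timed_textrank(sim, ages, decay=0.1, damping=0.85, iters=50):
    """Recency-biased TextRank sketch.

    `sim[i][j]` is the similarity between sentences i and j, and
    `ages[i]` is the age in days of the document containing sentence i.
    Each sentence's incoming rank is scaled by exp(-decay * age), so
    sentences from recent documents score higher, all else equal.
    """
    n = len(sim)
    fresh = [math.exp(-decay * a) for a in ages]
    outdeg = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    score = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            rank = sum(score[j] * sim[j][i] / outdeg[j]
                       for j in range(n) if j != i and outdeg[j])
            new.append((1 - damping) / n + damping * fresh[i] * rank)
        total = sum(new)
        score = [x / total for x in new]  # keep scores a distribution
    return score
```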
    Winnowing wheat from the chaff: propagating trust to sift spam from the web BIBAFull-Text 869-870
      Lan Nie; Baoning Wu; Brian D. Davison
    The Web today includes many pages intended to deceive search engines and attain an unwarranted result ranking. Since the links among web pages are used to calculate authority, ranking systems would benefit from knowing which pages contain content to be trusted and which do not. We propose and compare various trust propagation methods to estimate the trustworthiness of each page. We find that a non-trust-preserving propagation method is able to achieve close to a fifty percent improvement over TrustRank in separating spam from non-spam pages.
    Feature engineering for mobile (SMS) spam filtering BIBAFull-Text 871-872
      Gordon V. Cormack; José María Gómez Hidalgo; Enrique Puertas Sánz
    Mobile spam is an increasing threat that may be addressed using filtering systems like those employed against email spam. We believe that email filtering techniques require some adaptation to reach good levels of performance on SMS spam, especially regarding message representation. To test this assumption, we have performed experiments on SMS filtering using top-performing email spam filters on mobile spam messages with a suitable feature representation, with results supporting our hypothesis.
    Ranking by community relevance BIBAFull-Text 873-874
      Lan Nie; Brian D. Davison; Baoning Wu
    A web page may be relevant to multiple topics; even when nominally on a single topic, the page may attract attention (and thus links) from multiple communities. Instead of indiscriminately summing the authority provided by all pages, we decompose a web page into separate subnodes with respect to each community pointing to it. By considering the relevance of these communities, we are able to better model the query-specific reputation for each potential result. We apply a total of 125 queries to the TREC .GOV dataset to demonstrate how the use of community relevance can improve ranking performance.
    Query suggestion based on user landing pages BIBAFull-Text 875-876
      Silviu Cucerzan; Ryen W. White
    This poster investigates a novel query suggestion technique that selects query refinements through a combination of many users' post-query navigation patterns and the query logs of a large search engine. We compare this technique, which suggests queries that retrieve, among their top-ranked results, the pages where searchers end up after post-query browsing (i.e., the landing pages), with an approach based on query refinements from user search sessions extracted from query logs. Our findings demonstrate the effectiveness of using landing pages for the direct generation of query suggestions, as well as the complementary nature of the suggestions it generates with regard to traditional query-log-based refinement methodologies.
    Making mind and machine meet: a study of combining cognitive and algorithmic relevance feedback BIBAFull-Text 877-878
      Chirag Shah; Diane Kelly; Xin Fu
    Using Saracevic's relevance types, we explore approaches to combining algorithmic and cognitive relevance in a term relevance feedback scenario. We use data collected from 21 users who provided relevance feedback about terms suggested by a system for 50 TREC HARD topics. The users' term feedback is treated as cognitive relevance and the system's term suggestions as algorithmic relevance. We construct retrieval runs using these two types of relevance feedback and experiment with ways of combining them with simple Boolean operators. Results show minimal differences in performance with respect to the different techniques.
    Using collaborative queries to improve retrieval for difficult topics BIBAFull-Text 879-880
      Xin Fu; Diane Kelly; Chirag Shah
    We describe a preliminary analysis of queries created by 81 users for 4 topics from the TREC Robust Track. Our goal was to explore the potential benefits of using queries created by multiple users on retrieval performance for difficult topics. We first examine the overlap in users' queries and the overlap in results with respect to different queries for the same topic. We then explore the potential benefits of combining users' queries in various ways. Our results provide some evidence that having access to multiple users' queries can improve retrieval for individual searchers and for difficult topics.
    Retrieval of discussions from enterprise mailing lists BIBAFull-Text 881-882
      Maheedhar Kolla; Olga Vechtomova
    Mailing list archives in an enterprise are a valuable source for employees to dig into past proceedings of the organization that could be relevant to their present task. Going through the proceedings of discussions about certain topics can be cumbersome, and regular search techniques might not work in this context due to the genre of the documents. In this paper, we propose methods, based on the theory of subjectivity, to retrieve email messages that could contain argumentative discussions about the topic that the user is interested in.
    Effects of highly agreed documents in relevancy prediction BIBAFull-Text 883-884
      Andres R. Masegosa; Hideo Joho; Joemon M. Jose
    Finding significant contextual features is a challenging task in the development of interactive information retrieval (IR) systems. This paper investigates a simple method to facilitate such a task by looking at aggregated relevance judgements of retrieved documents. Our study suggests that agreement on relevance judgements can indicate the effectiveness of retrieved documents as a source of significant features. The effect of highly agreed documents has practical implications for the design of adaptive search models in interactive IR systems.
    Detecting word substitutions: PMI vs. HMM BIBAFull-Text 885-886
      Dmitri Roussinov; SzeWang Fong; David Skillicorn
    Those who want to conceal the content of their communications can do so by replacing words that might trigger attention. For example, instead of writing "The bomb is in position", a terrorist may choose to write "The flower is in position." The substituted sentence would sound a bit "odd" to a human reader, and prior research has shown that such oddity is detectable by text mining approaches. However, the importance of each component in the suggested oddity detection approach has not been thoroughly investigated. Also, the approach has not been compared with such an obvious candidate for the task as Hidden Markov Models (HMMs). In this work, we further explore the oddity detection algorithms reported earlier, specifically those based on pointwise mutual information (PMI) and Hidden Markov Models (HMM).
    Workload sampling for enterprise search evaluation BIBAFull-Text 887-888
      Tom Rowlands; David Hawking; Ramesh Sankaranarayana
    In real-world use of test collection methods, it is essential that the query test set be representative of the workload expected in the actual application. Using a random sample of queries from a media company's query log as a 'gold standard' test set, we demonstrate that biases in sitemap-derived and top-n query sets can lead to significant perturbations in engine rankings and big differences in estimated performance levels.
    Document layout and color driven image retrieval BIBAFull-Text 889-890
      Pere Obrador
    This paper presents a contribution to image indexing applied to the document creation task. The presented method ranks a set of photographs based on how well they aesthetically work within a predefined document. Color harmony, document visual balance and image quality are taken into consideration. A user study conducted with people with a range of expertise in document creation helped identify the right visual features for the algorithm to consider. This shows some benefits for the traditional document creation task, as well as for the case of ever-changing web page banner colors and layout.
    Large-scale cluster-based retrieval experiments on Turkish texts BIBAFull-Text 891-892
      Ismail Sengor Altingovde; Rifat Ozcan; Huseyin Cagdas Ocalan; Fazli Can; Özgür Ulusoy
    We present cluster-based retrieval (CBR) experiments on the largest available Turkish document collection. Our experiments evaluate retrieval effectiveness and efficiency on both an automatically generated clustering structure and a manual classification of documents. In particular, we compare CBR effectiveness with full-text search (FS) and evaluate several implementation alternatives for CBR. Our findings reveal that CBR yields effectiveness figures comparable to FS. Furthermore, by using a specifically tailored cluster-skipping inverted index, we significantly improve the in-memory query processing efficiency of CBR in comparison to other traditional CBR techniques and even FS.
    Improving active learning recall via disjunctive boolean constraints BIBAFull-Text 893-894
      Emre Velipasaoglu; Hinrich Schütze; Jan O. Pedersen
    Active learning efficiently homes in on the decision boundary between relevant and irrelevant documents, but in the process can miss entire clusters of relevant documents, yielding classifiers with low recall. In this paper, we propose a method to increase active learning recall by constraining sampling to a document subset rich in relevant examples.
    Creativity support: information discovery and exploratory search BIBAFull-Text 895-896
      Eunyee Koh; Andruid Kerne; Rodney Hill
    We are developing support for creativity in learning through information discovery and exploratory search. Users engage in creative tasks, such as inventing new products and services. The system supports evolving information needs. It gathers and presents relevant information visually using images and text. Users are able to search, browse, and explore results from multiple queries and interact with information elements by manipulating design and expressing interest. A field study was conducted to evaluate the system in an undergraduate class. The results demonstrated the efficacy of our system for developing creative ideas. Exposure to diverse information in visual and interactive forms is shown to support students engaged in invention tasks.

    Demonstrations

    MQX: multi-query engine for compressed XML data BIBKFull-Text 897
      Xiaoling Wang; Aoying Zhou; Juzhen He; Wilfred Ng
    Keywords: XML, compression, multi-query
    ISKODOR: unified user modeling for integrated searching BIBAFull-Text 898
      Melanie Gnasa; Armin B. Cremers; Douglas W. Oard
    ISKODOR integrates personal collections, peer search, and centralized search services. User modeling in ISKODOR fills three roles: discovery of sites with suitable information stores, context-based query interpretation, and automatic profile-based filtering of new information. Explanation and control are achieved through graphical depiction of sources, explicit feedback regarding utility, and explicit control over peer association behavior and information sharing.
    Babel: a machine transliteration workbench BIBKFull-Text 899
      A. Kumaran; Tobias Kellner
    Keywords: crosslingual information retrieval, machine transliteration
    X-Site: a workplace search tool for software engineers BIBAFull-Text 900
      Peter C. K. Yeung; Luanne Freund; Charles L. A. Clarke
    Professionals in the workplace need high-precision search tools capable of retrieving information that is useful and appropriate to the task at hand. One approach to identifying content that is not only relevant but also useful is to make use of the task context of the search. We present X-Site, an enterprise search engine for the software engineering domain that exploits relationships between users' tasks and document genres in the collection to improve retrieval precision.
    The wild thing goes local BIBAFull-Text 901
      Kenneth Ward Church; Bo Thiesson
    Suppose you are on a mobile device with no keyboard (e.g., a cell phone) and you want to perform a "near me" search. Where is the nearest pizza? How do you enter queries quickly? T9? The Wild Thing encourages users to enter patterns with implicit and explicit wild cards (regular expressions). The search engine uses Microsoft Local Live logs to find the most likely queries for a particular location. For example, 7#6 is short-hand for the regular expression: /^[PQRS].*[ ][MNO].*/, which matches "post office" in many places (but "Space Needle" in Seattle). Some queries are more local than others. Pizza is likely everywhere, whereas "Boeing Company" is very likely in Seattle and Chicago, moderately likely nearby, and somewhat likely elsewhere. Smoothing is important. Not every query is observed everywhere.
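The digit-to-regex expansion in the example above can be sketched directly; '#' is treated as the space between words and every digit carries an implicit trailing wild card:

```python
import re

# Standard T9 keypad letter assignments.
T9 = {"2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
      "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ"}

def t9_pattern(keys):
    """Expand a keypad pattern like '7#6' into a regular expression.

    Each digit matches any letter on that key followed by an implicit
    wild card; '#' stands for the space between words. This mirrors the
    abstract's example: '7#6' -> /^[PQRS].*[ ][MNO].*/.
    """
    parts = []
    for k in keys:
        if k == "#":
            parts.append("[ ]")
        else:
            parts.append("[%s].*" % T9[k])
    return re.compile("^" + "".join(parts), re.IGNORECASE)
```

Against a location's query log, such a pattern would then be matched against candidate queries and the matches ranked by local frequency.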
    DiscoverInfo: a tool for discovering information with relevance and novelty BIBKFull-Text 902
      Chirag Shah; Gary Marchionini
    Keywords: novelty, relevance, term-cloud
    Radio Oranje: searching the queen's speech(es) BIBAFull-Text 903
      Willemijn Heeren; Laurens van der Werff; Roeland Ordelman; Arjan van Hessen; Franciska de Jong
    The 'Radio Oranje' demonstrator shows an attractive multimedia user experience in the cultural heritage domain based on a collection of mono-media audio documents. It supports online search and browsing of the collection using indexing techniques, specialized content visualizations and a related photo database.
    Mobile interface of the memoria project BIBAFull-Text 904
      Ricardo Dias; Rui Jesus; Rute Frias; Nuno Correia
    This project develops tools to manage personal memories that include a multimedia retrieval system and user interfaces for different devices. This paper and demonstration presents the mobile interface which allows browsing, retrieving, and taking pictures that are automatically annotated with GPS data and audio information. The multimedia retrieval system uses multimodal information: visual content, GPS metadata and audio information. The interface was evaluated in a cultural heritage site.
    A full-text retrieval toolkit for mobile desktop search BIBKFull-Text 905
      Wei Chen; Jiajun Bu; Kangmiao Liu; Chun Chen; Chen Zhang
    Keywords: full-text retrieval, mobile desktop search
    EXPOSE: searching the web for expertise BIBKFull-Text 906
      Fabian Kaiser; Holger Schwarz; Mihály Jakob
    Keywords: expert finding, information retrieval, knowledge management, search engine, web search
    Text categorization for streams BIBAFull-Text 907
      D. L. Thomas; W. J. Teahan
    We describe a novel system for evaluating and performing stream-based text categorization. Stream-based text categorization considers the text being categorized as a stream of symbols, which differs from the traditional feature-based approach that relies on extracting features from the text. The system implements character-based language models -- specifically models based on the PPM text compression scheme -- as well as count-based measures such as R-Measure and C-Measure. Use of the system demonstrates that all of these techniques outperform SVM, a feature-based classifier, at stream-related classification tasks such as authorship ascription.
    Search results using timeline visualizations BIBKFull-Text 908
      Omar Alonso; Michael Gertz; Ricardo Baeza-Yates
    Keywords: information visualization, temporal information
    Wikipedia in the pocket: indexing technology for near-duplicate detection and high similarity search BIBAFull-Text 909
      Martin Potthast
    We develop and implement a new indexing technology which allows us to use complete (and possibly very large) documents as queries, while having a retrieval performance comparable to a standard term query. Our approach aims at retrieval tasks such as near-duplicate detection and high similarity search. To demonstrate the performance of our technology we have compiled the search index "Wikipedia in the Pocket", which contains about 2 million English and German Wikipedia articles. This index -- along with a search interface -- fits on a conventional CD (0.7 gigabyte). The ingredients of our indexing technology are similarity hashing and minimal perfect hashing.
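The abstract names similarity hashing as one ingredient without detail; a generic simhash fingerprint (not necessarily the scheme used here) illustrates the core idea that near-duplicate documents receive fingerprints at small Hamming distance:

```python
import hashlib

def simhash(text, bits=64):
    """Generic 64-bit simhash fingerprint of a text.

    Each token votes +1/-1 on every bit position according to its
    hash; the fingerprint keeps the sign of each bit's tally, so
    texts sharing most tokens get mostly identical bits.
    """
    v = [0] * bits
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        for i in range(bits):
            v[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if v[i] > 0)

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Near-duplicate detection then reduces to finding stored fingerprints within a small Hamming radius of the query document's fingerprint.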
    Nexus: a real time QA system BIBKFull-Text 910
      Kisuh Ahn; Bonnie Webber
    Keywords: information retrieval, question answering
    Geographic ranking for a local search engine BIBAFull-Text 911
      Tony Abou-Assaleh; Weizheng Gao
    Traditional ranking schemes of the relevance of a Web page to a user query in a search engine are less appropriate when the search term contains geographic information. Often, geographic entities, such as addresses, city names, and location names, appear only once or twice in a Web page, and are typically not in a heading or larger font. Consequently, an alternative ranking approach to the traditional weighted tf*idf relevance ranking is needed. Further, if a Web site contains a geographic entity, it is often the case that its in- and out-neighbours do not refer to the same entity, although they may refer to other geographic entities. We present a local search engine that applies a novel ranking algorithm suitable for ranking Web pages with geographic content. We describe its major components: geographic ranking, focused crawling, geographic extractor, and the related web-sites feature.
    Focused ranking in a vertical search engine BIBAFull-Text 912
      Philip O'Brien; Tony Abou-Assaleh
    Since the debut of PageRank and HITS, hyperlink-induced Web document ranking has come a long way. The Web has become increasingly vast and topically diverse. Such vastness has led many into the area of topic-sensitive ranking and its variants. We address the high dimensionality of the Web by providing tools for focused search. A focused search engine is one which seeks coverage over a subset of topics of the Web and presents users with relevant search results in a known domain. This demonstration will introduce readers to the GenieKnows.com Vertical Search Engine.
    A "do-it-yourself" evaluation service for music information retrieval systems BIBKFull-Text 913
      M. Cameron Jones; Mert Bay; J. Stephen Downie; Andreas F. Ehmann
    Keywords: evaluation, music information retrieval
    IR-Toolbox: an experiential learning tool for teaching IR BIBKFull-Text 914
      Efthimis N. Efthimiadis; Nathan G. Freier
    Keywords: experiential learning, teaching information retrieval to library and information science students

    Doctoral consortium

    Beyond classical measures: how to evaluate the effectiveness of interactive information retrieval system? BIBAFull-Text 915
      Azzah Al-Maskari
    This research explores the relationship between Information Retrieval (IR) systems' effectiveness and users' performance (accuracy and speed) and their satisfaction with the retrieved results (precision of the results, completeness of the results and overall system success). Previous studies have concluded that improvements in IR effectiveness measures are not reflected in improvements in users' performance. This work aims at examining factors that can be considered confounding variables in Interactive Information Retrieval (IIR) evaluation. In this research, we look at substantive approaches to evaluating IIR systems. We aim to build an interactive evaluation framework that brings together aspects of systems' effectiveness and users' performance and satisfaction. This research also involves developing methods for capturing users' satisfaction with the retrieved results of IR systems, as well as examining how users assess their own performance in task completion. Furthermore, we are also interested in identifying evaluation measures which are used in batch mode (non-interactive experiments) but correlate well with interactive IR system performance. Thus, by the end of this research, we hope to develop valid and reliable metrics for IIR evaluation.
    People search in the enterprise BIBKFull-Text 916
      Krisztian Balog
    Keywords: enterprise search, expertise finding, people search
    Efficient integration of proximity for text, semi-structured and graph retrieval BIBKFull-Text 917
      Andreas Broschart
    Keywords: XML, efficiency, proximity
    Attention-based information retrieval BIBAFull-Text 918
      Georg Buscher
    In the proposed PhD thesis, it will be examined how attention data from the user, especially generated by an eye tracker, can be exploited in order to enhance and personalize information retrieval methods.
    A summarisation logic for structured documents BIBAFull-Text 919
      Jan Frederik Forst
    The logical approach to Information Retrieval tries to model the relevance of a document given a query as the logical implication between documents and queries. In early work, van Rijsbergen states that the retrieval status value of a document given a query is proportional to the degree of implication between a document and a query. Based on this, several probabilistic logics for information retrieval have been conceived, which add an additional layer of abstraction to the information retrieval task: probabilistic models for the retrieval of documents are expressed in those logics, rather than implemented directly.
       The aim of the research presented here is to develop a logic for the IR task of document summarisation. Such a summarisation logic adds an abstraction layer to the summarisation task, similarly to the way a logic for document retrieval adds a layer to the retrieval task. Probabilistic models for document summarisation will thus be expressed as logical formulae, with the actual implementation hidden by the logical expressions. The "extract-worthiness" of textual components in this logic is measured as the degree of implication from those textual components to their surrounding contexts, providing a measure of how much said components are "about" the context within which they are situated.
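As a toy illustration (an assumption for illustration only, not the thesis's actual probabilistic logic), the "extract-worthiness" of a textual component could be approximated by a crude degree of implication from the component to its surrounding context, here estimated as term overlap:

```python
def extract_worthiness(component_terms, context_terms):
    """Toy sketch: approximate the degree of implication from a textual
    component to its surrounding context as the fraction of the
    component's terms that also occur in the context.  A real
    summarisation logic would replace this with a probabilistic
    estimate expressed as a logical formula."""
    if not component_terms:
        return 0.0
    return len(component_terms & context_terms) / len(component_terms)

component = {"retrieval", "logic", "summary"}
context = {"retrieval", "logic", "document", "query"}
print(extract_worthiness(component, context))
```

Components scoring highest against their context would be the ones selected for the extract.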
    Information-behaviour modeling with external cues BIBAFull-Text 920
      Michael Huggett
    Much of human activity defines an information context. We awaken, start work, and hold meetings at roughly the same time every day, and retrieve the same information items (day planners, itineraries, schedules, agendas, reports, menus, web pages, etc.) for many of these activities. Information retrieval systems in general lack sensitivity to such recurrent context, requiring users to remember and re-enter search cues for objects regardless of how regularly or consistently the objects are used, and to develop ad-hoc storage strategies.
       We propose that in addition to semantic cues, information objects should also be indexed by temporal and sensory cues, such as clock time and location, so that objects can be retrieved by external environmental context, in addition to any internal semantic content. Our cue-event-object (CEO) model uses a network representation to associate information objects with the times and conditions (location, weather, etc.) when they are typically used. Users can query the system to review their activities, revealing what they do at particular times, and which information objects tend to be most often used and when. The system can also pre-fetch items that have proven useful in past similar situations. The CEO model is incremental, real-time, and dynamic, maintaining an accurate summary even as a user's information behaviour changes over time.
       Such environmentally-aware systems have applications in personal information management, mobile devices, and smart homes. As a memory prosthesis, the model can support autonomous living for the cognitively impaired. We present a comprehensive research agenda based on some promising preliminary findings.
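A minimal sketch of the cue-event-object idea, assuming a simple frequency-weighted index from external cues to information objects (class and method names are hypothetical, not from the thesis):

```python
from collections import defaultdict

class CEOModel:
    """Hypothetical sketch of a cue-event-object index: information
    objects are associated with the external cues (time slot, location)
    under which they are used, and ranked by usage frequency."""

    def __init__(self):
        # (time_slot, location) -> {object: usage count}
        self.index = defaultdict(lambda: defaultdict(int))

    def observe(self, time_slot, location, obj):
        """Record that `obj` was used under the given external cues."""
        self.index[(time_slot, location)][obj] += 1

    def prefetch(self, time_slot, location, k=3):
        """Return the k objects most often used in this context."""
        counts = self.index[(time_slot, location)]
        return sorted(counts, key=counts.get, reverse=True)[:k]

ceo = CEOModel()
ceo.observe("morning", "office", "day planner")
ceo.observe("morning", "office", "day planner")
ceo.observe("morning", "office", "agenda")
print(ceo.prefetch("morning", "office"))
```

Because observations are counted incrementally, the index stays current as a user's information behaviour changes over time.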
    Fuzzy temporal and spatial reasoning for intelligent information retrieval BIBAFull-Text 921
      Steven Schockaert
    Temporal and spatial information in text documents is often expressed in a qualitative way. Moreover, both are frequently affected by vagueness, calling for appropriate extensions of traditional frameworks for qualitative reasoning about time and space. Our research aims at defining such extensions based on fuzzy set theory, and applying the resulting frameworks to two important kinds of intelligent information retrieval, viz. temporal question answering and geographic information retrieval.
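One standard fuzzy-set device for vague temporal expressions is a trapezoidal membership function; as a sketch (the interval bounds below are an illustrative assumption), "around noon" might be modelled as:

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal fuzzy membership: 0 outside [a, d], 1 on [b, c],
    and linear on the two slopes in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Vague interval "around noon" (hours of the day): fully true on
# 11.5-12.5, fading out between 11 and 13 -- bounds are illustrative.
def around_noon(hour):
    return trapezoid(hour, 11.0, 11.5, 12.5, 13.0)

print(around_noon(12.0))
print(around_noon(11.25))
```

Qualitative temporal relations ("before", "during") can then be generalised to degrees of truth over such fuzzy intervals.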
    Paragraph retrieval for why-question answering BIBKFull-Text 922
      Suzan Verberne
    Keywords: discourse annotation, paragraph retrieval, why-questions
    Global resources for peer-to-peer text retrieval BIBAFull-Text 923
      Hans Friedrich Witschel
    The thesis presented in this paper tackles selected issues in unstructured peer-to-peer information retrieval (P2PIR), using world knowledge to solve P2PIR problems. The first part uses so-called reference corpora to estimate global term weights such as IDF, instead of sampling them from the distributed collection. The second part of the work will be dedicated to the question of query routing in unstructured P2PIR systems, using peer resource descriptions and world knowledge for query expansion.
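The first idea can be sketched as follows, assuming a smoothed IDF formula (the exact weighting scheme and corpus are illustrative, not taken from the thesis):

```python
import math

def idf_from_reference(term, reference_docs):
    """Estimate a global IDF weight for `term` from a centrally held
    reference corpus, rather than sampling document frequencies from
    the distributed peer collections.  Each document is a set of terms;
    add-one smoothing avoids division by zero for unseen terms."""
    n = len(reference_docs)
    df = sum(1 for doc in reference_docs if term in doc)
    return math.log((n + 1) / (df + 1))

# Tiny illustrative reference corpus.
reference = [{"peer", "network"}, {"retrieval", "peer"}, {"music"}]
print(idf_from_reference("peer", reference))
```

The assumption is that term statistics in a large reference corpus approximate those of the (unobservable) global P2P collection well enough for ranking.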
    Automatic query-time generation of retrieval expert coefficients for multimedia retrieval BIBAFull-Text 924
      Peter Wilkins
    Content-based Multimedia Information Retrieval can be defined as the task of matching a multi-modal information need against various components of a multimedia corpus and retrieving relevant elements. Generally the matching and retrieval take place across multiple 'features', visual or audio, high-level or low-level, each of which can be seen as an independent retrieval expert. The task of answering a query can thus be formulated as a data fusion problem. Depending on the query, each expert may perform differently, so retrieval coefficients can be used to weight each expert and increase overall performance. Previous approaches to expert coefficient generation have included query-independent coefficients, identification of query-classes and machine learning methods, to name a few.
       The approach I propose is different, as it seeks to dynamically create expert coefficients which are query-dependent. This approach is based upon earlier experiments where an initial correlation was observed between the score distribution of a retrieval expert, and its relative performance when compared against other experts for that query. I have created a basic method which leverages these observations to create query-time coefficients which achieve comparable performance to oracle-determined query-independent weights, for the experts and collections used in the aforementioned experiment. Previous research which examined score distribution did so with respect to relevance, whereas this work seeks to compare expert scores for a given query to determine relative performance.
       In my work I aim to explore this correlation by eliminating potential bias from the data collections, the retrieval experts and the queries used in experiments, to obtain more robust observations. Using and extending previous investigations into data fusion, I will explore where data fusion succeeds in multimedia retrieval, and where it does not. I then aim to refine and extend my existing techniques for automatic coefficient generation to incorporate the new observations, so as to improve performance. Finally, I will combine this approach with existing data fusion methods, such as query-class coefficients, with each approach complementing the other to achieve further performance improvements.
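The general shape of query-dependent coefficient generation can be sketched as below; the use of the mean normalised score as a performance proxy is an illustrative assumption, not the thesis's actual method:

```python
def fusion_weights(expert_scores):
    """Hypothetical sketch: derive a query-time coefficient for each
    retrieval expert from its score distribution, assuming (as the
    proposal observes) that the distribution correlates with relative
    expert performance for the query.  Scores are min-max normalised so
    experts with different score ranges are comparable; the mean
    normalised score serves as a crude performance proxy."""
    raw = {}
    for expert, scores in expert_scores.items():
        lo, hi = min(scores), max(scores)
        norm = [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]
        raw[expert] = sum(norm) / len(norm)
    total = sum(raw.values()) or 1.0
    return {e: w / total for e, w in raw.items()}

def weighted_combsum(expert_results, weights):
    """Fuse per-expert document scores with a weighted linear sum and
    return documents ranked by fused score."""
    fused = {}
    for expert, doc_scores in expert_results.items():
        for doc, s in doc_scores.items():
            fused[doc] = fused.get(doc, 0.0) + weights[expert] * s
    return sorted(fused, key=fused.get, reverse=True)

# Two illustrative experts scoring the same three documents.
results = {"colour": {"d1": 0.9, "d2": 0.2, "d3": 0.1},
           "text":   {"d1": 0.3, "d2": 0.8, "d3": 0.4}}
w = fusion_weights({e: list(r.values()) for e, r in results.items()})
print(weighted_combsum(results, w))
```

The weights are recomputed per query from the score lists alone, so no training data or query classification is needed at retrieval time.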