| An interdisciplinary perspective on information retrieval | | BIBK | Full-Text | 1-2 | |
| Susan T. Dumais | |||
Keywords: Salton award lecture, evaluation, human-computer interaction, information
retrieval, interdisciplinary research | |||
| Context-aware query classification | | BIBAK | Full-Text | 3-10 | |
| Huanhuan Cao; Derek Hao Hu; Dou Shen; Daxin Jiang; Jian-Tao Sun; Enhong Chen; Qiang Yang | |||
| Understanding users' search intent expressed through their search queries is
crucial to Web search and online advertisement. Web query classification (QC)
has been widely studied for this purpose. Most previous QC algorithms classify
individual queries without considering their context information. However, as
exemplified by the well-known query "jaguar", many Web queries are short and
ambiguous, and their real meanings are uncertain without context
information. In this paper, we incorporate context information into the problem
of query classification by using conditional random field (CRF) models. In our
approach, we use neighboring queries and their corresponding clicked URLs (Web
pages) in search sessions as the context information. We perform extensive
experiments on real world search logs and validate the effectiveness and
efficiency of our approach. We show that we can improve the F1 score by 52% as
compared to other state-of-the-art baselines. Keywords: query classification, search context | |||
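A minimal Python sketch of the session-level decoding idea: each query in a session gets context-free class scores, transition scores between neighboring queries propagate context, and Viterbi decoding disambiguates a query like "jaguar". The categories, scores, and transition weights below are invented for illustration; the paper's CRF learns such parameters from search logs.

    import math

    CATEGORIES = ["Auto", "Animal", "Software"]

    def viterbi(emission_scores, transition):
        """emission_scores: one {category: log-score} dict per query in the session.
        transition: {(prev_cat, cat): log-score} encoding context coherence."""
        best = [dict(emission_scores[0])]
        back = [{}]
        for t in range(1, len(emission_scores)):
            best.append({})
            back.append({})
            for c, e in emission_scores[t].items():
                prev_c, score = max(
                    ((p, best[t - 1][p] + transition.get((p, c), math.log(0.1)) + e)
                     for p in best[t - 1]), key=lambda x: x[1])
                best[t][c] = score
                back[t][c] = prev_c
        last = max(best[-1], key=best[-1].get)
        path = [last]
        for t in range(len(best) - 1, 0, -1):
            path.append(back[t][path[-1]])
        return list(reversed(path))

    # Session: "jaguar" alone is ambiguous; the neighboring query disambiguates it.
    session = [
        {"Auto": math.log(0.4), "Animal": math.log(0.4), "Software": math.log(0.2)},  # "jaguar"
        {"Auto": math.log(0.8), "Animal": math.log(0.1), "Software": math.log(0.1)},  # "xj6 price"
    ]
    transition = {(c, c): math.log(0.8) for c in CATEGORIES}
    print(viterbi(session, transition))  # -> ['Auto', 'Auto']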
| Refined experts: improving classification in large taxonomies | | BIBAK | Full-Text | 11-18 | |
| Paul N. Bennett; Nam Nguyen | |||
| While large-scale taxonomies -- especially for web pages -- have been in
existence for some time, approaches to automatically classify documents into
these taxonomies have met with limited success compared to the more general
progress made in text classification. We argue that this stems from three
causes: increasing sparsity of training data at deeper nodes in the taxonomy,
error propagation where a mistake made high in the hierarchy cannot be
recovered, and increasingly complex decision surfaces in higher nodes in the
hierarchy. While prior research has focused on the first problem, we introduce
methods that target the latter two problems -- first by biasing the training
distribution to reduce error propagation and second by propagating up
"first-guess" expert information in a bottom-up manner before making a refined
top down choice. Finally, we present an empirical study demonstrating that the
suggested changes lead to 10-30% improvements in F1 scores versus an accepted
competitive baseline, hierarchical SVMs. Keywords: large-scale hierarchy, text classification | |||
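A minimal Python sketch of the refined-experts idea on an invented two-level taxonomy: a cheap bottom-up "first guess" at the leaves is propagated upward, and the final top-down choice combines each node's own classifier score with that leaf evidence. The keyword scorer and the 0.5 mixing weight are toy stand-ins for the paper's trained classifiers.

    TAXONOMY = {"root": ["science", "sports"],
                "science": ["physics", "biology"],
                "sports": ["soccer", "tennis"]}

    KEYWORDS = {"science": {"study"}, "sports": {"match"},
                "physics": {"quantum"}, "biology": {"cell"},
                "soccer": {"goal"}, "tennis": {"serve"}}

    def node_score(doc, node):
        # Toy scorer: fraction of a node's keyword set found in the document.
        return len(set(doc.split()) & KEYWORDS[node]) / len(KEYWORDS[node])

    def classify(doc, alpha=0.5):
        def best_leaf(node):
            # Bottom-up pass: a node inherits its best descendant leaf score.
            kids = TAXONOMY.get(node)
            if not kids:
                return node_score(doc, node)
            return max(best_leaf(k) for k in kids)

        # Top-down pass: refined choice mixes local score with leaf evidence.
        node = "root"
        while node in TAXONOMY:
            node = max(TAXONOMY[node],
                       key=lambda k: alpha * node_score(doc, k)
                                   + (1 - alpha) * best_leaf(k))
        return node

    print(classify("a study of quantum fields"))  # -> physics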
| Dynamicity vs. effectiveness: studying online clustering for scatter/gather | | BIBAK | Full-Text | 19-26 | |
| Weimao Ke; Cassidy R. Sugimoto; Javed Mostafa | |||
| We proposed and implemented a novel clustering algorithm called LAIR2, which
has a constant average running time for on-the-fly Scatter/Gather browsing [4].
Our experiments showed that when running on a single processor, the LAIR2
on-line clustering algorithm was several hundred times faster than a parallel
Buckshot algorithm running on multiple processors [11]. This paper reports on a
study that examined the effectiveness of the LAIR2 algorithm in terms of
clustering quality and its impact on retrieval performance. We conducted a user
study with 24 subjects to evaluate on-the-fly LAIR2 clustering in Scatter/Gather
search tasks by comparing its performance to the Buckshot algorithm, a classic
method for Scatter/Gather browsing [4]. Results showed significant differences
in terms of subjective perceptions of clustering quality. Subjects perceived
that the LAIR2 algorithm produced significantly better quality clusters than
the Buckshot method did. Subjects felt that it took less effort to complete the
tasks with the LAIR2 system, which was more effective in helping them in the
tasks. Interesting patterns also emerged from subjects' comments in the final
open-ended questionnaire. We discuss implications and future research. Keywords: clustering, effectiveness, efficiency, exploratory search, interactive
visualization, scalability, scatter/gather, user study | |||
| Web searching for daily living | | BIBAK | Full-Text | 27-34 | |
| Takuya Maekawa; Yutaka Yanagisawa; Yasushi Sakurai; Yasue Kishino; Koji Kamei; Takeshi Okadome | |||
| This paper proposes the new concept of query-free web search: automatically
retrieving a web page containing information related to the daily activity we
are currently engaged in, and displaying that page on Internet-connected
domestic appliances around us, such as televisions. When
we are washing a coffee maker, for example, a web page is retrieved that
includes tips such as 'cleaning a coffee maker with vinegar removes stains
well.' A method designed on the basis of this concept automatically searches
for a web page by using a query constructed from the use of ordinary household
objects that is detected by sensors attached to the objects. An in-situ
experiment testing a variety of IR techniques confirmed that our daily
activities can produce related web pages with high accuracy. Keywords: daily living, experiment, sensors, web search | |||
| Global ranking by exploiting user clicks | | BIBAK | Full-Text | 35-42 | |
| Shihao Ji; Ke Zhou; Ciya Liao; Zhaohui Zheng; Gui-Rong Xue; Olivier Chapelle; Gordon Sun; Hongyuan Zha | |||
| It is now widely recognized that user interactions with search results can
provide substantial relevance information on the documents displayed in the
search results. In this paper, we focus on extracting relevance information
from one source of user interactions, i.e., user click data, which records the
sequence of documents being clicked and not clicked in the result set during a
user search session. We formulate the problem as a global ranking problem,
emphasizing the importance of the sequential nature of user clicks, with the
goal of predicting the relevance labels of all the documents in a search session.
This is distinct from conventional learning to rank methods that usually design
a ranking model defined on a single document; in contrast, in our model the
relational information among the documents as manifested by an aggregation of
user clicks is exploited to rank all the documents jointly. In particular, we
adapt several sequential supervised learning algorithms, including the
conditional random field (CRF), the sliding window method and the recurrent
sliding window method, to the global ranking problem. Experiments on the click
data collected from a commercial search engine demonstrate that our methods can
outperform the baseline models for search results re-ranking. Keywords: conditional random field, experimental evaluation, implicit relevance
feedback, learning to rank, sequential supervised learning, user clicks | |||
| Good abandonment in mobile and PC internet search | | BIBAK | Full-Text | 43-50 | |
| Jane Li; Scott Huffman; Akihito Tokuda | |||
| Query abandonment by search engine users is generally considered to be a
negative signal. In this paper, we explore the concept of good abandonment. We
define a good abandonment as an abandoned query for which the user's
information need was successfully addressed by the search results page, with no
need to click on a result or refine the query. We present an analysis of
abandoned internet search queries across two modalities (PC and mobile) in
three locales. The goal is to approximate the prevalence of good abandonment,
and to identify types of information needs that may lead to good abandonment,
across different locales and modalities. Our study has three key findings:
First, queries potentially indicating good abandonment make up a significant
portion of all abandoned queries. Second, the good abandonment rate from mobile
search is significantly higher than that from PC search, across all locales
tested. Third, classified by type of information need, the major classes of
good abandonment vary dramatically by both locale and modality. Our findings
imply that it is a mistake to uniformly consider query abandonment as a
negative signal. Further, there is a potential opportunity for search engines
to drive additional good abandonment, especially for mobile search users, by
improving search features and result snippets. Keywords: PC internet search, good abandonment, mobile internet search, query analysis | |||
| Efficient query expansion for advertisement search | | BIBAK | Full-Text | 51-58 | |
| Haofen Wang; Yan Liang; Linyun Fu; Gui-Rong Xue; Yong Yu | |||
| Online advertising represents a growing part of the revenues of major
Internet service providers such as Google and Yahoo. A commonly used strategy
is to place advertisements (ads) on the search result pages according to the
users' submitted queries. Relevant ads are likely to be clicked by a user and
to increase the revenues of both advertisers and publishers. However, bid
phrases defined by ad-owners are usually contained in a limited number of ads.
Directly matching user queries with bid phrases often results in finding few
appropriate ads. To address this shortcoming, query expansion is often used to
increase the chances of matching the ads. Nevertheless, query expansion on top of
the traditional inverted index faces efficiency issues such as high time
complexity and heavy I/O costs. Moreover, precision is not always improved,
and is sometimes even hurt, by the additional noise that expansion introduces.
In this paper, we propose an efficient ad search solution relying on a block-based index able to tackle the issues associated with query expansion. Our index structure places clusters of similar bid phrases in corresponding blocks with their associated ads. It reduces the number of merge operations significantly during query expansion and allows sequential scans rather than random accesses, saving I/O costs. We adopt flexible block sizes according to the clustering results of bid phrases to further optimize the index structure for efficient ad search. The pre-computation of such clusters is achieved through an agglomerative iterative clustering algorithm. Finally, we adapt the spreading activation mechanism to return the top-k relevant ads, improving search precision. The experimental results of our prototype, AdSearch, show that we can indeed return a larger number of relevant ads without sacrificing execution speed. Keywords: ad rank, ad search, agglomerative clustering, block-based index, efficient
query expansion | |||
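A minimal Python sketch of the block-based index idea: similar bid phrases are clustered into one block, so an expanded query typically touches a single block that is scanned sequentially, instead of merging many postings lists. The blocks, phrases, and ads below are invented for illustration.

    BLOCKS = {
        0: [("cheap flights", ["ad1", "ad2"]), ("discount airfare", ["ad3"])],
        1: [("running shoes", ["ad4"]), ("trail sneakers", ["ad5"])],
    }
    # Maps each bid phrase to the block holding its cluster.
    PHRASE_TO_BLOCK = {p: b for b, items in BLOCKS.items() for p, _ in items}

    def search_ads(query_phrases):
        """query_phrases: the original query plus its expansions."""
        hit_blocks = {PHRASE_TO_BLOCK[p] for p in query_phrases if p in PHRASE_TO_BLOCK}
        results = []
        for b in hit_blocks:                  # typically one block per query
            for phrase, ads in BLOCKS[b]:     # sequential scan within the block
                results.extend(ads)
        return results

    # "cheap flights" expanded with "discount airfare" stays inside block 0.
    print(search_ads(["cheap flights", "discount airfare"]))  # -> ['ad1', 'ad2', 'ad3']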
| Query dependent pseudo-relevance feedback based on wikipedia | | BIBAK | Full-Text | 59-66 | |
| Yang Xu; Gareth J. F. Jones; Bin Wang | |||
| Pseudo-relevance feedback (PRF) via query-expansion has been proven to be
effective in many information retrieval (IR) tasks. In most existing work,
the top-ranked documents from an initial search are assumed to be relevant and
used for PRF. One problem with this approach is that one or more of the top
retrieved documents may be non-relevant, which can introduce noise into the
feedback process. In addition, existing methods generally do not take into account
the significantly different types of queries that are often entered into an IR
system. Intuitively, Wikipedia can be seen as a large, manually edited document
collection which could be exploited to improve document retrieval effectiveness
within PRF. It is not obvious how we might best utilize information from
Wikipedia in PRF, and to date, the potential of Wikipedia for this task has
been largely unexplored. In our work, we present a systematic exploration of
the utilization of Wikipedia in PRF for query dependent expansion.
Specifically, we classify TREC topics into three categories based on Wikipedia:
1) entity queries, 2) ambiguous queries, and 3) broader queries. We propose and
study the effectiveness of three methods for expansion term selection, each
modeling the Wikipedia based pseudo-relevance information from a different
perspective. We incorporate the expansion terms into the original query and use
language modeling IR to evaluate these methods. Experiments on four TREC test
collections, including the large web collection GOV2, show that retrieval
performance of each type of query can be improved. In addition, we demonstrate
that the proposed method outperforms the baseline relevance model in terms of
precision and robustness. Keywords: entity, information retrieval, pseudo-relevance feedback, query expansion,
wikipedia | |||
| Segment-level display time as implicit feedback: a comparison to eye tracking | | BIBAK | Full-Text | 67-74 | |
| Georg Buscher; Ludger van Elst; Andreas Dengel | |||
| We examine two basic sources for implicit relevance feedback on the segment
level for search personalization: eye tracking and display time. A controlled
study has been conducted where 32 participants had to view documents in front
of an eye tracker, query a search engine, and give explicit relevance ratings
for the results. We examined the performance of the basic implicit feedback
methods with respect to improved ranking and compared their performance to a
pseudo relevance feedback baseline on the segment level and the original
ranking of a Web search engine.
Our results show that feedback based on display time on the segment level is much coarser than feedback from eye tracking. Surprisingly, however, for re-ranking and query expansion it worked as well as eye-tracking-based feedback. All behavior-based methods performed significantly better than our non-behavior-based baseline and especially improved poor initial rankings of the Web search engine. The study shows that segment-level display time yields results comparable to eye-tracking-based feedback. Thus, it should be considered in future personalization systems as an inexpensive but precise method for implicit feedback. Keywords: display time, eye tracking, implicit feedback, personalization | |||
| Addressing morphological variation in alphabetic languages | | BIBAK | Full-Text | 75-82 | |
| Paul McNamee; Charles Nicholas; James Mayfield | |||
| The selection of indexing terms for representing documents is a key decision
that limits how effective subsequent retrieval can be. Often stemming
algorithms are used to normalize surface forms, and thereby address the problem
of not finding documents that contain words related to query terms through
inflectional or derivational morphology. However, rule-based stemmers are not
available for every language and it is unclear which methods for coping with
morphology are most effective. In this paper we investigate an assortment of
techniques for representing text and compare these approaches using data sets
in eighteen languages and five different writing systems.
We find character n-gram tokenization to be highly effective. In half of the languages examined, n-grams outperform unnormalized words by more than 25%; in highly inflective languages, relative improvements over 50% are obtained. In languages with less morphological richness the choice of tokenization is not as critical, and rule-based stemming can be an attractive option, if available. We also conducted an experiment to uncover the source of n-gram power, demonstrating a causal relationship between the morphological complexity of a language and n-gram effectiveness. Keywords: CLIR, character n-grams, morphology, stemming, tokenization | |||
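A minimal Python sketch of character n-gram tokenization, the technique the study finds most effective: overlapping 4-grams let inflected forms such as "juggling" and "juggled" share most of their indexing terms without language-specific rules. The boundary marker and n = 4 follow common practice; they are illustrative choices, not the paper's exact setup.

    def char_ngrams(text, n=4):
        tokens = []
        for word in text.lower().split():
            padded = f"_{word}_"          # mark word boundaries
            if len(padded) <= n:
                tokens.append(padded)
            else:
                tokens.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
        return tokens

    shared = set(char_ngrams("juggling")) & set(char_ngrams("juggled"))
    print(sorted(shared))  # -> ['_jug', 'jugg', 'uggl']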
| Web derived pronunciations for spoken term detection | | BIBAK | Full-Text | 83-90 | |
| Dogan Can; Erica Cooper; Arnab Ghoshal; Martin Jansche; Sanjeev Khudanpur; Bhuvana Ramabhadran; Michael Riley; Murat Saraclar; Abhinav Sethy; Morgan Ulinski; Christopher White | |||
| Indexing and retrieval of speech content in various forms such as broadcast
news, customer care data and on-line media has gained a lot of interest for a
wide range of applications, from customer analytics to on-line media search.
For most retrieval applications, the speech content is typically first
converted to a lexical or phonetic representation using automatic speech
recognition (ASR). The first step in searching through indexes built on these
representations is the generation of pronunciations for named entities and
foreign language query terms. This paper summarizes the results of the work
conducted during the 2008 JHU Summer Workshop by the Multilingual Spoken Term
Detection team, on mining the web for pronunciations and analyzing their impact
on spoken term detection. We will first present methods to use the vast amount
of pronunciation information available on the Web, in the form of IPA and
ad-hoc transcriptions. We describe techniques for extracting candidate
pronunciations from Web pages and associating them with orthographic words,
filtering out poorly extracted pronunciations, normalizing IPA pronunciations
to better conform to a common transcription standard, and generating phonemic
representations from ad-hoc transcriptions. We then present an analysis of the
effectiveness of using these pronunciations to represent Out-Of-Vocabulary
(OOV) query terms on the performance of a spoken term detection (STD) system.
We will provide comparisons of Web pronunciations against automated techniques
for pronunciation generation as well as pronunciations generated by human
experts. Our results cover a range of speech indexes based on lattices,
confusion networks and one-best transcriptions at both the word and word-fragment
levels. Keywords: open vocabulary, pronunciation normalization, spoken term detection, web
pronunciation extraction | |||
| Combining LVCSR and vocabulary-independent ranked utterance retrieval for robust speech search | | BIBAK | Full-Text | 91-98 | |
| J. Scott Olsson; Douglas W. Oard | |||
| Well tuned Large-Vocabulary Continuous Speech Recognition (LVCSR) has been
shown to generally be more effective than vocabulary-independent techniques for
ranked retrieval of spoken content when one or the other approach is used
alone. Tuning LVCSR systems to a topic domain can be costly, however, and the
experiments in this paper show that Out-Of-Vocabulary (OOV) query terms can
significantly reduce retrieval effectiveness when that tuning is not performed.
Further experiments demonstrate, however, that retrieval effectiveness for
queries with OOV terms can be substantially improved by combining evidence from
LVCSR with additional evidence from vocabulary-independent Ranked Utterance
Retrieval (RUR). The combination is performed by using relevance judgments from
held-out topics to learn generic (i.e., topic-independent), smooth,
non-decreasing transformations from LVCSR and RUR system scores to
probabilities of topical relevance. Evaluated using a CLEF collection that
includes topics, spontaneous conversational speech audio, and relevance
judgments, the system recovers 57% of the mean uninterpolated average precision
that could have been obtained through LVCSR domain tuning for very short
queries (or 41% for longer queries). Keywords: ranked utterance retrieval, speech retrieval, vocabulary-independent spoken
term detection | |||
| Risky business: modeling and exploiting uncertainty in information retrieval | | BIBAK | Full-Text | 99-106 | |
| Jianhan Zhu; Jun Wang; Ingemar J. Cox; Michael J. Taylor | |||
| Most retrieval models estimate the relevance of each document to a query and
rank the documents accordingly. However, such an approach ignores the
uncertainty associated with the estimates of relevancy. If a high estimate of
relevancy also has a high uncertainty, then the document may be very relevant
or not relevant at all. Another document may have a slightly lower estimate of
relevancy but the corresponding uncertainty may be much less. In such a
circumstance, should the retrieval engine risk ranking the first document
highest, or should it choose a more conservative (safer) strategy that gives
preference to the second document? There is no definitive answer to this
question, as it depends on the risk preferences of the user and the information
retrieval system. In this paper we present a general framework for modeling
uncertainty and introduce an asymmetric loss function with a single parameter
that can model the level of risk the system is willing to accept. By adjusting
the risk preference parameter, our approach can effectively adapt to users'
different retrieval strategies.
We apply this asymmetric loss function to a language modeling framework and a practical risk-aware document scoring function is obtained. Our experiments on several TREC collections show that our "risk-averse" approach significantly improves the Jelinek-Mercer smoothing language model, and a combination of our "risk-averse" approach and the Jelinek-Mercer smoothing method generally outperforms the Dirichlet smoothing method. Experimental results also show that the "risk-averse" approach, even without smoothing from the collection statistics, performs as well as three commonly-adopted retrieval models, namely, the Jelinek-Mercer and Dirichlet smoothing methods, and BM25 model. Keywords: language models, loss functions, probabilistic retrieval models, probability
ranking principle, risk adjustment | |||
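A minimal Python sketch of the mean-variance intuition behind risk adjustment (a simplification, not the paper's asymmetric loss derivation): a single risk-preference parameter b trades the relevance estimate off against its uncertainty. The estimates below are invented for illustration.

    def risk_adjusted_score(mean, variance, b=1.0):
        # b > 0: risk-averse (prefer confident estimates);
        # b < 0: risk-seeking; b = 0: rank by the mean alone.
        return mean - b * variance

    docs = {"d1": (0.80, 0.20),   # high estimate, high uncertainty
            "d2": (0.70, 0.02)}   # slightly lower estimate, far more certain

    for b in (0.0, 1.0):
        ranking = sorted(docs, key=lambda d: risk_adjusted_score(*docs[d], b=b),
                         reverse=True)
        print(f"b={b}: {ranking}")
    # b=0.0: ['d1', 'd2']  -- risk-neutral ranking
    # b=1.0: ['d2', 'd1']  -- risk-averse ranking prefers the safer document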
| Approximating true relevance distribution from a mixture model based on irrelevance data | | BIBAK | Full-Text | 107-114 | |
| Peng Zhang; Yuexian Hou; Dawei Song | |||
| Pseudo relevance feedback (PRF), which has been widely applied in IR, aims
to derive a distribution from the top n pseudo relevant documents D. However,
these documents are often a mixture of relevant and irrelevant documents. As a
result, the derived distribution is actually a mixture model, which has long
limited the performance of PRF. This is particularly the case when we
deal with difficult queries where the truly relevant documents in D are very
sparse. In this situation, it is often easier to identify a small number of
seed irrelevant documents, which can form a seed irrelevant distribution. Then,
a fundamental and challenging problem arises: solely based on the mixed
distribution and a seed irrelevance distribution, how to automatically generate
an optimal approximation of the true relevance distribution? In this paper, we
propose a novel distribution separation model (DSM) to tackle this problem.
Theoretical justifications of the proposed algorithm are given. Evaluation
results from our extensive simulated experiments on several large scale TREC
data sets demonstrate the effectiveness of our method, which outperforms a
well-respected PRF model, the Relevance Model (RM), as well as the use of RM on D
with the seed negative documents directly removed. Keywords: distribution separation model, irrelevant data, pseudo-relevance feedback,
true relevance distribution | |||
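A minimal Python sketch of the separation idea: if the feedback-document term distribution is a mixture p_mix = (1 - w) p_rel + w p_irr and a seed irrelevance distribution p_irr is available, an approximation of p_rel can be recovered by subtraction. Choosing the largest w that keeps all probabilities non-negative is a simple stand-in for the paper's optimization; the term distributions are invented.

    def separate(p_mix, p_irr):
        # Largest w such that p_mix[t] - w * p_irr[t] >= 0 for every term t.
        w = min(p_mix.get(t, 0.0) / p for t, p in p_irr.items() if p > 0)
        p_rel = {t: p_mix[t] - w * p_irr.get(t, 0.0) for t in p_mix}
        z = sum(p_rel.values())
        return {t: v / z for t, v in p_rel.items()}, w

    p_mix = {"quantum": 0.30, "physics": 0.30, "casino": 0.25, "poker": 0.15}
    p_irr = {"casino": 0.60, "poker": 0.40}          # from seed irrelevant docs
    p_rel, w = separate(p_mix, p_irr)
    print(round(w, 3), {t: round(v, 2) for t, v in p_rel.items()})
    # -> 0.375 {'quantum': 0.48, 'physics': 0.48, 'casino': 0.04, 'poker': 0.0}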
| Portfolio theory of information retrieval | | BIBAK | Full-Text | 115-122 | |
| Jun Wang; Jianhan Zhu | |||
| This paper studies document ranking under uncertainty. It is tackled in a
general situation where the relevance predictions of individual documents have
uncertainty, and are dependent on each other. Inspired by the Modern
Portfolio Theory, an economic theory dealing with investment in financial
markets, we argue that ranking under uncertainty is not just about picking
individual relevant documents, but about choosing the right combination of
relevant documents. This motivates us to quantify a ranked list of documents on
the basis of its expected overall relevance (mean) and its variance; the latter
serves as a measure of risk, which was rarely studied for document ranking in
the past. Through the analysis of the mean and variance, we show that an
optimal rank order is the one that balances the overall relevance (mean) of
the ranked list against its risk level (variance). Based on this principle, we
then derive an efficient document ranking algorithm. It generalizes the
well-known probability ranking principle (PRP) by considering both the
uncertainty of relevance predictions and correlations between retrieved
documents. Moreover, the benefit of diversification is mathematically
quantified; we show that diversifying documents is an effective way to reduce
the risk of document ranking. Experimental results in text retrieval confirm
the effectiveness of the proposed approach. Keywords: document ranking under uncertainty, mean-variance analysis, modern portfolio
theory, risk management, the probability ranking principle | |||
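A minimal Python sketch of mean-variance ranking: the list is built greedily, scoring each candidate by its expected relevance minus a risk term that grows with its variance and its covariance with already-ranked documents, so near-duplicates are demoted. The means, covariances, and risk weight b are invented, and the paper additionally discounts risk by rank position.

    import itertools

    def mv_rank(means, cov, b=2.0):
        """means: {doc: expected relevance}; cov: {(i, j): covariance}."""
        ranked, remaining = [], set(means)
        while remaining:
            def objective(d):
                risk = cov[(d, d)] + 2 * sum(cov[(d, s)] for s in ranked)
                return means[d] - b * risk
            best = max(remaining, key=objective)
            ranked.append(best)
            remaining.remove(best)
        return ranked

    means = {"d1": 0.9, "d2": 0.8, "d3": 0.6}
    cov = {(i, j): 0.0 for i, j in itertools.product(means, repeat=2)}
    cov.update({("d1", "d1"): 0.08, ("d2", "d2"): 0.08, ("d3", "d3"): 0.08,
                ("d1", "d2"): 0.07, ("d2", "d1"): 0.07})  # d1, d2 highly correlated
    print(mv_rank(means, cov))  # -> ['d1', 'd3', 'd2']: diversification reduces risk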
| A statistical comparison of tag and query logs | | BIBAK | Full-Text | 123-130 | |
| Mark J. Carman; Mark Baillie; Robert Gwadera; Fabio Crestani | |||
| We investigate tag and query logs to see if the terms people use to annotate
websites are similar to the ones they use to query for them. Over a set of
URLs, we compare the distribution of tags used to annotate each URL with the
distribution of query terms for clicks on the same URL. Understanding the
relationship between the distributions is important to determine how useful tag
data may be for improving search results and conversely, query data for
improving tag prediction. In our study, we compare both term frequency
distributions using vocabulary overlap and relative entropy. We also test
statistically whether the term counts come from the same underlying
distribution. Our results indicate that the vocabularies used for tagging and
for searching content are similar but not identical. We further investigate the
content of the websites to see which of the two distributions (tag or query) is
most similar to the content of the annotated/searched URL. Finally, we analyze
the similarity for different categories of URLs in our sample to see if the
similarity between distributions is dependent on the topic of the website or
the popularity of the URL. Keywords: folksonomy, query log, tag, web search | |||
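A minimal Python sketch of the comparison methodology for one URL: vocabulary overlap plus a smoothed KL divergence between the tag and query term distributions. The counts and the smoothing constant are invented for illustration.

    import math

    def overlap(c1, c2):
        v1, v2 = set(c1), set(c2)
        return len(v1 & v2) / len(v1 | v2)    # Jaccard overlap of vocabularies

    def kl_divergence(c1, c2, epsilon=1e-6):
        vocab = set(c1) | set(c2)
        n1 = sum(c1.values()) + epsilon * len(vocab)
        n2 = sum(c2.values()) + epsilon * len(vocab)
        kl = 0.0
        for t in vocab:
            p = (c1.get(t, 0) + epsilon) / n1
            q = (c2.get(t, 0) + epsilon) / n2
            kl += p * math.log(p / q)
        return kl

    tags = {"photography": 10, "tutorial": 6, "camera": 4}    # annotations
    queries = {"photography": 12, "camera": 8, "lens": 3}     # clicked queries
    print(round(overlap(tags, queries), 2), round(kl_divergence(tags, queries), 3))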
| Simultaneously modeling semantics and structure of threaded discussions: a sparse coding approach and its applications | | BIBAK | Full-Text | 131-138 | |
| Chen Lin; Jiang-Ming Yang; Rui Cai; Xin-Jing Wang; Wei Wang | |||
| The huge amount of knowledge in web communities has motivated the research
interests in threaded discussions. The dynamic nature of threaded discussions
poses many challenging problems for computer scientists. Although techniques
such as semantic models and structural models have been shown to be useful in a
number of areas, they are inefficient in understanding threaded discussions due
to three reasons: (I) since most users read existing messages before posting,
posts in a discussion thread are temporally dependent on the previous ones,
which causes the semantics and structure to be coupled with each other in
threaded discussions; (II) online discussion threads contain many junk posts
which are useless and may disturb content analysis; and (III) it is very hard
to judge the quality of a post. In this paper, we propose a sparse coding-based
model named SMSS to Simultaneously Model Semantics and Structure of threaded
discussions. The model projects each post into a topic space, and approximates
each post by a linear combination of previous posts in the same discussion
thread. Meanwhile, the model also imposes two sparse constraints to force a
sparse post reconstruction in the topic space and a sparse post approximation
from previous posts. The sparse properties effectively take into account the
characteristics of threaded discussions. To address the above three problems, we
demonstrate the competency of our model in three applications: reconstructing
reply structure of threaded discussions, identifying junk posts, and finding
experts in a given board/sub-board in web communities. Experimental results
show encouraging performance of the proposed SMSS model in all these
applications. Keywords: expert finding, junk identification, reply reconstruction, sparse coding,
threaded discussion | |||
| Enhancing cluster labeling using wikipedia | | BIBAK | Full-Text | 139-146 | |
| David Carmel; Haggai Roitman; Naama Zwerdling | |||
| This work investigates cluster labeling enhancement by utilizing Wikipedia,
the free on-line encyclopedia. We describe a general framework for cluster
labeling that extracts candidate labels from Wikipedia in addition to important
terms that are extracted directly from the text. The "labeling quality" of each
candidate is then evaluated by several independent judges and the top evaluated
candidates are recommended for labeling.
Our experimental results reveal that the Wikipedia labels agree with the manual labels that humans associate with a cluster much more than do significant terms extracted directly from the text. We show that in most cases, even when the human-assigned label appears in the text, pure statistical methods have difficulty identifying it as a good descriptor. Furthermore, our experiments show that for more than 85% of the clusters in our test collection, the manual label (or an inflection, or a synonym of it) appears in the top five labels recommended by our system. Keywords: cluster labeling, wikipedia | |||
| Compressing term positions in web indexes | | BIBAK | Full-Text | 147-154 | |
| Hao Yan; Shuai Ding; Torsten Suel | |||
| Large search engines process thousands of queries per second on billions of
pages, making query processing a major factor in their operating costs. This
has led to a lot of research on how to improve query throughput, using
techniques such as massive parallelism, caching, early termination, and
inverted index compression. We focus on techniques for compressing term
positions in web search engine indexes. Most previous work has focused on
compressing docID and frequency data, or position information in other types of
text collections. Compression of term positions in web pages is complicated by
the fact that term occurrences tend to cluster within documents but not across
document boundaries, making it harder to exploit clustering effects. Also,
typical access patterns for position data are different from those for docID
and frequency data. We perform a detailed study of a number of existing and new
techniques for compressing position data in web indexes. We also study how to
efficiently access position data for ranking functions that take proximity
features into account. Keywords: index compression, inverted index, search engines | |||
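A minimal Python sketch of the baseline approach to position compression: positions within a document are gap-encoded (clustered occurrences give small gaps) and packed with variable-byte coding. Production engines use stronger codes than this; the positions are invented for illustration.

    def vbyte_encode(numbers):
        out = bytearray()
        for n in numbers:
            while n >= 128:
                out.append(n & 0x7F)   # low 7 bits; more bytes follow
                n >>= 7
            out.append(n | 0x80)       # high bit marks the final byte
        return bytes(out)

    def compress_positions(positions):
        # Small gaps between clustered occurrences fit into single bytes.
        gaps = [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]
        return vbyte_encode(gaps)

    positions = [4, 7, 9, 10, 803]     # occurrences of a term in one document
    print(len(compress_positions(positions)), "bytes for", len(positions), "positions")
    # -> 6 bytes for 5 positions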
| Brute force and indexed approaches to pairwise document similarity comparisons with MapReduce | | BIBAK | Full-Text | 155-162 | |
| Jimmy Lin | |||
| This paper explores the problem of computing pairwise similarity on document
collections, focusing on the application of "more like this" queries in the
life sciences domain. Three MapReduce algorithms are introduced: one based on
brute force, a second where the problem is treated as large-scale ad hoc
retrieval, and a third based on the Cartesian product of postings lists. Each
algorithm supports one or more approximations that trade effectiveness for
efficiency, the characteristics of which are studied experimentally. Results
show that the brute force algorithm is the most efficient of the three when
exact similarity is desired. However, the other two algorithms support
approximations that yield large efficiency gains without significant loss of
effectiveness. Keywords: distributed algorithms, hadoop | |||
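A minimal in-memory Python sketch of the postings-based variant: the "mapper" emits a partial inner product for every pair of documents sharing a term (each postings list paired with itself), and the "reducer" sums the partial products into pairwise similarities. The tiny corpus and raw-tf weights are invented; a real Hadoop job would shard these phases across machines.

    import itertools
    from collections import Counter, defaultdict

    docs = {"d1": "gene expression in yeast",
            "d2": "yeast gene regulation",
            "d3": "protein folding dynamics"}

    postings = defaultdict(list)       # term -> [(doc_id, term_weight)]
    for doc_id, text in docs.items():
        for term, tf in Counter(text.split()).items():
            postings[term].append((doc_id, tf))

    def map_phase():
        for term, plist in postings.items():
            for (d1, w1), (d2, w2) in itertools.combinations(plist, 2):
                yield tuple(sorted((d1, d2))), w1 * w2   # partial inner product

    def reduce_phase(pairs):
        sims = defaultdict(int)
        for key, value in pairs:
            sims[key] += value
        return dict(sims)

    print(reduce_phase(map_phase()))   # -> {('d1', 'd2'): 2}: share 'gene', 'yeast'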
| Efficiency trade-offs in two-tier web search systems | | BIBAK | Full-Text | 163-170 | |
| Ricardo Baeza-Yates; Vanessa Murdock; Claudia Hauff | |||
| Search engines rely on searching multiple partitioned corpora to return
results to users in a reasonable amount of time. In this paper we analyze the
standard two-tier architecture for Web search with the difference that the
corpus to be searched for a given query is predicted in advance. We show that
any predictor better than random yields time savings, but this decrease in
processing time comes with an increase in infrastructure cost. We provide an
analysis and investigate this trade-off in the context of two different
scenarios on real-world data. We demonstrate that in general the decrease in
answer time is justified by a small increase in infrastructure cost. Keywords: distributed systems, pre-retrieval prediction, tiered indexing | |||
| A classification-based approach to question answering in discussion boards | | BIBAK | Full-Text | 171-178 | |
| Liangjie Hong; Brian D. Davison | |||
| Discussion boards and online forums are important platforms for people to
share information. Users post questions or problems onto discussion boards and
rely on others to provide possible solutions and such question-related content
sometimes even dominates the whole discussion board. However, retrieving this
kind of information automatically and effectively is still a non-trivial task.
In addition, the existence of other types of information (e.g., announcements,
plans, elaborations, etc.) makes it difficult to assume that every thread in a
discussion board is about a question. We consider the problems of identifying
question-related threads and their potential answers as classification tasks.
Experimental results across multiple datasets demonstrate that our method can
significantly improve the performance in both question detection and answer
finding subtasks. We also do a careful comparison of how different types of
features contribute to the final result and show that non-content features play
a key role in improving overall performance. Finally, we show that a ranking
scheme based on our classification approach can yield much better performance
than prior published methods. Keywords: classification, discussion boards, online forums, question answering | |||
| Ranking community answers by modeling question-answer relationships via analogical reasoning | | BIBAK | Full-Text | 179-186 | |
| Xin-Jing Wang; Xudong Tu; Dan Feng; Lei Zhang | |||
| The method of finding high-quality answers has significant impact on user
satisfaction in community question answering systems. However, due to the
lexical gap between questions and answers as well as spam typically existing in
user-generated content, filtering and ranking answers is very challenging.
Previous solutions mainly focus on generating redundant features, or finding
textual clues using machine learning techniques; none of them considers
questions and their answers as relational data; instead, they model them as
independent information. Moreover, they only consider the answers of the
current question, and ignore any previous knowledge that would be helpful to
bridge the lexical and semantic gap. We assume that answers are connected to
their questions with various types of latent links, i.e., positive links
indicating high-quality answers and negative links indicating incorrect answers
or user-generated spam, and propose an analogical reasoning-based approach which
measures the analogy between the new question-answer linkages and those of
relevant knowledge which contains only positive links; the candidate answer
which has the most analogous link is assumed to be the best answer. We
conducted experiments based on 29.8 million Yahoo! Answers question-answer
threads and showed the effectiveness of our approach. Keywords: analogical reasoning, community question answering, probabilistic relational
modeling, ranking | |||
| A syntactic tree matching approach to finding similar questions in community-based qa services | | BIBAK | Full-Text | 187-194 | |
| Kai Wang; Zhaoyan Ming; Tat-Seng Chua | |||
| While traditional question answering (QA) systems tailored to the TREC QA
task work relatively well for simple questions, they do not suffice to answer
real world questions. Community-based QA systems offer this service well,
as they contain large archives of such questions where manually crafted answers
are directly available. However, finding similar questions in the QA archive is
not trivial. In this paper, we propose a new retrieval framework based on
syntactic tree structure to tackle the similar question matching problem. We
build a ground-truth set from Yahoo! Answers, and experimental results show
that our method outperforms traditional bag-of-word or tree kernel based
methods by 8.3% in mean average precision. It further achieves up to 50%
improvement by incorporating semantic features as well as matching of potential
answers. Our model does not rely on training, and it is demonstrated to be
robust against grammatical errors as well. Keywords: Yahoo! answer, question answering, question matching, syntactic structure | |||
| On social networks and collaborative recommendation | | BIBAK | Full-Text | 195-202 | |
| Ioannis Konstas; Vassilios Stathopoulos; Joemon M. Jose | |||
| Social network systems, like last.fm, play a significant role in Web 2.0,
containing large amounts of multimedia-enriched data that are enhanced both by
explicit user-provided annotations and implicit aggregated feedback describing
the personal preferences of each user. It is also a common tendency for these
systems to encourage the creation of virtual networks among their users by
allowing them to establish bonds of friendship and thus provide a novel and
direct medium for the exchange of data.
We investigate the role of these additional relationships in developing a track recommendation system. Taking into account both the social annotation and friendships inherent in the social graph established among users, items and tags, we created a collaborative recommendation system that effectively adapts to the personal information needs of each user. We adopt the generic framework of Random Walk with Restarts to provide a more natural and efficient way to represent social networks. In this work we collected a sufficiently representative portion of the music social network last.fm, capturing users' explicitly expressed bonds of friendship as well as social tags. We performed a series of comparison experiments between the Random Walk with Restarts model and a user-based collaborative filtering method using the Pearson Correlation similarity. The results show that the graph model system benefits from the additional information embedded in social knowledge. In addition, the graph model outperforms the standard collaborative filtering method. Keywords: collaborative filtering, social network, social tagging, web 2.0 IR | |||
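A minimal Python sketch of Random Walk with Restarts on a toy user-track-tag graph: the walker follows edges with probability 1 - alpha and jumps back to the query user with probability alpha; the stationary weights on track nodes give the recommendation order. The graph and alpha are invented for illustration.

    import numpy as np

    nodes = ["user:alice", "user:bob", "track:t1", "track:t2", "tag:rock"]
    edges = [("user:alice", "track:t1"), ("user:bob", "track:t1"),
             ("user:bob", "track:t2"), ("track:t1", "tag:rock"),
             ("track:t2", "tag:rock"), ("user:alice", "user:bob")]  # friendship

    idx = {n: i for i, n in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    for a, b in edges:                      # undirected graph
        A[idx[a], idx[b]] = A[idx[b], idx[a]] = 1.0
    P = A / A.sum(axis=0, keepdims=True)    # column-stochastic transitions

    def rwr(start, alpha=0.15, iters=100):
        restart = np.zeros(len(nodes))
        restart[idx[start]] = 1.0
        v = restart.copy()
        for _ in range(iters):
            v = (1 - alpha) * P @ v + alpha * restart
        return v

    v = rwr("user:alice")
    tracks = sorted((n for n in nodes if n.startswith("track:")),
                    key=lambda n: -v[idx[n]])
    print(tracks)  # t1 outranks t2 for alice (direct link plus friend overlap)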
| Learning to recommend with social trust ensemble | | BIBAK | Full-Text | 203-210 | |
| Hao Ma; Irwin King; Michael R. Lyu | |||
| As an indispensable technique in the field of Information Filtering,
Recommender System has been well studied and developed both in academia and in
industry recently. However, most current recommender systems suffer from the
following problems: (1) The large-scale and sparse data of the user-item matrix
seriously affect the recommendation quality. As a result, most of the
recommender systems cannot easily deal with users who have made very few
ratings. (2) The traditional recommender systems assume that all the users are
independent and identically distributed; this assumption ignores the
connections among users, which is not consistent with real-world
recommendations. Aiming at modeling recommender systems more accurately and
realistically, we propose a novel probabilistic factor analysis framework,
which naturally fuses the users' tastes and their trusted friends' favors
together. In this framework, we coin the term Social Trust Ensemble to
represent the formulation of the social trust restrictions on the recommender
systems. The complexity analysis indicates that our approach can be applied to
very large datasets since it scales linearly with the number of observations,
while the experimental results show that our method performs better than the
state-of-the-art approaches. Keywords: matrix factorization, recommender systems, social network, social trust
ensemble | |||
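A minimal Python sketch of the trust-ensemble intuition: a user's predicted rating blends her own taste-based prediction with those of her trusted friends. The blending weight alpha and the ratings are invented, and the paper fuses the two sources inside a probabilistic factor model rather than averaging raw predictions as done here.

    def ensemble_predict(user, item, own_predict, trust, alpha=0.6):
        own = own_predict(user, item)
        friends = trust.get(user, {})
        if not friends:
            return own
        social = sum(w * own_predict(f, item) for f, w in friends.items())
        social /= sum(friends.values())
        return alpha * own + (1 - alpha) * social

    scores = {("alice", "movie"): 3.0, ("bob", "movie"): 4.5,
              ("carol", "movie"): 4.0}
    trust = {"alice": {"bob": 0.8, "carol": 0.2}}   # alice's trusted friends

    print(ensemble_predict("alice", "movie", lambda u, i: scores[(u, i)], trust))
    # -> 0.6*3.0 + 0.4*((0.8*4.5 + 0.2*4.0) / 1.0) = 3.56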
| Fast nonparametric matrix factorization for large-scale collaborative filtering | | BIBAK | Full-Text | 211-218 | |
| Kai Yu; Shenghuo Zhu; John Lafferty; Yihong Gong | |||
| With the sheer growth of online user data, it becomes challenging to develop
preference learning algorithms that are sufficiently flexible in modeling but
also affordable in computation. In this paper we develop nonparametric matrix
factorization methods by allowing the latent factors of two low-rank matrix
factorization methods, the singular value decomposition (SVD) and probabilistic
principal component analysis (pPCA), to be data-driven, with the dimensionality
increasing with data size. We show that the formulations of the two
nonparametric models are very similar, and their optimizations share similar
procedures. Compared to traditional parametric low-rank methods, nonparametric
models are appealing for their flexibility in modeling complex data
dependencies. However, this modeling advantage comes at a computational price
-- it is highly challenging to scale them to large-scale problems, hampering
their use in applications such as collaborative filtering. In this
paper we introduce novel optimization algorithms, simple to implement, that
make learning both nonparametric matrix factorization models highly efficient
on large-scale problems. Our experiments on EachMovie
and Netflix, the two largest public benchmarks to date, demonstrate that the
nonparametric models make more accurate predictions of user ratings, and are
computationally comparable or sometimes even faster in training, in comparison
with previous state-of-the-art parametric matrix factorization models. Keywords: collaborative filtering, matrix factorization, nonparametric models | |||
| Building enriched document representations using aggregated anchor text | | BIBAK | Full-Text | 219-226 | |
| Donald Metzler; Jasmine Novak; Hang Cui; Srihari Reddy | |||
| It is well known that anchor text plays a critical role in a variety of
search tasks performed over hypertextual domains, including enterprise search,
wiki search, and web search. It is common practice to enrich a document's
standard textual representation with all of the anchor text associated with its
incoming hyperlinks. However, this approach does not help match relevant pages
with very few inlinks. In this paper, we propose a method for overcoming anchor
text sparsity by enriching document representations with anchor text that has
been aggregated across the hyperlink graph. This aggregation mechanism acts to
smooth, or diffuse, anchor text within a domain. We rigorously evaluate our
proposed approach on a large web search test collection. Our results show the
approach significantly improves retrieval effectiveness, especially for longer,
more difficult queries. Keywords: anchor text, link structure, term weighting, web search | |||
| Using anchor texts with their hyperlink structure for web search | | BIBAK | Full-Text | 227-234 | |
| Zhicheng Dou; Ruihua Song; Jian-Yun Nie; Ji-Rong Wen | |||
| As a good complement to page content, anchor texts have been extensively
used, and proven to be useful, in commercial search engines. However, anchor
texts have been assumed to be independent, whether they come from the same Web
site or not. Intuitively, an anchor text from unrelated Web sites should be
considered as stronger evidence than that from the same site. This paper
proposes two new methods to take into account the possible relationships
between anchor texts. We consider two relationships in this paper: links from
the same site and links from related sites. The importance assigned to the
anchor texts in these two situations is discounted. Experimental results show
that these two new models outperform the baseline model which assumes
independence between hyperlinks. Keywords: anchor text, hyperlink structure, web site relationship | |||
| Link analysis for private weighted graphs | | BIBAK | Full-Text | 235-242 | |
| Jun Sakuma; Shigenobu Kobayashi | |||
| Link analysis methods have been used successfully for knowledge discovery
from the link structure of mutually linking entities. Existing link analysis
methods have been inherently designed on the assumption that the entire link
structure of the target graph is observable, as with public web documents;
however, link information in real-world graphs, such as human relationships
or economic activities, is rarely open to the public. If link analysis
can be performed using graphs with private links in a privacy-preserving way,
it enables us to rank entities connected with private ties, such as people,
organizations, or business transactions. In this paper, we present a secure
link analysis for graphs with private links by means of cryptographic
protocols. Our solutions are designed as privacy-preserving expansions of
well-known link analysis methods, PageRank and HITS. The outcomes of our
protocols are completely equivalent to those of PageRank and HITS. Furthermore,
our protocols theoretically guarantee that the private link information
possessed by each node is not revealed to other nodes. We demonstrate the
efficiency of our solution by experimental studies, comparing with existing
solutions, such as secure function evaluation, decentralized spectral analysis,
and privacy-preserving link-analysis. Keywords: hits, link analysis, pagerank, privacy, ranking | |||
| Learning to rank for quantity consensus queries | | BIBAK | Full-Text | 243-250 | |
| Somnath Banerjee; Soumen Chakrabarti; Ganesh Ramakrishnan | |||
| Web search is increasingly exploiting named entities like persons, places,
businesses, addresses and dates. Entity ranking is also of current interest at
INEX and TREC. Numerical quantities are an important class of entities,
especially in queries about prices and features related to products, services
and travel. We introduce Quantity Consensus Queries (QCQs), where each answer
is a tight quantity interval distilled from evidence of relevance in thousands
of snippets. Entity search and factoid question answering have benefited from
aggregating evidence from multiple promising snippets, but these do not readily
apply to quantities. Here we propose two new algorithms that learn to aggregate
information from multiple snippets. We show that typical signals used in entity
ranking, like rarity of query words and their lexical proximity to candidate
quantities, are very noisy. Our algorithms learn to score and rank quantity
intervals directly, combining snippet quantity and snippet text information. We
report on experiments using hundreds of QCQs with ground truth taken from TREC
QA, Wikipedia Infoboxes, and other sources, leading to tens of thousands of
candidate snippets and quantities. Our algorithms yield about 20% better MAP
and NDCG compared to the best-known collective rankers, and are 35% better than
scoring snippets independent of each other. Keywords: aggregating evidence from snippets, learning to rank, quantity search | |||
| Learning in a pairwise term-term proximity framework for information retrieval | | BIBAK | Full-Text | 251-258 | |
| Ronan Cummins; Colm O'Riordan | |||
| Traditional ad hoc retrieval models do not take into account the closeness
or proximity of terms. Document scores in these models are primarily based on
the occurrences or non-occurrences of query-terms considered independently of
each other. Intuitively, documents in which query-terms occur closer together
should be ranked higher than documents in which the query-terms appear far
apart.
This paper outlines several term-term proximity measures and develops an intuitive framework in which they can be used to fully model the proximity of all query-terms for a particular topic. As useful proximity functions may be constructed from many proximity measures, we use a learning approach to combine multiple measures into a useful proximity function within the framework. An evaluation of the best proximity functions shows a significant improvement over the baseline ad hoc retrieval model and over other more recent methods that employ single proximity measures. Keywords: information retrieval, learning to rank, proximity | |||
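A minimal Python sketch of basic pairwise term-term proximity measures that such a learned function could combine; the positions and the proximity-to-score transform are invented, and the paper learns the combination rather than hand-setting it.

    def min_dist(pos_a, pos_b):
        return min(abs(a - b) for a in pos_a for b in pos_b)

    def avg_dist(pos_a, pos_b):
        pairs = [(a, b) for a in pos_a for b in pos_b]
        return sum(abs(a - b) for a, b in pairs) / len(pairs)

    def span(pos_a, pos_b):
        return max(pos_a + pos_b) - min(pos_a + pos_b)

    # Positions of "information" and "retrieval" in two documents:
    doc1 = {"information": [3, 40], "retrieval": [4]}    # adjacent once
    doc2 = {"information": [3], "retrieval": [90]}       # far apart

    for name, doc in (("doc1", doc1), ("doc2", doc2)):
        a, b = doc["information"], doc["retrieval"]
        score = 1.0 / (1 + min_dist(a, b))   # toy proximity-to-score transform
        print(name, min_dist(a, b), round(avg_dist(a, b), 1), span(a, b),
              round(score, 2))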
| Robust sparse rank learning for non-smooth ranking measures | | BIBAK | Full-Text | 259-266 | |
| Zhengya Sun; Tao Qin; Qing Tao; Jue Wang | |||
| Recently, increasing attention has been focused on directly optimizing
ranking measures and inducing sparsity in learning models. However, few
attempts have been made to relate them together in approaching the problem of
learning to rank. In this paper, we consider sparse algorithms that directly
optimize the Normalized Discounted Cumulative Gain (NDCG), a
widely-used ranking measure. We begin by establishing a reduction framework
under which we reduce ranking, as measured by NDCG, to the importance weighted
pairwise classification. Furthermore, we provide a sound theoretical guarantee
for this reduction, bounding the realized NDCG regret in terms of a properly
weighted pairwise classification regret, which implies that good performance
can be robustly transferred from pairwise classification to ranking. Based on
the converted pairwise loss function, it becomes feasible to take sparsity in
ranking models into account and to derive a gradient with a certain
performance guarantee. To achieve sparsity, a novel algorithm
named RSRank has also been devised, which performs L1 regularization using
truncated gradient descent. Finally, experimental results on benchmark
collections confirm the significant advantage of RSRank in comparison with
several baseline methods. Keywords: information retrieval, learn to rank, rsrank, truncated gradient | |||
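A minimal Python sketch of truncated gradient descent for L1 regularization, the sparsity mechanism named in the abstract: after each gradient step, small weights are shrunk toward zero and clipped when they cross it. The toy quadratic objective is invented, and shrinkage is applied on every step here for simplicity.

    def truncate(w, g_shrink, theta):
        if 0 < w <= theta:
            return max(0.0, w - g_shrink)
        if -theta <= w < 0:
            return min(0.0, w + g_shrink)
        return w

    def train(grad, w, eta=0.1, g_shrink=0.02, theta=1.0, steps=200):
        for _ in range(steps):
            w = [truncate(wi - eta * gi, g_shrink, theta)
                 for wi, gi in zip(w, grad(w))]
        return w

    # Toy objective: 0.5*(w0 - 1)^2 + 0.5*w1^2, where w1 is a useless feature.
    grad = lambda w: [w[0] - 1.0, w[1]]
    print([round(x, 3) for x in train(grad, [0.5, 0.5])])
    # -> [0.8, 0.0]: the useless weight is driven exactly to zero (sparsity)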
| Named entity recognition in query | | BIBAK | Full-Text | 267-274 | |
| Jiafeng Guo; Gu Xu; Xueqi Cheng; Hang Li | |||
| This paper addresses the problem of Named Entity Recognition in Query
(NERQ), which involves detection of the named entity in a given query and
classification of the named entity into predefined classes. NERQ is potentially
useful in many applications in web search. The paper proposes taking a
probabilistic approach to the task using query log data and Latent Dirichlet
Allocation. We consider the contexts of a named entity (i.e., the remainder of
a query once the named entity is removed) as words of a document, and classes of the named
entity as topics. The topic model is constructed by a novel and general
learning method referred to as WS-LDA (Weakly Supervised Latent Dirichlet
Allocation), which employs weakly supervised learning (rather than unsupervised
learning) using partially labeled seed entities. Experimental results show that
the proposed method based on WS-LDA can accurately perform NERQ, and outperform
the baseline methods. Keywords: named entity recognition, topic model | |||
| A 2-poisson model for probabilistic coreference of named entities for improved text retrieval | | BIBAK | Full-Text | 275-282 | |
| Seung-Hoon Na; Hwee Tou Ng | |||
| Text retrieval queries frequently contain named entities. The standard
approach of term frequency weighting does not work well when estimating the
term frequency of a named entity, since anaphoric expressions (like he, she,
the movie, etc.) are frequently used to refer to named entities in a document,
and the use of anaphoric expressions causes the term frequency of named
entities to be underestimated. In this paper, we propose a novel 2-Poisson
model to estimate the frequency of anaphoric expressions of a named entity,
without explicitly resolving the anaphoric expressions. Our key assumption is
that the frequency of anaphoric expressions is distributed over named entities
in a document according to the probabilities of whether the document is elite
for the named entities. This assumption leads us to formulate our proposed
Co-referentially Enhanced Entity Frequency (CEEF). Experimental results on the
text collection of TREC Blog Track show that CEEF achieves significant and
consistent improvements over state-of-the-art retrieval methods using standard
term frequency estimation. In particular, we achieve a 3% increase of MAP over
the best performing run of TREC 2008 Blog Track. Keywords: co-reference resolution, entity retrieval, term frequency | |||
| Mining employment market via text block detection and adaptive cross-domain information extraction | | BIBAK | Full-Text | 283-290 | |
| Tak-Lam Wong; Wai Lam; Bo Chen | |||
| We have developed an approach for analyzing online job advertisements in
different domains (industries) from different regions worldwide. Our approach
is able to extract precise information from the text content supporting useful
employment market analysis locally and globally. A major component in our
approach is an information extraction framework which is composed of two
challenging tasks. The first task is to detect unformatted text blocks
automatically based on an unsupervised learning model. Identifying these useful
text blocks through this learning model allows the generation of highly
effective features for the next task which is text fragment extraction
learning. The task of text fragment extraction learning is formulated as a
domain adaptation model for text fragment classification. One advantage of our
approach is that it can easily adapt to a large number of online job
advertisements in different and new domains. Extensive experiments have been
conducted to demonstrate the effectiveness and flexibility of our approach. Keywords: information extraction, text block detection | |||
| A proximity language model for information retrieval | | BIBAK | Full-Text | 291-298 | |
| Jinglei Zhao; Yeogirl Yun | |||
| The proximity of query terms in a document is very important information
for enabling ranking models to go beyond the "bag of words" assumption in
information retrieval. This paper studies the integration of term proximity information
into the unigram language modeling. A new proximity language model (PLM) is
proposed which views query terms' proximity centrality as the Dirichlet
hyper-parameter that weights the parameters of the unigram document language
model. Several forms of proximity measure are developed for use in the PLM,
each computing a query term's proximity centrality in a specific document. In
experiments, the proximity language model is compared with the basic language
model and previous works that combine the proximity information with language
model using linear score combination. The experimental results show that the
proposed model performs better in both top precision and average precision. Keywords: information retrieval, proximity language model, proximity model | |||
| Positional language models for information retrieval | | BIBAK | Full-Text | 299-306 | |
| Yuanhua Lv; ChengXiang Zhai | |||
| Although many variants of language models have been proposed for information
retrieval, there are two related retrieval heuristics remaining "external" to
the language modeling approach: (1) proximity heuristic which rewards a
document where the matched query terms occur close to each other; (2) passage
retrieval which scores a document mainly based on the best matching passage.
Existing studies have only attempted to use a standard language model as a
"black box" to implement these heuristics, making it hard to optimize the
combination parameters.
In this paper, we propose a novel positional language model (PLM) which implements both heuristics in a unified language model. The key idea is to define a language model for each position of a document, and score a document based on the scores of its PLMs. The PLM is estimated based on propagated counts of words within a document through a proximity-based density function, which both captures proximity heuristics and achieves an effect of "soft" passage retrieval. We propose and study several representative density functions and several different PLM-based document ranking strategies. Experimental results on standard TREC test collections show that the PLM is effective for passage retrieval and performs better than a state-of-the-art proximity-based retrieval model. Keywords: passage retrieval, positional language models, proximity | |||
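A minimal Python sketch of a positional language model: every term occurrence propagates a Gaussian-kernel count to each position, yielding one tiny language model per position, and scoring the document by its best position gives the "soft" passage effect. The document, sigma, and the fixed background probability are invented for illustration.

    import math

    def propagated_count(term, position, doc, sigma=2.0):
        return sum(math.exp(-((position - i) ** 2) / (2 * sigma ** 2))
                   for i, w in enumerate(doc) if w == term)

    def plm_score(query, doc, position, mu=1.0, p_bg=1.0 / 50):
        # Dirichlet-style smoothing of the positional counts; p_bg stands in
        # for a real collection background model.
        total = sum(propagated_count(w, position, doc) for w in set(doc))
        return sum(math.log((propagated_count(q, position, doc) + mu * p_bg)
                            / (total + mu))
                   for q in query)

    doc = "the cat sat here while the dog slept in the sun".split()
    query = ["cat", "sat"]
    best = max(range(len(doc)), key=lambda i: plm_score(query, doc, i))
    print(best, doc[best])   # the best position sits where the query terms cluster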
| A Bayesian learning approach to promoting diversity in ranking for biomedical information retrieval | | BIBAK | Full-Text | 307-314 | |
| Xiangji Huang; Qinmin Hu | |||
| In this paper, we propose a Bayesian learning approach to promoting
diversity for information retrieval in biomedicine and a re-ranking model to
improve retrieval performance in the biomedical domain. First, the re-ranking
model computes the maximum posterior probability of the hidden property
corresponding to each retrieved passage. Then it iteratively groups the
passages into subsets according to their properties. Finally, these passages
are re-ranked from the subsets as our output. Our proposed method does not
require any external biomedical resource. We evaluate our Bayesian
learning approach by conducting extensive experiments on the TREC 2004-2007
Genomics data sets. The experimental results show the effectiveness of the
proposed Bayesian learning approach for promoting diversity in ranking for
biomedical information retrieval on four years of TREC data sets. Keywords: bayesian learning, biomedical IR, promoting diversity | |||
| Sources of evidence for vertical selection | | BIBAK | Full-Text | 315-322 | |
| Jaime Arguello; Fernando Diaz; Jamie Callan; Jean-Francois Crespo | |||
| Web search providers often include search services for domain-specific
subcollections, called verticals, such as news, images, videos, job postings,
company summaries, and artist profiles. We address the problem of vertical
selection, predicting relevant verticals (if any) for queries issued to the
search engine's main web search page. In contrast to prior query classification
and resource selection tasks, vertical selection is associated with unique
resources that can inform the classification decision. We focus on three
sources of evidence: (1) the query string, from which features are derived
independent of external resources, (2) logs of queries previously issued
directly to the vertical, and (3) corpora representative of vertical content.
We focus on 18 different verticals, which differ in terms of semantics, media
type, size, and level of query traffic. We compare our method to prior work in
federated search and retrieval effectiveness prediction. An in-depth error
analysis reveals unique challenges across different verticals and provides
insight into vertical selection for future work. Keywords: aggregated search, distributed information retrieval, query classification,
resource selection, vertical selection | |||
| Adaptation of offline vertical selection predictions in the presence of user feedback | | BIBAK | Full-Text | 323-330 | |
| Fernando Diaz; Jaime Arguello | |||
| Web search results often integrate content from specialized corpora known as
verticals. Given a query, one important aspect of aggregated search is the
selection of relevant verticals from a set of candidate verticals. One drawback
to previous approaches to vertical selection is that methods have not
explicitly modeled user feedback. However, production search systems often
record a variety of feedback information. In this paper, we present algorithms
for vertical selection which adapt to user feedback. We evaluate algorithms
using a novel simulator which models performance of a vertical selector
situated in realistic query traffic. Keywords: aggregated search, distributed information retrieval, resource selection,
simulation, user feedback, vertical selection | |||
| A probabilistic topic-based ranking framework for location-sensitive domain information retrieval | | BIBAK | Full-Text | 331-338 | |
| Huajing Li; Zhisheng Li; Wang-Chien Lee; Dik Lun Lee | |||
| It has been observed that many queries submitted to search engines are
location-sensitive. Traditional search techniques fail to interpret the
significance of such geographical clues and as such are unable to return highly
relevant search results. Although there have been efforts in the literature to
support location-aware information retrieval, critical challenges still remain
in terms of search result quality and data scalability. In this paper, we
propose an innovative probabilistic ranking framework for domain information
retrieval where users are interested in a set of location-sensitive topics. Our
proposed method recognizes the geographical distribution of topic influence in
the process of ranking documents and models it accurately using probabilistic
Gaussian Process classifiers. Additionally, we demonstrate the effectiveness of
the proposed ranking framework by implementing it in a Web search service for
NBA news. Extensive performance evaluation is performed on real Web document
collections, which confirms that our proposed mechanism works significantly
better (around 29.7% on average by the DCG20 measure) than other popular
location-aware information retrieval techniques in ranking quality. Keywords: gaussian process, ranking, spatial search, topic model | |||
| Entropy-biased models for query representation on the click graph | | BIBAK | Full-Text | 339-346 | |
| Hongbo Deng; Irwin King; Michael R. Lyu | |||
| Query log analysis has received substantial attention in recent years, in
which the click graph is an important technique for describing the relationship
between queries and URLs. State-of-the-art approaches that model the click
graph using raw click frequencies, however, neither eliminate noise nor handle
heterogeneous query-URL pairs well. In this paper, we
investigate and develop a novel entropy-biased framework for modeling click
graphs. The intuition behind this model is that various query-URL pairs should
be treated differently, i.e., common clicks on less frequent but more specific
URLs are of greater value than common clicks on frequent and general URLs.
Based on this intuition, we utilize the entropy information of the URLs and
introduce a new concept, namely the inverse query frequency (IQF), to weigh the
importance (discriminative ability) of a click on a certain URL. The IQF
weighting scheme has never before been explicitly explored or statistically
examined for bipartite graphs in the information retrieval literature. We not only formally
define and quantify this scheme, but also incorporate it with the click
frequency and user frequency information on the click graph for an effective
query representation. To illustrate our methodology, we conduct experiments
with the AOL query log data for query similarity analysis and query suggestion
tasks. Experimental results demonstrate that considerable improvements in
performance are obtained with our entropy-biased models. Moreover, our method
can also be applied to other bipartite graphs. Keywords: click frequency, click graph, entropy-biased model, inverse query frequency,
user frequency | |||
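The IQF weighting is directly analogous to IDF on a bipartite click graph. A minimal sketch of our own reading of it, weighting each click count by how few distinct queries reach the URL:

```python
import math
from collections import defaultdict

def iqf_weighted_vectors(click_pairs):
    # click_pairs: iterable of (query, url, clicks).
    # IQF(url) = log(#queries / #distinct queries that clicked url),
    # so clicks on specific URLs count for more than clicks on general ones.
    queries, url_queries, raw = set(), defaultdict(set), defaultdict(dict)
    for q, u, c in click_pairs:
        queries.add(q)
        url_queries[u].add(q)
        raw[q][u] = raw[q].get(u, 0) + c
    n_q = len(queries)
    return {q: {u: c * math.log(n_q / len(url_queries[u]))
                for u, c in urls.items()}
            for q, urls in raw.items()}

def cosine(v1, v2):
    # Query-query similarity over the IQF-weighted click vectors.
    dot = sum(w * v2.get(u, 0.0) for u, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0
```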
| Click-through prediction for news queries | | BIBAK | Full-Text | 347-354 | |
| Arnd Christian König; Michael Gamon; Qiang Wu | |||
| A growing trend in commercial search engines is the display of specialized
content such as news, products, etc. interleaved with web search results.
Ideally, this content should be displayed only when it is highly relevant to
the search query, as it competes for space with "regular" results and
advertisements. One measure of the relevance to the search query is the
click-through rate the specialized content achieves when displayed; hence, if
we can predict this click-through rate accurately, we can use this as the basis
for selecting when to show specialized content. In this paper, we consider the
problem of estimating the click-through rate for dedicated news search results.
For queries for which news results have been displayed repeatedly before, the
click-through rate can be tracked online; the key challenge, however, is
deciding for which previously unseen queries to display news results. In this paper we
propose a supervised model that offers accurate prediction of news
click-through rates and satisfies the requirement of adapting quickly to
emerging news events. Keywords: CTR prediction, click-through rates, news search | |||
| Smoothing clickthrough data for web search ranking | | BIBAK | Full-Text | 355-362 | |
| Jianfeng Gao; Wei Yuan; Xiao Li; Kefeng Deng; Jian-Yun Nie | |||
| Incorporating features extracted from clickthrough data (called clickthrough
features) has been demonstrated to significantly improve the performance of
ranking models for Web search applications. Such benefits, however, are
severely limited by the data sparseness problem, i.e., many queries and
documents have no or very few clicks. The ranker thus cannot rely strongly on
clickthrough features for document ranking. This paper presents two smoothing
methods to expand clickthrough data: query clustering via Random Walk on click
graphs and a discounting method inspired by the Good-Turing estimator. Both
methods are evaluated on real-world data in three Web search domains.
Experimental results show that the ranking models trained on smoothed
clickthrough features consistently outperform those trained on unsmoothed
features. This study demonstrates both the importance and the benefits of
dealing with the sparseness problem in clickthrough data. Keywords: clickthrough data, discounting, learning to rank, random walk, smoothing,
web search | |||
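To illustrate the random-walk half of the proposal (the Good-Turing-style discounting is omitted), a small self-contained sketch of our own: each query's click profile is expanded by walking query -> URL -> query -> URL on the bipartite click graph, so sparse queries inherit clicks from neighbors that share URLs:

```python
from collections import defaultdict

def _normalize(edges):
    # Row-normalize a weighted adjacency map: node -> {neighbor: prob}.
    out = {}
    for node, nbrs in edges.items():
        total = float(sum(nbrs.values()))
        out[node] = {m: w / total for m, w in nbrs.items()}
    return out

def smooth_clicks(click_pairs, steps=1):
    # click_pairs: iterable of (query, url, clicks).
    q2u, u2q = defaultdict(dict), defaultdict(dict)
    for q, u, c in click_pairs:
        q2u[q][u] = q2u[q].get(u, 0) + c
        u2q[u][q] = u2q[u].get(q, 0) + c
    pq2u, pu2q = _normalize(q2u), _normalize(u2q)
    smoothed = {q: dict(urls) for q, urls in pq2u.items()}
    for _ in range(steps):
        step = {}
        for q, urls in smoothed.items():
            acc = defaultdict(float)
            for u, p_u in urls.items():
                for q2, p_q2 in pu2q[u].items():       # hop back to queries
                    for u2, p_u2 in pq2u[q2].items():  # and out to their URLs
                        acc[u2] += p_u * p_q2 * p_u2
            step[q] = dict(acc)
        smoothed = step
    return smoothed
```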
| Predicting user interests from contextual information | | BIBAK | Full-Text | 363-370 | |
| Ryen W. White; Peter Bailey; Liwei Chen | |||
| Search and recommendation systems must include contextual information to
effectively model users' interests. In this paper, we present a systematic
study of the effectiveness of five variant sources of contextual information
for user interest modeling. Post-query navigation and general browsing
behaviors far outweigh direct search engine interaction as an
information-gathering activity. Therefore we conducted this study with a focus
on Website recommendations rather than search results. The five contextual
information sources used are: social, historic, task, collection, and user
interaction. We evaluate the utility of these sources, and overlaps between
them, based on how effectively they predict users' future interests. Our
findings demonstrate that the sources perform differently depending on the
duration of the time window used for future prediction, and that context
overlap outperforms any isolated source. Designers of Website suggestion
systems can use our findings to provide improved support for post-query
navigation and general browsing behaviors. Keywords: context, user interest modeling, website recommendation | |||
| A comparison of query and term suggestion features for interactive searching | | BIBAK | Full-Text | 371-378 | |
| Diane Kelly; Karl Gyllstrom; Earl W. Bailey | |||
| Query formulation is one of the most difficult and important aspects of
information seeking and retrieval. Two techniques, term relevance feedback and
query suggestion, provide methods to help users formulate queries, but each is
limited in different ways. In this research we combine these two techniques by
automatically creating query suggestions using term relevance feedback
techniques. To evaluate our approach, we conducted an interactive information
retrieval study with 55 subjects and 20 topics. Each subject completed four
topics, half with a term suggestion system and half with a query suggestion
system. We also investigated the source of the suggestions: approximately half
of all subjects were provided with system-generated suggestions, while half
were provided with user-generated suggestions. Results show that subjects used
more query suggestions than term suggestions and saved more documents with
these suggestions, even though there were no significant differences in
performance. Subjects preferred the query suggestion system and rated it higher
along a number of dimensions including its ability to help them think of new
approaches to searching. Qualitative data provided insight into subjects' usage
and ratings, and indicated that subjects often used the suggestions even when
they did not click on them. Keywords: interactive searching, query suggestion, relevance feedback, term suggestion | |||
| An aspectual interface for supporting complex search tasks | | BIBAK | Full-Text | 379-386 | |
| Robert Villa; Iván Cantador; Hideo Joho; Joemon M. Jose | |||
| With the increasing importance of search systems on the web, there is a
continuing push to design interfaces which are a better match with the kinds of
real-world tasks in which users are engaged. In this paper, we consider how
broad, complex search tasks may be supported via the search interface. In
particular, we consider search tasks which may be composed of multiple aspects,
or multiple related subtasks. For example, in decision making tasks the user
may investigate multiple possible solutions before settling on a single, final
solution, while other tasks, such as report writing, may involve searching on
multiple interrelated topics.
A search interface is presented which is designed to support such broad search tasks, allowing a user to create search aspects, each of which models an independent subtask of some larger task. The interface is built on the intuition that users should be able to structure their searching environment when engaged on complex search tasks, where the act of structuring and organization may aid the user in understanding his or her task. A user study was carried out which compared our aspectual interface to a standard web-search interface. The results suggest that an aspectual interface can aid users when engaged in broad search tasks where the search aspects must be identified during searching; for a task where search aspects were pre-defined, no advantage over the baseline was found. Results for a decision making task were less clear cut, but show some evidence for improved task performance. Keywords: aspectual search, complex search tasks, exploratory search | |||
| Combining audio content and social context for semantic music discovery | | BIBAK | Full-Text | 387-394 | |
| Douglas R. Turnbull; Luke Barrington; Gert Lanckriet; Mehrdad Yazdani | |||
| When attempting to annotate music, it is important to consider both acoustic
content and social context. This paper explores techniques for collecting and
combining multiple sources of such information for the purpose of building a
query-by-text music retrieval system. We consider two representations of the
acoustic content (related to timbre and harmony) and two social sources (social
tags and web documents). We then compare three algorithms that combine these
information sources: calibrated score averaging (CSA), RankBoost, and kernel
combination support vector machines (KC-SVM). We demonstrate empirically that
each of these algorithms is superior to algorithms that use individual
information sources. Keywords: calibrated score averaging, combining data sources, kernel combination svm,
music ir, rankboost | |||
| Automatic video tagging using content redundancy | | BIBAK | Full-Text | 395-402 | |
| Stefan Siersdorfer; Jose San Pedro; Mark Sanderson | |||
| The analysis of the leading social video sharing platform YouTube reveals a
high amount of redundancy, in the form of videos with overlapping or duplicated
content. In this paper, we show that this redundancy can provide useful
information about connections between videos. We reveal these links using
robust content-based video analysis techniques and exploit them for generating
new tag assignments. To this end, we propose different tag propagation methods
for automatically obtaining richer video annotations. Our techniques provide
the user with additional information about videos, and lead to enhanced feature
representations for applications such as automatic data organization and
search. Experiments on video clustering and classification as well as a user
evaluation demonstrate the viability of our approach. Keywords: automatic tagging, content-based links, data organization, neighbor-based
tagging, tag propagation, video duplicates | |||
| CompositeMap: a novel framework for music similarity measure | | BIBAK | Full-Text | 403-410 | |
| Bingjun Zhang; Jialie Shen; Qiaoliang Xiang; Ye Wang | |||
| With the continuing advances in data storage and communication technology,
there has been an explosive growth of music information from different
application domains. As an effective technique for organizing, browsing, and
searching large data collections, music information retrieval is attracting
more and more attention. How to measure and model the similarity between
different music items is one of the most fundamental yet challenging research
problems. In this paper, we introduce a novel framework based on a multimodal
and adaptive similarity measure for various applications. Distinguished from
previous approaches, our system can effectively combine music properties from
different aspects into a compact signature via supervised learning. In
addition, an incremental Locality Sensitive Hashing algorithm has been
developed to support efficient retrieval processes with different kinds of
queries. Experimental results based on two large music collections reveal
various advantages of the proposed framework including effectiveness,
efficiency, adaptiveness, and scalability. Keywords: browsing, music, personalization, recommendation, search, similarity measure | |||
| Quantifying performance and quality gains in distributed web search engines | | BIBAK | Full-Text | 411-418 | |
| B. Barla Cambazoglu; Vassilis Plachouras; Ricardo Baeza-Yates | |||
| Distributed search engines based on geographical partitioning of a central
Web index emerge as a feasible solution to the immense growth of the Web, user
bases, and query traffic. However, there is still a lack of research on
quantifying the performance and quality gains that can be achieved by such
architectures. In this paper, we develop various cost models to evaluate the
performance benefits of a geographically distributed search engine architecture
based on partial index replication and query forwarding. Specifically, we focus
on possible performance gains due to the distributed nature of query processing
and Web crawling processes. We show that any response time gain achieved by
distributed query processing can be utilized to improve search relevance as the
use of complex but more accurate algorithms can now be enabled for document
ranking. We also show that distributed Web crawling leads to better Web
coverage and examine whether this improves search quality. We verify the
validity of our claims over large, real-life datasets via simulations. Keywords: data centers, distributed query processing, index partitioning, web crawling | |||
| SUSHI: scoring scaled samples for server selection | | BIBAK | Full-Text | 419-426 | |
| Paul Thomas; Milad Shokouhi | |||
| Modern techniques for distributed information retrieval use a set of
documents sampled from each server, but these samples have been underutilised
in server selection. We describe a new server selection algorithm, SUSHI, which
unlike earlier algorithms can make full use of the text of each sampled
document and which does not need training data. SUSHI can directly optimise for
many common cases, including high precision retrieval, and by including a
simple stopping condition can do so while reducing network traffic.
Our experiments compare SUSHI with alternatives and show it achieves the same effectiveness as the best current methods while being substantially more efficient, selecting as few as 20% as many servers. Keywords: document samples | |||
| Effective query expansion for federated search | | BIBAK | Full-Text | 427-434 | |
| Milad Shokouhi; Leif Azzopardi; Paul Thomas | |||
| While query expansion techniques have been shown to improve retrieval
performance in a centralized setting, they have not been well studied in a
federated setting. In this paper, we consider how query expansion may be
adapted to federated environments and propose several new methods: where
focused expansions are used in a selective fashion to produce specific queries
for each source (or a set of sources). On a number of different testbeds, we
show that focused query expansion can significantly outperform the previously
proposed global expansion method, and -- contrary to earlier work -- show that
query expansion can improve performance over standard federated retrieval.
These findings motivate further research examining the different methods for query expansion, and other forms of system and user interaction, in order to continue improving the performance of interactive federated search systems. Keywords: distributed information retrieval, query expansion | |||
| From networks to human behavior | | BIBAK | Full-Text | 435 | |
| Albert-László Barabási | |||
| Highly interconnected networks with amazingly complex topology describe
systems as diverse as the World Wide Web, our cells, social systems or the
economy. Recent studies indicate that these networks are the result of
self-organizing processes governed by simple but generic laws, resulting in
architectural features that make them much more similar to each other than one
would have expected by chance. I will discuss the amazing order characterizing
our interconnected world and its implications for network robustness and
spreading processes. Finally, most of these networks are driven by the temporal
patterns characterizing human activity. I will use communication and web
browsing data to show that there is deep order in the temporal domain of human
dynamics, and discuss the different ways to understand and model the emerging
patterns. Keywords: network science, world wide web | |||
| On rank correlation and the distance between rankings | | BIBAK | Full-Text | 436-443 | |
| Ben Carterette | |||
| Rank correlation statistics are useful for determining whether there is a
correspondence between two measurements, particularly when the measures
themselves are of less interest than their relative ordering. Kendall's τ in
particular has found use in Information Retrieval as a "meta-evaluation"
measure: it has been used to compare evaluation measures, evaluate system
rankings, and evaluate predicted performance. In the meta-evaluation domain,
however, correlations between systems confound relationships between
measurements, practically guaranteeing a positive and significant estimate of
τ regardless of any actual correlation between the measurements. We introduce
an alternative measure of distance between rankings that corrects this by
explicitly accounting for correlations between systems over a sample of topics,
and moreover has a probabilistic interpretation for use in a test of
statistical significance. We validate our measure with theory, simulated data,
and experiment. Keywords: distance measures, evaluation, information retrieval, rank correlation | |||
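For reference, the standard Kendall's τ that the paper takes as its point of departure can be computed in a few lines (our sketch; ties count as neither concordant nor discordant):

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    # rank_a, rank_b: dicts mapping a system id to its rank position.
    systems = list(rank_a)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        sign = (rank_a[s] - rank_a[t]) * (rank_b[s] - rank_b[t])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) / 2
    return (concordant - discordant) / n_pairs
```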
| Score adjustment for correction of pooling bias | | BIBAK | Full-Text | 444-451 | |
| William Webber; Laurence A. F. Park | |||
| Information retrieval systems are evaluated against test collections of
topics, documents, and assessments of which documents are relevant to which
topics. Documents are chosen for relevance assessment by pooling runs from a
set of existing systems. New systems can return unassessed documents, leading
to an evaluation bias against them. In this paper, we propose to estimate the
degree of bias against an unpooled system, and to adjust the system's score
accordingly. Bias estimation can be done via leave-one-out experiments on the
existing, pooled systems, but this requires the problematic assumption that the
new system is similar to the existing ones. Instead, we propose that all
systems, new and pooled, be fully assessed against a common set of topics, and
the bias observed against the new system on the common topics be used to adjust
scores on the existing topics. We demonstrate using resampling experiments on
TREC test sets that our method leads to a marked reduction in error, even with
only a relatively small number of common topics, and that the error decreases
as the number of topics increases. Keywords: evaluation, retrieval experiment, system measurement | |||
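A simple way to picture the proposal (our sketch; the paper's estimator may differ in form): measure the new system's score drop on the fully assessed common topics, then shift its pool-biased scores on the existing topics by that amount.

```python
def estimate_bias(common_full, common_pool):
    # Per-topic scores for the new system on the common topics, once with
    # full judgments and once with pool-only judgments; the mean drop is
    # the estimated pooling bias.
    return sum(f - p for f, p in zip(common_full, common_pool)) / len(common_full)

def adjust_scores(existing_scores, bias):
    # Additively correct the pool-biased scores on the existing topics,
    # clipping to [0, 1] for measures such as average precision.
    return [min(1.0, max(0.0, s + bias)) for s in existing_scores]
```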
| Towards methods for the collective gathering and quality control of relevance assessments | | BIBAK | Full-Text | 452-459 | |
| Gabriella Kazai; Natasa Milic-Frayling; Jamie Costello | |||
| Growing interest in online collections of digital books and video content
motivates the development and optimization of adequate retrieval systems.
However, traditional methods for collecting relevance assessments to tune
system performance are challenged by the nature of digital items in such
collections, where assessors are faced with a considerable effort to review and
assess content by extensive reading, browsing, and within-document searching.
The extra strain is caused by the length and cohesion of the digital item and
the dispersion of topics within it. We propose a method for the collective
gathering of relevance assessments using a social game model to instigate
participants' engagement. The game provides incentives for assessors to follow
a predefined review procedure and makes provisions for the quality control of
the collected relevance judgments. We discuss the approach in detail, and
present the results of a pilot study conducted on a book corpus to validate the
approach. Our analysis reveals intricate relationships between the affordances
of the system, the incentives of the social game, and the behavior of the
assessors. We show that the proposed game design achieves two designated goals:
the incentive structure motivates endurance in assessors and the review process
encourages truthful assessment. Keywords: relevance assessments, social game, test collection construction | |||
| On the local optimality of LambdaRank | | BIBAK | Full-Text | 460-467 | |
| Pinar Donmez; Krysta M. Svore; Christopher J. C. Burges | |||
| A machine learning approach to learning to rank trains a model to optimize a
target evaluation measure with respect to training data. Currently, existing
information retrieval measures are impossible to optimize directly except for
models with a very small number of parameters. The IR community thus faces a
major challenge: how to optimize IR measures of interest directly. In this
paper, we present a solution. Specifically, we show that LambdaRank, which
smoothly approximates the gradient of the target measure, can be adapted to
work with four popular IR target evaluation measures using the same underlying
gradient construction. It is likely, therefore, that this construction is
extendable to other evaluation measures. We empirically show that LambdaRank
finds a locally optimal solution for mean NDCG@10, mean NDCG, MAP and MRR with
a 99% confidence rate. We also show that the amount of effective training data
varies with IR measure and that with a sufficiently large training set size,
matching the training optimization measure to the target evaluation measure
yields the best accuracy. Keywords: learning to rank, web search | |||
| Document selection methodologies for efficient and effective learning-to-rank | | BIBAK | Full-Text | 468-475 | |
| Javed A. Aslam; Evangelos Kanoulas; Virgil Pavlu; Stefan Savev; Emine Yilmaz | |||
| Learning-to-rank has attracted great attention in the IR community. Much
thought and research has been placed on query-document feature extraction and
development of sophisticated learning-to-rank algorithms. However, relatively
little research has been conducted on selecting documents for learning-to-rank
data sets, or on the effect of these choices on the efficiency and
effectiveness of learning-to-rank algorithms.
In this paper, we employ a number of document selection methodologies widely used in the context of evaluation: depth-k pooling, sampling (infAP, statAP), active learning (MTC), and on-line heuristics (hedge). Certain methodologies, e.g. sampling and active learning, have been shown to lead to efficient and effective evaluation. We investigate whether they can also enable efficient and effective learning-to-rank. We compare them with the document selection methodology used to create the LETOR datasets. Further, all of the utilized methodologies are different in nature, and thus they construct training data sets with different properties, such as the proportion of relevant documents in the data or the similarity among them. We study how such properties affect the efficiency, effectiveness, and robustness of learning-to-rank collections. Keywords: document selection methodologies, evaluation, learning-to-rank, sampling | |||
| An improved markov random field model for supporting verbose queries | | BIBAK | Full-Text | 476-483 | |
| Matthew Lease | |||
| Recent work in supervised learning of term-based retrieval models has shown
significantly improved accuracy can often be achieved via better model
estimation. In this paper, we show retrieval accuracy with Metzler and Croft's
Markov random field (MRF) approach can be similarly improved via supervised
learning. While the original MRF method estimates a parameter for each of its
three feature classes from data, parameters within each class are set via a
uniform weighting scheme adopted from the standard unigram. We conjecture
greater MRF retrieval accuracy should be possible by better estimating
within-class parameters, particularly for verbose queries employing natural
language terms. Retrieval experiments with these queries on three TREC document
collections show that our improved MRF consistently outperforms both the original
MRF and supervised unigram baselines. Additional experiments using
blind-feedback and evaluation with optimal weighting demonstrate both the
immediate value and further potential of our method. Keywords: information retrieval, markov random field, supervised learning | |||
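The baseline being improved is Metzler and Croft's sequential dependence MRF, which combines unigram, ordered-window, and unordered-window features with uniform within-class weights. A compact sketch of that baseline (ours; the background statistics `bg` and parameter values are assumed):

```python
import math

def count_ordered(tokens, a, b):
    # Occurrences of term a immediately followed by term b.
    return sum(1 for i in range(len(tokens) - 1)
               if tokens[i] == a and tokens[i + 1] == b)

def count_unordered(tokens, a, b, window=8):
    # Co-occurrences of a and b within an unordered window.
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return sum(1 for i in pos_a for j in pos_b if abs(i - j) < window)

def sdm_score(query, doc, bg, mu=2500.0, lam=(0.85, 0.10, 0.05)):
    # bg maps a term or an (a, b) pair to its background probability.
    lt, lo, lu = lam
    dlen = len(doc)
    def smoothed(count, key):
        return math.log((count + mu * bg.get(key, 1e-8)) / (dlen + mu))
    score = lt * sum(smoothed(doc.count(q), q) for q in query)
    for a, b in zip(query, query[1:]):     # adjacent query-term pairs
        score += lo * smoothed(count_ordered(doc, a, b), (a, b))
        score += lu * smoothed(count_unordered(doc, a, b), (a, b))
    return score
```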
| Placing flickr photos on a map | | BIBAK | Full-Text | 484-491 | |
| Pavel Serdyukov; Vanessa Murdock; Roelof van Zwol | |||
| In this paper we investigate generic methods for placing photos uploaded to
Flickr on the World map. As primary input for our methods we use the textual
annotations provided by the users to predict the single most probable location
where the image was taken. Central to our approach is a language model based
entirely on the annotations provided by users. We define extensions to improve
over the language model using tag-based smoothing and cell-based smoothing, and
leveraging spatial ambiguity. Further we demonstrate how to incorporate
GeoNames (http://www.geonames.org visited May 2009), a large external database
of locations. For varying levels of granularity, we are able to place images on
a map with at least twice the precision of the state-of-the-art reported in the
literature. Keywords: flickr, image localization, language models | |||
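The core of such a tag-based language model can be pictured as a multinomial over geographic grid cells. A minimal sketch under our own simplifications (Laplace smoothing rather than the paper's tag- and cell-based smoothing, and no GeoNames):

```python
import math
from collections import defaultdict

class GeoTagModel:
    # Predict the most probable grid cell for a photo given its tags.
    def __init__(self, training):
        # training: iterable of (cell_id, tag_list) for geotagged photos.
        self.tag_counts = defaultdict(lambda: defaultdict(int))
        self.cell_totals = defaultdict(int)
        self.vocab = set()
        for cell, tags in training:
            for t in tags:
                self.tag_counts[cell][t] += 1
                self.cell_totals[cell] += 1
                self.vocab.add(t)

    def predict(self, tags):
        v = len(self.vocab)
        best_cell, best = None, float('-inf')
        for cell, total in self.cell_totals.items():
            # Unnormalized log P(cell) plus sum of log P(tag | cell).
            s = math.log(total)
            for t in tags:
                s += math.log((self.tag_counts[cell][t] + 1) / (total + v))
            if s > best:
                best_cell, best = cell, s
        return best_cell
```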
| An automatic translation of tags for multimedia contents using folksonomy networks | | BIBAK | Full-Text | 492-499 | |
| Tae-Gil Noh; Seong-Bae Park; Hee-Geun Yoon; Sang-Jo Lee; Se-Young Park | |||
| This paper proposes a novel method to translate tags attached to multimedia
contents for cross-language retrieval. The main issue in this problem is the
sense disambiguation of tags, which come with little textual context. In order
to solve this problem, the proposed method represents both a tag and its
translation candidates as networks of co-occurring tags, since a network allows
richer expression of context than alternatives such as co-occurrence vectors.
The method translates a tag by selecting the optimal one from possible
candidates based on a network similarity even when neither the textual contexts
nor sophisticated language resources are available. The experiments on the MIR
Flickr-2008 test set show that the proposed method achieves 90.44% accuracy in
translating tags from English into German, which is significantly higher than
the baseline methods of a frequency based translation and a co-occurrence-based
translation. Keywords: cross-language ir, image ir, meta-data, social tagging | |||
| CrowdReranking: exploring multiple search engines for visual search reranking | | BIBAK | Full-Text | 500-507 | |
| Yuan Liu; Tao Mei; Xian-Sheng Hua | |||
| Most existing approaches to visual search reranking predominantly focus on
mining information within the initial search results. However, the initial
ranked list cannot provide enough cues for reranking by itself due to the
typically unsatisfactory visual search performance. This paper presents a new
method for visual search reranking called CrowdReranking, which is
characterized by mining relevant visual patterns from image search results of
multiple search engines which are available on the Internet. Observing that
different search engines might have different data sources for indexing and
methods for ranking, it is reasonable to assume that their results will differ
yet share certain common visual patterns relevant to a given query. We first
construct a set of visual words based on the
local image patches collected from multiple image search engines. We then
explicitly detect two kinds of visual patterns, i.e., salient and concurrent
patterns, among the visual words. Theoretically, we formalize reranking as an
optimization problem on the basis of the mined visual patterns and propose a
closed-form solution. Empirically, we conduct extensive experiments on several
real-world search engines and one benchmark dataset, and show that the proposed
CrowdReranking is superior to the state-of-the-art works. Keywords: data mining, search reranking, visual search | |||
| Including summaries in system evaluation | | BIBAK | Full-Text | 508-515 | |
| Andrew Turpin; Falk Scholer; Kalervo Järvelin; Mingfang Wu; J. Shane Culpepper | |||
| In batch evaluation of retrieval systems, performance is calculated based on
predetermined relevance judgements applied to a list of documents returned by
the system for a query. This evaluation paradigm, however, ignores the current
standard operation of search systems which require the user to view summaries
of documents prior to reading the documents themselves.
In this paper we modify the popular IR metrics MAP and P@10 to incorporate the summary reading step of the search process, and study the effects on system rankings using TREC data. Based on a user study, we establish likely disagreements between relevance judgements of summaries and of documents, and use these values to seed simulations of summary relevance in the TREC data. Re-evaluating the runs submitted to the TREC Web Track, we find the average correlation between system rankings and the original TREC rankings is 0.8 (Kendall τ), which is lower than commonly accepted for system orderings to be considered equivalent. The system that has the highest MAP in TREC generally remains amongst the highest MAP systems when summaries are taken into account, but other systems become equivalent to the top ranked system depending on the simulated summary relevance. Given that system orderings alter when summaries are taken into account, the small amount of effort required to judge summaries in addition to documents (19 seconds vs 88 seconds on average in our data) should be undertaken when constructing test collections. Keywords: ir evaluation, summaries, trec | |||
| When more is less: the paradox of choice in search engine use | | BIBAK | Full-Text | 516-523 | |
| Antti Oulasvirta; Janne P. Hukkinen; Barry Schwartz | |||
| In numerous everyday domains, it has been demonstrated that increasing the
number of options beyond a handful can lead to paralysis and poor choice and
decrease satisfaction with the choice. Were this so-called paradox of choice to
hold in search engine use, it would mean that increasing recall can actually
work counter to user satisfaction if it implies choice from a more extensive
set of result items. The existence of this effect was demonstrated in an
experiment where users (N=24) were shown a search scenario and a query and were
required to choose the best result item within 30 seconds. Having to choose
from six results yielded both higher subjective satisfaction with the choice
and greater confidence in its correctness than when there were 24 items on the
results page. We discuss this finding in the wider context of "choice
architecture" -- that is, how result presentation affects choice and
satisfaction. Keywords: relevance judgments, satisfaction, search engines, user interfaces | |||
| Where to stop reading a ranked list?: threshold optimization using truncated score distributions | | BIBAK | Full-Text | 524-531 | |
| Avi Arampatzis; Jaap Kamps; Stephen Robertson | |||
| Ranked retrieval has a particular disadvantage in comparison with
traditional Boolean retrieval: there is no clear cut-off point where to stop
consulting results. This is a serious problem in some setups. We investigate
and further develop methods to select the rank cut-off value which optimizes a
given effectiveness measure. Assuming no other input than a system's output for
a query -- document scores and their distribution -- the task is essentially a
score-distributional threshold optimization problem. The recent trend in
modeling score distributions is to use a normal-exponential mixture: normal for
relevant, and exponential for non-relevant document scores. We discuss the two
main theoretical problems with the current model, support incompatibility and
non-convexity, and develop new models that address them. The main contributions
of the paper are two truncated normal-exponential models, varying in the way
the out-truncated score ranges are handled. We conduct a range of experiments
using the TREC 2007 and 2008 Legal Track data, and show that the truncated
models lead to significantly better results. Keywords: distributed retrieval, effectiveness measure optimization, expectation
maximization, filtering, fusion, meta-search, probability of relevance, score
distribution, score normalization, threshold optimization, truncated
distribution | |||
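Once mixture parameters are fitted (e.g., by expectation maximization), threshold selection reduces to scanning ranks for the best expected utility. Our sketch below uses the plain, untruncated normal-exponential mixture and expected F1; the paper's contribution is precisely the truncated refinement of this model:

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def exponential_pdf(x, rate):
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def best_cutoff(scores, p_relevant, mu, sigma, rate):
    # scores: document scores in descending rank order.
    # p_relevant: mixture weight of the relevant (normal) component.
    def p_rel(s):
        rel = p_relevant * normal_pdf(s, mu, sigma)
        non = (1 - p_relevant) * exponential_pdf(s, rate)
        return rel / (rel + non) if rel + non > 0 else 0.0
    posteriors = [p_rel(s) for s in scores]
    total_rel = sum(posteriors)            # expected number of relevant docs
    best_k, best_f1, running = 0, 0.0, 0.0
    for k, p in enumerate(posteriors, start=1):
        running += p
        precision = running / k
        recall = running / total_rel if total_rel else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        if f1 > best_f1:
            best_k, best_f1 = k, f1
    return best_k
```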
| The wisdom of the few: a collaborative filtering approach based on expert opinions from the web | | BIBAK | Full-Text | 532-539 | |
| Xavier Amatriain; Neal Lathia; Josep M. Pujol; Haewoon Kwak; Nuria Oliver | |||
| Nearest-neighbor collaborative filtering provides a successful means of
generating recommendations for web users. However, this approach suffers from
several shortcomings, including data sparsity and noise, the cold-start
problem, and scalability. In this work, we present a novel method for
recommending items to users based on expert opinions. Our method is a variation
of traditional collaborative filtering: rather than applying a nearest neighbor
algorithm to the user-rating data, predictions are computed using a set of
expert neighbors from an independent dataset, whose opinions are weighted
according to their similarity to the user. This method promises to address some
of the weaknesses in traditional collaborative filtering, while maintaining
comparable accuracy. We validate our approach by predicting ratings on a subset
of the Netflix data set. We use ratings crawled from a web portal of expert reviews,
measuring results both in terms of prediction accuracy and recommendation list
precision. Finally, we explore the ability of our method to generate useful
recommendations, by reporting the results of a user-study where users prefer
the recommendations generated by our approach. Keywords: collaborative filtering, cosine similarity, experts, nearest neighbors,
recommender system, top-n recommendations | |||
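The prediction rule itself is a small variation on user-based collaborative filtering: neighbors come from the expert set rather than the user base. A sketch under our own simplifications (plain cosine similarity, no similarity threshold or shrinkage):

```python
import math

def cosine(r1, r2):
    # Similarity over co-rated items between two rating dicts.
    shared = set(r1) & set(r2)
    if not shared:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in shared)
    n1 = math.sqrt(sum(r1[i] ** 2 for i in shared))
    n2 = math.sqrt(sum(r2[i] ** 2 for i in shared))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def predict_rating(user_ratings, experts, item):
    # user_ratings: {item: rating}; experts: list of {item: rating} dicts.
    num = den = 0.0
    for expert in experts:
        if item not in expert:
            continue
        sim = cosine(user_ratings, expert)
        if sim > 0:
            num += sim * expert[item]
            den += sim
    if den == 0:
        # Fall back to the experts' mean rating for the item.
        vals = [e[item] for e in experts if item in e]
        return sum(vals) / len(vals) if vals else None
    return num / den
```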
| Personalized tag recommendation using graph-based ranking on multi-type interrelated objects | | BIBAK | Full-Text | 540-547 | |
| Ziyu Guan; Jiajun Bu; Qiaozhu Mei; Chun Chen; Can Wang | |||
| Social tagging is becoming increasingly popular in many Web 2.0 applications
where users can annotate resources (e.g. Web pages) with arbitrary keywords
(i.e. tags). A tag recommendation module can assist users in tagging process by
suggesting relevant tags to them. It can also be directly used to expand the
set of tags annotating a resource. The benefits are twofold: improving user
experience and enriching the index of resources. However, the former is not
emphasized in previous studies, even though much work has reported that
different users may describe the same concept in different ways. We address the
problem of personalized tag recommendation for text documents. In particular,
we model personalized tag recommendation as a "query and ranking" problem and
propose a novel graph-based ranking algorithm for interrelated multi-type
objects. When a user issues a tagging request, both the document and the user
are treated as a part of the query. Tags are then ranked by our graph-based
ranking algorithm which takes into consideration both relevance to the document
and preference of the user. Finally, the top ranked tags are presented to the
user as suggestions. Experiments on a large-scale tagging data set collected
from Del.icio.us have demonstrated that our proposed algorithm significantly
outperforms algorithms which fail to consider the diversity of different users'
interests. Keywords: personalization, ranking, recommender systems, social tagging | |||
| Leveraging sources of collective wisdom on the web for discovering technology synergies | | BIBAK | Full-Text | 548-555 | |
| Cai-Nicolas Ziegler; Stefan Jung | |||
| One of the central tasks of R&D strategy and portfolio management at large
technology companies and research institutions refers to the identification of
technological synergies throughout the organization. These efforts are geared
towards saving resources by consolidating scattered expertise, sharing best
practices, and reusing available technologies across multiple product lines. In
the past, this task has been done in a manual evaluation process by technical
domain experts. While feasible, the major drawback of this approach is the
enormous effort in terms of availability and time: For a structured and
complete analysis, every combination of any two technologies has to be rated
explicitly. We present a novel approach that recommends technological synergies
in an automated fashion, making use of abundant collective wisdom from the Web,
both in pure textual form as well as classification ontologies. Our method has
been deployed for practical support of the synergy evaluation process within
our company. We have also conducted empirical evaluations based on randomly
selected technology pairs so as to benchmark the accuracy of our approach, as
compared to a group of general computer science technologists as well as a
control group of domain experts. Keywords: collective wisdom, semantic similarity, text mining, web 2.0 | |||
| Query side evaluation: an empirical analysis of effectiveness and effort | | BIBAK | Full-Text | 556-563 | |
| Leif Azzopardi | |||
| Typically, Information Retrieval evaluation focuses on measuring the system's
ability to retrieve relevant information, and not the query's. However, the
effectiveness of a retrieval system is
strongly influenced by the quality of the query submitted. In this paper, the
effectiveness and effort of querying is empirically examined in the context of
the Principle of Least Effort, Zipf's Law and the Law of Diminishing Returns.
This query-focused investigation leads to a number of novel findings which
should prove useful in the development of future retrieval methods and
evaluation techniques, while also motivating further research into query-side
evaluation. Keywords: evaluation, simulation | |||
| Reducing long queries using query quality predictors | | BIBAK | Full-Text | 564-571 | |
| Giridhar Kumaran; Vitor R. Carvalho | |||
| Long queries frequently contain many extraneous terms that hinder retrieval
of relevant documents. We present techniques to reduce long queries to more
effective shorter ones that lack those extraneous terms. Our work is motivated
by the observation that perfectly reducing long TREC description queries can
lead to an average improvement of 30% in mean average precision. Our approach
involves transforming the reduction problem into a problem of learning to rank
all subsets of the original query (sub-queries) based on their predicted
quality, and selecting the top sub-query. We use various measures of query
quality described in the literature as features to represent sub-queries, and
train a classifier. Replacing the original long query with the top-ranked
sub-query chosen by the ranker results in a statistically significant average
improvement of 8% on our test sets. Analysis of the results shows that query
reduction is well-suited for moderately-performing long queries, and a small
set of query quality predictors are well-suited for the task of ranking
sub-queries. Keywords: long queries, query quality, query reduction, verbose queries | |||
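The reduction loop is easy to picture: enumerate sub-queries, score each with a query-quality predictor, and keep the winner. In our sketch the learned ranker is replaced by any callable scoring a term tuple, with mean IDF as a stand-in predictor:

```python
from itertools import combinations

def reduce_query(terms, quality, max_drop=3):
    # Keep the sub-query the quality predictor scores highest; dropping up
    # to max_drop terms keeps the enumeration tractable for long queries.
    best_sub, best_score = tuple(terms), quality(tuple(terms))
    for drop in range(1, max_drop + 1):
        size = len(terms) - drop
        if size < 1:
            break
        for sub in combinations(terms, size):
            s = quality(sub)
            if s > best_score:
                best_sub, best_score = sub, s
    return list(best_sub)

def mean_idf_quality(idf):
    # Toy predictor: prefer sub-queries whose terms are rare on average.
    return lambda sub: sum(idf.get(t, 0.0) for t in sub) / len(sub)
```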
| Extracting structured information from user queries with semi-supervised conditional random fields | | BIBAK | Full-Text | 572-579 | |
| Xiao Li; Ye-Yi Wang; Alex Acero | |||
| When searching against structured documents, it is beneficial to extract
information from user queries in a format that is consistent with the backend
data structure. As one step toward this goal, we study the problem of query
tagging which is to assign each query term to a pre-defined category. Our
problem could be approached by learning a conditional random field (CRF) model
(or other statistical models) in a supervised fashion, but this would require
substantial human-annotation effort. In this work, we focus on a
semi-supervised learning method for CRFs that utilizes two data sources: (1) a
small amount of manually-labeled queries, and (2) a large amount of queries in
which some word tokens have derived labels, i.e., label information
automatically obtained from additional resources. We present two principled
ways of encoding derived label information in a CRF model. Such information is
viewed as hard evidence in one setting and as soft evidence in the other. In
addition to the general methodology of how to use derived labels in
semi-supervised CRFs, we also present a practical method on how to obtain them
by leveraging user click data and an in-domain database that contains
structured documents. Evaluation on product search queries shows the
effectiveness of our approach in improving tagging accuracies. Keywords: conditional random fields, information extraction, metadata, semi-supervised
learning | |||
| The impact of crawl policy on web search effectiveness | | BIBAK | Full-Text | 580-587 | |
| Dennis Fetterly; Nick Craswell; Vishwa Vinay | |||
| Crawl selection policy has a direct influence on Web search effectiveness,
because a useful page that is not selected for crawling will also be absent
from search results. Yet there has been little or no work on measuring this
effect. We introduce an evaluation framework, based on relevance judgments
pooled from multiple search engines, measuring the maximum potential NDCG that
is achievable using a particular crawl. This allows us to evaluate different
crawl policies and investigate important scenarios like selection stability
over multiple iterations. We conduct two sets of crawling experiments at the
scale of 1 billion and 100 million pages, respectively. These show that crawl
selection based on PageRank, indegree and trans-domain indegree all allow
better retrieval effectiveness than a simple breadth-first crawl of the same
size. PageRank is the most reliable and effective method. Trans-domain indegree
can outperform PageRank, but over multiple crawl iterations it is less
effective and more unstable. Finally we experiment with combinations of crawl
selection methods and per-domain page limits, which yield crawls with greater
potential NDCG than PageRank. Keywords: corpus selection, crawl ordering, web crawling | |||
| Optimizing search engine revenue in sponsored search | | BIBAK | Full-Text | 588-595 | |
| Yunzhang Zhu; Gang Wang; Junli Yang; Dakan Wang; Jun Yan; Jian Hu; Zheng Chen | |||
| Displaying sponsored ads alongside the search results is a key monetization
strategy for search engine companies. Since users are more likely to click ads
that are relevant to their query, it is crucial for the search engine to
deliver the right ads for the query and to display them in the right order.
There are several works investigating how to learn a ranking function to
maximize the number of ad clicks. In this paper, we address a new revenue
optimization problem and aim to answer the question: how can we construct a ranking model that delivers high-quality ads to the user while also maximizing search engine revenue? We introduce two novel methods from different machine learning perspectives, both of which take the revenue component into careful consideration. The algorithms are built upon click-through log data with real ad clicks and impressions. Extensive experimental results verify that the proposed algorithms produce more revenue than other methods while avoiding any loss of relevance accuracy. To provide deep insight into the importance of each feature to search engine revenue, we extract twelve basic features from four categories. The experimental study provides a feature ranking list according to the revenue benefit of each feature. Keywords: machine learning, ranking, revenue optimization, sponsored search | |||
| Web-derived resources for web information retrieval: from conceptual hierarchies to attribute hierarchies | | BIBAK | Full-Text | 596-603 | |
| Marius Paşca; Enrique Alfonseca | |||
| A weakly-supervised extraction method identifies concepts within conceptual
hierarchies, at the appropriate level of specificity (e.g., Bank vs.
Institution), to which attributes (e.g., routing number) extracted from
unstructured text best apply. The extraction exploits labeled classes of
instances acquired from a combination of Web documents and query logs, and
inserted into existing conceptual hierarchies. The correct concept is
identified within the top three positions on average over gold-standard
attributes, which corresponds to higher accuracy than in alternative
experiments. Keywords: class attributes, conceptual hierarchies, knowledge acquisition, named
entities, unstructured text, web search | |||
| Spam filter evaluation with imprecise ground truth | | BIBAK | Full-Text | 604-611 | |
| Gordon V. Cormack; Aleksander Kolcz | |||
| When trained and evaluated on accurately labeled datasets, online email spam
filters are remarkably effective, achieving error rates an order of magnitude
better than classifiers in similar applications. But labels acquired from user
feedback or third-party adjudication exhibit higher error rates than the best
filters -- even filters trained using the same source of labels. It is
appropriate to use naturally occurring labels -- including errors -- as training
data in evaluating spam filters. Erroneous labels are problematic, however,
when used as ground truth to measure filter effectiveness. Any measurement of
the filter's error rate will be augmented and perhaps masked by the label error
rate. Using two natural sources of labels, we demonstrate automatic and
semi-automatic methods that reduce the influence of labeling errors on
evaluation, yielding substantially more precise measurements of true filter
error rates. Keywords: classification, email, filtering, noise, spam | |||
| Telling experts from spammers: expertise ranking in folksonomies | | BIBAK | Full-Text | 612-619 | |
| Michael G. Noll; Ching-man Au Yeung; Nicholas Gibbins; Christoph Meinel; Nigel Shadbolt | |||
| With a suitable algorithm for ranking the expertise of a user in a
collaborative tagging system, we will be able to identify experts and discover
useful and relevant resources through them. We propose that the level of
expertise of a user with respect to a particular topic is mainly determined by
two factors. Firstly, an expert should possess a high quality collection of
resources, while the quality of a Web resource depends on the expertise of the
users who have assigned tags to it. Secondly, an expert should be one who tends
to identify interesting or useful resources before other users do. We propose a
graph-based algorithm, SPEAR (SPamming-resistant Expertise Analysis and
Ranking), which implements these ideas for ranking users in a folksonomy. We
evaluate our method with experiments on data sets collected from Delicious.com
comprising over 71,000 Web documents, 0.5 million users and 2 million shared
bookmarks. We also show that the algorithm is more resistant to spammers than
other methods such as the original HITS algorithm and simple statistical
measures. Keywords: collaborative tagging, expertise, folksonomy, ranking, spam | |||
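The two ideas, mutual reinforcement between users and resources plus extra credit for early discovery, can be sketched as a HITS-like iteration. Our toy rendering below uses 1/sqrt(discovery order) as the discoverer-credit function, which is an assumption; SPEAR's actual credit function may differ:

```python
import math
from collections import defaultdict

def spear_like(bookmarks, iterations=50):
    # bookmarks: list of (user, doc, timestamp).
    by_doc = defaultdict(list)
    for user, doc, ts in bookmarks:
        by_doc[doc].append((ts, user))
    credit = {}                      # credit[doc][user]: earlier is bigger
    for doc, tags in by_doc.items():
        credit[doc] = {user: 1.0 / math.sqrt(order)
                       for order, (ts, user) in enumerate(sorted(tags), 1)}
    users = {u for u, d, t in bookmarks}
    expertise = {u: 1.0 for u in users}
    quality = {d: 1.0 for d in by_doc}
    for _ in range(iterations):
        expertise = {u: sum(credit[d].get(u, 0.0) * quality[d] for d in by_doc)
                     for u in users}
        quality = {d: sum(credit[d][u] * expertise[u] for u in credit[d])
                   for d in by_doc}
        for vec in (expertise, quality):   # L2-normalize each iteration
            norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
            for k in vec:
                vec[k] /= norm
    return sorted(expertise.items(), key=lambda kv: -kv[1])
```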
| Detecting spammers and content promoters in online video social networks | | BIBAK | Full-Text | 620-627 | |
| Fabrício Benevenuto; Tiago Rodrigues; Virgílio Almeida; Jussara Almeida; Marcos Gonçalves | |||
| A number of online video social networks, of which YouTube is the most
popular, provide features that allow users to post a video as a response to a
discussion topic. These features open opportunities for users to introduce
polluted content, or simply pollution, into the system. For instance, spammers
may post an unrelated video as response to a popular one aiming at increasing
the likelihood of the response being viewed by a larger number of users.
Moreover, opportunistic users -- promoters -- may try to gain visibility to a
specific video by posting a large number of (potentially unrelated) responses
to boost the rank of the responded video, making it appear in the top lists
maintained by the system. Content pollution may jeopardize the trust of users
on the system, thus compromising its success in promoting social interactions.
In spite of that, the available literature is very limited in providing a deep
understanding of this problem.
In this paper, we go a step further by addressing the issue of detecting video spammers and promoters. Towards that end, we manually build a test collection of real YouTube users, classifying them as spammers, promoters, and legitimate users. Using our test collection, we provide a characterization of social and content attributes that may help distinguish each user class. We also investigate the feasibility of using a state-of-the-art supervised classification algorithm to detect spammers and promoters, and assess its effectiveness in our test collection. We found that our approach is able to correctly identify the majority of the promoters, misclassifying only a small percentage of legitimate users. In contrast, although we are able to detect a significant fraction of spammers, they proved much harder to distinguish from legitimate users. Keywords: promoter, social media, social networks, spammer, video promotion, video
response, video spam | |||
| AdOn: an intelligent overlay video advertising system | | BIBAK | Full-Text | 628-629 | |
| Jinlian Guo; Tao Mei; Falin Liu; Xian-Sheng Hua | |||
| This paper presents a new video advertising system, called AdOn, which
supports intelligent overlay video ads. Unlike most current ad-networks, such
as YouTube, which overlay ads at fixed positions in the videos (e.g., on the
bottom fifth of the video, 15 seconds in), AdOn is able to automatically detect a
set of spatio-temporal nonintrusive positions and associate the contextually
relevant ads with these positions. The overlay positions are obtained on the
basis of video structuring, face and text detection, as well as visual saliency
analysis, so that the intrusiveness to the users can be minimized. The ads are
selected according to content-based multimodal relevance so that advertising
relevance can be maximized. AdOn represents one of the first attempts towards
intelligent overlay video advertising by leveraging video content analysis
techniques. Keywords: contextual relevance, saliency detection, video advertising | |||
| Agreement among statistical significance tests for information retrieval evaluation at varying sample sizes | | BIBAK | Full-Text | 630-631 | |
| Mark D. Smucker; James Allan; Ben Carterette | |||
| Research has shown that little practical difference exists between the
randomization, Student's paired t, and bootstrap tests of statistical
significance for TREC ad-hoc retrieval experiments with 50 topics. We compared
these three tests on runs with topic sizes down to 10 topics. We found that
these tests show increasing disagreement as the number of topics decreases. At
smaller numbers of topics, the randomization test tended to produce smaller
p-values than the t-test for p-values less than 0.1. The bootstrap exhibited a
systematic bias towards p-values strictly smaller than the t-test's, with this
bias increasing as the number of topics decreased. We recommend the use of the
randomization test although the t-test appears to be suitable even when the
number of topics is small. Keywords: statistical significance | |||
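For readers who want to reproduce such comparisons, the paired randomization test is only a few lines (our sketch, two-sided, with a fixed seed for repeatability):

```python
import random

def randomization_test(scores_a, scores_b, trials=100000, seed=0):
    # Under the null hypothesis each topic's pair of scores is
    # exchangeable: flip signs of per-topic differences at random and
    # count how often the mean difference is at least as extreme.
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(trials):
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total / len(diffs)) >= observed:
            hits += 1
    return hits / trials               # empirical two-sided p-value
```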
| Annotation of URLs: more than the sum of parts | | BIBAK | Full-Text | 632-633 | |
| Max Hinne; Wessel Kraaij; Stephan Raaijmakers; Suzan Verberne; Theo van der Weide; Maarten van der Heijden | |||
| Recently a number of studies have demonstrated that search engine logfiles
are an important resource to determine the relevance relation between URLs and
query terms. We hypothesized that the queries associated with a URL could also
be presented as useful URL metadata in a search engine result list, e.g. for
helping to determine the semantic category of a URL. We evaluated this
hypothesis by a classification experiment based on the DMOZ dataset. Our method
can also annotate URLs that have no associated queries. Keywords: URL annotation, click data | |||
| Automatic URL completion and prediction using fuzzy type-ahead search | | BIBAK | Full-Text | 634-635 | |
| Jiannan Wang; Guoliang Li; Jianhua Feng | |||
| Type-ahead search is a new information-access paradigm, in which systems can
find answers to keyword queries "on-the-fly" as a user types in a query. It
improves traditional autocomplete search by allowing query keywords to appear
at different places in an answer. In this paper we study the problem of
automatic URL completion and prediction using fuzzy type-ahead search. That is,
we interactively find relevant URLs that contain words matching query keywords,
even approximately, as the user types in a query. Supporting fuzzy search is
very important when the user has limited knowledge about URLs. We describe the
design and implementation of our method, and report the experimental results on
Firefox. Keywords: URL completion, fuzzy search, type-ahead search | |||
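As an illustration of the idea, the following sketch (purely hypothetical; a real system would use far more efficient index structures) matches each typed keyword, with tolerance for typos, against token prefixes anywhere in a URL:

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1,
                        prev + (a[i - 1] != b[j - 1]))
            prev = cur
    return dp[n]

def fuzzy_type_ahead(partial_query, urls, max_dist=1):
    """Return URLs in which every typed keyword approximately matches
    the prefix of some token anywhere in the URL (not just the start)."""
    keywords = partial_query.lower().split()
    hits = []
    for url in urls:
        tokens = [t for t in url.lower().replace("/", ".").split(".") if t]
        if all(any(edit_distance(k, t[:len(k)]) <= max_dist
                   for t in tokens) for k in keywords):
            hits.append(url)
    return hits

urls = ["www.sigir2009.org", "en.wikipedia.org/wiki/Signal"]
print(fuzzy_type_ahead("sigur", urls))  # tolerates the one-letter typo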
| Beyond session segmentation: predicting changes in search intent with client-side user interactions | | BIBAK | Full-Text | 636-637 | |
| Qi Guo; Eugene Agichtein | |||
| Effective search session segmentation (grouping queries according to common
task or intent) can be useful for improving relevance, search evaluation, and
query suggestion. Previous work has largely attempted to segment search
sessions off-line, after the fact. In contrast, we present preliminary
investigation of predicting, in real time, whether a user is about to switch
interest -- that is, whether the user is about to finish the current search and
switch to another search task (or stop searching altogether). We explore an
approach for this task using client-side user behavior such as clicks, scrolls,
and mouse movements, contextualized by the content of the search result pages
and previous searches. Our experiments over thousands of real searches show
that we can identify context and user behavior patterns that indicate that a
user is about to switch to a new search task. These preliminary results can be
helpful for more effective query suggestion and personalization. Keywords: search behavior modeling, search sessions, user intent inference | |||
| Blog distillation using random walks | | BIBAK | Full-Text | 638-639 | |
| Mostafa Keikha; Mark James Carman; Fabio Crestani | |||
| This paper addresses the blog distillation problem: given a user query, find
the blogs most related to the query topic. We model the blogosphere as a
single graph that includes extra information besides the content of the posts.
By performing a random walk on this graph, we extract the most relevant blogs
for each query. Our experiments on the TREC'07 data set show a 15% improvement in
MAP and 8% improvement in Precision@10 over the Language Modeling baseline. Keywords: blog retrieval, random walks | |||
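A generic random-walk-with-restart sketch conveys the flavor of this approach (the graph, restart vector, and parameters below are illustrative, not the paper's exact model):

```python
import numpy as np

def random_walk_scores(adj, restart, alpha=0.85, iters=100):
    """Random walk with restart: p = alpha * W p + (1 - alpha) * r,
    where W is the column-normalized adjacency matrix."""
    W = adj / adj.sum(axis=0, keepdims=True)
    p = restart.copy()
    for _ in range(iters):
        p = alpha * (W @ p) + (1 - alpha) * restart
    return p

# Tiny blog graph; restart mass placed on nodes whose posts match the query.
adj = np.array([[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]], dtype=float)
restart = np.array([0.7, 0.3, 0.0])  # hypothetical query-term affinity
print(random_walk_scores(adj, restart))
```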
| A case for improved evaluation of query difficulty prediction | | BIBAK | Full-Text | 640-641 | |
| Falk Scholer; Steven Garcia | |||
| Query difficulty prediction aims to identify, in advance, how well an
information retrieval system will perform when faced with a particular search
request. The current standard evaluation methodology involves calculating a
correlation coefficient, to indicate how strongly the predicted query
difficulty is related to an actual system performance measure, usually
Average Precision. We run a series of experiments based on predictors that have
been shown to perform well in the literature, comparing these across different
TREC runs. Our results demonstrate that the current evaluation methodology is
severely limited. Although it can be used to demonstrate the performance of a
predictor for a single system, such performance is not consistent over a
variety of retrieval systems. We conclude that published results in the query
difficulty area are generally not comparable, and recommend that prediction be
evaluated against a spectrum of underlying search systems. Keywords: evaluation, query difficulty prediction | |||
| Characterizing the subjectivity of topics | | BIBAK | Full-Text | 642-643 | |
| Marc-Allen Cartright; Elif Aktolga; Jeffrey Dalton | |||
| A document or web page in isolation may appear completely reasonable, but
may represent a biased perspective on the topic being discussed. Given the
topic of a document, we propose two new metrics, provocativeness and balance,
that suggest when the topic may be controversial. We explore the use of these
metrics to characterize the subjectivity of the topics in the TREC Blog Track. Keywords: opinion classification, sentiment analysis, subjectivity/polarity detection | |||
| Classifying library catalogue by author profiling | | BIBAK | Full-Text | 644-645 | |
| Tadashi Nomoto | |||
| This paper presents a novel approach to classifying library records by
making use of what we call "author profile," a representation of an author's
expertise along a library classification. Coupled with a string kernel
classifier, the idea is shown to bring a significant improvement over a
baseline. Keywords: SVM, author profiling, library classification, string kernel | |||
| Cluster-based query expansion | | BIBAK | Full-Text | 646-647 | |
| Inna Gelfer Kalmanovich; Oren Kurland | |||
| We demonstrate the merits of using document clusters that are created
offline to improve the overall effectiveness and performance robustness of a
state-of-the-art pseudo-feedback-based query expansion method -- the relevance
model. Keywords: clusters, query expansion, relevance model | |||
| Comparing both relevance and robustness in selection of web ranking functions | | BIBAK | Full-Text | 648-649 | |
| Fan Li; Xin Li; Shihao Ji; Zhaohui Zheng | |||
| In commercial search engines, a ranking function is selected for deployment
mainly by comparing the relevance measurements over candidates. In this paper
we propose selecting Web ranking functions according to both their relevance
and robustness to the changes that may lead to relevance degradation over time.
We argue that the ranking robustness can be effectively measured by taking into
account the ranking score distribution across Web pages. We then improve NDCG
with two new metrics and show their superiority in terms of stability to
ranking score turbulence and stability in function selection. Keywords: NDCG, ranking robustness, web ranking | |||
| A comparison of retrieval-based hierarchical clustering approaches to person name disambiguation | | BIBAK | Full-Text | 650-651 | |
| Christof Monz; Wouter Weerkamp | |||
| This paper describes a simple clustering approach to person name
disambiguation of retrieved documents. The methods are based on standard IR
concepts and do not require any task-specific features. We compare different
term-weighting and indexing methods and evaluate their performance against the
Web People Search task (WePS). Despite their simplicity these approaches
achieve very competitive performance. Keywords: clustering, person name disambiguation | |||
| Compression-based document length prior for language models | | BIBAK | Full-Text | 652-653 | |
| Javier Parapar; David E. Losada; Álvaro Barreiro | |||
| The inclusion of document length factors has been a major topic in the
development of retrieval models. We believe that current models can be further
improved by more refined estimations of the document's scope. In this poster we
present a new document length prior that uses the size of the compressed
document. This new prior is introduced in the context of Language Modeling with
Dirichlet smoothing. The evaluation performed on several collections shows
significant improvements in effectiveness. Keywords: compression, document length, document priors, language models | |||
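A rough sketch of how such a prior might be combined with a Dirichlet-smoothed query likelihood (the exact form of the prior below is hypothetical, chosen only for illustration):

```python
import math
import zlib
from collections import Counter

docs = {
    "d1": "the quick brown fox jumps over the lazy dog",
    "d2": "the the the the the quick quick quick quick fox",
}
mu = 10.0  # Dirichlet smoothing parameter

# Collection language model statistics.
coll = Counter()
for text in docs.values():
    coll.update(text.split())
coll_len = sum(coll.values())

def score(query, doc_id):
    """Dirichlet query likelihood plus a compression-based length prior
    (hypothetical form: log of the zlib-compressed size in bytes)."""
    words = docs[doc_id].split()
    tf, dlen = Counter(words), len(words)
    loglik = sum(
        math.log((tf[q] + mu * coll[q] / coll_len) / (dlen + mu))
        for q in query.split()
    )
    prior = math.log(len(zlib.compress(docs[doc_id].encode())))
    return loglik + prior

for d in docs:
    print(d, round(score("quick fox", d), 3))
```

The repetitive document compresses better, so its prior is smaller, which is the intuition behind using compressed size as a proxy for document scope.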
| Concept representation based video indexing | | BIBAK | Full-Text | 654-655 | |
| Meng Wang; Yan Song; Xian-Sheng Hua | |||
| This poster introduces a novel concept-based video indexing approach. It is
developed on top of a rich set of base concepts for which models are already
available. For a given target concept with several labeled samples, we combine
the base concepts to fit it, thereby obtaining its model. Empirical results
demonstrate that this method achieves strong performance even with very
limited labeled data. We have compared different representation approaches,
including both sparse and non-sparse methods, and conclude that the sparse
method leads to much better performance. Keywords: concept detection, sparse representation, video indexing | |||
| Context transfer in search advertising | | BIBAK | Full-Text | 656-657 | |
| Hila Becker; Andrei Broder; Evgeniy Gabrilovich; Vanja Josifovski; Bo Pang | |||
| We define and study the process of context transfer in search advertising,
which is the transition of a user from the context of Web search to the context
of the landing page that follows an ad-click. We conclude that in the vast
majority of cases, the user is shown one of three types of pages, which can be
accurately distinguished using automatic text classification. Keywords: landing page taxonomy, online advertising | |||
| Counting ancestors to estimate authority | | BIBAK | Full-Text | 658-659 | |
| Jian Wang; Brian D. Davison | |||
| The AncestorRank algorithm calculates an authority score by using just one
characteristic of the web graph: the number of ancestors per node. For
scalability, we estimate the number of ancestors by using a probabilistic
counting algorithm. We also consider the case in which ancestors closer to the
node have more influence than those farther away. Thus
we further apply a decay factor delta on the contributions from successively
earlier ancestors. The resulting authority score is used in combination with a
content-based ranking algorithm. Our experiments show that as long as delta is
in the range of [0.1, 0.9], AncestorRank can greatly improve BM25 performance,
and in our experiments is often better than PageRank. Keywords: PageRank, link analysis, probabilistic counting | |||
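The probabilistic counting step is in the spirit of Flajolet-Martin sketches; a simplified single-hash variant (high variance, so real systems average many sketches) might look like:

```python
import hashlib

def trailing_zeros(h):
    """Number of trailing zero bits in a positive integer."""
    return ((h & -h).bit_length() - 1) if h else 64

def fm_estimate(items):
    """Single-hash Flajolet-Martin-style estimate of the number of
    distinct items, for illustration only."""
    r = 0
    for item in items:
        h = int(hashlib.md5(str(item).encode()).hexdigest()[:16], 16)
        r = max(r, trailing_zeros(h))
    return (2 ** r) / 0.77351  # correction constant from the FM analysis

# Rough estimate of the distinct ancestor count of a node.
ancestors = [f"page{i}" for i in range(1000)]
print(int(fm_estimate(ancestors)))
```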
| Cross language name matching | | BIBAK | Full-Text | 660-661 | |
| J. Scott McCarley | |||
| Cross language information retrieval methods are used to determine which
segments of Arabic language documents match name-based English queries. We
investigate and contrast a word-based translation model with a character-based
transliteration model in order to handle spelling variation and previously
unseen names. We measure performance by making a novel use of the training data
from the 2007 ACE Entity Translation task. Keywords: algorithms, named entities, sentence retrieval | |||
| Deep versus shallow judgments in learning to rank | | BIBAK | Full-Text | 662-663 | |
| Emine Yilmaz; Stephen Robertson | |||
| Much research in learning to rank has focused on developing sophisticated
learning methods, treating the training set as a given. However, the number of
judgments in the training set directly affects the quality of the learned
system. Given the expense of obtaining relevance judgments for constructing
training data, one often has a limited budget for how many judgments one can
obtain. The major problem then is how to distribute this judgment effort
across different queries. In this paper, we investigate the tradeoff between
the number of queries and the number of judgments per query when training sets
are constructed. In particular, we show that up to a limit, training sets with
more queries but shallower (fewer) judgments per query are more cost-effective
than training sets with fewer queries but deeper (more) judgments per query. Keywords: learning to rank, relevance judgments, training | |||
| Developing energy efficient filtering systems | | BIBAK | Full-Text | 664-665 | |
| Leif Azzopardi; Wim Vanderbauwhede; Mahmoud Moadeli | |||
| Processing large volumes of information generally requires massive amounts
of computational power, which consumes a significant amount of energy. An
emerging challenge is the development of "environmentally friendly" systems
that are not only efficient in terms of time, but also energy efficient. In
this poster, we outline our initial efforts at developing greener filtering
systems by employing Field Programmable Gate Arrays (FPGA) to perform the core
information processing task. FPGAs enable code to be executed in parallel at a
chip level, while consuming only a fraction of the power of a standard (von
Neumann style) processor. On a number of test collections, we demonstrate that
the FPGA filtering system performs 10-20 times faster than the Itanium-based
implementation, resulting in considerable energy savings. Keywords: FPGA, efficiency, filtering | |||
| Enhancing topical ranking with preferences from click-through data | | BIBAK | Full-Text | 666-667 | |
| Yi Chang; Anlei Dong; Ciya Liao; Zhaohui Zheng | |||
| To overcome the problem of insufficient training data for dedicated models
in topical ranking, this paper proposes to utilize click-through data to
improve learning. The efficacy of click-through data is explored under the
framework of preference learning. Empirical experiments on a commercial search
engine show that the model trained with the dedicated labeled data combined
with skip-next preferences beats the baseline model and the generic model in
NDCG@5 by 4.9% and 2.4%, respectively. Keywords: click-through data, preference learning, topical ranking | |||
| Equivalence between nonnegative tensor factorization and tensorial probabilistic latent semantic analysis | | BIBAK | Full-Text | 668-669 | |
| Wei Peng | |||
| This paper establishes a connection between NMF and PLSA on multi-way data,
called NTF and T-PLSA respectively. Two types of T-PLSA models are proven to be
equivalent to non-negative PARAFAC and non-negative Tucker3. This paper also
shows that by running NTF and T-PLSA alternatively, they can jump out of each
other's local minima and achieve a better clustering solution. Keywords: NTF, PLSA, multi-way data, tensor | |||
| The ESA retrieval model revisited | | BIBAK | Full-Text | 670-671 | |
| Maik Anderka; Benno Stein | |||
| Among the retrieval models that have been proposed in the last years, the
ESA model of Gabrilovich and Markovitch received much attention. The authors
report on a significant improvement in the retrieval performance, which is
explained with the semantic concepts introduced by the document collection
underlying ESA. Their explanation appears plausible but our analysis shows that
the connections are more involved and that the "concept hypothesis" does not
hold. In our contribution we analyze several properties that in fact affect the
retrieval performance. Moreover, we introduce a formalization of ESA, which
reveals its close connection to existing retrieval models. Keywords: explicit semantic analysis, retrieval models | |||
| Estimating query performance using class predictions | | BIBAK | Full-Text | 672-673 | |
| Kevyn Collins-Thompson; Paul N. Bennett | |||
| We investigate using topic prediction data, as a summary of document
content, to compute measures of search result quality. Unlike existing quality
measures such as query clarity that require the entire content of the
top-ranked results, class-based statistics can be computed efficiently online,
because class information is compact enough to precompute and store in the
index. In an empirical study we compare the performance of class-based
statistics to their language-model counterparts for predicting two measures:
query difficulty and expansion risk. Our findings suggest that using class
predictions can offer comparable performance to full language models while
reducing computation overhead. Keywords: query difficulty, text classification | |||
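For contrast with the class-based statistics, the query clarity baseline mentioned above can be sketched as the KL divergence between the language model of the top-ranked documents and the collection model (a simplified form; it assumes the top documents are drawn from the collection, so every term has collection mass):

```python
import math
from collections import Counter

def clarity(top_docs, collection_docs):
    """Query clarity: KL divergence between the top-ranked documents'
    language model and the collection language model."""
    top = Counter(w for d in top_docs for w in d.split())
    coll = Counter(w for d in collection_docs for w in d.split())
    n_top, n_coll = sum(top.values()), sum(coll.values())
    return sum((tf / n_top) * math.log((tf / n_top) / (coll[w] / n_coll))
               for w, tf in top.items())

coll_docs = ["jaguar car dealer price",
             "jaguar habitat rainforest cat",
             "stock market price index"]
print(round(clarity(coll_docs[:1], coll_docs), 3))
```

Note that this baseline needs the full text of the top results at query time, which is exactly the overhead the class-based statistics avoid.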
| Evaluating effects of machine translation accuracy on cross-lingual patent retrieval | | BIBAK | Full-Text | 674-675 | |
| Atsushi Fujii; Masao Utiyama; Mikio Yamamoto; Takehito Utsuro | |||
| We organized a machine translation (MT) task at the Seventh NTCIR Workshop.
Participating groups were requested to machine translate sentences in patent
documents and also search topics for retrieving patent documents across
languages. We analyzed the relationship between the accuracy of MT and its
effects on the retrieval accuracy. Keywords: NTCIR, cross-lingual information retrieval, evaluation measures, machine
translation | |||
| Evaluating web search using task completion time | | BIBAK | Full-Text | 676-677 | |
| Ya Xu; David Mease | |||
| We consider experiments to measure the quality of a web search algorithm
based on how much total time users take to complete assigned search tasks using
that algorithm. We first analyze our data to verify that there is in fact a
negative relationship between a user's total search time and a user's
satisfaction for the types of tasks under consideration. Secondly, we fit a
model with the user's total search time as the response to compare two
different search algorithms. Finally, we propose an alternative experimental
design which we demonstrate to be a substantial improvement over our current
design in terms of variance reduction and efficiency. Keywords: evaluation metrics, experiment design, interactive ir and visualization,
question answering | |||
| An evaluation of entity and frequency based query completion methods | | BIBAK | Full-Text | 678-679 | |
| Edgar Meij; Peter Mika; Hugo Zaragoza | |||
| We present a semantic approach to suggesting query completions which
leverages entity and type information. When compared to a frequency-based
approach, we show that such information mostly helps rare queries. Keywords: query completion, query log analysis, web search | |||
| Evolutionary document summarization for disaster management | | BIBAK | Full-Text | 680-681 | |
| Dingding Wang; Li Zheng; Tao Li; Yi Deng | |||
| In this poster, we develop an evolutionary document summarization system for
discovering the changes and differences in each phase of a disaster evolution.
Given a collection of document streams describing an event, our system
generates a short summary delivering the main development theme of the event by
extracting the most representative and discriminative sentences at each phase.
Experimental results on the collection of press releases for Hurricane Wilma in
2005 demonstrate the efficacy of our proposal. Keywords: evolutionary summarization | |||
| Experiments in CLIR using fuzzy string search based on surface similarity | | BIBAK | Full-Text | 682-683 | |
| Sethuramalingam Subramaniam; Anil Kumar Singh; Pradeep Dasigi; Vasudeva Varma | |||
| Cross Language Information Retrieval (CLIR) between languages of the same
origin is an interesting topic of research. The similarity of the writing
systems used for these languages can be used effectively to not only improve
CLIR, but to overcome the problems of textual variations, textual errors, and
even the lack of linguistic resources like stemmers to an extent. We have
conducted CLIR experiments between three languages which use writing systems
(scripts) of Brahmi-origin, namely Hindi, Bengali and Marathi. We found
significant improvements for all six language pairs using a method for
fuzzy text search based on Surface Similarity. In this paper we report these
results and compare them with a baseline CLIR system and a CLIR system that
uses Scaled Edit Distance (SED) for fuzzy string matching. Keywords: fuzzy text search, spelling variations, surface similarity | |||
| Feature selection for automatic taxonomy induction | | BIBAK | Full-Text | 684-685 | |
| Hui Yang; Jamie Callan | |||
| Most existing automatic taxonomy induction systems exploit one or more
features to induce a taxonomy; nevertheless there is no systematic study
examining which are the best features for the task under various conditions.
This paper studies the impact of using different features on taxonomy induction
for different types of relations and for terms at different abstraction levels.
The evaluation shows that different conditions call for different techniques
or combinations of techniques. In particular, co-occurrence and
lexico-syntactic patterns are good features for is-a, sibling and part-of
relations; contextual, co-occurrence, patterns, and syntactic features work
well for concrete terms; co-occurrence works well for abstract terms. Keywords: ontology learning, semantic features, taxonomy | |||
| Finding advertising keywords on video scripts | | BIBAK | Full-Text | 686-687 | |
| Jung-Tae Lee; Hyungdong Lee; Hee-Seon Park; Young-In Song; Hae-Chang Rim | |||
| A key to success in contextual in-video advertising is effectively finding
advertising keywords in video content, but there has been little work in the
area so far. This paper presents some preliminary results of our
learning-based system that finds relevant advertising keywords for particular
scenes of video content using their scripts. The system is trained not only
with features proven useful in earlier studies but also with novel features
that reflect the situation of a targeted scene. Experimental results show that the new
features are potentially helpful for enhancing the accuracy of keyword
extraction for contextual in-video advertising. Keywords: contextual in-video advertising, keyword extraction | |||
| Fitting score distribution for blog opinion retrieval | | BIBAK | Full-Text | 688-689 | |
| Ben He; Jie Peng; Iadh Ounis | |||
| Current blog opinion retrieval approaches cannot be applied if the topic
relevance and opinion score distributions by rank are dissimilar. This problem
severely limits the feasibility of these approaches. We propose to tackle this
problem by fitting the distribution of opinion scores, which replaces the
original topic relevance score distribution with the simulated one. Our
proposed score distribution fitting method markedly enhances the feasibility of
a state-of-the-art dictionary-based opinion retrieval approach. Evaluation on a
standard TREC blog test collection shows significant improvements over high
quality topic relevance baselines. Keywords: blog search, distribution fitting, opinion finding | |||
| A graph-based approach to mining multilingual word associations from wikipedia | | BIBAK | Full-Text | 690-691 | |
| Zheng Ye; Xiangji Huang; Hongfei Lin | |||
| In this paper, we propose a graph-based approach to constructing a
multilingual association dictionary from Wikipedia, in which we exploit two
kinds of links in Wikipedia articles to associate multilingual words and
concepts together in a graph. The mined association dictionary is applied in
cross language information retrieval (CLIR) to verify its quality. We evaluate
our approach on four CLIR data sets and the experimental results show that it
is possible to mine a good multilingual association dictionary from Wikipedia
articles. Keywords: CLIR, association dictionary, wikipedia | |||
| Has adhoc retrieval improved since 1994? | | BIBA | Full-Text | 692-693 | |
| Timothy G. Armstrong; Alistair Moffat; William Webber; Justin Zobel | |||
| Evaluation forums such as TREC allow systematic measurement and comparison of information retrieval techniques. The goal is consistent improvement, based on reliable comparison of the effectiveness of different approaches and systems. In this paper we report experiments to determine whether this goal has been achieved. We ran five publicly available search systems, in a total of seventeen different configurations, against nine TREC adhoc-style collections, spanning 1994 to 2005. These runsets were then used as a benchmark for reassessing the relative effectiveness of the original TREC runs for those collections. Surprisingly, there appears to have been no overall improvement in effectiveness for either median or top-end TREC submissions, even after allowing for several possible confounds. We therefore question whether the effectiveness of adhoc information retrieval has improved over the past decade and a half. | |||
| High precision retrieval using relevance-flow graph | | BIBAK | Full-Text | 694-695 | |
| Jangwon Seo; Jiwoon Jeon | |||
| Traditional bag-of-words information retrieval models use aggregated term
statistics to measure the relevance of documents, making it difficult to detect
non-relevant documents that contain many query terms by chance or in the wrong
context. In-depth document analysis is needed to filter out these deceptive
documents. In this paper, we hypothesize that truly relevant documents have
relevant sentences in predictable patterns. Our experimental results show that
we can successfully identify and exploit these patterns to significantly
improve retrieval precision at top ranks. Keywords: re-ranking, relevance flow, relevant sentence | |||
| Identifying the original contribution of a document via language modeling | | BIBAK | Full-Text | 696-697 | |
| Benyah Shaparenko; Thorsten Joachims | |||
| One goal of text mining is to provide readers with automatic methods for
quickly finding the key ideas in individual documents and whole corpora. To
this effect, we propose a statistically well-founded method for identifying the
original ideas that a document contributes to a corpus, focusing on
self-referential diachronic corpora such as research publications, blogs,
email, and news articles. Our statistical model of passage impact defines
(interesting) original content through a combination of impact and novelty, and
it can be used to identify the most original passages in a document. Unlike
heuristic approaches, this statistical model is extensible and open to
analysis. We evaluate the approach on both synthetic and real data, showing
that the passage impact model outperforms a heuristic baseline method. Keywords: language modeling, original contributions, topic modeling | |||
| The importance of manual assessment in link discovery | | BIBAK | Full-Text | 698-699 | |
| Wei Che Huang; Andrew Trotman; Shlomo Geva | |||
| Using a ground truth extracted from the Wikipedia, and a ground truth
created through manual assessment, we show that the apparent performance
advantage seen in machine learning approaches to link discovery is an artifact
of trivial links that are actively rejected by manual assessors. Keywords: INEX, assessment, evaluation, link discovery, wikipedia | |||
| Improving search relevance for implicitly temporal queries | | BIBK | Full-Text | 700-701 | |
| Donald Metzler; Rosie Jones; Fuchun Peng; Ruiqiang Zhang | |||
Keywords: query log analysis, temporal queries, web search | |||
| Improving user confidence in cultural heritage aggregated results | | BIBAK | Full-Text | 702-703 | |
| Junte Zhang; Alia Amin; Henriette S. M. Cramer; Vanessa Evers; Lynda Hardman | |||
| State of the art web search systems enable aggregation of information from
many sources. Users are challenged to assess the reliability of information
from different sources. We report on an empirical user study on the effect of
displaying credibility ratings of multiple cultural heritage sources (e.g.
museum websites, art blogs) on users' search performance and selection. The
results of our online interactive study (N=122) show that when explicitly
presenting these ratings, people become significantly more confident in their
selection of information from aggregated results. Keywords: aggregated search, credibility, cultural heritage, quantitative user study | |||
| Incorporating prior knowledge into a transductive ranking algorithm for multi-document summarization | | BIBAK | Full-Text | 704-705 | |
| Massih R. Amini; Nicolas Usunier | |||
| This paper presents a transductive approach to learn ranking functions for
extractive multi-document summarization. At the first stage, the proposed
approach identifies topic themes within a document collection, which help to
identify two sets of sentences that are relevant and irrelevant to a question. It then
iteratively trains a ranking function over these two sets of sentences by
optimizing a ranking loss and fitting a prior model built on keywords. The
output of the function is used to find further relevant and irrelevant
sentences. This process is repeated until a desired stopping criterion is met. Keywords: learning to rank, multi-document summarization | |||
| Integrating clusters created offline with query-specific clusters for document retrieval | | BIBAK | Full-Text | 706-707 | |
| Lior Meister; Oren Kurland; Inna Gelfer Kalmanovich | |||
| Previous work on cluster-based document retrieval has used either static
document clusters that are created offline, or query-specific (dynamic)
document clusters that are created from top-retrieved documents. We present the
potential merit of integrating these two types of clusters. Keywords: cluster-based retrieval, re-ranking | |||
| Integrating phrase inseparability in phrase-based model | | BIBAK | Full-Text | 708-709 | |
| Lixin Shi; Jian-Yun Nie | |||
| In this paper, we propose a new phrase-based IR model, which integrates a
measure of "inseparability" of phrases. Our experiments show its high potential
to produce large improvements in retrieval effectiveness. Keywords: inseparability, language model, phrase model | |||
| Is spam an issue for opinionated blog post search? | | BIBAK | Full-Text | 710-711 | |
| Craig Macdonald; Iadh Ounis; Ian Soboroff | |||
| In opinion-finding, the retrieval system is tasked with retrieving not just
relevant documents, but those that also express an opinion towards the query
target entity. This task has been studied in the context of the blogosphere by
groups participating in the 2006-2008 TREC Blog tracks. Spam blogs (splogs) are
thought to be a problem on the blogosphere. In this paper, we investigate the
extent to which spam has affected the participating groups' retrieval systems
over the three years of the TREC Blog track opinion-finding task. Our results
show that spam can be an issue, with most systems retrieving some spam for
every topic. However, removing spam from the rankings does not markedly change
the relative performance of opinion-finding approaches. Keywords: blogs, opinion-finding, spam, splogs | |||
| Is this urgent?: exploring time-sensitive information needs in collaborative question answering | | BIBAK | Full-Text | 712-713 | |
| Yandong Liu; Nitya Narasimhan; Venu Vasudevan; Eugene Agichtein | |||
| As online Collaborative Question Answering (CQA) services such as
Yahoo!Answers and Baidu Knows are attracting users, questions, and answers at
an explosive rate, the truly urgent and important questions are increasingly
getting lost in the crowd. That is, questions that require immediate responses
are pushed out of the way by the trivial but more recently arriving questions.
Unlike other questions in collaborative question answering (CQA) for which
users might be willing to wait until good answers appear, urgent questions are
likely to be of interest to the asker only if answered in the next few minutes
or hours. For such questions, late responses are either not useful or are
simply not applicable. Unfortunately, current collaborative question-answering
systems do not distinguish urgent questions from the rest, and could thus be
ineffective for urgent information needs. We explore text- and data-mining
methods for automatically identifying urgent questions in the CQA setting. Our
results indicate that modeling the question context (i.e., the particular
forum/category where the question was posted) can increase classification
accuracy compared to the text of the question alone. Keywords: community question answering, social media | |||
| It pays to be picky: an evaluation of thread retrieval in online forums | | BIBAK | Full-Text | 714-715 | |
| Jonathan L. Elsas; Jaime G. Carbonell | |||
| Online forums host a rich information exchange, often with contributions
from many subject matter experts. In this work we evaluate algorithms for
thread retrieval in a large and active online forum community. We compare
methods that utilize thread structure to a naïve method that treats a
thread as a single document. We find that thread structure helps; moreover,
selective methods of thread scoring, which use evidence from only a small
number of messages in the thread, significantly and consistently outperform
inclusive methods that use all the messages in the thread. Keywords: message boards, online forums, thread search | |||
| Knowledge transformation for cross-domain sentiment classification | | BIBAK | Full-Text | 716-717 | |
| Tao Li; Vikas Sindhwani; Chris Ding; Yi Zhang | |||
| With the explosion of user-generated web2.0 content in the form of blogs,
wikis and discussion forums, the Internet has rapidly become a massive dynamic
repository of public opinion on an unbounded range of topics. A key enabler of
opinion extraction and summarization is sentiment classification: the task of
automatically identifying whether a given piece of text expresses positive or
negative opinion towards a topic of interest. Building high-quality sentiment
classifiers using standard text categorization methods is challenging due to
the lack of labeled data in a target domain. In this paper, we consider the
problem of cross-domain sentiment analysis: can one, for instance, download
rated movie reviews from rottentomatoes.com or IMDb discussion forums, learn
linguistic expressions and sentiment-laden terms that generally characterize
opinionated reviews and then successfully transfer this knowledge to the target
domain, thereby building high-quality sentiment models without manual effort?
We outline a novel sentiment transfer mechanism based on constrained
non-negative matrix tri-factorizations of term-document matrices in the source
and target domains. We report some preliminary results with this approach. Keywords: non-negative matrix factorization, sentiment analysis, transfer learning | |||
| K-tree: large scale document clustering | | BIBAK | Full-Text | 718-719 | |
| Christopher M. De Vries; Shlomo Geva | |||
| We introduce K-tree in an information retrieval context. It is an efficient
approximation of the k-means clustering algorithm. Unlike k-means it forms a
hierarchy of clusters. It has been extended to address issues with sparse
representations. We compare performance and quality to CLUTO using document
collections. The K-tree has a low time complexity that is suitable for large
document collections. This tree structure allows for efficient disk based
implementations where space requirements exceed that of main memory. Keywords: B-tree, K-means, K-tree, document clustering, search tree | |||
| A latent topic model for linked documents | | BIBAK | Full-Text | 720-721 | |
| Zhen Guo; Shenghuo Zhu; Yun Chi; Zhongfei Zhang; Yihong Gong | |||
| Documents in many corpora, such as digital libraries and webpages, contain
both content and link information. To explicitly consider the document
relations represented by links, in this paper we propose a citation-topic (CT)
model which assumes a probabilistic generative process for corpora. In the CT
model a given document is modeled as a mixture of a set of topic distributions,
each of which is borrowed (cited) from a document that is related to the given
document. Moreover, the CT model contains a random process for selecting the
related documents according to the structure of the generative model determined
by links and therefore, the transitivity of the relations among documents is
captured. We apply the CT model on the document clustering task and the
experimental comparisons against several state-of-the-art approaches
demonstrate very promising performance. Keywords: document clustering, topic model | |||
| Measuring constraint violations in information retrieval | | BIBAK | Full-Text | 722-723 | |
| Ronan Cummins; Colm O'Riordan | |||
| Recently, an inductive approach to modelling term-weighting function
correctness has provided a number of axioms (constraints), to which all good
term-weighting functions should adhere. These constraints have been shown to be
theoretically and empirically sound in a number of works. It has been shown
that when a term-weighting function breaks one or more of the constraints, it
typically indicates sub-optimality of that function. This elegant inductive
approach may more accurately model the human process of determining the
relevance of a document. It is intuitive that a person's notion of relevance
changes as terms that are either on or off-topic are encountered in a given
document. Ultimately, it would be desirable to be able to mathematically
determine the performance of term-weighting functions without the need for test
collections.
Many modern term-weighting functions do not satisfy the constraints in an unconditional manner. However, the degree to which these functions violate the constraints has not been investigated. A comparison between weighting functions from this perspective may shed light on the poor performance of certain functions in certain settings. Moreover, if a correlation exists between performance and the number of violations, measuring the degree of violation could help more accurately predict how a certain scheme will perform on a given collection. Keywords: axioms, constraints, information retrieval | |||
| Measuring the descriptiveness of web comments | | BIBAK | Full-Text | 724-725 | |
| Martin Potthast | |||
| This paper investigates whether Web comments are of descriptive nature, that
is, whether the combined text of a set of comments is similar in topic to the
commented object. If so, comments may be used in place of the respective object
in all kinds of cross-media retrieval tasks. Our experiments reveal that
comments on textual objects are indeed descriptive: 10 comments suffice to
expect a high similarity between the comments and the commented text; 100-500
comments suffice to replace the commented text in a ranking task, and to
measure the contribution of the commenters beyond the commented text. Keywords: comment descriptiveness, cross-media retrieval | |||
| Mining product reviews based on shallow dependency parsing | | BIBAK | Full-Text | 726-727 | |
| Qi Zhang; Yuanbin Wu; Tao Li; Mitsunori Ogihara; Joseph Johnson; Xuanjing Huang | |||
| This paper presents a novel method for mining product reviews, where it
mines reviews by identifying product features, expressions of opinions and
relations between them. By taking advantage of the fact that most product
features are phrases, a concept of shallow dependency parsing is introduced,
which extends traditional dependency parsing to the phrase level. This concept
is then implemented for extracting relations between product features and
expressions of opinions. Experimental evaluations show that the mining task can
benefit from shallow dependency parsing. Keywords: dependency parsing, product mining | |||
| Modeling facial expressions and peripheral physiological signals to predict topical relevance | | BIBAK | Full-Text | 728-729 | |
| Ioannis Arapakis; Ioannis Konstas; Joemon M. Jose; Ioannis Kompatsiaris | |||
| By analyzing explicit and implicit feedback, information retrieval systems can
determine topical relevance and tailor search criteria to the user's needs. In
this paper we investigate whether it is possible to infer what is relevant by
observing user affective behaviour. The sensory data employed range between
facial expressions and peripheral physiological signals. We extract a set of
features from the signals and analyze the data using classification methods,
such as SVM and KNN. The results of our initial evaluation indicate that
prediction of relevance is possible, to a certain extent, and implicit feedback
models can benefit from taking into account user affective behavior. Keywords: affective feedback, classification, facial expression analysis, multimedia
retrieval, pattern recognition, physiological signal processing, support vector
machines | |||
| Modeling search response time | | BIBAK | Full-Text | 730-731 | |
| Dan Zhang; Luo Si | |||
| Modeling the response time of search engines is an important task for many
applications such as resource selection in federated text search. Limited
research has been conducted to address this task. Prior research calculated the
search response time of all queries in the same way either with the average
response time of several sample queries or with a single probability
distribution, which is irrelevant to the characteristics of queries. However,
the search response time may vary a lot for different types of queries. This
paper proposes a novel query-specific and source-specific approach to model
search response time. Some training data is acquired by measuring the search
response time of some sample queries from a search engine. Then, a
query-specific model is estimated with the training data and their
corresponding response times by utilizing Ridge Regression. The obtained model
can be used to predict search response times for new queries. A set of
empirical studies are conducted to show the effectiveness of the proposed
method. Keywords: response time, ridge regression, source selection | |||
| Multiclass VisualRank: image ranking method in clustered subsets based on visual features | | BIBAK | Full-Text | 732-733 | |
| Mitsuru Ambai; Yuichi Yoshida | |||
| This paper proposes Multiclass VisualRank, a method that expands the idea of
VisualRank into more than one category of images. Multiclass VisualRank divides
images retrieved from search engines into several categories based on
distinctive patterns of visual features, and gives ranking within the category.
Experimental results show that our method can extract several different image
categories relevant to a given keyword and assigns good ranking scores to the
retrieved images. Keywords: clustering, ranking, visual feature | |||
| Multiple approaches to analysing query diversity | | BIBAK | Full-Text | 734-735 | |
| Paul Clough; Mark Sanderson; Murad Abouammoh; Sergio Navarro; Monica Paramita | |||
| In this paper we examine user queries with respect to diversity: providing a
mix of results across different interpretations. Using two query log analysis
techniques (click entropy and reformulated queries), 14.9 million queries from
the Microsoft Live Search log were analysed. We found that a broad range of
query types may benefit from diversification. Additionally, although there is a
correlation between word ambiguity and the need for diversity, the range of
results users may wish to see for an ambiguous query stretches well beyond
traditional notions of word sense. Keywords: ambiguity, diversity | |||
| Multiview clustering: a late fusion approach using latent models | | BIBAK | Full-Text | 736-737 | |
| Eric Bruno; Stephane Marchand-Maillet | |||
| Multi-view clustering is an important problem in information retrieval due
to the abundance of data offering many perspectives and generating multi-view
representations. We investigate in this short note a late fusion approach for
multi-view clustering based on the latent modeling of cluster-cluster
relationships. We derive a probabilistic multi-view clustering model
outperforming an early-fusion approach based on multi-view feature correlation
analysis. Keywords: data fusion, latent models, multi-view clustering | |||
| On efficient posting list intersection with multicore processors | | BIBK | Full-Text | 738-739 | |
| Shirish Tatikonda; Flavio Junqueira; B. Barla Cambazoglu; Vassilis Plachouras | |||
Keywords: list intersection, multicores, parallel query processing | |||
| On perfect document rankings for expert search | | BIBAK | Full-Text | 740-741 | |
| Craig Macdonald; Iadh Ounis | |||
| Expert search systems often employ a document search component to identify
on-topic documents, which are then used to identify people likely to have
relevant expertise. This work investigates the impact of the retrieval
effectiveness of the underlying document search component. It has been
previously shown that applying techniques to the underlying document search
component that normally improve the effectiveness of a document search engine
also have a positive impact on the retrieval effectiveness of the expert search
engine. In this work, we experiment with fictitious perfect document rankings,
to attempt to identify an upper-bound in expert search system performance. Our
surprising results suggest that non-relevant documents can bring useful expertise
evidence, and that removing these does not lead to an upper-bound in retrieval
performance. Keywords: document search, expert search | |||
| On single-pass indexing with MapReduce | | BIBAK | Full-Text | 742-743 | |
| Richard M. C. McCreadie; Craig Macdonald; Iadh Ounis | |||
| Indexing is an important Information Retrieval (IR) operation, which must be
parallelised to support large-scale document corpora. We propose a novel
adaptation of the state-of-the-art single-pass indexing algorithm in terms of
the MapReduce programming model. We then experiment with this adaptation, in
the context of the Hadoop MapReduce implementation. In particular, we explore
the scale of improvements that can be achieved when using firstly more
processing hardware and secondly larger corpora. Our results show that indexing
speed increases in a close to linear fashion when scaling corpus size or number
of processing machines. This suggests that the proposed indexing implementation
is viable to support upcoming large-scale corpora. Keywords: MapReduce, indexing | |||
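The map/reduce decomposition of indexing can be sketched in-memory as follows (a toy illustration of the programming model, not the paper's Hadoop implementation):

```python
from collections import defaultdict

def map_doc(doc_id, text):
    """Map: emit (term, (doc_id, tf)) pairs for one document."""
    tf = defaultdict(int)
    for term in text.lower().split():
        tf[term] += 1
    return [(term, (doc_id, count)) for term, count in tf.items()]

def reduce_term(term, postings):
    """Reduce: merge a term's postings into a doc-id-sorted list."""
    return term, sorted(postings)

corpus = {1: "to be or not to be", 2: "to index is to map and reduce"}

# Simulate the shuffle phase: group map output by key (term).
grouped = defaultdict(list)
for doc_id, text in corpus.items():
    for term, posting in map_doc(doc_id, text):
        grouped[term].append(posting)

index = dict(reduce_term(t, p) for t, p in grouped.items())
print(index["to"])  # [(1, 2), (2, 2)]
```

Because each document is mapped independently and each term is reduced independently, both phases parallelize across machines, which is what yields the near-linear scaling the paper reports.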
| On the relative age of spam and ham training samples for email filtering | | BIBAK | Full-Text | 744-745 | |
| Gordon V. Cormack; Jose-Marcio Martins da Cruz | |||
| Email spam filters are commonly trained on a sample of spam and ham
(non-spam) messages. We investigate the effect on filter performance of using
samples of spam and ham messages sent months before those to be filtered. Our
results show that filter performance deteriorates with the overall age of spam
and ham samples, but at different rates. Spam and ham samples of different ages
may be mixed to advantage, provided temporal cues are elided. Keywords: email, filtering, spam, training | |||
| Page Hunt: improving search engines using human computation games | | BIBAK | Full-Text | 746-747 | |
| Hao Ma; Raman Chandrasekar; Chris Quirk; Abhishek Gupta | |||
| There has been a lot of work on evaluating and improving the relevance of
web search engines. In this paper, we suggest using human computation games to
elicit data from players that can be used to improve search. We describe Page
Hunt, a single-player game. The data elicited using Page Hunt has several
applications including providing metadata for pages, providing query
alterations for use in query refinement, and identifying ranking issues. We
describe an experiment with over 340 game players, and highlight some
interesting aspects of the data obtained. Keywords: human computation games, query alterations, relevance, web search | |||
| Personalized music emotion recognition | | BIBAK | Full-Text | 748-749 | |
| Yi-Hsuan Yang; Yu-Ching Lin; Homer Chen | |||
| In recent years, there has been a dramatic proliferation of research on
information retrieval based on highly subjective concepts such as emotion,
preference and aesthetic. Such retrieval methods are fascinating but
challenging since it is difficult to build a general retrieval model that
performs equally well for everyone. In this paper, we propose two novel methods,
bag-of-users model and residual modeling, to accommodate the individual
differences for emotion-based music retrieval. The proposed methods are
intuitive and generally applicable to other information retrieval tasks that
involve subjective perception. Evaluation result shows the effectiveness of the
proposed methods. Keywords: emotion, perceptual residual, personalization | |||
| Predicting stopping behaviour: a preliminary analysis | | BIBAK | Full-Text | 750-751 | |
| Elaine G. Toms; Luanne Freund | |||
| The analysis of search transaction logs often characterizes a search session
but rarely looks at the end point. When do users stop, and what cues are
present suggesting that stopping is imminent? In this preliminary analysis of
the logs of 288 search sessions conducted in a laboratory setting, we
identified the activity performed by participants as well as search transitions
that were invoked over the course of a search session. The 4331 search
transitions (15 per task on average) contained a total of 9295 actions. We
isolated the final transition in each search session for detailed analysis. As
hypothesized some behaviours are predictable, and suggestive of stopping
behavior, with the potential for modeling. Keywords: human experiment, interactive ir, stopping behaviour | |||
| Protein identification as an information retrieval problem | | BIBAK | Full-Text | 752-753 | |
| Yiming Yang; Subramaniam Ganapathy; Abhay Harpale | |||
| We present the first interdisciplinary work on transforming a popular
problem in proteomics, i.e. protein identification from tandem mass spectra, to
an Information Retrieval (IR) problem. We present an empirical comparison of
popular IR approaches, such as those available from Indri and Lemur toolkits on
benchmark datasets, to representative popular baselines in the proteomics
literature. Our experiments demonstrate statistically significant evidence that
popular IR approaches outperform representative baseline approaches in
proteomics. Keywords: interdisciplinary investigation, protein identification, tandem mass
spectra, vector space models | |||
| Query sampling for ranking learning in web search | | BIBAK | Full-Text | 754-755 | |
| Linjun Yang; Li Wang; Bo Geng; Xian-Sheng Hua | |||
| Learning to rank has recently become a popular approach to building ranking
models for Web search. Based on our observations, the composition of the
training set greatly influences the performance of the learned ranking model.
Meanwhile, the number of queries in Web search is nearly infinite and human
labeling is expensive, hence a subset of queries needs to be carefully
selected for training. In this paper, we develop a greedy algorithm to sample
the queries, by simultaneously taking the query density, difficulty and
diversity into consideration. The experimental results on a collected Web
search dataset comprising 2024 queries show that the proposed method can lead
to a more informative training set for building an effective model. Keywords: learning to rank, training set construction | |||
| A ranking approach to keyphrase extraction | | BIBAK | Full-Text | 756-757 | |
| Xin Jiang; Yunhua Hu; Hang Li | |||
| This paper addresses the issue of automatically extracting keyphrases from a
document. Previously, this problem was formalized as classification and
learning methods for classification were utilized. This paper points out that
it is more essential to cast the problem as ranking and employ a learning to
rank method to perform the task. Specifically, it employs Ranking SVM, a
state-of-the-art method of learning to rank, in keyphrase extraction. Experimental
results on three datasets show that Ranking SVM significantly outperforms the
baseline methods of SVM and Naive Bayes, indicating that it is better to
exploit learning to rank techniques in keyphrase extraction. Keywords: keyphrase extraction, learning to rank, ranking SVM | |||
| Reciprocal rank fusion outperforms condorcet and individual rank learning methods | | BIBAK | Full-Text | 758-759 | |
| Gordon V. Cormack; Charles L. A. Clarke; Stefan Buettcher | |||
| Reciprocal Rank Fusion (RRF), a simple method for combining the document
rankings from multiple IR systems, consistently yields better results than any
individual system, and better results than the standard method Condorcet Fuse.
This result is demonstrated by using RRF to combine the results of several TREC
experiments, and to build a meta-learner that ranks the LETOR 3 dataset better
than any previously reported method. Keywords: aggregation, fusion, ranking | |||
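RRF itself is simple enough to state in a few lines; a sketch follows, using the constant k=60 that the paper reports works well:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """RRF: score(d) = sum over input rankings of 1 / (k + rank(d))."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

run_a = ["d1", "d2", "d3"]
run_b = ["d3", "d1", "d4"]
print(reciprocal_rank_fusion([run_a, run_b]))  # d1 first (ranks 1 and 2)
```

The 1/(k + rank) form rewards documents ranked highly by several systems while damping the influence of any single outlier ranking.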
| Relevance criteria for e-commerce: a crowdsourcing-based experimental analysis | | BIBAK | Full-Text | 760-761 | |
| Omar Alonso; Stefano Mizzaro | |||
| We discuss the concept of relevance criteria in the context of e-Commerce
search. A vast body of research literature describes the beyond-topical
criteria used to determine the relevance of the document to the need. We argue
that in an e-Commerce scenario there are some differences, and novel and
different criteria can be used to determine relevance. We experimentally
validate this hypothesis with a crowdsourcing approach on Amazon Mechanical
Turk. Keywords: IR evaluation, relevance, relevance criteria, user study | |||
| A relevance model based filter for improving ad quality | | BIBAK | Full-Text | 762-763 | |
| Hema Raghavan; Dustin Hillard | |||
| Recently there has been a surge in research that predicts retrieval
relevance using historical click-through data. While a larger number of clicks
between a query and a document provides a stronger "confidence" of relevance,
most models in the literature that learn from clicks are error-prone as they do
not take into account any confidence estimates. Sponsored Search models are
especially prone to this error as they are typically trained on search engine
logs in order to predict click-through-rate (CTR). The estimated CTR ultimately
determines the rank at which an ad is shown and also impacts the price
(cost-per-click) for the advertiser. In this paper, we improve a model that
applies collaborative filtering to click data by adding a filter trained to
predict pure relevance. Applying the filter to ads that have
seen few clicks on live traffic results in improved CTR and click-yield (CY).
Additionally, in offline experiments we find that using features based on the
organic results improves the relevance based filter's performance. Keywords: advertising, relevance modeling | |||
| A relevance-based topic model for news event tracking | | BIBAK | Full-Text | 764-765 | |
| Viet Ha-Thuc; Yelena Mejova; Christopher Harris; Padmini Srinivasan | |||
| Event tracking is the task of discovering temporal patterns of popular
events from text streams. Existing approaches for event tracking have two
limitations: scalability and inability to rule out non-relevant portions in
text streams. In this study, we propose a novel approach to tackle these
limitations. To demonstrate the approach, we track news events across a
collection of weblogs spanning a two-month time period. Keywords: LDA, event tracking, relevance models, topic models | |||
| Revisiting logical imaging for information retrieval | | BIBAK | Full-Text | 766-767 | |
| Guido Zuccon; Leif Azzopardi; Cornelis J. van Rijsbergen | |||
| Retrieval with Logical Imaging is derived from belief revision and provides
a novel mechanism for estimating the relevance of a document through logical
implication (i.e., P(q -> d)). In this poster, we perform the first
comprehensive evaluation of Logical Imaging (LI) in Information Retrieval (IR)
across several TREC test collections. When compared against standard baseline
models, we show that LI fails to improve performance. This failure can be
attributed to a nuance within the model that means non-relevant documents are
promoted in the ranking, while relevant documents are demoted. This is an
important contribution because it not only contextualizes the effectiveness of
LI, but crucially explains why it fails. By addressing this nuance, future LI
models could be significantly improved. Keywords: logical imaging, probability kinematics | |||
| A robust retrieval system of polyphonic music based on chord progression similarity | | BIBAK | Full-Text | 768-769 | |
| Pierre Hanna; Thomas Rocher; Matthias Robine | |||
| Retrieval systems for polyphonic music rely on the automatic estimation of
similarity between two musical pieces. In the case of symbolic music, existing
systems either consider a monophonic reduction based on melody or propose
algorithms with high complexity. In this paper, we propose a new approach.
Musical pieces are represented as a sequence of chords which are estimated from
groups of notes sounding at the same time. A root and a mode are associated
with each chord. Local alignment is then applied to estimate a similarity score
between these sequences. Experiments performed on MIDI files collected on the
Internet show that the system proposed allows the retrieval of different
versions of the same song. Keywords: music information retrieval | |||
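The local alignment step is classic Smith-Waterman applied to chord sequences; a minimal sketch (the scoring parameters are illustrative, not the paper's):

```python
def local_alignment_score(s, t, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score between two sequences."""
    best = 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + sub,
                         prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

# Chords as (root, mode) pairs; a cover with one substituted chord.
song = [("C", "maj"), ("A", "min"), ("F", "maj"), ("G", "maj")]
cover = [("C", "maj"), ("A", "min"), ("D", "min"), ("G", "maj")]
print(local_alignment_score(song, cover))
```

Local (rather than global) alignment lets a matching chorus score highly even when the surrounding sections of the two versions differ.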
| Score and rank convergence of HITS | | BIBAK | Full-Text | 770-771 | |
| Enoch Peserico; Luca Pretto | |||
| How many iterations does the (ever more) popular HITS algorithm require to
converge in score and, perhaps more importantly, in rank (i.e. to get the nodes
of a graph "in the right order")? After pinning down the elusive notion of
convergence in rank we provide the first non-trivial bounds on the convergence
of HITS. A "worst case" example, requiring a number of iterations
superexponential in the size of the target graph to achieve even "mild"
convergence, suggests the need for greater caution in the experimental
evaluation of the algorithm -- as recent results of poor performance (e.g. vs.
SALSA) might be due to insufficient iterations, rather than to an intrinsic
deficiency of HITS. An almost matching upper bound shows that, as long as one
employs exponential acceleration e.g. through a "squaring trick", a polynomial
running time (practical in many application domains) always provides strong
convergence guarantees. Keywords: HITS, convergence, link analysis, rank | |||
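The "squaring trick" mentioned in the abstract is simple to sketch (illustrative only; the adjacency matrix below is hypothetical): k squarings of M = A^T A stand in for 2^k ordinary HITS authority updates.

```python
# Exponentially accelerated HITS authority scores via repeated matrix squaring.
import numpy as np

def hits_authorities_squared(A, k=6):
    M = A.T @ A                       # one authority-update step
    for _ in range(k):                # M^(2^k) simulates 2^k iterations
        M = M @ M
        M /= np.linalg.norm(M)        # rescale to avoid overflow; direction unchanged
    a = M @ np.ones(A.shape[1])       # apply to a uniform starting vector
    return a / np.linalg.norm(a)

A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
print(hits_authorities_squared(A))
```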
| A search engine in a few lines: yes, we can! | | BIBAK | Full-Text | 772-773 | |
| Stefan Savev | |||
| Many research implementations of search engines are written in C, C++, or
Java. They are difficult to understand and modify because they are at least a
few thousand lines of code and contain many low-level details. In this paper,
we show how to achieve a much shorter and higher-level implementation: one of
only a few hundred lines. We accomplish this through the use of a
high-level functional programming language, F#, and some of its features such
as sequences, pipes and structured input and output. By using a search engine
implementation as a case study, we argue that functional programming fits the
domain of Information Retrieval problems much better than imperative/OO
languages like C++ and Java.
Functional programming languages are ideal for rapid algorithm prototyping and data exploration in the field of Information Retrieval (IR). Additionally, our implementation can be used as a case study in an IR course, since it is a very high-level, but nevertheless executable, specification of a search engine. Keywords: search engine implementation techniques | |||
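The paper's implementation is in F#; as a language-neutral illustration of how compact the core of a search engine can be, here is a comparably terse inverted index with TF-IDF ranking in Python (a sketch, not the authors' code; the documents and smoothing choice are invented).

```python
# A toy search engine: build an inverted index, then rank by TF-IDF.
import math
from collections import Counter, defaultdict

docs = {1: "functional programming for retrieval",
        2: "search engine implementation in a functional language"}

index = defaultdict(dict)                        # term -> {doc_id: term frequency}
for doc_id, text in docs.items():
    for term, tf in Counter(text.split()).items():
        index[term][doc_id] = tf

def search(query, n=10):
    scores = Counter()
    for term in query.split():
        postings = index.get(term, {})
        idf = math.log(1 + len(docs) / (1 + len(postings)))   # smoothed IDF
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return scores.most_common(n)

print(search("functional search"))
```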
| Search engine predilection towards news media providers | | BIBAK | Full-Text | 774-775 | |
| Leif Azzopardi; Ciaran Owens | |||
| In this poster paper, we present a preliminary study on the predilection of
web search engines towards various online news media provider sites using an
access based measure. Keywords: accessibility, retrievability, search engine bias | |||
| Searching documentation using text, OCR, and image | | BIBAK | Full-Text | 776-777 | |
| Tom Yeh; Boris Katz | |||
| We describe a mixed-modality method to index and search software
documentation in three ways: plain text, OCR text of embedded figures, and
visual features of these figures. Using a corpus of 102 computer books with a
total of 62,943 pages and 75,800 figures, we empirically demonstrate that our
method achieves better precision/recall than do alternatives based on single
modalities. Keywords: computer vision, content-based image retrieval, multimodal search | |||
| Selecting hierarchical clustering cut points for web person-name disambiguation | | BIBAK | Full-Text | 778-779 | |
| Jun Gong; Douglas W. Oard | |||
| Hierarchical clustering is often used to cluster person-names referring to
the same entities. Since the correct number of clusters for a given person-name
is not known a priori, some way of deciding where to cut the resulting
dendrogram to balance risks of over- or under-clustering is needed. This paper
reports on experiments in which outcome-specific and result-set measures are
used to learn a global similarity threshold. Results on the Web People Search
(WePS)-2 task indicate that approximately 85% of the optimal F1 measure can be
achieved on held-out data. Keywords: clustering, person-name disambiguation | |||
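The dendrogram-cutting step lends itself to a compact illustration. Below is a minimal sketch (not the authors' code): the feature vectors are random placeholders, and the threshold value stands in for the global similarity threshold learned from training data.

```python
# Cut an average-link dendrogram at one globally learned distance threshold.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

docs = np.random.rand(8, 20)                 # 8 web pages, toy feature vectors
Z = linkage(docs, method="average", metric="cosine")

learned_threshold = 0.4                      # would be tuned to maximize F1
labels = fcluster(Z, t=learned_threshold, criterion="distance")
print(labels)                                # cluster id per page; one id = one person
```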
| Serendipitous search via wikipedia: a query log analysis | | BIBAK | Full-Text | 780-781 | |
| Tetsuya Sakai; Kenichi Nogami | |||
| We analyse the query log of a click-oriented Japanese search engine that
utilises the link structures of Wikipedia for encouraging the user to change
his information need and to perform repeated, serendipitous, exploratory
search. Our results show that users tend to make transitions within the same
query type: from person names to person names, from place names to place names,
and so on. Keywords: clickthrough, search engines | |||
| Spoken information retrieval for Turkish broadcast news | | BIBAK | Full-Text | 782-783 | |
| Siddika Parlak; Murat Saraclar | |||
| Speech Retrieval systems utilize automatic speech recognition (ASR) to
generate textual data for indexing. However, automatic transcriptions include
errors, either because of out-of-vocabulary (OOV) words or due to ASR
inaccuracy. In this work, we address spoken information retrieval in Turkish, a
morphologically rich language where OOV rates are high. We apply several
techniques, such as using subword units and indexing alternative hypotheses, to
cope with the OOV problem and ASR inaccuracy.
Experiments are performed on our Turkish Broadcast News (BN) Corpus, which also incorporates a spoken IR collection. Results indicate that word segmentation is quite useful, but the efficiency of indexing alternative hypotheses depends on the retrieval type. Keywords: speech retrieval, spoken document retrieval, spoken term detection, subword
indexing | |||
| A study of inter-annotator agreement for opinion retrieval | | BIBAK | Full-Text | 784-785 | |
| Adam Bermingham; Alan F. Smeaton | |||
| Evaluation of sentiment analysis, like large-scale IR evaluation, relies on
the accuracy of human assessors to create judgments. Subjectivity in judgments
is a problem for relevance assessment and even more so in the case of sentiment
annotations. In this study we examine the degree to which assessors agree upon
sentence-level sentiment annotation. We show that inter-assessor agreement is
not contingent on document length or frequency of sentiment but correlates
positively with automated opinion retrieval performance. We also examine the
individual annotation categories to determine which categories pose most
difficulty for annotators. Keywords: annotation, inter-annotator agreement, opinion retrieval, sentiment analysis | |||
| SugarCube: quantification of topic propagation in the blogosphere using percolation theory | | BIBAK | Full-Text | 786-787 | |
| Ali Azimi Bolourian; Yashar Moshfeghi; C. J. van Rijsbergen | |||
| Blogs facilitate online debates and discussions for millions of people
around the world. Identifying the most popular and prevailing topics discussed
in the Blogosphere is a crucial task. This poster describes our novel approach
to the quantification of the level of topic propagation in the Blogosphere. Our
model uses graph-theoretic representations of the Blogosphere's link structures
that allow it to deduce the 'Percolation Threshold', which is then used in the
quantification and definition of a global topic. The result of our experiments
on a blog collection shows that our model is able to quantify the propagation
of topics. Moreover, our model is successful in identifying specific topics
that propagate throughout the Blogosphere and classifies them as 'Global'. Keywords: blogosphere, citation analysis, percolation theory, social networks, topic
propagation | |||
| System scoring using partial prior information | | BIBAK | Full-Text | 788-789 | |
| Sri Devi Ravana; Laurence A. Park; Alistair Moffat | |||
| We introduce smoothing of retrieval effectiveness scores, which balances
results from prior incomplete query sets against limited additional complete
information, in order to obtain more refined system orderings than would be
possible on the new queries alone. Keywords: partial information, score aggregation, smoothing, system ordering | |||
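One plausible reading of the smoothing idea, sketched below (not necessarily the authors' exact estimator; the weight and scores are invented): shrink a system's score on the few new, fully judged queries toward its prior score from the incomplete query set.

```python
# Convex combination of a prior effectiveness score with new complete-query scores.

def smoothed_score(prior_score, new_scores, lam=0.6):
    """lam weights the prior from incomplete past queries; 1-lam weights new data."""
    new_mean = sum(new_scores) / len(new_scores)
    return lam * prior_score + (1 - lam) * new_mean

print(smoothed_score(prior_score=0.31, new_scores=[0.42, 0.28, 0.35]))
```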
| Tag-based object similarity computation using term space dimension reduction | | BIBAK | Full-Text | 790-791 | |
| Yong Ki Lee; Sung Jun Lee; Jonghun Park | |||
| In this paper, we propose a novel approach for measuring similarity between
web objects. Our similarity measure is defined based on the representation of a
web object as a collection of tags. More precisely, we first construct a vector
space in which multiple terms are mapped into a single dimension by using
information available from Open Directory Project and Delicious.com. Then we
position web objects in the vector space and apply the traditional cosine
measure for similarity computation. We demonstrate that the proposed similarity
computation method overcomes the limitations of the traditional vector space
approach while requiring less computation than LSI (Latent Semantic Indexing). Keywords: open directory project, similarity, vector space model, web search | |||
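The core idea can be sketched in a few lines (illustrative only; the term-to-category map below is a hypothetical stand-in for one derived from ODP/Delicious data): collapse related tags into shared dimensions, then compare objects by cosine similarity in the reduced space.

```python
# Map tags into a reduced term space, then apply cosine similarity.
import math
from collections import Counter

term_to_dim = {"python": "programming", "coding": "programming",
               "jazz": "music", "guitar": "music"}    # hypothetical mapping

def to_vector(tags):
    # Unknown tags keep their own dimension.
    return Counter(term_to_dim.get(t, t) for t in tags)

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

print(cosine(to_vector(["python", "coding"]), to_vector(["coding", "jazz"])))
```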
| Tagging products using image classification | | BIBAK | Full-Text | 792-793 | |
| Brian Tomasik; Phyo Thiha; Douglas Turnbull | |||
| Associating labels with online products can be a labor-intensive task. We
study the extent to which a standard "bag of visual words" image classifier can
be used to tag products with useful information, such as whether a sneaker has
laces or velcro straps. Using Scale Invariant Feature Transform (SIFT) image
descriptors at random keypoints, a hierarchical visual vocabulary, and a
variant of nearest-neighbor classification, we achieve accuracies between 66%
and 98% on 2- and 3-class classification tasks using several dozen training
examples. We also increase accuracy by combining information from multiple
views of the same product. Keywords: bag of visual words, content-based tagging, image classification | |||
| Template-independent wrapper for web forums | | BIBAK | Full-Text | 794-795 | |
| Qi Zhang; Yang Shi; Xuanjing Huang; Lide Wu | |||
| This paper presents novel work on the task of extracting data from Web
forums. Millions of users contribute rich information to Web forums every day,
and these forums have become an important resource for many Web applications,
such as product opinion retrieval and social network analysis. The novelty of
the proposed algorithm is that it can not only extract the pure text but also
distinguish between the original post and replies. Experimental results on a
large number of real Web forums indicate that the proposed algorithm can
correctly extract data from websites with various styles in most cases. Keywords: conditional random fields, crawler | |||
| Temporal collaborative filtering with adaptive neighbourhoods | | BIBAK | Full-Text | 796-797 | |
| Neal Lathia; Stephen Hailes; Licia Capra | |||
| Collaborative Filtering aims to predict user tastes by minimising the mean
error produced when predicting hidden user ratings. The aim of a deployed
recommender system is to iteratively predict users' preferences over a dynamic,
growing dataset, and system administrators are confronted with the problem of
having to continuously tune the parameters calibrating their CF algorithm. In
this work, we formalise CF as a time-dependent, iterative prediction problem.
We then perform a temporal analysis of the Netflix dataset, and evaluate the
temporal performance of two CF algorithms. We show that, due to the dynamic
nature of the data, certain prediction methods that improve prediction accuracy
on the Netflix probe set do not show similar improvements over a set of
iterative train-test experiments with growing data. We then address the problem
of parameter selection and update, and propose a method to automatically assign
and update per-user neighbourhood sizes that (on the temporal scale)
outperforms setting global parameters. Keywords: temporal collaborative filtering | |||
| Temporal query substitution for ad search | | BIBAK | Full-Text | 798-799 | |
| Wen Zhang; Jun Yan; Shuicheng Yan; Ning Liu; Zheng Chen | |||
| Recently, information retrieval researchers have witnessed the increasing
interest in query substitution for ad search. Most previous works substitute
search queries via content-based query similarities, and few of them take the
temporal characteristics of queries into consideration. In this extended
abstract, we propose a novel temporal similarity measurement for query
substitution in the ad search task. We first extract temporal features, such as
burst and periodicity, from query frequency curves and then define the temporal
query similarity by integrating these new features with the temporal query
frequency distribution. Compared to the traditional temporal similarity
measurements such as correlation coefficient, our proposed approach is more
effective owing to the explicit extraction of high-level semantic query
temporal features for similarity measure. The experimental results demonstrate
that the proposed similarity measure can make the ads more relevant to user
search queries compared to ad search without temporal features. Keywords: ad search, query substitution, temporal similarity. | |||
| Term-based commercial intent analysis | | BIBAK | Full-Text | 800-801 | |
| Azin Ashkan; Charles L. A. Clarke | |||
| In this work, we investigate the contribution of query terms and their
corresponding ad click rates to the commercial intent of queries. A probabilistic
model is proposed following the hypothesis that a query is likely to receive ad
clicks based on contributions from its individual terms. Keywords: ad targeting, clickthrough, query intent, query log | |||
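The stated hypothesis has a natural noisy-OR form; the sketch below is one plausible instantiation (the functional form and per-term rates are assumptions, not the paper's model): a query receives an ad click unless every one of its terms fails to attract one.

```python
# Noisy-OR combination of per-term ad click rates into a query-level probability.

def query_click_probability(term_click_rates):
    p_no_click = 1.0
    for p_term in term_click_rates:
        p_no_click *= (1.0 - p_term)   # all terms must fail for no click
    return 1.0 - p_no_click

# Hypothetical per-term ad click rates estimated from a query log.
print(query_click_probability([0.05, 0.20]))   # e.g. "cheap" + "flights" -> 0.24
```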
| Topic (query) selection for IR evaluation | | BIBAK | Full-Text | 802-803 | |
| Jianhan Zhu; Jun Wang; Vishwa Vinay; Ingemar J. Cox | |||
| The need to evaluate large numbers of topics (queries) makes IR
evaluation a difficult task. In this paper, we study a topic selection problem
for IR evaluation. The selection criterion is based on the overall difficulty
of the chosen set, as well as the uncertainty of the final IR metric applied to
the systems. Our preliminary experiments demonstrate that our approach helps to
identify a set of topics that provides confident estimates of systems'
performance while meeting the query difficulty requirement. Keywords: mean-variance analysis, portfolio theory, topic selection | |||
| Topic prerogative feature selection using multiple query examples for automatic video retrieval | | BIBAK | Full-Text | 804-805 | |
| P. Punitha; Joemon M. Jose; Anuj Goyal | |||
| The wide acceptance of relevance feedback and collaborative systems has
enabled users to express their preferences in terms of multiple query examples.
The technology devised to utilize these user preferences is expected to mine
the semantic knowledge embedded within these query examples. In this paper, we
propose a video mining framework based on dynamic learning from queries, using
a statistical model for topic prerogative feature selection. The proposed
method is specifically designed for multiple query example scenarios. The
effectiveness of the proposed framework has been established with an extensive
experimentation on TRECVid2007 data collection. The results reveal that our
approach achieves performance on par with the best results for this corpus
without requiring any textual data. Keywords: feature selection, multimedia retrieval, multiple query examples,
performance evaluation, result fusion | |||
| Topic set size redux | | BIBAK | Full-Text | 806-807 | |
| Ellen M. Voorhees | |||
| The cost as well as the power and reliability of a retrieval test collection
are all proportional to the number of topics included in it. Test collections
created through community evaluations such as TREC generally use 50 topics.
Prior work estimated the reliability of 50-topic sets by extrapolating
confidence levels from those of smaller sets, and concluded that 50 topics are
sufficient to have high confidence in a comparison, especially when the
comparison is statistically significant. Using topic sets that actually contain
50 topics, this paper shows that statistically significant differences can be
wrong, even when statistical significance is accompanied by moderately large
(>10%) relative differences in scores. Further, using standardized
evaluation scores rather than raw evaluation scores does not increase the
reliability of these paired comparisons. Researchers should continue to be
skeptical of conclusions demonstrated on only a single test collection. Keywords: evaluation, test collections, variability | |||
| Transforming patents into prior-art queries | | BIBAK | Full-Text | 808-809 | |
| Xiaobing Xue; W. Bruce Croft | |||
| Searching for prior-art patents is an essential step for the patent examiner
to validate or invalidate a patent application. In this paper, we consider the
whole patent as the query, which reduces the burden on the user, and also makes
many more potential search features available. We explore how to automatically
transform the query patent into an effective search query, especially focusing
on the effect of different patent fields. Experiments show that the background
summary of a patent is the most useful source of terms for generating a query,
even though most previous work used the patent claims. Keywords: information retrieval, patent retrieval, prior-art search | |||
| Two-stage query segmentation for information retrieval | | BIBAK | Full-Text | 810-811 | |
| Michael Bendersky; W. Bruce Croft; David A. Smith | |||
| Modeling term dependence has been shown to have a significant positive
impact on retrieval. Current models, however, use sequential term dependencies,
leading to an increased query latency, especially for long queries. In this
paper, we examine two query segmentation models that reduce the number of
dependencies. We find that two-stage segmentation based on both query syntactic
structure and external information sources, such as query logs, attains
retrieval performance comparable to the sequential dependence model, while
achieving a 50% reduction in query latency. Keywords: long queries, query segmentation, term dependence | |||
| Undergraduates' evaluations of assigned search topics | | BIBAK | Full-Text | 812-813 | |
| Earl W. Bailey; Diane Kelly; Karl Gyllstrom | |||
| This paper evaluates undergraduate students' knowledge, interests and
experiences with 20 topics from the TREC Robust Track collection. The goal is
to characterize these topics along several dimensions to help researchers make
more informed decisions about which topics are most appropriate to use in
experimental IIR evaluations with undergraduate student subjects. Keywords: assigned tasks, motivation, task difficulty, topic selection | |||
| A unified inverted index for an efficient image and text retrieval | | BIBAK | Full-Text | 814-815 | |
| Jonathan Mamou; Yosi Mass; Michal Shmueli-Scheuer; Benjamin Sznajder | |||
| We present an efficient method for approximate search in a combination of
several metric spaces -- which are a generalization of low-level image features
-- using an inverted index. Our approximation gives very high recall with
subsecond response time on a real data set of one million images extracted from
Flickr.
We further exploit the inverted index to improve efficiency of the query processing by combining our search in metric features with search in associated textual metadata. Keywords: IR, approximation, inverted index, metric spaces, multimedia | |||
| Usefulness of click-through data in expert search | | BIBAK | Full-Text | 816-817 | |
| Craig Macdonald; Ryen W. White | |||
| The task in expert finding is to identify members of an organisation with
relevant expertise on a given topic. Typically, an expert search engine uses
evidence from the authors of on-topic documents found in the organisation's
intranet by search engines. The search result click-through behaviour of many
intranet search engine users provides an additional source of evidence to
identify topically-relevant documents, and via document authorship, experts. In
this poster, we assess the usefulness of click-through log data for expert
finding. We find that ranking authors based solely on the clicks their
documents receive is reasonably effective at correctly identifying relevant
experts. Moreover, we show that this evidence can successfully be integrated
with an existing expert search engine to increase its retrieval effectiveness. Keywords: click-through log data, expert finding, intranet | |||
| User-centric multi-criteria information retrieval | | BIBAK | Full-Text | 818-819 | |
| Shawn R. Wolfe; Yi Zhang | |||
| Information retrieval models usually represent content only, and not other
considerations, such as authority, cost, and recency. How could multiple
criteria be utilized in information retrieval, and how would they affect the
results? In our experiments, using multiple user-centric criteria always
produced better results than a single criterion. Keywords: multi-criteria decision making | |||
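The abstract leaves the combination method open; one minimal sketch is a weighted linear mix of per-criterion scores (the weights and criterion names below are hypothetical, standing in for user-centric preferences).

```python
# Weighted linear combination of multiple retrieval criteria into one score.

def combined_score(doc_scores, weights):
    return sum(weights[c] * doc_scores.get(c, 0.0) for c in weights)

weights = {"content": 0.5, "authority": 0.2, "recency": 0.2, "cost": 0.1}
doc = {"content": 0.8, "authority": 0.6, "recency": 0.9, "cost": 0.3}
print(combined_score(doc, weights))   # 0.73
```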
| Users' stopping behaviors and estimates of recall | | BIBAK | Full-Text | 820-821 | |
| Maureen Dostert; Diane Kelly | |||
| This paper investigates subjects' stopping behaviors and estimates of recall
in an interactive information retrieval (IIR) experiment. Subjects completed
four recall-oriented search tasks and were asked to estimate how many of the
relevant documents they believed they had found after each task. Subjects also
responded to an interview question probing their reasons for stopping a search.
Results show that most subjects believed they found about 51-60% of the
relevant documents and that this estimate was correlated positively with number
of documents saved and actual recall, even though subjects' recall estimates
were inaccurate. Reasons given for stopping search are also explored. Keywords: decision making, recall estimates, stopping behavior, user study | |||
| Using dynamic markov compression to detect vandalism in the wikipedia | | BIBAK | Full-Text | 822-823 | |
| Kelly Y. Itakura; Charles L. A. Clarke | |||
| We apply the Dynamic Markov Compression model to detect spam edits in the
Wikipedia. The method appears to outperform previous efforts based on
compression models, providing performance comparable to methods based on
manually constructed rules. Keywords: information retrieval, spam filtering | |||
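Compression-model classification can be shown in miniature. The paper uses Dynamic Markov Compression; in the sketch below zlib stands in purely for illustration, and the corpora are invented. An edit is assigned to the class whose training corpus compresses it more cheaply.

```python
# Classify an edit by its incremental compression cost against each class corpus.
import zlib

def extra_bytes(corpus, text):
    """Cost of appending text to an already-modeled corpus."""
    return (len(zlib.compress((corpus + text).encode()))
            - len(zlib.compress(corpus.encode())))

ham_corpus = "fixed typo in the second paragraph citation formatting corrected "
spam_corpus = "BUY CHEAP PILLS click here free offer visit my site now "

edit = "click here for a free offer now"
label = ("vandalism" if extra_bytes(spam_corpus, edit) < extra_bytes(ham_corpus, edit)
         else "legitimate")
print(label)
```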
| Using wikipedia categories for ad hoc search | | BIBAK | Full-Text | 824-825 | |
| Rianne Kaptein; Marijn Koolen; Jaap Kamps | |||
| In this paper we explore the use of category information for ad hoc
retrieval in Wikipedia. We show that techniques for entity ranking exploiting
this category information can also be applied to ad hoc topics and lead to
significant improvements. Automatically assigned target categories are good
surrogates for manually assigned categories, which perform only slightly
better. Keywords: ad hoc retrieval, category information, wikipedia | |||
| Visualizing the problems with the INEX topics | | BIBAK | Full-Text | 826 | |
| Andrew Trotman; Maria del Rocio Gomez Crisostomo; Mounia Lalmas | |||
| Topics form a crucial component of a test collection. We show, through
visualization, that the INEX 2008 topics have shortcomings, which questions
their validity for evaluating XML retrieval effectiveness. Keywords: INEX, IR methodology, XML-IR, element retrieval | |||
| What queries are likely to recur in web search? | | BIBAK | Full-Text | 827-828 | |
| Dell Zhang; Jinsong Lu | |||
| We study the recurrence dynamics of queries in Web search by analysing a
large real-world query log dataset. We find that query frequency is more useful
in predicting collective query recurrence whereas query recency is more useful
in predicting individual query recurrence. Our findings provide valuable
insights for understanding and improving Web search. Keywords: personalisation, query log analysis, web mining, web search | |||
| When is query performance prediction effective? | | BIBAK | Full-Text | 829-830 | |
| Claudia Hauff; Leif Azzopardi | |||
| The utility of Query Performance Prediction (QPP) methods is commonly
evaluated by reporting correlation coefficients to denote how well the methods
perform at predicting the retrieval performance of a set of queries. However, a
quintessential question remains unexplored: how strong does the correlation
need to be in order to realize an increase in retrieval performance? In this
work, we address this question in the context of Selective Query Expansion
(SQE) and perform a large-scale experiment. The results show that to
consistently and predictably improve retrieval effectiveness in the ideal SQE
setting, a Kendall's Tau correlation of tau>=0.5 is required, a threshold
which most existing query performance prediction methods fail to reach. Keywords: evaluation, query performance prediction | |||
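To make the evaluation criterion concrete, here is a minimal check of a predictor against the reported tau >= 0.5 bar (a sketch; the prediction and AP values are invented).

```python
# Kendall's tau between predicted query difficulty and actual retrieval performance.
from scipy.stats import kendalltau

predicted_difficulty = [0.9, 0.4, 0.7, 0.2, 0.5]   # QPP scores per query
actual_ap            = [0.8, 0.3, 0.5, 0.1, 0.6]   # average precision per query

tau, p_value = kendalltau(predicted_difficulty, actual_ap)
print(f"tau={tau:.2f}", "useful for SQE" if tau >= 0.5 else "below the threshold")
```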
| Who said what to whom?: capturing the structure of debates | | BIBAK | Full-Text | 831-832 | |
| Rianne Kaptein; Maarten Marx; Jaap Kamps | |||
| Transcripts of meetings are a document genre characterized by a complex
narrative structure. The essence is not only what is said, but also by whom
and to whom. This paper investigates whether we can use semantic annotations like
the speaker in order to capture this debate structure, as well as the related
content of the debate. The structure is visualized in a graph, while the
content is condensed into word clouds that are created using a parsimonious
language model. Evaluation shows that both tools adequately capture the
structure and content of the debate at an aggregated level. Keywords: political data, visualization, word clouds | |||
| EvaluatIR: an online tool for evaluating and comparing IR systems | | BIB | Full-Text | 833 | |
| Timothy G. Armstrong; Alistair Moffat; William Webber; Justin Zobel | |||
| Expertise search in academia using facets | | BIBK | Full-Text | 834 | |
| Duncan McDougall; Craig Macdonald | |||
Keywords: expert search, faceted search, terrier | |||
| Exploiting social context for expertise propagation | | BIBK | Full-Text | 835 | |
| Greg P. Milette; Michael K. Schneider; Kathy Ryall; Robert Hyland | |||
Keywords: expertise finding, recommendation, social network analysis, topic models | |||
| Social networks and discovery in the enterprise (SaND) | | BIBK | Full-Text | 836 | |
| Inbal Ronen; Elad Shahar; Sigalit Ur; Erel Uziel; Sivan Yogev; Naama Zwerdling; David Carmel; Ido Guy; Nadav Har'el; Shila Ofek-Koifman | |||
Keywords: enterprise search, social network, social search | |||
| Sifting micro-blogging stream for events of user interest | | BIBAK | Full-Text | 837 | |
| Maxim Grinev; Maria Grineva; Alexander Boldakov; Leonid Novak; Andrey Syssoev; Dmitry Lizorkin | |||
| Micro-blogging is a new form of social communication that encourages users
to share information about anything they are seeing or doing, the motivation
facilitated by the ability to post brief text messages through a variety of
devices. Twitter, the most popular micro-blogging tool, is exhibiting rapid
growth [3]: as of December 2008, up to 11% of online Americans were using
Twitter, compared to 6% in May 2008. Due to its nature, the micro-blogosphere has unique
features: (i) It is a source of extremely up-to-date information about what is
happening in the world; (ii) It captures the wisdom of millions of people and
covers a broad range of domains. These features make the micro-blogosphere more
than a popular medium of social communication: we believe that it has
additionally become a valuable source of extremely up-to-date news on virtually
any subject of user interest. In making use of the micro-blogosphere in this
new role, we face the following challenges: (A) Since any given subject is generally
mentioned in the micro-blogging stream on a continuous basis, a method is
needed for locating periods of news on this subject. (B) Additionally, even for
such periods, stream filtering is required for removing noise and for
extracting messages that best describe the news. To address these challenges we
make and exploit the following observations: (A) For an arbitrary subject,
events that catch user interest gain distinguishably more attention than the
average mentioning of the subject resulting in message activity bursts for it.
(B) Most of the messages in an activity burst describe a common event in close
variations -- either rephrased or "retweeted" between the users. We demonstrate
TweetSieve -- a system that allows obtaining news on any given subject by
sifting the Twitter stream. Our work is related to frequency-based analysis
applied to blogs [1], but the higher latency and lower coverage of blogs make
the analysis less effective than in the case of micro-blogs. In the TweetSieve demo, the
user is able to express the subject of her interest by an arbitrary search
string. The system shows the periods of events occurring for the subject and
outputs tweets that best describe each of the events. Figure 1 shows a
screenshot of the system for "Semantic search" as a sample subject. The
underlying process consists of two steps. (1) Identifying activity bursts: by
counting the messages matching the search string in the stream over time, a
frequency curve is constructed. Activity bursts are identified as the periods
in which the frequency exceeds the average by more than one standard deviation.
(2) Selecting messages that best describe news events: for the set of all messages
matching the search string in an activity burst, we apply the message-granular
variation of our keyphrase extraction algorithm [2] that is specifically suited
to efficiently filtering noisy data. The algorithm clusters messages with
respect to their similarity to each other and chooses central messages from the
densest clusters. As the similarity measure, we use the Jaccard coefficient for
the "bag of words" representation of messages. The demonstration illustrates
the potential of our approach in bringing news acquisition to a new level of
promptness and coverage range. Keywords: micro-blogging, news, twitter | |||
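The burst-detection step described above is easy to show in miniature (a sketch; the counts are invented): flag periods whose message frequency exceeds the mean by more than one standard deviation.

```python
# Detect activity bursts in a per-hour message frequency series.
import statistics

counts = [3, 4, 2, 5, 3, 21, 18, 4, 3, 2]        # matching messages per hour
mean, sd = statistics.mean(counts), statistics.stdev(counts)

bursts = [i for i, c in enumerate(counts) if c > mean + sd]
print(bursts)   # hours 5 and 6 form an activity burst worth summarizing
```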
| Incentives for social annotation | | BIBAK | Full-Text | 838 | |
| Heather Roinestad; John Burgoon; Benjamin Markines; Filippo Menczer | |||
| The effectiveness of community-driven annotation, such as social
bookmarking, depends on user participation. Since the participation of many
users is motivated by selfish reasons, an effective way to encourage
participation is to create useful or entertaining applications. We demo two
such tools -- a browser extension and a game. Keywords: interactive IR and visualization, social tagging, web 2.0 IR | |||
| Accommodating colorblind users in image search | | BIBAK | Full-Text | 839 | |
| Meng Wang; Bo Liu; Linjun Yang; Xian-Sheng Hua | |||
| About 8% of men and 0.8% of women suffer from colorblindness.
Due to the loss of certain color information, existing image search techniques
may not provide satisfactory results for these users. In this demonstration, we
show an image search system that can accommodate colorblind users. It can help
these special users find and enjoy what they want by providing multiple
services for them, including search results reranking, image recoloring and
color indication. Keywords: colorblindness, image search | |||
| Generic similarity search engine demonstrated by an image retrieval application | | BIBAK | Full-Text | 840 | |
| David Novak; Michal Batko; Pavel Zezula | |||
| We introduce a generic engine for large-scale similarity search and
demonstrate it on a set of 100 million Flickr images. Keywords: content-based search, metric space, scalability, similarity search | |||
| Pharos: an audiovisual search platform | | BIBK | Full-Text | 841 | |
| Alessandro Bozzon; Marco Brambilla; Piero Fraternali; Francesco Nucci; Stefan Debald; Eric Moore; Wolfgang Nejdl; Michel Plu; Patrick Aichroth; Olli Pihlajamaa; Cyril Laurier; Serge Zagorac; Gerhard Backfried; Daniel Weinland; Vincenzo Croce | |||
Keywords: audiovideo, multimedia, search, search based application | |||
| Agate: information gathering for risk monitoring | | BIBAK | Full-Text | 842 | |
| Arnaud Saval; Yann Mombrun | |||
| Internet sources provide new ways to acquire information about risk and to
follow the evolution of natural disasters in real time. We first present an
architecture dedicated to unstructured information processing. Then, we show
how a spatial and temporal representation extended with semantic properties
addresses the information ambiguity problem. The Agate platform was evaluated
by a non-governmental organization, which filtered alerts according to the
types of disasters and their locations. Keywords: crawling, geographic, risks, search | |||
| wikiSearch: enabling interactivity in search | | BIBAK | Full-Text | 843 | |
| Elaine G. Toms; Tayze Mackenzie; Chris Jordan; Sam Hall | |||
| wikiSearch is a search engine customized for the Wikipedia corpus, but with
design features that may be generalized to other search systems. Its features
enhance basic functionality and enable more fluid interactivity while
supporting both workflow in the search process and the experimental process
used in lab testing. Keywords: interactive IR, search engine, search workflow, wikipedia | |||
| CDPRanking: discovering and ranking cross-document paths between entities | | BIBK | Full-Text | 844 | |
| Wei Jin; Xin Wu | |||
Keywords: graph mining, information extraction, text mining | |||
| Context-based health information retrieval | | BIBK | Full-Text | 845 | |
| Carla Lopes | |||
Keywords: contextual information retrieval, health information retrieval, user context | |||
| Modeling uncertainty in video retrieval: a retrieval model for uncertain semantic representations of videos | | BIBK | Full-Text | 846 | |
| Robin Aly | |||
Keywords: concept based search, multimedia retrieval, probabilistic information
retrieval, video retrieval | |||
| Toponym ambiguity in geographical information retrieval | | BIBAK | Full-Text | 847 | |
| Davide Buscaldi | |||
| The objective of this research is to study the effects of toponym
(place name) ambiguity in the Geographical Information Retrieval (GIR) task.
Our experience with GIR systems shows that toponym ambiguity may be an
important factor in the inability of these systems to take advantage of
geographical knowledge. Previous studies on ambiguity and Information
Retrieval (IR) suggested that disambiguation may be useful in specific IR
scenarios. We suppose that GIR may constitute such a scenario. This preliminary
study was carried out over the WordNet-based, manually disambiguated collection
developed for the CLIR-WSD task, using the GeoCLEF collection of 100
geographically related topics. The employed GIR system was based on the
GeoWorSE system that participated in GeoCLEF 2008. The experiments were carried
out considering the manual disambiguation and comparing this result with those
obtained by randomly disambiguating the document collection and those obtained
by using always the most common referent. The obtained results show no
significant difference in the overall results, although the work gave an
insight into some errors that are produced by toponym ambiguity and how they
may affect the results. These preliminary results also suggest that WordNet is
not a suitable resource for the planned research. Keywords: geographical information retrieval, information retrieval, toponym
disambiguation, word sense disambiguation, wordnet | |||
| Exploiting temporal information in retrieval of archived documents | | BIBAK | Full-Text | 848 | |
| Nattiya Kanhabua | |||
| In the text retrieval community, many researchers have demonstrated
high-quality search over a current snapshot of the Web. However, only a small
number have demonstrated high-quality search over a long-term archival domain,
where documents are preserved for a long time, i.e., ten years or more. In such a
domain, a search application is not only applicable for archivists or
historians, but also in a context of national library and enterprise search
(searching document repositories, emails, etc.). In the rest of this paper, we
will explain three problems of searching document archives and propose possible
approaches to solve these problems. Our main research question is: How to
improve the quality of search in a document archive using temporal information? Keywords: language models, query expansion, ranking, temporal search | |||
| Using document structure for automatic summarization | | BIBK | Full-Text | 849 | |
| Aurélien Bossard | |||
Keywords: automatic summarization | |||
| Topic structure for information retrieval | | BIBAK | Full-Text | 850 | |
| Jiyin He | |||
| In my research, I propose a coherence measure with the goal of discovering
and using topic structures within and between documents, and I explore its
extensions and applications in information retrieval. Keywords: semantic relatedness, topic structure | |||
| Using computational community interest as an indicator for ranking | | BIBAK | Full-Text | 851 | |
| Xiaozhong Liu | |||
| Ranking documents in response to users' information needs is a challenging
task, due, in part, to the dynamic nature of users' interests with respect to a
query. I hypothesize that the interests of a given user are similar to the
interests of the broader community of which he is a part and propose an
innovative method that uses social media to characterize the interests of the
community and use this characterization to improve future rankings. By
generating a community interest vector (CIV) for a given query, we use
community interest to alter the ranking score of individual documents retrieved
by the query. The CIV is based on a continuously updated set of recent (daily
or past few hours) user-oriented text data. The user-oriented data can be user
blogs or news tagged with user comments. Preliminary evaluation shows that the new
ranking method significantly improves ranking performance. Keywords: blog, comments, community interest, ranking, user | |||
| Affective adaptive retrieval: study of emotion in adaptive retrieval | | BIBK | Full-Text | 852 | |
| Yashar Moshfeghi | |||
Keywords: adaptive retrieval, affect, emotion, filtering, personalisation,
recommendation | |||
| A study on performance volatility in information retrieval | | BIBAK | Full-Text | 853 | |
| Mehdi Hosseini | |||
| A common practice in comparative evaluation of information retrieval (IR)
systems is to create a test collection comprising a set of topics (queries), a
document corpus, and relevance judgments, and to monitor the performance of
retrieval systems over such a collection. A typical evaluation of a system
involves computing a performance metric, e.g., Average Precision (AP), for each
topic and then using the average performance metric, e.g., Mean Average
Precision (MAP), to express overall system performance. However, averages do
not capture all the important aspects of system performance, and used alone may
not thoroughly express system effectiveness; i.e., an average can mask large
variance in individual topic effectiveness. The author's hypothesis is that, in
addition to the average of overall performance, attention needs to be paid to
how a system's performance varies across topics. This variability can be
measured by calculating the standard deviation (SD) of individual performance
scores. We refer to this performance variation as Volatility. Keywords: evaluation, prediction, volatility | |||
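The proposed measure is easy to state concretely. A minimal sketch with invented per-topic AP values: report the standard deviation of per-topic scores alongside their mean.

```python
# Volatility = standard deviation of per-topic effectiveness, reported with MAP.
import statistics

ap_per_topic = [0.82, 0.10, 0.55, 0.48, 0.05, 0.91]
print("MAP =", round(statistics.mean(ap_per_topic), 3),
      "volatility (SD) =", round(statistics.stdev(ap_per_topic), 3))
```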
| Novelty detection across different source types and languages | | BIBK | Full-Text | 854 | |
| Johannes Schanda | |||
Keywords: new event detection, novelty detection | |||
| Personalizing information retrieval using task features, topic knowledge, and task product | | BIBAK | Full-Text | 855 | |
| Jingjing Liu | |||
| Personalization of information retrieval tailors search towards individual
users to meet their particular information needs. Personalization systems
obtain additional information about users and their contexts beyond the queries
they submit to the systems, and use this information to bring the desired
documents to top ranks. The additional information can come from various
sources: user preferences, user behaviors, contexts, etc. [1] To spare users
the extra effort of providing explicit preferences, most personalization
approaches have adopted an implicit strategy to obtain users' interests from
their behaviors and/or contexts, such as query history, browsing history, and
so on.
Task, topic knowledge, and desktop information have been used as evidence for personalization. Tailoring the display time threshold based on task information was found to improve implicit relevance feedback performance [5]. Users' familiarity with search topics was found to be positively correlated with reading time but negatively correlated with search efficacy [3]. This indicated the possibility of inferring topic familiarity from searching behavior. Desktop information was also found to be a good source for personalization [2, 4], and personalization using only those files relevant to user queries is more effective than using the entire desktop data [2]. Since search often happens in a work task environment, we examine how user-generated products and retained documents can help improve search performance.
To these ends, this study looks at how the following factors can help personalize search: features of the user's work tasks (including task stage and task type), the user's familiarity with the work task topic, the user's saving and using behaviors, and the task product(s) that the user generated for the work task. Work tasks are designed to include multiple sub-tasks, each being a stage. Two types of sub-task interdependence are considered: parallel, where the sub-tasks do not depend upon each other, and dependent, where one sub-task depends upon the accomplishment of other sub-task(s). The study examines the interaction effects of these factors, dwell time, and document usefulness. It also looks at a personalization technique that extracts terms for query expansion from work task product(s) and user behaviors. There are three research questions: RQ1: Does the stage of the user's task help predict document usefulness from dwell time in the parallel and the dependent tasks, respectively? RQ2: Does the user's familiarity with the work task topic help predict document usefulness from dwell time in the parallel and the dependent tasks, respectively? RQ3: Do the user's task product(s) and saving and using behaviors help with query disambiguation?
Twenty-four participants are recruited, each coming three times (as three experiment sessions) to a usability laboratory to work on three sub-tasks of a general task, either a parallel or a dependent one. Take the parallel task as an example: it asks the participants to write a three-section article on hybrid cars, and each section is finished in one session. The three sections focus on the Honda Civic sedan hybrid, the Nissan Altima sedan hybrid, and the Toyota Camry sedan hybrid, respectively. When searching for information, half of the participants use a query expansion condition, where the system recommends search terms based on their work in previous sessions, and the other half use a non-query-expansion system condition. Data are collected by three major means: logging software that records user-system interactions, an eye tracker that records eye movements, and questionnaires that elicit users' background information and their perceptions of a number of aspects. The results will provide new evidence on personalizing search by taking into account the examined contextual factors. Keywords: contextual factors in IR, personalization of IR, task | |||
| Exploiting memory cues in personal lifelog retrieval | | BIBAK | Full-Text | 856 | |
| Yi Chen | |||
| In recent years personal lifelogs (PLs) have become an emerging field of
research. PLs are collections of digital data taken from an individual's life
experiences, gathered from both digital and physical worlds. PL collections
can potentially be gathered over periods of years and thus can be very large.
In our research group, four researchers have been engaged in collecting
individual one-year-long PL data sets. These collections include logs of
computer and mobile phone activities, digital photos, in particular passively
captured Microsoft SenseCam images, geo-location (via GPS), surrounding people
or objects (via Bluetooth), and various biometric data. The complex data types
and heterogeneous structure of this corpus brings great challenges to
traditional content based information retrieval (IR). Yet, the rich connections
integral to personal experience offer exciting potential opportunities to
leverage features from human memory and associated models to support retrieval.
My PhD project aims to develop an interface to assist IR from PLs. In doing this, I plan to exploit features of human memory, in particular the mechanisms in associative memory models. Previous studies in personal information re-finding have explored the use of generally well-remembered attributes or metadata of the search targets, such as date, item type/format, and authors of documents [1]. There have also been systems which utilize associated computer items or real-life events (e.g. [2, 3]) to assist re-finding tasks. However, few of them looked into exactly what types of associated items/events people tend to recall. I plan to explore associations among PL items, as well as their attributes, regarding their role in an individual's memory, since I believe that some associations and types of metadata which are available and feasible for use may have been omitted in existing systems, because in previous research the users' behaviour may have been guided by the searching or management tools available to them.
As indicated by some information seeking studies (e.g. [4]), different search contexts, search motivations, or personal differences such as habits may lead to varied recall of contents and varied information seeking behaviours. For this reason, I will also investigate the influences of context, lifestyle, and differences in prior personal experience of IR tools on personal information re-finding behaviour. Results from these studies will be used to explore personalisation in search, e.g. to dynamically increase the importance of geo-location in scoring of search results for subjects who travel frequently.
As indicated by [4], people tend to make small steps to approach the targets they are looking for, rather than trying to do this in a single search with a "perfect query" comprising all of the relevant details, partially because of their trouble in recalling them. To relieve users of the heavy cognitive burden of recalling the exact target, and to reduce the rate of inaccurate queries caused by false recall, my proposed interface will be based on browsing and recognizing instead of traditional recalling and searching. For example, a user will be able to browse and narrow results by recognizing landmarks and estimating the target activities' time range from the user's digital or physical life [5].
An important issue in my work will be to consider the challenges of evaluating my work with only a very limited number of PL datasets. To partially address this issue, I am currently engaged in a number of smaller-scale diary studies of searching experiences with larger numbers of subjects. Keywords: associative memory, interactive IR, personal lifelogs | |||