[1]
Predicting Search Satisfaction Metrics with Interleaved Comparisons
Session 6A: Experiment Design
/
Schuth, Anne
/
Hofmann, Katja
/
Radlinski, Filip
Proceedings of the 2015 Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2015-08-09
p.463-472
© Copyright 2015 ACM
Summary: The gold standard for online retrieval evaluation is AB testing. Rooted in
the idea of a controlled experiment, AB tests compare the performance of an
experimental system (treatment) on one sample of the user population, to that
of a baseline system (control) on another sample. Given an online evaluation
metric that accurately reflects user satisfaction, these tests enjoy high
validity. However, due to the high variance across users, these comparisons
often have low sensitivity, requiring millions of queries to detect
statistically significant differences between systems. Interleaving is an
alternative online evaluation approach, where each user is presented with a
combination of results from both the control and treatment systems. Compared to
AB tests, interleaving has been shown to be substantially more sensitive.
However, interleaving methods have so far focused on user clicks only, and lack
support for more sophisticated user satisfaction metrics as used in AB testing.
In this paper we present the first method for integrating user satisfaction
metrics with interleaving. We show (1) how interleaving can be extended to
directly match the user signals and parameters of AB metrics, and (2) how
parameterized interleaving credit functions can be automatically calibrated to
predict AB outcomes. We also develop a new method for estimating the relative
sensitivity of interleaving and AB metrics, and show that our interleaving
credit functions improve agreement with AB metrics without sacrificing
sensitivity. Our results, using 38 large-scale online experiments encompassing
over 3 billion clicks in a web search setting, demonstrate up to a 22%
improvement in agreement with AB metrics (constituting over a 50% error
reduction), while maintaining sensitivity of one to two orders of magnitude
above the AB tests. This paves the way towards more sensitive and accurate
online evaluation.
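As a point of reference for the interleaving setup described above, the sketch
below shows a team-draft-style interleaving of a control ranking and a
treatment ranking; clicks on the blended list are then credited to whichever
ranker contributed each slot. This is a standard construction shown for
illustration only, not the paper's calibrated method.

    import random

    def team_draft_interleave(ranking_a, ranking_b, length=10):
        """Blend two rankings; return the shown list and the team ("A"/"B") per slot."""
        interleaved, teams, shown = [], [], set()
        count_a = count_b = 0
        while len(interleaved) < length:
            # Highest-ranked document from each ranker not yet shown.
            next_a = next((d for d in ranking_a if d not in shown), None)
            next_b = next((d for d in ranking_b if d not in shown), None)
            if next_a is None and next_b is None:
                break
            # The team that has contributed fewer results picks next (random tie-break).
            pick_a = count_a < count_b or (count_a == count_b and random.random() < 0.5)
            if pick_a and next_a is not None:
                interleaved.append(next_a); teams.append("A"); shown.add(next_a); count_a += 1
            elif next_b is not None:
                interleaved.append(next_b); teams.append("B"); shown.add(next_b); count_b += 1
            else:  # ranking_b exhausted, fall back to ranking_a
                interleaved.append(next_a); teams.append("A"); shown.add(next_a); count_a += 1
        return interleaved, teams

Per-query preferences derived from clicks on the blended list can then be
aggregated across impressions; entry [2] below sketches one way such click
credit can be parameterized.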
[2]
Online Search Evaluation with Interleaving
OOEW 2015
/
Radlinski, Filip
Companion Proceedings of the 2015 International Conference on the World Wide
Web
2015-05-18
v.2
p.917
© Copyright 2015 ACM
Summary: Online evaluation allows information retrieval systems to be assessed based
on how real users respond to the search results they are shown. Compared with
traditional offline evaluation based on manual relevance assessments, online
evaluation is particularly attractive in settings where reliable assessments
are difficult or too expensive to obtain. However, the successful use of online
evaluation requires the right metrics to be used, as real user behaviour is
often difficult to interpret. I will present interleaving, a sensitive online
evaluation approach that creates paired comparisons for every user query, and
compare it with alternative A/B online evaluation approaches. I will also show
how interleaving can be parameterized to create a family of evaluation metrics
that can be chosen to best match the goals of an evaluation.
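To make the idea of a parameterized family of interleaving metrics concrete,
here is a hedged sketch in which each click earns credit that depends on logged
signals such as dwell time; the signals and weights (w_dwell, w_last) are
hypothetical placeholders rather than parameters prescribed by the talk.

    def click_credit(click, w_dwell=0.01, w_last=0.5, dwell_cap=300):
        """Credit for one click; `click` is a dict of logged signals."""
        credit = 1.0
        credit += w_dwell * min(click.get("dwell_seconds", 0), dwell_cap)
        credit += w_last if click.get("is_last_click", False) else 0.0
        return credit

    def query_preference(teams, clicks):
        """Per-query outcome for one interleaved impression.

        `teams` maps each shown position to "A" or "B"; `clicks` is a list of
        dicts like {"position": 2, "dwell_seconds": 45, "is_last_click": True}.
        Returns +1 (A preferred), -1 (B preferred) or 0 (tie)."""
        credit = {"A": 0.0, "B": 0.0}
        for c in clicks:
            credit[teams[c["position"]]] += click_credit(c)
        return (credit["A"] > credit["B"]) - (credit["A"] < credit["B"])

Choosing the weights amounts to choosing one member of the family of metrics;
calibrating them against a target A/B metric is the kind of step described in
entry [1] above.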
[3]
Relevance and Effort: An Analysis of Document Utility
IR Session 1: IR Evaluation
/
Yilmaz, Emine
/
Verma, Manisha
/
Craswell, Nick
/
Radlinski, Filip
/
Bailey, Peter
Proceedings of the 2014 ACM Conference on Information and Knowledge
Management
2014-11-03
p.91-100
© Copyright 2014 ACM
Summary: In this paper, we study one important source of the mismatch between user
data and relevance judgments: the high degree of effort required of users to
identify and consume the information in a document. Information retrieval
relevance judges are trained to search for evidence of relevance when
assessing documents. For complex documents, this can lead to judges spending
substantial time considering each document. However, in practice, search users
substantial time considering each document. However, in practice, search users
are often much more impatient: if they do not see evidence of relevance
quickly, they tend to give up.
Relevance judgments sit at the core of test collection construction, and are
assumed to model the utility of documents to real users. However, comparisons
of judgments with signals of relevance obtained from real users, such as click
counts and dwell time, have demonstrated a systematic mismatch.
Our results demonstrate that the amount of effort required to find the
relevant information in a document plays an important role in the utility of
that document to a real user. This effort is ignored in the way relevance
judgments are currently obtained, despite the expectation that judges inform us
about real users. We propose that if the goal is to evaluate the likelihood of
utility to the user, effort as well as relevance should be taken into
consideration, and possibly characterized independently, when judgments are
obtained.
[4]
An Eye-tracking Study of User Interactions with Query Auto Completion
IR Session 5: Users
/
Hofmann, Katja
/
Mitra, Bhaskar
/
Radlinski, Filip
/
Shokouhi, Milad
Proceedings of the 2014 ACM Conference on Information and Knowledge
Management
2014-11-03
p.549-558
© Copyright 2014 ACM
Summary: Query Auto Completion (QAC) suggests possible queries to web search users
from the moment they start entering a query. This popular feature of web search
engines is thought to reduce physical and cognitive effort when formulating a
query.
Perhaps surprisingly, despite QAC being widely used, users' interactions
with it are poorly understood. This paper begins to address this gap. We
present the results of an in-depth user study of user interactions with QAC in
web search. While study participants completed web search tasks, we recorded
their interactions using eye-tracking and client-side logging. This allows us
to provide a first look at how users interact with QAC. We specifically focus
on the effects of QAC ranking, by controlling the quality of the ranking in a
within-subject design.
We identify a strong position bias that is consistent across ranking
conditions. Due to this strong position bias, ranking quality affects QAC
usage. We also find an effect on task completion, in particular on the number
of result pages visited. We show how these effects can be explained by a
combination of searchers' behavior patterns, namely monitoring or ignoring QAC,
and searching for spelling support or complete queries to express a search
intent. We conclude the paper with a discussion of the important implications
of our findings for QAC evaluation.
[5]
On user interactions with query auto-completion
Poster session (short papers)
/
Mitra, Bhaskar
/
Shokouhi, Milad
/
Radlinski, Filip
/
Hofmann, Katja
Proceedings of the 2014 Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2014-07-06
p.1055-1058
© Copyright 2014 ACM
Summary: Query Auto-Completion (QAC) is a popular feature of web search engines that
aims to help users formulate queries faster and avoid spelling mistakes by
presenting possible completions as soon as they start typing.
However, despite the wide adoption of auto-completion in search systems, there
is little published on how users interact with such services.
In this paper, we present the first large-scale study of user interactions
with auto-completion based on query logs of Bing, a commercial search engine.
Our results confirm that lower-ranked auto-completion suggestions receive
substantially lower engagement than those ranked higher. We also observe that
users are most likely to engage with auto-completion after typing about half of
the query, and in particular at word boundaries. Interestingly, we also noticed
that the likelihood of using auto-completion varies with the distance of query
characters on the keyboard.
Overall, we believe that the results reported in our study provide valuable
insights for understanding user engagement with auto-completion, and are likely
to inform the design of more effective QAC systems.
[6]
On correlation of absence time and search effectiveness
Poster session (short papers)
/
Chakraborty, Sunandan
/
Radlinski, Filip
/
Shokouhi, Milad
/
Baecke, Paul
Proceedings of the 2014 Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2014-07-06
p.1163-1166
© Copyright 2014 ACM
Summary: Online search evaluation metrics are typically derived based on implicit
feedback from the users. For instance, computing the number of page clicks,
number of queries, or dwell time on a search result. In a recent paper, Dupret
and Lalmas introduced a new metric called absence time, which uses the time
interval between successive sessions of users to measure their satisfaction
with the system. They evaluated this metric on a version of Yahoo! Answers. In
this paper, we investigate the effectiveness of absence time in evaluating new
features in a web search engine, such as a new ranking algorithm or a new user
interface. We measured how absence time responded to 21 experiments
performed on a search engine. Our findings show that the outcomes
of absence time agreed with the judgement of human experts performing a
thorough analysis of a wide range of online and offline metrics in 14 out of
these 21 cases.
We also investigated the relationship between absence time and a set of
commonly-used covariates (features) such as the number of queries and clicks in
the session. Our results suggest that users are likely to return to the search
engine sooner when their previous session has more queries and more clicks.
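For illustration, absence time can be computed from session logs roughly as
follows: for each user, take the gap between the end of one session and the
start of their next. The log schema assumed here (user_id, start, end
timestamps) is a simplification for the sketch, not the paper's data format.

    from collections import defaultdict

    def absence_times(sessions):
        """sessions: iterable of dicts {"user_id": ..., "start": datetime, "end": datetime}.
        Returns absence times in seconds, one per pair of consecutive sessions."""
        by_user = defaultdict(list)
        for s in sessions:
            by_user[s["user_id"]].append(s)
        gaps = []
        for user_sessions in by_user.values():
            user_sessions.sort(key=lambda s: s["start"])
            for prev, nxt in zip(user_sessions, user_sessions[1:]):
                gaps.append((nxt["start"] - prev["end"]).total_seconds())
        return gaps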
[7]
Choices and constraints: research goals and approaches in information
retrieval (part 1)
Tutorials
/
Kelly, Diane
/
Radlinski, Filip
/
Teevan, Jaime
Proceedings of the 2014 Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2014-07-06
p.1283
© Copyright 2014 ACM
Summary: All research projects begin with a goal, for instance to describe search
behavior, to predict when a person will enter a second query, or to discover
which IR system performs the best. Different research goals suggest different
research approaches, ranging from field studies to lab studies to online
experimentation. This tutorial will provide an overview of the different types
of research goals, common evaluation approaches used to address each type, and
the constraints each approach entails. Participants will come away with a broad
perspective of research goals and approaches in IR, and an understanding of the
benefits and limitations of these research approaches. The tutorial will take
place in two independent, but interrelated parts, each focusing on a unique set
of research approaches but with the same intended tutorial outcomes. These
outcomes will be accomplished by deconstructing and analyzing our own published
research papers, with further illustrations of each technique using the broader
literature. By using our own research as anchors, we will provide insight about
the research process, revealing the difficult choices and trade-offs
researchers make when designing and conducting IR studies.
[8]
Choices and constraints: research goals and approaches in information
retrieval (part 2)
Tutorials
/
Kelly, Diane
/
Radlinski, Filip
/
Teevan, Jaime
Proceedings of the 2014 Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2014-07-06
p.1284
© Copyright 2014 ACM
Summary: All research projects begin with a goal, for instance to describe search
behavior, to predict when a person will enter a second query, or to discover
which IR system performs the best. Different research goals suggest different
research approaches, ranging from field studies to lab studies to online
experimentation. This tutorial will provide an overview of the different types
of research goals, common evaluation approaches used to address each type, and
the constraints each approach entails. Participants will come away with a broad
perspective of research goals and approaches in IR, and an understanding of the
benefits and limitations of these research approaches. The tutorial will take
place in two independent, but interrelated parts, each focusing on a unique set
of research approaches but with the same intended tutorial outcomes. These
outcomes will be accomplished by deconstructing and analyzing our own published
research papers, with further illustrations of each technique using the broader
literature. By using our own research as anchors, we will provide insight about
the research process, revealing the difficult choices and trade-offs
researchers make when designing and conducting IR studies.
[9]
Fighting search engine amnesia: reranking repeated results
Users and interactive IR II
/
Shokouhi, Milad
/
White, Ryen W.
/
Bennett, Paul
/
Radlinski, Filip
Proceedings of the 2013 Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2013-07-28
p.273-282
© Copyright 2013 ACM
Summary: Web search engines frequently show the same documents repeatedly for
different queries within the same search session, in essence forgetting that
the same documents were already shown to the user. Depending on previous user
interaction with the repeated results, and the details of the session, we show
that sometimes the repeated results should be promoted, while some other times
they should be demoted.
Analysing search logs from two different commercial search engines, we find
that results are repeated in about 40% of multi-query search sessions, and that
users engage differently with repeats than with results shown for the first
time. We demonstrate how statistics about result repetition within search
sessions can be incorporated into ranking for personalizing search results. Our
results on query logs of two large-scale commercial search engines suggest that
we successfully promote documents that are more likely to be clicked by the
user in the future while maintaining performance over standard measures of
non-personalized relevance.
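The sketch below illustrates the kind of per-document repetition statistics
such a ranker could consume: how often a document was shown, clicked, or
skipped earlier in the same session. These features are illustrative
placeholders, not the paper's exact feature set.

    def repetition_features(doc_id, session_history):
        """session_history: earlier queries in the session, each a dict
        {"shown": set_of_doc_ids, "clicked": set_of_doc_ids}."""
        shown_before = sum(doc_id in q["shown"] for q in session_history)
        clicked_before = sum(doc_id in q["clicked"] for q in session_history)
        return {
            "times_shown_before": shown_before,
            "times_clicked_before": clicked_before,                  # evidence for promotion
            "times_skipped_before": shown_before - clicked_before,   # evidence for demotion
        }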
[10]
Practical Online Retrieval Evaluation
Tutorials
/
Radlinski, Filip
/
Hofmann, Katja
Proceedings of ECIR'13, the 2013 European Conference on Information
Retrieval
2013-03-24
p.878-881
Keywords: Interleaving; Clicks; Search Engine; Online Evaluation
© Copyright 2013 Springer-Verlag
Summary: Online evaluation allows the assessment of information retrieval (IR)
techniques based on how real users respond to them. Because this technique is
directly based on observed user behavior, it is a promising alternative to
traditional offline evaluation, which is based on manual relevance assessments.
In particular, online evaluation can enable comparisons in settings where
reliable assessments are difficult to obtain (e.g., personalized search) or
expensive (e.g., for search by trained experts in specialized collections).
Despite its advantages, and its successful use in commercial settings,
online evaluation is rarely employed outside of large commercial search engines
due to a perception that it is impractical at small scales. The goal of this
tutorial is to show how online evaluations can be conducted in such settings,
demonstrate software to facilitate its use, and promote further research in the
area. We will also contrast online evaluation with standard offline evaluation,
and provide an overview of online approaches.
[11]
On caption bias in interleaving experiments
IR track: evaluation methodologies
/
Hofmann, Katja
/
Behr, Fritz
/
Radlinski, Filip
Proceedings of the 2012 ACM Conference on Information and Knowledge
Management
2012-10-29
p.115-124
© Copyright 2012 ACM
Summary: Information retrieval evaluation most often involves manually assessing the
relevance of particular query-document pairs. In cases where this is difficult
(such as personalized search), interleaved comparison methods are becoming
increasingly common. These methods compare pairs of ranking functions based on
user clicks on search results, thus better reflecting true user preferences.
However, by depending on clicks, there is a potential for bias. For example,
users have been previously shown to be more likely to click on results with
attractive titles and snippets. An interleaving evaluation where one ranker
tends to generate results that attract more clicks (without being more
relevant) may thus be biased.
We present an approach for detecting and compensating for this type of bias
in interleaving evaluations. Introducing a new model of caption bias, we
propose features that model bias based on (1) per-document effects, and (2) the
(pairwise) relationships between a document and surrounding documents. We show
that our model can effectively capture click behavior, with best results
achieved by a model that combines both per-document and pairwise features.
Applying this model to re-weight observed user clicks, we find a small overall
effect on real interleaving comparisons, but also identify a case where
initially detected preferences vanish after caption bias re-weighting is
applied. Our results indicate that our model of caption bias is effective and
can successfully identify interleaving experiments affected by caption bias.
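A minimal sketch of the re-weighting step: if a caption-bias model estimates
how likely a result is to be clicked on caption attractiveness alone, each
observed click can be down-weighted by the inverse of that estimate before
interleaving credit is assigned. The bias_model interface and field names here
are hypothetical, not the paper's implementation.

    def debiased_click_weight(caption_features, bias_model, min_p=0.05):
        """Inverse-attractiveness weight for one clicked result; bias_model.predict_proba
        is a hypothetical interface returning the modeled click probability due to the
        caption alone."""
        p_attract = max(bias_model.predict_proba(caption_features), min_p)
        return 1.0 / p_attract

    def debiased_credit(teams, clicks, bias_model):
        """Aggregate re-weighted click credit per team for one interleaved impression."""
        credit = {"A": 0.0, "B": 0.0}
        for c in clicks:
            weight = debiased_click_weight(c["caption_features"], bias_model)
            credit[teams[c["position"]]] += weight
        return credit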
[12]
Large-scale validation and analysis of interleaved search evaluation
/
Chapelle, Olivier
/
Joachims, Thorsten
/
Radlinski, Filip
/
Yue, Yisong
ACM Transactions on Information Systems
2012-02
v.30
n.1
p.6
© Copyright 2012 ACM
Summary: Interleaving is an increasingly popular technique for evaluating information
retrieval systems based on implicit user feedback. While a number of isolated
studies have analyzed how this technique agrees with conventional offline
evaluation approaches and other online techniques, a complete picture of its
efficiency and effectiveness is still lacking. In this paper we extend and
combine the body of empirical evidence regarding interleaving, and provide a
comprehensive analysis of interleaving using data from two major commercial
search engines and a retrieval system for scientific literature. In particular,
we analyze the agreement of interleaving with manual relevance judgments and
observational implicit feedback measures, estimate the statistical efficiency
of interleaving, and explore the relative performance of different interleaving
variants. We also show how to learn improved credit-assignment functions for
clicks that further increase the sensitivity of interleaving.
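To make the notion of statistical efficiency concrete, a common way to analyse
an interleaving experiment is a sign test over per-query preferences, sketched
below. This is a standard analysis given for illustration (it assumes SciPy is
available), not a claim about the paper's exact methodology.

    from scipy.stats import binomtest

    def interleaving_significance(preferences):
        """preferences: iterable of +1 (treatment wins), -1 (control wins), 0 (tie).
        Returns (treatment win rate over non-ties, two-sided p-value)."""
        wins = sum(1 for p in preferences if p > 0)
        losses = sum(1 for p in preferences if p < 0)
        n = wins + losses  # ties carry no preference and are discarded
        if n == 0:
            return 0.5, 1.0
        result = binomtest(wins, n, p=0.5, alternative="two-sided")
        return wins / n, result.pvalue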
[13]
Inferring and using location metadata to personalize web search
Personalization
/
Bennett, Paul N.
/
Radlinski, Filip
/
White, Ryen W.
/
Yilmaz, Emine
Proceedings of the 34th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2011-07-25
p.135-144
© Copyright 2011 ACM
Summary: Personalization of search results offers the potential for significant
improvements in Web search. Among the many observable user attributes,
approximate user location is particularly simple for search engines to obtain
and allows personalization even for a first-time Web search user. However,
acting on user location information is difficult, since few Web documents
include an address that can be interpreted as constraining the locations where
the document is relevant. Furthermore, many Web documents -- such as local news
stories, lottery results, and sports team fan pages -- may not correspond to
physical addresses, but the location of the user still plays an important role
in document relevance. In this paper, we show how to infer location relevance
using not only a document's physical location but a more general notion of the
locations of interest for Web pages. We compute this information
using implicit user behavioral data, characterize the most location-centric
pages, and show how location information can be incorporated into Web search
ranking. Our results show that a substantial fraction of Web search queries can
be significantly improved by incorporating location-based features.
[14]
Practical online retrieval evaluation
Tutorials
/
Radlinski, Filip
/
Yue, Yisong
Proceedings of the 34th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2011-07-25
p.1301-1302
© Copyright 2011 ACM
Summary: Online evaluation is amongst the few evaluation techniques available to the
information retrieval community that are guaranteed to reflect how users
actually respond to improvements developed by the community. Broadly speaking,
online evaluation refers to any evaluation of retrieval quality conducted while
observing user behavior in a natural context. However, it is rarely employed
outside of large commercial search engines due primarily to a perception that
it is impractical at small scales. The goal of this tutorial is to familiarize
information retrieval researchers with state-of-the-art techniques in
evaluating information retrieval systems based on natural user clicking
behavior, as well as to show how such methods can be practically deployed. In
particular, our focus will be on demonstrating how the Interleaving approach
and other click based techniques contrast with traditional offline evaluation,
and how these online methods can be effectively used in academic-scale
research. In addition to lecture notes, we will also provide sample software
and code walk-throughs to showcase the ease with which Interleaving and other
click-based methods can be employed by students, academics and other
researchers.
[15]
Comparing the sensitivity of information retrieval metrics
Non-English IR & evaluation
/
Radlinski, Filip
/
Craswell, Nick
Proceedings of the 33rd Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2010-07-19
p.667-674
Keywords: evaluation, interleaving, search
© Copyright 2010 ACM
Summary: Information retrieval effectiveness is usually evaluated using measures such
as Normalized Discounted Cumulative Gain (NDCG), Mean Average Precision (MAP)
and Precision at some cutoff (Precision@k) on a set of judged queries. Recent
research has suggested an alternative, evaluating information retrieval systems
based on user behavior. Particularly promising are experiments that interleave
two rankings and track user clicks. According to a recent study, interleaving
experiments can identify large differences in retrieval effectiveness with much
better reliability than other click-based methods.
We study interleaving in more detail, comparing it with traditional measures
in terms of reliability, sensitivity and agreement. To detect very small
differences in retrieval effectiveness, a reliable outcome with standard
metrics requires about 5,000 judged queries, and this is about as reliable as
interleaving with 50,000 user impressions. Amongst the traditional measures,
NDCG has the strongest correlation with interleaving. Finally, we present some
new forms of analysis, including an approach to enhance interleaving
sensitivity.
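For reference, NDCG, the offline measure found here to correlate most strongly
with interleaving, can be computed as below. This uses the common
exponential-gain, log-discount formulation; other variants exist.

    import math

    def dcg_at_k(gains, k):
        """gains: graded relevance labels of the ranked results, in rank order."""
        return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains[:k]))

    def ndcg_at_k(gains, k):
        ideal = dcg_at_k(sorted(gains, reverse=True), k)
        return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0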
[16]
Metrics for assessing sets of subtopics
Poster presentations
/
Radlinski, Filip
/
Szummer, Martin
/
Craswell, Nick
Proceedings of the 33rd Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2010-07-19
p.853-854
Keywords: diversity, novelty, subtopic
© Copyright 2010 ACM
Summary: To evaluate the diversity of search results, test collections have been
developed that identify multiple intents for each query. Intents are the
different meanings or facets that should be covered in a search results list.
This means that topic development involves proposing a set of intents for each
query. We propose four measurable properties of query-to-intent mappings,
allowing for more principled topic development for such test collections.
[17]
Inferring query intent from reformulations and clicks
WWW posters
/
Radlinski, Filip
/
Szummer, Martin
/
Craswell, Nick
Proceedings of the 2010 International Conference on the World Wide Web
2010-04-26
v.1
p.1171-1172
Keywords: diversity, intents, subtopics
© Copyright 2010 ACM
Summary: Many researchers have noted that web search queries are often ambiguous or
unclear. We present an approach for identifying the popular meanings of queries
using web search logs and user click behavior. Evaluating on TREC queries, we
show that our approach produces more complete and user-centric intents than
expert judges. This approach was also used by the TREC 2009 Web Track judges to
obtain more representative topic descriptions from real queries.
[18]
How does clickthrough data reflect retrieval quality?
IR: web search 1
/
Radlinski, Filip
/
Kurup, Madhu
/
Joachims, Thorsten
Proceedings of the 2008 ACM Conference on Information and Knowledge
Management
2008-10-26
p.43-52
© Copyright 2008 ACM
Summary: Automatically judging the quality of retrieval functions based on observable
user behavior holds promise for making retrieval evaluation faster, cheaper,
and more user centered. However, the relationship between observable user
behavior and retrieval quality is not yet fully understood. We present a
sequence of studies investigating this relationship for an operational search
engine on the arXiv.org e-print archive. We find that none of the eight
absolute usage metrics we explore (e.g., number of clicks, frequency of query
reformulations, abandonment) reliably reflect retrieval quality for the sample
sizes we consider. However, we find that paired experiment designs adapted from
sensory analysis produce accurate and reliable statements about the relative
quality of two retrieval functions. In particular, we investigate two paired
comparison tests that analyze clickthrough data from an interleaved
presentation of ranking pairs, and we find that both give accurate and
consistent results. We conclude that both paired comparison tests give
substantially more accurate and sensitive evaluation results than absolute
usage metrics in our domain.
[19]
Optimizing relevance and revenue in ad search: a query substitution approach
Non-topicality
/
Radlinski, Filip
/
Broder, Andrei
/
Ciccolo, Peter
/
Gabrilovich, Evgeniy
/
Josifovski, Vanja
/
Riedel, Lance
Proceedings of the 31st Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2008-07-20
p.403-410
© Copyright 2008 ACM
Summary: The primary business model behind Web search is based on textual
advertising, where contextually relevant ads are displayed alongside search
results. We address the problem of selecting these ads so that they are both
relevant to the queries and profitable to the search engine, showing that
optimizing ad relevance and revenue is not equivalent. Selecting the best ads
that satisfy these constraints also naturally incurs high computational costs,
and time constraints can lead to reduced relevance and profitability. We
propose a novel two-stage approach, which conducts most of the analysis ahead
of time. An offline preprocessing phase leverages additional knowledge that is
impractical to use in real time, and rewrites frequent queries in a way that
subsequently facilitates fast and accurate online matching. Empirical
evaluation shows that our method optimized for relevance matches a
state-of-the-art method while improving expected revenue. When optimizing for
revenue, we see even more substantial improvements in expected revenue.
[20]
A support vector method for optimizing average precision
Learning to rank I
/
Yue, Yisong
/
Finley, Thomas
/
Radlinski, Filip
/
Joachims, Thorsten
Proceedings of the 30th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2007-07-23
p.271-278
© Copyright 2007 ACM
Summary: Machine learning is commonly used to improve ranked retrieval systems. Due
to computational difficulties, few learning techniques have been developed to
directly optimize for mean average precision (MAP), despite its widespread use
in evaluating such systems. Existing approaches optimizing MAP either do not
find a globally optimal solution, or are computationally expensive. In
contrast, we present a general SVM learning algorithm that efficiently finds a
globally optimal solution to a straightforward relaxation of MAP. We evaluate
our approach using the TREC 9 and TREC 10 Web Track corpora (WT10g), comparing
against SVMs optimized for accuracy and ROCArea. In most cases we show our
method to produce statistically significant improvements in MAP scores.
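For reference, the quantity being optimized is (mean) average precision,
computed as in this sketch; the SVM formulation itself is not reproduced here.

    def average_precision(relevance, n_relevant=None):
        """relevance: 0/1 labels of the ranked results, in rank order.
        n_relevant: total relevant documents for the query (defaults to the number
        of relevant results appearing in the ranking)."""
        hits, precision_sum = 0, 0.0
        for rank, rel in enumerate(relevance, start=1):
            if rel:
                hits += 1
                precision_sum += hits / rank
        denom = n_relevant if n_relevant is not None else hits
        return precision_sum / denom if denom else 0.0

    def mean_average_precision(per_query_relevance):
        """per_query_relevance: list of per-query relevance label lists."""
        return sum(average_precision(r) for r in per_query_relevance) / len(per_query_relevance)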
[21]
Recommending related papers based on digital library access records
Historical digital libraries
/
Pohl, Stefan
/
Radlinski, Filip
/
Joachims, Thorsten
JCDL'07: Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital
Libraries
2007-06-18
p.417-418
© Copyright 2007 ACM
Summary: An important goal for digital libraries is to enable researchers to more
easily explore related work. While citation data is often used as an indicator
of relatedness, in this paper we demonstrate that digital access records (e.g.
http-server logs) can be used as indicators as well. In particular, we show
that measures based on co-access provide better coverage than co-citation, that
they are available much sooner, and that they are more accurate for recent
papers.
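A minimal sketch of the co-access idea, assuming the access log can be reduced
to (session, paper) pairs: two papers are counted as co-accessed whenever the
same session touches both, analogous to co-citation but available as soon as
papers are downloaded.

    from collections import defaultdict
    from itertools import combinations

    def co_access_counts(access_log):
        """access_log: iterable of (session_id, paper_id) pairs.
        Returns {(paper_a, paper_b): number of sessions accessing both}."""
        papers_by_session = defaultdict(set)
        for session_id, paper_id in access_log:
            papers_by_session[session_id].add(paper_id)
        counts = defaultdict(int)
        for papers in papers_by_session.values():
            for a, b in combinations(sorted(papers), 2):
                counts[(a, b)] += 1
        return counts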
[22]
Evaluating the accuracy of implicit feedback from clicks and query
reformulations in Web search
/
Joachims, Thorsten
/
Granka, Laura
/
Pan, Bing
/
Hembrooke, Helene
/
Radlinski, Filip
/
Gay, Geri
ACM Transactions on Information Systems
2007
v.25
n.2
p.7
© Copyright 2007 ACM
Summary: This article examines the reliability of implicit feedback generated from
clickthrough data and query reformulations in World Wide Web (WWW) search.
Analyzing the users' decision process using eyetracking and comparing implicit
feedback against manual relevance judgments, we conclude that clicks are
informative but biased. While this makes the interpretation of clicks as
absolute relevance judgments difficult, we show that relative preferences
derived from clicks are reasonably accurate on average. We find that such
relative preferences are accurate not only between results from an individual
query, but across multiple sets of results within chains of query
reformulations.
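As an illustration of deriving relative preferences from clicks, the sketch
below implements a "click > skip above"-style rule: a clicked result is
inferred to be preferred over higher-ranked results the user skipped. It is
given only to make the idea concrete, not as the article's exact procedure.

    def click_skip_above_preferences(ranking, clicked_positions):
        """ranking: doc ids in rank order; clicked_positions: 0-based ranks that were clicked.
        Returns (preferred_doc, less_preferred_doc) pairs."""
        clicked = set(clicked_positions)
        preferences = []
        for pos in sorted(clicked):
            for above in range(pos):
                if above not in clicked:  # a skipped result ranked higher
                    preferences.append((ranking[pos], ranking[above]))
        return preferences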
[23]
Improving personalized web search using result diversification
Posters
/
Radlinski, Filip
/
Dumais, Susan
Proceedings of the 29th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2006-08-06
p.691-692
© Copyright 2006 ACM
Summary: We present and evaluate methods for diversifying search results to improve
personalized web search. A common personalization approach involves reranking
the top N search results such that documents likely to be preferred by the user
are presented higher. The usefulness of reranking is limited in part by the
number and diversity of results considered. We propose three methods to
increase the diversity of the top results and evaluate the effectiveness of
these methods.
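As a rough illustration of the reranking setup, the sketch below applies a
generic greedy, MMR-style trade-off between a personalized relevance score and
similarity to already-selected results; the three methods proposed in the
paper are distinct and not reproduced here.

    def diversify(results, relevance, similarity, top_n=10, lamb=0.7):
        """results: candidate doc ids; relevance[d]: personalized score for d;
        similarity(a, b): similarity in [0, 1]. Greedily trades off relevance
        against similarity to documents already selected."""
        selected, candidates = [], list(results)
        while candidates and len(selected) < top_n:
            def mmr_score(d):
                max_sim = max((similarity(d, s) for s in selected), default=0.0)
                return lamb * relevance[d] - (1 - lamb) * max_sim
            best = max(candidates, key=mmr_score)
            selected.append(best)
            candidates.remove(best)
        return selected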