
CLEF 2011: International Conference of the Cross-Language Evaluation Forum

Fullname: CLEF 2011: Multilingual and Multimodal Information Access Evaluation: Second International Conference of the Cross-Language Evaluation Forum
Editors: Pamela Forner; Julio Gonzalo; Jaana Kekäläinen; Mounia Lalmas; Maarten de Rijke
Location: Amsterdam, Netherlands
Dates: 2011-Sep-19 to 2011-Sep-22
Publisher: Springer Berlin Heidelberg
Series: Lecture Notes in Computer Science 6941
Standard No: DOI: 10.1007/978-3-642-23708-9; ISBN: 978-3-642-23707-2 (print), 978-3-642-23708-9 (online); hcibib: CLEF11
  1. Keynote Addresses
  2. Methodologies and Lessons
  3. Language and Processing
  4. Visual and Context

Keynote Addresses

"Would You Trust Your IR System to Choose Your Date?" Re-thinking IR Evaluation in the 21st Century (p. 1)
  Elaine G. Toms
This talk examines interactive IR system evaluation from a holistic perspective, including some of the pitfalls of existing approaches and the issues involved in designing more effective evaluation processes and procedures.
Keywords: User-centred evaluation; interactive information retrieval
Crowdsourcing for Information Retrieval Experimentation and Evaluation (p. 2)
  Omar Alonso
Very recently, crowdsourcing has emerged as a viable alternative for conducting different types of experiments in a wide range of areas. Generally speaking, and in the context of IR, crowdsourcing involves outsourcing tasks to a large group of people instead of assigning such tasks to an employee or editor. The availability of commercial crowdsourcing platforms offers vast access to an on-demand workforce. This new approach makes it possible to conduct experiments extremely fast, with good results at a low cost. However, as in any experiment, there are several implementation details that can make an experiment succeed or fail. For large-scale evaluation, deployment in practice is not that simple. Tasks have to be designed carefully, with special emphasis on the user interface, instructions, content, and quality control.
   In this invited talk, I will explore some directions that may influence the outcome of a task, and I will present a framework for conducting crowdsourcing experiments, with particular emphasis on a number of aspects that should matter for all sorts of IR-like tasks. Finally, I will outline research trends around human computation that promise to make this emerging field even more interesting in the near future.
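Quality control is one of the design concerns the talk lists. As one common example (not necessarily part of the framework the talk presents), redundant judgments from several workers can be aggregated by majority vote. The following is a minimal Python sketch assuming each item has been judged by several workers; the function name `majority_vote` and the sample labels are hypothetical.

```python
from collections import Counter


def majority_vote(labels_per_item):
    """Aggregate redundant crowd judgments by simple majority.

    labels_per_item maps an item id to the list of labels that different
    workers assigned to it. Ties go to the label encountered first.
    """
    aggregated = {}
    for item, labels in labels_per_item.items():
        # most_common(1) returns [(label, count)]; among equal counts,
        # Counter keeps the order in which labels were first seen.
        aggregated[item] = Counter(labels).most_common(1)[0][0]
    return aggregated


judgments = {
    "q1-d3": ["relevant", "relevant", "not relevant"],
    "q1-d7": ["not relevant", "relevant", "not relevant"],
}
print(majority_vote(judgments))
# {'q1-d3': 'relevant', 'q1-d7': 'not relevant'}
```

Majority voting treats all workers as equally reliable; weighting workers by their agreement with gold questions is a common refinement.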

Methodologies and Lessons

Building a Cross-Language Entity Linking Collection in Twenty-One Languages (pp. 3-13)
  James Mayfield; Dawn Lawrie; Paul McNamee; Douglas W. Oard
We describe an efficient way to create a test collection for evaluating the accuracy of cross-language entity linking. Queries are created by semi-automatically identifying person names on the English side of a parallel corpus, using judgments obtained through crowdsourcing to identify the entity corresponding to the name, and projecting the English name onto the non-English document using word alignments. We applied the technique to produce the first publicly available multilingual cross-language entity linking collection. The collection includes approximately 55,000 queries, comprising between 875 and 4,329 queries for each of twenty-one non-English languages.
Keywords: Entity Linking; Cross-Language Entity Linking; Multilingual Corpora; Crowdsourcing
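The collection is built by projecting English names onto the non-English side through word alignments. The abstract does not give the exact projection rule, so the sketch below assumes alignments are available as (English index, foreign index) token pairs and takes the smallest foreign span covering all aligned tokens; `project_span` and the example sentence are hypothetical.

```python
def project_span(start, end, alignments):
    """Project an English token span [start, end) onto the aligned sentence.

    alignments is a list of (english_index, foreign_index) pairs, such as
    a word aligner produces. Returns the smallest contiguous foreign span
    (start, end) covering every token aligned to the English span, or None
    if none of the English tokens is aligned.
    """
    targets = [f for e, f in alignments if start <= e < end]
    if not targets:
        return None
    return min(targets), max(targets) + 1


# English: "President Barack Obama spoke"; the name is tokens [1, 3).
alignments = [(0, 0), (1, 2), (2, 3), (3, 4)]
print(project_span(1, 3, alignments))  # (2, 4)
```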
Search Snippet Evaluation at Yandex: Lessons Learned and Future Directions (pp. 14-25)
  Denis Savenkov; Pavel Braslavski; Mikhail Lebedev
Keywords: evaluation; snippets; search summaries; web search; experimentation
Towards a Living Lab for Information Retrieval Research and Development -- A Proposal for a Living Lab for Product Search Tasks (pp. 26-37)
  Leif Azzopardi; Krisztian Balog
The notion of having a "living lab" in which to undertake evaluations has been proposed by a number of proponents within the field of Information Retrieval (IR). However, what such a living lab might look like and how it might be set up has not been discussed in detail. Living labs have a number of appealing points, such as realistic evaluation contexts where tasks are directly linked to user experience, and closer integration of research/academia and development/industry, which facilitates more efficient knowledge transfer. However, operationalizing a living lab raises a number of concerns regarding security, privacy, etc., as well as challenges regarding the design, development and maintenance of the infrastructure required to support such evaluations. Here, we aim to further the discussion on living labs for IR evaluation and propose one possible architecture for such an evaluation environment. To focus the discussion, we put forward a proposal for a living lab on product search tasks within the context of an online shop.
A Comparison of Evaluation Metrics for Document Filtering (pp. 38-49)
  Enrique Amigó; Julio Gonzalo; Felisa Verdejo
Although document filtering is simple to define, a wide range of evaluation measures has been proposed in the literature, all of which have been subject to criticism. We present a unified, comparative view of the strengths and weaknesses of the proposed measures based on two formal constraints (which should be satisfied by any suitable evaluation measure) and various properties (which help differentiate measures according to their behaviour). We conclude that (i) some smoothing process is necessary to satisfy the basic constraints; and (ii) metrics can be grouped into three families, each satisfying one of three formal properties, which are mutually exclusive, i.e. no metric can satisfy all three properties simultaneously.
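The first conclusion is that some smoothing is needed for metrics to satisfy the basic constraints. The paper's specific smoothing is not given in the abstract; as an illustration only, the following sketch applies generic additive (Laplace-style) smoothing to precision, which in unsmoothed form is undefined for a run that accepts no documents. The function and its `alpha` parameter are hypothetical.

```python
def precision(tp, fp, alpha=0.0):
    """Precision of a binary filtering run, with optional additive smoothing.

    With alpha == 0 this is plain TP / (TP + FP), which is undefined when the
    system accepts no documents. With alpha > 0, alpha pseudo-counts are added
    to both the true-positive and false-positive counts, so the score is
    always defined and is pulled toward 0.5 when few documents are accepted.
    """
    denominator = tp + fp + 2 * alpha
    if denominator == 0:
        raise ZeroDivisionError("precision is undefined: no documents accepted")
    return (tp + alpha) / denominator


print(precision(3, 1))             # 0.75
print(precision(0, 0, alpha=1.0))  # 0.5, defined even for an empty output
```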

Language and Processing

Filter Keywords and Majority Class Strategies for Company Name Disambiguation in Twitter (pp. 50-61)
  Damiano Spina; Enrique Amigó; Julio Gonzalo
Monitoring the online reputation of a company starts by retrieving all (fresh) information in which the company is mentioned, and a major problem in this context is that company names are often ambiguous (apple may refer to the company, the fruit, the singer, etc.). The problem is particularly hard in microblogging, where there is little context for disambiguation: this was the task addressed in the WePS-3 CLEF lab exercise in 2010. This paper introduces a novel fingerprint representation technique to visualize and compare system results for the task. We apply this technique to the systems that originally participated in WePS-3, and then use it to explore the usefulness of two signals for addressing the problem: filter keywords (those whose presence in a tweet reliably signals either the positive or the negative class) and the majority class (whether positive or negative tweets predominate for a given company name in a tweet stream). Our study shows that both are key signals for solving the task. Remarkably, we also find that the vocabulary associated with a company on the Web does not seem to match the vocabulary used in Twitter streams: even a manual extraction of filter keywords from web pages has substantially lower recall than an oracle selection of the best terms from the Twitter stream.
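The two signals studied can be combined into a simple decision rule: let a filter keyword decide when one appears, and otherwise fall back to the company's majority class. This is an illustrative sketch, not the paper's system; the whitespace tokenization, the order of the keyword checks and the keyword sets are all assumptions.

```python
def classify_tweet(tweet, positive_keywords, negative_keywords, majority_class):
    """Decide whether a tweet refers to the company (True) or not (False).

    Filter keywords decide when one occurs in the tweet (positive keywords
    are checked first); otherwise the tweet receives the majority class
    observed for this company name in the stream.
    """
    tokens = set(tweet.lower().split())
    if tokens & positive_keywords:
        return True
    if tokens & negative_keywords:
        return False
    return majority_class


# Hypothetical filter keywords for the ambiguous name "apple".
positive = {"iphone", "ipad", "mac"}
negative = {"pie", "juice", "orchard"}
print(classify_tweet("New iPhone leaked", positive, negative, True))  # True
print(classify_tweet("Apple pie recipe", positive, negative, True))   # False
print(classify_tweet("I love apple", positive, negative, True))       # True
```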
Automatic Annotation of Bibliographical References for Descriptive Language Materials (pp. 62-73)
  Harald Hammarström
The present paper considers the problem of annotating bibliographical references with labels/classes, given training data of references already annotated with labels. The problem is an instance of document categorization where the documents are short and written in a wide variety of languages. The skewed distributions of title words and labels call for special care when choosing a Machine Learning approach. The paper describes how to induce Disjunctive Normal Form formulae (DNFs), which have several advantages over Decision Trees. The approach is evaluated on a large real-world collection of bibliographical references.
Keywords: Document Categorization; Supervised Learning; Cross-Lingual Information Retrieval; Decision Trees; Language Documentation
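A DNF formula labels a reference when any one of its conjunctive clauses is fully present in the title. The induction procedure is not described in the abstract, so the sketch below only shows how an already-induced formula could be applied; the formula, the titles and the function name are hypothetical.

```python
def dnf_matches(dnf, title):
    """Return True if a reference title satisfies a DNF formula over words.

    dnf is a list of clauses; each clause is a set of words that must all
    occur in the title (a conjunction). The formula holds if at least one
    clause holds (the disjunction).
    """
    words = set(title.lower().split())
    return any(clause <= words for clause in dnf)


# Hypothetical formula for a "grammar" label over multilingual titles.
grammar_dnf = [{"grammar"}, {"grammaire"}, {"sketch", "language"}]
print(dnf_matches(grammar_dnf, "A grammar of Kalam"))              # True
print(dnf_matches(grammar_dnf, "A sketch of the Yimas language"))  # True
print(dnf_matches(grammar_dnf, "Dictionnaire du wolof"))           # False
```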
A Language-Independent Approach to Identify the Named Entities in Under-Resourced Languages and Clustering Multilingual Documents (pp. 74-82)
  N. Kiran Kumar; G. S. K. Santosh; Vasudeva Varma
This paper presents a language-independent Multilingual Document Clustering (MDC) approach on comparable corpora. Named entities (NEs) such as persons, locations and organizations play a major role in measuring document similarity. We propose a method to identify these NEs in under-resourced Indian languages (Hindi and Marathi) using the NEs present in English, a high-resource language. The identified NEs are then used to form multilingual document clusters with the Bisecting k-means clustering algorithm. We do not use any non-English linguistic tools or resources such as WordNet, Part-Of-Speech taggers or bilingual dictionaries, which makes the proposed approach completely language-independent. Experiments are conducted on a standard dataset provided by FIRE for its 2010 Ad-hoc Cross-Lingual document retrieval task on Indian languages, using the English, Hindi and Marathi news datasets. The system is evaluated using F-score, Purity and Normalized Mutual Information measures, and the results obtained are encouraging.
Keywords: Multilingual Document Clustering; Named entities; Language-independent
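Purity, one of the three evaluation measures named, credits each cluster with its most frequent reference class. The following sketch uses the standard definition; the document ids and topic labels are hypothetical, and the Bisecting k-means step itself is not shown.

```python
from collections import Counter


def purity(clusters, gold):
    """Cluster purity.

    clusters is a list of clusters, each a list of document ids; gold maps
    each document id to its reference class (for example, a news topic).
    Each cluster is credited with the size of its most frequent gold class,
    and the total credit is divided by the number of clustered documents.
    """
    total = sum(len(cluster) for cluster in clusters)
    credited = sum(
        Counter(gold[doc] for doc in cluster).most_common(1)[0][1]
        for cluster in clusters
        if cluster
    )
    return credited / total


# Hypothetical English (en), Hindi (hi) and Marathi (mr) documents.
clusters = [["en1", "hi1", "mr1"], ["en2", "hi2"]]
gold = {"en1": "cricket", "hi1": "cricket", "mr1": "elections",
        "en2": "floods", "hi2": "floods"}
print(purity(clusters, gold))  # 0.8
```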
Multilingual Question-Answering System in Biomedical Domain on the Web: An Evaluation (pp. 83-88)
  María-Dolores Olvera-Lobo; Juncal Gutiérrez-Artacho
Question-answering systems (QAS) are presented as an alternative to traditional information retrieval systems, intended to offer precise responses to factual questions. This study analyses the results returned by HONqa, a multilingual biomedical QA system available on the Web. It uses a set of 120 biomedical definitional questions (What is...?), taken from the medical website WebMD and formulated in English, French and Italian. The answers were analysed using a series of specific measures (MRR, TRR, FHS, precision, MAP).
   The study confirms that effectiveness needs to be improved for all the languages analysed, although in the multilingual context analysed, questions in English achieve better results for retrieving definitional information than those in French and Italian.
Keywords: Multilingual information; Multilingual Question Answering Systems; Restricted-domain Question Answering Systems; HONqa; Biomedical information; Evaluation measures
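Of the measures listed, MRR has a standard definition: the mean, over questions, of the reciprocal rank of the first correct answer. A minimal sketch follows; the question ids and answers are made up, and TRR and FHS are omitted because their definitions are not given here.

```python
def mean_reciprocal_rank(ranked_answers, correct):
    """Mean Reciprocal Rank over a set of questions.

    ranked_answers maps a question id to its answers in rank order; correct
    maps a question id to the set of acceptable answers. Each question
    contributes 1/rank of its first correct answer, or 0 if there is none.
    """
    total = 0.0
    for qid, answers in ranked_answers.items():
        for rank, answer in enumerate(answers, start=1):
            if answer in correct[qid]:
                total += 1.0 / rank
                break
    return total / len(ranked_answers)


ranked = {"q1": ["a", "b"], "q2": ["c", "d"], "q3": ["e"]}
gold = {"q1": {"a"}, "q2": {"d"}, "q3": {"z"}}
print(mean_reciprocal_rank(ranked, gold))  # 0.5
```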
Simulation of Within-Session Query Variations Using a Text Segmentation Approach (pp. 89-94)
  Debasis Ganguly; Johannes Leveling; Gareth J. F. Jones
We propose a generative model for automatic query reformulations from an initial query using the underlying subtopic structure of top ranked retrieved documents. We address three types of query reformulations: a) specialization; b) generalization; and c) drift. To test our model we generate three reformulation variants starting with selected fields from the TREC-8 topics as the initial queries. We use manual judgements from multiple assessors to measure the accuracy of the reformulated query variants and observe accuracies of 65%, 82% and 69% respectively for specialization, generalization and drift reformulations.

Visual and Context

Assessing the Scholarly Impact of ImageCLEF (pp. 95-106)
  Theodora Tsikrika; Alba Garcia Seco de Herrera; Henning Müller
Systematic evaluation has an important place in information retrieval research, from the Cranfield tests to current campaigns such as TREC (Text REtrieval Conference) and other evaluation initiatives. Such benchmarks are often said to have an important impact in advancing a research field and making techniques comparable. Still, their exact impact is hard to measure. This paper aims to assess the scholarly impact of the ImageCLEF image retrieval evaluation initiative. To this end, the papers in the proceedings published after each evaluation campaign, and their citations, are analysed using Scopus and Google Scholar. This bibliometric analysis shows a significant impact of ImageCLEF. The differences between the analysis methods employed, each with its advantages and limitations, are also discussed.
The Importance of Visual Context Clues in Multimedia Translation (pp. 107-118)
  Christopher G. Harris; Tao Xu
As video-sharing websites such as YouTube proliferate, the ability to rapidly translate video clips into multiple languages has become essential for extending their global reach and impact. Moreover, providing closed captioning in a variety of languages is paramount for reaching a wider range of viewers. We investigate the importance of visual context clues by comparing transcripts of multimedia clips (which allow transcriptionists to make use of visual context clues in their translations) with the corresponding written transcripts (which do not). Additionally, we compare translations produced by crowdsourcing workers with those made by professional translators in terms of cost and quality. Finally, we evaluate several genres of multimedia to examine the effects of visual context clues on each, and present the results as heat maps.
To Re-rank or to Re-query: Can Visual Analytics Solve This Dilemma? (pp. 119-130)
  Emanuele Di Buccio; Marco Dussin; Nicola Ferro; Ivano Masiero; Giuseppe Santucci; Giuseppe Tino
Evaluation plays a crucial role in Information Retrieval (IR), since it makes it possible to identify points of failure of an IR approach and thus to address them in order to improve its effectiveness. Developing tools to support researchers and analysts in analyzing results and investigating strategies to improve IR system performance can make the analysis easier and more effective. In this paper we discuss a Visual Analytics (VA) based approach to support the analyst in deciding whether or not to investigate re-ranking to improve the system effectiveness measured after a retrieval run. Our approach is based on effectiveness measures that exploit graded relevance judgements, and it provides both a principled and an intuitive way to support analysis. A prototype is described and used to discuss some case studies based on TREC data.
Evaluation Methods for Rankings of Facetvalues for Faceted Search (pp. 131-136)
  Anne Schuth; Maarten Marx
We introduce two metrics aimed at evaluating systems that select facetvalues for a faceted search interface. Facetvalues are the values of meta-data fields in semi-structured data and are commonly used to refine queries. There are often more facetvalues than can be displayed to a user, so a selection has to be made. Our metrics evaluate these selections based on binary relevance assessments for the documents in a collection. Both metrics are based on Normalized Discounted Cumulated Gain, a commonly used Information Retrieval metric.
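Both metrics build on nDCG. How the paper maps facet-value selections onto gains is not described in the abstract, so the sketch below shows only the standard nDCG computation with binary gains, using the common log2(rank + 1) discount (other discount variants exist). The function names and example gains are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulated gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranked_gains, all_gains, k):
    """nDCG@k for one ranking.

    ranked_gains holds the gain of each item in the order the system shows
    them; all_gains holds the gains of every candidate item, from which the
    ideal (best possible) ranking is built.
    """
    ideal = dcg(sorted(all_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal if ideal > 0 else 0.0


# Binary gains: 1 = relevant, 0 = not relevant; four candidates, two relevant.
print(round(ndcg([0, 1, 1], [1, 1, 0, 0], k=3), 4))  # 0.6934
```

Building the ideal ranking from all candidates, rather than only from the items shown, is what penalizes a selection that leaves relevant items out.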
Improving Query Expansion for Image Retrieval via Saliency and Picturability (pp. 137-142)
  Chee Wee Leong; Samer Hassan; Miguel E. Ruiz; Rada Mihalcea
In this paper, we present a Wikipedia-based approach to query expansion for the task of image retrieval, by combining salient encyclopaedic concepts with the picturability of words. Our model generates the expanded query terms in a definite two-stage process instead of multiple iterative passes, requires no manual feedback, and is completely unsupervised. Preliminary results show that our proposed model is effective in a comparative study on the ImageCLEF 2010 Wikipedia dataset.