Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Fullname:Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Editors:Hans-Peter Frei; Donna Harman; Peter Schauble; Ross Wilkinson
Location:Zurich, Switzerland
Dates:1996-Aug-18 to 1996-Aug-22
Publisher:ACM
Standard No:ISBN 0-89791-792-8; ACM Order Number 606960; hcibib: IR96
Papers:62
Pages:346
  1. Keynote Address
  2. Experimental Studies
  3. Language
  4. Visualization
  5. Architecture
  6. User Studies
  7. Efficiency
  8. Multimedia IR
  9. Query Refinement
  10. Logic
  11. Asian Languages
  12. Modeling
  13. Filtering
  14. Categorization
  15. User Studies
  16. Panel
  17. System Demonstrations: Abstracts
  18. Posters: Abstracts
  19. Post-Conference Research Workshops

Keynote Address

The Network Computer 2
  Andy Hopper
The reduction in cost of both local area and wide area communications has spawned interest in new uses of computer systems. However, the Internet is only at the same stage that television was in the fifties. The talk will deal with how the communications facilities, end-point architectures, and applications may develop. This will focus on a "Network Computer" comprising a range of simple networked end-points used in various combinations. A prototype system used for multimedia applications will be illustrated. A class of applications dealing with ubiquitous personalisation will be described. Finally, the likelihood of a turn-back to centralised systems for ease of management will be considered.

Experimental Studies

Query Expansion Using Local and Global Document Analysis 4-11
  Jinxi Xu; W. Bruce Croft
Automatic query expansion has long been suggested as a technique for dealing with the fundamental issue of word mismatch in information retrieval. A number of approaches to expansion have been studied and, more recently, attention has focused on techniques that analyze the corpus to discover word relationships (global techniques) and those that analyze documents retrieved by the initial query (local feedback). In this paper, we compare the effectiveness of these approaches and show that, although global analysis has some advantages, local analysis is generally more effective. We also show that using global analysis techniques, such as word context and phrase structure, on the local set of documents produces results that are both more effective and more predictable than simple local feedback.
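As a rough illustration of the "simple local feedback" baseline mentioned in this abstract, the sketch below adds to the query the terms that occur in the most top-ranked documents. The function name, the parameters `top_docs` and `extra_terms`, and the term-selection rule are assumptions made for illustration; the abstract does not specify how the authors select or weight expansion terms.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, top_docs=2, extra_terms=2):
    """Hypothetical local feedback: add frequent terms from the top-ranked docs."""
    counts = Counter()
    for doc in ranked_docs[:top_docs]:
        # Count each candidate term once per document in which it appears.
        for term in set(doc.lower().split()):
            if term not in query_terms:
                counts[term] += 1
    # Sort by descending count, then alphabetically so the result is deterministic.
    candidates = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [term for term, _ in candidates[:extra_terms]]


# Toy ranking returned by some initial query (made-up data).
ranked = [
    "car engine repair manual",
    "automobile engine repair guide",
    "cooking recipes for pasta",
]
expanded = expand_query(["car"], ranked)
```

Only the two top-ranked documents are inspected here, so terms from the unrelated third document never enter the expanded query.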
The Design of a High Performance Information Filtering System 12-20
  Timothy A. H. Bell; Alistair Moffat
A high performance information filtering system has three main requirements: it must be effective in supplying users with useful information, it must do so in a timely fashion, and it must be able to handle a large throughput of information and a large number of user profiles efficiently. These three requirements pose a difficult problem, and to our knowledge no existing system is capable of meeting all three. In this paper we describe a system which combines a number of techniques from other information retrieval and filtering systems, and is capable of providing high performance on a typical workstation platform. We provide estimates of computing resource usage, and show that our system is also scalable.
Pivoted Document Length Normalization 21-29
  Amit Singhal; Chris Buckley; Mandar Mitra
Automatic information retrieval systems have to deal with documents of varying lengths in a text collection. Document length normalization is used to fairly retrieve documents of all lengths. In this study, we observe that a normalization scheme that retrieves documents of all lengths with similar chances as their likelihood of relevance will outperform another scheme which retrieves documents with chances very different from their likelihood of relevance. We show that the retrieval probabilities for a particular normalization method deviate systematically from the relevance probabilities across different collections. We present pivoted normalization, a technique that can be used to modify any normalization function thereby reducing the gap between the relevance and the retrieval probabilities. Training pivoted normalization on one collection, we can successfully use it on other (new) text collections, yielding a robust, collection independent normalization technique. We use the idea of pivoting with the well known cosine normalization function. We point out some shortcomings of the cosine function and present two new normalization functions -- pivoted unique normalization and pivoted byte size normalization.
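The abstract does not give a formula. A minimal sketch of the pivoting idea, assuming the linear form commonly attributed to this paper (new normalizer = (1 - slope) * pivot + slope * old normalizer), might look as follows. The function name and the example values are illustrative assumptions.

```python
def pivoted_norm(old_norm, pivot, slope):
    """Tilt an existing normalization factor around a pivot value.

    Assumed form: (1 - slope) * pivot + slope * old_norm, with 0 < slope < 1.
    """
    return (1.0 - slope) * pivot + slope * old_norm


# Illustrative values: the pivot is 8 and the slope is 0.25.
short_doc = pivoted_norm(4.0, pivot=8.0, slope=0.25)   # old factor below the pivot
at_pivot = pivoted_norm(8.0, pivot=8.0, slope=0.25)    # unchanged at the pivot
long_doc = pivoted_norm(16.0, pivot=8.0, slope=0.25)   # old factor above the pivot
```

With a slope below 1, factors below the pivot are raised and factors above it are lowered, so long documents are penalized less than under the original normalization.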
Retrieving Spoken Documents by Combining Multiple Index Sources 30-38
  G. J. F. Jones; J. T. Foote; K. Sparck Jones; S. J. Young
This paper presents domain-independent methods of spoken document retrieval. Both a continuous-speech large vocabulary recognition system, and a phone-lattice word spotter, are used to locate index units within an experimental corpus of voice messages. Possible index terms are nearly unconstrained; terms not in a 20,000 word recognition system vocabulary can be identified by the word spotter at search time. Though either system alone can yield respectable retrieval performance, the two methods are complementary and work best in combination. Different ways of combining them are investigated, and it is shown that the best of these can increase retrieval average precision for a speaker-independent retrieval system to 85% of that achieved for full-text transcriptions of the test documents.

Language

Viewing Stemming as Recall Enhancement 40-48
  Wessel Kraaij; Renee Pohlmann
Previous research on stemming has shown both positive and negative effects on retrieval performance. This paper describes an experiment in which several linguistic and non-linguistic stemmers are evaluated on a Dutch test collection. Experiments especially focus on the measurement of Recall. Results show that linguistic stemming restricted to inflection yields a significant improvement over full linguistic and non-linguistic stemming, both in average Precision and R-Recall. Best results are obtained with a linguistic stemmer which is enhanced with compound analysis. This version has a significantly better Recall than a system without stemming, without a significant deterioration of Precision.
Querying Across Languages: A Dictionary-based Approach to Multilingual Information Retrieval 49-57
  David A. Hull; Gregory Grefenstette
The multilingual information retrieval system of the future will need to be able to retrieve documents across language boundaries. This extension of the classical IR problem is particularly challenging, as significant resources are required to perform query translation. At Xerox, we are working to build a multilingual IR system and conducting a series of experiments to understand what factors are most important in making the system work. Using translated queries and a bilingual transfer dictionary, we have learned that cross-language multilingual IR is feasible, although performance lags considerably behind the monolingual standard. The experiments suggest that correct identification and translation of multi-word terminology is the single most important source of error in the system, although ambiguity in translation also contributes to poor performance.
Experiments in Multilingual Information Retrieval using the SPIDER System 58-65
  Paraic Sheridan; Jean Paul Ballerini
We introduce a new approach to multilingual information retrieval based on the use of thesaurus-based query expansion techniques applied over a collection of comparable multilingual documents. This approach has been built into the SPIDER information retrieval system and has been tested over a large collection of Italian documents. We have shown that the SPIDER system retrieves Italian documents in response to user queries written in German with better effectiveness than a baseline system evaluating Italian queries against Italian documents. Although the importance of the SPIDER stemming algorithm for Italian must be stressed in these results, we have also achieved performance on multilingual retrieval tasks within 32% of the best SPIDER performance on Italian retrieval by including a relevance feedback loop in the task of multilingual retrieval.

Visualization

Visualizing Search Results: Some Alternatives to Query-Document Similarity 67-75
  Lucy Terry Nowell; Robert K. France; Deborah Hix; Lenwood S. Heath; Edward A. Fox
A digital library of computer science literature, Envision provides powerful information visualization by displaying search results as a matrix of icons, with layout semantics under user control. Envision's Graphic View interacts with an Item Summary Window giving users access to bibliographic information, and XMosaic provides access to complete bibliographic information, abstracts, and full content. While many visualization interfaces for information retrieval systems depict ranked query-document similarity, Envision graphically presents a variety of document characteristics and supports an extensive range of user tasks. Formative usability evaluation results show great user satisfaction with Envision's style of presentation and the document characteristics visualized.
Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results 76-84
  Marti A. Hearst; Jan O. Pedersen
We present Scatter/Gather, a cluster-based document browsing method, as an alternative to ranked titles for the organization and viewing of retrieval results. We systematically evaluate Scatter/Gather in this context and find significant improvements over similarity search ranking alone. This result provides evidence validating the cluster hypothesis which states that relevant documents tend to be more similar to each other than to non-relevant documents. We describe a system employing Scatter/Gather and demonstrate that users are able to use this system close to its full potential.
Evaluation of a Tool for Visualization of Information Retrieval Results 85-92
  Aravindan Veerasamy; Nicholas J. Belkin
We report on the design and evaluation of a visualization tool for Information Retrieval (IR) systems that aims to help the end user in the following respects:
  • As an indicator of document relevance, the tool graphically provides specific query related information about individual documents.
  • As a diagnosis tool, it graphically provides aggregate information about the query results that could help in identifying how the different query terms influence the retrieval and ranking of documents.
Two different experiments using TREC-4 data were conducted to evaluate the effectiveness of this tool. Results, while mixed, indicate that visualization of this sort may provide useful support for judging the relevance of documents, in particular by enabling users to make more accurate decisions about which documents to inspect in detail. Problems in evaluation of such tools in interactive environments are discussed.

    Architecture

    An Architecture for Implementing Extensible Information-Seeking Environments 94-100
      David G. Hendry; David J. Harper
    A user-interface architecture, called FireWorks, is described. It consists of a domain-specific toolkit and frameworks for building information retrieval applications. The architecture's expressiveness is demonstrated by first describing an example application, which is designed to help searchers coordinate access to multiple on-line sources. Second, FireWorks is compared to a similar architecture, called InfoGrid. The comparison focuses debate on what software abstractions are required for implementing a range of effective environments for information seeking.
    Document Retrieval Facilities for Repository-Based System Development Environments 101-109
      Andreas Henrich
    Modern system development environments usually deploy the object management facilities of a so-called repository to store the documents created and maintained during system development. PCTE is the ISO and ECMA standard for a public tool interface for an open repository [23]. In this paper we present document retrieval extensions for an OQL-oriented query language for PCTE. The extensions proposed cover (1) pattern matching, (2) term based document retrieval with automatically generated document description vectors, (3) the flexible definition of what is addressed as a "document" in a given query, and (4) the integration of these facilities into a CASE tool. Whereas the integration of pattern matching facilities into query languages has been addressed by other authors before, the main contribution of our approach is the homogeneous integration of term based document retrieval and the flexible definition of documents.
    Performance Evaluation of a Distributed Architecture for Information Retrieval 110-118
      Brendon Cahoon; Kathryn S. McKinley
    Information explosion across the Internet and elsewhere offers access to an increasing number of document collections. In order for users to effectively access these collections, information retrieval (IR) systems must provide coordinated, concurrent, and distributed access. In this paper, we describe a fully functional distributed IR system based on the Inquery unified IR system. To refine this prototype, we implement a flexible simulation model that analyzes performance issues given a wide variety of system parameters and configurations. We present a series of experiments that measure response time and system utilization and identify bottlenecks. We vary numerous system parameters, such as the number of users, text collections, terms per query, and workload to generalize our results for other distributed IR systems. Based on our initial results, we recommend simple changes to the prototype and evaluate the changes using the simulator. Because of the significant resource demands of information retrieval, it is not difficult to generate workloads that overwhelm system resources regardless of the architecture. However, under some realistic workloads, we demonstrate system organizations for which response time gracefully degrades as the workload increases and performance scales with the number of processors. This scalable architecture includes a surprisingly small number of brokers through which a large number of clients and servers communicate.

    User Studies

    Elicitations During Information Retrieval: Implications for IR System Design 120-127
      Amanda Spink; Abby Goodrum; David Robins; Mei Mei Wu
    What elicitations or requests for information do search intermediaries and users with information requests make before and during an information retrieval (IR) interaction -- and for what purposes? These issues were investigated in two studies of elicitations during 40 mediated IR interactions -- including one study of the user elicitation purposes and another of the search intermediary elicitation purposes. A total of 2409 elicitations were identified -- 852 user elicitations within 10 purpose categories and 1557 search intermediary elicitations within 15 purpose categories. The elicitation purposes included requests for information on search terms and strategies, database selection, search procedures, system's outputs and relevance of retrieved items, and users' knowledge and previous information-seeking. Implications for the development of a dialogue-based model of IR interaction and the design of IR systems are also discussed.
    Evaluating User Interfaces to Information Retrieval Systems: A Case Study on User Support 128-136
      Giorgio Brajnik; Stefano Mizzaro; Carlo Tasso
    Designing good user interfaces to information retrieval systems is a complex activity. The design space is large and evaluation methodologies that go beyond the classical precision and recall figures are not well established. In this paper we present an evaluation of an intelligent interface that also covers the user-system interaction and measures user satisfaction. More specifically, we describe an experiment that evaluates: (i) the added value of the semi-automatic query reformulation implemented in a prototype system; (ii) the importance of technical, terminological, and strategic supports and (iii) the best way to provide them. The interpretation of results leads to guidelines for the design of user interfaces to information retrieval systems and to some observations on the evaluation issue.

    Efficiency

    Efficient Processing of One and Two Dimensional Proximity Queries in Associative Memory 138-146
      K. L. Liu; G. J. Lipovski; C. Yu; Naphtali Rishe
    Proximity queries that involve multiple object types are very common. In this paper, we present a parallel algorithm for answering proximity queries of one kind over object instances that lie in a one-dimensional metric space. The algorithm exploits specialized hardware, the Dynamic Associative Access Memory chip. In most proximity queries of this kind, the number of object types is less than or equal to three and the distance d, within which object instances must lie to satisfy a given proximity condition, is small ([d/80] = 1). The execution time for such queries is linearly proportional to the number of object types and is independent of the size of the database. This allows numerous concurrent users to be serviced. The algorithm is extended to process 2-dimensional proximity queries efficiently.
    Efficient Transaction Support for Dynamic Information Retrieval Systems 147-155
      Mohan Kamath; Krithi Ramamritham
    To properly handle concurrent accesses to documents by updates and queries in information retrieval (IR) systems, efforts are under way to integrate IR features with database management system (DBMS) features. However, initial research has revealed that DBMS features optimized for traditional databases display degraded performance when handling text databases. Since efficiency is critical in IR systems, infrastructural extensions are necessary for several DBMS features, transaction support being one of them. This paper focuses on developing efficient transaction support for IR systems where updates and queries arrive dynamically, by exploiting the data characteristics of the indexes as well as of the queries and updates that access the indexes. Results of performance tests on a prototype system demonstrate the superior performance of our algorithms.

    Multimedia IR

    Image Organization and Retrieval with Automatically Constructed Feature Vectors 157-165
      Kyung-Ah Han; Sung-Hyun Myaeng
    Retrieving images based on their contents is a difficult problem. Instead of using manually assigned content descriptors for retrieval, we take an approach where images are indexed and organized automatically so that the users can retrieve images by visually browsing the organized image space. For image indexing, the objects in an image are first analyzed for their shape features such as roundness, rectangularity, ellipticity, eccentricity, and bending energy. These features are used to form a feature vector that represents the image. Subsequently, the feature vectors representing all the images in an image database are organized by an unsupervised neural net learning algorithm called Self-Organizing Map. Since this image feature map reflects the statistical patterns, i.e., the inter-similarities of the objects, the relationships among the images can be recognized by their location, neighborhood, and the way the map is organized. For the feasibility and practicality of the approach, a prototype system has been developed and tested with some experiments.
    Phonetic String Matching: Lessons from Information Retrieval 166-172
      Justin Zobel; Philip Dart
    Phonetic matching is used in applications such as name retrieval, where the spelling of a name is used to identify other strings that are likely to be of similar pronunciation. In this paper we explain the parallels between information retrieval and phonetic matching, and describe our new phonetic matching techniques. Our experimental comparison with existing techniques such as Soundex and edit distances, which is based on recall and precision, demonstrates that the new techniques are superior. In addition, reasoning from the similarity of phonetic matching and information retrieval, we have applied combination of evidence to phonetic matching. Our experiments with combining demonstrate that it leads to substantial improvements in effectiveness.
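For readers unfamiliar with the edit-distance baseline named in this abstract, here is a minimal sketch of the standard Levenshtein distance with unit costs for insertion, deletion, and substitution. It illustrates only that baseline, not the authors' new phonetic techniques, which the abstract does not describe. The unit costs are an assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b, assuming unit costs."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


# Two spellings of a name that are likely to be pronounced alike.
name_distance = edit_distance("smith", "smyth")
```

A name retrieval system built on this measure would rank candidate names by ascending distance to the query name.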

    Query Refinement

    Experiments on Using Semantic Distances between Words in Image Caption Retrieval 174-180
      Alan F. Smeaton; Ian Quigley
    Traditional approaches to information retrieval are based upon representing a user's query as a bag of query terms and a document as a bag of index terms and computing a degree of similarity between the two based on the overlap or number of query terms in common between them. Our long-term approach to IR applications is based upon precomputing semantically-based word-word similarities, work which is described elsewhere, and using these as part of the document-query similarity measure. A basic premise of our word-to-word similarity measure is that the input to this computation is the correct or intended word sense but in information retrieval applications, automatic and accurate word sense disambiguation remains an unsolved problem. In this paper we describe our first successful application of these ideas to an information retrieval application, specifically the indexing and retrieval of captions describing the content of images. We have hand-captioned 2714 images and to circumvent, for the time being, the problems raised by word sense disambiguation, we manually disambiguated polysemous words in captions. We have also built a collection of 60 queries and for each, determined relevance assessments. Using this environment we were able to run experiments in which we varied how the query-caption similarity measure used our pre-computed word-word semantic distances. Our experiments, reported in the paper, show significant improvement for this environment over the more traditional approaches to information retrieval.
    Automatic Linking of Thesauri 181-186
      S. Amba; N. Narasimhamurthi; Kevin C. O'Kane; Philip M. Turner
    This paper describes a procedure for automatically linking thesauri. Such inter-thesauri linking will enable a user to query a database using terms from a thesaurus that was not used to index the database. The procedure uses a data driven approach for linking. Practical implementation using single link technique and a case study linking terms from Thesaurus of ERIC Descriptors and Thesaurus of Psychological Index Terms is described.
    A New Method of Weighting Query Terms for Ad-Hoc Retrieval 187-195
      K. L. Kwok
    Ad-hoc retrieval relies on the evidence from a user's query to provide a sufficient variety of terms as well as different term frequencies for differentiating term importance. Short queries lack both types of information. A new method of automatically weighting query terms for ad-hoc retrieval is introduced that works for short queries. It is based on the term usage statistics in a collection and no training is required. Experiments with both the TREC2 and TREC4 ad-hoc queries show that this weighting scheme can provide significantly better results at the initial retrieval stage. At the expanded query stage, results vary from equal to significantly better than those relying on the original query weights. In particular, this automatic method provides similar improvements to extra short queries of two to four content terms only.

    Logic

    A Relevance Terminological Logic for Information Retrieval 197-205
      Carlo Meghini; Umberto Straccia
    A Terminological Logic is presented as an information retrieval model, with a four-valued semantics that gives to its inference relation the flavour of relevance, that is a strict connection in meaning between the premises and the conclusion of the arguments licensed by the logic. The logic also permits the expression of meta-knowledge enforcing a closed-world reading of the knowledge concerning specified individuals and primitive concepts. A Gentzen-style, sound and complete calculus for reasoning in the logic is given, thus establishing the basis for an information retrieval engine.
    Retrieval of Complex Objects Using a Four-Valued Logic 206-214
      Thomas Rolleke; Norbert Fuhr
    The aggregated structure of documents plays a key role in full-text, multimedia, and network Information Retrieval (IR). Considering aggregation provides new querying facilities and improves retrieval effectiveness. We present a knowledge representation for IR purposes which pays special attention to this aggregated structure of objects. In addition, further features of objects can be described. Thus, the structure of full-text documents, the heterogeneity and the spatial and temporal relationships of objects typical for multimedia IR, and meta information for network IR are representable within one integrated framework.
       The model we propose allows for querying on the content of documents (objects) as well as on other features. The query result may contain objects having different types. Instead of retrieving only whole documents, the retrieval process determines the least aggregated entities that imply the query.

    Asian Languages

    Using n-Grams for Korean Text Retrieval 216-224
      Joon Ho Lee; Jeong Soo Ahn
    There is a difficulty in applying the conventional word-based indexing to Korean. The indexable segment of a word, i.e., the stem, is often a compound noun, which results in a serious decrease of retrieval effectiveness. The morpheme-based indexing, which decomposes a compound noun into simple nouns, has been developed to overcome the problem of compound nouns. It, however, requires a large dictionary and complex linguistic knowledge. In this paper we propose a new indexing method by combining the word-based indexing and the n-gram indexing. The proposed method alleviates the problem of compound nouns without dictionaries and linguistic knowledge. Experimental results show that the proposed method might be almost as effective as the morpheme-based indexing.
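One way to picture combining word-based and n-gram indexing is to index each word together with its character bigrams, so that a query for a simple noun can match inside a compound noun without a dictionary. The sketch below is an assumption about how such a combination could work. The abstract does not state the n-gram length or how the two kinds of index terms are weighted, and both function names are hypothetical.

```python
def char_ngrams(word, n=2):
    """Return the overlapping character n-grams of a word."""
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def index_terms(text, n=2):
    """Hypothetical combined indexing: each word plus its character n-grams."""
    terms = []
    for word in text.split():
        terms.append(word)
        # Skip the n-gram that merely repeats a short word.
        terms.extend(g for g in char_ngrams(word, n) if g != word)
    return terms


# "정보검색" (information retrieval) is a compound of "정보" and "검색".
terms = index_terms("정보검색 시스템")
```

In this toy case the bigrams "정보" and "검색" recover the two simple nouns of the compound, along with the boundary-crossing bigram "보검", which carries no meaning of its own.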
    On Chinese Text Retrieval 225-233
      Jian-Yun Nie; Martin Brisebois; Xiaobo Ren
    In previous studies, Chinese text retrieval has often been dealt with on the character basis. This approach is not suited to deal with complex queries. We suggest that Chinese text retrieval should work with words instead of characters. The crucial problem is to segment originally continuous Chinese texts into words. In this paper, we first propose a hybrid segmentation approach which unifies the commonly used approaches. The system SMART is then adapted to index the segmented Chinese texts. Finally, we suggest that Chinese text retrieval should move further to include a thesaurus in order to cope with the rich vocabulary of Chinese.

    Modeling

    A Deductive Data Model for Query Expansion 235-243
      Kalervo Jarvelin; Jaana Kristensen; Timo Niemi; Eero Sormunen; Heikki Keskustalo
    We present a deductive data model for concept-based query expansion. It is based on three abstraction levels: the conceptual, linguistic and occurrence levels. Concepts and relationships among them are represented at the conceptual level. The expression level represents natural language expressions for concepts. Each expression has one or more matching models at the occurrence level. The models specify the matching of the expression in database indices built in varying ways. The data model supports a concept-based query expansion and formulation tool, the ExpansionTool, for heterogeneous IR system environments. Expansion is controlled by adjustable matching reliability.
    An Application of Plausible Reasoning to Information Retrieval 244-252
      Farhad Oroumchian; Robert N. Oddy
    This work explores the use of plausible inferences as a means of retrieving relevant documents. Collins and Michalski's theory of plausible reasoning has been modified to accommodate information retrieval. Methods are proposed to represent document contents by logical terms and statements, and queries by incomplete logical statements. Extensions to plausible inferences are discussed.
       Two versions of the extended plausible reasoning system were implemented, one using dominance weights (described in the paper) and the other using tf*idf (Term Frequency Inverse Document Frequency) weights. Experiments were conducted using the titles and abstracts of the CACM collection and it was found that both versions of the extended plausible reasoning system are better than the vector space model and the system using dominance weights performed better than the system with tf*idf weights.
    A Belief Network Model for IR 253-260
      Berthier A. N. Ribeiro; Richard Muntz
    We introduce a belief network model for IR which is derived from probabilistic considerations over a clearly defined sample space. This model subsumes the classical models in IR and generalizes the inference network model of Turtle and Croft. Further, we show how to extend the model with information from other queries (which we call contexts) to yield improved retrieval performance.

    Filtering

    Document Filtering with Inference Networks 262-269
      Jamie Callan
    Although statistical retrieval models are now accepted widely, there has been little research on how to adapt them to the demands of high speed document filtering. The problems of document retrieval and document filtering are similar at an abstract level, but the architectures required, the optimizations that are possible, and the quality of the information available, are all different.
       This paper describes a new statistical document filtering system called InRoute, the problems of filtering effectiveness and efficiency that arise with such a system, and experiments with various solutions.
    Incremental Relevance Feedback for Information Filtering 270-278
      James Allan
    We use data from the TREC routing experiments to explore how relevance feedback can be applied incrementally -- using a few judged documents each time -- to achieve results that are as good as if the feedback occurred in one pass. We show that relatively few judgments are needed to get high-quality results. We also demonstrate methods that reduce the amount of information archived from past judged documents without adversely affecting effectiveness. A novel simulation shows that such techniques are useful for handling long-standing queries with drifting notions of relevance.
    Method Combination for Document Filtering 279-287
      David A. Hull; Jan O. Pedersen; Hinrich Schutze
    There is strong empirical and theoretic evidence that combination of retrieval methods can improve performance. In this paper, we systematically compare combination strategies in the context of document filtering, using queries from the Tipster reference corpus. We find that simple averaging strategies do indeed improve performance, but that direct averaging of probability estimates is not the correct approach. Instead, the probability estimates must be renormalized using logistic regression on the known relevance judgements. We examine more complex combination strategies but find them less successful due to the high correlations among our filtering methods which are optimized over the same training data and employ similar document representations.

    Categorization

    Combining Classifiers in Text Categorization BIBAPDF 289-297
      Leah S. Larkey; W. Bruce Croft
    Three different types of classifiers were investigated in the context of a text categorization problem in the medical domain: the automatic assignment of ICD9 codes to dictated inpatient discharge summaries. K-nearest-neighbor, relevance feedback, and Bayesian independence classifiers were applied individually and in combination. A combination of different classifiers produced better results than any single type of classifier. For this specific medical categorization problem, new query formulation and weighting methods used in the k-nearest-neighbor classifier improved performance.
    Training Algorithms for Linear Text Classifiers BIBAPDF 298-306
      David D. Lewis; Robert E. Schapire; James P. Callan; Ron Papka
    Systems for text retrieval, routing, categorization and other IR tasks rely heavily on linear classifiers. We propose that two machine learning algorithms, the Widrow-Hoff and EG algorithms, be used in training linear text classifiers. In contrast to most IR methods, theoretical analysis provides performance guarantees and guidance on parameter settings for these algorithms. Experimental data is presented showing Widrow-Hoff and EG to be more effective than the widely used Rocchio algorithm on several categorization and routing tasks.
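       As a minimal sketch of the two update rules named above, the Python below writes Widrow-Hoff and EG in their common textbook form for squared loss. The learning rate eta, the toy examples, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math


def dot(w, x):
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def widrow_hoff_step(w, x, y, eta):
    """Additive update: w <- w - 2*eta*(w.x - y)*x."""
    err = dot(w, x) - y
    return [wi - 2.0 * eta * err * xi for wi, xi in zip(w, x)]


def eg_step(w, x, y, eta):
    """Multiplicative update, renormalised so the weights sum to 1."""
    err = dot(w, x) - y
    scaled = [wi * math.exp(-2.0 * eta * err * xi) for wi, xi in zip(w, x)]
    total = sum(scaled)
    return [s / total for s in scaled]


def train(step, w, examples, eta, epochs):
    """Cycle through the examples, applying one update per example."""
    for _ in range(epochs):
        for x, y in examples:
            w = step(w, x, y, eta)
    return w


# Toy data (hypothetical): the label equals the first feature.
examples = [
    ([1.0, 0.0, 1.0], 1.0),
    ([0.0, 1.0, 0.0], 0.0),
    ([1.0, 1.0, 0.0], 1.0),
    ([0.0, 0.0, 1.0], 0.0),
]

w_wh = train(widrow_hoff_step, [0.0, 0.0, 0.0], examples, eta=0.1, epochs=200)
w_eg = train(eg_step, [1 / 3, 1 / 3, 1 / 3], examples, eta=0.1, epochs=200)
```

       Widrow-Hoff starts from the zero vector and moves weights additively; EG starts from a uniform distribution and rescales weights multiplicatively, so its weights stay positive and sum to one.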
    Context-Sensitive Learning Methods for Text Categorization BIBAPDF 307-315
      William W. Cohen; Yoram Singer
    Two recently implemented machine learning algorithms, RIPPER and sleeping experts for phrases, are evaluated on a number of large text categorization problems. These algorithms both construct classifiers that allow the "context" of a word w to affect how (or even whether) the presence or absence of w will contribute to a classification. However, RIPPER and sleeping experts differ radically in many other respects: differences include different notions as to what constitutes a context, different ways of combining contexts to construct a classifier, different methods to search for a combination of contexts, and different criteria as to what contexts should be included in such a combination. In spite of these differences, both RIPPER and sleeping experts perform extremely well across a wide variety of categorization problems, generally outperforming previously applied learning methods. We view this result as a confirmation of the usefulness of classifiers that represent contextual information.

    User Studies

    Detection of Shifts in User Interests for Personalized Information Filtering BIBAPDF 317-325
      W. Lam; S. Mukhopadhyay; J. Mostafa; M. Palakal
    Several machine learning approaches have been proposed in the literature to automatically learn user interests for information filtering. However, many of them are ill-equipped to deal with changes in user interests that may occur due to changes in the user's personal and professional situations. If undetected over a long time, such changes may cause significant degradation in the filtering performance and user satisfaction during the period of non-detection. In this paper, we present a two-level learning approach to cope with such non-stationary user interests. While the lower level consists of a standard convergence-type machine learning algorithm, the higher level uses Bayesian analysis of the user provided relevance feedback to detect shifts in user interests. Once such a shift is detected, the lower-level learning algorithm is suitably reinitialized to quickly adapt to the new user profile. Experimental results with simulated users are presented to demonstrate the feasibility of the approach.
    Interactive Information Retrieval Systems: From User Centered Interface Design to Software Design BIBAPDF 326-334
      P. Mulhem; L. Nigay
    This article is concerned with the design and implementation of Information Retrieval Systems (IRS). We show how theories and models from the domain of Human Computer Interaction (HCI) can be applied to the design of IRS. We first study the user's tasks by modelling the mental activities of the user while accomplishing a task. Adopting a system perspective, we consider the processing tasks of an IRS and organize them in a design space. We then build upon the design space to consider the implications of such data processing and levels of abstraction on software design. Finally we present PAC-Amodeus, a software architecture model and illustrate the applicability of the approach with the implementation of an IRS: the TIAPRI system.

    Panel

    Building and Using Test Collections BIBAPDF 335-337
      Donna Harman
    This panel, emphasizing audience participation, will focus on issues in building and using test collections for information retrieval. There is currently interest in building new test collections in many languages, or for different types of media. This panel presents an opportunity to share experiences gained from past test collection building and usage to help guide the development of these new test collections.
       The Cranfield studies (Cleverdon et al. 1966) emphasized the importance of creating test collections and using these for comparative evaluation of retrieval systems. Now, thirty years later, we are dealing with a major increase in the amount and type of information available for searching, and also working in an interactive environment instead of the old batch retrieval mode. This does not eliminate the need for static test collections, but does require a re-examination of how to build and use these collections.

    System Demonstrations: Abstracts

    IR Application Development with FireWorks BIBAPDF 338
      David J. Harper; David G. Hendry; Jan-Jaap IJdens; Jeomon Jose
    We are developing two different architectures that support the development of IR applications. Eclair is a C++ class library that provides abstractions for the representation, storage, and retrieval of multimedia documents. FireWorks is a user-interface architecture, consisting of an IR-specific toolkit and frameworks, for constructing a wide range of IR applications. Both architectures are built on top of Objectstore, a state-of-the-art distributed object data management system. The goal of this demonstration is to show the broad range of IR applications that can be implemented with these frameworks.
    A Novel Client-Server Protocol for the Demanding OPAC User BIBAPDF 338
      E. J. Yannakoudakis
    We aim to demonstrate an integrated multilingual OPAC module that offers several novel features, including SDI, free-text retrieval and a full thesaurus tree, using an open logical client-server protocol. The module was implemented under Windows 3.1/95 and a UNIX server running INFORMIX-4GL while the communication layer is operational under both serial and TCP/IP standards. Note that the new module is part of the integrated library automation system LIBRETTO which has already been installed at several sites in Greece.
       The need for a new system to manage our catalogue became even greater as we attempted to process thesaurus relationships using Greek character sets, to define character mappings, to switch from one language to another, to produce logical lexicographic orderings, to process voice, image, video, and cardex information, etc.
       The protocol is based on ASCII packages which are exchanged using the format: Header% [Flags] Data [Flags], where the header comprises one to four characters and denotes the command function or the result set. The first and second sets of flags are used by some commands to enhance their meaning. For example, in the free text search, the flag 10100000000 is used to limit the search to two specific database fields, in this case the TITLE and the AUTHOR. Data contains command parameters or result information.
    WING: A Multiple-View Smooth Information Retrieval System BIBAPDF 338
      Toshiyuki Masui; Mitsuru Minakuchi; George R. Borden; Kouichi Kashiwagi
    WING (Whole Interactive Nara Guide) is a system to enable smooth information retrieval by integrating multiple search strategies such as 3-D map visualization, hypertext, keyword search, and category search with the same smooth zooming interface. Nara, located about 40 kilometers south of Kyoto, is an ancient capital of Japan and full of tourist attractions like old shrines and temples. Using WING, any vague knowledge about the data can be utilized to narrow the search space, and users can smoothly navigate through Nara at will, by modifying the search area in each view.
    Visualizing Search Results with Envision BIBAPDF 338-339
      Lucy Terry Nowell; Robert K. France; Edward A. Fox
    Envision, a multimedia digital library of computer science literature, is unique in the variety of document characteristics visualized and in the flexibility afforded users to change the visualization to suit their current information needs. Envision's Graphic View window displays search results as a matrix of icons. Using controls provided in the user interface, the layout of the matrix may be changed to visualize estimated relevance to query, publication year, document type, document size, author names, and index terms. Icon characteristics used in the visualizations include placement relative to the x-axis and y-axis and an alphanumeric icon label, as well as icon size, shape, and color. Visualizations supporting a wide range of user tasks will be demonstrated.
    Ariadne: Electronic Information for Computer Scientists BIBAPDF 339
      Markus Dreger; Stefan Lohrum; Kai Grossjohann; Claus Dieter Ziegler
    Ariadne is a WWW-based system that combines several services in one information system. Quick access to references of relevant publications, preprints, software, etc. about Computer Science is provided by its main service, the navigational part. This is a repository of links, structured hierarchically according to the ACM Classification Scheme. The repository is maintained by the cooperation of the users ("give and take").
       They suggest new links to be added which then undergo a cooperative peer review (also by the users) which ensures the quality of information. A search interface is included, as well.
       Ariadne offers two profiling services. The first regularly checks a URL and notifies the user if the information has changed; the second (known as SFprofile) regularly issues queries to freeWAIS-sf databases with an SFgate WWW forms interface. SFprofile supports in-place (in the HTML form) editing of profiles and a variety of processing options.
    WebCompass: An Agent-Based Metasearch and Metadata Discovery Tool for the Web BIBAPDF 339
      Brad Allen; John Jensen; Jay Nelson; Brian Ulicny; Kristina Lerman; Linda Rudell-Betts
    WebCompass is an agent-based system that automatically gathers and organizes information from the World Wide Web for personal or workgroup use. Given a specification of user interest in the form of thesaurus topics, WebCompass performs metasearch to retrieve potentially relevant Web pages, and then automatically summarizes, classifies and clusters the retrieved Web pages, creating a relational database of metadata about Web pages, organized by topic. WebCompass periodically updates the database, providing the user with an up-to-date overview of Web content of interest.
    Querying Hierarchically Structured Texts with Generalized Context-Free Grammars BIBAPDF 339
      Yves Marcoux; Martin Sevigny
    The system demonstrated is a prototype information retrieval system for hierarchically structured text. It is based on a new model in which queries are expressed as generalized context-free grammars that allow complementation and intersection operations on the right-hand side of productions. The prototype also incorporates new user-interface elements especially designed to assist users in retrieving information from large structured-document bases. Such elements include succinct and detailed guides to the structure of the document base. The prototype is demonstrated operating on a document base of SGML documents.
    The CD-ROM of Crete: A Multimedia Tourism Application, Based on Geographic Interaction and Information Retrieval Techniques BIBAPDF 339
      N. Moumoutzis; M. Frangonikolakis
    In recent years, MUSIC has undertaken a number of competitive research and development projects in the area of multimedia tourism information systems. A powerful model has been elaborated that supports the detailed description of areas of touristic interest, with their sites and facilities hierarchically organized. An extensive multimedia information base for the region of Crete has been established. Tools have been developed to maintain this information base. A hypermedia model has been implemented in order to create hypermedia presentations with detailed and accurate geographic maps, diagrams and architectural sketches. Commercially available tools have been integrated for creating synthetic multimedia presentations, virtual navigations, and multimedia data processing.
       The graphical queries supported are classified into (a) boolean queries, which are expressed graphically on trees representing type hierarchies, and (b) similarity queries, which are meaningful only for type hierarchies with weights. These two classes of queries can be combined. The CD-ROM of Crete is an Interactive Multimedia Tourism Application developed on this Software Bench that exploits all the above-mentioned capabilities.

    Posters: Abstracts

    An Efficient Retrieval Algorithm of Two-trie Structures BIBAPDF 340
      Takako Tsuji; Syouji Mizobuchi; Masami Shishibori; Jun-ichi Aoe
    A trie can find all keys that are prefixes of an input string in a single scan, since it advances the retrieval character by character along the keys. In a trie, transitions for the front strings of keys can be shared, but transitions for the rear strings cannot. A DAWG (Directed Acyclic Word-Graph) can share transitions for both front and rear strings. There is, however, no guarantee that there exists a subset of states in the DAWG with a one-to-one correspondence between the keys and the states in that subset. The purpose of this paper is to present a compression method for tries that takes the following features into account: 1) the ability to uniquely identify the information corresponding to each key; 2) sharing of transitions for both front and rear strings of keys; 3) applicability to a dynamic set of keys. The two-trie presented is evaluated by the time complexities of its retrieval, insertion and deletion algorithms, and its time and space efficiencies are verified by the results of computer simulations for various key sets.
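       The single-scan prefix search described in the first sentence can be sketched in Python as follows. This is an ordinary trie built from nested dictionaries, not the two-trie of the poster; the END marker and the function names are hypothetical.

```python
END = None  # marker key stored in a node where a complete key ends


def build_trie(keys):
    """Build a plain trie as nested dictionaries, one level per character."""
    root = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def keys_prefixing(trie, text):
    """Return every stored key that is a prefix of text, in one left-to-right scan."""
    found = []
    node = trie
    for i, ch in enumerate(text):
        if ch not in node:
            break  # no stored key continues with this character
        node = node[ch]
        if END in node:
            found.append(text[: i + 1])
    return found


trie = build_trie(["a", "ab", "abc", "abd", "b"])
matches = keys_prefixing(trie, "abcd")
```

       Here the keys "ab", "abc" and "abd" share the transitions for their common front string "ab", which is the front-string sharing the abstract refers to.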
    An Efficient Retrieval Algorithm of Binary Digital Search-Trees Using Hierarchical Structures BIBAPDF 340
      Masami Shishibori; Yoshitaka Hayashi; Kazuhiro Morita; Jun-ichi Aoe
    The Binary Digital Search-tree (BDS-tree) is often used as the hash table in extensible hashing. However, for a large key set, the BDS-tree becomes too large to store in main memory. To solve this problem, Jonge et al. proposed a method that converts the BDS-tree into a compact bit stream (called the pre-order bit stream) by traversing the binary tree in pre-order. The pre-order bit stream is a very compact data structure; however, the bigger the binary tree, the longer the pre-order bit stream, and as a result the time cost of retrieving keys located toward the end of the bit stream is high. The method presented here separates the BDS-tree into smaller trees of a certain depth, connected by pointers. A BDS-tree separated in this way is called a Hierarchical Binary Digital Search-tree (HBDS-tree). A pre-order bit stream is created and maintained for each of the separated trees. With this improved method, each operation can be sped up, because unnecessary scanning of the pre-order bit stream can be omitted for each separated tree. The experimental results, using 50,000 English words, show that the presented method provides faster access than the traditional BDS-tree: retrieval is 18 times faster, insertion 13 times faster, and deletion 4 times faster. Thus, each operation takes significantly less time with this method. As for storage, the HBDS-tree requires about 1.25 times the space used by the BDS-tree. However, the pre-order bit stream is by nature very compact, so its size is good enough for practical applications. For the BDS-tree and the HBDS-tree, both represented by pre-order bit streams, the storage required to register one key is 2.50 and 3.24 bits, respectively. Thus, these methods can operate with more compact storage than the B-tree, B+-tree, B*-tree, etc.
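       To illustrate why keys toward the end of a long pre-order bit stream are slow to reach, here is a minimal Python sketch of a pre-order encoding of a binary tree. The bit convention ('1' for an internal node, '0' for a leaf), the tuple representation, and the function names are assumptions for illustration; the abstract does not specify them.

```python
def encode(tree):
    """Pre-order bit stream: '1' for an internal node, '0' for a leaf (None)."""
    if tree is None:
        return "0"
    left, right = tree
    return "1" + encode(left) + encode(right)


def decode(bits, pos=0):
    """Rebuild the tree starting at pos; return (tree, index just past it)."""
    if bits[pos] == "0":
        return None, pos + 1
    left, pos = decode(bits, pos + 1)
    right, pos = decode(bits, pos)
    return (left, right), pos


def skip_subtree(bits, pos):
    """Linear scan to the index just past the subtree that starts at pos."""
    pending = 1  # number of subtrees still to be read
    while pending:
        pending += 1 if bits[pos] == "1" else -1
        pos += 1
    return pos


def leaf_rank(bits, path):
    """Follow a path of '0' (left) and '1' (right) steps from the root and
    return the left-to-right rank of the leaf reached. If the path stops at
    an internal node, the rank of the leftmost leaf below it is returned."""
    pos, rank = 0, 0
    for step in path:
        if bits[pos] == "0":
            break  # already at a leaf
        pos += 1  # the left child follows its parent immediately
        if step == "1":
            end = skip_subtree(bits, pos)  # must scan past the whole left subtree
            rank += bits[pos:end].count("0")
            pos = end
    return rank


tree = ((None, None), (None, (None, None)))
bits = encode(tree)
```

       Every step to the right has to scan over the entire left subtree, so the cost grows with the length of the stream; keeping a separate, shorter stream per subtree, as the HBDS-tree does, bounds the length of each scan.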
    Assessed Relevance and Stylistic Variation BIBAPDF 340
      Jussi Karlgren
    Texts vary not only by topic. Indeed, stylistic variation between texts of the same topic is often at least as noticeable as the variation between texts of different topic but same genre. This variation is straightforward to compute; distinguishing genres can be done with reasonable precision.
       This experiment uses a large collection of documents and information retrieval queries, where a subset of documents have been hand-judged for relevance to the queries. The experiment shows that this subset differs significantly from the rest of the corpus in terms of the stylistic metrics studied. This variation is more marked if the corpus is partitioned into stylistically homogeneous subcorpora.
       It remains to be investigated how general the results are. They may be at least partly an effect of the underlying text genres; they are certainly to a large extent an effect of the specific task and information retrieval scenario the human judges sought to emulate. This experiment shows that for a certain set of users and for a certain scenario a clear bias towards certain types of text can be found: these results should be taken as a starting point in investigating how situations affect measures of stylistic variation.
    A Spatial Feature Based Photograph Retrieval System BIBAPDF 341
      Joemon M. Jose; David John Harper; David G. Hendry
    A new indexing and retrieval strategy for photographic collections is introduced. The approach uses both textual and spatial features to improve retrieval performance. Spatial features are objects within photographs that have been named and spatially identified by indexers.
       The similarity measure considers the textual and spatial similarity between objects, and combines the evidence from both sources. Users can set the relative importance of each source. A system for evaluating this retrieval strategy is being built with the ECLAIR class library. The retrieval interface comprises a multi-modal query interface, a browser for thumbnails, and a pane for viewing selected documents.
    On the Potential Utility of Negative Relevance Feedback in Interactive Information Retrieval BIBAPDF 341
      Colleen Cool; Nicholas J. Belkin; Juergen Koenemann
    Automatic relevance feedback (RF) is gradually being incorporated into interactive information retrieval (IR) systems. It is generally the case that such systems use only positive, and not negative, relevance judgements for query modification. We present results of an empirical study of online searchers' relevance judgement behaviors which indicate that users of IR systems could and would make use of negative RF if it were offered.
       The data we present were collected as part of a larger study done within the context of TREC-4. Fifty volunteer searchers were recruited to search for two TREC topics each, using the INQUERY IR system, with automatic RF available. We collected the following types of data in our study: pre- and post-search interviews; videotapes of the searches, including "thinking aloud" protocols; and search logs. During the post-search interview, searchers discussed their uses of and reactions to automatic RF.
       Our analysis of the post-search interview and verbal protocol data identified four functions of negative RF desired by users in our study, and the types of searching problems addressed by each. These data also suggest a specific, novel method of implementing negative RF; that is, to expand queries by incorporating terms, with negative weights, which appear only in negatively judged documents.
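       The expansion method suggested in the last sentence can be sketched in Python as follows. The weight value, the bag-of-words scoring function, and the choice to leave the user's own query terms untouched are assumptions for illustration, not details from the study.

```python
def expand_with_negative_terms(query, relevant_docs, nonrelevant_docs, neg_weight=-0.5):
    """Add terms that occur only in negatively judged documents, with a negative weight.

    query maps term -> weight; each document is a list of terms.
    """
    in_relevant = {t for doc in relevant_docs for t in doc}
    in_nonrelevant = {t for doc in nonrelevant_docs for t in doc}
    expanded = dict(query)
    for term in in_nonrelevant - in_relevant:
        if term not in expanded:  # assumption: original query terms keep their weight
            expanded[term] = neg_weight
    return expanded


def score(weights, doc):
    """Sum the weights of the distinct query terms present in the document."""
    return sum(weights.get(t, 0.0) for t in set(doc))


query = {"jaguar": 1.0}
relevant = [["jaguar", "cat", "habitat"]]
nonrelevant = [["jaguar", "car", "dealer"]]
expanded = expand_with_negative_terms(query, relevant, nonrelevant)
```

       With this toy data, a document containing "jaguar" and "car" now scores lower than one containing "jaguar" and "cat", because "car" appeared only in the negatively judged document.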
    Retrieval of Paintings by Specifying Impression Words BIBAPDF 341
      Kozaburo Hachimura
    This poster presents a method for retrieving color paintings in which information about "principal" and background colors, extracted by image processing, is used as the key for retrieval.
       Region segmentation based on color information is first applied to all of the paintings. Every region is then evaluated in terms of its likelihood of forming a principal or background color region. We use two evaluation functions, one for principal regions and the other for background regions. The evaluation value of each region is accumulated in a color histogram according to the region's color, and several colors with large accumulated values are extracted as background colors or principal colors.
       In general, a combination of colors gives us some kind of impression. In the field of color design, data on the relationship between color combinations and impressions, represented by appropriate words, have been used. Referring to these data, we can use principal and background color information as a clue for describing the impression of a painting. Combinations of principal and background colors extracted from color paintings have been linked to corresponding "impression words", so paintings can be retrieved by specifying an "impression word" that indicates the impression they give.
    Extraction of a Word List from an Existing Dictionary to be Used in a Communication-Aid Software BIBAPDF 341
      Brigitte Le Pevedic
    This project is working on an initial software solution that allows rapid keyboard input for people who normally have great difficulty using a keyboard in the usual way. The objective is to propose the word being written to the user in as short a time as possible. The idea is as follows: based on the text already input and the first letters of the word being input, the system looks up in an electronic dictionary a list of the most commonly used words in the most plausible grammatical category or categories given the left context. The user then selects the desired word from this list or, if it is not listed, types a new letter. This system guarantees an impressive rate of input.
       An electronic dictionary with morphological and syntactic characteristics will be associated with this software, as well as a learning system allowing the software to be adapted to the individual user.
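       The lookup step can be sketched in Python as below. The lexicon entries, category labels, and frequencies are invented for illustration, and the function assumes the plausible grammatical categories have already been determined from the left context by some other component.

```python
def suggest(lexicon, prefix, plausible_categories, limit=5):
    """Return the most frequent words that start with prefix and belong to
    one of the plausible grammatical categories.

    lexicon maps word -> (category, frequency).
    """
    matches = [
        (freq, word)
        for word, (category, freq) in lexicon.items()
        if word.startswith(prefix) and category in plausible_categories
    ]
    # Most frequent first; ties broken alphabetically for a stable order.
    matches.sort(key=lambda m: (-m[0], m[1]))
    return [word for _, word in matches[:limit]]


# Hypothetical lexicon: word -> (grammatical category, frequency count).
lexicon = {
    "the": ("det", 100),
    "then": ("adv", 40),
    "they": ("pron", 60),
    "theory": ("noun", 20),
    "thing": ("noun", 50),
}

nouns = suggest(lexicon, "th", {"noun"})
function_words = suggest(lexicon, "the", {"det", "pron"})
```

       If the desired word is not in the returned list, the user types one more letter and the lookup is repeated with the longer prefix.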
    Merging Hypertext and Information Retrieval in the Interface BIBAPDF 341-342
      Gene Golovchinsky; Mark Chignell
    Information Retrieval (IR) is concerned with facilitating users' access to large amounts of (predominantly textual) information. In the 1980s, hypertext was introduced as an interactive, dynamic user interface style that did away with complicated query syntaxes typical of IR systems of the day. In this work, we propose a logical continuum of interface functionality that unites traditional information retrieval and hypertext interfaces. We describe VOIR, an information exploration interface that combines the immediacy and user-centeredness of hypertext interfaces with the flexibility and generality of modern information retrieval algorithms. This interface, implemented in a software prototype, presents the search results in parallel (newspaper-style), enabling the user to compare search results and to evaluate the effectiveness of the query. It uses term frequency heuristics to identify terms that will serve as anchors, and uses the context around the selected anchor to determine the collection of destination documents. The algorithms developed in this prototype are being applied to a Web-based dynamic hypertext.
    Fast Full Text Search with Free Word Using TS-File BIBAPDF 342
      Takashi Sato
    A new data structure (TS-file) is proposed in order to make a fast search for an arbitrary string in a large full text stored in secondary storage. The TS-file stores the location of every string of length L (the level) in the text.
       Using this, we can efficiently search not only for strings of length L but also for those shorter or longer than L.
       Because one can find arbitrary strings using the TS-file alone (i.e. with no additional text searches), the proposed method is more accurate than the one using signature files by which one can only know the possibility of existence.
       From an analysis of search cost, the number of accesses to secondary storage in order to find the first match to a key is two when the key length l_{k} is shorter than or equal to L, and 2(l_{k}-L+1) otherwise.
       The proposed algorithm is one of the fastest for arbitrary string search, and the time required to find all matching patterns is proportional to the number of matches, which is the lowest rate of increase for this kind of search. Because of the high storage cost of the basic TS-file, a compressed TS-file is introduced in order to lower storage costs for practical use without losing search speed.
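       As a rough illustration of the idea, the Python sketch below uses an in-memory dictionary as a stand-in for the TS-file: it records the position of every string of length L and answers queries for shorter, equal-length, and longer keys from that index alone. The padding character, the function names, and the dictionary layout are assumptions; the sketch does not model secondary-storage accesses or the compressed TS-file.

```python
from collections import defaultdict


def build_index(text, L):
    """Map every substring of length L to the list of positions where it starts.

    The text is padded so that positions near the end also get an entry.
    """
    padded = text + "\0" * (L - 1)
    index = defaultdict(list)
    for i in range(len(text)):
        index[padded[i : i + L]].append(i)
    return dict(index)


def search(index, L, key):
    """Return all start positions of key, using only the index."""
    if len(key) <= L:
        # Short or equal-length key: collect every indexed string that starts with it.
        hits = [p for gram, positions in index.items() if gram.startswith(key) for p in positions]
        return sorted(hits)
    # Long key: it is covered by len(key) - L + 1 overlapping substrings of length L;
    # a true match must contain all of them at consistent offsets.
    candidates = None
    for offset in range(len(key) - L + 1):
        starts = {p - offset for p in index.get(key[offset : offset + L], [])}
        candidates = starts if candidates is None else candidates & starts
    return sorted(candidates)


L = 3
index = build_index("abracadabra", L)
```

       Because the overlapping length-L substrings cover the whole key, a position that survives the intersection is an exact match, with no need to re-read the text. A real TS-file would keep the indexed strings sorted so that short keys are answered by a range lookup rather than the linear scan used here.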
    OLISTICO: An Evaluation Environment for Interactive IR Applications BIBAPDF 342
      M. Agosti; R. Bandiera; F. Bazo; R. Colotti; S. Gabrielli
    OLISTICO is an experimental evaluation environment where different methodological tools can be applied to evaluate the effectiveness of various types of interactive IR applications and systems. Problems of measuring system performance at the interaction level with real users are addressed. The work also addresses the methodological consequences of adopting a user-oriented evaluation paradigm, and the development of a conceptual framework within which to collect and integrate qualitative and quantitative data on user information-seeking behaviour.
       In the context of evaluation studies of DUO-OPAC and the legal-document hypertext HyperLaw2, OLISTICO made use of online questionnaires to identify users' profiles and interface effectiveness, transaction logs to analyse different navigation strategies through the available semantic structures, and ASL as a way of measuring search length in Hypertext IR systems.
       An "activity-centred" methodology is going to be embodied in the environment to establish whether specific applications fit users' real information needs as they emerge in everyday practice, and to evaluate expert workers' interaction with IR tools, in their natural settings, as a way of accomplishing their typical working tasks. To this end, a new HIR system, which implements the HyperLaw2 model but can manage different and new types of document collections, is under development. Repeating the experiments with a more generalised system gives the opportunity to evaluate the HIR model the systems are based on, and to make the evaluation methods used more general and independent of any knowledge domain. Furthermore, this study is giving insights into how to develop applications that enhance the quality of user-task performance within the context of computer-supported working activities.

    Post-Conference Research Workshops

    Foundations of Advanced Information Visualization for Information Retrieval Systems BIBAPDF 343
      Mark E. Rorvig; Matthias Hemmje
    This workshop is designed to educate interested members of the IR research community on current approaches to information visualization and visual retrieval interfaces. The quickly advancing research field of information visualization helps users acquire the ability to assess retrieved items in the aggregate quickly and efficiently. In a visual retrieval interface, maps of relationships among retrieved items are displayed graphically in two or three dimensions, offering visual clues to users in modifying searches, selecting groups of documents for examination textually, or selecting individual items as seeds for a new search. Approaches and methods vary by the complexity of dimensions or variables used to organize the data into visual patterns. At one end of this spectrum are systems which organize data by axes of time and subject. At the opposite end are systems which organize data by rational functions of object similarity using techniques originally applied to the mapping of bibliometric spaces. The workshop is organized around presentations focusing on the number of dimensions (or variables) used in the effort to assist the user in aggregate data comprehension.
       The workshop will be continued on Friday to set up and kick off FADIVA-Net, a Network of Excellence which is meant to become the successor to the FADIVA working group on Foundation of Advanced Information Visualization during the next three years. This organizational framework will foster research and cooperation in the field of advanced information visualization. Participants interested in this activity should state their interest in future participation in this international group when registering for the workshop.
    Courseware, Training and Curriculum in Information Retrieval BIBAPDF 343
      Edward A. Fox
    The meeting will begin with 2 hours of presentations:
  • Activities of the SIGIR Education Committee
  • Aids to teaching about evaluation
  • Hypertext and hypermedia requirements
  • Needs in Europe and Asia
  • Requirements of potential employers of those trained in the IR field
       The rest of the morning will deal with the problem of defining an IR curriculum: what courses can be agreed upon at the undergraduate, masters, and PhD levels (and at similar levels in non-US-type educational systems), and what "knowledge modules" can be defined and put together in several ways to suit a variety of course sequences. At the end of the morning session the workshop will break into groups so that each course and knowledge module can be covered by those best suited. Initial discussions can begin over lunch.
       After lunch, each workshop group will meet in a separate room. During a 2 hour period a draft syllabus for each course or knowledge module will be developed. After another break, the groups will all meet together for a closing plenary, presenting their conclusions. Discussion will lead to refinements of group reports. The workshop report will appear in Forum to stimulate discussion in the IR community of the proposed curriculum.
    Research in Information Retrieval and the Practical Needs of Research and Cultural Libraries BIBAPDF 344
      Encarnacion Rancitelli
    The goal of the workshop is to develop a better understanding of the possible interrelations between these two fields of interest. The workshop will be divided into two parts of 3 hours each. In the morning the discussion will be opened by 2 short presentations (15-20 minutes each) by representatives of users and librarians, which will be the basis for discussion. The needs of that sector, especially those of end users, will be the common theme of the presentations and the following discussion. In the afternoon the researchers and commercial companies will also be invited to present their points of view and their ideas on the solutions they can offer to the problems raised during the morning. Participation of the audience in the debate will be supported by the moderator. Concrete recommendations and possible avenues for future cooperation should result from the workshop. The moderator will be responsible for summarising the main ideas of the speakers and the audience.
    Cross-Linguistic Information Retrieval Workshop BIBAPDF 344
      Gregory Grefenstette
    As more and more information sources are becoming available on the Web, the portion of internationally available non-English text is slowly growing. The problem of accessing unrestricted information with queries expressed in a language different from the source language of the documents will become more widespread. This workshop aims to bring together researchers working on this problem of cross-linguistic, or multilingual, information retrieval. We invite articles describing implementations of cross-linguistic retrieval, use of bilingual dictionaries, of parallel corpora, or of non-parallel comparable bilingual corpora applied to the retrieval problem. Questions: Does automatic translation solve the problem? Are word-to-word correspondences sufficient? How should ambiguity in translation be dealt with? What kind of user interaction can help resolve ambiguities? How can test collections be built for multilingual information retrieval? How can responding documents in different languages be merged? What would multilingual user interfaces look like? What can be learned from four decades of machine translation for the cross-linguistic retrieval problem? Accepted papers will be allotted a 30-minute presentation followed by 10-20 minutes of questions and discussion.
    Networked Information Retrieval BIBAPDF 344
      Norbert Fuhr
    The recent and rapid growth of the Internet and corporate intranets poses new problems for Information Retrieval. There is now a need for tools that help people navigate the network, select which collections to search, and fuse the results returned from searching multiple collections. These problems are being addressed by the international IR research community and a number of digital library projects around the world, e.g. the U.S. Digital Libraries projects, the ERCIM Digital Libraries projects, and the German MEDOC project. The goal of this workshop is to bring together people from each of these areas to discuss their varying approaches to common problems. Researchers are invited to submit position papers or extended abstracts discussing novel approaches to the following problems:
  • Resource selection: selecting from among a set of collections or databases;
  • Data fusion: merging or fusing results from different collections or databases;
  • Archival retrieval methods for heterogeneous objects;
  • Metaknowledge;
  • Consistency;
  • Multilingual environments;
  • User interfaces; and
  • Architectures for networked information retrieval.