
Proceedings of the Sixteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Fullname: Proceedings of the Sixteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Editors: Robert Korfhage; Edie Rasmussen; Peter Willett
Location: Pittsburgh, Pennsylvania
Dates: 1993-Jun-27 to 1993-Jul-01
Standard No: ISBN 0-89791-605-0; ACM Order Number 606930
  1. Inference Networks
  2. TREC Overview
  3. Full Text Analysis
  4. Compression and Signature Files
  5. Association Methods
  6. Query Expansion
  7. Linguistic Analysis
  8. Structured Text
  9. Panel on Natural Language Processing for Information Management
  10. Processing for Japanese Text
  11. Interface Issues
  12. Mathematical Models
  13. DBMS/IR Integration
  14. Query Processing and Evaluation
  15. Demonstrations

Inference Networks

Relevance Feedback and Inference Networks BIBAPDF 2-11
  David Haines; W. Bruce Croft
Relevance feedback, which modifies queries using judgements of the relevance of a few highly-ranked documents, has historically been an important method for improving the performance of information retrieval systems. In this paper, we extend the inference network model introduced by Turtle and Croft to include relevance feedback techniques. The difference between relevance feedback on text abstracts and on full-text collections is studied. Preliminary results for relevance feedback on the structured queries supported by the inference net model are also reported.
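The feedback mechanism in this paper is specific to the inference network model; as a generic, hedged illustration of what relevance feedback does (the classic Rocchio update, not the authors' method), a query vector can be moved toward judged-relevant documents and away from judged-nonrelevant ones:

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio feedback: reweight the query using judged documents.
    Vectors are dicts mapping term -> weight; alpha/beta/gamma are the
    usual mixing constants."""
    new = {t: alpha * w for t, w in query.items()}
    for doc in relevant:                      # pull toward relevant docs
        for t, w in doc.items():
            new[t] = new.get(t, 0.0) + beta * w / len(relevant)
    for doc in nonrelevant:                   # push away from nonrelevant
        for t, w in doc.items():
            new[t] = new.get(t, 0.0) - gamma * w / len(nonrelevant)
    # negative weights are conventionally clipped to zero
    return {t: w for t, w in new.items() if w > 0}
```

Note how feedback introduces new terms ("feedback" below) that the original query never contained.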
Efficient Context-Sensitive Plausible Inference for Information Disclosure BIBAPDF 12-21
  P. D. Bruza; L. C. van der Gaag
Plausible inference is an essential aspect of logic-based information disclosure. This paper proposes a context-sensitive plausible inference mechanism based on a so-called index expression belief network. Plausible inference is cloaked as probabilistic evidence propagation within this network. Preliminary experiments show general evidence propagation algorithms to be too inefficient for real-life information disclosure applications. The paper sketches two optimizations whereby efficient, special-purpose evidence propagation may be realized.
Automatic Indexing Based on Bayesian Inference Networks BIBAPDF 22-34
  Kostas Tzeras; Stephan Hartmann
In this paper, a Bayesian inference network model for automatic indexing with index terms (descriptors) from a prescribed vocabulary is presented. It requires an indexing dictionary with rules mapping terms of the respective subject field onto descriptors and inverted lists for terms occurring in a set of documents of the subject field and descriptors manually assigned to these documents. The indexing dictionary can be derived automatically from a set of manually indexed documents. An application of the network model is described, followed by an indexing example and some experimental results about the indexing performance of the network model.

TREC Overview

Overview of the First TREC Conference BIBAPDF 36-47
  Donna Harman
The first Text REtrieval Conference (TREC-1) was held in early November 1992 and was attended by about 100 people working in the 25 participating groups. The goal of the conference was to bring research groups together to discuss their work on a new large test collection. A wide variety of retrieval techniques was reported on, including methods using automatic thesauri, sophisticated term weighting, natural language techniques, relevance feedback, and advanced pattern matching. As results had been run through a common evaluation package, groups were able to compare the effectiveness of different techniques and discuss how differences among the systems affected performance.

Full Text Analysis

Approaches to Passage Retrieval in Full Text Information Systems BIBAPDF 49-58
  Gerard Salton; J. Allan; Chris Buckley
Large collections of full-text documents are now commonly used in automated information retrieval. When the stored document texts are long, the retrieval of complete documents may not be in the users' best interest. In such circumstances, efficient and effective retrieval results may be obtained by using passage retrieval strategies designed to retrieve text excerpts of varying size in response to statements of user interest.
   New approaches are described in this study for implementing selective passage retrieval systems, and identifying text passages responsive to particular user needs. An automated encyclopedia search system is used to evaluate the usefulness of the proposed methods.
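As a hedged sketch of the general idea (not Salton, Allan, and Buckley's actual strategies), passage retrieval can be approximated by scoring fixed-size overlapping windows of a tokenized document against the query and returning the best-scoring excerpts:

```python
def passage_scores(doc_terms, query_terms, window=50, step=25):
    """Score overlapping fixed-size windows of a tokenized document by
    query-term overlap; return (start_offset, score) pairs, best first.
    Window and step sizes are illustrative assumptions."""
    q = set(query_terms)
    scores = []
    for start in range(0, max(1, len(doc_terms) - window + 1), step):
        passage = doc_terms[start:start + window]
        scores.append((start, sum(1 for t in passage if t in q)))
    return sorted(scores, key=lambda s: -s[1])
```

Returning the top few offsets rather than whole documents is the essence of retrieving "text excerpts of varying size" from long full texts.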
Subtopic Structuring for Full-Length Document Access BIBAPDF 59-68
  Marti A. Hearst; Christian Plaunt
We argue that the advent of large volumes of full-length text, as opposed to short texts like abstracts and newswire, should be accompanied by corresponding new approaches to information access. Toward this end, we discuss the merits of imposing structure on full-length text documents; that is, a partition of the text into coherent multi-paragraph units that represent the pattern of subtopics that comprise the text. Using this structure, we can make a distinction between the main topics, which occur throughout the length of the text, and the subtopics, which are of only limited extent. We discuss why recognition of subtopic structure is important and how, to some degree of accuracy, it can be found. We describe a new way of specifying queries on full-length documents and then describe an experiment in which making use of the recognition of local structure achieves better results on a typical information retrieval task than does a standard IR measure.
The Identification of Important Concepts in Highly Structured Technical Papers BIBAPDF 69-78
  Chris D. Paice; Paul A. Jones
Automatic abstracting, typically based on extraction of important sentences from a text, has been treated as a largely separate task from automatic indexing. This paper describes an approach in which the indexing and abstracting tasks are effectively combined. It is applicable to highly structured empirical research papers, whose content can be organised using a semantic frame. During a scan of a source text, stylistic clues and constructs are used for extracting candidate fillers for the various slots in the frame. Subsequently, an actual concept name is chosen for each slot by comparing the various candidates and their weights.

Compression and Signature Files

Is Huffman Coding Dead? BIBPDF 80-87
  Abraham Bookstein; Shmuel T. Klein; Timo Raita
Compression of Indexes with Full Positional Information in Very Large Text Databases BIBAPDF 88-95
  Gordon Linoff; Craig Stanfill
This paper describes a combination of compression methods which may be used to reduce the size of inverted indexes for very large text databases. These methods are Prefix Omission, Run-Length Encoding, and a novel family of numeric representations called n-s coding. Using these compression methods on two different text sources (the King James Version of The Bible and a sample of Wall Street Journal Stories), the compressed index occupies less than 40% of the size of the original text, even when both stopwords and numbers are included in the index. The decreased time required for I/O can almost fully compensate for the time needed to uncompress the postings. This research is part of an effort to handle very large text databases on the CM-5, a massively parallel MIMD super-computer.
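The paper's n-s coding is not reproduced here; the following sketch illustrates the same underlying principle with two standard stand-ins, gap encoding of sorted document ids plus a variable-byte code (illustrative assumptions, not the authors' scheme):

```python
def vbyte_encode(numbers):
    """Variable-byte code: 7 payload bits per byte; the high bit marks
    the final byte of each number."""
    out = bytearray()
    for n in numbers:
        chunk = []
        while True:
            chunk.append(n & 0x7F)
            n >>= 7
            if n == 0:
                break
        chunk[0] |= 0x80                  # flag the lowest-order byte
        out.extend(reversed(chunk))       # emit most-significant first
    return bytes(out)

def vbyte_decode(data):
    numbers, n = [], 0
    for b in data:
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:                      # final byte of this number
            numbers.append(n)
            n = 0
    return numbers

def compress_postings(doc_ids):
    """Encode a sorted postings list as gaps (first id kept absolute);
    gaps are small, so they pack into few bytes."""
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return vbyte_encode(gaps)

def decompress_postings(data):
    ids, total = [], 0
    for g in vbyte_decode(data):
        total += g
        ids.append(total)
    return ids
```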
Analysis of Multiterm Queries in a Dynamic Signature File Organization BIBAPDF 96-105
  Deniz Aktug; Fazli Can
Our analysis combines the concerns of signature extraction and signature file organization, which have usually been treated as separate issues. We also relax the uniform frequency and single-term query assumptions and provide a comprehensive analysis for multiterm query environments where terms can be classified based on their query and database occurrence frequencies. The performance of three superimposed signature generation schemes is explored as they are applied to a dynamic signature file organization based on linear hashing: Linear Hashing with Superimposed Signatures (LHSS). The first scheme (SM) assigns all terms the same number of bits regardless of their discriminatory power, whereas the second and third methods (MMS and MMM) emphasize the terms with high query and low database occurrence frequencies. Of these three schemes, only MMM takes the probability distribution of the number of query terms into account in finding the optimal mapping strategy. Derivation of performance evaluation formulas is provided, together with the results of various experimental settings. Suggestions as to how to implement the given techniques in real-life cases are also provided. Results indicate that MMM outperforms the other methods as the gap between the discriminatory power of the terms gets larger. The absolute savings provided by MMM reach a maximum for the high query weight case. However, the extra savings decline sharply for high-weight queries, and moderately for low-weight queries, as database size increases.
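A minimal sketch of superimposed coding, the signature-generation family all three schemes belong to (signature width and bits-per-term are illustrative assumptions, not the SM/MMS/MMM formulas):

```python
import hashlib

def term_signature(term, sig_bits=64, bits_per_term=3):
    """Superimposed coding: each term sets a few pseudo-random bit
    positions in a fixed-width signature."""
    sig = 0
    for i in range(bits_per_term):
        h = hashlib.md5(f"{term}:{i}".encode()).digest()
        sig |= 1 << (int.from_bytes(h[:4], "big") % sig_bits)
    return sig

def block_signature(terms, **kw):
    """A text block's signature is the bitwise OR of its term signatures."""
    sig = 0
    for t in terms:
        sig |= term_signature(t, **kw)
    return sig

def may_contain(block_sig, query_terms, **kw):
    """Filter step: True if the block *could* contain all query terms.
    False drops (spurious matches) are possible; false dismissals are not."""
    qsig = block_signature(query_terms, **kw)
    return block_sig & qsig == qsig
```

The trade-off the paper analyzes is exactly how many bits each term should set: more bits sharpen the filter for that term but crowd the signature for the others.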

Association Methods

Computation of Term Associations by a Neural Network BIBAPDF 107-115
  S. K. M. Wong; Y. J. Cai; Y. Y. Yao
This paper suggests a method for computing term associations based on an adaptive bilinear retrieval model. Such a model can be implemented by using a three-layer feed-forward neural network. Term associations are modeled by weighted links connecting different neurons, and are derived by the perceptron learning algorithm without the need for introducing any ad hoc parameters. The preliminary results indicate the usefulness of neural networks in the design of adaptive information retrieval systems.
Cluster Analysis for Hypertext Systems BIBAKPDF 116-125
  Rodrigo A. Botafogo
Identifying nodes of information that are highly related has many applications in information systems, and in particular in hypertext systems. In this paper we present a technique to identify "natural" clusters in a hypertext. A natural cluster is a cluster that is not arbitrary, but depends only on intrinsic properties of the hypertext. In our case, the property we use to identify the clusters is the number of independent paths between nodes. Using the graph-theoretic definition of k-edge-components, we present an aggregation technique to cluster the nodes. We then use this technique to cluster three medium-sized hypertexts that were developed by different authors, for different users, using different methodologies. We also show how to use clustering to improve data display, browsing, and retrieval.
Keywords: Aggregation, Clustering, Structural analysis, Hypertext, Graph theory
Constant Interaction-Time Scatter/Gather Browsing of Very Large Document Collections BIBAPDF 126-134
  Douglass R. Cutting; David R. Karger; Jan O. Pedersen
The Scatter/Gather document browsing method uses fast document clustering to produce table-of-contents-like outlines of large document collections. Previous work [1] developed linear-time document clustering algorithms to establish the feasibility of this method over moderately large collections. However, even linear-time algorithms are too slow to support interactive browsing of very large collections such as Tipster, the DARPA standard text retrieval evaluation collection. We present a scheme that supports constant interaction-time Scatter/Gather of arbitrarily large collections after near-linear time preprocessing. This involves the construction of a cluster hierarchy. A modification of Scatter/Gather employing this scheme, and an example of its use over the Tipster collection are presented.
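Scatter/Gather depends on very fast clustering; as a hedged stand-in for the linear-time algorithms of [1] (not the authors' Buckshot or Fractionation procedures), a one-pass "leader" clustering over term-frequency vectors conveys the flavor:

```python
import math

def leader_cluster(docs, threshold=0.3):
    """One-pass 'leader' clustering: each document joins the first
    existing cluster whose leader is similar enough, else founds a new
    one.  Docs are term-frequency dicts; threshold is an assumption."""
    def cosine(a, b):
        dot = sum(a[t] * b.get(t, 0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    clusters = []                    # list of (leader_vector, member_indices)
    for i, d in enumerate(docs):
        for leader, members in clusters:
            if cosine(d, leader) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((d, [i]))
    return [members for _, members in clusters]
```

In Scatter/Gather proper, the user picks some of the resulting groups, the system gathers their documents, and the same clustering is applied again to the smaller set; the constant-time variant described here precomputes a cluster hierarchy so that step never touches the full collection.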

Query Expansion

Integrating a Dynamic Lexicon with a Dynamic Full-Text Retrieval System BIBAPDF 136-145
  Peter G. Anick; Rex A. Flynn
There has been a great deal of interest within the Information Retrieval community in evaluating the use of linguistic knowledge to improve the indexing and searching of textual databases. Such systems must often employ a lexicon to store information about the words and phrases comprising the application's domain. Unlike a static lexicon, a dynamic lexicon raises practical concerns about coordination between the state of the lexicon and IR indexing schemes based on lexical knowledge. Additionally, it introduces a host of database management issues, many of which are similar to those found in text databases. In this paper, we explore a range of system design and performance issues that arise when integrating a dynamic lexicon with a dynamic full-text information retrieval system. We observe that the principle of functional isolation argues against the use of language-dependent information in article indexes and favors the use of query-time strategies for applying lexical knowledge. We propose and evaluate a system architecture which embodies this principle. We also show how a storage and retrieval infrastructure based on Burkowski's [BURKOWSKI92] "containment model" abstraction can be employed to implement both the text retrieval and lexicon facilities required in an integrated system.
A User-Centred Evaluation of Ranking Algorithms for Interactive Query Expansion BIBAPDF 146-159
  Efthimis N. Efthimiadis
The evaluation of 6 ranking algorithms for the ranking of terms for query expansion is discussed within the context of an investigation of interactive query expansion and relevance feedback in a real operational environment. The yardstick for the evaluation was provided by the user relevance judgements on the lists of the candidate terms for query expansion. This methodology introduces a new way of looking at and evaluating ranking algorithms for query expansion. The evaluation focuses on the similarities in the performance of the different algorithms and how the algorithms with similar performance treat terms.
Concept Based Query Expansion BIBAPDF 160-169
  Yonggang Qiu; H. P. Frei
Query expansion methods have been studied for a long time -- with debatable success in many instances. In this paper we present a probabilistic query expansion model based on a similarity thesaurus which was constructed automatically. A similarity thesaurus reflects domain knowledge about the particular collection from which it is constructed. We address the two important issues with query expansion: the selection and the weighting of additional search terms. In contrast to earlier methods, our queries are expanded by adding those terms that are most similar to the concept of the query, rather than selecting terms that are similar to the query terms. Our experiments show that this kind of query expansion results in a notable improvement in the retrieval effectiveness when measured using both recall-precision and usefulness.
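A toy sketch of the paper's key distinction: ranking candidate expansion terms by their similarity to the query as a whole (summed over all query terms) rather than to any single query term. The similarity values below are hypothetical, not drawn from a real similarity thesaurus:

```python
def expand_query(query_terms, sim, top_n=2):
    """Rank candidate terms by similarity to the *whole* query (sum of
    similarities to every query term), then append the best ones.
    `sim` is a dict of dicts: sim[a][b] = similarity of terms a and b."""
    candidates = {}
    for q in query_terms:
        for t, s in sim.get(q, {}).items():
            if t not in query_terms:
                candidates[t] = candidates.get(t, 0.0) + s
    ranked = sorted(candidates, key=lambda t: -candidates[t])
    return list(query_terms) + ranked[:top_n]
```

In the example below, "vehicle" is only moderately similar to each individual query term, yet it outranks "auto" because it is related to the query concept as a whole.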

Linguistic Analysis

Using WordNet to Disambiguate Word Senses for Text Retrieval BIBAPDF 171-180
  Ellen M. Voorhees
This paper describes an automatic indexing procedure that uses the "IS-A" relations contained within WordNet and the set of nouns contained in a text to select a sense for each polysemous noun in the text. The result of the indexing procedure is a vector in which some of the terms represent word senses instead of word stems. Retrieval experiments comparing the effectiveness of these sense-based vectors vs. stem-based vectors show the stem-based vectors to be superior overall, although the sense-based vectors do improve the performance of some queries. The overall degradation is due in large part to the difficulty of disambiguating senses in short query statements. An analysis of these results suggests two conclusions: the IS-A links define a generalization/specialization hierarchy that is not sufficient to reliably select the correct sense of a noun from the set of fine sense distinctions in WordNet; and missing correct matches because of incorrect sense resolution has a much more deleterious effect on retrieval performance than does making spurious matches.
MURAX: A Robust Linguistic Approach for Question Answering Using an On-Line Encyclopedia BIBAPDF 181-190
  Julian Kupiec
Robust linguistic methods are applied to the task of answering closed-class questions using a corpus of natural language. The methods are illustrated in a broad domain: answering general-knowledge questions using an on-line encyclopedia.
   A closed-class question is a question stated in natural language, which assumes some definite answer typified by a noun phrase rather than a procedural answer. The methods hypothesize noun phrases that are likely to be the answer, and present the user with relevant text in which they are marked, focussing the user's attention appropriately. Furthermore, the sentences of matching text that are shown to the user are selected to confirm phrase relations implied by the question, rather than being selected solely on the basis of word frequency.
   The corpus is accessed via an information retrieval (IR) system that supports boolean search with proximity constraints. Queries are automatically constructed from the phrasal content of the question, and passed to the IR system to find relevant text. Then the relevant text is itself analyzed; noun phrase hypotheses are extracted and new queries are independently made to confirm phrase relations for the various hypotheses.
   The methods are currently being implemented in a system called MURAX and although this process is not complete, it is sufficiently advanced for an interim evaluation to be presented.
Viewing Morphology as an Inference Process BIBAPDF 191-202
  Robert Krovetz
Morphology is the area of linguistics concerned with the internal structure of words. Information Retrieval has generally not paid much attention to word structure, other than to account for some of the variability in word forms via the use of stemmers. This paper will describe our experiments to determine the importance of morphology, and the effect that it has on performance. We will also describe the role of morphological analysis in word sense disambiguation, and in identifying lexical semantic relationships in a machine-readable dictionary. We will first provide a brief overview of morphological phenomena, and then describe the experiments themselves.
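For contrast with morphology-as-inference, the baseline that IR systems typically rely on is a suffix-stripping stemmer. A deliberately toy version (the rules are illustrative, not Porter's or Krovetz's):

```python
# (suffix, replacement) pairs, tried longest-first; purely illustrative
SUFFIXES = [("ational", "ate"), ("ization", "ize"), ("ness", ""),
            ("ing", ""), ("ies", "y"), ("s", "")]

def stem(word):
    """Toy suffix-stripping stemmer: rewrite the first matching suffix,
    guarded by a minimal remaining-stem length of 3 characters."""
    for suffix, repl in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + repl
    return word
```

Such rules conflate variants without asking whether the forms are actually related in meaning, which is precisely the gap the paper's inference-based view of morphology addresses.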

Structured Text

Structured Answers for a Large Structured Document Collection BIBAKPDF 204-213
  Michael Fuller; Eric Mackie; Ron Sacks-Davis; Ross Wilkinson
There is a simple method for integrating information retrieval and hypertext. This consists of treating nodes as isolated documents and retrieving them in order of similarity. If the nodes are structured, in particular, if sets of nodes collectively constitute documents, we can do better. This paper shows how the formation of the hypertext, the retrieval of nodes in response to content based queries, and the presentation of the nodes can be achieved in a way that exploits the knowledge encoded as the structure of the documents. The ideas are then exemplified in an SGML based hypertext information retrieval system.
Keywords: Hypertext, Information retrieval, SGML, Structured documents
Retrieval from Hierarchical Texts by Partial Patterns BIBAPDF 214-222
  Pekka Kilpelainen; Heikki Mannila
Structured texts (for example, dictionaries and user manuals) typically have a hierarchical (tree-like) structure. We describe a query language for retrieving information from collections of hierarchical text. The language is based on a tree pattern matching notion called tree inclusion. Tree inclusion allows easy expression of queries that use both the structure and the content of the document. When using it, a user need not be aware of the whole structure of the database. Thus a language based on tree inclusion is data independent, a property made necessary by the great variance in the structure of the texts.
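A hedged sketch of tree-inclusion-style matching. This is a simplification of Kilpelainen and Mannila's definition: here sibling pattern nodes must embed into distinct, order-preserving child subtrees of the matched node, which is a restriction of full tree inclusion:

```python
def embeds(pattern, target):
    """Can `pattern` be embedded in `target`?  Trees are
    (label, [children]) tuples.  The pattern root may map to the
    target root or to any descendant."""
    plabel, pkids = pattern
    tlabel, tkids = target
    if plabel == tlabel and fits(pkids, tkids):
        return True
    return any(embeds(pattern, c) for c in tkids)

def fits(pkids, tkids):
    """Embed the pattern children, in order, into an order-preserving
    selection of target subtrees (with backtracking over choices)."""
    if not pkids:
        return True
    for j, t in enumerate(tkids):
        if embeds(pkids[0], t) and fits(pkids[1:], tkids[j + 1:]):
            return True
    return False
```

The dictionary-style example below shows the data independence the paper emphasizes: the pattern skips the intervening "sense" level entirely, yet still matches, while a pattern violating the left-to-right order does not.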

Panel on Natural Language Processing for Information Management

Explorations of NLP for Information Management: Observations from Practice in Mono- and Multi-lingual Applications BIBAPDF 224-225
  David A. Evans
The panel will explore the practical strengths and limitations of natural-language processing (NLP) and the implications for NLP-based information-management (IM). It will offer special perspectives on NLP-based IM by reflecting on techniques that are employed in multi-lingual NLP.
   Examples will be taken from current NLP-based projects at Carnegie Mellon University in the Laboratory for Computational Linguistics (LCL) and in the Center for Machine Translation (CMT). The projects represent a variety of application types, ranging from the identification and extraction of information from mono- and multi-lingual texts to the first steps toward robust multi-lingual speech processing. They encompass a spectrum of techniques and new areas of research in NLP, many of which are specifically designed to enhance robustness, to exploit limited NLP, and to work well with large amounts of unrestricted text.
Lessons from the CLARIT Project BIBAPDF 224-225
  David A. Evans
A general orientation to strategies in NLP-based IM will be provided. This will focus especially on such matters as:
  • the use of selective NLP
  • NLP in combination with heuristics and other techniques
  • practical means of circumventing problems in NLP
  • techniques for the extraction of NLP-useful resources from corpora
  • techniques for using syntactic information to gain semantic functionality
Specific results of the CLARIT project will be discussed, including the use of NLP in automatic indexing and retrieval, in thesaurus construction, and in the discovery of term relation sets. As one illustration of the strengths and limitations of selective-NLP designs, Evans will describe the project's experience in rapidly modifying the CLARIT system to process Dutch and French, preserving general CLARIT IM functionality.
  Jaime G. Carbonell
Work on NLP-based information extraction will be presented, illustrating the power of limited (hence, practical) semantic approaches. Examples will be based on results from the CMT TIPSTER project. Lessons from a TIPSTER-companion project, SHOGUN, designed to perform extraction on Japanese text, will be used to illustrate the generalizability of the techniques and the feasibility of a uniform approach to multi-lingual IM. The results of the JANUS project, developing robust speech-to-speech machine translation, will also be discussed, to suggest directions for future research in IM.
Lessons from PANGLOSS BIBAPDF 225
  Sergei Nirenburg
The state of the art in machine translation will be explored by focusing on the special challenges of developing 'interlingua representations'. Examples will be taken from the PANGLOSS project. One relevant observation is that attempts in IM to establish concept-normal forms (to support term normalizations, to identify term substitution classes, and to achieve 'concept-based' indexing and retrieval) must sooner or later confront the same sorts of problems that knowledge-based machine translation is currently exploring. Such problems are reflected in the limitations of syntactic and lexical correspondence across languages, the need for 'construction'-sensitive techniques in NLP, and the ontological complexity of interlingua representations. As a counterpoint to the difficulties of concept-based processing, however, examples will be offered of 'successful' and practical techniques to discover and manage cross-linguistic term correspondences. These include methods for identifying corresponding 'open compounds' (e.g., "blood transfusion"/"transfusion sanguine"; "stock market crash"/"effondrement de la bourse"). Since open compounds are often the most informative concept identifiers in sub-domains, they are among the most useful index terms. The ability to identify them using 'simple' technology can provide a basis for interlingual IM.

Processing for Japanese Text

Simple Word Strings as Compound Keywords: An Indexing and Ranking Method for Japanese Texts BIBAPDF 227-236
  Yasushi Ogawa; Akako Bessho; Masako Hirose
This paper describes a new indexing method for Japanese text databases using the simple keyword string, in which a compound word is treated as a string of simple words -- the smallest units in Japanese grammar that still maintain their meanings. This method allows retrieved texts to be ranked according to the similarity of their meaning to the query, without using a controlled vocabulary or thesaurus. This paper also introduces the keyword feature, which describes the syntactic and semantic characteristics of a word and results in more precise keyword extraction and text retrieval, as well as simpler dictionary maintenance.
A Comparison of Indexing Techniques for Japanese Text Retrieval BIBAPDF 237-246
  Hideo Fujii; W. Bruce Croft
A series of Japanese full-text retrieval experiments was conducted using an inference network document retrieval model. The retrieval performance of two major indexing methods, character-based and word-based, was evaluated. Using structured queries, the character-based indexing performed retrieval as well as, or slightly better than, the word-based system. This result has practical significance, since character-based indexing is considerably faster than traditional word-based indexing. All the queries in this experiment were automatically formulated from natural language input.
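Character-based indexing can be sketched with overlapping character bigrams, which sidestep Japanese word segmentation entirely (a generic illustration of the indexing unit, not the paper's inference-network implementation):

```python
def char_bigrams(text):
    """Index overlapping character bigrams instead of words -- no
    dictionary or segmenter is needed."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

def bigram_match(doc, query):
    """Candidate test: every bigram of the query occurs somewhere in
    the document (position checks would tighten this further)."""
    doc_grams = set(char_bigrams(doc))
    return all(g in doc_grams for g in char_bigrams(query))
```

Because no word dictionary is consulted, indexing speed is high; the cost is spurious candidates when query bigrams happen to occur in unrelated contexts, which is where ranking and structured queries come in.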

Interface Issues

Development of a Modern OPAC: From REVTOLC to MARIAN BIBAPDF 248-259
  Edward A. Fox; Robert K. France; Eskinder Sahle; Amjad Daoud; Ben E. Cline
Since 1986 we have investigated the problems and possibilities of applying modern information retrieval methods to large online public access library catalogs (OPACs). In the Retrieval Experiment -- Virginia Tech OnLine Catalog (REVTOLC) study we carried out a large pilot test in 1987 and a larger, controlled investigation in 1990, with 216 users and roughly 500,000 MARC records. Results indicated that a forms-based interface coupled with vector and relevance feedback retrieval methods would be well received. Recent efforts developing the Multiple Access and Retrieval of Information with ANnotations (MARIAN) system have involved the use of a specially developed object-oriented DBMS, construction of a client running under NeXTSTEP, programming of a distributed server with a thread assigned to each user session to increase concurrency on a small network of NeXTs, refinement of algorithms to use objects and stopping rules for greater efficiency, and usability testing with iterative interface refinement.
Content Awareness in a File System Interface: Implementing the 'Pile' Metaphor for Organizing Information BIBAPDF 260-269
  David E. Rose; Richard Mander; Tim Oren; Dulce B. Ponceleon; Gitta Salomon; Yin Yin Wong
The pile is a new element of the desktop user interface metaphor, designed to support the casual organization of documents. An interface design based on the pile concept suggested uses of content awareness for describing, organizing, and filing textual documents. We describe a prototype implementation of these capabilities, and give a detailed example of how they might appear to the user. We believe the system demonstrates how content awareness can be not only used in a computer filing system, but made an integral part of the user's experience.
A Browser for Bibliographic Information Retrieval, Based on an Application of Lattice Theory BIBAPDF 270-279
  Gert Schmeltz Pedersen
An application of mathematical lattice theory, called relationship lattices, is utilized to attack problems of operational bibliographic information retrieval.
   The proposed solution offers the information searcher an interface for operating in a world of concepts, authors, and document records and their relationships. This hides the complexities of query language and database structures, and it allows the searcher to use personally preferred terminology and to browse, query, and download document records in a convenient way.
   The main component of the proposed solution is a personal thesaurus built up as a relationship lattice.

Mathematical Models

An Application of Least Squares Fit Mapping to Text Information Retrieval BIBAPDF 281-290
  Yiming Yang; Christopher G. Chute
This paper describes a unique example-based mapping method for document retrieval. We discovered that knowledge about relevance among queries and documents can be used to obtain empirical connections between query terms and the canonical concepts used for indexing the content of documents. These connections do not depend on whether there are shared terms among the queries and documents; therefore, they are especially effective for mapping queries to documents where the concepts are relevant but the terms used by article authors happen to differ from the terms of database users. We employ a Linear Least Squares Fit (LLSF) technique to compute such connections from a collection of queries and documents where the relevance is assigned by humans, and then use these connections in the retrieval of documents where the relevance is unknown. We tested this method on both retrieval and indexing with a set of MEDLINE documents which has been used by other information retrieval systems for evaluations. The effectiveness of the LLSF mapping and the significant improvement over alternative approaches were evident in the tests.
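A minimal worked instance of the LLSF idea -- fitting a linear mapping W from term space to concept space by least squares, here for just two terms with the normal-equation inverse written out by hand. The real system solves far larger systems; this is only a sketch of the mathematics:

```python
def llsf_fit_2terms(X, Y):
    """Least-squares mapping W = (X^T X)^{-1} X^T Y for a 2-term space.
    X: rows are training query term vectors (length 2);
    Y: rows are the corresponding concept vectors."""
    # G = X^T X  (2 x 2 Gram matrix), inverted in closed form
    g00 = sum(r[0] * r[0] for r in X)
    g01 = sum(r[0] * r[1] for r in X)
    g11 = sum(r[1] * r[1] for r in X)
    det = g00 * g11 - g01 * g01
    inv = [[g11 / det, -g01 / det], [-g01 / det, g00 / det]]
    # X^T Y  (2 x n_concepts)
    nc = len(Y[0])
    xty = [[sum(X[r][i] * Y[r][j] for r in range(len(X))) for j in range(nc)]
           for i in range(2)]
    # W = (X^T X)^{-1} X^T Y
    return [[sum(inv[i][k] * xty[k][j] for k in range(2)) for j in range(nc)]
            for i in range(2)]

def llsf_map(W, query):
    """Map a new 2-term query vector to concept weights: q . W."""
    return [sum(query[i] * W[i][j] for i in range(2)) for j in range(len(W[0]))]
```

Once fitted, W carries a query to concept weights even when the query shares no literal terms with a relevant document's text, which is the point of the mapping.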
On the Evaluation of Boolean Operators in the Extended Boolean Retrieval Framework BIBAPDF 291-297
  Joon Ho Lee; Won Yong Kim; Myoung Ho Kim; Yoon Joon Lee
Retrieval models based on the extended boolean retrieval framework, e.g., the fuzzy set model and the extended boolean model, have been proposed to provide the conventional boolean retrieval system with a document ranking facility. However, due to undesirable properties of the evaluation formulas for the AND and OR operations, the former generates incorrectly ranked output in certain cases and the latter suffers from the complexity of computation. A variety of fuzzy operators have been proposed to replace these evaluation formulas. In this paper we first investigate the behavioral aspects of the fuzzy operators and address important issues that affect retrieval effectiveness. We then define an operator class called positively compensatory operators giving high retrieval effectiveness, and present a pair of positively compensatory operators providing high retrieval efficiency as well as high retrieval effectiveness. All the claims are justified through experiments.
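The contrast the paper examines can be seen in miniature: min/max fuzzy operators are non-compensatory (a document's score for an AND query depends only on its worst term), while extended-boolean-style operators let every term weight influence the score. The p-norm form below is illustrative; the authors' positively compensatory operators are not reproduced here:

```python
def fuzzy_and(weights):
    """Fuzzy-set AND: the minimum -- all but the worst weight is ignored."""
    return min(weights)

def fuzzy_or(weights):
    """Fuzzy-set OR: the maximum -- all but the best weight is ignored."""
    return max(weights)

def soft_and(weights, p=2):
    """Extended-boolean-style AND: one minus the normalized p-norm
    distance from the ideal point (1, ..., 1), so every weight counts."""
    n = len(weights)
    return 1 - (sum((1 - w) ** p for w in weights) / n) ** (1 / p)

def soft_or(weights, p=2):
    """Extended-boolean-style OR: normalized p-norm distance from (0, ..., 0)."""
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1 / p)
```

The test below shows the anomaly: under min, a document scoring (0.9, 0.1) ranks no higher than one scoring (0.1, 0.1), whereas the soft operators separate them.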
    A Model of Information Retrieval Based on a Terminological Logic BIBAPDF 298-307
      Carlo Meghini; Fabrizio Sebastiani; Umberto Straccia; Costantino Thanos
    According to the logical model of Information Retrieval (IR), the task of IR can be described as the extraction, from a given document base, of those documents d that, given a query q, make the formula d → q valid, where d and q are formulae of the chosen logic and "→" denotes the brand of logical implication formalized by the logic in question. In this paper, although essentially subscribing to this view, we propose that the logic to be chosen for this endeavour be a Terminological Logic (TL): accordingly, the IR task becomes that of singling out those documents d such that d ≤ q, where d and q are terms of the chosen TL and "≤" denotes subsumption between terms. We call this the terminological model of IR.
       TLs are particularly suitable for modelling IR; in fact, they can be employed: 1) in representing documents under a variety of aspects (e.g. structural, layout, semantic content); 2) in representing queries; 3) in representing lexical, "thesaural" knowledge. The fact that a single logical language can be used for all these representational endeavours ensures that all these sources of knowledge will participate in the retrieval process in a uniform and principled way.
       In this paper we introduce MIRTL, a TL for modelling IR according to the above guidelines; its syntax, formal semantics and inferential algorithm are described.
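The retrieval criterion d ≤ q can be illustrated in its simplest degenerate case, where terms are reduced to plain conjunctions of atomic concepts represented as Python sets; a full TL such as MIRTL has richer constructors (roles, restrictions), so this sketch only conveys the shape of the idea:

```python
def subsumes(q, d):
    # For purely conjunctive terms over atomic concepts, a query term q
    # subsumes a document term d exactly when every concept required by
    # q also appears in d (i.e., d is at least as specific as q).
    return q <= d

def retrieve(query, documents):
    # Terminological retrieval: single out the documents d with d <= q.
    return [name for name, concepts in documents.items()
            if subsumes(query, concepts)]

# Illustrative document base (names and concepts are invented).
docs = {
    "d1": {"paper", "about-IR", "about-logic"},
    "d2": {"paper", "about-databases"},
}
print(retrieve({"paper", "about-IR"}, docs))  # ['d1']
```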

    DBMS/IR Integration

    A Probabilistic Relational Model for the Integration of IR and Databases BIBAPDF 309-317
      Norbert Fuhr
    In this paper, a probabilistic relational model is presented which combines relational algebra with probabilistic retrieval. Based on certain independence assumptions, the operators of the relational algebra are redefined such that the probabilistic algebra is a generalization of the standard relational algebra. Furthermore, a special join operator implementing probabilistic retrieval is proposed. When applied to typical document databases, queries can not only ask for documents, but for any kind of object in the database. In addition, an implicit ranking of these objects is provided in case the query relates to probabilistic indexing or uses the probabilistic join operator. The proposed algebra is intended as a standard interface to combined database and IR systems, as a basis for implementing user-friendly interfaces.
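A minimal sketch of the core idea, under the stated independence assumptions: relations carry a probability per tuple, and the join multiplies the probabilities of the tuples it combines. The relation and attribute names below are illustrative, not taken from the paper:

```python
def prob_join(r, s):
    """Probabilistic natural join on a shared key attribute.

    r and s map (key, value) tuples to event probabilities; under the
    independence assumption the joined tuple's probability is the
    product of the probabilities of its constituent tuples.
    """
    out = {}
    for (key_r, val_r), p_r in r.items():
        for (key_s, val_s), p_s in s.items():
            if key_r == key_s:
                out[(key_r, val_r, val_s)] = p_r * p_s
    return out

# Probabilistic indexing: DocTerm(doc, term) with indexing weights,
# joined with a deterministic relation Author(doc, name).
doc_term = {("d1", "retrieval"): 0.8, ("d2", "retrieval"): 0.5}
author = {("d1", "Smith"): 1.0, ("d2", "Jones"): 1.0}
print(prob_join(doc_term, author))
```

Because every joined tuple keeps a probability, the answer is implicitly ranked, which is how queries over such an algebra can return ranked objects of any kind, not just documents.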
    SPIDER: A Multiuser Information Retrieval System for Semistructured and Dynamic Data BIBAPDF 318-327
      Peter Schauble
    The access structure, the retrieval model, and the system architecture of the SPIDER information retrieval system are described. The access structure provides efficient weighted retrieval on dynamic data collections. It is based on signatures and non-inverted item descriptions. The signatures provide upper bounds for the exact retrieval status values, so that only a small number of exact retrieval status values have to be computed. SPIDER's retrieval model is a probabilistic retrieval model that is capable of exploiting the database scheme of semistructured data collections. This model can be considered a further development of the Binary Independence Indexing (BII) model. The system architecture was derived systematically from a given set of requirements, such as effective and efficient retrieval on dynamic data collections, exploitation of the database scheme, computed views, and the integration of information retrieval functionality and database functionality.
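The pruning idea behind upper-bounding signatures can be sketched as follows (an illustrative reconstruction, not SPIDER's actual access structure): candidates are scanned in decreasing order of a cheap upper bound, and the expensive exact retrieval status value (RSV) is computed only while a candidate could still enter the top k:

```python
def top_k(docs, upper_bound, exact_rsv, k=2):
    """Rank documents using cheap upper bounds first, computing the
    expensive exact RSV only while a candidate's bound can still beat
    the current k-th best exact score."""
    ranked = sorted(docs, key=upper_bound, reverse=True)
    results = []
    for d in ranked:
        if len(results) == k and upper_bound(d) <= results[-1][1]:
            break  # no remaining document can enter the top k
        results.append((d, exact_rsv(d)))
        results.sort(key=lambda x: x[1], reverse=True)
        results = results[:k]
    return results

# Invented scores: bounds overestimate the exact RSVs.
bounds = {"d1": 0.9, "d2": 0.7, "d3": 0.3}
exact = {"d1": 0.8, "d2": 0.6, "d3": 0.25}
calls = []
def rsv(d):
    calls.append(d)        # record which exact RSVs we had to compute
    return exact[d]

print(top_k(bounds, bounds.get, rsv, k=2))  # d3's exact RSV is never computed
```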

    Query Processing and Evaluation

    Using Statistical Testing in the Evaluation of Retrieval Experiments BIBAPDF 329-338
      David Hull
    The standard strategies for evaluation based on precision and recall are examined and their relative advantages and disadvantages are discussed. In particular, it is suggested that relevance feedback be evaluated from the perspective of the user. A number of different statistical tests are described for determining if differences in performance between retrieval methods are significant. These tests have often been ignored in the past because most are based on an assumption of normality which is not strictly valid for the standard performance measures. However, one can test this assumption using simple diagnostic plots, and if it is a poor approximation, there are a number of non-parametric alternatives.
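One of the non-parametric alternatives alluded to above can be sketched as a paired sign test, which requires no normality assumption about per-query score differences; the average-precision figures below are invented for illustration:

```python
import math

def sign_test(scores_a, scores_b):
    """Two-sided paired sign test: a non-parametric alternative to the
    paired t-test. Under the null hypothesis, each non-tied query is
    equally likely to favor either method, so the count of positive
    differences is Binomial(n, 0.5)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    pos = sum(d > 0 for d in diffs)
    n = sum(d != 0 for d in diffs)          # ties are discarded
    k = min(pos, n - pos)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)               # two-sided p-value

# Invented per-query average precision for two retrieval methods.
a = [0.41, 0.35, 0.62, 0.50, 0.48, 0.55, 0.44, 0.39]
b = [0.33, 0.30, 0.58, 0.52, 0.40, 0.47, 0.41, 0.36]
print(sign_test(a, b))
```

The sign test pays for its weak assumptions with low power; rank-based tests such as Wilcoxon's signed-rank test sit between it and the t-test.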
    The Effect of Multiple Query Representations on Information Retrieval System Performance BIBAPDF 339-346
      N. J. Belkin; C. Cool; W. B. Croft; J. P. Callan
    Five independently generated Boolean query formulations for ten different TREC topics were produced by ten different expert online searchers. These formulations were grouped, and the groups, and combinations of them, were used as searches against the TREC test collection, using the INQUERY probabilistic inference network retrieval engine. Results show that progressive combination of query formulations leads to progressively improving retrieval performance. Results were also compared against the performance of INQUERY natural-language-based queries, both alone and in combination with them. The issue of recall as a performance measure in large databases is raised, since the overlap between the searches conducted in this study and the TREC-1 searches was smaller than expected.
    An Evaluation of Query Processing Strategies Using the TIPSTER Collection BIBAPDF 347-355
      James P. Callan; W. Bruce Croft
    The TIPSTER collection is unusual because of both its size and detail. In particular, it describes a set of information needs, as opposed to traditional queries. These detailed representations of information need are an opportunity for research on different methods of formulating queries. This paper describes several methods of constructing queries for the INQUERY information retrieval system, and then evaluates those methods on the TIPSTER document collection. Both AdHoc and Routing query processing methods are evaluated.

    Demonstrations

    Multiple Access and Retrieval of Information with ANnotations BIBAPDF 357
      Edward A. Fox
    MARIAN is a client/server online public access catalog system developed at Virginia Tech to support large numbers of users running on a variety of terminals and workstations, searching our million-record MARC library catalog. It:
  • implements vector retrieval with a forms-based interface,
  • uses our specially developed object-oriented DBMS (LEND), which has a powerful "information graph" query language and minimal perfect hash functions,
  • has clients for NeXTstep and Motif,
  • runs as a distributed server with a thread assigned to each user session to increase concurrency on a small network of NeXTs,
  • incorporates our algorithms to use objects and stopping rules for greater
    Project Envision BIBAPDF 357
      Edward A. Fox
    This system involves the development of a user-centered hypermedia database from the computer science literature. We will demonstrate several innovative screens for "digital library" access that are the result of an ongoing program of user interviewing, interface design, usability testing, and iterative refinement. Searching, retrieval, and display of ACM publications build upon our work with the Large External object-oriented Network Database (LEND) and MARIAN systems, along with extensions to the Z39.50 information retrieval protocol.
    Incremental Interface Design: A Prototype Graphical User Interface for Grateful Med BIBAPDF 357-358
      Gary Marchionini
    An important problem for established information retrieval systems is upgrading to new platforms and interface styles. Although front-ends for online systems shield users from the intricacies of data structures and retrieval-engine machinations, no interface can be totally independent of the underlying data and retrieval algorithms. Just as most programming effort today goes into maintenance rather than new development, user interface effort is increasingly given to upgrading character-based and graphical user interfaces (GUIs). Designing interfaces under such incremental constraints offers interesting challenges to designers.
       This demonstration presents the results of our efforts in the University of Maryland Human Computer Interaction Laboratory to design a new prototype interface for the Grateful Med system. Because about 30% of all Grateful Med searches result in no hits, our contract from the National Library of Medicine was focused on query formulation and intermediate results. In the first phase of the project, we designed guidelines and prototype screens for the existing character-based system. In the second phase presented here, we designed guidelines and prototypes for a GUI. The prototype anchors the user on a query form screen to avoid disorientation, provides systematic and consistent feedback for search progress, and provides simple suggestions for the special cases of no hits and too many hits. The prototype was mocked up using HyperCard and illustrates what we believe is an improved component for Grateful Med without assuming Internet access speeds or high-performance workstations. Suggestions for interfaces that are less-constrained by existing system characteristics and user market conditions were also made.
    BRAQUE: An Interface to Support Browsing and Interactive Query Formulation in Information Retrieval Systems BIBAPDF 358
      P. G. Marchetti; S. Vazzana; R. Panero; N. J. Belkin
    BRAQUE [1,2] is a Windows-based interface to a large-scale operational information retrieval system. The general design of BRAQUE is based on two concepts. One is that of supporting end users of information retrieval systems in a variety of information-seeking strategies, with the user being able to move easily and effectively from any one such strategy to any other. Included in this concept is the idea that information retrieval is an inherently interactive process, and that support of users should be support of their interaction, with all of the system resources. The second concept is that of a two-level hypertext model of information retrieval system databases, which provides a framework in which the kind of interactive support we propose can be implemented. The current version of BRAQUE is a working prototype interface to the ESA-QUEST system. It provides facilities for browsing in thesaural structures, and among documents and terms related to documents. It supports information retrieval without query formulation, and it supports progressive, interactive query formulation and reformulation, as well as general browsing and exploration.
    ELSA: An Electronic Library Search Assistant BIBAPDF 358-359
      Bekka Denning; Philip J. Smith
    ELSA is a frame-based system that assists information seekers in exploring topics of interest and retrieving information on relevant documents. Two issues have been explored in its design:
  • 1. The identification and implementation of "knowledge-based search tactics" that the computer can apply to help the information seeker explore a topic by generating suggestions for related topics;
  • 2. The development of an interface that allows the user to make effective use of such functions.
       The net result is a system that not only assists in document retrieval, but that effectively serves as a question answering system, using the document representations in the frame database as an implicit knowledge base to provide answers to questions like: What are the side effects of using diazepam as a sedative?
    Queries-R-Links: Browsing and Retrieval via Interactive Querying BIBAPDF 359
      Gene Golovchinsky; Mark Chignell
    In this demonstration we introduce an interactive style of querying that combines features of hypertext with Boolean querying, using direct markup of text to launch queries. We show the Queries-R-Links system that we have developed at the University of Toronto. The current implementation of Queries-R-Links is a SmallTalk (ObjectWorks from ParcPlace) program that runs on a SparcStation in tandem with a full-text retrieval system. Queries-R-Links uses this graphical markup method to launch Boolean queries interactively.
       Queries-R-Links is fully expressive for non-negated Boolean queries. Queries are expressed in disjunctive normal form by grouping selected words into AND clusters. This form is used as a relatively simple but expressive method of querying that avoids the use of negations, parentheses and nested expressions. Expressions in non-negated disjunctive normal form (DNF) do not use the NOT operator, and each expression is a collection of disjoint AND clusters (e.g., [A AND B] OR [C AND D]). Words are selected by clicking on them, and dragging between words creates a line representing the AND operator. OR relations are indicated by the absence of connecting lines. Queries-R-Links also allows the user to enter additional keywords in the margin. These words may then be linked in the same manner. Thus queries may include words that do not appear in the text currently being viewed. Experimental evidence in support of the use of this type of graphical querying as an alternative to textual Boolean queries is described in Golovchinsky and Chignell (1993).
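Matching a non-negated DNF query of the kind described above is straightforward to sketch; the query and document below are invented for illustration:

```python
def matches(dnf_query, document_words):
    # A non-negated DNF query is a collection of AND clusters; the
    # document matches if every word of at least one cluster appears
    # in it (OR is the absence of a connecting line between clusters).
    return any(cluster <= document_words for cluster in dnf_query)

# [information AND retrieval] OR [hypertext AND browsing]
query = [{"information", "retrieval"}, {"hypertext", "browsing"}]
doc = {"interactive", "hypertext", "browsing", "systems"}
print(matches(query, doc))  # True
```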
    A Common Query Interface for Multilingual Document Retrieval from Databases of the European Community Institutions BIBAPDF 359
      A. Steven Pollitt; Geoffrey P. Ellis; Martin P. Smith; Mark R. Gregory; Chun Sheng Li; Henrik Zangenberg
    The move to a single market for the members of the European Community has provided a focus for efforts that are intended to overcome the barriers of language, especially in the workings of the European Community Institutions. The storage and retrieval of documents is of vital importance to the workings of the institutions. This demo will illustrate the results of applying techniques originally developed to improve access to medical databases, to EPOQUE, the major document database of the European Parliament. A common query interface has been incorporated, based on the EUROVOC thesaurus and MenUSE front-end software.