HCI Bibliography Home | HCI Conferences | IR Archive | Detailed Records | RefWorks | EndNote | Hide Abstracts
IR Tables of Contents: 868788899091929394959697989900

Proceedings of the Thirteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Fullname:Proceedings of the Thirteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Editors:Jean-Luc Vidick
Location:Brussels, Belgium
Dates:1990-Sep-05 to 1990-Sep-07
Standard No:ISBN 0-89791-408-2; ACM Order Number 606900; ACM DL: Table of Contents hcibib: IR90
  1. Information Retrieval Models (1)
  2. Hypertext and Image Retrieval
  3. User Interfaces
  4. Signatures
  5. Knowledge Based Information Retrieval
  6. Quantitative Text Analysis
  7. Panel Session
  8. Query Processing
  9. Access Methods
  10. Information Retrieval Models (2)

Information Retrieval Models (1)

Inference Networks for Document Retrieval BIBA 1-24
  Howard Turtle; W. Bruce Croft
The use of inference networks to support document retrieval is introduced. A network-based retrieval model is described and compared to conventional probabilistic and Boolean models.
A Retrieval Model Based on an Extended Modal Logic and its Application to the RIME Experimental Approach BIBA 25-43
  Yves Chiaramella; Jianyun Nie
This paper focuses on the processing module of RIME, an experimental prototype of an intelligent information retrieval system designed to manage high precision queries on a corpus of medical reports. Though highly specific this particular corpus is representative of an important class of applications: information retrieval among full-text specialized documents which constitute critical sources of information in several organizations (medicine, law, space industry...). This experience allowed us to design and implement an elaborate model for the semantic content of the documents which is an extension of the Conceptual Dependancy approach. The underlying retrieval model is inspired from the Logic model proposed by C. J. van Rijsbergen, which has been considerably refined using an Extended Modal Logic. After presenting the context of the RIME project, we briefly describe the models designed for the interval representation of medical reports and queries. The main part of the paper is then devoted to the retrieval model and its application to the query processing module of RIME which has a natural language interface. Processing a query involves two main phases: the interpretation which transforms the natural language query into a search expression, and the evaluation phases which retrieves the corresponding medical reports. We focus here on the evaluation phases and show its relationship with the underlying retrieval model. Evaluations from practical experiments are also given with indications about current developments of the project.
Probabilistic Document Indexing from Relevance Feedback Data BIBA 45-61
  Norbert Fuhr; Chris Buckley
Based on the binary independence indexing model, we apply three new concepts for probabilistic document indexing from relevance feedback data:
  • 1. Abstraction from specific terms and documents, which overcomes the
        restriction of limited
  • relevance information for parameter estimation.
  • 2. Flexibility of the representation, which allows the integration of new text
        analysis and knowledge-based methods in our approach as well as the
        consideration of more complex document structures or different types of
        terms (e.g. single words and noun phrases).
  • 3. Probabilistic learning or classification methods for the estimation of the
        indexing weights making better use of the available relevance information. We give experimental results for five test collections which show improvements over other indexing methods.
  • Hypertext and Image Retrieval

    EXPRESS: An Experimental Interface for Factual Information Retrieval BIBA 63-81
      Heinz Ulrich Hoppe; Karin Ammersbach; Barbara Lutes-Schaab; Gaby Zinssmeister
    The EXPRESS system has been designed and implemented in order to explore methods for user assistance in accessing complexly structured factual databases, e.g. relational product databases. Terminological support in this area has to take into account that different controlled vocabularies may be used in a variety of attributes spread over several relations. In our approach, traditional thesaurus structures are extended in order to cope with these problems and to encode further domain-specific knowledge. User support in query reformulation is based on this enriched thesaurus as well as on the local evaluation of the retrieved data sets. Concepts for the representation of retrieval strategies in the form of plans and their potential use in future systems are discussed.
    Hypertext, Full Text, and Automatic Linking BIBA 83-98
      James H. Coombs
    Current computing systems typically support only mid-century information structures: simple hierarchies. Hypertext technologies enable users to impose many structures on document sets and, consequently, provide many paths to desired information, but they require that users work their way through some structure. Full-text search eliminates this requirement by ignoring structure altogether. The search strategy can also be restricted to work within specified contexts. The architecture provided for search readily supports automatic linking. These ideas have been tested in IRIS Intermedia.
    Machine Learning and Vectorial Matching for an Image Retrieval Model: EXPRIM and the System RIVAGE BIB 99-114
      G. Halin; M. Crehange; P. Kerekes

    User Interfaces

    Online Query Refinement on Information Retrieval Systems: A Process Model of Searcher/System Interactions BIBA 115-133
      Hsinchun Chen; Vasant Dhar
    This article reports findings of empirical research that investigated information searchers' online query refinement process. Prior studies have recognized the information specialists' role in helping searchers articulate and refine queries. Using a semantic network and a Problem Behavior Graph to represent the online search process, our study revealed that searchers also refined their own queries in an online task environment. The information retrieval system played a passive role in assisting online query refinement, which was, however, one that confirmed Taylor's four-level query formulation model. Based on our empirical findings, we proposed using a process model to facilitate and improve query refinement in an online environment. We believe incorporating this model into retrieval systems can result in the design of more "intelligent" and useful information retrieval systems.
    A Direct Manipulation Interface for Boolean Information Retrieval via Natural Language Query BIBA 135-150
      Peter G. Anick; Jeffrey D. Brennan; Rex A. Flynn; David R. Hanssen; Bryan Alvey; Jeffrey M. Robbins
    This paper describes the design of a direct manipulation user interface for Boolean information retrieval. Intended to overcome the difficulties of manipulating explicit Boolean queries as well as the "black box" drawbacks of so-called natural language query systems, the interface presents a two-dimensional graphical representation of a user's natural language query which not only exposes heuristic query transformations performed by the system, but also supports query reformulation by the user via direct manipulation of the representation. The paper illustrates the operation of the interface as implemented in the AI-STARS full-text information retrieval system.
    Determining the Functionality and Features of an Intelligent Interface to an Information Retrieval System BIBA 151-177
      Nicholas J. Belkin; Pier Giorgio Marchetti
    In this paper, we propose a method for specifying the functionality of an intelligent interface to large-scale information retrieval systems and for implementing those functions in an operational environment. The method is based on a progressive, three-stage model of intelligent information support; a high-level cognitive task analysis of the information retrieval problem; a low-level specification of the host system functionality; and, derivation of explicit relations between the system functions and the cognitive tasks. This method is applied, by example, in the context of the European Space Agency Information Retrieval Service, with some specific suggestions for implementation of a stage one intelligent interface to that system.


    Using Syntactic Analysis in a Document Retrieval System that Uses Signature Files BIBA 179-192
      Ron Sacks-Davis; Peter Wallis; Ross Wilkinson
    Our work involves the study of the extent to which natural language processing techniques aid the automatic indexing and retrieval of documents. In this paper we describe the use of signature files in large text retrieval systems. We show that good performance can be obtained without requiring the significant overheads required for the inverted file technique. We examine the use of syntactic analysis of the text in all stages of retrieval and argue that an initial boolean query should be performed that provides a subset of documents, which are then ranked. We then give an algorithm for generating such queries, taking into account the syntactic structure of the queries.
    A Dynamic Signature Technique for Multimedia Databases BIBA 193-210
      F. Rabitti; P. Zezula
    A signature file acts as a filtering mechanism to reduce the amount data needs to be searched during query evaluation. Even though several techniques for organizing and searching signature files have been proposed in literature, they have serious limitations when applied to multimedia databases, where integrated access methods to text and image content are needed.
       A new signature technique, called Quick Filter, is proposed in the paper. According to this technique, signatures are divided into partitions, each of which holds signatures sharing the same characteristic key. As a result, it is possible to determine if the signatures in a partition satisfy a query by merely examining the key. Partitions not matching the key need not be searched. This method is based on dynamic hashing since signatures are hashed into partitions according to the keys and the file size, computed algorithmically from the signatures. Implementation of this technique is illustrated using an example and is verified by analytical performance evaluation.
       The result is a signature technique which satisfies the requirements for access methods in multimedia databases: dynamicity, with respect to insertions and updates, good query processing performance on large databases for high-weight queries.
    Surrogate Subsets: A Free Space Management Strategy for the Index of a Text Retrieval System BIBA 211-226
      F. J. Burkowski
    This paper presents a new data structure and an associated strategy to be utilized by indexing facilities for text retrieval systems. The paper starts by reviewing some of the goals that may be considered when designing such an index and continues with a small survey of various current strategies. It then presents an indexing strategy referred to as surrogate subsets discussing its appropriateness in the light of the specified goals. Various design issues and implementation details are discussed. Our strategy requires that a surrogate file be divided into a large number of subsets separated by free space which will allow the index to expand when new material is appended to the database. Experimental results report on the utilization of free space when the database is enlarged.

    Knowledge Based Information Retrieval

    Construction of a Dynamic Thesaurus and Its Use of Associated Information Retrieval BIBAK 227-240
      Haruo Kimoto; Toshiaki Iwadera
    An information retrieval system based on a dynamic thesaurus was developed utilizing the connectionist approach. The dynamic thesaurus consists of nodes, which represent each term of a thesaurus, and links, which represent the connections between nodes. Term information that is automatically extracted from user's relevant documents is used to change node weights and generate links. Node weights and links reflect a user's particular interest. A document retrieval experiment was conducted in which both a high recall rate and a high precision rate were achieved.
    Keywords: Connectionist model, Automatic indexing, Information retrieval, Thesaurus
    Knowledge Based Retrieval of Office Documents BIBA 241-253
      Augusto Celentano; Maria Grazia Fugini; Silvano Pozzi
    Document classification and retrieval systems for office applications require knowledge management to describe the semantics of documents related to procedural and domain dependent aspects of the office work to be described: operational dependencies, documents relationships and references to regulations and laws are concepts which are not explicitly stated in the text. This paper presents a semantic model for office documents classification and retrieval based on knowledge representation of the office procedural environment and of the application domain. Navigation along knowledge networks for document retrieval and browsing is described.
    Evaluation of an Expert System for Searching in Full Text BIBA 255-277
      Susan Gauch
    This paper presents a prototype expert system which provides online search assistance. The expert system automatically reformulates queries, using an online thesaurus as the source of domain knowledge, and a knowledge base of domain-independent search tactics. The expert system works with a full-text database which requires no syntactic or semantic pre-processing. In addition, the expert system ranks the retrieved passages in decreasing order of probable relevance.
       Users' search performance using the expert system was compared with their search performance on their own, and their search performance using the online thesaurus. The following conclusions were reached: 1) The expert system significantly reduced the number of queries necessary to find relevant passages compared with the user searching alone or with the thesaurus. 2) The expert system produced marginally significant improvements in precision compared with the user searching on their own. There was no significant difference in the recall achieved by the three system configurations. 3) Overall, the expert system ranked relevant passages above irrelevant passages.

    Quantitative Text Analysis

    Order Preserving Minimal Perfect Hash Functions and Information Retrieval BIBA 279-311
      Edward A. Fox; Qi Fan Chen; Amjad M. Daoud; Lenwood S. Heath
    Rapid access to information is essential for a wide variety of retrieval systems and applications. Hashing has long been used when the fastest possible direct search is desired, but is generally not appropriate when sequential or range searches are also required. This paper describes a hashing method, developed for collections that are relatively static, that supports both direct and sequential access. Indeed, the algorithm described gives hash functions that are optimal in terms of time and hash table space utilization, and that preserve any a priori ordering desired. Furthermore, the resulting order preserving minimal perfect hash functions (OPMPHFs) can be found using space and time that is on average linear in the number of keys involved.
    On the Interrelationship of Dictionary Size and Completeness BIBA 313-325
      Hubert Huther
    When dictionaries for specific applications or subject fields are derived from a text collection, the frequency distribution of the terms in the collection gives information about the expected completeness of the dictionary. If only a subset of the terms in the collection is to be included in the dictionary, the completeness of the dictionary can be optimized with respect to dictionary size.
       In this paper, formulas for the relationship between the frequency distribution of the terms in the collection and expected dictionary completeness are derived. First we regard one-dimensional dictionaries where the (non-trivial) terms occurring in the texts are to be included in the dictionary. Then we describe the case of two-dimensional dictionaries, which are needed for example for automatic indexing with a controlled vocabulary; here relationship between text terms and descriptors from the prescribed vocabulary have to be stored in the dictionary. For both cases, formulas for the interpolation and extrapolation with respect to different collection sizes are derived.
       We give experimental results for one-dimensional dictionaries and show how the completeness can be estimated and optimized.
    Construction of Optimal Graphs for Bit-Vector Compression BIBA 327-342
      Abraham Bookstein; Shmuel T. Klein
    Bitmaps are data structures occurring often in information retrieval. They are useful; they are also large and expensive to store. For this reason, considerable effort has been devoted to finding techniques for compressing them. These techniques are most effective for sparse bitmaps. We propose a preprocessing stage, in which bitmaps are first clustered and the clusters used to transform their member bitmaps into sparser ones, that can be more effectively compressed. The clustering method efficiently generates a graph structure on the bitmaps. The results of applying our algorithm to the Bible is presented: for some sets of bitmaps, our method almost doubled the compression savings.

    Panel Session

    Hypertext: "Growing Up?" BIBA 343-347
      M. E. Frisse; Maristella Agosti; Marie-France Bruandet; Udo Hahn; Stephen F. Weiss
    This panel will employ two different interpretations of the phrase "growing up" to address areas of common interest between hypertext and information retrieval researchers. First, the panelists will question whether or not hypertext is "growing up" as a scientific discipline; They will discuss characteristics that separate hypertext research from other related disciplines. Second, the panelists will discuss the problems encountered when a hypertext system "grows up" in size and complexity; They will discuss the very real problems expected when representing and integrating large knowledge bases, accommodating multiple users, and distributing single logical hypertexts across multiple physical sites.
       The panelists will not lecture, but they will advance a number of themes including "the Myth of Modularity" (Frisse), "New Architectures Employing Hyperconcept Databases" (Agosti), "Hypertext in Software Engineering" (Bruandet), "Automatic Hypertext Generation" (Hahn), and "Large-Scale Hypertexts" (Weiss).

    Query Processing

    Experiments with Query Acquisition and Use in Document Retrieval Systems BIBA 349-368
      W. Bruce Croft; Raj Das
    In some recent experimental document retrieval systems, emphasis has been placed on the acquisition of a detailed model of the information need through interaction with the user. It has been argued that these "enhanced" queries, in combination with relevance feedback, will improve retrieval performance. In this paper, we describe a study with the aim of evaluating how easily enhanced queries can be acquired from users and how effectively this additional knowledge can be used in retrieval. The results indicate that significant effectiveness benefits can be obtained through the acquisition of domain concepts related to query concepts, together with their level of importance to the information need.
    The Automatic Generation of Extended Queries BIBA 369-383
      Carolyn J. Crouch; Donald B. Crouch; Krishna R. Nareddy
    In the extended vector space model, each document vector consists of a set of subvectors representing the multiple concepts or concept classes present in the document. Typical information concepts, in addition to the usual content terms or descriptors, include author names, bibliographic links, etc. The extended vector space model is known to improve retrieval effectiveness. However, a major impediment to the use of the extended model is the construction of an extended query. In this paper, we describe a method for automatically extending a query containing only content terms (a single concept class) to a representation containing multiple concept classes. No relevance feedback is involved. Experiments using the CACM collection resulted in an average precision 34% better than that obtained using the standard single-concept term vector model.
    Term Clustering of Syntactic Phrases BIBA 385-404
      David D. Lewis; W. Bruce Croft
    Term clustering and syntactic phrase formation are methods for transforming natural language text. Both have had only mixed success as strategies for improving the quality of text representations for document retrieval. Since the strengths of these methods are complementary, we have explored combining them to produce superior representations. In this paper we discuss our implementation of a syntactic phrase generator, as well as our preliminary experiments with producing phrase clusters. These experiments show small improvements in retrieval effectiveness resulting from the use of phrase clusters, but it is clear that corpora much larger than standard information retrieval test collections will be required to thoroughly evaluate the use of this technique.

    Access Methods

    Optimizations for Dynamic Inverted Index Maintenance BIBA 405-411
      Doug Cutting; Jan Pedersen
    For free-text search over rapidly evolving corpora, dynamic update of inverted indices is a basic requirement. B-trees are an effective tool in implementing such indices. The Zipfian distribution of postings suggests space and time optimizations unique to this task. In particular, we present two novel optimizations, merge update, which performs better than straight forward block update, and pulsing which significantly reduces space requirements without sacrificing performance.
    Partitioned Posting Files: A Parallel Inverted File Structure for Information Retrieval BIBA 413-428
      Craig Stanfill
    This paper describes algorithms and data structures for applying a parallel computer to information retrieval. Previous work has described an implementation based on overlap encoded signatures. That system was limited by 1) the necessity of keeping the signatures in primary memory, and 2) the difficulties involved in implementing document-term weighting. Overcoming these limitations requires adapting the inverted index techniques used on serial machines. The most obvious adaptation, also previously described, suffers from the fact that data must be sent between processors at query-time. Since interprocessor communication is generally slower than local computation, this suggests that an algorithm which does not perform such communication might be faster. This paper presents a data structure, called a partitioned posting file, in which the interprocessor communication takes place at database-construction time, so that no data movement is needed at query-time. Algorithms for constructing the data structure are also described. Performance characteristics and storage overhead are established by benchmarking against a synthetic database.
    Parallel Text Searching in Serial Files Using a Processor Farm BIBA 429-453
      Janey K. Cringean; Roger England; Gordon A. Manson; Peter Willett
    This paper discusses the implementation of a parallel text retrieval system using a microprocessor network. The system is designed to allow fast searching in document databases organised using the serial file structure, with a very rapid initial text signature search being followed by a more detailed, but more time-consuming, pattern matching search. The network is built from transputers, high performance microprocessors developed specifically for the construction of highly parallel computing systems, which are linked together in a processor farm. The paper discusses the design and implementation of processor farms, and then reports our initial studies of the efficiency of searching that can be achieved using this approach to text retrieval from serial files.

    Information Retrieval Models (2)

    An Architecture for Probabilistic Concept-Based Information Retrieval BIBA 455-467
      Robert M. Fung; Stuart L. Crawford; Lee A. Appelbaum; Richard M. Tong
    While concept-based methods for information retrieval can provide improved performance over more conventional techniques, they require large amounts of effort to acquire the concepts and their qualitative and quantitative relationships.
       This paper discusses an architecture for probabilistic concept-based information retrieval which addresses the knowledge acquisition problem. The architecture makes use of the probabilistic networks technology for representing and reasoning about concepts and includes a knowledge acquisition component which partially automates the construction of concept knowledge bases from data.
       We describe two experiments that apply the architecture to the task of retrieving documents about terrorism from a set of documents from the Reuters news service. The experiments provide positive evidence that the architecture design is feasible and that there are advantages to concept-based methods.
    A New Method for Information Retrieval, Based on the Theory of Relative Concentration BIBA 469-493
      L. Egghe
    This paper introduces a new method for information retrieval of documents that are represented by a vector. The novelty of the algorithm lies in the fact that no (generalized) p-norms are used as a matching function between the query and the document (as is done e.g. by Salton and others) but a function that measures the relative dispersion of the terms between a document and a query. This function originates from an earlier paper of the author where a good measure of relative concentration was introduced, used in informetrics to measure the degree of specialization of a journal w.r.t. the entire subject.
       This new information retrieval algorithm is shown to have many desirable properties (in the sense of the new Cater-Kraft wish list) including those of the original cosine-matching function of Salton. In addition the property of the cosine-matching function that, if one only uses weights 0 to 1, one is reduced to Boolean IR, is refined in the sense that one takes into consideration the broadness or specialization of a document and a query. Our new matching function satisfies these additional properties.
    Extended Boolean Retrieval: A Heuristic Approach? BIBA 495-508
      Ronald Rousseau
    We show that the similarity measures for p-norm retrieval, as defined by Salton, Fox and Wu have some undesirable mathematical properties. We propose a new function that remedies some of these drawbacks. Still, even for this new similarity measure the extended Boolean model has some properties which can only be described as 'heuristic'.