
Proceedings of the Fourteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Fullname: Proceedings of the Fourteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Editors: Abraham Bookstein; Yves Chiaramella; Gerard Salton; Vijay V. Raghavan
Location: Chicago, Illinois
Dates: 1991-Oct-13 to 1991-Oct-16
Standard No: ISBN 0-89791-448-1; ACM Order Number 606910
  1. SIGIR Biennial Award Acceptance Speech
  2. Document Structure
  3. Modeling Information Retrieval Systems I
  4. Data Compression
  5. Invited Speaker
  6. Distributed Systems I
  7. Interfaces
  8. Office Automation and Databases
  9. Modeling Information Retrieval Systems II
  10. Distributed Systems II
  11. Object Oriented Approaches to IR
  12. Semantic Models
  13. Access Methods
  14. Hypertext
  15. Natural Language Processing
  16. Panel
B. C. Brookes: In Memoriam BIBPDF 1
  Nicholas J. Belkin

SIGIR Biennial Award Acceptance Speech

The Significance of the Cranfield Tests on Index Languages BIBPDF 3-12
  Cyril W. Cleverdon

Document Structure

Complete Formal Model for Information Retrieval Systems BIBPDF 14-20
  Jean Tague; Airi Salminen; Charles McClellan
Automatic Text Structuring and Retrieval -- Experiments in Automatic Encyclopedia Searching BIBAPDF 21-30
  Gerard Salton; Chris Buckley
Many conventional approaches to text analysis and information retrieval prove ineffective when large text collections in heterogeneous subject areas must be processed. An alternative text manipulation system is outlined that is useful for retrieving large heterogeneous texts and for recognizing content similarities between text excerpts, based on flexible text matching procedures carried out in several contexts of different scope. The methods are illustrated by search experiments performed with the 29-volume Funk and Wagnalls encyclopedia.

Modeling Information Retrieval Systems I

The Use of Phrases and Structured Queries in Information Retrieval BIBAPDF 32-45
  W. Bruce Croft; Howard R. Turtle; David D. Lewis
Both phrases and Boolean queries have a long history in information retrieval, particularly in commercial systems. In previous work, Boolean queries have been used as a source of phrases for a statistical retrieval model. This work, like the majority of research on phrases, resulted in little improvement in retrieval effectiveness. In this paper, we describe an approach where phrases identified in natural language queries are used to build structured queries for a probabilistic retrieval model. Our results show that using phrases in this way can improve performance, and that phrases that are automatically extracted from a natural language query perform nearly as well as manually selected phrases.
Combining Model-Oriented and Description-Oriented Approaches for Probabilistic Indexing BIBAPDF 46-56
  Norbert Fuhr; Ulrich Pfeifer
We distinguish model-oriented and description-oriented approaches in probabilistic information retrieval. The former refer to certain representations of documents and queries and use additional independence assumptions, whereas the latter map documents and queries onto feature vectors which form the input to certain classification procedures or regression methods. Description-oriented approaches are more flexible with respect to the underlying representations, but the definition of the feature vector is a heuristic step. In this paper, we combine a probabilistic model for the Darmstadt Indexing Approach with logistic regression. Here the probabilistic model forms a guideline for the definition of the feature vector. Experiments with the purely theoretical approach and with several heuristic variations show that heuristic assumptions may yield significant improvements.
Some Inconsistencies and Misnomers in Probabilistic Information Retrieval BIBAPDF 57-61
  William S. Cooper
The probabilistic theory of information retrieval involves the construction of mathematical models based on statistical assumptions of various sorts. One of the hazards inherent in this kind of theory construction is that the assumptions laid down may be inconsistent with the data to which they are applied. Another hazard is that the stated assumptions may not be the real assumptions on which the derived modelling equations or resulting experiments are actually based. Both kinds of error have been made repeatedly in research on probabilistic information retrieval. One consequence of these lapses is that the statistical character of certain probabilistic IR models, including the so-called 'binary independence' model, has been seriously misapprehended.

Data Compression

Generative Models for Bitmap Sets with Compression Applications BIBAPDF 63-71
  Abraham Bookstein; Shmuel T. Klein
In large IR systems, information about word occurrence may be stored as a bit matrix, with rows corresponding to different words and columns to documents. Such a matrix is generally very large and very sparse. New methods for compressing such matrices are presented, which exploit possible correlation between rows and between columns. The methods are based on partitioning the matrix into small blocks and predicting the 1-bit distribution within a block by means of various bit generation models. Each block is then encoded using Huffman or arithmetic coding. Preliminary experimental results indicate improvements over previous methods.
Posting Compression in Dynamic Retrieval Environments BIBAPDF 72-81
  IJsbrand Jan Aalbersberg
This paper describes a posting compression technique to be used in dynamic full-text document retrieval environments. The compression technique being presented is applicable in main-memory document retrieval systems, and consists of two parts. First there is the efficient use of auxiliary tables, and second there is the application of the well-known rank-frequency law of Zipf. It is shown that on the basis of this law term weights can be approximated, and thus that their explicit storage can be avoided.
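The rank-frequency law mentioned above says, roughly, that the frequency of the r-th most frequent term is proportional to 1/r, so a frequency-based term weight can be recomputed from a term's rank instead of being stored with every posting. A minimal sketch of that idea (illustrative only; the function name and exact weighting formula are assumptions, not taken from the paper):

```python
def zipf_weight(rank: int, c: float = 1.0) -> float:
    """Approximate a term's frequency-based weight from its rank,
    using Zipf's law: f(r) is roughly proportional to 1/r."""
    return c / rank

# Terms ranked once by observed frequency (toy data); afterwards a
# weight can be recomputed from the rank rather than stored explicitly.
ranked_terms = ["the", "retrieval", "posting"]
weights = {t: zipf_weight(r) for r, t in enumerate(ranked_terms, start=1)}
```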
A Hybrid Bilevel Image Decode Algorithm for Group 4 FAX BIBAPDF 82-91
  Chengjie Luo; Clement Yu
The modified READ code is a two-dimensional coding scheme standardized by CCITT to compress black-and-white pictures. Existing decompression algorithms process the compressed data bit by bit. In this paper, we propose a hybrid decompression algorithm which processes most of the compressed data byte by byte; the remaining data is processed bit by bit. It is known statistically that the former situation, where byte-by-byte processing occurs, happens much more often than the latter situation, where bit-by-bit processing takes place. Thus, decompression will be sped up by the proposed algorithm.

Invited Speaker

The CORE Electronic Chemistry Library BIBAPDF 93-112
  Michael Lesk
A major online file of chemical journal literature, complete with graphics, is being developed to test the usability of fully electronic access to documents. The test file will include ten years of the American Chemical Society's online journals, supplemented with the graphics from the paper publication and the indexing of the articles from Chemical Abstracts. Our goals are (1) to assess the effectiveness and acceptability of electronic access to primary journals as compared with paper, and (2) to identify the most desirable functions of the user interface to an electronic system of journals, including in particular a comparison of page-image display with ASCII display interfaces. This paper describes the chemical journal data, the interfaces for searching and reading it, and the experiments being done.

Distributed Systems I

Retrieval Algorithm Effectiveness in a Wide Area Network Information Filter BIBAPDF 114-122
  H. P. Frei; M. F. Wyle
We present an application of the usefulness performance measure in a WAN-based SDI system. Components of two basic indexing and retrieval algorithms are compared experimentally. The components we investigate include indexing token type (words versus N-grams), the amount of word reduction used in indexing, and the use of an indirect similarity component in retrieval. The theoretical basis and implementation of the basic algorithms and variations are discussed. Results indicate that words perform better than N-grams, that S-stemming is better than full stemming, and that indirect similarity provides an improvement to the cosine measure. Performance improvements are, however, small.
Distributed Representations in a Text Based Information Retrieval System: A New Way of Using the Vector Space Model BIBAPDF 123-132
  Richard F. E. Sutcliffe
In this paper we discuss how the Vector Space model of Information Retrieval can be used in a new way by combining connectionist ideas about distributed representations with the concept of propositional structure (semantic case structure) derived from mainstream Natural Language Understanding research. We show how distributed representations may be used to capture both amorphous concept representations and propositional structures and we discuss a prototype Information Retrieval system, PELICAN, which has been constructed in order to experiment with these ideas.

Interfaces

To See, or Not to See -- Is That the Query? BIBAPDF 134-141
  Robert R. Korfhage
Traditional information retrieval systems, under the guise of presenting the most relevant information to the searcher, really put blinders on him. They present certain information to the searcher, but strongly inhibit him from seeing other information, or even knowing of its existence. In this paper we present an argument for a new retrieval paradigm, one that focuses on the organized display of all documents, rather than on the linear display of just the "best."
Integrating Query, Thesaurus, and Documents through a Common Visual Representation BIBAPDF 142-151
  Richard H. Fowler; Wendy A. L. Fowler; Bradley A. Wilson
Document retrieval is a highly interactive process dealing with large amounts of information. Visual representations can both provide a means for managing the complexity of large information structures and support an interface style well suited to interactive manipulation. The system we have designed utilizes visually displayed graphic structures and a direct manipulation interface style to supply an integrated environment for retrieval. A common visually displayed network structure is used for query, document content, and term relations. A query can be modified through direct manipulation of its visual form by incorporating terms from any other information structure the system displays. An associative thesaurus of terms and an interdocument network provide information about a document collection that can complement other retrieval aids. Visualization of these large data structures makes use of fisheye views and overview diagrams to help overcome some of the difficulties of orientation and navigation in large information structures.
A Case-Based Architecture for a Dialogue Manager for Information-Seeking Processes BIBAKPDF 152-161
  Anne Tissen
In this paper, we propose a case-based architecture for a dialogue manager. The dialogue manager is one of the main components of the cognitive layer of an interface system for information-seeking processes. Information-seeking is a highly exploratory and navigational process and therefore needs elaborate interaction functionality. In our approach, this functionality is provided by the dialogue manager operating on a set of case-based dialogue plans. In a case-based planning system, a new plan is generated by retrieving the plan which is most appropriate to the user's goals and adapting it dynamically during the ongoing dialogue. We propose a case-based architecture for two reasons. First, operating on old solutions provides a coherent framework which prevents the user from being 'lost in hyperspace'. Second, it allows flexible adaptations: domain-dependent ones, using perspectives on domain objects, and domain-independent ones, which change the sequence of dialogue steps.
Keywords: Case-based reasoning, Human-computer interaction, Information-seeking

Office Automation and Databases

Addressing the Requirements of a Dynamic Corporate Textual Information Base BIBAPDF 163-172
  Peter G. Anick; Rex A. Flynn; David R. Hanssen
AI-STARS is a lexicon-assisted full-text Information Retrieval system, designed for use in a dynamic corporate environment. In this paper, we explore how the requirements of such an environment have influenced many key aspects of the design and implementation of the AI-STARS system. We promote the use of "views" to create logical partitions in large, heterogeneous databases, and argue that storing not only article instances, but also class definitions, stored queries, display templates and linguistic data in a single object repository has consequences that can be exploited for schema and lexicon evolution, security and subject filtering, information navigation, and data distribution.
Data Conversion, Aggregation and Deduction for Advanced Retrieval from Heterogeneous Fact Databases BIBAPDF 173-182
  Kalervo Jarvelin; Timo Niemi
Modern distributed fact databases are heterogeneous and autonomous. Their heterogeneity has many sources, including varying data models, data structures, attribute naming conventions, units of measurement, naming of data values, composition of data as attributes, technical representation of data, and abstraction levels of data. Database autonomy means that the database users have hardly any means of reducing such heterogeneity. Present information retrieval (IR) systems either provide no support for overcoming such heterogeneity, or their support is insufficient and difficult to utilize. In this paper we offer integrated and powerful data conversion, aggregation, and deduction techniques for advanced IR in such environments. These techniques allow users to overcome data inconsistency due to units of measurement or naming of data values, composition of data as attributes, and abstraction levels of data, as well as difficulties related to deductive use of hierarchically classified data. In complex situations, all these inconsistencies appear together. We therefore also show how these techniques are integrated into a powerful query language which has been implemented in Prolog in a workstation environment.
Querying Office Systems about Document Roles BIBAPDF 183-190
  A. Celentano; M. G. Fugini; S. Pozzi
This paper describes the architecture of a document retrieval system integrating classical IR features with knowledge about the procedural and application context in which documents are used. The paper focuses on the query language, which allows the user to pose queries involving the analysis of both the semantic network (in which the procedures, office agents, and events of the office context are represented as elements that access, modify, file, and manipulate documents) and the document contents, i.e. their text. The coupling of the query system with a browser tool is also discussed. The system relies on a knowledge representation model for documents and document roles developed in previous phases of the research.

Modeling Information Retrieval Systems II

Query Modification and Expansion in a Network with Adaptive Architecture BIBAPDF 192-201
  K. L. Kwok
This paper shows how a network view of probabilistic information indexing and retrieval with components may implement query expansion and modification (based on user relevance feedback) by growing new edges and adapting weights between queries and terms of relevant documents. Experimental results with two collections and partial feedback confirm that the process can lead to much improved performance. Learning from irrelevant documents, however, was not effective.
Using the Cosine Measure in a Neural Network for Document Retrieval BIBAPDF 202-210
  Ross Wilkinson; Philip Hingston
The task of document retrieval systems is to match one natural language query against a large number of natural language documents. Neural networks are known to be good pattern matchers. This paper reports our investigations in implementing a document retrieval system based on a neural network model. It shows that many of the standard strategies of information retrieval are applicable in a neural network model.
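The cosine measure named in the title is the standard vector-space similarity between a query vector and a document vector. A minimal sketch of the measure itself (toy data; the neural-network formulation described in the paper is not reproduced here):

```python
import math

def cosine(u, v):
    """Cosine of the angle between two term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Rank toy documents against a query by decreasing cosine similarity.
query = [1.0, 1.0, 0.0]
docs = {"d1": [2.0, 0.0, 0.0], "d2": [1.0, 1.0, 1.0]}
ranking = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
```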
Preference Structure, Inference and Set-Oriented Retrieval BIBAPDF 211-218
  Y. Y. Yao; S. K. M. Wong
In this paper, a framework for modeling information retrieval is introduced by combining the salient features of many inference-based and set-oriented retrieval models. The degrees of relevance of different subsets of documents are inferred from the user preference judgments on subsets of index terms. In order to demonstrate the usefulness of the proposed framework, the Boolean and the binary vector space models are analyzed. This analysis reveals the structures implicitly used in these models.

Distributed Systems II

Distributed Indexing: A Scalable Mechanism for Distributed Information Retrieval BIBAKPDF 220-229
  Peter B. Danzig; Jongsuk Ahn; John Noll; Katia Obraczka
Despite blossoming computer network bandwidths and the emergence of hypertext and CD-ROM databases, little progress has been made towards uniting the world's library-style bibliographic databases. While a few advanced distributed retrieval systems can broadcast a query to hundreds of participating databases, experience shows that local users almost always clog library retrieval systems. Hence broadcast remote queries will clog nearly every system. The premise of this work is that broadcast-based systems do not scale to world-wide systems. This paper describes an indexing scheme that will permit thorough yet efficient searches of millions of retrieval systems. Our architecture will work with an arbitrary number of indexing companies and information providers and, in the marketplace, could provide economic incentive for cooperation between database and indexing services. We call our scheme distributed indexing, and believe it will help researchers disseminate and locate both published and pre-publication material.
   We are building and plan to distribute a research prototype for the Internet that demonstrates these ideas. Our prototype will index technical reports and public domain software from dozens of computer science departments around the country.
Keywords: Information retrieval, Heterogeneous databases, Resource location, Bibliographic databases
On the Allocation of Documents in Multiprocessor Information Retrieval Systems BIBAPDF 230-239
  Ophir Frieder; Hava Tova Siegelmann
Information retrieval is the selection of documents that are potentially relevant to a user's information need. Given the vast volume of data stored in modern information retrieval systems, searching the document database requires vast computational resources. To meet these computational demands, various researchers have developed parallel information retrieval systems. As efficient exploitation of parallelism demands fast access to the documents, data organization and placement significantly affect the total processing time. We describe and evaluate a data placement strategy for distributed memory, distributed I/O multicomputers. Initially, a formal description of the Multiprocessor Document Allocation Problem (MDAP) and a proof that MDAP is NP Complete are presented. A document allocation algorithm for MDAP based on Genetic Algorithms is developed. This algorithm assumes that the documents are clustered using any one of the many clustering algorithms. We define a cost function for the derived allocation and evaluate the performance of our algorithm using this function. As part of the experimental analysis, the effects of varying the number of documents and their distribution across the clusters, as well as the use of various differing architectural interconnection topologies, are studied. We also experiment with several parameters common to Genetic Algorithms, e.g., the probability of mutation and the population size.

Object Oriented Approaches to IR

An Object-Oriented Modeling of the History of Optimal Retrievals BIBAPDF 241-250
  Yong Zhang; Vijay V. Raghavan; Jitender S. Deogun
Learning techniques are used in IR to exploit user feedback in order that the system can improve its performance with respect to particular queries. This process involves the construction of an optimal query that best separates the documents known to be relevant from those that are not. Since obtaining relevance judgments and constructing an optimal query involve a great deal of effort, in this paper, we develop a framework for organizing the history of optimal retrievals. The framework involves the identification of a hierarchy of document classes such that the concepts corresponding to higher level classes are more general than those of the lower level classes.
   The ways in which such a hierarchy may be used to retrieve answers to new queries are outlined. This approach has the advantage that the query specification is concept-based, whereas the retrieval mechanism is numerically oriented, involving optimal query vectors.
   It is shown that the construction of a hierarchy of optimal queries can correspond to an object-oriented modeling of IR objects. Furthermore, the resulting model can be easily implemented using a relational DBMS.
Retrieving Software Objects in an Example-Based Programming Environment BIBAKPDF 251-260
  Scott Henninger
Example-based programming is a form of software reuse in which existing code examples are modified to meet current task needs. Example-based programming systems that have enough examples to be useful present the problem of finding relevant examples. A prototype system named CodeFinder, which explores issues of retrieving software objects relevant to the design task, is presented. CodeFinder supports human-computer dialogue by providing the means to incrementally construct a query and by providing associative cues that are compatible with human memory retrieval principles.
Keywords: Human-computer interaction, Retrieval, Software reuse, Connectionism, Cooperative problem solving, Information access, Retrieval by reformulation, Associative spreading activation

Semantic Models

A Self-Organizing Semantic Map for Information Retrieval BIBAPDF 262-269
  Xia Lin; Dagobert Soergel; Gary Marchionini
Kohonen's feature map, an unsupervised neural network learning algorithm, is applied to construct a self-organizing semantic map for information retrieval. The semantic map visualizes semantic relationships between input documents, and represents data together with their interrelationships economically. The potentials of the semantic map include its use as a retrieval interface for an online bibliographic system. A prototype system that demonstrates this potential is described.
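Kohonen's feature map, the algorithm applied here, repeatedly moves the weight vector of the best-matching map unit, and to a lesser degree its neighbors, toward each input vector, so that nearby units come to represent similar inputs. A minimal one-dimensional sketch (toy parameters and data; everything below is an illustration, not the authors' implementation):

```python
import math
import random

def train_som(data, n_units=5, epochs=100, lr=0.3, radius=1.0, seed=0):
    """Train a 1-D Kohonen feature map: for each input, find the
    best-matching unit (BMU) and pull it, and its neighbors weighted by
    a Gaussian of map distance, toward the input vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for _ in range(epochs):
        for x in data:
            bmu = min(range(n_units),
                      key=lambda i: sum((u - v) ** 2
                                        for u, v in zip(units[i], x)))
            for i in range(n_units):
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                units[i] = [u + lr * h * (v - u)
                            for u, v in zip(units[i], x)]
    return units

# Two toy clusters; after training, each cluster wins a different unit,
# which is the "map" property exploited for visual display.
data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
units = train_som(data)

def best_unit(x):
    return min(range(len(units)),
               key=lambda i: sum((u - v) ** 2 for u, v in zip(units[i], x)))
```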
Incorporating a Semantic Analysis into a Document Retrieval Strategy BIBAPDF 270-279
  Edgar B. Wendlandt; James R. Driscoll
Current information retrieval systems focus on the use of keywords to respond to user queries. We propose the additional use of surface level knowledge in order to improve the accuracy of information retrieval. Our approach is based on the database concept of semantic modeling (particularly entities and relationships among entities). We extend the concept of query-document similarity by recognizing basic entity properties (attributes) which appear in text. We also extend query-document similarity using the linguistic concept of thematic roles. Thematic roles allow us to recognize relationship properties which appear in text. We include several examples to illustrate our approach. Test results which support our approach are reported. The test results concern searching documents and using their contents to perform the intelligent task of answering a question.
Complementary Structures in Disjoint Science Literatures BIBPDF 280-289
  Don R. Swanson

Access Methods

An Efficient Directory System for Document Retrieval BIBAKPDF 291-304
  D. Motzkin
This paper introduces a file directory structure which provides an efficient access path for document retrieval. The directory structure is based on the multi-B-tree structure. This directory structure is compatible with current automatic retrieval and query processing techniques. Weights that are assigned to index terms can be included in the directory with the terms at no additional cost. In addition, it provides for indexing a secondary attribute within a primary attribute with no additional cost. Updates are achieved with a high degree of efficiency as well. It is shown that this structure achieves a better overall performance than inverted files, standard B-trees, and other directory structures.
Keywords: Access methods, Indices, Directories, M-B-T directory, B-trees, Multi-B-trees, Information retrieval, Document retrieval, Database management systems, Non-dense attributes
Image Query Processing Based on Multi-Level Signatures BIBAPDF 305-314
  F. Rabitti; P. Savino
This paper describes the processing of queries that express conditions on the content of images in large image databases. The query language assumes that a semantic interpretation of the image content (i.e. an image symbolic interpretation) is available as the result of an image analysis process. The image query language addresses important aspects of the image interpretations resulting from image analysis by defining partial conditions on the composition of complex objects, requirements on their degree of recognition, and requirements on their position in the image interpretation. Particular emphasis is given to the definition of suitable content-based access structures to make query processing more efficient. An approach based on multi-level signatures is adopted: the query is pre-processed on the signatures to filter out most of the images not satisfying the query. Finally, an evaluation of the efficiency and precision of the signature technique is given.

Hypertext

A Two-Level Hypertext Retrieval Model for Legal Data BIBAPDF 316-325
  Maristella Agosti; Roberto Colotti; Girolamo Gradenigo
This paper introduces an associative information retrieval model based on the two-level architecture proposed in [Agosti et al, 1989a] and [Agosti et al, 1990], and an experimental prototype developed in order to validate the model in a personal computing environment. In the first part of the paper, related work and motivations are presented. In the second part, the model, entitled EXPLICIT, is introduced. EXPLICIT is based on a two-level architecture which holds the two main parts of the informative resource managed by an information retrieval tool: the collection of documents and the indexing term structure. The term structure is managed as a schema of concepts which can be used by the final user as a frame of reference in the query formulation process. The model supports the concurrent use of different schemas of concepts to satisfy information needs of different categories of users. In the third part of the paper, the main characteristics of the experimental prototype, named HyperLaw, are presented.
Automatic Generation of "Hyper-Paths" in Information Retrieval Systems: A Stochastic and an Incremental Algorithm BIBAPDF 326-335
  Alain Lelu
A hypertext procedure for browsing through documentary databases is proposed, based upon a global synthetic mapping in addition to a set of local scanning axes. A method for the automatic generation of these relevant axes, local component analysis, is developed. It consists of tracking the local maxima of a "partial inertia" landscape. First, a "neural" algorithm converging after several passes over the data is presented. Then a deterministic one-pass algorithm is deduced, allowing dynamic data-flow analysis.

Natural Language Processing

Creating Segmented Databases from Free Text for Text Retrieval BIBAPDF 337-346
  Lisa F. Rau; Paul S. Jacobs
Indexing text for accurate retrieval is a difficult and important problem. On-line information services generally depend on "keyword" indices rather than other methods of retrieval, because of the practical features of keywords for storage, dissemination, and browsing as well as for retrieval. However, these methods of indexing have two major drawbacks. First, keywords must be laboriously assigned by human indexers. Second, they are inaccurate, because of mistakes made by these indexers, the difficulties users have in choosing keywords for their queries, and the ambiguity a keyword may have.
   Current natural language text processing (NLP) methods help to overcome these problems. Such methods can provide automatic indexing and keyword assignment capabilities that are at least as accurate as human indexers in many applications. In addition, NLP systems can increase the information contained in keyword fields by separating keywords into segments, or distinct fields that capture certain discriminating content or relations among keywords.
   This paper reports on a system that uses natural language text processing to derive keywords from free text news stories, separate these keywords into segments, and automatically build a segmented database. The system is used as part of a commercial news "clipping" and retrieval product. Preliminary results show improved accuracy, as well as reduced cost, resulting from these automated techniques.
Retrieval Performance in FERRET: A Conceptual Information Retrieval System BIBAPDF 347-355
  Michael L. Mauldin
FERRET is a full text, conceptual information retrieval system that uses a partial understanding of its texts to provide greater precision and recall performance than keyword search techniques. It uses a machine-readable dictionary to augment its lexical knowledge and a variant of genetic learning to extend its script database.
   Comparison of FERRET's retrieval performance on a collection of 1065 astronomy texts using 22 sample user queries with a standard boolean keyword query system showed that precision increased from 35 to 48 percent, and recall more than doubled, from 19.4 to 52.4 percent.
   This paper describes the FERRET system's architecture, parsing and matching abilities, and focuses on the use of the Webster's Seventh dictionary to increase the system's lexical coverage.
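The precision and recall figures quoted above are the standard set-overlap measures of retrieval effectiveness. A minimal sketch of how such figures are computed (toy document sets, not FERRET's data):

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved documents that are relevant.
    Recall: fraction of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy run: 4 documents retrieved, 5 relevant in the collection, 2 overlap.
p, r = precision_recall({"d1", "d2", "d3", "d4"},
                        {"d2", "d4", "d5", "d6", "d7"})
```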

Panel

The Smart Project in Automatic Document Retrieval BIBAPDF 356-358
  Gerard Salton; Michael E. Lesk; Donna Harman; Robert E. Williamson; Edward A. Fox; Chris Buckley
The Smart project in automatic text retrieval was started in 1961. It is the oldest continuously running research project in information retrieval. The panel members are all major contributors to the Smart system work. The discussion covers aspects of the Smart system design and examines the past and future significance of some of the research conducted in the Smart environment.