Proceedings of the Tenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Fullname:Proceedings of the Tenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Editors:C. T. Yu; C. J. van Rijsbergen
Location:New Orleans, Louisiana
Dates:1987-Jun-03 to 1987-Jun-05
Publisher:ACM
Standard No:ISBN 0-89791-232-2; ACM Order number 606870; hcibib: IR87
Papers:39
Pages:317
  1. Keynote Address
  2. Retrieval Effectiveness I
  3. Knowledge Based IR I
  4. User Interface
  5. Automatic Indexing
  6. Keynote Address
  7. Clustering
  8. Panel
  9. Retrieval Effectiveness II
  10. Retrieval Systems
  11. Storage/Retrieval Techniques I
  12. Knowledge Based IR II
  13. Panel
  14. Storage/Retrieval Techniques II
  15. Panel

Keynote Address

Multimedia Retrieval, p. 1
  S. Christodoulakis

Retrieval Effectiveness I

A Statistical Similarity Measure, pp. 3-12
  S. K. M. Wong; Y. Y. Yao
Within the framework of the vector space models, a statistical similarity measure between document and query is proposed. In this approach the assumption that term (or atomic) vectors are pairwise orthogonal is not required. In addition, it provides a natural and consistent interpretation of term occurrence frequencies obtained from autoindexing.
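To make the contrast with pairwise orthogonal term vectors concrete, here is a minimal Python sketch of a similarity of the form q^T G d, where G holds inner products between term vectors. This is the generic generalized-vector-space idea, not Wong and Yao's statistical measure; estimating G from cosines between the term columns of a document-term matrix is an assumption made for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine of two equal-length numeric vectors (0.0 if either is all zeros)."""
    num = sum(x * y for x, y in zip(a, b))
    den = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def term_correlations(doc_term):
    """Estimate G[i][j], the inner product of term vectors t_i and t_j, as the
    cosine between columns i and j of a document-term matrix (assumed estimator)."""
    columns = list(zip(*doc_term))
    return [[cosine(ci, cj) for cj in columns] for ci in columns]


def similarity(q, d, G):
    """Bilinear similarity q^T G d. With G the identity matrix (pairwise
    orthogonal term vectors) this reduces to the ordinary dot product."""
    n = len(q)
    return sum(q[i] * G[i][j] * d[j] for i in range(n) for j in range(n))
```

With two terms that always co-occur, a query containing only the first term still matches a document containing only the second: `similarity([1, 0], [0, 1], term_correlations([[1, 1], [2, 2]]))` is 1.0 up to rounding, whereas the identity matrix gives 0.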
Probabilistic Search Term Weighting - Some Negative Results, pp. 13-18
  Norbert Fuhr; Peter Müller
The improvement in retrieval quality achieved by probabilistic search term weighting has been demonstrated in various experiments described in the literature. In this paper, we investigate the feasibility of this method for Boolean retrieval with terms from a prescribed indexing vocabulary. This setting differs considerably from earlier experiments, which used linear retrieval with free-text terms. The experimental results show that, in our setting, search term weighting yields no improvement over a simple coordination match function. On the other hand, models based on probabilistic indexing outperform the ranking procedures that use search term weights.
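To make the two ranking functions concrete, here is a minimal Python sketch of a coordination match alongside ranking by summed search term weights. The Robertson-Sparck Jones relevance weight (with the usual 0.5 correction) is a standard choice assumed for illustration; the abstract does not say which weighting formula the authors used.

```python
from math import log


def coordination_score(query_terms, doc_terms):
    """Coordination level match: the number of query terms present in the document."""
    return len(set(query_terms) & set(doc_terms))


def rsj_weight(r, R, n, N):
    """Robertson-Sparck Jones relevance weight with the 0.5 correction.
    r: relevant docs containing the term, R: relevant docs,
    n: docs containing the term, N: docs in the collection."""
    return log(((r + 0.5) * (N - n - R + r + 0.5)) /
               ((n - r + 0.5) * (R - r + 0.5)))


def weighted_score(query_weights, doc_terms):
    """Sum of the search term weights of the query terms present in the document."""
    return sum(w for term, w in query_weights.items() if term in doc_terms)
```

In the Boolean, controlled-vocabulary setting studied here, ranking by `weighted_score` gave no improvement over `coordination_score`.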
Some Considerations for Approximate Optimal Queries, pp. 19-24
  K. L. Kwok
An optimal query has been defined as one that retrieves all the known relevant documents of a query in the ranking that is best with respect to probability of relevance. We have slightly modified the definition so that it also allows one to trace the query's evolution from the original to the optimal form through the successive feedback stages. Such a query can be constructed by modifying the original query with terms from the known relevant documents. This term addition strategy differs materially from other approaches, which add terms based on their association with all query terms, calculated over the whole document collection. The paper also discusses the effect of viewing a document as composed of components, which affects the weighting and retrieval results of the optimal query.
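The abstract does not give Kwok's construction, so the Python sketch below substitutes a Rocchio-style positive-feedback update to show the general pattern it describes: the original query terms are reweighted, and new terms are drawn only from the known relevant documents rather than from collection-wide term associations. The parameters `alpha`, `beta` and `top_k` are illustrative assumptions.

```python
from collections import Counter


def expand_query(query, relevant_docs, alpha=1.0, beta=0.75, top_k=None):
    """Rocchio-style positive feedback (a swapped-in technique):
    new weight = alpha * original weight + beta * relevant-document centroid.
    query and each relevant document are dicts mapping term -> weight."""
    centroid = Counter()
    for doc in relevant_docs:
        for term, w in doc.items():
            centroid[term] += w / len(relevant_docs)
    new_q = Counter({term: alpha * w for term, w in query.items()})
    # Candidate new terms come only from the relevant documents, strongest first.
    added = [term for term, _ in centroid.most_common() if term not in query]
    if top_k is not None:
        added = added[:top_k]
    for term in list(query) + added:
        new_q[term] += beta * centroid.get(term, 0.0)
    return dict(new_q)
```

Repeating the call after each feedback round gives one way to follow the query from its original form toward an approximately optimal one.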

Knowledge Based IR I

An Approach to Natural Language Processing for Document Retrieval, pp. 26-32
  W. Bruce Croft; David D. Lewis
Document retrieval systems have been restricted, by the nature of the task, to techniques that can be used with large numbers of documents and broad domains. The most effective techniques that have been developed are based on the statistics of word occurrences in text. In this paper, we describe an approach to using natural language processing (NLP) techniques for what is essentially a natural language problem: comparing a request text with the text of document titles and abstracts. The proposed NLP techniques are used to develop a request model based on "conceptual case frames" and to compare this model with the texts of candidate documents. The request model is also used to provide information to the statistical search techniques that identify the candidate documents. As part of a preliminary evaluation of this approach, case frame representations of a set of requests from the CACM collection were constructed. Statistical searches carried out using dependency and relative importance information derived from the request models indicate that performance benefits can be obtained.
Outline of a Knowledge Base Model for an Intelligent Information Retrieval System, pp. 33-43
  Marie-France Bruandet
This paper outlines a method for the automatic construction of a knowledge base, proposing several methods and a domain knowledge model. The new idea is a system that, at each phase of its construction, acquires domain knowledge from all the new information it builds, in particular the indexing terms; the last section is a first attempt in this direction.
Enriched Knowledge Representations for Information Retrieval, pp. 43a-43g
  F. N. Teskey
In this paper we identify the need for a new theory of information. We develop an information model that distinguishes between data (directly observable facts), information (structured collections of data), and knowledge (methods of using information). The model is intended to support a wide range of information systems. The paper applies the model to a semantic information retrieval system based on the concept of semantic categories. The likely benefits of this approach are discussed, although no detailed evaluation has yet been conducted.

User Interface

Informational Zooming: An Interaction Model for the Graphical Access to Text Knowledge Bases, pp. 45-56
  Ulrich Thiel; Rainer Hammwöhner
Generating an Individualized User Interface: From Novice to Expert, pp. 57-60
  Jean Tague
A model of the interface to an information retrieval system is developed based on the semantic data model. Using this framework, a method of developing customized user interfaces is described, in general terms and in a specific implementation in the Interface Builder module of the Western Information Retrieval System.
Individual Differences in the Use of Information Retrieval Systems: Some Issues and Some Data, pp. 61-71
  Christine L. Borgman
The population using information retrieval systems is becoming increasingly diverse. Users' skill with these systems varies widely, and the next generation of systems must accommodate this diverse population. This paper reports on a study to identify variables related to information retrieval aptitude, based on results from earlier studies of searchers and programmers. A sample of undergraduate subjects from English, psychology, and engineering majors was given a series of psychometric tests and compared to known populations. We find that engineering majors exhibit the academic background and personality characteristics most like those of skilled searchers and programmers, with contrasting patterns or no discernible patterns in English and psychology majors. The strength of most associations increases when the analysis is restricted to subjects who have either stayed in one major or changed major only within one disciplinary area. About half the variance in choice of major can be explained by scores on the tests administered, and a comparable amount of variance in test scores can be explained by the academic background variables.

Automatic Indexing

Illustrated Description of an Interactive Knowledge-Based Indexing System, pp. 73-90
  Susanne M. Humphrey
This report discusses the Indexing Aid Project for conducting research in interactive knowledge-based indexing of the medical literature. After providing an overview and background, we describe and illustrate the Indexing Aid System using an extended example, highlighting the knowledge-based capabilities of the system, namely, inheritance and internal retrieval, enforcement of restrictions, and other functions implemented by procedural attachments, which are characteristic of frame-based knowledge representation languages. A feature which generates reports for evaluating the system is also shown. The paper concludes with discussion of the research plan. The project is part of the Automated Classification and Retrieval Program at the Lister Hill National Center for Biomedical Communications, the research and development arm of the National Library of Medicine.
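The frame mechanisms named above (inheritance, enforcement of restrictions, and procedural attachments) can be sketched in a few lines of Python. The `Frame` class and its slot names are a hypothetical design for illustration only, not the Indexing Aid System's representation language.

```python
class Frame:
    """A tiny frame: slot values, value restrictions and if-added procedures,
    all inherited along the parent chain (hypothetical design)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.slots = {}         # slot -> stored value
        self.restrictions = {}  # slot -> predicate every new value must satisfy
        self.if_added = {}      # slot -> procedure(frame, value) run after storing

    def _lookup(self, table, slot):
        """Search this frame, then its ancestors, for `slot` in the named table."""
        frame = self
        while frame is not None:
            entries = getattr(frame, table)
            if slot in entries:
                return entries[slot]
            frame = frame.parent
        return None

    def get(self, slot):
        """Inheritance: a value stored on an ancestor is visible here."""
        return self._lookup("slots", slot)

    def put(self, slot, value):
        """Enforce any (inherited) restriction, store the value locally,
        then run any (inherited) procedural attachment."""
        check = self._lookup("restrictions", slot)
        if check is not None and not check(value):
            raise ValueError(f"{value!r} violates the restriction on {slot!r}")
        self.slots[slot] = value
        demon = self._lookup("if_added", slot)
        if demon is not None:
            demon(self, value)
```

Restrictions and attachments defined once on a general frame then apply automatically to every more specific frame below it.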
Automatic Phrase Indexing for Document Retrieval: An Examination of Syntactic and Non-Syntactic Methods, pp. 91-101
  Joel L. Fagan
An automatic phrase indexing method based on the term discrimination model is described, and the results of retrieval experiments on five document collections are presented. Problems related to this non-syntactic phrase construction method are discussed, and some possible solutions are proposed that make use of information about the syntactic structure of document and query texts.
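The term discrimination model underlying this method can be sketched briefly. The Python below assumes cosine similarity and the common definition of space density as the average similarity between each document and the collection centroid; a term's discrimination value is the density with the term removed minus the density with it present, so a positive value marks a good discriminator. In the discrimination-model literature, frequent terms with non-positive values are typical candidates for combination into phrases; the phrase construction itself is not shown.

```python
from math import sqrt


def cosine(a, b):
    """Cosine of two equal-length numeric vectors (0.0 if either is all zeros)."""
    num = sum(x * y for x, y in zip(a, b))
    den = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def density(docs, drop=None):
    """Average cosine between each document and the collection centroid,
    optionally ignoring the term at index `drop`."""
    vecs = [[w for i, w in enumerate(d) if i != drop] for d in docs]
    n = len(vecs)
    centroid = [sum(col) / n for col in zip(*vecs)]
    return sum(cosine(v, centroid) for v in vecs) / n


def discrimination_value(docs, term):
    """Positive: removing the term makes documents look more alike,
    so the term helps tell documents apart."""
    return density(docs, drop=term) - density(docs)
```

For the two documents `[[1, 1, 0], [1, 0, 1]]`, term 0 occurs everywhere and gets a negative value, while terms 1 and 2 get positive values.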
A Failure Analysis on the Limitations of Suffixing in an Online Environment, pp. 102-108
  Donna Harman
The interaction of suffixing algorithms and ranking techniques in retrieval performance, particularly in an online environment, was investigated. Three general-purpose suffixing algorithms were used for retrieval on the Cranfield 1400, Medlars, and CACM collections, and the results were analysed with several standard evaluation measures. An examination of retrieval performance with suffixing suggested two modifications to ranking techniques: variable weighting of word variants and selective stemming depending on query length. The experimental data are presented, and the limitations of suffixing in an online environment are discussed.
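The abstract does not name the three suffixing algorithms. As an assumption, the Python sketch below uses a minimal plural-stripping rule set in the style of the "S" stemmer, plus a hypothetical `prepare_query` that stems only short queries; the threshold, and even the direction of the length rule, are guesses for illustration rather than the paper's finding.

```python
def s_stem(word):
    """Plural-removal rules in the style of the 'S' stemmer;
    only the first rule whose condition holds is applied."""
    w = word.lower()
    if w.endswith("ies") and not w.endswith(("eies", "aies")):
        return w[:-3] + "y"
    if w.endswith("es") and not w.endswith(("aes", "ees", "oes")):
        return w[:-1]
    if w.endswith("s") and not w.endswith(("us", "ss")):
        return w[:-1]
    return w


def prepare_query(terms, max_len_for_stemming=3):
    """Selective stemming (hypothetical policy): stem queries of at most
    `max_len_for_stemming` terms, leave longer queries unstemmed."""
    if len(terms) <= max_len_for_stemming:
        return [s_stem(t) for t in terms]
    return [t.lower() for t in terms]
```

Even this conservative stemmer conflates forms such as "queries" and "query", which is exactly the kind of variant matching whose effect on ranking the paper analyses.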

Keynote Address

Uncertainties in Information Retrieval, p. 109
  L. Zadeh

Clustering

Fast Object Partitioning Using Stochastic Learning Automata, pp. 111-122
  B. J. Oommen; D. C. Y. Ma
Let $\Omega = \{A_1, \ldots, A_W\}$ be a set of $W$ objects to be partitioned into $R$ classes $\{P_1, \ldots, P_R\}$. The objects are accessed in groups of unknown size, and these groups need not be of equal size. Additionally, the joint access probabilities of the objects are unknown. The aim is to place objects that are frequently accessed together in the same class. This problem has been shown to be NP-hard [15,16]. In this paper, we propose two stochastic learning automata solutions to the problem. The first is relatively fast, but its accuracy is unremarkable in some environments. The second, which uses a new variable structure stochastic automaton, demonstrates an excellent partitioning capability. Experimentally, this solution converges an order of magnitude faster than the best known algorithm in the literature [15,16].
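The learning automata themselves are not reproduced here. As a hedged illustration of the objective they pursue, this Python sketch scores a candidate partition against an observed stream of access groups by the fraction of co-accessed object pairs that fall in the same class; the function name and input format are assumptions.

```python
from itertools import combinations


def co_access_score(partition, access_groups):
    """Fraction of co-accessed object pairs placed in the same class.
    partition: dict mapping object -> class id.
    access_groups: iterable of sets of objects accessed together."""
    same = total = 0
    for group in access_groups:
        for a, b in combinations(sorted(group), 2):
            total += 1
            same += partition[a] == partition[b]
    return same / total if total else 1.0
```

A partitioning scheme that learns well from the access stream should drive this score toward its maximum for the underlying (unknown) access probabilities.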
A Dynamic Cluster Maintenance System for Information Retrieval, pp. 123-131
  Fazli Can; Esen A. Ozkarahan
Partitioning very large databases by clustering is necessary to reduce the space/time complexity of retrieval operations. However, modern retrieval environments require clusters to be maintained dynamically as the database changes. A new cluster maintenance strategy is proposed. Its similarity/stability characteristics, cost, and retrieval behavior are examined in a series of experiments and compared with those of unclustered and completely reclustered database environments.
Non-Hierarchic Document Clustering Using the ICL Distributed Array Processor, pp. 132-139
  Edie M. Rasmussen; Peter Willett
This paper considers the suitability and efficiency of a highly parallel computer, the ICL Distributed Array Processor (DAP), for document clustering. Algorithms are described for the implementation of the single-pass and reallocation clustering methods on the DAP and on a conventional mainframe computer. These methods are used to classify the Cranfield, Vaswani and UKCIS document test collections. The results suggest that the parallel architecture of the DAP is not well suited to the variable-length records which characterise bibliographic data.
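For readers unfamiliar with the single-pass method, here is a minimal sequential Python sketch (not the DAP implementation): each document joins the cluster whose centroid is most similar to it, provided the cosine similarity reaches a threshold, and otherwise starts a new cluster. The threshold value and the running-mean centroid update are assumptions.

```python
from math import sqrt


def cosine(a, b):
    """Cosine of two equal-length numeric vectors (0.0 if either is all zeros)."""
    num = sum(x * y for x, y in zip(a, b))
    den = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def single_pass(docs, threshold):
    """Return clusters as lists of document indices."""
    centroids, members = [], []
    for idx, d in enumerate(docs):
        best, best_sim = None, -1.0
        for c, cen in enumerate(centroids):
            s = cosine(d, cen)
            if s > best_sim:
                best, best_sim = c, s
        if best is not None and best_sim >= threshold:
            members[best].append(idx)
            k = len(members[best])
            # Running mean keeps the centroid equal to the average member vector.
            centroids[best] = [(cv * (k - 1) + dv) / k
                               for cv, dv in zip(centroids[best], d)]
        else:
            centroids.append(list(d))
            members.append([idx])
    return members
```

Because documents are processed in order, the resulting clusters depend on input order, a well-known property of single-pass clustering.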
Optimal Determination of User-Oriented Clusters, pp. 140-146
  Vijay V. Raghavan; Jitender S. Deogun
User-oriented clustering schemes enable the classification of documents based upon the user's perception of the similarity between documents, rather than on some similarity function presumed by the designer to represent the user's criteria. In this paper, an enhancement of such a clustering scheme is presented. This is accomplished by formulating user-oriented clustering as a function-optimization problem, termed the Boundary Selection Problem (BSP). Heuristic approaches to solving the BSP are proposed, and a preliminary evaluation of these approaches is provided.

Panel

Models of IR, p. 147
  V. Raghavan; M. Gordon; R. Korfhage; C. Yu

Retrieval Effectiveness II

A Formal Treatment of Missing and Imprecise Information, pp. 149-156
  J. M. Morrissey; C. J. van Rijsbergen
Missing, non-applicable and imprecise values arise frequently in Office Information Systems. There is a need to treat them in a consistent and useful manner. This paper proposes a method and gives the precise semantics of the retrieval operations in a system where imprecision is allowed. It also suggests a way to handle the uncertainty introduced by imprecise data values.
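The abstract does not spell out the retrieval semantics. One common way to treat such values, sketched here in Python as an assumption rather than the paper's method, represents an imprecise value as a set of possible values and a missing value as the whole attribute domain, then splits a selection into definite answers (every possible value satisfies the predicate) and possible answers (only some do). The `NOT_APPLICABLE` marker and the `id` field are hypothetical.

```python
NOT_APPLICABLE = object()  # hypothetical marker for non-applicable values


def select(records, attr, pred, domain):
    """Split records into (definite, possible) answers for `pred` on `attr`.
    A value may be a single value, a set of possible values (imprecise),
    None (missing: any value in `domain`), or NOT_APPLICABLE."""
    sure, maybe = [], []
    for rec in records:
        v = rec[attr]
        if v is NOT_APPLICABLE:
            continue  # the predicate cannot hold for a non-applicable attribute
        if v is None:
            candidates = domain
        elif isinstance(v, (set, frozenset)):
            candidates = v
        else:
            candidates = {v}
        hits = [pred(c) for c in candidates]
        if all(hits):
            sure.append(rec["id"])
        elif any(hits):
            maybe.append(rec["id"])
    return sure, maybe
```

Keeping the two answer sets separate lets a system report what is certainly true while still surfacing records that might qualify.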
Adaptive Linear Information Retrieval Models, pp. 157-163
  P. Bollmann; S. K. M. Wong
Linear decision (retrieval) functions have been widely adopted in information retrieval, for example in the Boolean, vector, and probabilistic models. Based on measurement theory, this paper proposes adaptive linear retrieval models. A necessary and sufficient condition for the existence of a linear decision function is given. Through an inductive learning (feedback) process, techniques from linear integer programming can be applied directly to estimate the parameters for automatic query formulation.
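The paper estimates parameters with linear integer programming. As a simpler, swapped-in technique, the Python sketch below uses a perceptron-style update over user preference pairs: it searches for integer weights w such that every preferred document scores higher than the document it is preferred to. If such weights exist, the perceptron convergence theorem guarantees that the loop finds them given enough passes; the paper's own existence condition is not reproduced.

```python
def learn_linear_ranking(pairs, n_features, epochs=100):
    """Search for integer weights w with w.d > w.e for every pair (d, e) in
    which document d is preferred to document e. Returns None if no such
    weights are found within `epochs` passes over the pairs."""
    w = [0] * n_features
    for _ in range(epochs):
        errors = 0
        for better, worse in pairs:
            diff = [b - x for b, x in zip(better, worse)]
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:
                # Misordered (or tied) pair: move w toward the difference vector.
                w = [wi + di for wi, di in zip(w, diff)]
                errors += 1
        if errors == 0:
            return w
    return None
```

With integer feature vectors every update keeps the weights integral, which loosely mirrors the integer setting of the paper; contradictory preferences (d over e and e over d) make the function return None.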
The Effect of Database Size on Document Retrieval: Random and Best-First Retrieval Models, pp. 164-169
  Robert M. Losee
Most document retrieval systems based on probabilistic models of feature distributions assume random selection of documents for retrieval. The assumptions of these models are met when documents are randomly selected from the database or when all available documents are retrieved. A more suitable model for retrieving a single document assumes that the best available document is retrieved first. Models of document retrieval systems assuming random selection and best-first selection are developed and compared under binary independence and two-Poisson independence feature distribution models. Under the best-first model, feature discrimination varies with the number of documents in each relevance class in the database. A weight similar to the inverse document frequency (IDF) weight, consistent with the best-first model and not dependent on knowledge of the characteristics of relevant documents, is suggested.

Retrieval Systems

TIRS: A Topological Information Retrieval System Satisfying the Requirements of the Waller-Kraft Wish List, pp. 171-180
  Steven C. Cater; Donald H. Kraft
A new information retrieval system, the Topological Information Retrieval System (TIRS), is shown to be the first system that can fulfill all the properties on the Waller-Kraft wish list of desired IR system properties. The wish list itself is discussed, and a possible problem in interpreting the list is shown to be resolved in the TIRS model. Examples of TIRS systems that satisfy the wish-list properties are given, along with proofs of satisfaction for each item on the list.
A Retrieval System for On-Line English-Japanese Dictionaries, pp. 181-186
  Tetsuro Ito; Mana Kubota
MICROARRAS: An Advanced Full-Text Retrieval and Analysis System, pp. 187-195
  John B. Smith; Stephen F. Weiss; Gordon J. Ferguson
MICROARRAS is an advanced full-text retrieval and analysis system. It supports fast, efficient browsing of a document's vocabulary as well as its text, recursive analytic categories, Boolean search with flexible context specifications, evaluation of arithmetic expressions, and graphical display of various numeric distributions. The system is designed to work with large textbases stored either on remote mainframes or locally on a microcomputer or workstation. The description covers the system architecture, design principles, and user functions.
A Relational Model for Unstructured Documents, pp. 196-206
  Airi Salminen
The logical structure of a document is usually a tree in which the order of the nodes is important, at least at some level of the tree. We call a document unstructured if its structure is a single-level ordered tree. The purpose of this paper is to present a many-sorted algebra for handling unstructured documents. The documents in the model are represented by relations. An algebra for handling documents of one type can be extended to an algebra for handling documents of several types. Further, the document algebra can be combined with the relational algebra into a common algebra for handling both documents and relations. The model of this paper can be regarded as part of a general document model; on the other hand, unstructured documents are themselves an important group of documents. We show by examples that this simple model covers a wide range of document handling and information retrieval problems.

Storage/Retrieval Techniques I

A VLSI Chip for Efficient Transmission and Retrieval of Information, pp. 208-216
  Amar Mukherjee; M. A. Bassiouni
In this paper, we present a functional description of a VLSI chip aimed at reducing the cost of data transmission and data access within information processing machines and distributed information systems. The chip maps standard character codes (e.g., ASCII) into more efficient codes (e.g., Huffman codes) using a tree module of basic cells. In bit-serial communication controllers, for example, the parallel-to-serial transformation unit can simply be replaced by the proposed chip. The VLSI design can provide speeds that far exceed the current and projected peak transfer rates of high-speed disks and communication controllers.
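As a software illustration of the mapping the chip performs (and not a model of its tree of basic cells), this Python sketch builds a Huffman code from character frequencies and then encodes and decodes a string. The function names are illustrative.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a prefix-free Huffman code table {char: bitstring} from text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {char: code suffix built so far}).
    heap = [(f, i, {c: ""}) for i, (c, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two lightest subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {c: "0" + code for c, code in left.items()}
        merged.update({c: "1" + code for c, code in right.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(text, table):
    return "".join(table[c] for c in text)


def decode(bits, table):
    """Prefix-freeness lets us emit a symbol as soon as the buffer matches a code."""
    reverse = {code: c for c, code in table.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in reverse:
            out.append(reverse[cur])
            cur = ""
    return "".join(out)
```

For "abracadabra" the code needs 23 bits, against 88 bits for the same 11 characters in 8-bit ASCII.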
File Organizations and Incrementally Specified Queries, pp. 217-222
  Caroline M. Eastman
Queries to information retrieval systems are often incrementally specified as a result of user interaction with the system. However, most discussions of file organizations consider only completely specified queries. The choice of file organizations to support such incremental specification is discussed qualitatively in this extended abstract. (Quantitative comparisons are partially complete and are not presented here.) Organizations which are advantageous for completely specified queries are not necessarily so for incrementally specified queries (and vice versa).
Predictive Text Compression by Hashing, pp. 223-233
  Timo Raita; Jukka Teuhola
Knowing a short substring provides a good basis for guessing the next character in natural language text. Repeatedly guessing and encoding each subsequent character in this way is the foundation of predictive text compression. The paper describes a family of such compression methods that use a hash table to look up the prediction information. The experiments show that the methods produce good compression gains and, moreover, are very fast. The one-pass versions are especially suited to "on-the-fly" compression of transmitted data and could serve as a basis for specialized hardware.
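A minimal Python sketch of one member of such a family, under stated assumptions: the previous k characters are hashed (here with `zlib.crc32`) into a table that remembers the character last seen after that context; a correct guess costs one flag bit, a miss costs the flag plus an 8-bit literal. The values of `k` and `table_size`, and the token format, are illustrative; the paper's actual encoding is not reproduced.

```python
import zlib


def _slot(context, table_size):
    """Deterministic hash of the preceding characters into a table index."""
    return zlib.crc32(context.encode("utf-8")) % table_size


def compress(text, k=3, table_size=4096):
    """Emit True when the table predicts the next character, otherwise
    (False, char); on a miss the table learns the actual character."""
    table = [None] * table_size
    tokens = []
    for i, ch in enumerate(text):
        slot = _slot(text[max(0, i - k):i], table_size)
        if table[slot] == ch:
            tokens.append(True)
        else:
            tokens.append((False, ch))
            table[slot] = ch
    return tokens


def decompress(tokens, k=3, table_size=4096):
    """Mirror the compressor, rebuilding the same table while decoding."""
    table = [None] * table_size
    out = []
    for tok in tokens:
        slot = _slot("".join(out[max(0, len(out) - k):]), table_size)
        if tok is True:
            ch = table[slot]
        else:
            ch = tok[1]
            table[slot] = ch
        out.append(ch)
    return "".join(out)


def estimated_bits(tokens):
    """1 flag bit per character plus 8 literal bits per miss (8-bit text assumed)."""
    return sum(1 if tok is True else 9 for tok in tokens)
```

Both sides see only already-coded characters, so the decoder reconstructs the predictions without any side information, which is what makes a one-pass, on-the-fly version possible.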
Estimating Effective Display Size in Online Retrieval Systems, pp. 234-245
  Danny P. Wallace; Bert R. Boyce; Donald H. Kraft
This paper outlines a problem in commercial online retrieval systems, provides a review of the relevant literature, and presents a solution for a special case of the problem. Previous investigators have considered how to best determine, for a ranked list of records retrieved from an online retrieval system, whether or not the user should continue to display the output. This paper examines the problem of how effective display size can be estimated as a means of assisting the users of commercial online retrieval systems. Although no experimental results are as yet available, the approach presented here will provide a guide to and prolegomenon for systematic study of the problem, as well as a method for providing the estimated number of relevant records remaining in a retrieved set ranked by a retrieval status value.

Knowledge Based IR II

Conceptual Information Retrieval Using RUBRIC, pp. 247-253
  Richard M. Tong; Lee A. Appelbaum; Victor N. Askman; James F. Cunningham
Thesaurus Based Concept Spaces, pp. 254-262
  P. Schäuble
EP-X: A Demonstration of Semantically-Based Search of Bibliographic Databases, pp. 263-271
  Deb Krawczak; Philip J. Smith; Steven J. Shute
EP-X (Environmental Pollution eXpert) is a prototype knowledge-based system that assists users in conducting bibliographic searches of the environmental pollution literature. This system combines artificial intelligence and human factors engineering techniques, allowing us to redesign traditional bibliographic information retrieval interfaces. The result supports semantically-based search as opposed to the typical character-string matching approach. This paper discusses:
  1) a sample interaction with EP-X,
  2) the knowledge representations necessary to support this semantically-based interaction,
  3) preliminary results of empirical studies to evaluate the interface, and
  4) recommendations for future directions.
Towards an Expert System for Bibliographic Retrieval: A Prolog Prototype, pp. 272-281
  C. R. Watters; M. A. Shepherd; W. Robertson
A prototype Prolog system has been developed for online bibliographic retrieval. Most online bibliographic retrieval systems may be characterized by queries based on the occurrence of keywords and by databases of possibly millions of records. Such systems have very fast response times but generally lack any deductive reasoning capability.
An expert system for online bibliographic retrieval, developed in Prolog, would provide enhanced retrieval capabilities through the application of deductive reasoning. Such a system would permit knowledge-type queries in addition to the traditional keyword-type queries.
A concern with using Prolog to perform an online search of a million-record database is that the response time would be unacceptable. To overcome this drawback, two alternatives are examined: a special-purpose hardware device and an extended Prolog capability.

Panel

Parallel Architecture in IR, p. 282
  E. Ozkarahan; C. Stanfill; G. Salton

Storage/Retrieval Techniques II

An Approach to Image Retrieval from Large Image Databases, pp. 284-295
  F. Rabitti; P. Stanchev
In this paper we address the problem of retrieving images from large image databases, given a partial description of the image content. The approach allows a limited automatic analysis of images belonging to a domain described to the system in advance, using a formalism based on fuzzy sets. Image query processing is based on special access structures generated by the image analysis process.
Data Caching in Information Retrieval Systems, pp. 296-305
  Patricia Simpson; Rafael Alonso
Information retrieval (IR) systems provide individual remote access to centrally managed data. The current proliferation of personal computer systems, together with advances in storage and communication technology, has created new possibilities for designing information systems that are easily accessible, economical, and responsive to user needs. This paper outlines methods of integrating personal computers (PCs) into large information systems, with emphasis on effective use of the storage and processing capabilities of these computers. In particular, we discuss means for caching retrieved data at PC-equipped user sites, noting that caching in this environment poses unique problems. An event-driven simulation program that models information system operation is described; it is being used to examine caching strategies, and some results of these studies are presented.
Improved Techniques for Processing Queries in Full-Text Systems, pp. 306-315
  Y. Choueka; A. S. Fraenkel; S. T. Klein; E. Segal
In static full-text retrieval systems that support metrical as well as Boolean operators, the traditional approach to query processing uses a "concordance", from which large sets of coordinates are retrieved and then merged and/or collated. Alternatively, in a system with l documents, the concordance can be replaced by a set of bit-maps of fixed length l, one constructed for every distinct word of the database, which serve as occurrence maps. We propose to combine the concordance and bit-map approaches and show how this speeds up query processing: fast ANDing and ORing of the maps in a preprocessing stage leads to large I/O savings when collating the coordinates of the keywords needed to satisfy the metrical and Boolean constraints. Moreover, the bit-maps give partial information on the distribution of the keywords' coordinates, which can be used when queries must be processed in stages because of their complexity and the sizes of the sets of coordinates involved. The new techniques have been partially implemented in the Responsa Retrieval Project.
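The following Python sketch illustrates the combined approach for a single metrical operator (two words within a given distance): per-word occurrence bit-maps over the l documents, held as Python integers, are ANDed first, and coordinates from the concordance are collated only for the surviving candidate documents. The function names, whitespace tokenization, and in-memory layout are assumptions.

```python
def build_index(docs):
    """Concordance: word -> list of (doc, position) coordinates.
    Bit-maps: word -> int whose bit d is set when the word occurs in doc d."""
    concordance, bitmaps = {}, {}
    for d, text in enumerate(docs):
        for pos, word in enumerate(text.lower().split()):
            concordance.setdefault(word, []).append((d, pos))
            bitmaps[word] = bitmaps.get(word, 0) | (1 << d)
    return concordance, bitmaps


def within(concordance, bitmaps, w1, w2, dist):
    """Documents in which w1 and w2 occur at most `dist` words apart.
    The bit-maps are ANDed first; coordinates are then collated only for
    candidate documents (a real system would skip reading the others)."""
    candidates = bitmaps.get(w1, 0) & bitmaps.get(w2, 0)
    if not candidates:
        return set()  # no document has both words: the concordance is never touched

    def in_candidates(coords):
        return [(d, p) for d, p in coords if (candidates >> d) & 1]

    positions2 = {}
    for d, p in in_candidates(concordance[w2]):
        positions2.setdefault(d, []).append(p)
    return {d for d, p in in_candidates(concordance[w1])
            if any(abs(p - q) <= dist for q in positions2.get(d, []))}
```

ORing the maps in the same way gives the candidate documents for a disjunction before any coordinates are read.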

Panel

Clustering of Concepts for Optimal Retrieval, p. 316
  P. Kantor; A. Bookstein; M. Dillon; T. Saracevic