
Proceedings of the Twelfth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Fullname: Proceedings of the Twelfth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Editors: Nicholas J. Belkin; C. J. van Rijsbergen
Location: Cambridge, Massachusetts
Dates: 1989-Jun-25 to 1989-Jun-28
Standard No: ISBN 0-89791-321-3; ACM Order Number 606890; hcibib: IR89
  1. Keynote Address
  2. Artificial Intelligence/Connectionism
  3. Interfaces
  4. Information Retrieval Theory
  5. Architectures
  6. Natural Language
  7. Information Access Methods
  8. Representation
  9. Hypermedia
  10. Panel Sessions

Keynote Address

Parallel Processing and Information Retrieval BIB 1
  David Waltz

Artificial Intelligence/Connectionism

In Search of Knowledge-Based Search Tactics BIBA 3-10
  Philip J. Smith; Steven J. Shute; Deb Galdes; Mark H. Chignell
Knowledge-based search tactics are discussed in terms of their role in the functioning of a semantically-based search system for bibliographic information retrieval. This prototype system, EP-X, actively assists users in defining or refining their topics of interest. It does so by applying search tactics to a knowledge-base describing topics in a particular domain and a database describing the contents of individual documents.
   This paper reviews the empirical studies that led to the two central concepts implemented in EP-X:
  1. Semantically-based search;
  2. Knowledge-based search tactics.
   It then describes the capabilities of a system based on such concepts.
    Adaptive Information Retrieval: Using a Connectionist Representation to Retrieve and Learn about Documents BIBA 11-20
      Richard K. Belew
    AIR represents a connectionist approach to the task of information retrieval. The system uses relevance feedback from its users to change its representation of authors, index terms and documents so that, over time, AIR improves at its task. The result is a representation of the consensual meaning of keywords and documents shared by some group of users. The central goal of this paper is to use our experience with AIR to highlight those characteristics of connectionist representations that make them particularly appropriate for IR applications. We argue that this associative representation is a natural generalization of traditional IR techniques, and that connectionist learning techniques are effective in this setting.
    A Neural Network for Probabilistic Information Retrieval BIBA 21-30
      K. L. Kwok
    This paper demonstrates how a neural network may be constructed, together with learning algorithms and modes of operation, that will provide retrieval effectiveness similar to that of the probabilistic indexing and retrieval model based on single terms as document components.

    Interfaces

    Design of a Browsing Interface for Information Retrieval BIBA 32-39
      Robert Godin; Jan Gecsei; Claude Pichet
    In conventional Boolean retrieval systems, users have difficulty controlling the amount of output obtained from a given query. This paper describes the design of a user interface which permits gradual enlargement or refinement of the user's query by browsing through a graph of term and document subsets. This graph is obtained from a lattice automatically generated from the usual document-term relation. The major design features of the proposed interface are the integration of menu, fill-in-the-blank and direct manipulation modes of interaction within the "fisheye view" [Furnas, 1986] paradigm. A prototype user interface incorporating some of these ideas has been implemented on a microcomputer.
       The resulting interface is well adapted to various kinds of users and needs. More experienced users with a particular subject in mind can directly specify a query, which results in a jump to a particular vertex in the graph. From there, the user can refine the initial query by browsing through the graph from that point on. On the other hand, casual users without any prior knowledge of the contents of the system, or users without any particular subject in mind, can freely navigate through the graph without ever specifying any query.
    A Library System for Information Retrieval Based on a Cognitive Task Analysis and Supported by an Icon-Based Interface BIB 40-47
      Annelise Mark Pejtersen
    Integrated Information Retrieval in a Knowledge Worker Support System BIBA 48-57
      Gordon McAlpine; Peter Ingwersen
    This paper describes the design of the information retrieval facilities of an integrated information system called EUROMATH. EUROMATH is an example of a Knowledge Worker Support System: it has been designed specifically to support mathematicians in their research work. EUROMATH is required to provide uniform retrieval facilities for searching in a user's personal data, in a shared database of structured documents and in public, bibliographic databases. The design of information retrieval facilities that satisfy these and other requirements posed several interesting design issues regarding the integration of various retrieval techniques. As well as a uniform query language, designed to be highly usable by the target user group, the retrieval facilities provide expert intermediary functions, i.e. sophisticated support for the retrieval of bibliographic data. This support is achieved using a model of the user, a model of the user's information need and a set of search strategies based on those used by human intermediaries. The expert intermediary facilities include extensive help facilities, automatic query reformulation and browsing of a variety of sources of query terms.

    Information Retrieval Theory

    Retrieval System Evaluation Using Recall and Precision: Problems and Answers BIB 59-68
      Vijay V. Raghavan; Peter Bollmann; Gwang S. Jung
    Optimum Polynomial Retrieval Functions BIBA 69-76
      Norbert Fuhr
    We show that any approach to develop optimum retrieval functions is based on two kinds of assumptions: first, a certain form of representation for documents and requests, and second, additional simplifying assumptions that predefine the type of the retrieval function. Then we describe an approach for the development of optimum polynomial retrieval functions: request-document pairs $(f_l, d_m)$ are mapped onto description vectors $\vec{x}(f_l, d_m)$, and a polynomial function of the form $\vec{a}^T \cdot \vec{v}(\vec{x})$ is developed such that it yields estimates of the probability of relevance $P(R \mid \vec{x}(f_l, d_m))$ with minimum square errors. We give experimental results for the application of this approach to documents with weighted indexing as well as to documents with complex representations. In contrast to other probabilistic models, our approach yields estimates of the actual probabilities, it can handle very complex representations of documents and requests, and it can be easily applied to multi-valued relevance scales. On the other hand, this approach is not suited to log-linear probabilistic models, and it needs large samples of relevance feedback data for its application.
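    The least-squares idea in the abstract above can be illustrated with a minimal sketch. It is not the author's implementation: it assumes a single scalar feature x per request-document pair and a quadratic expansion v(x) = (1, x, x^2), whereas the paper works with multi-dimensional description vectors. The names poly_features, fit_coefficients and estimate are hypothetical. The coefficient vector a is obtained by solving the normal equations, which minimizes the squared error between a^T v(x) and the relevance judgments.

```python
from typing import List, Sequence


def poly_features(x: float) -> List[float]:
    """Assumed polynomial expansion v(x) = (1, x, x^2) of one scalar feature."""
    return [1.0, x, x * x]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a * z = b by Gaussian elimination with partial pivoting.

    Assumes the matrix is square and non-singular.
    """
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Bring the row with the largest pivot candidate into position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def fit_coefficients(xs: Sequence[float], relevance: Sequence[float]) -> List[float]:
    """Return the coefficient vector a minimizing sum((a^T v(x) - y)^2)."""
    rows = [poly_features(x) for x in xs]
    k = len(rows[0])
    # Normal equations: (V^T V) a = V^T y.
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * y for r, y in zip(rows, relevance)) for i in range(k)]
    return solve_linear(gram, rhs)


def estimate(coeffs: Sequence[float], x: float) -> float:
    """Evaluate the fitted retrieval function a^T v(x)."""
    return sum(a * v for a, v in zip(coeffs, poly_features(x)))


if __name__ == "__main__":
    # Made-up feedback data: feature value and binary relevance judgment.
    xs = [0.1, 0.2, 0.5, 0.8, 0.9]
    judged = [0.0, 0.0, 1.0, 1.0, 1.0]
    a = fit_coefficients(xs, judged)
    print([round(estimate(a, x), 3) for x in xs])
```

    Because the fit is linear in the coefficients, the same routine would accept graded (multi-valued) relevance scores as targets without modification, which is one of the properties the abstract points out.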
    Towards an Information Logic BIB 77-86
      C. J. van Rijsbergen

    Architectures

    A Parallel Indexed Algorithm for Information Retrieval BIBA 88-97
      Craig Stanfill; Robert Thau; David Waltz
    In this paper we present a parallel document ranking algorithm suitable for use on databases of 1-1000 GB, resident on primary or secondary storage. The algorithm is based on inverted indexes, and has two advantages over a previously published parallel algorithm for retrieval based on signature files. First, it permits the employment of ranking strategies which cannot be easily implemented using signature files, specifically methods which depend on document-term weighting. Second, it permits the interactive searching of databases resident on secondary storage. The algorithm is evaluated via a mixture of analytic and simulation techniques, with a particular focus on how cost-effectiveness and efficiency change as the size of the database, number of processors, and cost of memory are altered. In particular, we find that if the ratio of the number of processors and/or disks to the size of the database is held constant, then the cost-effectiveness of the resulting system remains constant. Furthermore, for a given size of database, there is a number of processors which optimizes cost-effectiveness. Estimated response times are also presented. Using these methods, it appears that cost-effective interactive access to databases in the 100-1000 GB range can be achieved using current technology.
    An Optical System for Full Text Search BIBA 98-107
      Pericles A. Mitkas; P. Bruce Berra; Peter S. Guilfoyle
    In this paper we propose a full text search system based on optics. The storage and processing of the textual data are performed by an optical back-end system to an electronic computer. In this way we can take advantage of the speed and parallelism of digital optical processing. Using the proposed configuration we show how one might implement a set of text processing operations using lasers, spatial light modulators and photodetectors.
    Retrieving Highly Dynamic, Widely Distributed Information BIBA 108-115
      M. F. Wyle; H. P. Frei
    Wide area networks provide a variety of information sources which can be exploited only by appropriate information retrieval techniques such as repeated automatic query of remote databases and bulletin boards. Distinctive features of the content and access methods of information on wide area nets are discussed from an IR perspective. The development, algorithms, and analysis of a functioning system are also presented.

    Natural Language

    The Constituent Object Parser: Syntactic Structure Matching for Information Retrieval BIB 117-126
      Douglas P. Metzler; Stephanie W. Haas
    Word Sense Disambiguation Using Machine-Readable Dictionaries BIBA 127-136
      Robert Krovetz; W. Bruce Croft
    Most approaches to full-text information retrieval currently index documents based on the words they contain, and retrieve them based on the words' frequency of occurrence. This can cause many irrelevant documents to be retrieved because words are often ambiguous. We propose an approach in which documents are indexed by word senses, and in which these senses are taken from a machine-readable dictionary. We review some of the work on machine-readable dictionaries and the approaches that have been taken to word sense disambiguation. We then discuss our own approach to the problem based on the use of multiple sources of evidence. We conclude with the results of some experiments that indicate the degree to which lexical ambiguity is a factor in current systems.
    On the Application of Syntactic Methodologies in Automatic Text Analysis BIBA 137-150
      Gerard Salton; Maria Smith
    This study summarizes various linguistic approaches proposed for document analysis in information retrieval environments. Included are standard syntactic methods to generate complex content identifiers, and the use of semantic know-how obtained from machine-readable dictionaries and from specially constructed knowledge bases. A particular syntactic analysis methodology is also outlined and its usefulness for the automatic construction of book indexes is examined.

    Information Access Methods

    File Organizations and Access Methods for CLV Optical Discs BIBA 152-159
      Stavros Christodoulakis; Daniel Alexander Ford
    CLV format discs such as CD ROM and WORM form a large and important class of optical disc technology. In this paper, we examine the issues related to the implementation and performance of several different file organizations on such discs. The organizations examined are based on hashing and trees.
       The CLV recording scheme is shown to be a good environment for efficiently implementing hashing. Single seek access and storage utilization levels approaching 100% can be achieved for CD ROMs. It is shown that a B-tree organization is not a good choice for WORM discs (both CAV and CLV), but a modified ISAM approach can be appropriate for WORM discs. We describe clustered BIMs, a class of tree organizations appropriate for CD ROMs. Expressions for the expected retrieval performance of both hashing and trees are also given.
       The paper concludes by outlining recent results and future directions on buffered implementations of access methods for WORM discs, as well as advantages of signature based access methods for text retrieval in WORM disc architectures.
    Storing Text Retrieval Systems on CD-ROM: Compression and Encryption Considerations BIB 160-167
      Shmuel T. Klein; Abraham Bookstein; Scott Deerwester
    A New Approach to Text Searching BIBA 168-175
      Ricardo A. Baeza-Yates; Gaston H. Gonnet
    We introduce a family of simple and fast algorithms for solving the classical string matching problem, string matching with don't care symbols and complement symbols, and multiple patterns. In addition we solve the same problems allowing up to k mismatches. Among the features of these algorithms are that they are real time algorithms, they don't need to buffer the input, and they are suitable to be implemented in hardware.
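    The approach in this paper is commonly known as Shift-Or. As a rough illustration only, the sketch below implements the closely related Shift-And formulation for a single pattern, with an optional don't-care symbol. It is not the authors' code: the function name shift_and_search and the wildcard parameter are hypothetical, and mismatch counting and multiple patterns are left out. The matcher keeps one integer of state, updates it with a shift and a mask per input character, and never looks back at earlier input, which is why no buffering is needed.

```python
from typing import Dict, List, Optional


def shift_and_search(pattern: str, text: str, wildcard: Optional[str] = None) -> List[int]:
    """Return the start index of every occurrence of pattern in text.

    If wildcard is given, that character in the pattern matches any
    single text character (a don't-care symbol).
    """
    m = len(pattern)
    if m == 0:
        return []

    # masks[ch] has bit i set when pattern[i] == ch.
    masks: Dict[str, int] = {}
    # wild has bit i set when pattern[i] is the don't-care symbol.
    wild = 0
    for i, ch in enumerate(pattern):
        if wildcard is not None and ch == wildcard:
            wild |= 1 << i
        else:
            masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)  # bit set when the whole pattern has matched
    state = 0              # bit i set: pattern[0..i] matches text ending here
    hits: List[int] = []
    for pos, ch in enumerate(text):
        # Extend every partial match by one character and start a new one.
        state = ((state << 1) | 1) & (masks.get(ch, 0) | wild)
        if state & accept:
            hits.append(pos - m + 1)
    return hits


if __name__ == "__main__":
    print(shift_and_search("aba", "ababa"))
    print(shift_and_search("a?c", "abcaxc", wildcard="?"))
```

    Python integers are unbounded, so the sketch works for any pattern length; a hardware or C implementation would typically limit the pattern to the machine word size or use several words.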
    Multikey Access Methods Based on Term Discrimination and Signature Clustering BIBA 176-185
      Jae W. Chang; Joon H. Lee; Yoon J. Lee
    In order to improve the two-level signature file method designed by Sacks-Davis et al. [20], we propose new multikey access methods based on term discrimination and signature clustering. By term discrimination, we create separate, efficient access methods for the terms frequently used in user queries. In addition, we cluster similar signatures by means of these terms so that we may achieve good performance on retrieval. We also provide a space-time analysis of the proposed methods and compare them with the two-level signature file method. We show that the proposed methods achieve 15-30% savings in retrieval time and require 3-9% more storage overhead.

    Representation

    Indexing Medical Reports in a Multimedia Environment: The RIME Experimental Approach BIBA 187-197
      Catherine Berrut; Yves Chiaramella
    This paper focuses on the RIME system, aimed at the indexing of medical reports in a multimedia environment. This particular application is viewed as representative of a large set of still unanswered needs of large communities of users: domain experts dealing with on-line specialized documentation, such as software engineers, medical specialists and so on. In this application textual information appears as an interesting medium to access related pictures in the database. After the presentation of the application and a study of the particular corpus involved, we define a semantic model for the documents which is based on a Conceptual Language. Then we detail the indexing process and its various linguistic components, which perform the translation of every medical report according to this semantic model.
    Full Text Indexing Based on Lexical Relations. An Application: Software Libraries BIBAK 198-206
      Yoelle S. Maarek; Frank A. Smadja
    In contrast to other kinds of libraries, software libraries need to be conceptually organized. When looking for a component, the main concern of users is the functionality of the desired component; implementation details are secondary. Software reuse would be enhanced with conceptually organized large libraries of software components. In this paper, we present GURU, a tool that allows automatic building of such large software libraries from documented software components. We focus here on GURU's indexing component, which extracts conceptual attributes from natural language documentation. This indexing method is based on words' co-occurrences. It first uses EXTRACT, a co-occurrence knowledge compiler for extracting potential attributes from textual documents. Conceptually relevant collocations are then selected according to their resolving power, which scales down the noise due to context words. This fully automated indexing tool thus goes further than keyword-based tools in the understanding of a document without the brittleness of knowledge-based tools. The indexing component of GURU is fully implemented, and some results are given in the paper.
    Keywords: Automatic indexing, Software libraries, software reuse, Lexical relations, Natural language processing, Co-occurrence knowledge
    How a Personal Document's Intended Use or Purpose Affects its Classification in an Office BIBA 207-210
      Barbara H. Kwasnik
    This paper reports on one of the findings of a larger case study that attempts to describe how people organize documents in their own offices. In that study, several dimensions along which people make classificatory decisions were identified. Of these, the use to which a document is put emerged as a strong determiner of that document's classification. The method of analysis is reviewed, and examples of different kinds of uses are presented, demonstrating that it is possible to describe a wide variety of specific instances using a closed set of descriptors. The suggestion is made that, in designing systems for organizing materials, it might be advantageous to incorporate information about contextual variables, such as use, since these seem to be particularly important in classification decisions made within personal environments.

    Hypermedia

    Information Retrieval Using a Hypertext-Based Help System BIBAK 212-220
      F. R. Campagnoni; Kate Ehrlich
    Hypertext offers users a simple, flexible way to navigate through electronic information systems but at the potential risk of becoming lost in the network of interconnected pieces of information. A study was conducted on information retrieval using a commercial hypertext-based help system. It was found that the predominant search strategy was "browsing" (characterized by scanning tables of contents and paging through topics), rather than employing the indexes ("analytical search"). Although subjects did not become lost, individuals with better spatial visualization ability, as measured by a standardized test, were faster at retrieving information and returned to the top of the information hierarchy less often than those with poorer spatial visualization ability. These results support previous studies that have found a strong preference by users to browse in hypertext systems and extend those findings to a new domain (help), a different type of user interface, and a different information architecture. In addition, the results demonstrate the importance of spatial visualization ability for efficient navigation and information retrieval in a hierarchical hypertext system.
    Keywords: Hypertext, Help systems, Information retrieval, Individual differences, Visualization
    A Hypertext Knowledge Base for Primary Care -- Limeds in Lincks BIBA 221-228
      Toomas Timpka; Lin Padgham; Per Hedblom; Stefan Wallin; Gosta Tibblin
    In organized health care, primary care is the first level. It is characterized by the wide span of health problems managed as well as remote location from traditional medical information and knowledge sources. The LIMEDS project has formulated the special requirements for integrated knowledge and data base management in primary care. This paper presents Gosta's book, a hypertext knowledge base implemented in LINCKS, an object oriented, networked database system. Firstly, aspects which make integrated hypermedia systems particularly suitable for application in primary health care are explored. We then describe the hypertext knowledge base, consisting of 500 basic text objects and 3000 links, and current implementations using the NODE data model. NODE is implemented on a SUN III fileserver, and the user interface for the hypertext context on Apple Macintosh. Combination of design methods towards a parallel means-ends strategy was found to be necessary to achieve Gosta's book. Design groups need to be composed of computer science, medical, psychological and organizational competences.
    Settings and the Settings Structure: The Description and Automated Propagation of Networks for Perusing Videodisc Image States BIBA 229-238
      Alan P. Parkes
    This paper describes a system for formally representing spatial relationships between videodisc image states called settings. A number of setting relations are defined, these being based on the manipulations of the camera typically used in the production of the moving film: zooming in or out, panning etc. An algorithm is presented which, given a limited level of initial specification by a describer, will constrain, where possible, the setting relations holding between all pairs of settings. The resulting network is called the settings structure. The paper begins by placing the settings structure into the context of its being one part of the CLORIS system.

    Panel Sessions

    The Lexicon and Information Retrieval BIB 240-241
      Robert Krovetz; Ed Fox; Robert J. P. Ingria; Henry Kucera
    Research Toward the Development of a Lexical Knowledge Base for Natural Language Processing BIBA 242-249
      Robert A. Amsler
    This paper documents research toward building a complete lexicon containing all the words found in general newspaper text. It is intended to provide the reader with an understanding of the inherent limitations of existing vocabulary collection methods and the need for greater attention to multi-word phrases as the building blocks of text. Additionally, while traditional reference books define many proper nouns, they appear to be very limited in their coverage of the new proper nouns appearing daily in newspapers. Proper nouns appear to require a grammar and lexicon of components much the way general parsing of text requires syntactic rules and a lexicon of common nouns.
    Present and Future of Electronic Databases BIB 250
      Gerard Salton; Martha Williams; David Penniman; John Regazzi; Tadeusz Radecki
    Information Retrieval and Software Reuse BIBA 251-256
      W. B. Frakes; N. Belkin; R. Prieto-Diaz; S. Wartik
    Software reuse is widely believed to be the most promising technology for improving software quality and productivity. There are many technical and non-technical problems to be solved, however, before widespread reuse of software lifecycle objects becomes a reality. One class of problem concerns the classification, storage, and retrieval of reusable components. Panel members will discuss these problems and some approaches to solving them.