Proceedings of the Eleventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Fullname:Proceedings of the Eleventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Editors:Yves Chiaramella
Location:Grenoble, France
Dates:1988-Jun-13 to 1988-Jun-15
Publisher:Presses Universitaires de Grenoble, BP 47X, 38040 Grenoble Cedex, FRANCE
Standard No:ISBN 2-7061-0309-4; ACM Order Number 606880; hcibib: IR88
  1. Special Session: SIGIR Award
  2. Natural Language Processing (1)
  3. Natural Language Processing (2)
  4. Natural Language Processing (3)
  5. Cognitive Models
  6. Parallel Distributed Processes
  7. Applications (1)
  8. Quantitative Models (1)
  9. Quantitative Models (2)
  10. Thesaural Models
  11. Applications (2)
  12. Interfaces (1)
  13. Interfaces (2)
  14. Data Bases
  15. Artificial Intelligence (1)
  16. Logical Models
  17. Artificial Intelligence (2)
  18. Set Oriented Models
  19. Implementation Techniques
  20. Applications (3)

Special Session: SIGIR Award

A Look Back and A Look Forward 13-29
  Karen Sparck Jones
This paper is in two parts, following the suggestion that I first comment on my own past experiences in information retrieval, and then present my views on the present and future.

Natural Language Processing (1)

Experiments on Incorporating Syntactic Processing of User Queries into a Document Retrieval Strategy 31-51
  A. F. Smeaton; C. J. van Rijsbergen
Traditional information retrieval has relied on the extensive use of statistical parameters in the implementation of retrieval strategies. This paper sets out to investigate whether linguistic processes can be used as part of a document retrieval strategy. This is done by predefining a level of syntactic analysis of user queries only, to be used as part of the retrieval process. A large series of experiments on an experimental test collection is reported, using a parser for noun phrases as part of the retrieval strategy. The results do yield improvements in retrieval effectiveness and, given the crude linguistic process used and the fact that it was applied to queries and not to document texts, suggest that the approach of using linguistic processing in retrieval is valid.
The Use of Anaphoric Resolution for Document Description in Information Retrieval 53-66
  Susan Bonzi; Elizabeth Liddy
This study investigated two hypotheses concerning the use of anaphors in information retrieval. The first hypothesis, that anaphors tend to refer to integral concepts rather than to peripheral concepts, was well supported. Two samples of documents, one in psychology and the other in computer science, were examined by subject experts who judged the centrality of phrases which were referred to anaphorically. The second hypothesis, that various term weighting schemes are affected differently by anaphoric resolution, was also well supported. It was found that schemes which incorporate document length into the calculations produce much smaller increases in term weights for terms occurring in anaphoric resolutions than do those which do not consider document length. It is concluded that although anaphoric resolution has potential for better representing the "aboutness" of a document, care must be taken in choosing both the anaphoric classes to be resolved and the term weighting schemes to be used in measuring a document's topicality.

Natural Language Processing (2)

A French Text Recognition Model for Information Retrieval System 67-84
  Georges Antoniadis; Genevieve Lallich-Boidin; Yolla Polity; Jacques Rouault
This information retrieval system is based on a linguistic model for the recognition of written French. The model has to limit the production of multiple solutions. The indexing and retrieval modules are both based on that model. The paper describes the model, its automation, and its exploitation by the two modules.
Natural Language Techniques for Intelligent Information Retrieval 85-99
  Paul S. Jacobs; Lisa F. Rau
Neither natural language processing nor information retrieval is any longer a young field, but the two areas have yet to achieve a graceful interaction. Mainly, the reason for this incompatibility is that information retrieval technology depends upon relatively simple but robust methods, while natural language processing involves complex knowledge-based systems that have never approached robustness. We provide an analysis of areas in which natural language and information retrieval come together, and describe a system that joins the two fields by combining technology, choice of application area, and knowledge acquisition techniques.

Natural Language Processing (3)

Correction of Phonographic Errors in Natural Language Interfaces 101-115
  Jean Veronis
In this paper, we point out that, in applications available to the general public, and/or natural language interfaces, the correction of phonographic errors (which are competence errors) is far more important than the correction of typographical errors (which are simply performance errors). Many studies aimed at the correction of typographical errors have been carried out, but relatively few tackle the problem of phonographic correction, and they are generally based on more or less ad hoc methods. We propose a mathematical framework for phonographic correction by defining a similarity relation between phonetically related substrings and a dissimilarity index between strings. We also provide a simple and efficient algorithm for recognizing words in dictionaries from misspelt inputs including both typographical and phonographic errors.
Precedental Data Bases: How and Why They are Worked Out and Used 117-125
  B. Pevzner
The concept of a "precedental data base" is introduced. It is a linguistic data base consisting of a dictionary of lexical patterns (clichés) and a dictionary of discourses. Some algorithms for textual information processing using precedental data bases are discussed in detail.
   These systems are installed on mainframe and minicomputers for test runs.

Cognitive Models

How Do the Experts Do It? The Use of Ethnographic Methods as an Aid to Understanding the Cognitive Processing and Retrieval of Large Bodies of Text 127-133
  Donald Owen Case
This paper explores an important problem in information retrieval: that of rapidly increasing amounts of full-text storage that is difficult to file and retrieve effectively. The author suggests that a possible avenue for improving full-text retrieval would include in-depth studies of the ways in which individual users cope with large amounts of written information, stored chiefly on paper in their offices. Relevant literature in cognitive psychology is reviewed and some recent and continuing studies are described that have used anthropological methods to approach this problem. It is argued that historians are a good group to study, due to their reliance on the examination and processing of texts, and the broad scope of their inquiries. Examinations of the ways in which this one group of information workers categorizes documents could lead us to a better understanding of human problems in processing and retrieving textual information.
On the Nature and Function of Explanation in Intelligent Information Retrieval 135-145
  N. J. Belkin
We discuss the complexity of explanation activity in human goal-directed dialogue, and suggest that this complexity ought to be taken into account in the design of explanation in human-computer interaction. We propose a general model of clarity in human-computer systems, of which explanation is one component. On the basis of this model, of a model of human-intermediary interaction in the document retrieval situation as one of cooperative model-building for the purpose of developing an appropriate search formulation, and of the results of empirical observation of human user-human intermediary interaction in information systems, we propose a model for explanation by the computer intermediary in information retrieval.

Parallel Distributed Processes

On the Use of Spreading Activation Methods in Automatic Information Retrieval 147-160
  Gerard Salton; Chris Buckley
Spreading activation methods have been recommended in information retrieval to expand the search vocabulary and to complement the retrieved document sets. The spreading activation strategy is reminiscent of earlier associative indexing and retrieval systems. Some spreading activation procedures are briefly described, and evaluation output is given, reflecting the effectiveness of one of the proposed procedures.
Knowledge Representation, Connectionism, and Conceptual Retrieval 161-174
  Ronald J. Brachman; Deborah L. McGuinness
Knowledge Representation (KR) systems provide support for Artificial Intelligence systems that reason about relationships between objects in their domains of expertise. Because of their support for inference, KR systems appear to have potential to enrich the kind of retrievals that IR systems might make. Ironically, however, the most useful KR systems are limited to reasoning based on a rigid notion of validity, and thus are awkward to use when relevant but inexact retrievals are desired. We have been exploring the potential of a "connectionist model" -- the Boltzmann Machine -- to overcome this limitation. We report on a number of experiments in which we use a connectionist simulator to support similarity-based reasoning in a frame representation. We draw some tentative, mixed conclusions on the potential for a union of KR, IR, and connectionism.

Applications (1)

BABEL: A Base for an Experimental Library 175-190
  Hassan Ait-Kaci; Roger Nasr; Jungyun Seo
This report discusses the implementation of a knowledge base for a library information system. It is done using a typed logic programming language -- LOGIN -- where type inheritance is built in. The knowledge base is structured in a hierarchical taxonomy of library object classes where each class is represented in a FRAME style knowledge structure and inherits the properties of its parents, and where infrastructural inference rules have been established through typed Horn clauses. Also in this document, some programming techniques aimed at using the power of inheritance as taxonomic inference are discussed.
ALLOY: An Amalgamation of Expert, Linguistic and Statistical Indexing Methods 191-199
  Leslie P. Jones; Cary deBessonet; Sukhmay Kundu
In this paper we report progress on development of ALLOY, a system that simplifies automatic document indexing and retrieval by combining techniques from several different approaches: expert, linguistic and statistical. The system is being designed to allow a panel of experts to create an ALLOY system for a given field by providing the necessary input that ALLOY needs to automatically index documents and to set up a convenient user interface. The input provided by the experts includes a hierarchy of concepts and an expert dictionary. The amount of information that the panel must provide for a given field is considerably less than the amount required to build a complete thesaurus or knowledge base about that field.

Quantitative Models (1)

Two Learning Schemes in Information Retrieval 201-218
  Clement T. Yu; Hirotaka Mizuno
Two methods are given to improve weighting schemes by using relevance information of a set of queries. The first method is to estimate parameter values of two independence models in information retrieval -- the binary independence model and the non-binary independence model. The parameters estimated here are used to calculate optimal weights for terms in a different set of queries. Performance of this estimation is compared to the inverse document frequency method, the cosine measure, and the statistical similarity measure. The second method is to learn optimal weights of the non-binary independence model adaptively by a learning formula. Experiments are performed on three different document collections CISI, MEDLARS, and CRN4NUL for both methods, and results are reported. Both methods show improvements compared to the existing weighting schemes. Experimental results show that the second method gives slightly better performance than the first one, and has simpler implementation.
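   For orientation, two of the baselines named in this abstract, inverse document frequency weighting and the cosine measure, can be sketched as follows. This is a minimal illustration of the textbook definitions, not the weighting schemes evaluated in the paper; the toy documents, the query, and the function names are assumptions made up for the example.

```python
import math
from collections import Counter

def idf_weights(docs):
    """Return idf(t) = log(N / df(t)) for every term in the collection."""
    n_docs = len(docs)
    # df(t): number of documents in which term t occurs at least once
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}

def weighted_vector(tokens, idf):
    """tf * idf vector for one tokenized text; unknown terms get weight 0."""
    tf = Counter(tokens)
    return {term: tf[term] * idf.get(term, 0.0) for term in tf}

def cosine(u, v):
    """Cosine of the angle between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical three-document collection and query.
docs = [
    ["probabilistic", "retrieval", "model"],
    ["boolean", "retrieval", "system"],
    ["probabilistic", "term", "weighting"],
]
idf = idf_weights(docs)
q_vec = weighted_vector(["probabilistic", "model"], idf)

# Document indices ordered by decreasing cosine similarity to the query.
ranking = sorted(
    range(len(docs)),
    key=lambda i: cosine(q_vec, weighted_vector(docs[i], idf)),
    reverse=True,
)
```

   Here `ranking` places the document sharing both query terms first and the document sharing none last; rare terms such as "model" receive a higher idf than terms that occur in several documents.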
Linear Structure in Information Retrieval 219-232
  S. K. M. Wong; Y. Y. Yao; P. Bollmann
Based on the concept of user preference, we investigate the linear structure in information retrieval. We also discuss a practical procedure to determine the linear decision function and present an analysis of term weighting. Our experimental results seem to demonstrate that our model provides a useful framework for the design of an adaptive system.
Information Retrieval using Impression of Documents as a Clue 233-244
  Fusako Hirabayashi; Hiroshi Matoba; Yutaka Kasahara
Proposed here is an internal representation and mapping method for multimedia information in which retrieval is based on the impression documents are desired to make. A user interface design for a system using this method is also proposed.
   The proposed internal representation and mapping method represents each desired document impression as an axis in a semantic space. Documents are represented as points in the space. Queries are represented as subspaces. The proposed user interface design employs a method of visual presentation of the semantic space.
   For evaluation purposes, a prototype system has been developed. An image retrieval experiment shows that the proposed internal representation and mapping method and the user interface design provide effective tools for information retrieval.

Quantitative Models (2)

A Utility-Theoretic Analysis of Expected Search Length 245-256
  Peter Bollmann; Vijay V. Raghavan
In this paper the expected search length, which is a measure of retrieval system performance, is investigated from the viewpoint of axiomatic utility theory. Necessary and sufficient criteria for the expected search length to be an ordinal scale and sufficient criteria that it is a ratio scale are given.
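   As background, the expected search length itself can be computed as in the sketch below. It assumes the usual definition for a weakly ordered output: the number of non-relevant documents in the levels read completely, plus i*s/(r+1) for the final level, where that level holds r relevant and i non-relevant documents and s relevant documents are still needed. The abstract does not restate the formula, so the definition, the function name, and the sample data are assumptions of this example.

```python
def expected_search_length(levels, wanted):
    """Expected number of non-relevant documents examined.

    levels: list of (relevant, nonrelevant) counts per rank level, best first;
            documents within one level are assumed to be randomly ordered.
    wanted: number of relevant documents the user needs.
    Returns None if the output holds fewer than `wanted` relevant documents.
    """
    skipped = 0  # non-relevant documents in levels that are read completely
    found = 0    # relevant documents found in those levels
    for relevant, nonrelevant in levels:
        if found + relevant >= wanted:
            still_needed = wanted - found
            # Expected non-relevant documents seen inside the final level.
            return skipped + nonrelevant * still_needed / (relevant + 1)
        found += relevant
        skipped += nonrelevant
    return None

# Hypothetical output: level 1 has 1 relevant and 2 non-relevant documents,
# level 2 has 2 relevant and 3 non-relevant documents.
levels = [(1, 2), (2, 3)]
esl_two = expected_search_length(levels, wanted=2)
```

   With two relevant documents wanted, all of level 1 is read (2 non-relevant documents) and one more relevant document is needed from level 2, giving 2 + 3*1/3 = 3.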
Optimum Probability Estimation Based on Expectations 257-273
  Norbert Fuhr; Hubert Huther
Probability estimation is important for the application of probabilistic models as well as for any evaluation in IR. We discuss the interdependencies between parameter estimation and other properties of probabilistic models. Then we define an optimum estimate which can be applied to various typical estimation problems in IR. A method for the computation of this estimate is described which uses expectations from empirical distributions. Some experiments show the applicability of our method, whereas comparable approaches are partially based on false assumptions or yield estimates with systematic errors.

Thesaural Models

Concept Based Retrieval in Classical IR Systems 275-289
  H. P. Giger
This paper describes some aspects of a project with the aim of developing a user-friendly interface to a classical Information Retrieval (IR) System in order to improve the effectiveness of retrieval. The character by character approach to IR has been abandoned in favor of an approach based on the meaning of both the queries and the texts containing the information to be sought. The concept space, locally derived from a thesaurus, is used to represent a query as well as documents retrieved in atomic concept units. Dependencies between the search terms are taken into account. The meanings of the query and the retrieved documents (results of Elementary Logical Conjuncts (ELCs)) are compared. The ranking method on the semantical level is used in connection with existing data of a classical IR system. The user enters queries without using complex Boolean expressions.
Coefficients for Combining Concept Classes in a Collection 291-307
  Edward A. Fox; Gary L. Nunn; Whay C. Lee
This report considers combining information to improve retrieval. The vector space model has been extended so different classes of data are associated with distinct concept types and their respective subvectors. Two collections with multiple concept types are described, ISI-1460 and CACM-3204. Experiments indicate that regression methods can help predict relevance, given query-document similarity values for each concept type. After sampling and transformation of data, the coefficient of determination for the best model was .48 (.66) for ISI (CACM). Average precision for the two collections was 11% (31%) better for probabilistic feedback with all types versus with terms only. These findings may be of particular interest to designers of document retrieval or hypertext systems since the role of links is shown to be especially beneficial.
A Cluster-Based Approach to Thesaurus Construction 309-320
  Carolyn J. Crouch
The importance of a thesaurus in the successful operation of an information retrieval system is well recognized. Yet techniques which support the automatic generation of thesauri remain largely undiscovered. This paper describes one approach to the automatic generation of global thesauri, based on the discrimination value model of Salton, Yang, and Yu and on an appropriate clustering algorithm. This method has been implemented and applied to two document collections. Preliminary results indicate that this method, which produces improvements in retrieval performance in excess of 10 and 15 percent in the test collections, is viable and worthy of continued investigation.

Applications (2)

Towards Interactive Query Expansion 321-331
  Donna Harman
In an era of online retrieval, it is appropriate to offer guidance to users wishing to improve their initial queries. One form of such guidance could be short lists of suggested terms gathered from feedback, nearest neighbors, and term variants of original query terms. To verify this approach, a series of experiments was run using the Cranfield test collection to discover techniques to select terms for these lists that would be effective for further retrieval. The results show that significant improvement can be expected from this approach to query expansion.
The Automatic Indexing System AIR/PHYS -- From Research to Application 333-342
  Peter Biebricher; Norbert Fuhr; Gerhard Lustig; Michael Schwantner; Gerhard Knorz
Since October 1985, the automatic indexing system AIR/PHYS has been used in the input production of the physics data base of the Fachinformationszentrum Karlsruhe/West Germany. The texts to be indexed are abstracts written in English. The system of descriptors is prescribed. For the application of the AIR/PHYS system, a large-scale dictionary containing more than 600,000 word-descriptor and phrase-descriptor relations has been developed. Most of these relations have been obtained by means of statistical and heuristic methods. As a consequence, the relation system is rather imperfect, so the indexing system needs some fault-tolerant features. An appropriate indexing approach and the corresponding structure of the AIR/PHYS system are described. Finally, the conditions of the application as well as problems of further development are discussed.

Interfaces (1)

Retrieval Based on User Behaviour 343-357
  A. J. Kok; A. M. Botman
This paper gives an overview of the ongoing research in the Active Data Bases project at the Vrije Universiteit, Amsterdam. In this project we are specifying and building a system that helps a user in his search for useful and interesting information in large, complex information systems. The system is able to do this, because it learns from the interaction about the users and the data it contains. The indications of the users are expressed in terms of interests in the data, which serve as building blocks for user and data models. These models are then used to improve the search for interesting data.
Query Processing in a Heterogeneous Retrieval Network 359-370
  Patricia Simpson
The concept of a large-scale information retrieval network incorporating heterogeneous retrieval systems and users is introduced, and the necessary components for enabling term-based searching of any database by untrained end-users are outlined. We define a normal form for expression of queries, show that such queries can be automatically produced, if necessary, from a natural language request for information, and give algorithms for translating such queries, with little or no loss of expressiveness, into equivalent queries on both Boolean and term-vector type retrieval systems. We conclude with a proposal for extending this approach to arbitrary database models.
Some Measures and Procedures for Evaluation of the User Interface in an Information Retrieval System 371-385
  Jean Tague; Ryan Schultz
Planning the evaluation of an information retrieval system involves two steps: first, a determination of performance descriptors and measures appropriate to the system objectives and, secondly, a development of an evaluation design which ensures the effect of variation in components of interest will be isolated and assessed in an unbiased fashion. This paper examines the question of retrieval system evaluation from the perspective of the user. It presents evaluation procedures which are appropriate to this perspective and which can be used to isolate the effect of variation in the user interface to the system. The general procedure is exemplified by an application to evaluation of an experimental OPAC interface.

Interfaces (2)

IR-NLI II: Applying Man-Machine Interaction and Artificial Intelligence Concepts to Information Retrieval 387-399
  Giorgio Brajnik; Giovanni Guida; Carlo Tasso
This paper addresses the problem of building expert interfaces to information retrieval systems. In particular, the problem of augmenting the capabilities of such interfaces with user modeling features is discussed and the main benefits of this approach are outlined. The paper presents a prototype system called IR-NLI II, which is devoted to modeling the human intermediary to information retrieval systems by means of artificial intelligence techniques. The overall organization of the IR-NLI II system is presented, together with a short description of the two main modules implemented so far, namely the Information Retrieval Expert Subsystem and the User Modeling Subsystem. An example of interaction with IR-NLI II is described. Finally, perspectives and future research directions are outlined.
Intelligent Support for Interface Systems 401-415
  F. N. Teskey
This paper describes how a language for building interfaces to information systems, that is being developed by the Office of Research at OCLC, can be linked to an artificial intelligence environment, Poplog. A demonstration system, showing how Poplog could provide some intelligent support for a D interface, has been developed. It is suggested that this could form the basis for intelligent support for interface systems.

Data Bases

A Parallel Multiprocessor Machine Dedicated to Relational and Deductive Databases 417-431
  R. Gonzalez-Rubio; M. Couprie
Efficiency in databases is a major requirement. This paper presents some solutions to cope with this problem. One solution is to execute operations in parallel: this is done in the "Delta Driven Computer" DDC, which is a multiprocessor machine with distributed memory dedicated to relational and deductive databases. In DDC, relations are distributed among the nodes of the machine, and the data are processed asynchronously in each node. To do that in an efficient way, a coprocessor, specialized for relational operations, is also proposed. It is called µSyC, for "microprogrammable Symbolic Coprocessor". This paper is divided into two parts. The first part describes DDC, presenting the architecture, the languages, and an original computational model. The second part describes µSyC, its architecture, instruction set and the data structures used at the µSyC level.
Flexible Selection among Objects: A Framework Based on Fuzzy Sets 433-449
  P. Bosc; M. Galibourg
Up to now, most retrieval systems have been founded on a Boolean selection mechanism. This mechanism appears not to be powerful enough for some applications, especially when the size (number) of the results must be controlled. In that case, some kind of flexibility is needed in query expression. In this paper, we suggest the use of an approach based on fuzzy sets. The basic principles of this approach are presented and compared to more conventional solutions providing only limited extensions. Moreover, the implementation aspects of our approach are discussed to show that reasonable performance can be expected.
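   The general idea of flexible selection can be illustrated with standard fuzzy-set operators: each condition returns a degree of membership in [0, 1], conjunction takes the minimum, disjunction the maximum, and the size of the result is controlled by keeping the k best-satisfying objects. The sketch below is an assumption-laden illustration, not the authors' framework; the membership function, the hotel data, and all names are hypothetical.

```python
def at_most(value, full, zero):
    """Decreasing ramp membership: 1.0 up to `full`, 0.0 from `zero` on."""
    if value <= full:
        return 1.0
    if value >= zero:
        return 0.0
    return (zero - value) / (zero - full)

def fuzzy_and(*degrees):
    """Conjunction of membership degrees (minimum operator)."""
    return min(degrees)

def fuzzy_or(*degrees):
    """Disjunction of membership degrees (maximum operator)."""
    return max(degrees)

def select(objects, predicate, k):
    """Return up to k (degree, object) pairs with degree > 0, best first."""
    scored = [(predicate(obj), obj) for obj in objects]
    matches = [pair for pair in scored if pair[0] > 0.0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return matches[:k]

# Hypothetical objects to select from.
hotels = [
    {"name": "A", "price": 90, "distance_km": 4.0},
    {"name": "B", "price": 150, "distance_km": 0.5},
    {"name": "C", "price": 120, "distance_km": 1.0},
    {"name": "D", "price": 250, "distance_km": 0.2},
]

def cheap_and_near(hotel):
    """Fuzzy query: price is 'cheap' AND distance is 'near'."""
    cheap = at_most(hotel["price"], full=100.0, zero=200.0)
    near = at_most(hotel["distance_km"], full=0.5, zero=3.0)
    return fuzzy_and(cheap, near)

best = [hotel["name"] for _, hotel in select(hotels, cheap_and_near, k=2)]
```

   A crisp Boolean query such as "price <= 100 AND distance <= 0.5" would return nothing on this data, whereas the fuzzy version ranks the partial matches and lets the caller bound the result size through `k`.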
The Document Management Component of a Multimedia Data Model 451-464
  Christophe Damier; Bruno Defude
We describe ESTRELLA, a multimedia object-oriented data model developed by MATRA. This model is based upon objects, classes (organized in a lattice) and functions (which allow operations on data and new data types to be implemented dynamically). The valid states of the data base are described by a set of integrity constraints. We propose a document model capable of managing structured documents and of indexing them with a superimposed codes method. We also present the associated data manipulation language, with a navigational interface and content search operators.

Artificial Intelligence (1)

Information Retrieval using a Singular Value Decomposition Model of Latent Semantic Structure 465-480
  George W. Furnas; Scott Deerwester; Susan T. Dumais; Thomas K. Landauer; Richard A. Harshman; Lynn A. Streeter; Karen E. Lochbaum
In a new method for automatic indexing and retrieval, implicit higher-order structure in the association of terms with documents is modeled to improve estimates of term-document association, and therefore the detection of relevant documents on the basis of terms found in queries. Singular-value decomposition is used to decompose a large term by document matrix into 50 to 150 orthogonal factors from which the original matrix can be approximated by linear combination; both documents and terms are represented as vectors in a 50- to 150-dimensional space. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents are ordered by their similarity to the query. Initial tests find this automatic method very promising.
Retrieving Documents by Plausible Inference: A Preliminary Study 481-494
  W. B. Croft; T. J. Lucia; P. R. Cohen
Choosing an appropriate document representation and search strategy for document retrieval has been largely guided by achieving good average performance instead of optimizing the results for each individual query. A model of retrieval based on plausible inference gives us a different perspective and suggests that techniques should be found for combining multiple sources of evidence (or search strategies) into an overall assessment of a document's relevance, rather than attempting to pick a single strategy. In this paper, we explain our approach to plausible inference for retrieval and describe some preliminary experiments designed to test this approach. The experiments use a spreading activation search to implement the plausible inference process. The results show that significant effectiveness improvements are possible using this approach.

Logical Models

An Outline of a General Model for Information Retrieval Systems 495-506
  Jianyun Nie
This paper is a contribution to the construction of a general model for information retrieval. As in the paper of van Rijsbergen ([RIJ86]), the implicit basis of all information retrieval systems is considered to be logical implication. The measure of correspondence between a document and a query is transformed into an estimate of the strength (or certainty) of logical implication. Modal logic is shown to be suitable for representing the behavior of information retrieval systems. In existing information retrieval models, several aspects are often mixed. Part of this paper is devoted to separating these aspects to give a clearer view of information retrieval systems. This general model is also compared with some existing models to show its generality.
French Textual Information Systems: The Contribution of Extensional and Intensional Logics 507-518
  Sylvie Laine; Omar Larouk; Isabelle Vidalenc
Within its research activities in information retrieval system design, the SYDOL group defines new methods for the automatic analysis of French textual corpora and for computer-assisted information searching. A text index is set up, as a relational data base, by describing the syntactic connections between Noun Phrases within the discourse. This paper presents the basis and theoretical features of the SYDOL group's activities. In particular, we show here how the simultaneous use of intensional and extensional logics is essential to Natural Language Processing systems.
   Within this context, extensional logic is opposed to intensional logic because of the existence of a referential universe to which the manipulated objects belong. After defining the concepts related to the Noun Phrase, we show how the opposition between the two logics can be illustrated in the opposition of language and speech (in Saussure's sense of language versus speech), and in information retrieval system design. Different cases are presented where the intensional meets the extensional: analysis of the Noun Phrase, processing of coordination, and resolution of anaphora.
An Information Structure Dealing with Term Dependence and Polysemy 519-533
  Peter Schauble
An information structure (IS) that is regarded as a formal description of a domain of discourse is proposed. This IS is aimed at increasing the effectiveness of an information retrieval system. It is shown how the retrieval algorithm can take into account the term dependencies that are provided by the IS. Moreover, these term dependencies can be used by an automatic indexing procedure in order to interpret polysemic terms. The theoretical framework of our IS has some favorable properties. As a consequence, the construction and maintenance of such an IS is simpler than that of a thesaurus.

Artificial Intelligence (2)

Planning in an Expert System for Automated Information Retrieval 535-550
  Christine Barthes; Pierre Glize
Searching online databases requires an information retrieval strategy formalized in the EURISKO expert system. This search strategy is based on different kinds of planning: at the highest level, a plan orders linear and hierarchical planning for request interrogation and dynamic planning for request modification.
   The recent development of the system has made it possible to offer some new judgements on this approach.
Conceptual Representation for Knowledge Bases and "Intelligent" Information Retrieval Systems 551-565
  Gian Piero Zarri
This paper describes the "conceptual" Knowledge Representation Language (KRL) proper to an environment for the construction and use of large Knowledge Bases and/or "Intelligent" Information Retrieval Systems. In the KRL, we separate the treatment of the episodic memory (extensional, assertional data = "Snoopy is Charlie Brown's beagle") from the treatment of the semantic memory (intensional, terminological data = "A beagle is a sort of hound / a hound is a dog ..."). A compromise between an "object-oriented approach" and a "logic-oriented approach" is proposed for implementation purposes.

Set Oriented Models

Rough Sets and Information Retrieval 567-581
  Padmini Das-Gupta
The theory of rough sets was introduced by Pawlak [PAWLAK82]. It allows us to classify objects into sets of equivalent members based on their attributes. We may then examine any combination of the same objects (or even their attributes) using the resultant classification. The theory has direct applications in the design and evaluation of classification schemes and the selection of discriminating attributes. Pawlak's papers discuss its application in the domain of medical diagnostic systems. Here we apply it to the design of information retrieval systems accessing collections of documents. Advantages offered by the theory are: the implicit inclusion of boolean logic; term weighting; and the ability to rank retrieved documents. In the first section we describe the theory. This is derived from the work by [PAWLAK84,PAWLAK82] and includes only the most relevant aspects of the theory. In the second, we apply it to information retrieval. Specifically, we design the approximation space and search strategies, and illustrate the application of relevance feedback to improve document indexing. Following this, in section three, we compare the rough set formalism to the boolean, or and fuzzy models of information retrieval. Finally we present a small scale evaluation of rough sets which indicates its potential in information retrieval.
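   The core construction of rough set theory mentioned here, classifying objects into equivalence classes by their attributes and then approximating an arbitrary set from below and above, can be sketched as follows. This shows only the standard lower and upper approximations; the approximation space and search strategies designed in the paper are not reproduced, and the documents, index terms, and function names are hypothetical.

```python
from collections import defaultdict

def equivalence_classes(objects):
    """Group object ids whose attribute values are identical."""
    classes = defaultdict(set)
    for obj_id, attrs in objects.items():
        classes[tuple(sorted(attrs.items()))].add(obj_id)
    return list(classes.values())

def approximations(classes, target):
    """Return the (lower, upper) approximations of `target`.

    lower: union of the classes lying entirely inside the target set.
    upper: union of the classes sharing at least one member with it.
    """
    lower, upper = set(), set()
    for cls in classes:
        if cls <= target:
            lower |= cls
        if cls & target:
            upper |= cls
    return lower, upper

# Hypothetical documents described by the presence (1) or absence (0)
# of two index terms.
documents = {
    "d1": {"retrieval": 1, "fuzzy": 0},
    "d2": {"retrieval": 1, "fuzzy": 0},
    "d3": {"retrieval": 1, "fuzzy": 1},
    "d4": {"retrieval": 0, "fuzzy": 1},
}
relevant = {"d1", "d3"}  # a set to be approximated, e.g. user-judged documents

classes = equivalence_classes(documents)
lower, upper = approximations(classes, relevant)
```

   Because d1 and d2 are indistinguishable under these attributes, d1 cannot be placed in the lower approximation; it appears only in the upper approximation, together with d2.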
Set Oriented Retrieval BIBA 583-596
  A. Bookstein
The broad way in which we look at how an IRS functions influences the types of questions we ask about it and the ways we try to improve performance. In the recent past, retrieval methodologies have been based on retrieving documents one at a time. In this paper we introduce a set oriented view. We observe that this view is quite consistent with the single-document or sequential methods, and define a precise model to capture the set-oriented approach. We then examine a number of consequences of the model, such as the limitations implied by a finite index vocabulary. Finally, we discuss various ways in which the set orientation can influence our thinking about IR.

Implementation Techniques

Compression of Concordances in Full-Text Retrieval Systems BIBA 597-612
  Yaacov Choueka; Aviezri S. Fraenkel; Shmuel T. Klein
The concordance of a full-text information retrieval system contains, for every different word W of the data base, a list L(W) of "coordinates", each of which describes the exact location of an occurrence of W in the text. The concordance should be compressed, not only for the savings in storage space, but also in order to reduce the number of I/O operations, since the file is usually kept in secondary memory. Several methods are presented which efficiently compress concordances of large full-text retrieval systems. The methods were tested on the concordance of the Responsa Retrieval Project and yield savings of up to 49% relative to the non-compressed file; this is a relative improvement of about 27% over the currently used prefix-omission compression technique.
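
The baseline mentioned in the abstract above, prefix omission, can be sketched as follows. This is a minimal illustration only, assuming each coordinate is a tuple of hierarchical fields (for example document, sentence, word position) and that the list L(W) is sorted; the field layout and the function names are hypothetical, and the sketch does not reproduce the new methods the paper presents.

```python
def prefix_omit(coords):
    """Encode sorted coordinates as (number of leading fields shared
    with the previous coordinate, remaining fields)."""
    encoded = []
    previous = ()
    for coord in coords:
        shared = 0
        while (shared < len(previous) and shared < len(coord)
               and previous[shared] == coord[shared]):
            shared += 1
        encoded.append((shared, coord[shared:]))
        previous = coord
    return encoded


def prefix_restore(encoded):
    """Rebuild the full coordinates from the prefix-omitted form."""
    coords = []
    previous = ()
    for shared, rest in encoded:
        coord = previous[:shared] + tuple(rest)
        coords.append(coord)
        previous = coord
    return coords


# Hypothetical list L(W): (document, sentence, word position) triples.
occurrences = [(1, 2, 3), (1, 2, 7), (1, 4, 1), (2, 1, 1)]
packed = prefix_omit(occurrences)
```

Because consecutive occurrences of a word often fall in the same document or sentence, most entries store only their trailing fields; `prefix_restore(packed)` recovers the original list.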
Activity Memory for Text Information Retrieval BIBA 613-627
  Yan H. Ng; Silvano P. V. Barros
A Symbolic Associative Processor (SAP), capable of supporting parallel Keyword Match and Record Match functions, is proposed to select and streamline textual data for information retrieval. Consequently, high volume text data could be analysed on-the-fly before being channelled to the CPU, thus cushioning the impact of the Von Neumann bottleneck commonly experienced in applications requiring high I/O bandwidth. This paper identifies some of the system requirements to support text information retrieval using SAP with the aid of simplified examples.
Access by Contents of Documents in an Office Information System BIBA 629-649
  Claudia Jimenez Guarin
This paper presents the integration of retrieval functions of an Information Retrieval System, IOTA, in an Office Information Server. Besides the linear scanning of the text (using a software and a hardware filter), two access methods are proposed. The first one is based on a simple indexing of documents based on signatures. Here, texts are treated as character strings. We call this method Textual Search. The second one is based on the extension of Signature Methods for implementing the Indexing Relation of IOTA, where meaningful terms (noun groups, for example) are identified in the text together with grammatical information. We call this method of signature computation the Indexing-Term Signature. The resulting access method is called Semantic Search. We present the current experiments using the SCHUSS hardware filter as a scanning accelerator and the results of different alternatives of implementation of these Retrieval functions.

Applications (3)

Development of a Large, Concept-Oriented Database for Information Retrieval BIBA 651-661
  Robert H. Ledwith
The development of concept-oriented databases using AI knowledge representation schemes is proposed as a step towards improving the precision and recall of information retrieval systems. Currently underway is the augmentation of a 238,000 citation database, Chemical Abstracts (CA) Volume 105, by addition of detailed conceptual information in the form of frames and hierarchies. The initial text data is parsed using natural language processing (NLP) techniques to create frames describing the semantics of the index entries in the database, with the slots in the frames being pointers into a very large semantic network of conceptual objects (956,000 objects). To examine the resultant knowledge base (KB), a simple hypertext system is proposed, with the conceptual information serving as pathways to connect related citations.
Integrated Information Retrieval for Law in a Hypertext Environment BIBA 663-677
  Eve Wilson
A prototype information retrieval system for lawyers, Justus, has been developed on a Sun workstation to run in a Guide hypertext environment. The hypertext database is created automatically by Justus from machine readable versions of the ordinary printed texts, ideally the publisher's typesetting tapes. The database incorporates primary legal sources, such as statutes and cases, and secondary sources, such as textbooks and a dictionary. Initially, the lawyer may select any document in the system. From this initial document, he may access any other document, or part of any other document, to which reference is made. Reference selection is by a pointing device, such as a mouse. There is no limit on the number of selections that can be made, and no restrictions on the path through the system.