| Parallel Processing and Information Retrieval | | BIB | 1 | |
| David Waltz | |||
| In Search of Knowledge-Based Search Tactics | | BIBA | 3-10 | |
| Philip J. Smith; Steven J. Shute; Deb Galdes; Mark H. Chignell | |||
| Knowledge-based search tactics are discussed in terms of their role in the
functioning of a semantically-based search system for bibliographic information
retrieval. This prototype system, EP-X, actively assists users in defining or
refining their topics of interest. It does so by applying search tactics to a
knowledge-base describing topics in a particular domain and a database
describing the contents of individual documents.
This paper reviews the empirical studies that led to the two central concepts implemented in EP-X: 1. Semantically-based search; 2. Knowledge-based search tactics. It then describes the capabilities of a system based on such concepts. | |||
| Adaptive Information Retrieval: Using a Connectionist Representation to Retrieve and Learn about Documents | | BIBA | 11-20 | |
| Richard K. Belew | |||
| AIR represents a connectionist approach to the task of information retrieval. The system uses relevance feedback from its users to change its representation of authors, index terms and documents so that, over time, AIR improves at its task. The result is a representation of the consensual meaning of keywords and documents shared by some group of users. The central goal of this paper is to use our experience with AIR to highlight those characteristics of connectionist representations that make them particularly appropriate for IR applications. We argue that this associative representation is a natural generalization of traditional IR techniques, and that connectionist learning techniques are effective in this setting. | |||
| A Neural Network for Probabilistic Information Retrieval | | BIBA | 21-30 | |
| K. L. Kwok | |||
| This paper demonstrates how a neural network may be constructed, together with learning algorithms and modes of operation, that will provide retrieval effectiveness similar to that of the probabilistic indexing and retrieval model based on single terms as document components. | |||
| Design of a Browsing Interface for Information Retrieval | | BIBA | 32-39 | |
| Robert Godin; Jan Gecsei; Claude Pichet | |||
| In conventional Boolean retrieval systems, users have difficulty controlling
the amount of output obtained from a given query. This paper describes the
design of a user interface which permits gradual enlargement or refinement of
the user's query by browsing through a graph of term and document subsets.
This graph is obtained from a lattice automatically generated from the usual
document-term relation. The major design features of the proposed interface
are the integration of menu, fill-in-the-blank and direct manipulation modes of
interaction within the "fisheye view" [Furnas, 1986] paradigm. A prototype
user interface incorporating some of these ideas has been implemented on a
microcomputer.
The resulting interface is well adapted to various kinds of users and needs. More experienced users with a particular subject in mind can directly specify a query which results in a jump to a particular vertex in the graph. From there, the user can refine his initial query by browsing through the graph from that point on. On the other hand, casual users without any prior knowledge of the contents of the system or users without any particular subject in mind can freely navigate through the graph without ever specifying any query. | |||
| A Library System for Information Retrieval Based on a Cognitive Task Analysis and Supported by an Icon-Based Interface | | BIB | 40-47 | |
| Annelise Mark Pejtersen | |||
| Integrated Information Retrieval in a Knowledge Worker Support System | | BIBA | 48-57 | |
| Gordon McAlpine; Peter Ingwersen | |||
| This paper describes the design of the information retrieval facilities of an integrated information system called EUROMATH. EUROMATH is an example of a Knowledge Worker Support System: it has been designed specifically to support mathematicians in their research work. EUROMATH is required to provide uniform retrieval facilities for searching in a user's personal data, in a shared database of structured documents and in public, bibliographic databases. The design of information retrieval facilities that satisfy these and other requirements posed several interesting design issues regarding the integration of various retrieval techniques. As well as a uniform query language, designed to be highly usable by the target user group, the retrieval facilities provide expert intermediary functions, i.e. sophisticated support for the retrieval of bibliographic data. This support is achieved using a model of the user, a model of the user's information need and a set of search strategies based on those used by human intermediaries. The expert intermediary facilities include extensive help facilities, automatic query reformulation and browsing of a variety of sources of query terms. | |||
| Retrieval System Evaluation Using Recall and Precision: Problems and Answers | | BIB | 59-68 | |
| Vijay V. Raghavan; Peter Bollmann; Gwang S. Jung | |||
| Optimum Polynomial Retrieval Functions | | BIBA | 69-76 | |
| Norbert Fuhr | |||
| We show that any approach to develop optimum retrieval functions is based on two kinds of assumptions: first, a certain form of representation for documents and requests, and second, additional simplifying assumptions that predefine the type of the retrieval function. Then we describe an approach for the development of optimum polynomial retrieval functions: request-document pairs (f_l, d_m) are mapped onto description vectors x(f_l, d_m), and a polynomial function of the form a^T · v(x) is developed such that it yields estimates of the probability of relevance P(R | x(f_l, d_m)) with minimum square errors. We give experimental results for the application of this approach to documents with weighted indexing as well as to documents with complex representations. In contrast to other probabilistic models, our approach yields estimates of the actual probabilities, it can handle very complex representations of documents and requests, and it can be easily applied to multi-valued relevance scales. On the other hand, this approach is not suited to log-linear probabilistic models, and it needs large samples of relevance feedback data for its application. | |||
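The least-squares idea in the abstract above can be illustrated with a minimal sketch. It assumes the simplest possible polynomial, v(x) = (1, x), over a single hypothetical feature x of a request-document pair, and fits the coefficients to 0/1 relevance judgements. The function name and the sample data are illustrative assumptions, not taken from the paper.

```python
def fit_linear_retrieval_function(xs, relevance):
    """Least-squares fit of a + b*x to 0/1 relevance judgements.

    xs        : one feature value per request-document pair (hypothetical)
    relevance : 1 if the pair was judged relevant, else 0
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(relevance) / n
    # Closed-form solution of the normal equations for one feature.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, relevance))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_relevance(a, b, x):
    """Evaluate the fitted polynomial; the result is not clipped to [0, 1]."""
    return a + b * x


a, b = fit_linear_retrieval_function([0, 1, 2, 3], [0, 0, 1, 1])
print(estimate_relevance(a, b, 2))
```

A richer v(x) with quadratic or cross terms would need a general linear solver, but the fitting criterion would stay the same.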
| Towards an Information Logic | | BIB | 77-86 | |
| C. J. van Rijsbergen | |||
| A Parallel Indexed Algorithm for Information Retrieval | | BIBA | 88-97 | |
| Craig Stanfill; Robert Thau; David Waltz | |||
| In this paper we present a parallel document ranking algorithm suitable for use on databases of 1-1000 GB, resident on primary or secondary storage. The algorithm is based on inverted indexes, and has two advantages over a previously published parallel algorithm for retrieval based on signature files. First, it permits the employment of ranking strategies which cannot be easily implemented using signature files, specifically methods which depend on document-term weighting. Second, it permits the interactive searching of databases resident on secondary storage. The algorithm is evaluated via a mixture of analytic and simulation techniques, with a particular focus on how cost-effectiveness and efficiency change as the size of the database, number of processors, and cost of memory are altered. In particular, we find that if the ratio of the number of processors and/or disks to the size of the database is held constant, then the cost-effectiveness of the resulting system remains constant. Furthermore, for a given size of database, there is a number of processors which optimizes cost-effectiveness. Estimated response times are also presented. Using these methods, it appears that cost-effective interactive access to databases in the 100-1000 GB range can be achieved using current technology. | |||
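As a rough illustration of ranking with an inverted index and document-term weights, the sketch below builds postings lists and sums the weights per document. It is a sequential toy, not the parallel algorithm of the paper; the raw term-frequency weighting, the function names, and the sample documents are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a list of (doc_id, weight) postings.

    The weight is plain term frequency, a stand-in for whatever
    document-term weighting a real system would use.
    """
    index = defaultdict(list)
    for doc_id, text in docs.items():
        counts = defaultdict(int)
        for term in text.lower().split():
            counts[term] += 1
        for term, tf in counts.items():
            index[term].append((doc_id, tf))
    return index


def rank(index, query, top_k=10):
    """Score documents by summing the weights of matching query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        for doc_id, weight in index.get(term, []):
            scores[doc_id] += weight
    # Sort by descending score, then by doc_id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


docs = {
    "d1": "parallel retrieval on parallel machines",
    "d2": "signature files for text retrieval",
    "d3": "optical storage",
}
index = build_inverted_index(docs)
print(rank(index, "parallel retrieval"))  # [('d1', 3.0), ('d2', 1.0)]
```

Only the postings of the query terms are read, which is what makes weighted ranking straightforward here compared with a signature-file scan.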
| An Optical System for Full Text Search | | BIBA | 98-107 | |
| Pericles A. Mitkas; P. Bruce Berra; Peter S. Guilfoyle | |||
| In this paper we propose a full text search system based on optics. The storage and processing of the textual data are performed by an optical back-end system to an electronic computer. In this way we can take advantage of the speed and parallelism of digital optical processing. Using the proposed configuration we show how one might implement a set of text processing operations using lasers, spatial light modulators and photodetectors. | |||
| Retrieving Highly Dynamic, Widely Distributed Information | | BIBA | 108-115 | |
| M. F. Wyle; H. P. Frei | |||
| Wide area networks provide a variety of information sources which can be exploited only by appropriate information retrieval techniques such as repeated automatic query of remote databases and bulletin boards. Distinctive features of the content and access methods of information on wide area nets are discussed from an IR perspective. The development, algorithms, and analysis of a functioning system are also presented. | |||
| The Constituent Object Parser: Syntactic Structure Matching for Information Retrieval | | BIB | 117-126 | |
| Douglas P. Metzler; Stephanie W. Haas | |||
| Word Sense Disambiguation Using Machine-Readable Dictionaries | | BIBA | 127-136 | |
| Robert Krovetz; W. Bruce Croft | |||
| Most approaches to full-text information retrieval currently index documents based on the words they contain, and retrieve them based on the words' frequency of occurrence. This can cause many irrelevant documents to be retrieved because words are often ambiguous. We propose an approach in which documents are indexed by word senses, and in which these senses are taken from a machine-readable dictionary. We review some of the work on machine-readable dictionaries and the approaches that have been taken to word sense disambiguation. We then discuss our own approach to the problem based on the use of multiple sources of evidence. We conclude with the results of some experiments that indicate the degree to which lexical ambiguity is a factor in current systems. | |||
| On the Application of Syntactic Methodologies in Automatic Text Analysis | | BIBA | 137-150 | |
| Gerard Salton; Maria Smith | |||
| This study summarizes various linguistic approaches proposed for document analysis in information retrieval environments. Included are standard syntactic methods to generate complex content identifiers, and the use of semantic know-how obtained from machine-readable dictionaries and from specially constructed knowledge bases. A particular syntactic analysis methodology is also outlined and its usefulness for the automatic construction of book indexes is examined. | |||
| File Organizations and Access Methods for CLV Optical Discs | | BIBA | 152-159 | |
| Stavros Christodoulakis; Daniel Alexander Ford | |||
| CLV format discs such as CD ROM and WORM are a large and important class of
optical disc technology. In this paper, we examine the issues related to the
implementation and performance of several different file organizations on
these discs. The organizations examined are based on hashing and trees.
The CLV recording scheme is shown to be a good environment for efficiently implementing hashing. Single seek access and storage utilization levels approaching 100% can be achieved for CD ROMs. It is shown that a B-tree organization is not a good choice for WORM discs (both CAV and CLV), but a modified ISAM approach can be appropriate for WORM discs. We describe clustered BIMs, a class of tree organizations appropriate for CD ROMs. Expressions for the expected retrieval performance of both hashing and trees are also given. The paper concludes by outlining recent results and future directions on buffered implementations of access methods for WORM discs, as well as advantages of signature based access methods for text retrieval in WORM disc architectures. | |||
| Storing Text Retrieval Systems on CD-ROM: Compression and Encryption Considerations | | BIB | 160-167 | |
| Shmuel T. Klein; Abraham Bookstein; Scott Deerwester | |||
| A New Approach to Text Searching | | BIBA | 168-175 | |
| Ricardo A. Baeza-Yates; Gaston H. Gonnet | |||
| We introduce a family of simple and fast algorithms for solving the classical string matching problem, string matching with don't care symbols and complement symbols, and multiple patterns. In addition we solve the same problems allowing up to k mismatches. Among the features of these algorithms are that they are real time algorithms, they don't need to buffer the input, and they are suitable to be implemented in hardware. | |||
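The abstract above does not spell out the algorithms. As a hedged illustration, the sketch below shows the bit-parallel technique commonly known as shift-and (the complement form of shift-or), which is generally associated with this line of work. It handles only the exact-match case, reads each text character once, and needs no input buffer; the function name is an assumption for the example.

```python
def shift_and_search(pattern, text):
    """Return the start position of every exact occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    # masks[c] has bit i set when pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    accept = 1 << (m - 1)
    state = 0
    hits = []
    for j, c in enumerate(text):
        # Bit i of state is set when pattern[:i + 1] matches the text ending at j.
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits


print(shift_and_search("aba", "ababa"))  # [0, 2]
```

A don't care symbol could be modelled in this sketch by setting its bit in every character mask, which is one reason the bit-vector formulation extends so naturally to pattern classes.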
| Multikey Access Methods Based on Term Discrimination and Signature Clustering | | BIBA | 176-185 | |
| Jae W. Chang; Joon H. Lee; Yoon J. Lee | |||
| In order to improve the two-level signature file method designed by Sacks-Davis et al. [20], we propose new multikey access methods based on term discrimination and signature clustering. By term discrimination, we create separate, efficient access methods for the terms frequently used in user queries. In addition, we cluster similar signatures by means of these terms so that we may achieve good performance on retrieval. We also provide a space-time analysis of the proposed methods and compare them with the two-level signature file method. We show that the proposed methods achieve 15-30% savings in retrieval time and require 3-9% more storage overhead. | |||
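For readers unfamiliar with signature files, the sketch below shows the basic superimposed-coding idea that such methods build on: each term sets a few bits, a document signature is the OR of its term signatures, and a query matches any signature that covers all of its bits. This is a single-level toy, not the two-level method or the clustering scheme of the paper; the signature width, the bits per term, and the use of SHA-256 as the hash are assumptions.

```python
import hashlib

SIGNATURE_BITS = 64  # hypothetical signature width
BITS_PER_TERM = 3    # hypothetical number of bits set per term


def term_signature(term):
    """Hash a term to a bit mask with up to BITS_PER_TERM bits set."""
    sig = 0
    for k in range(BITS_PER_TERM):
        digest = hashlib.sha256(f"{k}:{term}".encode("utf-8")).digest()
        position = int.from_bytes(digest[:4], "big") % SIGNATURE_BITS
        sig |= 1 << position
    return sig


def document_signature(terms):
    """Superimpose (bitwise OR) the signatures of all terms."""
    sig = 0
    for term in terms:
        sig |= term_signature(term)
    return sig


def candidates(doc_signatures, query_terms):
    """Return ids whose signature covers the query signature.

    The result can contain false drops, so real systems verify
    each candidate against the document itself.
    """
    q = document_signature(query_terms)
    return [doc_id for doc_id, sig in doc_signatures.items() if (sig & q) == q]


doc_signatures = {
    "d1": document_signature(["signature", "file", "clustering"]),
    "d2": document_signature(["optical", "disc", "hashing"]),
}
print(candidates(doc_signatures, ["signature", "clustering"]))
```

The filter never misses a document that contains every query term; it can only admit extra candidates.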
| Indexing Medical Reports in a Multimedia Environment: The RIME Experimental Approach | | BIBA | 187-197 | |
| Catherine Berrut; Yves Chiaramella | |||
| This paper focuses on the RIME system aimed at the indexing of medical reports in a multimedia environment. This particular application is viewed as representative of a large set of still unanswered needs of large communities of users: domain experts dealing with on-line specialized documentation such as software engineers, medical specialists and so on. In this application textual information appears as an interesting medium to access related pictures in the data base. After the presentation of the application and a study of the particular corpus involved we define a semantic model for the documents which is based on a Conceptual Language. Then we detail the indexing process and its various linguistic components which perform the translation of every medical report according to this semantic model. | |||
| Full Text Indexing Based on Lexical Relations. An Application: Software Libraries | | BIBAK | 198-206 | |
| Yoelle S. Maarek; Frank A. Smadja | |||
| In contrast to other kinds of libraries, software libraries need to be
conceptually organized. When looking for a component, the main concern of
users is the functionality of the desired component; implementation details are
secondary. Software reuse would be enhanced with conceptually organized large
libraries of software components. In this paper, we present GURU, a tool that
allows automatic building of such large software libraries from documented
software components. We focus here on GURU's indexing component which extracts
conceptual attributes from natural language documentation. This indexing
method is based on words' co-occurrences. It first uses EXTRACT, a
co-occurrence knowledge compiler for extracting potential attributes from
textual documents. Conceptually relevant collocations are then selected
according to their resolving power, which scales down the noise due to context
words. This fully automated indexing tool thus goes further than keyword-based
tools in the understanding of a document without the brittleness of
knowledge-based tools. The indexing component of GURU is fully implemented,
and some results are given in the paper.
Keywords: Automatic indexing, Software libraries, Software reuse, Lexical relations, Natural language processing, Co-occurrence knowledge | |||
| How a Personal Document's Intended Use or Purpose Affects its Classification in an Office | | BIBA | 207-210 | |
| Barbara H. Kwasnik | |||
| This paper reports on one of the findings of a larger case study that attempts to describe how people organize documents in their own offices. In that study, several dimensions along which people make classificatory decisions were identified. Of these, the use to which a document is put emerged as a strong determiner of that document's classification. The method of analysis is reviewed, and examples of different kinds of uses are presented, demonstrating that it is possible to describe a wide variety of specific instances using a closed set of descriptors. The suggestion is made that, in designing systems for organizing materials, it might be advantageous to incorporate information about contextual variables, such as use, since these seem to be particularly important in classification decisions made within personal environments. | |||
| Information Retrieval Using a Hypertext-Based Help System | | BIBAK | 212-220 | |
| F. R. Campagnoni; Kate Ehrlich | |||
| Hypertext offers users a simple, flexible way to navigate through electronic
information systems but at the potential risk of becoming lost in the network
of interconnected pieces of information. A study was conducted on information
retrieval using a commercial hypertext-based help system. It was found that
the predominant search strategy was "browsing" (characterized by scanning
tables of contents and paging through topics), rather than employing the
indexes ("analytical search"). Although subjects did not become lost,
individuals with better spatial visualization ability, as measured by a
standardized test, were faster at retrieving information and returned to the
top of the information hierarchy less often than those with poorer spatial
visualization ability. These results support previous studies that have found
a strong preference by users to browse in hypertext systems and extend those
findings to a new domain (help), a different type of user interface, and a
different information architecture. In addition, the results demonstrate the
importance of spatial visualization ability for efficient navigation and
information retrieval in a hierarchical hypertext system.
Keywords: Hypertext, Help systems, Information retrieval, Individual differences, Visualization | |||
| A Hypertext Knowledge Base for Primary Care -- Limeds in Lincks | | BIBA | 221-228 | |
| Toomas Timpka; Lin Padgham; Per Hedblom; Stefan Wallin; Gosta Tibblin | |||
| In organized health care, primary care is the first level. It is characterized by the wide span of health problems managed as well as remote location from traditional medical information and knowledge sources. The LIMEDS project has formulated the special requirements for integrated knowledge and data base management in primary care. This paper presents Gosta's book, a hypertext knowledge base implemented in LINCKS, an object oriented, networked database system. Firstly, aspects which make integrated hypermedia systems particularly suitable for application in primary health care are explored. We then describe the hypertext knowledge base, consisting of 500 basic text objects and 3000 links, and current implementations using the NODE data model. NODE is implemented on a SUN III fileserver, and the user interface for the hypertext context on Apple Macintosh. Combination of design methods towards a parallel means-ends strategy was found to be necessary to achieve Gosta's book. Design groups need to be composed of computer science, medical, psychological and organizational competences. | |||
| Settings and the Settings Structure: The Description and Automated Propagation of Networks for Perusing Videodisc Image States | | BIBA | 229-238 | |
| Alan P. Parkes | |||
| This paper describes a system for formally representing spatial relationships between videodisc image states called settings. A number of setting relations are defined, these being based on the manipulations of the camera typically used in the production of the moving film: zooming in or out, panning etc. An algorithm is presented which, given a limited level of initial specification by a describer, will constrain, where possible, the setting relations holding between all pairs of settings. The resulting network is called the settings structure. The paper begins by placing the settings structure into the context of its being one part of the CLORIS system. | |||
| The Lexicon and Information Retrieval | | BIB | 240-241 | |
| Robert Krovetz; Ed Fox; Robert J. P. Ingria; Henry Kucera | |||
| Research Toward the Development of a Lexical Knowledge Base for Natural Language Processing | | BIBA | 242-249 | |
| Robert A. Amsler | |||
| This paper documents research toward building a complete lexicon containing all the words found in general newspaper text. It is intended to provide the reader with an understanding of the inherent limitations of existing vocabulary collection methods and the need for greater attention to multi-word phrases as the building blocks of text. Additionally, while traditional reference books define many proper nouns, they appear to be very limited in their coverage of the new proper nouns appearing daily in newspapers. Proper nouns appear to require a grammar and lexicon of components much the way general parsing of text requires syntactic rules and a lexicon of common nouns. | |||
| Present and Future of Electronic Databases | | BIB | 250 | |
| Gerard Salton; Martha Williams; David Penniman; John Regazzi; Tadeusz Radecki | |||
| Information Retrieval and Software Reuse | | BIBA | 251-256 | |
| W. B. Frakes; N. Belkin; R. Prieto-Diaz; S. Wartik | |||
| Software reuse is widely believed to be the most promising technology for improving software quality and productivity. There are many technical and non-technical problems to be solved, however, before widespread reuse of software lifecycle objects becomes a reality. One class of problem concerns the classification, storage, and retrieval of reusable components. Panel members will discuss these problems and some approaches to solving them. | |||