
Proceedings of the Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Fullname:Proceedings of the Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Editors:Fausto Rabitti
Location:Pisa, Italy
Dates:1986-Sep-08 to 1986-Sep-10
Standard No:ISBN 0-89791-187-3; ACM Order Number 606860; ACM DL: Table of Contents hcibib: IR86
  1. Keynote Speech
  2. Office Systems
  3. User Interfaces
  4. Storage Structures
  5. Linguistic Retrieval
  6. Information Retrieval Systems
  7. Clustering
  8. Retrieval Strategies
  9. Knowledge Based Information Retrieval (I)
  10. Knowledge Based Information Retrieval (II)
  11. Learning Systems
  12. Probabilistic Retrieval

Keynote Speech

Recent Trends in Automatic Information Retrieval BIBA 1-10
  Gerard Salton
Substantial successes were achieved in the early years in automatic indexing and retrieval using single term indexing theories with term weight assignments based on frequency considerations. The development of more refined indexing systems using thesaurus aids and automatically constructed term association maps changed the retrieval effectiveness only slightly. The recent introduction of the relevance concept in the form of probabilistic retrieval models provided a firm basis for term weighting and document ranking practices. However, the probabilistic methods were not helpful in substantially enhancing the retrieval effectiveness.
   At the present time, attempts are made to add artificial intelligence concepts to the document retrieval environment in the form of fancy graphics interfaces, learning systems for query and document indexing and for collection searching, extended logic models relating documents and information requests, and analysis methods based on the use of semantic maps and other kinds of knowledge structures. Using the earlier developments and evaluation results as guidelines, an attempt is made to outline the information retrieval environment of the future and to assess the usefulness of some of the currently proposed search and retrieval methods.
Using Structural Representations of Anomalous States of Knowledge for Choosing Document Retrieval Strategies BIBA 11-22
  N. J. Belkin; B. H. Kwasnik
We report on a project which attempts to classify representations of the anomalous states of knowledge (ASKs) of users of document retrieval systems on the basis of structural characteristics of the representations, and which specifies different retrieval strategies and ranking mechanisms for each ASK class. The classification and retrieval strategy specification is based on 53 real problem statements, 35 of which have a total of 250 evaluated documents. Four facets of the ASK structures have been tentatively identified, whose combinations determine the method and order of application of five basic ranking strategies. This work is still in progress, so results presented here are incomplete.

Office Systems

Document Presentation and Query Formulation in Muse BIBA 23-30
  S. Gibbs; D. Tsichritzis
Several problems of document presentation and query formulation arising in systems dealing with multimedia documents are discussed. Examples from a prototype distributed multimedia document filing system are described.
An Approach to Multimedia Information Management BIBA 31-38
  S. Gallelli; C. Iacobelli; P. Marchisio
The integrated management of multimedia information, that is of complex information consisting of conventional data, text, graphics, images and voice, is of great interest not only in fields like Information Retrieval, Office Automation, Computer Aided Instruction, Computer Aided Design, but also in other emerging fields such as Tourist Applications, Computer generated films, Newspapers and magazines production, and so on.
   In this paper the definition of multimedia object, partially derived from the ECMA standard "Office Document Architecture", is given and an approach to multimedia information management is proposed. A multilevel environment, where multimedia information can be handled, stored and retrieved and the inner level of which consists of a general purpose Multimedia Data Base Management System (MDBMS), is described. The main functionalities that languages for describing and manipulating multimedia objects should provide are outlined.
Methodological Issues for the Design of an Office Information Server -- Focal Topics for the Analysis from an Office System Perspective -- BIBA 39-48
  Till W. Truckenmuller
This paper deals with the need to consider organizational and user requirements as the basis for the successful design of future office information servers.
   Today volumes of the order of 10,000 to 50,000 multimode documents and 1 to 10 million documents at a company level (companies with over 1000 employees) per system have to be archived. The average amount of filing is 12 to 16 running metres of paper per year, with an increasing tendency (Ben84).
   However, the shortcoming of today's systems is not an inability to store large amounts of information but the fact that they support only particular (and well structured) office tasks in an operative way. Future systems have to support all kinds of work (procedures) people do in an office, consider the strategic and operative goals of the particular office, and take into account user behaviour, especially users' problem-solving knowledge and their individual ways of doing their jobs (e.g. search strategies, reminder functions).
   Analysis methods for collecting data for the design of office systems have mostly been developed in the context of office automation, in order to develop systems that support particular tasks or to restructure offices to improve the profitability of the company. For that reason, these methods must be considerably extended to supply data suited to the successful design and development of future office information servers.
   Focal topics of the analysis from an office system perspective are: information generation and location, consistency and permanence, work support, information handling and manipulation, access rights and confidentiality, accountability, information flow and use of abstractions. Detailed dimensions concerning these areas are presented in this paper.

User Interfaces

IR, NLP, AI and UFOS: Or IR-Relevance, Natural Language Problems, Artful Intelligence and User-Friendly Online Systems BIBA 49-57
  Tamas E. Doszkocs
User Friendly Online Searching is examined in the context of Natural Language Processing in Information Retrieval and Artificial Intelligence. Opportunities for synergetic R & D are identified as the basis for Intelligent Information Retrieval and Artificial Retrieval Intelligence.
The Visual Display of Information in an Information Retrieval Environment BIBA 58-67
  Donald B. Crouch
This paper gives an overview of the graphical techniques which have been used in the representation of information in a document collection environment. An assessment of the applicability of existing multivariate data graphical techniques to the vector space model is presented.
Improved Subject Access, Browsing and Scanning Mechanisms in Modern Online IR BIBA 68-76
  Peter Ingwersen; Irene Wormell
Focusing on communication, the paper analyses and proposes practical solutions to key problems in online IR, in particular concerned with ill-defined and "muddled" information requirements, concept interpretation in searching and text representation.
   The need to develop additional browsing and scanning feedback devices, based on existing methods, to support searchers is emphasised. The paper points to economically feasible indexing methods that fit the potential of current information technology and are adaptable to in-house information environments: the SAP (Subject Access Project) principles. Focusing display mechanisms and the term frequency analysis feature Zoom are discussed, and combining them into a flexible database front-end is suggested. The design principles are outlined and demonstrated in a worked example.

Storage Structures

S-tree: A Dynamic Balanced Signature Index for Office Retrieval BIBA 77-87
  Uwe Deppisch
The signature approach is an access method for partial-match retrieval which meets many requirements of an office environment. Signatures are hash coded binary words derived from objects stored in the data base. They serve as a filter for retrieval in order to discard a large number of nonqualifying objects. In an indexed signature method the signatures of objects stored on a single page are used to form a signature for that page. In this paper we describe a new technique of indexed signatures which combines the dynamic balancing of B-trees with the signature approach. The main problem of appropriate splitting is solved in a heuristic way. Operations are described and a simple performance analysis is given. The analysis and some experimental results indicate a considerable performance gain. Moreover, the new S-tree approach supports a clustering on a signature basis. Further remarks on adaptability complete this work.
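   The filtering idea in the abstract above can be illustrated with a minimal sketch of superimposed-coding signatures. This is not the S-tree itself: the signature width, the number of bits set per word, the use of SHA-256, and all function names are illustrative assumptions, and the B-tree-style balancing and page splitting described in the paper are not shown.

```python
import hashlib

SIG_BITS = 64       # signature width in bits (assumed, for illustration)
BITS_PER_WORD = 3   # bits set per word (assumed, for illustration)


def word_signature(word: str) -> int:
    """Hash a word to an integer with at most BITS_PER_WORD bits set."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    sig = 0
    for i in range(BITS_PER_WORD):
        sig |= 1 << (digest[i] % SIG_BITS)
    return sig


def object_signature(words) -> int:
    """Superimpose (bitwise OR) the signatures of all words of one object."""
    sig = 0
    for word in words:
        sig |= word_signature(word)
    return sig


def page_signature(object_sigs) -> int:
    """Form a page-level signature from the signatures of the objects on it."""
    sig = 0
    for obj_sig in object_sigs:
        sig |= obj_sig
    return sig


def may_contain(sig: int, query_sig: int) -> bool:
    """False means a definite non-match; True means the object or page
    must still be checked, because different words can share bits."""
    return (sig & query_sig) == query_sig


objects = [["signature", "index"], ["office", "retrieval"]]
object_sigs = [object_signature(words) for words in objects]
page_sig = page_signature(object_sigs)
query_sig = object_signature(["office"])
```

   A search would test `may_contain(page_sig, query_sig)` first and examine the individual objects on a page only when the page signature passes the filter.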
Improved Hierarchical Bit-Vector Compression in Document Retrieval Systems BIBA 88-96
  Y. Choueka; A. S. Fraenkel; S. T. Klein; E. Segal
The "concordance" of an information retrieval system can often be stored in the form of bit-maps, which are usually very sparse and should be compressed. Hierarchical bit-vector compression consists of partitioning a vector v_i into equi-sized blocks, constructing a new bit-vector v_{i+1} which points to the non-zero blocks in v_i, dropping the zero-blocks of v_i, and repeating the process for v_{i+1}. We refine the method by pruning some of the tree branches if they ultimately point to very few documents; these document numbers are then added to an appended list which is compressed by the prefix-omission technique. The new method was thoroughly tested on the bit-maps of the Responsa Retrieval Project, and gave a relative improvement of about 40% over the conventional hierarchical compression method.
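   The basic (unrefined) hierarchical scheme described above can be sketched as follows. The block size, the function names, and the list-of-lists representation are assumptions made for illustration; the pruning refinement and the prefix-omission list that the paper adds are not implemented here.

```python
def compress(bits, block_size=4):
    """Return (levels, top). levels[0] holds the non-zero blocks of the
    original vector, levels[1] those of the vector built above it, and so
    on; top is the final small vector, stored uncompressed."""
    levels = []
    current = list(bits)
    while len(current) > block_size:
        # Pad with zeros so the vector splits into whole blocks.
        current += [0] * ((-len(current)) % block_size)
        blocks = [current[i:i + block_size]
                  for i in range(0, len(current), block_size)]
        levels.append([block for block in blocks if any(block)])
        # The next vector has one bit per block: 1 if the block is non-zero.
        current = [1 if any(block) else 0 for block in blocks]
    return levels, current


def decompress(levels, top, length, block_size=4):
    """Rebuild the original vector of the given length."""
    current = list(top)
    for kept in reversed(levels):
        remaining = iter(kept)
        expanded = []
        for flag in current:
            expanded.extend(next(remaining) if flag else [0] * block_size)
        current = expanded
    return current[:length]


bits = [0] * 64
for position in (3, 40, 41):
    bits[position] = 1

levels, top = compress(bits)
stored_bits = sum(len(block) for level in levels for block in level) + len(top)
restored = decompress(levels, top, len(bits))
```

   For this sparse 64-bit vector the sketch stores 20 bits (two 4-bit blocks at each of two levels plus a 4-bit top vector), and `restored` equals the original vector.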
Text Compression Using Prediction BIBA 97-102
  Jukka Teuhola; Timo Raita
In the compression of text files, the dependencies between successive characters should be exploited to as great an extent as possible. There are two obvious possibilities: either to detect and encode often occurring character strings, or to encode successors of character blocks. This paper presents two methods based on the latter approach. In the first method we encode only the most probable successors of blocks, whereas in the second we encode them all, using the knowledge of their distribution. The second method uses recursion to store effectively the dependencies between the characters and this results in good compression gains in practical cases.

Linguistic Retrieval

Incorporating Syntactic Information into a Document Retrieval Strategy: An Investigation BIBA 103-113
  Alan F. Smeaton
This paper deals with mechanisms for performing text retrieval which incorporate a degree of linguistic processing into the overall strategy. We have performed some experiments using parsing of text on a test collection of documents and queries to try and find out exactly if and how parsing could contribute to an overall improvement in retrieval effectiveness. Investigating this topic has led us to the definition of a retrieval strategy which incorporates parsing of query text and a more "shallow" parsing of document texts, whose retrieval effectiveness is investigated and described. Our results indicate that significant improvements in retrieval effectiveness can be obtained by incorporating such linguistic processing into an overall retrieval strategy.
CALIN -- A User Interface Based on a Simple Natural Language BIBA 114-122
  P. Bosc; M. Courant; S. Robin
In the framework of an application dealing with classified advertisement matching, a dedicated user interface has been designed and implemented. Its major originality lies in the user's language, which is neither an artificial one nor the usual natural language, but in fact the ad language. Beyond the language itself, the interface provides some facilities such as paraphrasing or explanations when needed. An expert system approach has been adopted and the interface is built up from the knowledge given by experts. They are in charge of describing what acceptable ads are, from both syntactic and semantic points of view ... Although designed in the context of ad matching, that interface could usefully be adapted to other retrieval systems. We especially think that an ad-like language is well-suited to asking questions since it is based on simple natural expressions. A given sentence involves terms that stand for elementary conditions applying to instances of a logical object contained inside the information system. This approach defines a complete interface, involving both a language and aiding capabilities. Moreover, the query language, although less powerful, represents a compromise between artificial languages and the usual natural language, with respect to ergonomics and analysis complexity.
Solving Grammatical Ambiguities within a Surface Syntactical Parser for Automatic Indexing BIBA 123-130
  Catherine Berrut; Patrick Palmer
This paper describes linguistic tools specifically designed for high-performance automatic indexing of natural language texts. By high-performance indexing, we mean the ability of the system to extract noun phrases (considered as the main conceptual frames regarding text content) without performing a full syntactic analysis of sentences (surface analyzer), together with its ability to learn unknown words. The paper describes the overall principles of this parser, emphasizing the use of syntactic networks and a precedence matrix to fulfil the above goals of reducing the analysis cost and inferring new vocabulary without interrupting the indexing process.

Information Retrieval Systems

A Design of a Distributed Full Text Retrieval System BIBA 131-137
  Patrick Martin; Ian A. Macleod; Brent Nordin
This paper describes the design of a distributed information system for full text retrieval. The system is similar in functionality to STAIRS and is being developed on a network of PCs interconnected by PC Network. The implementation is built on a generalisation of the remote procedure call concept. Communications are based upon the recent CCITT X.400 standard. Examples are given of the design strategy for a subset of the STAIRS system.
A Common Architecture for Different Text Processing Techniques in an Information Retrieval Environment BIBA 138-143
  G. Thurmair
The following paper gives an overview of a text processing software package called REALIST (Retrieval Aids by Linguistics and Statistics) which integrates different text processing techniques into a common surface. It supports the user by offering the environment of a given term, using morphological, syntactic and statistical means. The user can call up the processing results, use them for indexing, classification or retrieval purposes and combine them as he wishes, e.g. to set up a search logic. The text processing is done on a mainframe computer; the results are transferred to a minicomputer where the evaluation is performed. REALIST is a stand-alone package that fits any existing search system.
   In the retrieval context, this technique reduces connect time and improves the search results.
   REALIST is able to run on English and German texts. Each REALIST component has been separately tested with good success. An integrated version is currently under test at the US Patent and Trademark Office using 150,000 English patent abstracts, and a German version is being tested with 12,000 legal texts of the European Community.
COREL -- A Conceptual Retrieval System BIBA 144-148
  M. Kathryn Di Benigno; George R. Cross; Cary G. deBessonet
COREL is an experimental retrieval system that employs techniques of artificial intelligence. Articles of the Civil Code of Louisiana have been conceptually indexed using frame-based knowledge structures in the hope of improving accessibility over traditional key-word retrieval systems. A set of macro packages has been developed to allow a domain expert to implement a retrieval system based on this methodology.

Clustering

Hierarchic Document Clustering Using Ward's Method BIBA 149-156
  A. El-Hamdouchi; P. Willett
In this paper, we discuss the application of a recent hierarchic clustering algorithm to the automatic classification of files of documents. Whereas most hierarchic clustering algorithms involve the generation and updating of an inter-object dissimilarity matrix, this new algorithm is based upon a series of nearest neighbour searches. Such an approach is appropriate to several clustering methods, including Ward's method which has been shown to perform well in experimental studies of hierarchic document clustering. A description is given of heuristics which can increase the efficiency of the new algorithm when it is used to cluster three document collections by Ward's method.
User-Oriented Document Clustering: A Framework for Learning in Information Retrieval BIBA 157-163
  J. S. Deogun; V. V. Raghavan
In information retrieval, cluster analysis is an important tool employed to enhance both efficiency and effectiveness of the retrieval process. Most clustering algorithms have difficulty in reflecting the closeness of documents as perceived by the user. A two phase scheme for document clustering, whose results reflect the "conceptual" clusters that are perceived by the user of the retrieval system, is proposed. Since the clusters obtained by this scheme are not characterized in terms of the document representations, a strategy for cluster searching is also developed. Both the proposed document clustering scheme and document searching strategy are experimentally evaluated using a test collection from the SMART system. The preliminary experimental results obtained are very encouraging.
The Efficiency of Inverted Index and Cluster Searches BIBA 164-174
  Ellen M. Voorhees
The processing time and disk space requirements of an inverted index and top-down cluster search are compared. The cluster search is shown to use both more time and more disk space, mostly due to the large number of cluster centroids needed by the search. When shorter centroids are used, the efficiency of the cluster search improves, but the inverted index search remains more efficient.

Retrieval Strategies

On Extending the Vector Space Model for Boolean Query Processing BIBA 175-185
  S. K. M. Wong; W. Ziarko; V. V. Raghavan; P. C. N. Wong
An information retrieval model, named the Generalized Vector Space Model (GVSM), is extended to handle situations where queries are specified as (extended) Boolean expressions. It is shown that this unified model, unlike currently available alternatives, has the advantage of incorporating term correlations into the retrieval process. The query language extension is attractive in the sense that most of the algebraic properties of the strict Boolean language are still preserved. Although the experimental results for extended Boolean retrieval are not always better than the vector processing method, the developments here are significant in enabling commercially available retrieval systems to benefit from the vector based methods. The proposed scheme is compared to the p-norm model advanced by Salton and coworkers. An important conclusion is that it is desirable to investigate further extensions that can offer the benefits of both proposals.
An Experimental Study of Factors Important in Document Ranking BIBA 186-193
  Donna Harman
The ability to effectively rank retrieved documents in order of their probable relevance to a query is a critical factor in statistically-based keyword retrieval systems. This paper summarizes a set of experiments with different methods of term weighting for documents, using measures of term importance within an entire document collection, term importance within a given document, and document length. It is shown that significant improvements over no term weighting can be made using a combination of weighting measures and normalizing for document length.
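   The three kinds of evidence named in the abstract above (term importance in the collection, term importance in the document, and document length) can be combined as in the following minimal sketch. The specific formulas are assumptions chosen for illustration, not the measures evaluated in the paper: an inverse document frequency log(N / df), a dampened within-document frequency log(1 + tf), and division by log(1 + document length). The function name `rank` and the whitespace tokenizer are likewise hypothetical.

```python
import math
from collections import Counter


def rank(query, docs):
    """Return document indices ordered by decreasing score for the query."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scored = []
    for index, tokens in enumerate(tokenized):
        term_freq = Counter(tokens)
        length_norm = math.log(1 + len(tokens))  # document length factor
        score = 0.0
        for term in query.lower().split():
            if term_freq[term]:
                idf = math.log(n_docs / doc_freq[term])       # collection-wide
                within_doc = math.log(1 + term_freq[term])    # within document
                score += within_doc / length_norm * idf
        scored.append((score, index))
    # Highest score first; ties broken by document index.
    return [index for _, index in sorted(scored, key=lambda p: (-p[0], p[1]))]


docs = [
    "signature files for office retrieval",
    "cluster search with centroids",
    "inverted index search and cluster search compared for retrieval efficiency",
]
ranking = rank("cluster search", docs)
```

   With these three toy documents the short document that matches both query terms is ranked above the longer one, and the document that matches neither term comes last.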

Knowledge Based Information Retrieval (I)

A New Theoretical Framework for Information Retrieval BIBA 194-200
  C. J. van Rijsbergen
A new framework based on a non-classical logic is proposed for investigating IR. The paper motivates the use of a particular conditional logic as the 'right' logic for IR. A new principle, the logical uncertainty principle, is proposed, to deal with the inherent uncertainty associated with applicable inferences.
User-Specified Domain Knowledge for Document Retrieval BIBA 201-206
  W. B. Croft
The introduction of domain knowledge into a document retrieval system has two important consequences: an increase in the effectiveness of retrieval and a decrease in the efficiency of text processing. In this paper, a method is presented of combining user-specified domain knowledge with efficient retrieval techniques based on probabilistic models. The domain knowledge is represented as a collection of frames that contain rules specifying recognition conditions for domain concepts and relationships between concepts. The inference network represented in these frames is used to infer the concepts that are related to a user's query. This approach is being implemented as part of the I³R expert intermediary system.

Knowledge Based Information Retrieval (II)

IOTA: A Full Text Information Retrieval System BIBA 207-213
  Y. Chiaramella; B. Defude; M. F. Bruandet; D. Kerkouba
IOTA is a prototype of an Information Retrieval System which can manage a corpus made of highly structured, full text documents. The first version presented here has intelligent capabilities related to heuristic pattern matching procedures for processing natural language queries, which involve an automatically built thesaurus. The paper emphasizes the overall principles of query processing and gives hints about the underlying techniques used while constructing the thesaurus and automatically indexing highly structured documents.
An Information Retrieval System Based on Artificial Intelligence Techniques BIBA 214-220
  Dario De Jaco; Gianluca Garbolino
This paper describes a possible use of Artificial Intelligence models and techniques in the design of a small Information Retrieval system. In particular, some knowledge representation models, such as semantic networks and frame-like structures, are viewed as interesting tools for the implementation of a thesaurus, and also for a description of the stored documents' contents. In addition, a parser based on the ATN (Augmented Transition Network) model which can analyze Italian sentences concerning a legal domain is described. We are including it in a user/system interface whose goal is to provide the user with the possibility of expressing search topics by using noun phrases or other linguistic expressions, rather than single words or Boolean combinations of them. Finally, some tasks requiring automated reasoning facilities are outlined.
   The kernel of the system, i.e. the component which both performs traditional information retrieval and allows the insertion of new documents, is described in an appendix. It was first applied to a bibliographic database containing about a thousand references (with abstracts) of both papers and books concerned with Artificial Intelligence; now we are working on its application to a legal domain, with a database of laws, decrees and sentences concerning pollution and environmental protection.
The Use of Inference Mechanisms to Improve the Retrieval Facilities from Large Relational Databases BIBA 221-227
  Gian Piero Zarri
This paper describes the development of "intelligent" tools aimed at improving the retrieval facilities from large relational databases. When a natural language query does not correspond directly to the data contained in the base, a class of inferential processes called "transformations" is applied. The original query is thus automatically converted into one or more "semantically close" ones. "Semantically close" means that the data possibly obtained with the new query will give useful information about the data originally searched for.

Learning Systems

A Machine Learning Approach to Information Retrieval BIB 228-233
  S. K. M. Wong; W. Ziarko
An Automatic and Tunable Document Indexing System BIBA 234-243
  Esen Ozkarahan; Fazli Can
In this article we present interactive automatic document indexing software together with various index tuning/optimization strategies. After stems are generated from the raw text, the initial index vocabulary is narrowed down and tuned with the use of indexing versus clustering theory relationships. The narrowed down vocabulary is further optimized with the inclusion of term phrases and virtual terms corresponding to high and low frequency terms respectively. The results of performance experiments, which demonstrated significant improvements from index vocabulary optimization, are presented. The exploitation of the term discrimination value concept in index and retrieval system tuning and optimization is discussed.
Performance of Self-Taught Documents: Exploiting Co-Relevance Structure in a Document Collection BIBA 244-248
  Abraham Bookstein
In this paper we study the behavior of an information retrieval system in which index terms are assigned at random to both documents and requests. The random indexing is then modified by means of a feedback mechanism derived from a normal probability model and applied to both the request and document representations. Of interest are the convergence properties of the representation vectors. After a few feedback iterations, it is found that well defined clusters form that accurately represent the co-relevance structure among the documents -- in effect the feedback mechanism has permitted the documents to index themselves. This approach offers an interesting way to extend the dimensionality of the indexing vocabulary. Both this application and a theoretical analysis of the impact of extending the indexing vocabulary are discussed.

Probabilistic Retrieval

Two Models of Retrieval with Probabilistic Indexing BIBA 249-257
  Norbert Fuhr
We describe two retrieval models for probabilistic indexing. The binary independence indexing (BII) model is a generalized version of the Maron & Kuhns indexing model. In this model, the indexing weight of a descriptor in a document is an estimate of the probability of relevance of this document with respect to queries using this descriptor. The retrieval-with-probabilistic-indexing (RPI) model is suited to different kinds of probabilistic indexing. Therefore we assume that each indexing model has its own concept of 'correctness' to which the probabilities relate. The concept of correctness is not necessarily identical with the concept of relevance; it is only required to depend on relevance. In addition to the probabilistic indexing weights, the RPI model provides the possibility of relevance weighting of search terms. Both retrieval models are compared in experiments, showing equally good results.
Probabilistic Models for Document Retrieval: A Comparison of Performance on Experimental and Synthetic Databases BIBA 258-264
  Robert Losee; Abraham Bookstein; Clement Yu
Probabilistic document retrieval systems consistent with the two Poisson independence model outperform the binary independence model if the terms are distributed as described by the model's assumptions. The Two Poisson Effectiveness Hypothesis suggests that retrieval models based upon the two Poisson model will outperform binary independence models when used on a "real-world" database, where independence and two Poisson term occurrence distributions fail to hold, because the added information obtained from incorporating term frequency information will more than compensate for the non-Poisson distributions of terms. Searches of the MED1033 database suggest that if terms are not independent and frequencies of term occurrence are not distributed in a two Poisson manner, the binary independence sequential retrieval model outperforms the two Poisson independence retrieval model.
Non-Binary Independence Model BIB 265-268
  C. T. Yu; T. C. Lee
The Maximum Entropy Principle in Information Retrieval BIBA 269-274
  Paul B. Kantor; Jung Jin Lee
Applications, assumptions and properties of the maximum entropy principle are discussed. The maximum entropy principle integrates prior estimates of relevance with the observed distribution of term combinations. The result may be a reordering of the segments of a database, compared to a naive estimate. Numerical examples obtained by solution of the non-linear equations for the dual variables are presented and discussed.
An Interpretation of Index Term Weighting Schemes Based on Document Components BIBA 275-283
  K. L. Kwok
A theory of indexing is presented, based on viewing a document as constituted of components. A component may be chosen as any run of text unit that can be: (a) judged as to its relevancy property; and (b) considered as independent within the document. By looking at the constituent components of a document in relation to the universe of all components from the collection, we have been able to apply Bayes' decision theory to derive the index term representation for the document, as well as to attach an initial probabilistic weight to each term based on a Principle of Document Self-Recovery. It turns out that different choices of document components, such as a word or a whole abstract, can lead to different term weighting schemes that have been introduced before and are based on probability considerations; specifically, Edmundson and Wyllys' term significance formula, and Sparck Jones' inverse document frequency as later modified by Croft and Harper into the 'combination match' formula. Thus, a unified interpretation of various probabilistic term weighting schemes appears possible.