HCI Bibliography Home | HCI Journals | About TOIS | Journal Info | TOIS Journal Volumes | Detailed Records | RefWorks | EndNote | Hide Abstracts
TOIS Tables of Contents: 050607080910111213141516171819202122232425

ACM Transactions on Information Systems 15

Editors:W. Bruce Croft
Standard No:ISSN 1046-8188; HF S548.125 A33
Links:Table of Contents
  1. TOIS 1997 Volume 15 Issue 1
  2. TOIS 1997 Volume 15 Issue 2
  3. TOIS 1997 Volume 15 Issue 3
  4. TOIS 1997 Volume 15 Issue 4

TOIS 1997 Volume 15 Issue 1

Hyperform: A Hypermedia System Development Environment BIBAKPDF 1-31
  Uffe K. Wiil; John J. Leggett
Development of hypermedia systems is a complex matter. The current trend toward open, extensible, and distributed multiuser hypermedia systems adds additional complexity to the development process. As a means of reducing this complexity, there has been an increasing interest in hyperbase management systems that allow hypermedia system developers to abstract from the intricacies and complexity of the hyperbase layer and fully attend to application and user interface issues. Design, development, and deployment experiences of a dynamic, open, and distributed multiuser hypermedia system development environment called Hyperform is presented. Hyperform is based on the concepts of extensibility, tailorability, and rapid prototyping of hypermedia system services. Open, extensible hyperbase management systems permit hypermedia system developers to tailor hypermedia functionality for specific applications and to serve as a platform for research. The Hyperform development environment is comprised of multiple instances of four component types: (1) a hyperbase management system server, (2) a tool integrator, (3) editors, and (4) participating tools. Hyperform has been deployed in Unix environments, and experiments have shown that Hyperform greatly reduces the effort required to provide customized hyperbase management system support for distributed multiuser hypermedia systems.
Keywords: Design, Experimentation, Management, Advanced hypermedia system architecture, Extensible hyperbase management system, Object-oriented extension language, H.1.1 Models and principles, Systems and information theory, General systems theory, H.2.1 Database management, Logical design, Data models, H.2.4 Database management, Systems, Distributed databases, H.2.8 Database management, Database applications, H.3.4 Information storage and retrieval, Systems and software, H.3.m Information storage and retrieval, Miscellaneous, Hypermedia
A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems BIBAKPDF 32-66
  Norbert Fuhr; Thomas Rolleke
We present a probabilistic relational algebra (PRA) which is a generalization of standard relational algebra. In PRA, tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. Based on intensional semantics, the tuple weights of the result of a PRA expression always conform to the underlying probabilistic model. We also show for which expressions extensional semantics yields the same results. Furthermore, we discuss complexity issues and indicate possibilities for optimization. With regard to databases, the approach allows for representing imprecise attribute values, whereas for information retrieval, probabilistic document indexing and probabilistic search term weighting can be modeled. We introduce the concept of vague predicates which yield probabilistic weights instead of Boolean values, thus allowing for queries with vague selection conditions. With these features, PRA implements uncertainty and vagueness in combination with the relational model.
Keywords: Theory, Hypertext, Retrieval, Imprecise data, Logical retrieval model, Probabilistic retrieval, Relational data model, Uncertain data, Vague predicates, H.3.3 Information storage and retrieval, Information search and retrieval, Retrieval models, H.2.1 Database management, Logical design, Data models
Customizing Information Capture and Access BIBAKPDF 67-101
  Daniela Rus; Devika Subramanian
This article presents a customizable architecture for software agents that capture and access information in large, heterogeneous, distributed electronic repositories. The key idea is to exploit underlying structure at various levels of granularity to build high-level indices with task-specific interpretations. Information agents construct such indices and are configured as a network of reusable modules called structure detectors and segmenters. We illustrate our architecture with the design and implementation of smart information filters in two contexts: retrieving stock market data from Internet newsgroups and retrieving technical reports from Internet FTP sites.
Keywords: Algorithms, Documentation, Experimentation, Standardization, H.3.3 Information storage and retrieval, Information search and retrieval, Selection process, H.3.4 Information storage and retrieval, Systems and software, Current awareness systems, Information networks

TOIS 1997 Volume 15 Issue 2

Making a Digital Library: The Contents of the CORE Project BIBAKPDF 103-123
  Richard Entlich; Lorrin Garson; Michael Lesk; Lorraine Normore; Jan Olsen; Stuart Weibel
The CORE (Chemical Online Retrieval Experiment) project is a library of primary journal articles in chemistry. Any library has an inside and an outside; in this article we describe the inside of the library and the methods for building the system and accumulating the database. A later article will describe the outside (user experiences). Among electronic-library projects, the CORE project is unusual in that it has both ASCII derived from typesetting and image data for all its pages, and among experimental electronic-library projects, it is unusually large. We describe here (a) the processes of scanning and analyzing about 400,000 pages of primary journal material, (b) the conversion of a similar amount of textual database material, (c) the linking of these two data sources, and (d) the indexing of the text material.
Keywords: Algorithms, Design, Experimentation, Human factors, Image segmentation, H.3.2 Information storage and retrieval, Information storage, H.3.6 Information storage and retrieval, Library automation, Large text archives, H.5.2 Information interfaces and presentation, User interfaces, Interaction styles, I.4.5 Image processing, Reconstruction
A Text Compression Scheme that Allows Fast Searching Directly in the Compressed File BIBAKPDF 124-136
  Udi Manber
A new text compression scheme is presented in this article. The main purpose of this scheme is to speed up string matching by searching the compressed file directly. The scheme requires no modification of the string-matching algorithm, which is used as a black box; any string-matching procedure can be used. Instead, the pattern is modified; only the outcome of the matching of the modified pattern against the compressed file is decompressed. Since the compressed file is smaller than the original file, the search is faster both in terms of I/O time and precessing time than a search in the original file. For typical text files, we achieve about 30% reduction of space and slightly less of search time. A 30% space saving is not competitive with good text compression schemes, and thus should not be used where space is the predominant concern. The intended applications of this scheme are files that are searched often, such as catalogs, bibliographic files, and address books. Such files are typically not compressed, but with this scheme they can remain compressed indefinitely, saving space while allowing faster search at the same time. A particular application to an information retrieval system that we developed is also discussed.
Keywords: Algorithms, Performance, Data compression search, E.4 Data, Coding and information theory, Data compaction and compression, F.2.2 Analysis of algorithms and problem complexity, Nonnumerical algorithms and problems, Pattern matching, H.3.3 Information storage and retrieval, Information search and retrieval, Search process
The Effect of Accessing Nonmatching Documents on Relevance Feedback BIBAKPDF 137-153
  Mark D. Dunlop
Traditional information retrieval (IR) systems only allow users access to documents that match their current query, and therefore, users can only give relevance feedback on matching documents (or those with a matching strength greater than a set threshold. This article shows that, in systems that allow access to nonmatching documents (e.g., hybrid hypertext and information retrieval systems), the strength of the effect of giving relevance feedback varies between matching and nonmatching documents. For positive feedback the results shown here are encouraging, as they can be justified by an intuitive view of the process. However, for negative feedback the results show behavior that cannot easily be justified and that varies greatly depending on the model of feedback used.
Keywords: Experimentation, Theory, Free-text information retrieval, Hypertext, Negative feedback, Probabilistic model, Relevance feedback, Vector space model, H.3.3 Information storage and retrieval, Information search and retrieval, Search process
Access Control for Large Collections BIBAKPDF 154-194
  H. M. Gladney
Efforts to place vast information resources at the fingertips of each individual in large user populations must be balanced by commensurate attention to information protection. For distributed systems with less-structured tasks, more-diversified information, and a heterogeneous user set, the computing system must administer enterprise-chosen access control policies. One kind of resource is a digital library that emulates massive collections of paper and other physical media for clerical, engineering, and cultural applications. This article considers the security requirements for such libraries and proposes an access control method that mimics organizational practice by combining a subject tree with ad hoc role granting that controls privileges for many operations independently, that treats (all but one) privileged roles (e.g., auditor, security officer) like every other individual authorization, and that binds access control information to objects indirectly for scaling, flexibility, and reflexive protection. We sketch a realization and show that it will perform well, generalizes many deployed proposed access control policies, and permits individual data centers to implement other models economically and without disruption.
Keywords: Management, Security, Access control, Digital library, Document, Electronic library, Information security, C.2.4 Computer-communication networks, Distributed systems, Distributed applications, C.2.4 Computer-communication networks, Distributed systems, Distributed databases, D.2.0 Software engineering, General, Protection mechanisms, D.4.6 Operating systems, Security and protection, H.2.0 Database management, General, H.2.8 Database management, Database applications, H.3.5 Information storage and retrieval, Online information services, H.3.6 Information storage and retrieval, Library automation, Large text archives

TOIS 1997 Volume 15 Issue 3

Experiences with Selecting Search Engines using Metasearch BIBAKPDF 195-222
  Daniel Dreilinger; Adele E. Howe
Search engines are among the most useful and high-profile resources on the Internet. The problem of finding information on the Internet has been replaced with the problem of knowing where search engines are, what they are designed to retrieve, and how to use them. This article describes and evaluates SavvySearch, a metasearch engine designed to intelligently select and interface with multiple remote search engines. The primary metasearch issue examined is the importance of carefully selecting and ranking remote search engines for user queries. We studied the efficacy of SavvySearch's incrementally acquired metaindex approach to selecting search engines by analyzing the effect of time and experience on performance. We also compared the metaindex approach to the simpler categorical approach and showed how much experience is required to surpass the simple scheme.
Keywords: Algorithms, Experimentation, Information retrieval, Machine learning, Search engine, WWW, H.3.3 Information storage and retrieval, Information search and retrieval, H.3.4 Information storage and retrieval, Systems and software
Data Structures for Efficient Broker Implementation BIBAKPDF 223-253
  Anthony Tomasic; Luis Gravano; Calvin Lue; Peter Schwarz; Laura Haas
With the profusion of text databases on the Internet, it is becoming increasingly hard to find the most useful databases for a given query. To attack this problem, several existing and proposed systems employ brokers to direct user queries, using a local database of summary information about the available databases. This summary information must effectively distinguish relevant databases and must be compact while allowing efficient access. We offer evidence that one broker, GlOSS, can be effective at locating databases of interest even in a system of hundreds of databased and can examine the performance of accessing the GlOSS summaries for two promising storage methods: the grid file and partitioned hashing. We show that both methods can be tuned to provide good performance for a particular workload (within a broad range of workloads), and we discuss the tradeoffs between the two data structures. As a side effect of our work, we show that grid files are more broadly applicable than previously thought; in particular, we show that by varying the policies used to construct the grid file we can provide good performance for a wide range of workloads even when storing highly skewed data.
Keywords: Algorithms, Measurement, Performance, Broker architecture, Broker performance, Distributed information, GlOSS, Grid files, Partitioned hashing, H.2.2 Database management, Physical design, Access methods, H.3.1 Information storage and retrieval, Content analysis and indexing, Indexing methods, H.3.3 Information storage and retrieval, Information search and retrieval, Search process, H.3.4 Information storage and retrieval, Systems and software, Information networks
Modeling Word Occurrences for the Compression of Concordances BIBAKPDF 254-290
  A. Bookstein; S. T. Klein; T. Raita
An earlier paper developed a procedure for compressing concordances, assuming that all alements occurred independently. The models introduced in that paper are extended here to take the possibility of clustering into account. The concordance is conceptualized as a set of bitmaps, in which the bit locations represent documents, and the one-bits represent the occurrence of given terms. Hidden Markov Models (HMM's) are used to describe the clustering of the one-bits. However, for computational reasons, the HMM is approximated by traditional Markov models. A set of criteria is developed to constrain the allowable set of n-state models, and a full inventory is given for n <= 4. Graph-theoretic reduction and complementation operations are defined among the various models and are used to provide a structure relating the models studied. Finally, the new methods were tested on the concordances of the English Bible and of two of the world's largest full-text retrieval systems: the Tresor de la Langue Francaise and the Responsa Project.
Keywords: Algorithms, Performance, Classification of graph nodes, Concordance storage, Concordance organization, Graph structure, E.2 Data, Data storage representations, Composite structures, E.4 Data, Coding and information theory, Data compaction and compression, F.1.2 Computation by abstract devices, Modes of computation, G.2.2 Discrete mathematics, Graph theory, H.3.2 Information storage and retrieval, Information storage, File organization
Recursive Hashing Functions for n-Grams BIBAKPDF 291-320
  Jonathan D. Cohen
Many indexing, retrieval, and comparison methods are based on counting or cataloguing n-grams in streams of symbols. The fastest method of implementing such operations is through the use of hash tables. Rapid hashing of consecutive n-grams is best done using a recursive hash function, in which the hash value of the current n-gram is derived from the hash value of its predecessor. This article generalizes recursive hash functions found in the literature and proposes new methods offering superior performance. Experimental results demonstrate substantial speed improvement over conventional approaches, while retaining near-ideal hash value distribution.
Keywords: Algorithms, Experimentation, Theory, Hashing, Hashing functions, n-grams, Recursive hashing, E.2 Data, Data storage representations, Hash-table representations, G.2.1 Discrete mathematics, Combinatorics, Recurrences and difference equations, G.3 Mathematics of computing, Probability and statistics, Probabilistic algorithms H.3.1 Information storage and retrieval, Content analysis and indexing, Indexing methods, H.3.2 Information storage and retrieval, Information storage, H.3.3 Information storage and retrieval, Information search and retrieval

TOIS 1997 Volume 15 Issue 4

On Automated Message Processing in Electronic Commerce and Work Support Systems: Speech Act Theory and Expressive Felicity BIBAKPDF 321-367
  Steven O. Kimbrough; Scott A. Moore
Electronic messaging, whether in an office environment or for electronic commerce, is normally carried out in natural language, even when supported by information systems. For a variety of reasons, it would be useful if electronic messaging systems could have semantic access to, that is, access to the meanings and contents of, the messages they process. Given that natural language understanding is not a practicable alternative, there remain three approaches to delivering systems with semantic access: electronic data interchange (EDI), tagged messages, and the development of a formal language for business communication (FLBC). We favor the latter approach. In this article we compare and contrast these three approaches, present a theoretical basis for an FLBC (using speech act theory), and describe a prototype implementation.
Keywords: Languages, Theory, Electronic commerce, Formal language for business communication, Speech act theory, I.2.4 Artificial intelligence, Knowledge representation formalisms and methods, Representation languages,
A Multilevel Approach to Intelligent Information Filtering: Model, System, and Evaluation BIBAKPDF 368-399
  J. Mostafa; S. Mukhopadhyay; W. Lam; M. Palakal
In information-filtering environments, uncertainties associated with changing interests of the user and the dynamic document stream must be handled efficiently. In this article, a filtering model is proposed that decomposes the overall task into subsystem functionalities and highlights the need for multiple adaptation techniques to cope with uncertainties. A filtering system, SIFTER, has been implemented based on the model, using established techniques in information retrieval and artificial intelligence. These techniques include document representation by a vector-space model, document classification by unsupervised learning, and user modeling by reinforcement learning. The system can filter information based on content and a user's specific interests. The user's interests are automatically learned with only limited user intervention in the form of optional relevance feedback for documents. We also describe experimental studies conducted with SIFTER to filter computer and information science documents collected from the Internet and commercial database services. The experimental results demonstrate that the system performs very well in filtering documents in a realistic problem setting.
Keywords: Algorithms, Experimentation, Theory, Automated document representation, Information filtering, User modeling, H.3.3 Information storage and retrieval, Information search and retrieval, Clustering, H.3.3 Information storage and retrieval, Information search and retrieval, Selection process, I.2.6 Artificial intelligence, Learning, I.7.3 Text processing, Index generation
Proximal Nodes: A Model to Query Document Databases by Content and Structure BIBAKPDF 400-435
  Gonzalo Navarro; Ricardo Baeza-Yates
A model to query document databases by both their content and structure is presented. The goal is to obtain a query language that is expressive in practice while being efficiently implementable, features not present at the same time in previous work. The key ideas of the model are a set-oriented query language based on operations on nearby structure elements of one or more hierarchies, together with content and structural indexing and bottom-up evaluation. The model is evaluated in regard to expressiveness and efficiency, showing that it provides a good trade-off between both goals. Finally, it is shown how to include in the model other media different from text.
Keywords: Algorithms, Design, Human factors, Languages, Performance, Expressivity and efficiency of query languages, Hierarchical documents, Structured text, Text algebras, H.1.2 Information systems, User/machine systems, Human information processing, H.2.1 Database management, Logical design, Data models, H.2.2 Database management, Physical design, Access methods, H.2.3 Database management, Languages, Query languages, H.2.4 Database management, Systems, Query processing, H.3 Information storage and retrieval, I.7.2 Text processing, Document preparation, Format and notation, I.7.2 Text processing, Document preparation, Languages and systems, I.7.2 Text processing, Document preparation, Standards I.7.3 Text processing, Document preparation, Index generation,