HCI Bibliography Home | HCI Journals | About TOIS | Journal Info | TOIS Journal Volumes | Detailed Records | RefWorks | EndNote | Hide Abstracts
TOIS Tables of Contents: 070809101112131415161718192021222324252627

ACM Transactions on Information Systems 17

Editors:W. Bruce Croft
Standard No:ISSN 1046-8188; HF S548.125 A33
Links:Table of Contents
  1. TOIS 1999 Volume 17 Issue 1
  2. TOIS 1999 Volume 17 Issue 2
  3. TOIS 1999 Volume 17 Issue 3
  4. TOIS 1999 Volume 17 Issue 4

TOIS 1999 Volume 17 Issue 1

Predicate rewriting for translating Boolean queries in a heterogeneous information system BIBAKFull-Text 1-39
  Chen-Chuan K. Chang; Hector Garcia-Molina; Andreas Paepcke
Searching over heterogeneous information sources is difficult in part because of the nonuniform query languages. Our approach is to allow users to compose Boolean queries in one rich front-end language. For each user query and target source, we transform the user query into a subsuming query that can be supported by the source but that may return extra documents. The results are then processed by a filter query to yield the correct final results. In this article we introduce the architecture and associated mechanism for query translation. In particular, we discuss techniques for rewriting predicates in Boolean queries into native subsuming forms, which is a basis of translating complex queries. In addition, we present experimental results for evaluating the cost of postfiltering. We also discuss the drawbacks of this approach and cases when it may not be effective. We have implemented prototype versions of these mechanisms and demonstrated them on heterogeneous Boolean systems.
Keywords: Boolean queries, content-based retrieval, filtering, predicate rewriting, query subsumption, query translation
Methods for information server selection BIBAKFull-Text 40-76
  David Hawking; Paul Thistlewaite
The problem of using a broker to select a subset of available information servers in order to achieve a good trade-off between document retrieval effectiveness and cost is addressed. Server selection methods which are capable of operating in the absence of global information, and where servers have no knowledge of brokers, are investigated. A novel method using Lightweight Probe queries (LWP method) is compared with several methods based on data from past query processing, while Random and Optimal server rankings serve as controls. Methods are evaluated, using TREC data and relevance judgments, by computing ratios, both empirical and ideal, of recall and early precision for the subset versus the complete set of available servers. Estimates are also made of the best-possible performance of each of the methods. LWP and Topic Similarity methods achieved best results, each being capable of retrieving about 60% of the relevant documents for only one-third of the cost of querying all servers. Subject to the applicable cost model, the LWP method is likely to be preferred because it is suited to dynamic environments. The good results obtained with a simple automatic LWP implementation were replicated using different data and a larger set of query topics.
Keywords: Lightweight Probe queries, information servers, network servers, server ranking, server selection, text retrieval
The equalizing impact of a group support system on status differentials BIBAKFull-Text 77-100
  Bernard C. Y. Tan; Kwok-kee Wei; Richard T. Watson
This study investigates the impact of the electronic communication capability of a group support system (GSS) on status differentials in small groups. A laboratory experiment was used to answer the research questions. Three support levels were studied: manual, face-to-face GSS, and dispersed GSS. Two task types were examined: intellective and preference. Five dependent variables reflecting different aspects of status differentials were measured: status influence, sustained influence, residual disagreement, perceived influence, and decision confidence. The results show that manual groups had higher status influence, sustained influence, and decision confidence, but lower residual disagreement than face-to-face GSS and dispersed GSS groups. Preference task groups also produced higher status influence and sustained influence, but lower residual disagreement compared to intellective task groups. In addition, manual groups working on the preference task reported higher perceived influence than face-to-face GSS and dispersed GSS groups working on the same task. These findings suggest that when groups are engaged in activities for which status differentials are undesirable, a GSS can be used in both face-to-face and dispersed settings to dampen status differentials. Moreover, when a task amplifies status differentials, the use of a GSS tends to produce corresponding stronger dampening effects.
Keywords: electronic communication, group support systems, status differentials, task type

TOIS 1999 Volume 17 Issue 2

A flexible authorization mechanism for relational data management systems BIBAKFull-Text 101-140
  Elisa Bertino; Sushil Jajodia; Pierangela Samarati
In this article, we present an authorization model that can be used to express a number of discretionary access control policies for relational data management systems. The model permits both positive and negative authorizations and supports exceptions at the same time. The model is flexible in that the users can specify, for each authorization they grant, whether the authorization can allow for exceptions or whether it must be strongly obeyed. It provides authorization management for groups with exceptions at any level of the group hierarchy, and temporary suspension of authorizations. The model supports ownership together with decentralized administration of authorizations. Administrative privileges can also be restricted so that owners retain control over their tables.
Keywords: access control mechanism, access control policy, authorization, data management system, group management support, relational database
Context-sensitive learning methods for text categorization BIBAKFull-Text 141-173
  William W. Cohen; Yoram Singer
Two recently implemented machine-learning algorithms, RIPPERand sleeping-experts for phrases, are evaluated on a number of large text categorization problems. These algorithms both construct classifiers that allow the "context" of a word w to affect how (or even whether) the presence or absence of w will contribute to a classification. However, RIPPER and sleeping-experts differ radically in many other respects: differences include different notions as to what constitutes a context, different ways of combining contexts to construct a classifier, different methods to search for a combination of contexts, and different criteria as to what contexts should be included in such a combination. In spite of these differences, both RIPPER and sleeping-experts perform extremely well across a wide variety of categorization problems, generally outperforming previously applied learning methods. We view this result as a confirmation of the usefulness of classifiers that represent contextual information.
Keywords: context-sensitive models, mistake-driven algorithms, on-line learning, rule learning, text categorization
A robust framework for content-based retrieval by spatial similarity in image databases BIBAKFull-Text 174-198
  Essam A. El-Kwae; Mansur R. Kabuka
A framework for retrieving images by spatial similarity (FRISS) in ima ge databases is presented. In this framework, a robust retrieval by spatial similarity (RSS) algorithm is defined as one that incorporates both directional and topological spatial constraints, retrieves similar images, and recognized images even after they undergo translation, scaling, rotation (both perfect and multiple), or any arbitrary combination of transformations. The FRISS framework is discussed and used as a base for comparing various existing RSS algorithms. Analysis shows that none of them satisfies all the FRISS specifications. An algorithm, SIMdtc, is then presented. SIMdtc introduces the concept of a rotation correction angle (RCA) to align objects in one image spatially closer to matching objects in another image for more accurate similarity assessment. Similarity between two images is a function of the number of common objects between them and the closeness of directional and topological spatial relationships between object pairs in both images. The SIMdtc retrieval is invariant under translation, scaling, and perfect rotation, and the algorithm is able to rank multiple rotation variants. The algorithm was tested using synthetic images and the TESSA image database. Analysis shows the robustness of the SIMdtc algorithm over current algorithms.
Keywords: content-based retrieval, image databases, multimedia databases, query formulation, retrieval models, similarity retrieval, spatial similarity
Incremental formalization with the hyper-object substrate BIBAFull-Text 199-227
  Frank M., III Shipman; Raymond J. McCall
Computers require formally represented information to perform computations that support users; yet users who have needed such support have often proved to be unable or unwilling to formalize it. To address this problem, this article introduces an approach called incremental formalization, in which, first, users express information informally and then the system aids them in formalizing it. Incremental formalization requires a system architecture the (1) integrates formal and informal representations and (2) supports progressive formalization of information. The system should have both tools to capture naturally available informal information and techniques to suggest possible formalizations of this information. The hyper-object substrate (HOS) was developed to satisfy these requirements. HOS has been applied to a number of problem domains, including network design, archeological site analysis, and neuroscience education. Users have been successful in adding informal information and then later formalizing it incrementally with the aid of the system. Our experience with HOS has reaffirmed the need for information spaces to evolve during use and has identified additional considerations in the design and instantiation of systems enabling and supporting incremental formalization.

TOIS 1999 Volume 17 Issue 3

A decision-theoretic approach to database selection in networked IR BIBAKFull-Text 229-249
  Norbert Fuhr
In networked IR, a client submits a query to a broker, which is in contact with a large number of databases. In order to yield a maximum number of documents at minimum cost, the broker has to make estimates about the retrieval cost of each database, and then decide for each database whether or not to use it for the current query, and if, how many documents to retrieve from it. For this purpose, we develop a general decision-theoretic model and discuss different cost structures. Besides cost for retrieving relevant versus nonrelevant documents, we consider the following parameters for each database: expected retrieval quality, expected number of relevant documents in the database and cost factors for query processing and document delivery. For computing the overall optimum, a divide-and-conquer algorithm is given. If there are several brokers knowing different databases, a preselection of brokers can only be performed heuristically, but the computation of the optimum can be done similarly to the single-broker case. In addition, we derive a formula which estimates the number of relevant documents in a database based on dictionary information.
Keywords: networked retrieval, probabilistic retrieval, probability ranking principle, resource discovery
A corpus analysis approach for automatic query expansion and its extension to multiple databases BIBAKFull-Text 250-269
  Susan Gauch; Jianying Wang; Satya Mahesh Rachakonda
Searching online text collections can be both rewarding and frustrating. While valuable information can be found, typically many irrelevant documents are also retrieved, while many relevant ones are missed. Terminology mismatches between the user's query and document contents are a main cause of retrieval failures. Expanding a user's query with related words can improve search performances, but finding and using related words is an open problem. This research uses corpus analysis techniques to automatically discover similar words directly from the contents of the databases which are not tagged with part-of-speech labels. Using these similarities, user queries are automatically expanded, resulting in conceptual retrieval rather than requiring exact word matches between queries and documents. We are able to achieve a 7.6% improvement for TREC 5 queries and up to a 28.5% improvement on the narrow-domain Cystic Fibrosis collection. This work has been extended to multidatabase collections where each subdatabase has a collection-specific similarity matrix associated with it. If the best matrix is selected, substantial search improvements are possible. Various techniques to select the appropriate matrix for a particular query are analyzed, and a 4.8% improvement in the results is validated.
Keywords: query expansion
Context interchange: new features and formalisms for the intelligent integration of information BIBAKFull-Text 270-293
  Cheng Hian Goh; Stefane Bressan; Stuart Madnick; Michael Siegel
The Context Interchange strategy presents a novel perspective for mediated data access in which semantic conflicts among heterogeneous systems are not identified a priori, but are detected and reconciled by a context mediator through comparison of contexts axioms corresponding to the systems engaged in data exchange. In this article, we show that queries formulated on shared views, export schema, and shared "ontologies" can be mediated in the same way using the Context Interchange framework. The proposed framework provides a logic-based object-oriented formalsim for representing and reasoning about data semantics in disparate systems, and has been validated in a prototype implementation providing mediated data access to both traditional and web-based information sources.
Keywords: abductive reasoning, information integration, mediators, semantic heterogeneity, semantic interoperability
Harp: a distributed query system for legacy public libraries and structured databases BIBAKFull-Text 294-319
  Ee-Peng Lim; Ying Lu
The main purpose of a digital library is to facilitate users easy access to enormous amount of globally networked information. Typically, this information includes preexisting public library catalog data, digitized document collections, and other databases. In this article, we describe the distributed query system of a digital library prototype system known as HARP. In the HARP project, we have designed and implemented a distributed query processor and its query front-end to support integrated queries to preexisting public library catalogs and structured databases. This article describes our experiences in the design of an extended Sequel (SQL) query language known as HarpSQL. It also presents the design and implementation of the distributed query system. Our experience in distributed query processor and user interface design and development will be highlighted. We believe that our prototyping effort will provide useful lessons to the development of a complete digital library infrastructure.
Keywords: Internet databases, digital libraries, interoperable databases
Interface and data architecture for query preview in networked information systems BIBAKFull-Text 320-341
  Catherine Plaisant; Ben Shneiderman; Khoa Doan; Tom Bruns
There are numerous problems associated with formulating queries on networked information systems. These include increased data volume and complexity, accompanied by slow network access. This article proposes a new approach to a network query user interfaces that consists of two phases: query preview and query refinement. This new approach is based on the concepts of dynamic queries and query previews, which guides users in rapidly and dynamically eliminating undesired records, reducing the data volume to a manageable size, and refining queries locally before submission over a network. Examples of two applications are given: a Restaurant Finder and a prototype for NASA's Earth Observing Systems Data Information Systems (EOSDIS). Data architecture is discussed, and user feedback is presented.
Keywords: EOSDIS, direct manipulation, dynamic query, graphical user interface, query preview, query refinement, science data

TOIS 1999 Volume 17 Issue 4

Integrating geometrical and linguistic analysis for email signature block parsing BIBAKFull-Text 343-366
  Hao Chen; Jianying Hu; Richard W. Sproat
The signature block is a common structured component found in email messages. Accurate identification and analysis of signature blocks is important in many multimedia messaging and information retrieval applications such as email text-to-speech rendering, automatic construction of personal address databases, and interactive message retrieval. It is also a very challenging task, because signature blocks often appear in complex two-dimensional layouts which are guided only by loose conventions. Traditional text analysis methods designed to deal with sequential text cannot handle two-dimensional structures, while the highly unconstrained nature of signature blocks makes the application of two-dimensional grammars very difficult. In this article, we describe an algorithm for signature block analysis which combines two-dimensional structural segmentation with one-dimensional grammatical constraints. The information obtained from both layout and linguistic analysis is integrated in the form of weighted finite-state transducers. The algorithm is currently implemented as a component in a preprocessing system for email text-to-speech rendering.
Keywords: email signature block, finite-state transducer, geometrical analysis, linguistic analysis, text-to-speech rendering
PIC matrices: a computationally tractable class of probabilistic query operators BIBAKFull-Text 367-405
  Warren R. Greiff; W. Bruce Croft; Howard Turtle
The inference network model of information retrieval allows a probabilistic interpretation of query operators. In particular, Boolean query operators are conveniently modeled as link matrices of the Bayesian Network. Prior work has shown, however, that these operators do not perform as well as the pnorm operators used for modeling query operators in the context of the vector space model. This motivates the search for alternative probabilistic formulations for these operators. The design of such alternatives must contend with the issue of computational tractability, since the evaluation of an arbitrary operator requires exponential time. We define a flexible class of link matrices that are natural candidates for the implementation of query operators and an O(n2) algorithm (n = the number of parent nodes) for the computation of probabilities involving link matrices of this class. We present experimental results indicating that Boolean operators implemented in terms of link matrices from this class perform as well as pnorm operators in the context of the INQUERY inference network.
Keywords: Bayesian networks, Boolean queries, computational complexity, inference networks, link matrices, piecewise linear functions, pnorm, probabilistic information retrieval, query operators
Efficient passage ranking for document databases BIBAKFull-Text 406-439
  Marcin Kaszkiel; Justin Zobel; Ron Sacks-Davis
Queries to text collections are resolved by ranking the documents in the collection and returning the highest-scoring documents to the user. An alternative retrieval method is to rank passages, that is, short fragments of documents, a strategy that can improve effectiveness and identify relevant material in documents that are too large for users to consider as a whole. However, ranking of passages can considerably increase retrieval costs. In this article we explore alternative query evaluation techniques, and develop new techniques for evaluating queries on passages. We show experimentally that, appropriately implemented, effective passage retrieval is practical in limited memory on a desktop machine. Compared to passage ranking with adaptations of current document ranking algorithms, our new "DO-TOS" passage-ranking algorithm requires only a fraction of the resources, at the cost of a small loss of effectiveness.
Keywords: inverted files, passage retrieval, query evaluation, text databases, text retrieval
The impact on retrieval effectiveness of skewed frequency distributions BIBAKFull-Text 440-465
  Mark Sanderson; C. J. Van Rijsbergen
We present an analysis of word senses that provides a fresh insight into the impact of word ambiguity on retrieval effectiveness with potential broader implications for other processes of information retrieval. Using a methodology of forming artifically ambiguous words, known as pseudowords, and through reference to other researchers' work, the analysis illustrates that the distribution of the frequency of occurrance of the senses of a word plays a strong role in ambiguity's impact of effectiveness. Further investigation shows that this analysis may also be applicable to other processes of retrieval, such as Cross Language Information Retrieval, query expansion, retrieval of OCR'ed texts, and stemming. The analysis appears to provide a means of explaining, at least in part, reasons for the processes' impact (or lack of it) on effectiveness.
Keywords: pseudowords, word sense ambiguity, word sense disambiguation