HCI Bibliography Home | HCI Journals | About TOIS | Journal Info | TOIS Journal Volumes | Detailed Records | RefWorks | EndNote | Hide Abstracts
TOIS Tables of Contents: 091011121314151617181920212223242526272829

ACM Transactions on Information Systems 19

Editors:W. Bruce Croft
Standard No:ISSN 1046-8188; HF S548.125 A33
Links:Table of Contents
  1. TOIS 2001 Volume 19 Issue 1
  2. TOIS 2001 Volume 19 Issue 2
  3. TOIS 2001 Volume 19 Issue 3
  4. TOIS 2001 Volume 19 Issue 4

TOIS 2001 Volume 19 Issue 1

An information-theoretic approach to automatic query expansion BIBAKFull-Text 1-27
  Claudio Carpineto; Renato de Mori; Giovanni Romano; Brigitte Bigi
Techniques for automatic query expansion from top retrieved documents have shown promise for improving retrieval effectiveness on large collections; however, they often rely on an empirical ground, and there is a shortage of cross-system comparisons. Using ideas from Information Theory, we present a computationally simple and theoretically justified method for assigning scores to candidate expansion terms. Such scores are used to select and weight expansion terms within Rocchio's framework for query reweigthing. We compare ranking with information-theoretic query expansion versus ranking with other query expansion techniques, showing that the former achieves better retrieval effectiveness on several performance measures. We also discuss the effect on retrieval effectiveness of the main parameters involved in automatic query expansion, such as data sparseness, query difficulty, number of selected documents, and number of selected terms, pointing out interesting relationships.
Keywords: automatic query expansion, information retrieval, information theory, pseudorelevance feedback
A statechart-based model for hypermedia applications BIBAKFull-Text 28-52
  Maria Cristina Ferreira de Oliveira; Marcelo Augusto Santos Turine; Paulo Cesar Masiero
This paper presents a formal definition for HMBS (Hypermedia Model Based on Statecharts). HMBS uses the structure and execution semantics of statecharts to specify both the structural organization and the browsing semantics of hypermedia applications. Statecharts are an extension of finite-state machines and the model is thus a generalization of hypergraph-based hypertext models. Some of the most important features of HMBS are its ability to model hierarchy and synchronization of information; provision of mechanisms for specifying access structures, navigational contexts, access control, multiple tailored versions,and hierarchical views. Analysis of the underlying statechart machine allows verification of page reachability, valid paths, and other properties, thus providing mechanisms to support authors in the development of structured applications.
Keywords: HMBS, browsing semantics, hypermedia specification, navigational model, statecharts
Approximate spatio-temporal retrieval BIBAFull-Text 53-96
  Dimitris Papadias; Nikos Mamoulis; Vasilis Delis
This paper proposes a framework for the handling of spatio-temporal queries with inexact matches, using the concept of relation similarity. We initially describe a binary string encoding for 1D relations that permits the automatic derivation of similarity measures. We then extend this model to various granularity levels and many dimensions, and show that reasoning on spatio-temporal structure is significantly facilitated in the new framework. Finally, we provide algorithms and optimization methods for four types of queries: (i) object retrieval based on some spatio-temporal relations with respect to a reference object, (ii) spatial joins, i.e., retrieval of object pairs that satisfy some input relation, (iii) structural queries, which retrieve configurations matching a particular spatio-temporal structure, and (iv) special cases of motion queries. Considering the current large availability of multidimensional data and the increasing need for flexible query-answering mechanisms, our techniques can be used as the core of spatio-temporal query processors.

TOIS 2001 Volume 19 Issue 2

Query-based sampling of text databases BIBAKFull-Text 97-130
  Jamie Callan; Margaret Connell
The proliferation of searchable text databases on corporate networks and the Internet causes a database selection problem for many people. Algorithms such as gGLOSS and CORI can automatically select which text databases to search for a given information need, but only if given a set of resource descriptions that accurately represent the contents of each database. The existing techniques for a acquiring resource descriptions have significant limitations when used in wide-area networks controlled by many parties. This paper presents query-based sampling, a new technique for acquiring accurate resource descriptions. Query-based sampling does not require the cooperation of resource providers, nor does it require that resource providers use a particular search engine or representation technique. An extensive set of experimental results demonstrates that accurate resource descriptions are crated, that computation and communication costs are reasonable, and that the resource descriptions do in fact enable accurate automatic database selection.
Keywords: distributed information retrieval, query-based sampling, resource ranking, resource selection, server selection
SALSA: the stochastic approach for link-structure analysis BIBAKFull-Text 131-160
  R. Lempel; S. Moran
Today, when searching for information on the WWW, one usually performs a query through a term-based search engine. These engines return, as the query's result, a list of Web pages whose contents matches the query. For broad-topic queries, such searches often result in a huge set of retrieved documents, many of which are irrelevant to the user. However, much information is contained in the link-structure of the WWW. Information such as which pages are linked to others can be used to augment search algorithms. In this context, Jon Kleinberg introduced the notion of two distinct types of Web pages: hubs and authorities. Kleinberg argued that hubs and authorities exhibit a mutually reinforcing relationship: a good hub will point to many authorities, and a good authority will be pointed at by many hubs. In light of this, he devised an algorithm aimed at finding authoritative pages. We present SALSA, a new stochastic approach for link-structure analysis, which examines random walks on graphs derived from the link-structure. We show that both SALSA and Kleinberg's Mutual Reinforcement approach employ the same meta-algorithm. We then prove that SALSA is equivalent to a weighted in degree analysis of the link-structure of WWW subgraphs, making it computationally more efficient than the Mutual reinforcement approach. We compare that results of applying SALSA to the results derived through Kleinberg's approach. These comparisons reveal a topological Phenomenon called the TKC effect which, in certain cases, prevents the Mutual reinforcement approach from identifying meaningful authorities.
Keywords: Link-structure analysis, SALSA, TKC effect, hubs and authorities, random walks
Complete answer aggregates for treelike databases: a novel approach to combine querying and navigation BIBAKFull-Text 161-215
  Holger Meuss; Klaus U. Schulz
The use of markup languages like SGML, HTML or XML for encoding the structure of documents or linguistic data has lead to many databases where entries are adequately described as trees. In this context querying formalisms are interesting that offer the possibility to refer both to textual content and logical structure. We consider models where the structure specified in a query is not only used as a filter, but also for selecting and presenting different parts of the data. If answers are formalized as mapping from query nodes to the database, a simple enumeration of all mappings in the answer set will often suffer from the effect that many answers have common subparts. From a theoretical point of view this may lead to an exponential time complexity of the computation and presentation of all answers. Concentration on the language of so called tree queries -- a variant and extension of Kilpelainen's Tree Matching formalism -- we introduce the notion of a "complete answer aggregate" for a given query. This new data structure offers a compact view of the set of all answer and supports active exploration of the answer space. Since complete answer aggregates use a powerful structure-sharing mechanism their maximal size is of order &sgr;(d*h*q) where d and q respectively denote the size of the database and the query, and h is the maximal depth of a path of the database. An algorithm is given that computes a complete answer aggregate for a given tree query in time &sgr;(d*log(d)*h*). For the sublanguage of so-called rigid tree queries, as well as for so-called "nonrecursive" databases, an improved bound of :&sgr;(d*log(d)*q) is obtained. The algorithm is based on a specific index structure that supports practical efficiency.
Keywords: SGML, XML, answer presentation, information retrieval, logic, query languages, semistructured data, structured documents, tree databases, tree matching

TOIS 2001 Volume 19 Issue 3

Building a distributed full-text index for the web BIBAKFull-Text 217-241
  Sergey Melnik; Sriram Raghavan; Beverly Yang; Hector Garcia-Molina
We identify crucial design issues in building a distributed inverted index for a large collection of Web pages. We introduce a novel pipelining technique for structuring the core index-building system that substantially reduces the index construction time. We also propose a storage scheme for creating and managing inverted files using an embedded database system. We suggest and compare different strategies for collecting global statistics from distributed inverted indexes. Finally, we present performance results from experiments on a testbed distributed Web indexing system that we have implemented.
Keywords: Distributed indexing, Embedded databases, Inverted files, Pipelining, Text retrieval
Scaling question answering to the web BIBAKFull-Text 242-262
  Cody Kwok; Oren Etzioni; Daniel S. Weld
The wealth of information on the web makes it an attractive resource for seeking quick answers to simple, factual questions such as "who was the first American in space?" or "what is the second tallest mountain in the world?" Yet today's most advanced web search services (e.g., Google and AskJeeves) make it surprisingly tedious to locate answers to such questions. In this paper, we extend question-answering techniques, first studied in the information retrieval literature, to the web and experimentally evaluate their performance. First we introduce Mulder, which we believe to be the first general-purpose, fully-automated question-answering system available on the web. Second, we describe Mulder's architecture, which relies on multiple search-engine queries, natural-language parsing, and a novel voting procedure to yield reliable answers coupled with high recall. Finally, we compare Mulder's performance to that of Google and AskJeeves on questions drawn from the TREC-8 question answering track. We find that Mulder's recall is more than a factor of three higher than that of AskJeeves. In addition, we find that Google requires 6.6 times as much user effort to achieve the same level of recall as Mulder.
Keywords: answer extraction, answer selection, natural language processing, query formulation, search engines
WebQuilt: A proxy-based approach to remote web usability testing BIBAKFull-Text 263-285
  Jason I. Hong; Jeffrey Heer; Sarah Waterson; James A. Landay
WebQuilt is a web logging and visualization system that helps web design teams run usability tests (both local and remote) and analyze the collected data. Logging is done through a proxy, overcoming many of the problems with server-side and client-side logging. Captured usage traces can be aggregated and visualized in a zooming interface that shows the web pages people viewed. The visualization also shows the most common paths taken through the web site for a given task, as well as the optimal path for that task, as designated by the designer. This paper discusses the architecture of WebQuilt and describes how it can be extended for new kinds of analyses and visualizations.
Keywords: Usability evaluation, WebQuilt, log file analysis, web proxy, web visualization
On the design of a learning crawler for topical resource discovery BIBAKFull-Text 286-309
  Charu C. Aggarwal; Fatima Al-Garawi; Philip S. Yu
In recent years, the World Wide Web has shown enormous growth in size. Vast repositories of information are available on practically every possible topic. In such cases, it is valuable to perform topical resource discovery effectively. Consequently, several new ideas have been proposed in recent years; among them a key technique is focused crawling which is able to crawl particular topical portions of the World Wide Web quickly, without having to explore all web pages. In this paper, we propose the novel concept of intelligent crawling which actually learns characteristics of the linkage structure of the World Wide Web while performing the crawling. Specifically, the intelligent crawler uses the inlinking web page content, candidate URL structure, or other behaviors of the inlinking web pages or siblings in order to estimate the probability that a candidate is useful for a given crawl. This is a much more general framework than the focused crawling technique which is based on a pre-defined understanding of the topical structure of the web. The techniques discussed in this paper are applicable for crawling web pages which satisfy arbitrary user-defined predicates such as topical queries, keyword queries, or any combinations of the above. Unlike focused crawling, it is not necessary to provide representative topical examples, since the crawler can learn its way into the appropriate topic. We refer to this technique as intelligent crawling because of its adaptive nature in adjusting to the web page linkage structure. We discuss how to intelligently select features which are most useful for a given crawl. The learning crawler is capable of reusing the knowledge gained in a given crawl in order to provide more efficient crawling for closely related predicates.
Keywords: Crawling, World Wide Web
A highly scalable and effective method for metasearch BIBAKFull-Text 310-335
  Weiyi Meng; Zonghuan Wu; Clement Yu; Zhuogang Li
A metasearch engine is a system that supports unified access to multiple local search engines. Database selection is one of the main challenges in building a large-scale metasearch engine. The problem is to efficiently and accurately determine a small number of potentially useful local search engines to invoke for each user query. In order to enable accurate selection, metadata that reflect the contents of each search engine need to be collected and used. This article proposes a highly scalable and accurate database selection method. This method has several novel features. First, the metadata for representing the contents of all search engines are organized into a single integrated representative. Such a representative yields both computational efficiency and storage efficiency. Second, the new selection method is based on a theory for ranking search engines optimally. Experimental results indicate that this new method is very effective. An operational prototype system has been built based on the proposed approach.
Keywords: Database selection, distributed text retrieval, metasearch engine, resource discovery

TOIS 2001 Volume 19 Issue 4

Application of aboutness to functional benchmarking in information retrieval BIBAKFull-Text 337-370
  Kam-Fai Wong; Dawei Song; Peter Bruza; Chun-Hung Cheng
Experimental approaches are widely employed to benchmark the performance of an information retrieval (IR) system. Measurements in terms of recall and precision are computed as performance indicators. Although they are good at assessing the retrieval effectiveness of an IR system, they fail to explore deeper aspects such as its underlying functionality and explain why the system shows such performance. Recently, inductive (i.e., theoretical) evaluation of IR systems has been proposed to circumvent the controversies of the experimental methods. Several studies have adopted the inductive approach, but they mostly focus on theoretical modeling of IR properties by using some metalogic. In this article, we propose to use inductive evaluation for functional benchmarking of IR models as a complement of the traditional experiment-based performance benchmarking. We define a functional benchmark suite in two stages: the evaluation criteria based on the notion of "aboutness," and the formal evaluation methodology using the criteria. The proposed benchmark has been successfully applied to evaluate various well-known classical and logic-based IR models. The functional benchmarking results allow us to compare and analyze the functionality of the different IR models.
Keywords: Aboutness, functional benchmarking, inductive evaluation, logic-based information retrieval
Computing graphical queries over XML data BIBAKFull-Text 371-430
  Sara Comai; Ernesto Damiani; Piero Fraternali
The rapid evolution of XML from a mere data exchange format to a universal syntax for encoding domain-specific information raises the need for new query languages specifically conceived to address the characteristics of XML. Such languages should be able not only to extract information from XML documents, but also to apply powerful transformation and restructuring operators, based on a well-defined semantics. Moreover, XML queries should be natural to write and understand, as nontechnical persons also are expected to access the large XML information bases supporting their businesses. This article describes XML-GL, a graphical query language for XML data. XML-GL's uniqueness is in the definition of a graph-based syntax to express a wide variety of XML queries, ranging from simple selections to expressive data transformations involving grouping, aggregation, and arithmetic calculations. XML-GL has an operational semantics based on the notion of graph matching, which serves as a guideline both for the implementation of native processors, and for the adoption of XML-GL as a front-end to any of the XML query languages that are presently under discussion as the standard paradigm for querying XML data.
Keywords: Document restructuring, graphical query languages, semantics
Genre taxonomy: A knowledge repository of communicative actions BIBAKFull-Text 431-456
  Takeshi Yoshioka; George Herman; JoAnne Yates; Wanda Orlikowski
We propose a genre taxonomy as a knowledge repository of communicative structures or "typified actions" enacted by organizational members. The genre taxonomy is intended to help people make sense of diverse types of communicative actions and provide ideas for improving work processes that coordinate the communication of information. It engages several features to achieve this objective. First, the genre taxonomy represents the elements of both genres and genre systems as embedded in a social context reflecting the communicative questions why, what, who, when, where, and how (5W1H). In other words, the genre taxonomy represents the purpose, content, participants, timing, location, and form of communicative action. Second, the genre taxonomy distinguishes between widely recognized genres such as a report and specific genres such as a particular company's technical report, because the difference sheds light on the context of genre use. Third, the genre taxonomy represents use and evolution of a genre over time to help people understand how a genre is used and changed by a community over time. Fourth, the genre taxonomy represents aspects of information coordination via genres, thus providing ideas for improving work processes using genres. We have constructed a prototype of such a genre taxonomy using the Process Handbook, a process knowledge repository developed at MIT. We have included both widely recognized genres such as the memo and specific genres such as those used in the Process Handbook itself. We suggest that this genre taxonomy may be useful in the innovation of new document templates or methods for communication because it helps to clarify different possible uses of similar genres and explicates how genres play a coordination role among people and between people and their tasks.
Keywords: Genre, genre systems, grammar, information search and retrieval, taxonomy