DL'98: Proceedings of the 3rd ACM International Conference on Digital Libraries

Fullname:3rd ACM International Conference on Digital Libraries
Editors:Ian Witten; Rob Akscyn; Frank M. Shipman, III
Location:Pittsburgh, PA
Dates:1998-Jun-23 to 1998-Jun-26
Publisher:ACM
Standard No:ACM ISBN 0-89791-965-3; ACM Order Number 606981
Papers:49
Pages:266
Information Space Representation in Interactive Systems: Relationship to Spatial Abilities 1-10
  Bryce Allen
Digital libraries, lacking a natural spatial organization, may adopt a variety of mechanisms for visualizing information in two or more dimensions. Understanding the cognitive basis for the use of spatial features in information retrieval, including spatial abilities, is important to the development of interactive information retrieval. This research investigated the interaction of spatial abilities with two-dimensional data representations in an experimental interactive system. The results showed that users with lower levels of spatial abilities were assisted in finding and interpreting digital information when spatial representations of information were employed. These results have implications for the design of digital libraries.
Keywords: Data visualization, Cognitive abilities, Spatial representations
Comparing Feature-Based and Clique-Based User Models for Movie Selection 11-18
  Joshua Alspector; Aleksander Kolcz; Nachimuthu Karunanithi
The huge amount of information available in the currently evolving world wide information infrastructure at any one time can easily overwhelm end-users. One way to address the information explosion is to use an "information filtering agent" which can select information according to the interest and/or need of an end-user. However, at present few such information filtering agents exist. In this study, we evaluate the use of feature-based approaches to user modeling with the purpose of creating a filtering agent for the video-on-demand application. We evaluate several feature-based and clique-based models for 10 voluntary subjects who provided ratings for the movies. Our preliminary results suggest that feature-based selection can be a useful tool to recommend movies according to the taste of the user and can be as effective as a movie rating expert. We compare our feature-based approach with a clique-based approach, which has advantages where information from other users is available.
Keywords: User modeling, Information filtering, Collaborative filtering, Feature extraction, Neural networks, Linear models, Regression trees, Bagging, CART
An Extensible Constructor Tool for the Rapid, Interactive Design of Query Synthesizers 19-28
  Michelle Baldonado; Seth Katz; Andreas Paepcke; Chen-Chuan K. Chang; Hector Garcia-Molina; Terry Winograd
We describe an extensible constructor tool that helps information experts (e.g., librarians) create specialized query synthesizers for heterogeneous digital-library environments. A query synthesizer produces a graphical user interface in which a digital-library patron can specify a high-level, fielded, multi-source query. Furthermore, a query synthesizer interacts with a query translator and an attribute translator to transform high-level queries into sets of source-specific queries. In this paper, we discuss how our tool for constructing synthesizers can facilitate the discovery of available attributes (e.g., 'title'), the collation of schemas from different sources, the selection of input widgets for a synthesizer (e.g., a drop-down list widget to support input of controlled vocabulary), and other design aspects. We also describe the user interface of our prototype constructor, which is implemented based on the Stanford InfoBus and metadata architecture.
Keywords: Constructor tool, Query synthesizer, Regional schema, Query generation, Query translation, Attribute translation, Metadata architecture, Schema
Digital Libraries and Knowledge Disaggregation: The Use of Journal Article Components 29-39
  Ann Peterson Bishop
A scientific journal article is comprised of standard components, such as author names, an abstract, figures, a bibliography, and sections describing methods and results. With the creation of digital documents and new tools for manipulating them comes the ability to facilitate the disaggregation of journal articles into separate components. This paper describes how article components are identified, mobilized, and used by students and faculty members, based on the preliminary analysis of data collected through focus groups, workplace interviews, transaction logging, and usability testing associated with the University of Illinois Digital Libraries Initiative project. The paper presents a schema of component use purposes, discusses the intellectual and physical processes of component use, identifies several issues and implications for digital library design, and highlights the need for multiple methods in studying document disaggregation.
Keywords: User Studies, Documents, Information seeking and use
Technologies for Repository Interoperation and Access Control 40-48
  Shirley Browne; Jack Dongarra; Jeff Horner; Paul McMahan; Scott Wells
Over the past several years, network-accessible repositories have been developed by various academic, government, and industrial organizations to provide access to software and related resources. Allowing distributed maintenance of these repositories while enabling users to access resources from multiple repositories via a single interface has brought about the need for interoperation. Concerns about intellectual property rights and export regulations have brought about the need for access control. This paper describes technologies for interoperation and access control that have been developed as part of the National High-performance Software Exchange (NHSE) project, as well as their deployment in a freely available repository maintainer's toolkit called Repository in a Box. The approach to interoperation has been to participate in the development of and to implement an IEEE standard data model for software catalog records. The approach to access control has been to extend the data model in the area of intellectual property rights and to implement access control mechanisms of varying strengths, ranging from email address verification to X.509 certificates, that enforce software distribution policies specified via the data model. Although they have been developed within the context of software repositories, these technologies should be applicable to distributed digital libraries in general.
Keywords: Interoperation, Intellectual property rights, Access control, Authentication, Software licensing, Data modeling, Software reuse, Standards
Conjunctive Constraint Mapping for Data Translation 49-58
  Chen-Chuan K. Chang; Hector Garcia-Molina
In this paper we present a mechanism for translating information in heterogeneous digital library environments. We model information as a set of conjunctive constraints that are satisfied by real-world objects (e.g., documents, their metadata). Through application of semantic rules and value transformation functions, constraints are mapped into ones understood and supported in another context. Our machinery can also deal with hierarchically structured information.
Keywords: Constraint mapping, Data translation, Semantic interoperability
Automatic Subject Indexing Using an Associative Neural Network 59-68
  Yi-Ming Chung; William M. Pottenger; Bruce R. Schatz
The global growth in popularity of the World Wide Web has been enabled in part by the availability of browser-based search tools, which in turn have led to an increased demand for indexing techniques and technologies. As the amount of globally accessible information in community repositories grows, it is no longer cost-effective for such repositories to be indexed by professional indexers who have been trained to be consistent in subject assignment from controlled vocabulary lists. The era of amateur indexers is thus upon us, and the information infrastructure needs to provide support for such indexing if search of the Net is to produce useful results.
   In this paper, we propose the ConceptAssigner, an automatic subject indexing system based on a variant of the Hopfield network [13]. In the application discussed herein, a collection of documents is used to automatically create a subset of a thesaurus termed a Concept Space [4]. To automatically index an individual document, concepts extracted from the given document become the input pattern to a Concept Space represented as a Hopfield network. The Hopfield net parallel spreading activation process produces another set of concepts that are strongly related to the concepts of the input document. Such concepts are suitable for use in an interactive indexing environment.
   A prototype of our automatic subject indexing system has been implemented as part of the Interspace, a semantic indexing and retrieval environment which supports statistically-based semantic indexing in a persistent object environment.
Keywords: Automatic indexing, Semantic indexing, Semantic retrieval, Automatic subject assignment, Amateur indexing, Concept Space, Information retrieval, Interspace, Semantic locality
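The spreading activation step described in the abstract above can be illustrated with a small sketch. This is not the ConceptAssigner implementation: the abstract only says the system uses "a variant of the Hopfield network", so the sigmoid update rule, the `theta` and `steepness` parameters, the fixed iteration count, and the function name `spread_activation` are all assumptions chosen for illustration.

```python
import math


def spread_activation(weights, seeds, iterations=10, theta=0.5, steepness=5.0):
    """Return non-seed concepts ranked by activation after spreading from `seeds`.

    weights: dict mapping concept -> {neighbour: association weight in [0, 1]}
             (a hypothetical stand-in for a Concept Space)
    seeds:   set of concepts extracted from the document to be indexed
    """
    concepts = set(weights)
    for nbrs in weights.values():
        concepts.update(nbrs)
    concepts.update(seeds)
    activation = {c: (1.0 if c in seeds else 0.0) for c in concepts}

    for _ in range(iterations):
        updated = {}
        for c in concepts:
            if c in seeds:
                updated[c] = 1.0  # input concepts stay fully active
                continue
            # Weighted sum of the activation flowing in from every source concept.
            net = sum(activation[src] * nbrs.get(c, 0.0)
                      for src, nbrs in weights.items())
            # Sigmoid squashing keeps activations in (0, 1); assumed form.
            updated[c] = 1.0 / (1.0 + math.exp(-steepness * (net - theta)))
        # All nodes are replaced together, i.e. a synchronous (parallel) update.
        activation = updated

    suggested = [c for c in concepts if c not in seeds]
    return sorted(suggested, key=lambda c: activation[c], reverse=True)
```

Concepts at the head of the returned list are the ones most strongly associated with the input concepts, which is the kind of candidate list an indexer could review interactively.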
Archival Storage for Digital Libraries 69-78
  Arturo Crespo; Hector Garcia-Molina
We propose an architecture for Digital Library Repositories that assures long-term archival storage of digital objects. The architecture is formed by a federation of independent but collaborating sites, each managing a collection of digital objects. The architecture is based on the following key components: use of signatures as object handles, no deletions of digital objects, functional layering of services, the presence of an awareness service in all layers, and use of disposable auxiliary structures. Long-term persistence of digital objects is achieved by creating replicas at several sites.
Keywords: Digital library repository, Archival storage, Long-term preservation of data
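Three of the components named in the abstract above (signatures as object handles, no deletions, and replication across sites) can be illustrated with a content-addressed store. This is a minimal sketch rather than the authors' architecture: the class name `ArchiveSite`, the `replicate_to` helper, and the choice of SHA-256 as the signature function are assumptions, since the abstract does not specify them.

```python
import hashlib


class ArchiveSite:
    """Hypothetical append-only object store keyed by content signature."""

    def __init__(self, name):
        self.name = name
        self._objects = {}  # signature (hex string) -> bytes

    def put(self, data: bytes) -> str:
        # The handle is derived from the content itself (SHA-256 is an
        # assumption; the abstract does not name a signature function).
        handle = hashlib.sha256(data).hexdigest()
        # Storing the same bytes twice is a no-op; nothing is overwritten.
        self._objects.setdefault(handle, data)
        return handle

    def get(self, handle: str) -> bytes:
        data = self._objects[handle]
        # Verify on read that the stored bytes still match the handle.
        if hashlib.sha256(data).hexdigest() != handle:
            raise ValueError("stored object does not match its signature")
        return data

    def handles(self):
        return set(self._objects)

    def replicate_to(self, other: "ArchiveSite") -> int:
        """Copy objects the other site lacks; return how many were copied."""
        missing = self.handles() - other.handles()
        for handle in missing:
            other.put(self._objects[handle])
        return len(missing)
```

The class deliberately has no delete method, and because a handle depends only on the bytes, two sites that store the same object independently arrive at the same handle without coordinating.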
Considerations for Information Environments and the NaviQue Workspace 79-88
  George W. Furnas; Samuel J. Rauch
This paper presents design considerations for the construction of advanced information environments, and a prototype interface that attempts to respond to them. The design considerations came from task analyses of information gathering activities, from changes in the global information environment, and from advances in human-computer interaction. These led to a number of desired design properties that are guiding our prototyping efforts, including the system, NaviQue, detailed here. It is a visually rich environment for information gathering and organizing, based on a navigable, fractal structure of information, ubiquitous queriability, lightweight interaction with ad hoc sets, and information visualization. The resulting interaction paradigm smoothly integrates more than a half dozen synergies between querying, navigation and organization.
Keywords: Digital library, Multiscale worlds, Query, Navigation, Browsing, Search, Information visualization, Information gathering environments
CiteSeer: An Automatic Citation Indexing System 89-98
  C. Lee Giles; Kurt D. Bollacker; Steve Lawrence
We present CiteSeer: an autonomous citation indexing system which indexes academic literature in electronic format (e.g. Postscript files on the Web). CiteSeer understands how to parse citations, identify citations to the same paper in different formats, and identify the context of citations in the body of articles. CiteSeer provides most of the advantages of traditional (manually constructed) citation indexes (e.g. the ISI citation indexes), including: literature retrieval by following citation links (e.g. by providing a list of papers that cite a given paper), the evaluation and ranking of papers, authors, journals, etc. based on the number of citations, and the identification of research trends. CiteSeer has many advantages over traditional citation indexes, including the ability to create more up-to-date databases which are not limited to a preselected set of journals or restricted by journal publication delays, completely autonomous operation with a corresponding reduction in cost, and powerful interactive browsing of the literature using the context of citations. Given a particular paper of interest, CiteSeer can display the context of how the paper is cited in subsequent publications. This context may contain a brief summary of the paper, another author's response to the paper, or subsequent work which builds upon the original article. CiteSeer allows the location of papers by keyword search or by citation links. Papers related to a given paper can be located using common citation information or word vector similarity. CiteSeer will soon be available for public use.
Keywords: Citation indexing, Citation context, Literature search, Bibliometrics
Page and Link Classifications: Connecting Diverse Resources 99-107
  Stephanie W. Haas; Erika S. Grams
As digital libraries of all kinds increase in size and scope, they contain more and more diverse information objects. The value of any collection is drawn in part from an understanding of what is there and what relationships exist between items. We believe that classification systems for World Wide Web pages and links, and by extension for any diverse digital library, will be most effective if they are developed in tandem. Therefore, we propose integrated classification systems for Web pages and links which are based on a content analysis of 75 source pages, the almost 1,500 links they contained, and the target pages to which the links led. The consistency with which we were able to classify pages and links bodes well for the possibilities of automatic classification. The slightly lower level of consistency of the link classifications emphasizes the importance of considering context and user expectations in specifying anchors. We conclude by raising important questions about how best to design and link together diverse resources such as those found on the Web or in a digital library.
Keywords: Web page classification, Link types, Retrieval, Content analysis, Style recommendations
Axis-Specified Search: A Fine-Grained Full-Text Search Method for Gathering and Structuring Excerpts 108-117
  Yasusi Kanada
A text search method, which is called an axis-specified search method, is proposed. This method is suitable for full-text searches of a large-scale text collection. In this method, in addition to specifying search strings, the user selects an axis from a predefined set. The system outputs excerpts and hyperlinks that are ordered along the axis. The search strings express the specific subject of the search, and the axis specifies a general-purpose method of ordering results. Short sub-topics, which cannot be easily caught by statistical methods, are effectively gathered from the text collection. The user can get satisfactory results using a simple search string. Even if the number of results is very large, the user can easily survey them, because they are well structured. This method has been applied to an electronic encyclopedia and a newspaper database. In these applications, distributed descriptions that were related to each other could be gathered, and the user could discover their relationships from the results. For example, by specifying "semiconductor" for a search string and "year" for an axis, a table listing seven decades of semiconductor-related topics sorted by year was generated from newspaper issues published over a single year. By specifying "basin" for a search string and "area" (m²) for an axis, descriptions of the world's largest rivers were extracted from the encyclopedia and sorted according to their basin areas.
Keywords: Information retrieval, Full-text search, Information extraction, Information gathering, Document classification, Electronic encyclopedia, Newspaper database
Key Frame Preview Techniques for Video Browsing 118-125
  Anita Komlodi; Gary Marchionini
Digitized video is an important format in digital libraries. Browsing video surrogates saves user time and storage capacity and avoids unnecessary downloading of large files. The study presented in this paper compared dynamic and static presentation techniques for key frames extracted from video documents. For this study key frames were automatically extracted and then a subset was manually selected to best represent the document. The three interface designs used were: 4 key frame static storyboard display, 12 key frame static storyboard display and 12 key frame dynamic slideshow display. The key frames in all displays were shown in temporal order. User performance on object identification, action identification, gist comprehension, and selection tasks was compared across treatments. Examination time and user satisfaction were also measured. Static storyboard displays proved to support object identification better, while other user performance measures showed no statistically significant differences. Using fewer key frames in static displays saved a considerable amount of user time and screen real estate, and user performance on gist comprehension and selection did not decrease when key frames were carefully selected to support queries. Implications for interface design and further research are discussed.
Keywords: Video browsing, Interface design, Key frames, User testing
Metadata Visualization for Digital Libraries: Interactive Timeline Editing and Review 126-133
  Vijay Kumar; Richard Furuta; Robert B. Allen
Interactive Timeline Editing and Review (ITER), a general framework for modeling and presenting temporal information, is described. In addition, the tmViewer interface is described for viewing temporal and other metadata. ITER and tmViewer go beyond previous electronic timeline displays in treating timelines as hypertexts and structured documents, and allowing interactive display of the metadata in addition to the events. The use of the tool is described for exploring bibliographic records, such as search hits from the book database available at amazon.com, and for the presentation of timelines.
Keywords: History, Interactivity, Metadata, Taxonomy, Timelines, tmViewer, Visualization
Making Global Digital Libraries Work: Collection Services, Connectivity Regions, and Collection Views 134-143
  Carl Lagoze; David Fielding; Sandra Payette
There are many technical challenges in designing the architecture of globally-distributed, federated digital libraries. This paper focuses on the problem of global resource discovery and describes a service architecture and server topology for improving the performance and reliability of that process. The technique described is based on three concepts. Connectivity regions are groups of sites with relatively good network connectivity. Collection services provide the necessary meta-information so that a group of digital library servers can interoperate as a collection. Collection views represent the configuration of the collection that conforms to connectivity regions. The work that is described here is based on experience with the NCSTRL international digital library of computer science research and is implemented as part of the Dienst architecture upon which NCSTRL is based.
Keywords: Digital library architecture, Distributed searching, Case studies
An Integrated Reading and Editing Environment for Scholarly Research on Literary Works and their Handwritten Sources 144-151
  E. Lecolinet; L. Likforman-Sulem; L. Robert; F. Role; J-L. Lebrave
We present an integrated system devoted to the visualization and the editing of hypermedia documents from literary material including document images and structured text. First, capabilities are offered to transcribe manuscript images. Transcribing the text consists in coupling lines typed on the keyboard with their corresponding text lines in the manuscript images. A semi-automatic system based on computer-human interaction and document analysis is proposed for performing this task. This system provides editing capabilities for linking document images and the corresponding structured textual representations (encoded by means of a logical markup language). Finally, application-specific visualization tools have been developed in order to provide users with an idea of the overall organization of the hyperdocument and help them to navigate.
Keywords: Hypermedia, Reading/editing environment, Text / image coupling, Image analysis, Text encoding, Information visualization
Heroic Measures: Reflections on the Possibility and Purpose of Digital Preservation 152-161
  David M. Levy
Preserving digital information is a difficult and poorly understood problem. The current inability to accomplish digital preservation is a major impediment to the adoption of digital forms on a grand scale. Recognizing this, in December, 1994 the Commission on Preservation and Access (CPA) and the Research Libraries Group (RLG) convened a Task Force on Digital Archiving. In May, 1996 the Task Force issued its report, entitled "Preserving Digital Information: Report of the Task Force on Archiving of Digital Information." This paper has been written in the context of, and partly as a response to, the Task Force report. It analyzes migration as a preservation strategy and, as a counterweight to an overly narrow technical focus on "the digital object," it argues that use considerations -- the purposes for which the digital materials being preserved are to be used -- must figure centrally in any preservation strategy.
Keywords: Digital preservation, Digital documents, Use, Archiving
Making Metadata: A Study of Metadata Creation for a Mixed Physical-Digital Collection 162-171
  Catherine C. Marshall
Metadata is an important way of creating order in emerging distributed digital library collections. This paper presents an analysis of ethnographic data gathered in a university library's educational technology center as the staff develops metadata for a mixed physical-digital collection of visual resources. In particular, the paper explores issues associated with the application of standards, uncertain collection and metadata boundaries, distribution and responsibility, the types of description that arise in practice, and metadata temporality and scope. These issues help to characterize a problem space, and to explore the trade-offs collection maintainers must face when they create metadata for heterogeneous materials.
Keywords: Metadata, Digital library, Ethnographic study, Mixed physical-digital collections, Visual resources, Local knowledge
Beyond SGML 172-181
  Roger Price
The International Standard for the Standard Generalized Markup Language (SGML) published in 1986 is now seen as a mature language for expressing document structure and is accepted as the basis for major projects such as the Text Encoding Initiative and important hypertext languages such as HTML and XML. The historical origin of SGML as a technique for adding marks to texts has left a legacy of complexities and difficulties which hinder its wide acceptance. A key difficulty is the dual role that SGML documents currently play: they are both a representation for interchange and a human readable presentation. We examine possible document markup techniques in a post-SGML 86 world with emphasis on the framework architecture. The novel ideas include the generalization of the notion of a "character" to a much broader token which is strongly typed to differentiate text, markup, images and other component types.
Keywords: SGML, Document architectures, Document, Views
Query Performance for Tightly Coupled Distributed Digital Libraries 182-190
  Berthier A. Ribeiro-Neto; Ramurti A. Barbosa
We consider a digital library distributed in a tightly coupled environment. The library is indexed by inverted files and the vector space model is used as the ranking strategy. Using a simple analytical model coupled with a small simulator, we study how query performance is affected by the index organization, the network speed, and the disk transfer rate. Our results, which are based on the Tipster/Trec3 collection, indicate that a global index organization might outperform a local index organization.
Keywords: Digital library, Distributed, Query performance
Practical Application of Existing Hypermedia Standards and Tools 191-199
  Lloyd Rutledge; Jacco van Ossenbruggen; Lynda Hardman; Dick C. A. Bulterman
In order for multimedia presentations to be stored, accessed and played from a large library they should not be encoded as final form presentations, since these consume storage space and cannot easily be adapted to variations in presentation-time circumstances such as user characteristics and changes in end-user technology. Instead, a more presentation-independent approach needs to be taken that allows the generation of multiple versions of a presentation based on a presentation-independent description.
   In order for such a generated presentation to be widely viewable, it must be in a format that is widely implemented and adopted. Such a format for hypermedia presentations does not yet exist. However, the recent release of SMIL, whose creation and promotion is managed by the World Wide Web Consortium, promises to become such a format in the short term and be for hypermedia what HTML is for hypertext.
   The technology for enabling this presentation-independent approach is already available, but requires the use of large and unapproachable standards, such as DSSSL and HyTime. In this paper we show that these two standards can be used with SMIL, and by concentrating on a particular application, illustrate the use of publicly available tools to support the generation of multiple presentations from a single presentation-independent source.
Keywords: Hypermedia, HyTime, DSSSL, SMIL, SP, Jade, Berlage, GRiNS
SONIA: A Service for Organizing Networked Information Autonomously 200-209
  Mehran Sahami; Salim Yusufali; Michelle Q. W. Baldonado
The recent explosion of on-line information in Digital Libraries and on the World Wide Web has given rise to a number of query-based search engines and manually constructed topical hierarchies. However, these tools are quickly becoming inadequate as query results grow incomprehensibly large and manual classification in topic hierarchies creates an immense bottleneck. We address these problems with a system for topical information space navigation that combines the query-based and taxonomic systems. We employ machine learning techniques to create dynamic document categorizations based on the full text of articles that are retrieved in response to users' queries. Our system, named SONIA (Service for Organizing Networked Information Autonomously), has been implemented as part of the Stanford Digital Libraries Testbed. It employs a combination of technologies that takes the results of queries to networked information sources and, in real time, automatically retrieves, parses and organizes these documents into coherent categories for presentation to the user. Moreover, the system can then save such document organizations in user profiles which can then be used to help classify future query results by the same user. SONIA uses a multi-tier approach to extracting relevant terms from documents as well as statistical clustering methods to determine potential topics within a document collection. It also makes use of Bayesian classification techniques to classify new documents within an existing categorization scheme. In this way, it allows users to navigate the results of a query at a more topical level rather than having to examine each document text separately.
Keywords: Clustering, Classification, Feature selection, Distributed information
An Agent-Based Approach to the Construction of Floristic Digital Libraries 210-216
  J. Alfredo Sanchez; Cristina A. Lopez; John L. Schnase
This paper describes an agent-assisted approach to the construction of floristic digital libraries, which consist of very large botanical data repositories and related services. We propose an environment, termed Chrysalis, in which authors of plant morphologic descriptions can enter data into a digital library via a web-based editor. An agent that runs concurrently with the editor suggests potentially useful morphologic descriptions based on similar documents existing in the library. Benefits derived from the introduction of Chrysalis include reduced potential for errors and data inconsistencies, increased parallelism among descriptions, and considerable savings in the time regularly spent in visually checking for parallelism and manually editing data.
Keywords: Agents, Agent-based interfaces, Floristic digital libraries, FNA, Chrysalis
Digital Library Information Appliances 217-226
  Bill N. Schilit; Morgan N. Price; Gene Golovchinsky
Although digital libraries are intended to support education and knowledge work, current digital library interfaces are narrowly focused on retrieval. Furthermore, they are designed for desktop computers with keyboards, mice, and high-speed network connections. Desktop computers fail to support many key aspects of knowledge work, including active reading, free form ink annotation, fluid movement among document activities, and physical mobility. This paper proposes portable computers specialized for knowledge work, or digital library information appliances, as a new platform for accessing digital libraries. We present a number of ways that knowledge work can be augmented and transformed by the use of such appliances. These insights are based on our implementation of two research prototype systems: XLibris, an "active reading machine," and TeleWeb, a mobile World Wide Web browser.
Keywords: Digital library, Information appliance, Paper document metaphor, Active reading, Browsing, Information exploration, Digital ink, Pen computing, Mobile Web browser, Mobile computing, Network computer
Herbarium Specimen Browser: A Tool for Accessing Botanical Specimen Collections 227-234
  Erich R. Schneider; John J. Leggett; Richard K. Furuta; Hugh D. Wilson; Stephan L. Hatch
For several years the Texas A&M Bioinformatics Working Group has pursued the construction of a novel digital library resource, an electronic adaptation of the information in the S.M. Tracy Herbarium, a major collection of preserved plants. This paper describes a tool we have developed for panoramically surveying the contents of the collection: the Herbarium Specimen Browser. While some of the Specimen Browser's implementation details (particularly its unconventional use of a full-text retrieval system to store its database, and its specialized mapping software) are of general interest, it also exhibits properties which designers of similar digital library access systems may find worth considering: support for pattern discovery, use of regularity in hypertext link sources and destinations, and employment of Javascript as an interface simplification mechanism.
Keywords: Browsing, Pattern discovery, Mapping, Full-text retrieval, WWW, Botanical collections
BUS: An Effective Indexing and Retrieval Scheme in Structured Documents 235-243
  Dongwook Shin; Hyuncheol Jang; Honglan Jin
In recent digital library systems and World Wide Web environments, many documents are beginning to be provided in a structured format, tagged in markup languages like SGML or XML. Hence, indexing and query evaluation of structured documents have been drawing attention, since they make it easy to access and retrieve a specific part of a document. However, conventional information retrieval techniques do not scale up well in structured documents.
   This paper suggests an efficient indexing and query evaluation scheme for structured documents (named BUS) that minimizes the indexing overhead and guarantees fast query processing at any level in the document structure. The basic idea is that indexing is performed at the lowest level of the given structure and query evaluation computes the similarity at higher levels by accumulating the term frequencies at the lowest level in a bottom-up way. The accumulators that compute the similarity sum all the term frequencies of the related parts at a given level.
   This paper also addresses the implementation of BUS and proves that BUS works correctly. In addition, along with several experiments, it shows that BUS facilitates efficient indexing in terms of space and time and guarantees reasonable retrieval time in response to user queries.
Keywords: Structured documents, SGML, XML, Information retrieval, Indexing, Accumulator
Global Digital Museum: Multimedia Information Access and Creation on the Internet BIBAKPDF 244-253
  Junichi Takahashi; Takayuki Kushida; Jung-Kook Hong; Shigeharu Sugita; Yasuyuki Kurita; Robert Rieger; Wendy Martin; Geri Gay; John Reeve; Rowena Loverance
Multimedia information access on the Internet creates a new paradigm for museum information and education service that complements conventional school programs. We designed and developed the Global Digital Museum to permit easy access to the cultural heritage stored in museums around the world. The system provides a single virtual museum, enabling global search and editing of museum contents on the Internet. We applied the Global Digital Museum model to K-12 museum education by using real museum multimedia data. Technical issues addressed include: 1) unified and global access to heterogeneous and distributed multimedia contents of museums; and 2) interactive editing of the contents on the World-Wide Web. We describe the concept of the Global Digital Museum, the system and network architecture, the data model for museum information, and the implementation of a prototype system on the Internet.
Keywords: Digital museum, Internet, World-Wide Web, Global search, Museum education
Ontology-Based Metadata: Transforming the MARC Legacy BIBAKPDF 254-263
  Peter C. Weinstein
We propose a new catalog based on a formal ontological model of bibliographic relations. A hierarchy of five central concepts describes the creation of work. Each kind of relation between works occurs at a particular level in the hierarchy. Related works share data at some level of the hierarchy, yielding a tree structure that reduces redundant representation of shared attributes.
   To show that ontology-based metadata is practical, we generated a knowledge base of metadata from a sample of MARC records. We implemented the ontology in description logic (Loom), mapped Machine Readable Cataloging (MARC) attributes and values to the ontology, and loaded the data into Loom with all values treated as separate instances. We then unified matching instances, and deduced relations between works. This process thus converts relationships implicit in MARC into explicit relations that are easy to utilize with computers.
   Our web interface permits browsing by navigating relations between works. Ontology-based metadata can also support user inquiry and digital-library operation in other important ways.
Keywords: Metadata, Ontology, Bibliographic relations, Catalog structure
Database Selection Techniques for Routing Bibliographic Queries BIBAKPDF 264-273
  Jian Xu; Yinyan Cao; Ee-Peng Lim; Wee-Keong Ng
Query routing refers to the general problem of selecting from a large set of accessible information sources the ones relevant to a given query (i.e. database selection), evaluating the query on the selected sources, and merging their results. As the number of information sources on the Internet increases dramatically, query routing is becoming increasingly important. Much of the previous work in query routing focused on information sources that are document collections. In this paper, we address the database selection problem for databases with multiple text attributes. In particular, we have proposed a number of different database selection techniques each requiring different types of knowledge about the databases' content, e.g. past queries, past query results, and statistical information collected from the database records. By conducting a series of experiments on a set of bibliographic databases, we evaluate and compare the performance of these proposed techniques.
Keywords: Query routing, Database selection, Collection fusion, Bibliographic databases, Information retrieval
A Hierarchical Access Control Scheme for Digital Libraries BIBAKPDF 275-276
  Chaitanya Baru; Arcot Rajasekar
We present an access control scheme that extends the authorization/privilege model employed in database systems to handle the notion of digital library collection hierarchies. This scheme is being implemented within the digital library infrastructure at the San Diego Supercomputer Center.
Keywords: Digital library, Access control, Security
A Web Art Gallery BIBPDF 277-278
  Murat Bayraktar; Chang Zhang; Bharadwaj Vadapalli; Neill A. Kipp; Edward A. Fox
Collaborative Information Agents on the World Wide Web BIBAKPDF 279-280
  James R. Chen; Nathalie Mathe; Shawn Wolfe
In this paper, we present DIAMS, a system of distributed, collaborative information agents which help users access, collect, organize and exchange information on the World Wide Web. Personal agents provide their owners dynamic displays of well organized information collections, as well as friendly information management utilities. Personal agents exchange information with one another. They also work with other types of information agents such as matchmakers and knowledge experts to facilitate collaboration and communication.
Keywords: Intelligent agents, Information access, Collaborative system, Knowledge-base, World Wide Web
Experimenting a 3D Interface for the Access to a Digital Library BIBAKPDF 281-282
  Pierre Cubaud; Claire Thiria; Alexandre Topol
We experiment with the on-the-fly production of VRML scenes for unguided browsing through a digitized rare books collection. The physical appearance of books is captured by photographic textures in order to help the user evaluate the relevance of the collection. In this preliminary work, we address the problems of dynamic specification of 3D scenes and VRML browser response time for such scenes.
Keywords: Digital library, Human-computer interaction, Virtual reality, VRML
Efficient Searching in Distributed Digital Libraries BIBAKPDF 283-284
  James C. French; Allison L. Powell; Walter R. Creighton, III
When a digital library is decomposed into many geographically distributed repositories, search efficiency becomes an issue. Increasing network congestion makes this a compelling issue. We discuss an effective method for reducing the number of servers needed to respond to a query and give examples of search space reduction in the NCSTRL distributed digital library.
Keywords: Collection selection, Database selection, Text resource discovery, Distributed searching
Using Decision Theory to Order Documents BIBAPDF 285-286
  Eric J. Glover; William P. Birmingham
As the content in digital libraries grows, it is important to organize query results so that more "valuable" results are ranked higher. We postulate that this kind of ranking will make it easier for users to find documents that meet their information need. We describe a model of a user's information need that incorporates the idea of "value." This model is based on decision theory, and is realized in a digital library. Preliminary results show that ranking documents based on value is beneficial.
   Document value stems from the information need of a user of an information retrieval (IR) system [2]. This typically includes topical relevance as well as publication date, grade level, size of document, and so forth. For the most part, IR systems focus on finding documents that are topically relevant, and often ignore other factors, or consider them as constraints.
   We believe that a more effective model of information need is value, where the value of a document is rationally determined by considering a variety of factors. In particular, the value perspective allows us to do away with artificial constraints on the search process, replacing them with preferences.
   In this paper, we focus on using value to order documents that are returned by a search engine. The advantage to this "post ordering" arrangement is that it is fully compatible with existing search engines.
Topic Labeling of Broadcast News Stories in the Informedia Digital Video Library BIBAKPDF 287-288
  Alexander G. Hauptmann; Danny Lee
This paper describes the implementation of a topic labeling component for the Informedia Digital Video Library. Each news story recorded from the evening news is assigned to one of 3178 topic categories using a K-nearest neighbor classification algorithm. In preliminary tests, the system achieved recall of 0.491 with relevance of 0.482 when up to 5 topics could be assigned to a news story.
Keywords: Topic detection and labeling, Topic spotting and classification, Video library, Digital libraries, Broadcast news story indexing
Failure Analysis in Query Construction: Data and Analysis from a Large Sample of Web Queries BIBAKPDF 289-290
  Bernard J. Jansen; Amanda Spink; Tefko Saracevic
This paper reports results from a failure analysis (i.e., incorrect query construction) of 51,473 queries from 18,113 users of Excite, a major Web search engine. Given that many digital libraries are accessed via the Web, this analysis points to the need for redesign of the traditional search engine interfaces.
Keywords: Web queries, Query construction
Dynamic Query Result Previews for a Digital Library BIBAKPDF 291-292
  Steve Jones
Previous models of dynamic querying supported by query previews have focussed on attribute based querying, have required information providers to create preview tables, and have provided little information to support initial query refinement. We present an alternative model that has been implemented for the New Zealand Digital Library, and describe the system architecture and user interface.
Keywords: Dynamic queries, Query previews
Usage Analysis of a Digital Library BIBAKPDF 293-294
  Steve Jones; Sally Jo Cunningham; Rodger McNab
We analyse transaction logs for a large full-text document collection for Computer Science researchers. We report insights gained from this analysis and identify resulting search interface design issues.
Keywords: Transaction log analysis, Search interface, Usage analysis
Preserving Electronic Documents BIBAKPDF 295-296
  Douglas A. Kranch
Electronic documents have many advantages, but they also have serious disadvantages. One of them is the difficulty of preservation. A method should be developed for preserving electronic documents in their original form that is independent of the hardware or software standards used.
Keywords: Electronic documents, Preservation
Querying Structured Web Resources BIBAPDF 297-298
  Ee-Peng Lim; Cheng-Hai Tan; Boon-Wan Lim; Wee-Keong Ng
In our ongoing WebIR (Web Information Retrieval) research, we are looking into how web search engines can be extended to exploit the structuredness of web collections for retrieval type queries. Our overall goal is to create a new breed of web search engines that handle retrieval queries involving both intra- and inter-document structures. To achieve this goal, the following issues have to be addressed:
  • How do we obtain the structural information about a web collection?
  • What is the appropriate query model? In other words, what should the new retrieval queries look like? How should the query results be represented?
  • What are the appropriate indexing and query evaluation strategies?
  • What should be the ranking formula for the query results?
  • What is the appropriate framework to measure the performance of the new search engines?
   In the remaining sections of our paper, we will present our approaches to address the first two issues in the WebIR research project.
Searching for Content-Based Addresses on the World-Wide Web BIBAKPDF 299-300
  Joel D. Martin; Robert Holte
This paper presents a method for constructing queries that are sufficient to retrieve a target web page. These queries can be thought of as content-based addresses for the target page and can have many potential uses.
Keywords: Distributed digital libraries, Query, Web, Search engines, Content-based addresses, Dead links, QuerySearch
An Image-Capable Audio Internet Browser for Facilitating Blind User Access to Digital Libraries BIBAKPDF 301-302
  Thierry Pun; Patrick Roth; Lori Petrucci; Andre Assimacopoulos
The Internet now permits widespread access to textual and pictorial material from digital libraries. The widespread use of graphical user interfaces, however, increasingly bars visually handicapped people from using such material. We present here our current work aimed at the adaptation of an Internet browser to facilitate blind user access to digital libraries. The main distinguishing characteristics of this browser are: (1) active user interaction, both for the macro-analysis and micro-analysis of screen objects of interest; (2) use of a touch-sensitive screen to facilitate user interaction; (3) generation of a virtual sound space into which the screen information is mapped; (4) transcription into sounds not only of text, but also of images. Several prototypes have been implemented, and are being evaluated by blind users.
Keywords: Internet, WWW, Digital libraries, Blind user access, Sound space, Image analysis, Rehabilitation
Information Forage Through Adaptive Visualization BIBAKPDF 303-304
  Dmitri Roussinov; Marshall Ramsey
Automatically created maps of concepts improve navigation in a collection of text documents. We report our research on leveraging navigation by interactively providing the ability to modify the maps themselves. We believe that this functionality leads to better responsiveness to the user and a more effective search. For this purpose we have created and tested a prototype system that builds and refines in real-time a map of concepts found in Web documents returned by a commercial search engine.
Keywords: Intelligent searching, Interactive data exploration, Information representation, WWW, Search engines, Information retrieval
A Graphical Interface for Speech-Based Retrieval BIBAKPDF 305-306
  Laura Slaughter; Douglas W. Oard; Vernon L. Warnick; Julie L. Harding; Galen J. Wilkerson
This paper describes preliminary usability testing for a graphical interface designed to facilitate rapid browsing of recorded speech. Expert interviews and focus group discussions were used to assess the alignment between browsing behaviors employed by members of the intended user population and an early mockup of the interface. The results provide guidelines for the next iteration of prototype development and suggest that graphical representations offer a viable method for browsing audio and multimedia recordings.
Keywords: Speech-based retrieval, GUI, Digital library
An Interactive WWW Search Engine for User-Defined Collections BIBAKPDF 307-308
  Robert G. Sumner, Jr.; Kiduk Yang; Bert J. Dempsey
Given the dynamic nature and the quantity of information on the WWW, many individual users and organizations compile and use focused WWW resource lists related to a particular topic or subject domain. The IRISWeb system extends this concept such that any user-defined set of WWW pages (a virtual collection) can be retrieved, indexed, and searched using a powerful full-text search engine with a relevance-feedback interface. This capability adds full-text searching to highly customized subsets of the WWW. Here we describe the IRISWeb software and an experiment that highlights its potential.
Keywords: Search engines, WWW, Virtual collection, LIBClient, Subject gateway, IRISWeb
Site Outlining BIBAKPDF 309-310
  Koichi Takeda; Hiroshi Nomiyama
In this paper, we propose a "site outlining" technique for building highly integrated digital libraries comprising dynamic information sources such as Web sites on the Internet. The notion of a site is defined as a structured entity with annotated links.
Keywords: Site outlining, Views, Visualization
Digital Library for Education and Medical Decision Making BIBAKPDF 311-312
  Mark C. Tsai; Kenneth L. Melmon
Stanford Health Information Network for Education (SHINE) integrates online guideline texts, textbooks, journals, bibliographic systems, medical images, digital video, relational databases, and knowledge-based systems formerly accessible individually through the Z39.50 protocol, SQL language, HTTP protocol, and full-text search engines. We discuss the architecture that is used to integrate these distributed heterogeneous systems. We explain other components, the electronic notebook and log recording systems, that make SHINE a complete system to support medical decision making and learning. We also discuss how the system will be integrated with electronic medical record systems to support medical decisions. The same concepts can be applied to aggregation of knowledge domains to optimize the functions of other (non-medical) target users.
Keywords: Integration system, Intelligent integration, Distributed database systems, Digital library, Electronic notebook, Information retrieval
Internet Access to Scanned Paper Documents BIBAKPDF 313-314
  Marcel Worring; Arnold W. M. Smeulders
In this contribution we identify the different structures encountered in a hyperdocument. Methods are described for deriving those structures from scanned paper originals. The content and structure of the document are then made available in a form suited for an Internet browser. This provides convenient access to the scanned paper document.
Keywords: Document access, Document understanding, Hypertext structure, Hypertext understanding