| Information Space Representation in Interactive Systems: Relationship to Spatial Abilities | | BIBAK | PDF | 1-10 | |
| Bryce Allen | |||
| Digital libraries, lacking a natural spatial organization, may adopt a
variety of mechanisms for visualizing information in two or more dimensions.
Understanding the cognitive basis for the use of spatial features in
information retrieval, including spatial abilities, is important to the
development of interactive information retrieval. This research investigated
the interaction of spatial abilities with two-dimensional data representations
in an experimental interactive system. The results showed that users with
lower levels of spatial abilities were assisted in finding and interpreting
digital information when spatial representations of information were employed.
These results have implications for the design of digital libraries. Keywords: Data visualization, Cognitive abilities, Spatial representations | |||
| Comparing Feature-Based and Clique-Based User Models for Movie Selection | | BIBAK | PDF | 11-18 | |
| Joshua Alspector; Aleksander Kolcz; Nachimuthu Karunanithi | |||
| The huge amount of information available in the currently evolving world
wide information infrastructure at any one time can easily overwhelm end-users.
One way to address the information explosion is to use an "information
filtering agent" which can select information according to the interest and/or
need of an end-user. However, at present few such information filtering agents
exist. In this study, we evaluate the use of feature-based approaches to user
modeling with the purpose of creating a filtering agent for the video-on-demand
application. We evaluate several feature- and clique-based models for 10
volunteer subjects who provided ratings for the movies. Our preliminary
results suggest that feature-based selection can be a useful tool to recommend
movies according to the taste of the user and can be as effective as a movie
rating expert. We compare our feature-based approach with a clique-based
approach, which has advantages where information from other users is available. Keywords: User modeling, Information filtering, Collaborative filtering, Feature
extraction, Neural networks, Linear models, Regression trees, Bagging, CART | |||
| An Extensible Constructor Tool for the Rapid, Interactive Design of Query Synthesizers | | BIBAK | PDF | 19-28 | |
| Michelle Baldonado; Seth Katz; Andreas Paepcke; Chen-Chuan K. Chang; Hector Garcia-Molina; Terry Winograd | |||
| We describe an extensible constructor tool that helps information experts
(e.g., librarians) create specialized query synthesizers for heterogeneous
digital-library environments. A query synthesizer produces a graphical user
interface in which a digital-library patron can specify a high-level, fielded,
multi-source query. Furthermore, a query synthesizer interacts with a query
translator and an attribute translator to transform high-level queries into
sets of source-specific queries. In this paper, we discuss how our tool for
constructing synthesizers can facilitate the discovery of available attributes
(e.g., 'title'), the collation of schemas from different sources, the selection
of input widgets for a synthesizer (e.g., a drop-down list widget to support
input of controlled vocabulary), and other design aspects. We also describe
the user interface of our prototype constructor, which is implemented based on
the Stanford InfoBus and metadata architecture. Keywords: Constructor tool, Query synthesizer, Regional schema, Query generation,
Query translation, Attribute translation, Metadata architecture, Schema | |||
| Digital Libraries and Knowledge Disaggregation: The Use of Journal Article Components | | BIBAK | PDF | 29-39 | |
| Ann Peterson Bishop | |||
| A scientific journal article is composed of standard components, such as
author names, an abstract, figures, a bibliography, and sections describing
methods and results. With the creation of digital documents and new tools for
manipulating them comes the ability to facilitate the disaggregation of journal
articles into separate components. This paper describes how article components
are identified, mobilized, and used by students and faculty members, based on
the preliminary analysis of data collected through focus groups, workplace
interviews, transaction logging, and usability testing associated with the
University of Illinois Digital Libraries Initiative project. The paper
presents a schema of component use purposes, discusses the intellectual and
physical processes of component use, identifies several issues and implications
for digital library design, and highlights the need for multiple methods in
studying document disaggregation. Keywords: User Studies, Documents, Information seeking and use | |||
| Technologies for Repository Interoperation and Access Control | | BIBAK | PDF | 40-48 | |
| Shirley Browne; Jack Dongarra; Jeff Horner; Paul McMahan; Scott Wells | |||
| Over the past several years, network-accessible repositories have been
developed by various academic, government, and industrial organizations to
provide access to software and related resources. Allowing distributed
maintenance of these repositories while enabling users to access resources from
multiple repositories via a single interface has brought about the need for
interoperation. Concerns about intellectual property rights and export
regulations have brought about the need for access control. This paper
describes technologies for interoperation and access control that have been
developed as part of the National High-performance Software Exchange (NHSE)
project, as well as their deployment in a freely available repository
maintainer's toolkit called Repository in a Box. The approach to
interoperation has been to participate in the development of and to implement
an IEEE standard data model for software catalog records. The approach to
access control has been to extend the data model in the area of intellectual
property rights and to implement access control mechanisms of varying
strengths, ranging from email address verification to X.509 certificates, that
enforce software distribution policies specified via the data model. Although
they have been developed within the context of software repositories, these
technologies should be applicable to distributed digital libraries in general. Keywords: Interoperation, Intellectual property rights, Access control,
Authentication, Software licensing, Data modeling, Software reuse, Standards | |||
| Conjunctive Constraint Mapping for Data Translation | | BIBAK | PDF | 49-58 | |
| Chen-Chuan K. Chang; Hector Garcia-Molina | |||
| In this paper we present a mechanism for translating information in
heterogeneous digital library environments. We model information as a set of
conjunctive constraints that are satisfied by real-world objects (e.g.,
documents, their metadata). Through application of semantic rules and value
transformation functions, constraints are mapped into ones understood and
supported in another context. Our machinery can also deal with hierarchically
structured information. Keywords: Constraint mapping, Data translation, Semantic interoperability | |||
| Automatic Subject Indexing Using an Associative Neural Network | | BIBAK | PDF | 59-68 | |
| Yi-Ming Chung; William M. Pottenger; Bruce R. Schatz | |||
| The global growth in popularity of the World Wide Web has been enabled in
part by the availability of browser-based search tools, which in turn have led
to an increased demand for indexing techniques and technologies. As the amount
of globally accessible information in community repositories grows, it is no
longer cost-effective for such repositories to be indexed by professional
indexers who have been trained to be consistent in subject assignment from
controlled vocabulary lists. The era of amateur indexers is thus upon us, and
the information infrastructure needs to provide support for such indexing if
search of the Net is to produce useful results.
In this paper, we propose the ConceptAssigner, an automatic subject indexing system based on a variant of the Hopfield network [13]. In the application discussed herein, a collection of documents is used to automatically create a subset of a thesaurus termed a Concept Space [4]. To automatically index an individual document, concepts extracted from the given document become the input pattern to a Concept Space represented as a Hopfield network. The Hopfield net parallel spreading activation process produces another set of concepts that are strongly related to the concepts of the input document. Such concepts are suitable for use in an interactive indexing environment. A prototype of our automatic subject indexing system has been implemented as part of the Interspace, a semantic indexing and retrieval environment which supports statistically-based semantic indexing in a persistent object environment. Keywords: Automatic indexing, Semantic indexing, Semantic retrieval, Automatic subject
assignment, Amateur indexing, Concept Space, Information retrieval, Interspace,
Semantic locality | |||
| Archival Storage for Digital Libraries | | BIBAK | PDF | 69-78 | |
| Arturo Crespo; Hector Garcia-Molina | |||
| We propose an architecture for Digital Library Repositories that assures
long-term archival storage of digital objects. The architecture is formed by a
federation of independent but collaborating sites, each managing a collection
of digital objects. The architecture is based on the following key components:
use of signatures as object handles, no deletions of digital objects,
functional layering of services, the presence of an awareness service in all
layers, and use of disposable auxiliary structures. Long-term persistence of
digital objects is achieved by creating replicas at several sites. Keywords: Digital library repository, Archival storage, Long-term preservation of data | |||
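A minimal sketch of three of the components named in the abstract above: signatures as object handles, no deletions, and replication across sites. The abstract does not say how signatures are computed, so SHA-256 is an assumption here, and the names `ArchiveSite`, `replicate`, and `fetch` are hypothetical.

```python
import hashlib


class ArchiveSite:
    """One federation site: an append-only store keyed by content signature."""

    def __init__(self, name):
        self.name = name
        self._objects = {}  # handle -> bytes; this sketch offers no delete

    def put(self, data):
        # Assumption: the handle is the SHA-256 digest of the object's bytes.
        handle = hashlib.sha256(data).hexdigest()
        # Storing identical bytes twice leaves the store unchanged.
        self._objects.setdefault(handle, data)
        return handle

    def get(self, handle):
        return self._objects.get(handle)


def replicate(data, sites):
    """Store a replica at every site; all sites derive the same handle."""
    handle = None
    for site in sites:
        handle = site.put(data)
    return handle


def fetch(handle, sites):
    """Return the first replica whose bytes still match the handle."""
    for site in sites:
        data = site.get(handle)
        if data is not None and hashlib.sha256(data).hexdigest() == handle:
            return data
    return None
```

Because the handle is derived from the content, a reader can detect a damaged replica at one site and fall back to another, which is one way replication could support long-term persistence.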
| Considerations for Information Environments and the NaviQue Workspace | | BIBAK | PDF | 79-88 | |
| George W. Furnas; Samuel J. Rauch | |||
| This paper presents design considerations for the construction of advanced
information environments, and a prototype interface that attempts to respond to
them. The design considerations came from task analyses of information
gathering activities, from changes in the global information environment, and
from advances in human-computer interaction. These led to a number of desired
design properties that are guiding our prototyping efforts, including the
system, NaviQue, detailed here. It is a visually rich environment for
information gathering and organizing, based on a navigable, fractal structure
of information, ubiquitous queriability, lightweight interaction with ad hoc
sets, and information visualization. The resulting interaction paradigm
smoothly integrates more than a half dozen synergies between querying,
navigation and organization. Keywords: Digital library, Multiscale worlds, Query, Navigation, Browsing, Search,
Information visualization, Information gathering environments | |||
| CiteSeer: An Automatic Citation Indexing System | | BIBAK | PDF | 89-98 | |
| C. Lee Giles; Kurt D. Bollacker; Steve Lawrence | |||
| We present CiteSeer: an autonomous citation indexing system which indexes
academic literature in electronic format (e.g. PostScript files on the Web).
CiteSeer understands how to parse citations, identify citations to the same
paper in different formats, and identify the context of citations in the body
of articles. CiteSeer provides most of the advantages of traditional (manually
constructed) citation indexes (e.g. the ISI citation indexes), including:
literature retrieval by following citation links (e.g. by providing a list of
papers that cite a given paper), the evaluation and ranking of papers, authors,
journals, etc. based on the number of citations, and the identification of
research trends. CiteSeer has many advantages over traditional citation
indexes, including the ability to create more up-to-date databases which are
not limited to a preselected set of journals or restricted by journal
publication delays, completely autonomous operation with a corresponding
reduction in cost, and powerful interactive browsing of the literature using
the context of citations. Given a particular paper of interest, CiteSeer can
display the context of how the paper is cited in subsequent publications. This
context may contain a brief summary of the paper, another author's response to
the paper, or subsequent work which builds upon the original article. CiteSeer
allows the location of papers by keyword search or by citation links. Papers
related to a given paper can be located using common citation information or
word vector similarity. CiteSeer will soon be available for public use. Keywords: Citation indexing, Citation context, Literature search, Bibliometrics | |||
| Page and Link Classifications: Connecting Diverse Resources | | BIBAK | PDF | 99-107 | |
| Stephanie W. Haas; Erika S. Grams | |||
| As digital libraries of all kinds increase in size and scope, they contain
more and more diverse information objects. The value of any collection is
drawn in part from an understanding of what is there and what relationships
exist between items. We believe that classification systems for World Wide Web
pages and links, and by extension for any diverse digital library, will be most
effective if they are developed in tandem. Therefore, we propose integrated
classification systems for Web pages and links which are based on a content
analysis of 75 source pages, the almost 1,500 links they contained, and the
target pages to which the links led. The consistency with which we were able
to classify pages and links bodes well for the possibilities of automatic
classification. The slightly lower level of consistency of the link
classifications emphasizes the importance of considering context and user
expectations in specifying anchors. We conclude by raising important questions
about how best to design and link together diverse resources such as those
found on the Web or in a digital library. Keywords: Web page classification, Link types, Retrieval, Content analysis, Style
recommendations | |||
| Axis-Specified Search: A Fine-Grained Full-Text Search Method for Gathering and Structuring Excerpts | | BIBAK | PDF | 108-117 | |
| Yasusi Kanada | |||
| A text search method, which is called an axis-specified search method, is
proposed. This method is suitable for full-text searches of a large-scale text
collection. In this method, in addition to specifying search strings, the user
selects an axis from a predefined set. The system outputs excerpts and
hyperlinks that are ordered along the axis. The search strings express the
specific subject of the search, and the axis specifies a general-purpose method
of ordering results. Short sub-topics, which cannot be easily caught by
statistical methods, are effectively gathered from the text collection. The
user can get satisfactory results using a simple search string. Even if the
number of results is very large, the user can easily survey them, because they
are well structured. This method has been applied to an electronic
encyclopedia and a newspaper database. In these applications, distributed
descriptions that were related to each other could be gathered, and the user
could discover their relationships from the results. For example, by
specifying "semiconductor" for a search string and "year" for an axis, a table
listing seven decades of semiconductor-related topics sorted by year was
generated from newspaper issues published over a single year. By specifying
"basin" for a search string and "area" (m²) for an axis, descriptions of
the world's largest rivers were extracted from the encyclopedia and sorted
according to their basin areas. Keywords: Information retrieval, Full-text search, Information extraction, Information
gathering, Document classification, Electronic encyclopedia, Newspaper database | |||
| Key Frame Preview Techniques for Video Browsing | | BIBAK | PDF | 118-125 | |
| Anita Komlodi; Gary Marchionini | |||
| Digitized video is an important format in digital libraries. Browsing video
surrogates saves user time and storage capacity and avoids unnecessary downloading
of large files. The study presented in this paper compared dynamic and static
presentation techniques for key frames extracted from video documents. For
this study key frames were automatically extracted and then a subset was
manually selected to best represent the document. The three interface designs
used were: 4 key frame static storyboard display, 12 key frame static
storyboard display and 12 key frame dynamic slideshow display. The key frames
in all displays were shown in temporal order. User performance on object
identification, action identification, gist comprehension, and selection tasks was
compared across treatments. Examination time and user satisfaction were also measured.
Static storyboard displays proved to support object identification better,
while other user performance measures showed no statistically significant
differences. Using fewer key frames in static displays saved a considerable
amount of user time and screen real estate, and user performance on gist
comprehension and selection did not decrease when key frames were carefully
selected to support queries. Implications for interface design and further
research are discussed. Keywords: Video browsing, Interface design, Key frames, User testing | |||
| Metadata Visualization for Digital Libraries: Interactive Timeline Editing and Review | | BIBAK | PDF | 126-133 | |
| Vijay Kumar; Richard Furuta; Robert B. Allen | |||
| Interactive Timeline Editing and Review (ITER), a general framework for
modeling and presenting temporal information, is described. In addition, the
tmViewer interface is described for viewing temporal and other metadata. ITER
and tmViewer go beyond previous electronic timeline displays in treating
timelines as hypertexts and structured documents, and allowing interactive
display of the metadata in addition to the events. The use of the tool is
described for exploring bibliographic records, such as search hits from the
book database available at amazon.com, and for the presentation of timelines. Keywords: History, Interactivity, Metadata, Taxonomy, Timelines, tmViewer,
Visualization | |||
| Making Global Digital Libraries Work: Collection Services, Connectivity Regions, and Collection Views | | BIBAK | PDF | 134-143 | |
| Carl Lagoze; David Fielding; Sandra Payette | |||
| There are many technical challenges in designing the architecture of
globally-distributed, federated digital libraries. This paper focuses on the
problem of global resource discovery and describes a service architecture and
server topology for improving the performance and reliability of that process.
The technique described is based on three concepts. Connectivity regions are
groups of sites with relatively good network connectivity. Collection services
provide the necessary meta-information so that a group of digital library
servers can interoperate as a collection. Collection views represent the
configuration of the collection that conforms to connectivity regions. The
work that is described here is based on experience with the NCSTRL
international digital library of computer science research and is implemented
as part of the Dienst architecture upon which NCSTRL is based. Keywords: Digital library architecture, Distributed searching, Case studies | |||
| An Integrated Reading and Editing Environment for Scholarly Research on Literary Works and their Handwritten Sources | | BIBAK | PDF | 144-151 | |
| E. Lecolinet; L. Likforman-Sulem; L. Robert; F. Role; J-L. Lebrave | |||
| We present an integrated system devoted to the visualization and the editing
of hypermedia documents from literary material including document images and
structured text. First, capabilities are offered to transcribe manuscript
images. Transcribing the text consists of coupling lines typed on the keyboard
with their corresponding text lines in the manuscript images. A semi-automatic
system based on computer-human interaction and document analysis is proposed
for performing this task. This system provides editing capabilities for
linking document images and the corresponding structured textual
representations (encoded by means of a logical markup language). Finally,
application-specific visualization tools have been developed in order to
provide users with an idea of the overall organization of the hyperdocument and
help them to navigate. Keywords: Hypermedia, Reading/editing environment, Text / image coupling, Image
analysis, Text encoding, Information visualization | |||
| Heroic Measures: Reflections on the Possibility and Purpose of Digital Preservation | | BIBAK | PDF | 152-161 | |
| David M. Levy | |||
| Preserving digital information is a difficult and poorly understood problem.
The current inability to accomplish digital preservation is a major impediment
to the adoption of digital forms on a grand scale. Recognizing this, in
December 1994 the Commission on Preservation and Access (CPA) and the Research
Libraries Group (RLG) convened a Task Force on Digital Archiving. In May 1996
the Task Force issued its report, entitled "Preserving Digital Information:
Report of the Task Force on Archiving of Digital Information." This paper has
been written in the context of, and partly as a response to, the Task Force
report. It analyzes migration as a preservation strategy and, as a
counterweight to an overly narrow technical focus on "the digital object," it
argues that use considerations -- the purposes for which the digital materials
being preserved are to be used -- must figure centrally in any preservation
strategy. Keywords: Digital preservation, Digital documents, Use, Archiving | |||
| Making Metadata: A Study of Metadata Creation for a Mixed Physical-Digital Collection | | BIBAK | PDF | 162-171 | |
| Catherine C. Marshall | |||
| Metadata is an important way of creating order in emerging distributed
digital library collections. This paper presents an analysis of ethnographic
data gathered in a university library's educational technology center as the
staff develops metadata for a mixed physical-digital collection of visual
resources. In particular, the paper explores issues associated with the
application of standards, uncertain collection and metadata boundaries,
distribution and responsibility, the types of description that arise in
practice, and metadata temporality and scope. These issues help to
characterize a problem space, and to explore the trade-offs collection
maintainers must face when they create metadata for heterogeneous materials. Keywords: Metadata, Digital library, Ethnographic study, Mixed physical-digital
collections, Visual resources, Local knowledge | |||
| Beyond SGML | | BIBAK | PDF | 172-181 | |
| Roger Price | |||
| The International Standard for the Standard Generalized Markup Language
(SGML) published in 1986 is now seen as a mature language for expressing
document structure and is accepted as the basis for major projects such as the
Text Encoding Initiative and important hypertext languages such as HTML and
XML. The historical origin of SGML as a technique for adding marks to texts
has left a legacy of complexities and difficulties which hinder its wide
acceptance. A key difficulty is the dual role that SGML documents currently
play: they are both a representation for interchange and a human readable
presentation. We examine possible document markup techniques in a post-SGML 86
world with emphasis on the framework architecture. The novel ideas include the
generalization of the notion of a "character" to a much broader token which is
strongly typed to differentiate text, markup, images and other component types. Keywords: SGML, Document architectures, Document, Views | |||
| Query Performance for Tightly Coupled Distributed Digital Libraries | | BIBAK | PDF | 182-190 | |
| Berthier A. Ribeiro-Neto; Ramurti A. Barbosa | |||
| We consider a digital library distributed in a tightly coupled environment.
The library is indexed by inverted files and the vector space model is used as
the ranking strategy. Using a simple analytical model coupled with a small
simulator, we study how query performance is affected by the index
organization, the network speed, and the disk transfer rate. Our results,
which are based on the Tipster/Trec3 collection, indicate that a global index
organization might outperform a local index organization. Keywords: Digital library, Distributed, Query performance | |||
| Practical Application of Existing Hypermedia Standards and Tools | | BIBAK | PDF | 191-199 | |
| Lloyd Rutledge; Jacco van Ossenbruggen; Lynda Hardman; Dick C. A. Bulterman | |||
| In order for multimedia presentations to be stored, accessed and played from
a large library they should not be encoded as final form presentations, since
these consume storage space and cannot easily be adapted to variations in
presentation-time circumstances such as user characteristics and changes in
end-user technology. Instead, a more presentation-independent approach needs
to be taken that allows the generation of multiple versions of a presentation
based on a presentation-independent description.
In order for such a generated presentation to be widely viewable, it must be in a format that is widely implemented and adopted. Such a format for hypermedia presentations does not yet exist. However, the recent release of SMIL, whose creation and promotion are managed by the World Wide Web Consortium, promises to become such a format in the short term and be for hypermedia what HTML is for hypertext. The technology for enabling this presentation-independent approach is already available, but requires the use of large and unapproachable standards, such as DSSSL and HyTime. In this paper we show that these two standards can be used with SMIL and, by concentrating on a particular application, illustrate the use of publicly available tools to support the generation of multiple presentations from a single presentation-independent source. Keywords: Hypermedia, HyTime, DSSSL, SMIL, SP, Jade, Berlage, GRiNS | |||
| SONIA: A Service for Organizing Networked Information Autonomously | | BIBAK | PDF | 200-209 | |
| Mehran Sahami; Salim Yusufali; Michelle Q. W. Baldonado | |||
| The recent explosion of on-line information in Digital Libraries and on the
World Wide Web has given rise to a number of query-based search engines and
manually constructed topical hierarchies. However, these tools are quickly
becoming inadequate as query results grow incomprehensibly large and manual
classification in topic hierarchies creates an immense bottleneck. We address
these problems with a system for topical information space navigation that
combines the query-based and taxonomic systems. We employ machine learning
techniques to create dynamic document categorizations based on the full text of
articles that are retrieved in response to users' queries. Our system, named
SONIA (Service for Organizing Networked Information Autonomously), has been
implemented as part of the Stanford Digital Libraries Testbed. It employs a
combination of technologies that takes the results of queries to networked
information sources and, in real time, automatically retrieves, parses, and
organizes these documents into coherent categories for presentation to the user.
Moreover, the system can then save such document organizations in user profiles
which can then be used to help classify future query results by the same user.
SONIA uses a multi-tier approach to extracting relevant terms from documents as
well as statistical clustering methods to determine potential topics within a
document collection. It also makes use of Bayesian classification techniques
to classify new documents within an existing categorization scheme. In this
way, it allows users to navigate the results of a query at a more topical level
rather than having to examine each document text separately. Keywords: Clustering, Classification, Feature selection, Distributed information | |||
| An Agent-Based Approach to the Construction of Floristic Digital Libraries | | BIBAK | PDF | 210-216 | |
| J. Alfredo Sanchez; Cristina A. Lopez; John L. Schnase | |||
| This paper describes an agent-assisted approach to the construction of
floristic digital libraries, which consist of very large botanical data
repositories and related services. We propose an environment, termed
Chrysalis, in which authors of plant morphologic descriptions can enter data
into a digital library via a web-based editor. An agent that runs concurrently
with the editor suggests potentially useful morphologic descriptions based on
similar documents existing in the library. Benefits derived from the
introduction of Chrysalis include reduced potential for errors and data
inconsistencies, increased parallelism among descriptions, and considerable
savings in the time regularly spent in visually checking for parallelism and
manually editing data. Keywords: Agents, Agent-based interfaces, Floristic digital libraries, FNA, Chrysalis | |||
| Digital Library Information Appliances | | BIBAK | PDF | 217-226 | |
| Bill N. Schilit; Morgan N. Price; Gene Golovchinsky | |||
| Although digital libraries are intended to support education and knowledge
work, current digital library interfaces are narrowly focused on retrieval.
Furthermore, they are designed for desktop computers with keyboards, mice, and
high-speed network connections. Desktop computers fail to support many key
aspects of knowledge work, including active reading, free form ink annotation,
fluid movement among document activities, and physical mobility. This paper
proposes portable computers specialized for knowledge work, or digital library
information appliances, as a new platform for accessing digital libraries. We
present a number of ways that knowledge work can be augmented and transformed
by the use of such appliances. These insights are based on our implementation
of two research prototype systems: XLibris, an "active reading machine," and
TeleWeb, a mobile World Wide Web browser. Keywords: Digital library, Information appliance, Paper document metaphor, Active
reading, Browsing, Information exploration, Digital ink, Pen computing, Mobile
Web browser, Mobile computing, Network computer | |||
| Herbarium Specimen Browser: A Tool for Accessing Botanical Specimen Collections | | BIBAK | PDF | 227-234 | |
| Erich R. Schneider; John J. Leggett; Richard K. Furuta; Hugh D. Wilson; Stephan L. Hatch | |||
| For several years the Texas A&M Bioinformatics Working Group has pursued the
construction of a novel digital library resource, an electronic adaptation of
the information in the S.M. Tracy Herbarium, a major collection of preserved
plants. This paper describes a tool we have developed for panoramically
surveying the contents of the collection: the Herbarium Specimen Browser.
While some of the Specimen Browser's implementation details (particularly its
unconventional use of a full-text retrieval system to store its database, and
its specialized mapping software) are of general interest, it also exhibits
properties which designers of similar digital library access systems may find
worth considering: support for pattern discovery, use of regularity in
hypertext link sources and destinations, and employment of Javascript as an
interface simplification mechanism. Keywords: Browsing, Pattern discovery, Mapping, Full-text retrieval, WWW, Botanical
collections | |||
| BUS: An Effective Indexing and Retrieval Scheme in Structured Documents | | BIBAK | PDF | 235-243 | |
| Dongwook Shin; Hyuncheol Jang; Honglan Jin | |||
| In recent digital library systems and World Wide Web environments, many
documents are beginning to be provided in structured formats, tagged in markup
languages like SGML or XML. Hence, indexing and query evaluation of
structured documents have been drawing attention, since they make it easy to
access and retrieve a specific part of a document. However, conventional
information retrieval techniques do not scale up well in structured documents.
This paper suggests an efficient indexing and query evaluation scheme for structured documents (named BUS) that minimizes the indexing overhead and guarantees fast query processing at any level in the document structure. The basic idea is that indexing is performed at the lowest level of the given structure, and query evaluation computes the similarity at a higher level by accumulating the term frequencies at the lowest level in a bottom-up way. The accumulators summing up the similarity play the role of accumulating all the term frequencies of the related part at a certain level. This paper also addresses the implementation of BUS and proves that BUS works correctly. In addition, along with several experiments, it shows that BUS facilitates efficient indexing in terms of space and time and guarantees reasonable retrieval time in response to user queries. Keywords: Structured documents, SGML, XML, Information retrieval, Indexing,
Accumulator | |||
| Global Digital Museum: Multimedia Information Access and Creation on the Internet | | BIBAK | PDF | 244-253 | |
| Junichi Takahashi; Takayuki Kushida; Jung-Kook Hong; Shigeharu Sugita; Yasuyuki Kurita; Robert Rieger; Wendy Martin; Geri Gay; John Reeve; Rowena Loverance | |||
| Multimedia information access on the Internet creates a new paradigm for
museum information and education service that complements conventional school
programs. We designed and developed the Global Digital Museum to permit easy
access to the cultural heritage stored in museums around the world. The system
provides a single virtual museum, enabling global search and edit of museum
contents on the Internet. We applied the Global Digital Museum model to K-12
museum education by using real museum multimedia data. Technical issues
addressed include: 1) unified and global access to heterogeneous and
distributed multimedia contents of museums; and 2) interactive editing of the
contents on the World-Wide Web. We describe the concept of Global Digital
Museum, the system and network architecture, the data model for museum
information, and implementation of a prototype system on the Internet. Keywords: Digital museum, Internet, World-Wide Web, Global search, Museum education | |||
| Ontology-Based Metadata: Transforming the MARC Legacy | | BIBAK | PDF | 254-263 | |
| Peter C. Weinstein | |||
| We propose a new catalog based on a formal ontological model of
bibliographic relations. A hierarchy of five central concepts describes the
creation of work. Each kind of relation between works occurs at a particular
level in the hierarchy. Related works share data at some level of the
hierarchy, yielding a tree structure that reduces redundant representation of
shared attributes.
To show that ontology-based metadata is practical, we generated a knowledge base of metadata from a sample of MARC records. We implemented the ontology in description logic (Loom), mapped Machine Readable Cataloging (MARC) attributes and values to the ontology, and loaded the data into Loom with all values treated as separate instances. We then unified matching instances, and deduced relations between works. This process thus converts relationships implicit in MARC into explicit relations that are easy to utilize with computers. Our web interface permits browsing by navigating relations between works. Ontology-based metadata can also support user inquiry and digital-library operation in other important ways. Keywords: Metadata, Ontology, Bibliographic relations, Catalog structure | |||
| Database Selection Techniques for Routing Bibliographic Queries | | BIBAK | PDF | 264-273 | |
| Jian Xu; Yinyan Cao; Ee-Peng Lim; Wee-Keong Ng | |||
| Query routing refers to the general problem of selecting from a large set of
accessible information sources the ones relevant to a given query (i.e.
database selection), evaluating the query on the selected sources, and merging
their results. As the number of information sources on the Internet increases
dramatically, query routing is becoming increasingly important. Much of the
previous work in query routing focused on information sources that are document
collections. In this paper, we address the database selection problem for
databases with multiple text attributes. In particular, we have proposed a
number of different database selection techniques each requiring different
types of knowledge about the databases' content, e.g. past queries, past query
results, and statistical information collected from the database records. By
conducting a series of experiments on a set of bibliographic databases, we
evaluate and compare the performance of these proposed techniques. Keywords: Query routing, Database selection, Collection fusion, Bibliographic
databases, Information retrieval | |||
| A Hierarchical Access Control Scheme for Digital Libraries | | BIBAK | PDF | 275-276 | |
| Chaitanya Baru; Arcot Rajasekar | |||
| We present an access control scheme that extends the authorization/privilege
model employed in database systems to handle the notion of digital library
collection hierarchies. This scheme is being implemented within the digital
library infrastructure at the San Diego Supercomputer Center. Keywords: Digital library, Access control, Security | |||
| A Web Art Gallery | | BIB | PDF | 277-278 | |
| Murat Bayraktar; Chang Zhang; Bharadwaj Vadapalli; Neill A. Kipp; Edward A. Fox | |||
| Collaborative Information Agents on the World Wide Web | | BIBAK | PDF | 279-280 | |
| James R. Chen; Nathalie Mathe; Shawn Wolfe | |||
| In this paper, we present DIAMS, a system of distributed, collaborative
information agents which help users access, collect, organize and exchange
information on the World Wide Web. Personal agents provide their owners
dynamic displays of well organized information collections, as well as friendly
information management utilities. Personal agents exchange information with
one another. They also work with other types of information agents such as
matchmakers and knowledge experts to facilitate collaboration and
communication. Keywords: Intelligent agents, Information access, Collaborative system,
Knowledge-base, World Wide Web | |||
| Experimenting a 3D Interface for the Access to a Digital Library | | BIBAK | PDF | 281-282 | |
| Pierre Cubaud; Claire Thiria; Alexandre Topol | |||
| We experiment with the on-the-fly production of VRML scenes for unguided
browsing through a digitized rare book collection. The physical appearance
of books is captured by photographic textures in order to help the user
evaluate the relevance of the collection. In this preliminary work, we address
the problems of dynamic specification of 3D scenes and VRML browser response
time for such scenes. Keywords: Digital library, Human-computer interaction, Virtual reality, VRML | |||
| Efficient Searching in Distributed Digital Libraries | | BIBAK | PDF | 283-284 | |
| James C. French; Allison L. Powell; Walter R. Creighton, III | |||
| When a digital library is decomposed into many geographically distributed
repositories, search efficiency becomes an issue. Increasing network
congestion makes this a compelling issue. We discuss an effective method for
reducing the number of servers needed to respond to a query and give examples
of search space reduction in the NCSTRL distributed digital library. Keywords: Collection selection, Database selection, Text resource discovery,
Distributed searching | |||
| Using Decision Theory to Order Documents | | BIBA | PDF | 285-286 | |
| Eric J. Glover; William P. Birmingham | |||
| As the content in digital libraries grows, it is important to organize query
results so that more "valuable" results are ranked higher. We postulate that
this kind of ranking will make it easier for users to find documents that meet
their information need. We describe a model of a user's information need that
incorporates the idea of "value." This model is based on decision theory, and
is realized in a digital library. Preliminary results show that ranking
documents based on value is beneficial.
Document value stems from the information need of a user of an information retrieval (IR) system [2]. This typically includes topical relevance as well as publication date, grade level, size of document, and so forth. For the most part, IR systems focus on finding documents that are topically relevant, and often ignore other factors, or consider them as constraints. We believe that a more effective model of information need is value, where the value of a document is rationally determined by considering a variety of factors. In particular, the value perspective allows us to do away with artificial constraints on the search process, replacing them with preferences. In this paper, we focus on using value to order documents that are returned by a search engine. The advantage to this "post ordering" arrangement is that it is fully compatible with existing search engines. | |||
| Topic Labeling of Broadcast News Stories in the Informedia Digital Video Library | | BIBAK | PDF | 287-288 | |
| Alexander G. Hauptmann; Danny Lee | |||
| This paper describes the implementation of a topic labeling component for
the Informedia Digital Video Library. Each news story recorded from the
evening news is assigned to one of 3178 topic categories using a K-nearest
neighbor classification algorithm. In preliminary tests, the system achieved
recall of 0.491 with relevance of 0.482 when up to 5 topics could be assigned
to a news story. Keywords: Topic detection and labeling, Topic spotting and classification, Video
library, Digital libraries, Broadcast news story indexing | |||
| Failure Analysis in Query Construction: Data and Analysis from a Large Sample of Web Queries | | BIBAK | PDF | 289-290 | |
| Bernard J. Jansen; Amanda Spink; Tefko Saracevic | |||
| This paper reports results from a failure analysis (i.e., incorrect query
construction) of 51,473 queries from 18,113 users of Excite, a major Web search
engine. Given that many digital libraries are accessed via the Web, this
analysis points to the need for redesign of the traditional search engine
interfaces. Keywords: Web queries, Query construction | |||
| Dynamic Query Result Previews for a Digital Library | | BIBAK | PDF | 291-292 | |
| Steve Jones | |||
| Previous models of dynamic querying supported by query previews have
focussed on attribute based querying, have required information providers to
create preview tables, and have provided little information to support initial
query refinement. We present an alternative model that has been implemented
for the New Zealand Digital Library, and describe the system architecture and
user interface. Keywords: Dynamic queries, Query previews | |||
| Usage Analysis of a Digital Library | | BIBAK | PDF | 293-294 | |
| Steve Jones; Sally Jo Cunningham; Rodger McNab | |||
| We analyse transaction logs for a large full-text document collection for
Computer Science researchers. We report insights gained from this analysis and
identify resulting search interface design issues. Keywords: Transaction log analysis, Search interface, Usage analysis | |||
| Preserving Electronic Documents | | BIBAK | PDF | 295-296 | |
| Douglas A. Kranch | |||
| Electronic documents have many advantages, but they also have serious
disadvantages. One of them is difficulty in preservation. A method should be
developed for preserving electronic documents in their original form that is
independent of the hardware or software standards used. Keywords: Electronic documents, Preservation | |||
| Querying Structured Web Resources | | BIBA | PDF | 297-298 | |
| Ee-Peng Lim; Cheng-Hai Tan; Boon-Wan Lim; Wee-Keong Ng | |||
| In our ongoing WebIR (Web Information Retrieval) research, we are looking
into how web search engines can be extended to exploit the structuredness of
web collections for retrieval type queries. Our overall goal is to create a
new breed of web search engines that handle retrieval queries involving both
intra- and inter-document structures. To achieve this goal, the following
issues have to be addressed:
* How do we obtain the structural information about a web collection?
* What is the appropriate query model? In other words, what should the new
retrieval queries look like? How should the query results be represented?
* What are the appropriate indexing and query evaluation strategies?
* What should be the ranking formula for the query results?
* What is the appropriate framework to measure the performance of the new search engines?
In the remaining sections of our paper, we will present our approaches to address the first two issues in the WebIR research project. | |||
| Searching for Content-Based Addresses on the World-Wide Web | | BIBAK | PDF | 299-300 | |
| Joel D. Martin; Robert Holte | |||
| This paper presents a method for constructing queries that are sufficient to
retrieve a target web page. These queries can be thought of as content-based
addresses for the target page and can have many potential uses. Keywords: Distributed digital libraries, Query, Web, Search engines, Content-based
addresses, Dead links, QuerySearch | |||
| An Image-Capable Audio Internet Browser for Facilitating Blind User Access to Digital Libraries | | BIBAK | PDF | 301-302 | |
| Thierry Pun; Patrick Roth; Lori Petrucci; Andre Assimacopoulos | |||
| The Internet now permits widespread access to textual and pictorial material
from digital libraries. The widespread use of graphical user interfaces,
however, increasingly bars visually handicapped people from using such
material. We present here our current work aimed at the adaptation of an
Internet browser to facilitate blind user access to digital libraries. The
main distinguishing characteristics of this browser are: (1) active user
interaction, both for the macro-analysis and micro-analysis of screen objects
of interest; (2) use of a touch-sensitive screen to facilitate user
interaction; (3) generation of a virtual sound space into which the screen
information is mapped; (4) transcription into sounds not only of text, but also
of images. Several prototypes have been implemented, and are being evaluated
by blind users. Keywords: Internet, WWW, Digital libraries, Blind user access, Sound space, Image
analysis, Rehabilitation | |||
| Information Forage Through Adaptive Visualization | | BIBAK | PDF | 303-304 | |
| Dmitri Roussinov; Marshall Ramsey | |||
| Automatically created maps of concepts improve navigation in a collection of
text documents. We report our research on leveraging navigation by interactively
providing the ability to modify the maps themselves. We believe that this
functionality leads to better responsiveness to the user and a more effective
search. For this purpose we have created and tested a prototype system that
builds and refines in real-time a map of concepts found in Web documents
returned by a commercial search engine. Keywords: Intelligent searching, Interactive data exploration, Information
representation, WWW, Search engines, Information retrieval | |||
| A Graphical Interface for Speech-Based Retrieval | | BIBAK | PDF | 305-306 | |
| Laura Slaughter; Douglas W. Oard; Vernon L. Warnick; Julie L. Harding; Galen J. Wilkerson | |||
| This paper describes preliminary usability testing for a graphical interface
designed to facilitate rapid browsing of recorded speech. Expert interviews
and focus group discussions were used to assess the alignment between browsing
behaviors employed by members of the intended user population and an early
mockup of the interface. The results provide guidelines for the next iteration
of prototype development and suggest that graphical representations offer a
viable method for browsing audio and multimedia recordings. Keywords: Speech-based retrieval, GUI, Digital library | |||
| An Interactive WWW Search Engine for User-Defined Collections | | BIBAK | PDF | 307-308 | |
| Robert G. Sumner, Jr.; Kiduk Yang; Bert J. Dempsey | |||
| Given the dynamic nature and the quantity of information on the WWW, many
individual users and organizations compile and use focused WWW resource lists
related to a particular topic or subject domain. The IRISWeb system extends
this concept such that any user-defined set of WWW pages (a virtual collection)
can be retrieved, indexed, and searched using a powerful full-text search
engine with a relevance-feedback interface. This capability adds full-text
searching to highly customized subsets of the WWW. Here we describe the
IRISWeb software and an experiment that highlights its potential. Keywords: Search engines, WWW, Virtual collection, LIBClient, Subject gateway, IRISWeb | |||
| Site Outlining | | BIBAK | PDF | 309-310 | |
| Koichi Takeda; Hiroshi Nomiyama | |||
| In this paper, we propose a "site outlining" technique for building highly
integrated digital libraries comprising dynamic information sources such as Web
sites on the Internet. The notion of a site is defined as a structured entity
with annotated links. Keywords: Site outlining, Views, Visualization | |||
| Digital Library for Education and Medical Decision Making | | BIBAK | PDF | 311-312 | |
| Mark C. Tsai; Kenneth L. Melmon | |||
| Stanford Health Information Network for Education (SHINE) integrates online
guideline texts, textbooks, journals, bibliographic systems, medical images,
digital video, relational databases, and knowledge-based systems formerly
accessible individually through the Z39.50 protocol, SQL language, HTTP
protocol, and full-text search engines. We discuss the architecture that is
used to integrate these distributed heterogeneous systems. We explain the other
components, an electronic notebook and log recording systems, that make SHINE a
complete system to support medical decision making and learning. We also
discuss how the system will be integrated with electronic medical record
systems to support medical decisions. The same concepts can be applied to
aggregation of knowledge domains to optimize the functions of other
(non-medical) target users. Keywords: Integration system, Intelligent integration, Distributed database systems,
Digital library, Electronic notebook, Information retrieval | |||
| Internet Access to Scanned Paper Documents | | BIBAK | PDF | 313-314 | |
| Marcel Worring; Arnold W. M. Smeulders | |||
| In this contribution we identify the different structures encountered in a
hyperdocument. Methods are described for deriving those structures from
scanned paper originals. The content and structure of the document are then
made available in a form suited for an Internet browser. This provides
convenient access to the scanned paper document. Keywords: Document access, Document understanding, Hypertext structure, Hypertext
understanding | |||