| Integrating Automatic Genre Analysis into Digital Libraries | | BIBAK | PDF | 1-10 | |
| Andreas Rauber; Alexander Muller-Kogler | |||
| With the number and types of documents in digital library systems
increasing, tools for automatically organizing and presenting the content have
to be found. While many approaches focus on topic-based organization and
structuring, hardly any system incorporates automatic structural analysis and
representation. Yet, genre information (unconsciously) forms one of the most
distinguishing features in conventional libraries and in information searches.
In this paper we present an approach to automatically analyze the structure of
documents and to integrate this information into an automatically created
content-based organization. In the resulting visualization, documents on
similar topics, yet representing different genres, are depicted as books in
differing colors. This representation supports users intuitively in locating
relevant information presented in a relevant form. Keywords: SOMLib, document clustering, genre analysis, metaphor graphics,
self-organizing map (SOM), visualization | |||
| Text Categorization for Multi-Page Documents: A Hybrid Naive Bayes HMM Approach | | BIBAK | PDF | 11-20 | |
| Paolo Frasconi; Giovanni Soda; Alessandro Vullo | |||
| Text categorization is typically formulated as a concept learning problem
where each instance is a single isolated document. In this paper we are
interested in a more general formulation where documents are organized as page
sequences, as naturally occurring in digital libraries of scanned books and
magazines. We describe a method for classifying pages of sequential OCR text
documents into one of several assigned categories and suggest that taking into
account contextual information provided by the whole page sequence can
significantly improve classification accuracy. The proposed architecture relies
on hidden Markov models whose emissions are bag-of-words according to a
multinomial word event model, as in the generative portion of the Naive Bayes
classifier. Our results on a collection of scanned journals from the Making of
America project confirm the importance of using whole page sequences. Empirical
evaluation indicates that the error rate (as obtained by running a plain Naive
Bayes classifier on isolated pages) can be roughly reduced by half if contextual
information is incorporated. Keywords: Computing Methodologies -Artificial Intelligence - Learning (I.2.6);
Information Systems -Information Storage and Retrieval - Digital Libraries
(H.3.7); Computing Methodologies -Document and Text Processing - Miscellaneous
(I.7.m); Algorithms, Performance; hidden Markov models, multi-page documents,
naive Bayes classifier, text categorization | |||
| Automated Name Authority Control | | BIBAK | PDF | 21-22 | |
| James W. Warner; Elizabeth W. Brown | |||
| This paper describes a system for the automated assignment of authorized
names. A collaboration between a computer scientist and a librarian, the system
provides for enhanced end-user searching of digital libraries without
drastically increasing the cost and effort of creating a digital library. It is
a part of the workflow management system of the Levy Sheet Music Project. Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries
(H.3.7); automation, indexing, metadata, name authority control, workflow
management | |||
| Automatic Event Generation from Multi-Lingual News Stories | | BIBAK | PDF | 23-24 | |
| Kin Hui; Wai Lam; Helen M. Meng | |||
| We propose a novel approach for automatic generation of topically-related
events from multi-lingual news sources. Named entity terms are extracted
automatically from the news content. Together with the content terms, they
constitute the basis of representing the story. We employ a transformation-based
linguistic tagging approach for named entity extraction. Two methods of gross
translation on Chinese story representation into English have been implemented.
The first approach uses only a bilingual dictionary. The second method makes
use of a parallel corpus as an additional resource. Unsupervised learning is
employed to discover the events. Keywords: event detection, event discovery, multilingual text processing | |||
| Linked Active Content: A Service for Digital Libraries for Education | | BIBAK | PDF | 25-32 | |
| David Yaron; D. Jeff Milton; Rebecca Freeland | |||
| A service is described to help enable digital libraries for education, such
as the NSDL, to serve as collaboration spaces for the creation, modification
and use of active learning experiences. The goal is to redefine the line
between those activities that fall within the domain of computer programming
and those that fall within the domain of content authoring. The current
location of this line, as defined by web technologies, is such that far too
much of the design and development process is in the domain of software
creation. This paper explores the definition and use of "linked active
content", which builds on the hypertext paradigm by extending it to support
active content. This concept has community development advantages, since it
provides an authoring paradigm that supports contributions from a more diverse
audience, including especially those who have substantial classroom and
pedagogical expertise but lack programming expertise. It also promotes the
extraction of content from software so that collections may be better organized
and more easily repurposed to meet the needs of a diverse audience of educators
and students. Keywords: Computing Milieux -Computers and Education - Computer and Information
Science Education (K.3.2); Experimentation, Human Factors; active learning,
education, web authoring | |||
| A Component Repository for Learning Objects: A Progress Report | | BIBAK | PDF | 33-40 | |
| Jean R. Laleuf; Anne Morgan Spalter | |||
| We believe that an important category of SMET digital library content will
be highly interactive, explorable microworlds for teaching science,
mathematics, and engineering concepts. Such environments have proved
extraordinarily time-consuming and difficult to produce, however, threatening
the goals of widespread creation and use.
One proposed solution for accelerating production has been the creation of repositories of reusable software components or learning objects. Programmers would use such components to rapidly assemble larger-scale environments. Although many agree on the value of this approach, few repositories of such components have been successfully created. We suggest some reasons for the lack of expected results and propose two strategies for developing such repositories. We report on a case study that provides a proof of concept of these strategies. Keywords: NSDL, components, design, digital library, education, learning objects,
reuse, software engineering, standards | |||
| Designing E-Books for Legal Research | | BIBAK | PDF | 41-48 | |
| Catherine C. Marshall; Morgan N. Price; Gene Golovchinsky; Bill N. Schilit | |||
| In this paper we report the findings from a field study of legal research in
a first-tier law school and on the resulting redesign of XLibris, a
next-generation e-book. We first characterize a work setting in which we
expected an e-book to be a useful interface for reading and otherwise using a
mix of physical and digital library materials, and explore what kinds of
reading-related functionality would bring value to this setting. We do this by
describing important aspects of legal research in a heterogeneous information
environment, including mobility, reading, annotation, link following and
writing practices, and their general implications for design. We then discuss
how our work with a user community and an evolving e-book prototype allowed us
to examine tandem issues of usability and utility, and to redesign an existing
e-book user interface to suit the needs of law students. The study caused us to
move away from the notion of a stand-alone reading device and toward the
concept of a document laptop, a platform that would provide wireless access to
information resources, as well as support a fuller spectrum of reading-related
activities. Keywords: digital libraries, e-books, field study, information appliances, legal
education, legal research, physical and digital information resources | |||
| The Open Archives Initiative: Perspectives on Metadata Harvesting | | BIBA | PDF | 49 | |
| James B. Lloyd; Tim Cole; Donald Waters; Caroline Arms; Simeon Warner; Jeffrey Young | |||
| The Open Archives Initiative [www.openarchives.org] has developed a metadata harvesting protocol to further its aim of efficient dissemination of content through interoperability standards. In early 2001, at meetings in the U.S. and Europe, the version of the protocol to be used for beta testing was announced. The HTTP-based protocol uses URLs for queries and XML for responses. The default metadata record structure is unqualified Dublin Core using a specified XML Schema. This simple metadata record form is intended to support cross-domain discovery; other record structures for which XML Schemas are defined can also be made available. Developments during the beta test should include the creation of OAI-compliant repositories (data providers) and harvesters (service providers). This panel will explore the purpose and evolution of the Open Archives Initiative from the point of view of various stakeholders, with emphasis on developments during 2001. | |||
| Mapping the Interoperability Landscape for Networked Information Retrieval | | BIBAK | PDF | 50-51 | |
| William E. Moen | |||
| Interoperability is a fundamental challenge for networked information
discovery and retrieval. Often treated monolithically in the literature,
interoperability is multifaceted and can be analyzed into different types and
levels. This paper discusses an approach to map the interoperability landscape
for networked information retrieval as part of an interoperability assessment
research project. Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries
(H.3.7); Information Systems -Information Storage and Retrieval - Digital
Libraries (H.3.7): Systems issues; Information Systems -Information Storage and
Retrieval - Digital Libraries (H.3.7): User issues; Standardization;
interoperability, networked information discovery and retrieval, testbeds | |||
| Distributed Resource Discovery: Using Z39.50 to Build Cross-Domain Information Servers | | BIBAK | PDF | 52-53 | |
| Ray R. Larson | |||
| This short paper describes the construction and application of Cross-Domain
Information Servers using features of the standard Z39.50 information retrieval
protocol [11]. We use the Z39.50 Explain Database to determine the databases and
indexes of a given server, then use the SCAN facility to extract the contents
of the indexes. This information is used to build "collection documents" that
can be retrieved using probabilistic retrieval algorithms. Keywords: cross-domain resource discovery, distributed information retrieval,
distributed search | |||
| The Open Archives Initiative: Building a Low-Barrier Interoperability Framework | | BIBAK | PDF | 54-62 | |
| Carl Lagoze; Herbert Van de Sompel | |||
| The Open Archives Initiative (OAI) develops and promotes interoperability
solutions that aim to facilitate the efficient dissemination of content. The
roots of the OAI lie in the E-Print community. Over the last year its focus has
been extended to include all content providers. This paper describes the recent
history of the OAI - its origins in promoting E-Prints, the broadening of its
focus, the details of its technical standard for metadata harvesting, the
applications of this standard, and future plans. Keywords: Software -Software Engineering - Interoperability (D.2.12); Experimentation,
Standardization; digital libraries, interoperability, metadata, protocols | |||
| Enforcing Interoperability with the Open Archives Initiative Repository Explorer | | BIBAK | PDF | 63-64 | |
| Hussein Suleman | |||
| The Open Archives Initiative (OAI) is an organization dedicated to solving
problems of digital library interoperability by defining simple protocols, most
recently for the exchange of metadata. The success of such an activity requires
vigilance in specification of the protocol as well as standardization of
implementation. The lack of standardized implementation is a substantial
barrier to interoperability in many existing client/server protocols. To avoid
this pitfall we developed the Repository Explorer, a tool that supports manual
and automated protocol testing. This tool has a significant impact on
simplifying development of interoperability interfaces and increasing the level
of confidence of early adopters of the technology, thus exemplifying the
positive impact of exhaustive testing and quality assurance on interoperability
ventures. Keywords: Software -Software Engineering - Interoperability (D.2.12); Computer Systems
Organization -Computer-Communication Networks - Network Protocols (C.2.2);
Experimentation, Reliability, Standardization, Verification; interoperability,
protocol, testing, validation | |||
| Arc: An OAI Service Provider for Cross-Archive Searching | | BIBAK | PDF | 65-66 | |
| Xiaoming Liu; Kurt Maly; Mohammad Zubair; Michael L. Nelson | |||
| The usefulness of the many on-line journals and scientific digital libraries
that exist today is limited by the lack of a service that can federate them
through a unified interface. The Open Archives Initiative (OAI) is one major
effort to address technical interoperability among distributed archives. The
objective of OAI is to develop a framework to facilitate the discovery of
content in distributed archives. In this paper, we describe our experience and
lessons learned in building Arc, the first federated searching service based on
the OAI protocol. Arc harvests metadata from several OAI compliant archives,
normalizes them, and stores them in a search service based on a relational
database (MySQL or Oracle). At present we have over 165K metadata records from
16 data providers from various domains. Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries
(H.3.7): Collection; Information Systems -Information Storage and Retrieval -
Digital Libraries (H.3.7): Dissemination; Information Systems -Information
Storage and Retrieval - Digital Libraries (H.3.7): Standards; Design,
Experimentation, Languages, Standardization; digital library, open archive
initiative | |||
| Managing Change on the Web | | BIBAK | PDF | 67-76 | |
| Luis Francisco-Revilla; Frank Shipman; Richard Furuta; Unmil Karadkar; Avital Arora | |||
| Increasingly, digital libraries are being defined that collect pointers to
World-Wide Web based resources rather than hold the resources themselves.
Maintaining these collections is challenging due to distributed document
ownership and high fluidity. Typically a collection's maintainer has to assess
the relevance of changes with little system aid. In this paper, we describe the
Walden's Paths Path Manager, which assists a maintainer in discovering when
relevant changes occur to linked resources. The approach and system design were
informed by a study of how humans perceive changes of Web pages. The study
indicated that structural changes are key in determining the overall change and
that presentation changes are considered irrelevant. Keywords: Computing Methodologies -Computer Graphics - Three-Dimensional Graphics and
Realism (I.3.7); Information Systems -Information Interfaces and Presentation -
Hypertext/Hypermedia (H.5.4); Algorithms, Design, Experimentation, Management,
Reliability, Verification; Walden's path, path maintenance | |||
| Measuring the Reputation of Web Sites: A Preliminary Exploration | | BIBAK | PDF | 77-78 | |
| Greg Keast; Elaine G. Toms; Joan Cherry | |||
| We describe the preliminary results from a pilot study, which assessed the
perceived reputation - authority and trustworthiness - of the output from five
WWW indexing/ranking tools. The tools are based on three techniques: external
link structures, internal content, or human selection/indexing. Twenty-two
participants reviewed the output from each tool and assessed the reputation of
the retrieved sites. Keywords: Information Systems -Information Storage and Retrieval - Information Search
and Retrieval (H.3.3); Experimentation, Measurement, Performance, Reliability;
Lycos, TOPIC, Yahoo, alta vista, authority, evaluation, google, reputation, web
sites | |||
| Personalized Spiders for Web Search and Analysis | | BIBAK | PDF | 79-87 | |
| Michael Chau; Daniel Zeng; Hsinchun Chen | |||
| Searching for useful information on the World Wide Web has become
increasingly difficult. While Internet search engines have been helping people
to search on the web, low recall rate and outdated indexes have become more and
more problematic as the web grows. In addition, search tools usually present to
the user only a list of search results, failing to provide further personalized
analysis which could help users identify useful information and comprehend
these results. To alleviate these problems, we propose a client-based
architecture that incorporates noun phrasing and self-organizing map
techniques. Two systems, namely CI Spider and Meta Spider, have been built
based on this architecture. User evaluation studies have been conducted and the
findings suggest that the proposed architecture can effectively facilitate web
search and analysis. Keywords: Information Systems -Information Storage and Retrieval - Information Search
and Retrieval (H.3.3); Design, Experimentation; information retrieval, internet
searching and browsing, internet spider, noun-phrasing, personalization,
self-organizing map | |||
| Salticus: Guided Crawling for Personal Digital Libraries | | BIBAK | PDF | 88-89 | |
| Robin Burke | |||
| In this paper, we describe Salticus, a web crawler that learns from users'
web browsing activity. Salticus enables users to build a personal digital
library by collecting documents and generalizing over the user's choices. Keywords: business intelligence, crawling, document acquisition, personal digital
library | |||
| Different Cultures Meet: Lessons Learned in Global Digital Library Development | | BIBA | PDF | 90-93 | |
| Ching Chen; Wen Gao; Hsueh-hua Chen; Li-Zhu Zhou; Von-Wun Soo | |||
| This panel is organized to share the experience gained and lessons learned in developing cutting-edge technology applications and digital libraries when different cultures meet. "Culture" is interpreted in different ways and in different contexts. These range from interdisciplinary collaboration among professionals from different fields with their own cultures (such as library/information science, computer science, humanities, social sciences, and science and technology) to, more globally, major international collaborative projects involving R&D professionals from two or more different cultures: the East and the West, or the North and the South. | |||
| Power to the People: End-User Building of Digital Library Collections | | BIBA | PDF | 94-103 | |
| Ian H. Witten; David Bainbridge; Stefan J. Boddie | |||
| Naturally, digital library systems focus principally on the reader: the consumer of the material that constitutes the library. In contrast, this paper describes an interface that makes it easy for people to build their own library collections. Collections may be built and served locally from the user's own web server, or (given appropriate permissions) remotely on a shared digital library host. End users can easily build new collections styled after existing ones from material on the Web or from their local files (or both), and collections can be updated and new ones brought on-line at any time. The interface, which is intended for non-professional end users, is modeled after widely used commercial software installation packages. Lest one quail at the prospect of end users building their own collections on a shared system, we also describe an interface for the administrative user who is responsible for maintaining a digital library installation. | |||
| Web-Based Scholarship: Annotating the Digital Library | | BIBAK | PDF | 104-105 | |
| Bruce Rosenstock; Michael Gertz | |||
| The DL offers the possibility of collaborative scholarship, but the
appropriate tools must be integrated within the DL to serve this purpose. We
propose a Web-based tool to guide controlled data annotations that link items
in the DL to a domain-specific ontology and which provide an effective means to
query a data collection in an abstract and uniform fashion. Keywords: Information Systems -Information Interfaces and Presentation - Group and
Organization Interfaces (H.5.3); data annotations, folk literature DL | |||
| A Multi-View Intelligent Editor for Digital Video Libraries | | BIBAK | PDF | 106-115 | |
| Brad A. Myers; Juan P. Casares; Scott Stevens; Laura Dabbish; Dan Yocum; Albert Corbett | |||
| Silver is an authoring tool that aims to allow novice users to edit digital
video. The goal is to make editing of digital video as easy as text editing.
Silver provides multiple coordinated views, including project, source, outline,
subject, storyboard, textual transcript and timeline views. Selections and
edits in any view are synchronized with all other views. A variety of
recognition algorithms are applied to the video and audio content and then are
used to aid in the editing tasks. The Informedia Digital Library supplies the
recognition algorithms and metadata used to support intelligent editing, and
Informedia also provides search and a repository. The metadata includes shot
boundaries and a time-synchronized transcript, which are used to support
intelligent selection and intelligent cut/copy/paste. Keywords: digital video editing, informedia, multimedia authoring, silver, video
library | |||
| VideoGraph: A New Tool for Video Mining and Classification | | BIBA | PDF | 116-117 | |
| Jia-Yu Pan; Christos Faloutsos | |||
| This paper introduces VideoGraph, a new tool for video mining and visualizing the structure of the plot of a video sequence. The main idea is to "stitch" together similar scenes which are apart in time. We give a fast algorithm to do stitching and we show case studies, where our approach (a) gives good features for classification (91% accuracy), and (b) results in VideoGraphs which reveal the logical structure of the plot of the video clips. | |||
| The Alexandria Digital Earth Prototype | | BIBA | PDF | 118-119 | |
| Terence R. Smith; Greg Janee; James Frew; Anita Coleman | |||
| This note summarizes the system development activities of the Alexandria Digital Earth Prototype (ADEPT) Project. ADEPT and the Alexandria Digital Library (ADL) are, respectively, the research and operational components of the Alexandria Digital Library Project. The goal of ADEPT is to build a distributed digital library (DL) of personalized collections of geospatially referenced information. This DL is characterized by: (1) services for building, searching, and using personalized collections; (2) collections of georeferenced multimedia information, including dynamic simulation models of spatially distributed processes; and (3) user interfaces employing the concept of a "Digital Earth". Important near-term objectives for ADEPT are to build prototype collections that support undergraduate learning in physical, human, and cultural geography and related disciplines, and then to evaluate whether using such resources helps students learn to reason scientifically. Collections and services developed by ADEPT researchers will migrate to ADL as they mature. | |||
| Iscapes: Digital Libraries Environments for the Promotion of Scientific Thinking by Undergraduates in Geography | | BIBAK | PDF | 120-121 | |
| Anne J. Gilliland-Swetland; Gregory L. Leazer | |||
| This paper reviews considerations associated with implementing the
Alexandria Digital Earth Prototype (ADEPT) in undergraduate geography education
by means of Iscapes (or Information landscapes). In particular, we are
interested in how Iscapes might be used to promote scientific thinking by
undergraduate students. Based upon an ongoing educational needs assessment, we
present a set of conceptual principles that might selectively be implemented in
the design of educational digital library environments. Keywords: digital libraries, geography, scientific thinking, undergraduate education | |||
| Project ANGEL: An Open Virtual Learning Environment with Sophisticated Access Management | | BIBAK | PDF | 122-123 | |
| John MacColl | |||
| This paper describes a new project funded in the UK by the Joint Information
Systems Committee, to develop a virtual learning environment which combines a
new awareness of internet sources such as bibliographic databases and full-text
electronic journals with a sophisticated access management component which
permits single sign-on authentication. Keywords: Design, Standardization; access management, authentication, virtual learning
environments | |||
| NBDL: A CIS Framework for NSDL | | BIBAK | PDF | 124-125 | |
| Joe Futrelle; Su-Shing Chen; Kevin C. Chang | |||
| In this paper, we describe the NBDL (National Biology Digital Library)
project, one of the six CIS (Core Integration System) projects of the NSF NSDL
(National SMETE Digital Library) Program. Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries
(H.3.7); Information Systems -Information Storage and Retrieval - Digital
Libraries (H.3.7): User issues; Information Systems -Information Storage and
Retrieval - Digital Libraries (H.3.7): Dissemination; Algorithms, Design,
Standardization; SMET education, digital library, federated search | |||
| Automatic Identification and Organization of Index Terms for Interactive Browsing | | BIBAK | PDF | 126-134 | |
| Nina Wacholder; David K. Evans; Judith L. Klavans | |||
| The potential of automatically generated indexes for information access has
been recognized for several decades (e.g., Bush 1945 [2], Edmundson and Wyllys
1961 [4]), but the quantity of text and the ambiguity of natural language
processing have made progress at this task more difficult than was originally
foreseen. Recently, a body of work on development of interactive systems to
support phrase browsing has begun to emerge (e.g., Anick and Vaithyanathan 1997
[1], Gutwin et al. [10], Nevill-Manning et al. 1997 [17], Godby and Reighart
1998 [9]). In this paper, we consider two issues related to the use of
automatically identified phrases as index terms in a dynamic text browser
(DTB), a user-centered system for navigating and browsing index terms: 1) What
criteria are useful for assessing the usefulness of automatically identified
index terms? and 2) Is the quality of the terms identified by automatic
indexing such that they provide useful access to document content?
The terms that we focus on have been identified by LinkIT, a software tool for identifying significant topics in text [7]. Over 90% of the terms identified by LinkIT are coherent and therefore merit inclusion in the dynamic text browser. Terms identified by LinkIT are input to Intell-Index, a prototype DTB that supports interactive navigation of index terms. The distinction between phrasal heads (the most important words in a coherent term) and modifiers serves as the basis for a hierarchical organization of terms. This linguistically motivated structure helps users to browse and disambiguate terms efficiently. We conclude that the approach to information access discussed in this paper is very promising, and also that there is much room for further research. In the meantime, this research is a contribution to the establishment of a solid foundation for assessing the usability of terms in phrase browsing applications. Keywords: browsing, genre, indexing, natural language processing, phrases | |||
| Digital Library Collaborations in a World Community | | BIBA | PDF | 135 | |
| David Fulker; Sharon Dawes; Leonid Kalinichenko; Tamara Sumner; Constantino Thanos; Alex Ushakov | |||
| Digital libraries and their user communities are increasingly international in nature. However - though technological progress and global education have brought American and European communities closer - cross-cultural and other crosscutting issues impede the formation of world community on larger scales. The pertinent issues include: collaboration in the presence of language and cultural barriers, international copyrights, international revenue streams, and universal access. This panel will examine notions of "community" from a variety of theoretical and practical perspectives, and discuss lessons that can be gleaned from applications of the community concept. Topics are expected to include scalability, sustainability, regenerative cycles in healthy communities, and examples of digital-library efforts that have international potential or implications. | |||
| Public Use of Digital Community Information Systems: Findings from A Recent Study with Implications for System Design | | BIBAK | PDF | 136-143 | |
| Karen E. Pettigrew; Joan C. Durrance | |||
| The Internet has considerably empowered libraries and changed common
perception of what they entail. Public libraries, in particular, are using
technological advancements to expand their range of services and enhance their
civic roles. Providing community information (CI) in innovative, digital forms
via community networks is one way in which public libraries are facilitating
everyday information needs. These networks have been lauded for their potential
to strengthen physical communities through increasing information flow about
local services and events, and through facilitating civic interaction. However,
little is known about how the public uses such digital services and what
barriers they encounter. This paper presents findings about how digital CI
systems benefit physical communities based on extensive case studies in three
states. At each site, rich data were collected using online surveys, field
observation, in-depth interviews and focus groups with Internet users, human
service providers and library staff. Both the online survey and the follow-up
interviews with respondents were based on sense-making theory. In our paper we
discuss our findings regarding: (1) how the public is using digital CI systems
for daily problem solving, and (2) the types of barriers they encounter.
Suggestions for improving digital CI systems are provided. Keywords: Human Factors, Measurement, Performance, Theory; barriers, community
information, community networks, information behavior, qualitative methods,
sensemaking | |||
| Evaluating the Distributed National Electronic Resource | | BIBAK | PDF | 144-145 | |
| Peter Brophy; Shelagh Fisher | |||
| The UK's development of a Distributed National Electronic Resource (DNER) is
being subjected to intensive formative evaluation by a multi-disciplinary team.
In this paper the Project Director reports on initial actions designed to
characterise the DNER from multi-stakeholder perspectives. Keywords: Computer Systems Organization -Computer-Communication Networks - Network
Architecture and Design (C.2.1); Computer Systems Organization
-Computer-Communication Networks - Distributed Systems (C.2.4); Design,
Economics, Human Factors, Measurement, Management, Performance, Reliability,
Verification; distributed collections, evaluation, information environments | |||
| Collaborative Design with Use Case Scenarios | | BIBAK | PDF | 146-147 | |
| Lynne Davis; Melissa Dawe | |||
| Digital libraries, particularly those with a community-based governance
structure, are best designed in a collaborative setting. In this paper, we
compare our experience using two design methods: a Task-centered method that
draws upon a group's strength for eliciting and formulating tasks, and a Use
Case method that tends to require a focus on defining an explicit process for
tasks. We discuss how these methods did and did not work well in a
collaborative setting. Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries
(H.3.7); Design, Experimentation, Human Factors; collaboration, design,
methodology, task-centered, use case | |||
| Human Evaluation of Kea, An Automatic Keyphrasing System | | BIBAK | PDF | 148-156 | |
| Steve Jones; Gordon W. Paynter | |||
| This paper describes an evaluation of the Kea automatic keyphrase extraction
algorithm. Tools that automatically identify keyphrases are desirable because
document keyphrases have numerous applications in digital library systems, but
are costly and time consuming to manually assign. Keyphrase extraction
algorithms are usually evaluated by comparison to author-specified keywords,
but this methodology has several well-known shortcomings. The results presented
in this paper are based on subjective evaluations of the quality and
appropriateness of keyphrases by human assessors, and make a number of
contributions. First, they validate previous evaluations of Kea that rely on
author keywords. Second, they show Kea's performance is comparable to that of
similar systems that have been evaluated by human assessors. Finally, they
justify the use of author keyphrases as a performance metric by showing that
authors generally choose good keywords. Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries
(H.3.7); Computing Methodologies -Artificial Intelligence - Natural Language
Processing (I.2.7); Algorithms, Experimentation, Performance; author
keyphrases, digital libraries, keyphrase extraction, subjective evaluation,
user interface | |||
| Community Design of DLESE's Collections Review Policy: A Technological Frames Analysis | | BIBAK | PDF | 157-164 | |
| Michael Khoo | |||
| In this paper, I describe the design of a collection review policy for the
Digital Library for Earth System Education (DLESE). A distinctive feature of
DLESE as a digital library is the DLESE community, composed of voluntary
members who contribute metadata and resource reviews to DLESE. As the DLESE
community is open, the question of how to evaluate community contributions is a
crucial part of the review policy design process. In this paper, technological
frames theory is used to analyse this design process by looking at how the
designers work with two differing definitions of the peer reviewer, (a) peer
reviewer as arbiter or editor, and (b) peer reviewer as colleague. Content
analysis of DLESE documents shows that these frames can in turn be related to
two definitions that DLESE offers of itself: DLESE as a library, and DLESE as a
digital artifact. The implications of the presence of divergent technological
frames for the design process are summarised, and some suggestions for future
research are outlined. Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries
(H.3.7); Design, Human Factors; content analysis, decision making, design,
digital library, ethnography, peer review, technological frames | |||
| Legal Deposit of Digital Publications: A Review of Research and Development Activity | | BIBAK | PDF | 165-173 | |
| Adrienne Muir | |||
| There is a global trend towards extending legal deposit to include digital
publications in order to maintain comprehensive national archives. However,
including digital publications in legal deposit regulation is not enough to
ensure the long-term preservation of these publications. Concepts, principles
and practices accepted and understood in the print environment may have new
meanings or no longer be appropriate in a networked environment. Mechanisms for
identifying, selecting and depositing digital material either do not exist, or
are inappropriate, for some kinds of digital publication. Work on developing
digital preservation strategies is at an early stage. National and other
deposit libraries are at the forefront of research and development in this area,
often working in partnership with other libraries, publishers and technology
vendors. Most work is of a technical nature. There is some work on developing
policies and strategies for managing digital resources. However, not all
management issues or user needs are being addressed. Keywords: Computing Milieux -Legal Aspects of Computing - General (K.5.0); Legal
Aspects, Management; digital preservation, digital publications, legal deposit | |||
| Comprehensive Access to Printed Materials (CAPM) | | BIBAK | PDF | 174-175 | |
| G. Sayeed Choudhury; Mark Lorie; Erin Fitzpatrick; Ben Hobbs; Greg Chirikjian; Allison Okamura; Nicholas E. Flores | |||
| The CAPM Project features the development and evaluation of an automated,
robotic on-demand scanning system for materials at remote locations. To date,
we have developed a book retrieval robot and a valuation analysis framework for
evaluating CAPM. We intend to augment CAPM by exploring approaches for
automated page turning and improved valuation. These extensions will result in
a more fully automated CAPM system and a valuation framework that will not only
be useful for assessing CAPM specifically, but also for library services and
functions generally. Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries
(H.3.7); Design, Economics, Experimentation, Measurement; browsing, digital
conversion, digital preservation, evaluation methods, information economics,
paper manipulation, robotics | |||
| Technology and Values: Lessons from Central and Eastern Europe | | BIBAK | PDF | 176-177 | |
| Nadia Caidi | |||
| Technology does not develop independently of its social context. Rather,
there is a range of social, cultural and economic factors (in addition to
technical factors) that define the parameters for the development and use of
technologies. This paper presents a case study of the social shaping of one
aspect of digital libraries, the development of national union catalogs (NUC),
in four countries of Central and Eastern Europe (CEE). It examines the specific
choices and values that are embedded in the design of a NUC, and how these
might be transferred to other cultural contexts. Keywords: central and eastern europe, information infrastructure, national union
catalogs, social shaping of technology | |||
| A Digital Strategy for the Library of Congress: Discussion of the LC21 Report and the Role of the Digital Library Community | | BIBA | PDF | 178 | |
| Alan Inouye; Margaret Hedstrom; Dale Flecker; David Levy | |||
| Digital libraries challenge the core practices of libraries and archives in many respects, not only in terms of accommodating digital information and technology, but also through the need to develop new economic and organizational models. As the world's largest library, the Library of Congress (LC) perhaps faces the most profound questions of how to collect, catalog, preserve, and provide access to digital resources. LC asked the Computer Science and Telecommunications Board of the National Academies for advice in this area by commissioning the study that culminated with the publication of LC21: A Digital Strategy for the Library of Congress. The panelists at this session will provide a brief summary of the LC21 report, review developments subsequent to the publication of LC21, and offer their thoughts on how the library community and information industry could engage LC to the benefit of the nation. | |||
| Use of Multiple Digital Libraries: A Case Study | | BIBAK | PDF | 179-188 | |
| Ann Blandford; Hanna Stelmaszewska; Nick Bryan-Kinns | |||
| The aim of the work reported here was to better understand the usability
issues raised when digital libraries are used in a natural setting. The method
used was a protocol analysis of users working on a task of their own choosing
to retrieve documents from publicly available digital libraries. Various
classes of usability difficulties were found. Here, we focus on use in context
- that is, usability concerns that arise from the fact that libraries are
accessed in particular ways, under technically and organisationally imposed
constraints, and that use of any particular resource is discretionary. The
concepts from an Interaction Framework, which provides support for reasoning
about patterns of interaction between users and systems, are applied to
understand interaction issues. Keywords: HCI, digital libraries, interaction modelling, video protocols | |||
| An Ethnographic Study of Technical Support Workers: Why We Didn't Build a Tech Support Digital Library | | BIBAK | PDF | 189-198 | |
| Sally Jo Cunningham; Chris Knowles; Nina Reeves | |||
| In this paper we describe the results of an ethnographic study of the
information behaviours of university technical support workers and their
information needs. The study looked at how the group identified, located and
used information from a variety of sources to solve problems arising in the
course of their work. The results of the investigation are discussed in the
context of the feasibility of developing a potential information base that
could be used by all members of the group. Whilst a number of their
requirements would easily be fulfilled by the use of a digital library, other
requirements would not. The paper illustrates the limitations of a digital
library with respect to the information behaviours of this group of subjects
and focuses on why a digital library would not appear to be the ideal support
tool for their work. Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries
(H.3.7); Software -Software Engineering - Requirements/Specifications (D.2.1);
Design, Human Factors; ethnography, requirements analysis, user studies | |||
| Developing Recommendation Services for a Digital Library with Uncertain and Changing Data | | BIBAK | PDF | 199-200 | |
| Gary Geisler; David McArthur; Sarah Giersch | |||
| In developing recommendation services for a new digital library called
iLumina (www.ilumina-project.org), we are faced with several challenges related
to the nature of the data we have available. The availability and consistency
of data associated with iLumina is likely to be highly variable. Any
recommendation strategy we develop must be able to cope with this fact, while
also being robust enough to adapt to additional types of data available over
time as the digital library develops. In this paper we describe the challenges
we are faced with in developing a system that can provide our users with good,
consistent recommendations under changing and uncertain conditions. Keywords: digital library, recommender system, user services | |||
| Evaluation of DEFINDER: A System to Mine Definitions from Consumer-Oriented Medical Text | | BIBAK | PDF | 201-202 | |
| Judith L. Klavans; Smaranda Muresan | |||
| In this paper we present DEFINDER, a rule-based system that mines
consumer-oriented full text articles in order to extract definitions and the
terms they define. This research is part of the Digital Library Project at Columbia
University, entitled PERSIVAL (PErsonalized Retrieval and Summarization of
Image, Video and Language resources) [5]. One goal of the project is to present
information to patients in language they can understand. A key component of
this stage is to provide accurate and readable lay definitions for technical
terms, which may be present in articles of intermediate complexity.
The focus of this short paper is on quantitative and qualitative evaluation of the DEFINDER system [3]. Our basis for comparison was definitions from Unified Medical Language System (UMLS), On-line Medical Dictionary (OMD) and Glossary of Popular and Technical Medical Terms (GPTMT). Quantitative evaluations show that DEFINDER obtained 87% precision and 75% recall and reveal the incompleteness of existing resources and the ability of DEFINDER to address gaps. Qualitative evaluation shows that the definitions extracted by our system are ranked higher in terms of user-based criteria of usability and readability than definitions from on-line specialized dictionaries. Thus the output of DEFINDER can be used to enhance existing specialized dictionaries, and also as a key feature in summarizing technical articles for non-specialist users. Keywords: automatic dictionary creation, medical digital libraries, natural language
processing, text data mining | |||
| Overview of the Virtual Data Center Project and Software | | BIBAK | PDF | 203-204 | |
| Micah Altman; L. Andreev; M. Diggory; G. King; E. Kolster; A. Sone; S. Verba; Daniel Kiskis; M. Krot | |||
| In this paper, we present an overview of the Virtual Data Center (VDC)
software, an open-source digital library system for the management and
dissemination of distributed collections of quantitative data. The VDC
functionality provides everything necessary to maintain and disseminate an
individual collection of research studies, including facilities for the
storage, archiving, cataloging, translation, and on-line analysis of a
particular collection. Moreover, the system provides extensive support for
distributed and federated collections including: location-independent naming of
objects, distributed authentication and access control, federated metadata
harvesting, remote repository caching, and distributed virtual collections of
remote objects. Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries
(H.3.7); Design, Management, Standardization; numeric data, open-source,
warehousing | |||
| Digital Libraries and Data Scholarship | | BIBAK | PDF | 205-206 | |
| Bruce R. Barkstrom | |||
| In addition to preserving and retrieving digital information, digital
libraries need to allow data scholars to create post-publication references to
objects within files and across collections of files. Such references can serve
as new metadata in their own right and should also provide methods for
efficiently extracting the subset of the original data that belongs to the
object. This paper discusses some ideas about the requirements for such
references within the context of long-term, active archival, where neither the
data format nor the institutional basis can be guaranteed to remain constant. Keywords: EOSDIS, data scholarship, digital libraries, object data references,
structural data reference | |||
| SDLIP + STARTS = SDARTS A Protocol and Toolkit for Metasearching | | BIBAK | PDF | 207-214 | |
| Noah Green; Panagiotis G. Ipeirotis; Luis Gravano | |||
| In this paper we describe how we combined SDLIP and STARTS, two
complementary protocols for searching over distributed document collections.
The resulting protocol, which we call SDARTS, is simple yet expressive enough
to enable building sophisticated metasearch engines. SDARTS can be viewed as an
instantiation of SDLIP with metasearch-specific elements from STARTS. We also
report on our experience building three SDARTS-compliant wrappers: for locally
available plain-text document collections, for locally available XML document
collections, and for external web-accessible collections. These wrappers were
developed to be easily customizable for new collections. Our work was developed
as part of Columbia University's Digital Libraries Initiative--Phase 2 (DLI2)
project, which involves the departments of Computer Science, Medical
Informatics, and Electrical Engineering, the Columbia University libraries, and
a large number of industrial partners. The main goal of the project is to
provide personalized access to a distributed patient-care digital library. Keywords: Information Systems -Information Storage and Retrieval - Information Search
and Retrieval (H.3.3); Information Systems -Information Storage and Retrieval -
Online Information Services (H.3.5); Information Systems -Information Storage
and Retrieval - Digital Libraries (H.3.7); Information Systems -Database
Management - Systems (H.2.4): Database Manager; Information Systems -Database
Management - Systems (H.2.4); Information Systems -Database Management -
Systems (H.2.4): Distributed databases; Information Systems -Database
Management - Heterogeneous Databases (H.2.5); Information Systems -Database
Management - Heterogeneous Databases (H.2.5): Data translation | |||
| Database Selection for Processing k Nearest Neighbors Queries in Distributed Environments | | BIBAK | PDF | 215-222 | |
| Clement Yu; Prasoon Sharma; Weiyi Meng; Yan Qin | |||
| We consider the processing of digital library queries, consisting of a text
component and a structured component, in distributed environments. The text
component can be processed using techniques given in previous papers such as
[7, 8, 11]. In this paper, we concentrate on the processing of the structured
component of a distributed query. Histograms are constructed and algorithms are
given to provide estimates of the desirabilities of the databases with respect
to the given query. Databases are selected in descending order of desirability.
An algorithm is also given to select tuples from the selected databases.
Experimental results are given to show that the techniques provided here are
effective and efficient. Keywords: database selection, distributed databases, k nearest neighbors, query
processing | |||
| The President's Information Technology Advisory Committee's February 2001 Digital Library Report and its Impact | | BIBAK | PDF | 223-225 | |
| Sally E. Howe; David C. Nagel; Ching-chih Chen; Stephen M. Griffin; James Lightbourne; Walter L. Warnick | |||
| In February 2001 the Panel on Digital Libraries of the President's
Information Technology Advisory Committee issued a report entitled "Digital
Libraries: Universal Access to Human Knowledge". This JCDL panel, which
consists of two members of the PITAC Panel on Digital Libraries and
representatives of key Federal science and digital library agencies who had
briefed the Panel, will discuss the report's findings and recommendations and
how the report is and can be helpful in improving the development and use of
digital libraries. Keywords: Economics, Experimentation, Human Factors, Legal Aspects, Management,
Security, Standardization, Verification; digital libraries, federal government,
policy, research and development | |||
| Building Searchable Collections of Enterprise Speech Data | | BIBAK | PDF | 226-234 | |
| James W. Cooper; Mahesh Viswanathan; Donna Byron; Margaret Chan | |||
| We have applied speech recognition and text-mining technologies to a set of
recorded outbound marketing calls and analyzed the results. Since
speaker-independent speech recognition technology results in a significantly
lower recognition rate than that found when the recognizer is trained for a
particular speaker, we applied a number of post-processing algorithms to the
output of the recognizer to render it suitable for the Textract text mining
system.
We indexed the call transcripts using a search engine and used Textract and associated Java technologies to place the relevant terms for each document in a relational database. Following a search query, we generated a thumbnail display of the results of each call with the salient terms highlighted. We illustrate these results and discuss their utility. We took the results of these experiments and continued this analysis on a set of talks and presentations. We describe a distinct document genre based on the note-taking concept of document content, and propose a significant new method for measuring speech recognition accuracy. This procedure is generally relevant to the problem of capturing meetings and talks and providing a searchable index of these presentations on the web. Keywords: document display, search, speech analysis, speech retrieval, text mining | |||
| Transcript-Free Search of Audio Archives for the National Gallery of the Spoken Word | | BIBAK | PDF | 235-236 | |
| John H. L. Hansen; J. R. Deller; Michael S. Seadle | |||
| The National Gallery of the Spoken Word (NGSW) project is creating a
carefully organized on-line repository of spoken-word collections spanning the
20th century. Unprecedented technical challenges are inherent in the
development of an archive of such extensive scale and diversity. This paper
describes research on the development of text-free search-engine technology
used to locate requested content in the audio records. A companion paper in
these proceedings addresses watermarking technologies for copyright protection. Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries
(H.3.7); Data - Files (E.5) | |||
| Audio Watermarking Techniques for the National Gallery of the Spoken Word | | BIBA | PDF | 237-238 | |
| J. R. Deller; Aparna Gurijala; Michael S. Seadle | |||
| This is one of two companion papers describing technical challenges faced in the development of the National Gallery of the Spoken Word (NGSW). The present paper describes watermarking technologies for intellectual property protection. Following an introduction to data watermarking, the paper focuses on a new algorithm called transform encryption coding (TEC) and its application to watermarking the NGSW archives. TEC has a number of flexible features that make it amenable to the NGSW development. | |||
| Music-Notation Searching and Digital Libraries | | BIBA | PDF | 239-246 | |
| Donald Byrd | |||
| Almost all work on music information retrieval to date has concentrated on music in the audio and event (normally MIDI) domains. However, music in the form of notation, especially Conventional Music Notation (CMN), is of much interest to musically-trained persons, both amateurs and professionals, and searching CMN has great value for digital music libraries. One obvious reason little has been done on music retrieval in CMN form is the overwhelming complexity of CMN, which requires a very substantial investment in programming before one can even begin studying music IR. This paper reports on work adding music-retrieval capabilities to Nightingale, an existing professional-level music-notation editor. | |||
| Feature Selection for Automatic Classification of Musical Instrument Sounds | | BIBAK | PDF | 247-248 | |
| Mingchun Liu; Chunru Wan | |||
| In this paper, we carry out a study on classification of musical instruments
using a small set of features selected from a broad range of extracted ones by the
sequential forward feature selection method. First, we extract 58 features
for each record in the music database of 351 sound files. Then, the sequential
forward selection method is adopted to choose the best feature set to achieve
high classification accuracy. Three different classification techniques have
been tested out and an accuracy of up to 93% can be achieved by using 19
features. Keywords: classification, feature extraction, musical instrument, sequential forward
feature selection | |||
| Adding Content-Based Searching to a Traditional Music Library Catalogue Server | | BIBAK | PDF | 249-250 | |
| Matthew J. Dovey | |||
| Most online music library catalogues can only be searched by textual
metadata. Whilst highly effective - since the rules for maintaining consistency
have been refined over many years - this does not allow searching by musical
content. Many music librarians are familiar with users humming their enquiries.
Most systems providing a "query by humming" interface tend to run independently
of music library catalogue systems and do not offer similar textual metadata
searching. This paper discusses the ongoing investigative work on integrating
these two types of system conducted as part of the NSF/JISC funded OMRAS
project (http://www.omras.org). Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries
(H.3.7); Algorithms, Design; Z39.50, music information retrieval | |||
| Locating Question Difficulty through Explorations in Question Space | | BIBAK | PDF | 251-252 | |
| Terry Sullivan | |||
| Three different search effectiveness measures were used to classify 50
question narratives as easy or hard. Each measure was then encoded onto a
spatial representation of interquestion similarity. Discriminant analysis based
on the resulting map was able to predict question difficulty with approximately
80% accuracy, robust across multiple measures. Implications for the design of
digital document collections are discussed. Keywords: information visualization, question classification | |||
| Browsing by Phrases: Terminological Information in Interactive Multilingual Text Retrieval | | BIBAK | PDF | 253-254 | |
| Anselmo Penas; Julio Gonzalo; Felisa Verdejo | |||
| This paper presents an interactive search engine (Website Term Browser) which
makes use of phrasal information to process queries and suggest relevant topics
in a fully multilingual setting. Keywords: interaction, multilingual information access, natural language processing,
terminology extraction | |||
| Approximate Ad-Hoc Query Engine for Simulation Data | | BIBAK | PDF | 255-256 | |
| Ghaleb Abdulla; Chuck Baldwin; Terence Critchlow; Roy Kamimura; Ida Lozares; Ron Musick; Nu Ai Tang; Byung S. Lee; Robert Snapp | |||
| In this paper, we describe AQSim, an ongoing effort to design and implement
a system to manage terabytes of scientific simulation data. The goal of this
project is to reduce data storage requirements and access times while
permitting ad-hoc queries using statistical and mathematical models of the
data. In order to facilitate data exchange between models based on different
representations, we are evaluating using the ASCI common data model that is
comprised of several layers of increasing semantic complexity. To support
queries over the spatial-temporal mesh structured data we are in the process of
defining and implementing a grammar for MeshSQL. Keywords: data integration, data retrieval, mesh data, query, scientific data
management, visualization | |||
| Extracting Taxonomic Relationships from On-Line Definitional Sources using LEXING | | BIBAK | PDF | 257-258 | |
| Judith Klavans; Brian Whitman | |||
| We present a system which extracts the genus word and phrase from free-form
definition text, entitled LEXING, for Lexical Information from Glossaries. The
extractions will be used to build automatically a lexical knowledge base from
on-line domain specific glossary sources. We combine statistical and semantic
processes to extract these terms, and demonstrate that this combination allows
us to predict the genus even in difficult situations such as empty head
definitions or verb definitions. We also discuss the use of "linking
prepositions" for use in skipping past empty head genus phrases. This system is
part of a project to extract ontological information for energy glossary
information. Keywords: definitions, glossaries, information retrieval, lexical knowledge bases,
natural language processing, ontologies | |||
| Hierarchical Indexing and Document Matching in BoW | | BIBA | PDF | 259-267 | |
| Maayan Geffet; Dror G. Feitelson | |||
| BoW is an on-line bibliographical repository based on a hierarchical concept index to which entries are linked. Searching in the repository should therefore return matching topics from the hierarchy, rather than just a list of entries. Likewise, when new entries are inserted, a search for relevant topics to which they should be linked is required. We develop a vector-based algorithm that creates keyword vectors for the set of competing topics at each node in the hierarchy, and show how its performance improves when domain-specific features are added (such as special handling of topic titles and author names). The results of a 7-fold cross validation on a corpus of some 3,500 entries with a 5-level index are hit ratios in the range of 89-95%, and most of the misclassifications are indeed ambiguous to begin with. | |||
| Scalable Integrated Region-Based Image Retrieval using IRM and Statistical Clustering | | BIBAK | PDF | 268-277 | |
| James Z. Wang; Yanping Du | |||
| Statistical clustering is critical in designing scalable image retrieval
systems. In this paper, we present a scalable algorithm for indexing and
retrieving images based on region segmentation. The method uses statistical
clustering on region features and IRM (Integrated Region Matching), a measure
developed to evaluate overall similarity between images that incorporates
properties of all the regions in the images by a region-matching scheme.
Compared with retrieval based on individual regions, our overall similarity
approach (a) reduces the influence of inaccurate segmentation, (b) helps to
clarify the semantics of a particular region, and (c) enables a simple querying
interface for region-based image retrieval systems. The algorithm has been
implemented as a part of our experimental SIMPLIcity image retrieval system and
tested on large-scale image databases of both general-purpose images and
pathology slides. Experiments have demonstrated that this technique maintains
the accuracy and robustness of the original system while reducing the matching
time significantly. Keywords: clustering, content-based image retrieval, integrated region matching,
segmentation, wavelets | |||
| The National SMETE Digital Library Program | | BIBAK | PDF | 278-281 | |
| Brandon Muramatsu; Cathryn A. Manduca; Marcia Mardis; James H. Lightbourne; Flora P. McMartin | |||
| "To catalyze and support continual improvements in the quality of science,
mathematics, engineering, and technology (SMET) education, the National Science
Foundation (NSF) has established the National Science, Mathematics,
Engineering, and Technology Education Digital Library (NSDL) program. The
resulting digital library, a network of learning environments and resources for
SMET education, will ultimately meet the needs of students and teachers at all
levels - K-12, undergraduate, graduate, and lifelong learning - in both individual
and collaborative settings, as well as formal and informal modes." -National
Science Foundation, 2001
The national in the NSDL program is quickly becoming a reality with the broad reach of the currently funded projects. This panel session will bring together the leaders developing the National SMETE Digital Library to provide a brief background and broad overview of the NSDL program. Panelists will discuss the overall vision and broad steps underway to develop the National SMETE Digital Library. Building the National SMETE Digital Library presents many challenges: * Developing a shared vision for the form and function of the NSDL; * Meeting the needs of diverse learners and of the many disciplines encompassed by the NSDL; * Acquiring input from the community of users to ensure that the NSDL is both used and useable; * Evaluating progress and impacts; * Integrating technologies that already exist, and the development of new technologies; and * Providing mechanisms for sharing and cooperation of knowledge and resources among NSDL collaborators. Keywords: National SMETE Digital Library, NSDL, Education, Teaching and Learning | |||
| Cumulating and Sharing End Users Knowledge to Improve Video Indexing in a Video Digital Library | | BIBAK | PDF | 282-289 | |
| Marc Nanard; Jocelyne Nanard | |||
| In this paper, we focus on a user driven approach to improve video indexing.
It consists of accumulating the many small, individual efforts made by
users who access information, and providing a community management
mechanism that lets users share the elicited knowledge. This technique is
currently being developed in the "OPALES" environment and tuned up at the
"Institut National de l'Audiovisuel" (INA), a National Video Library in Paris,
to increase the value of its patrimonial video archive collections. It relies
on a portal providing private workspaces to end users, so that a large part of
their work can be shared among them. The effort of interpreting documents is
made directly by the expert users who work on the archives for their own purposes.
OPALES provides an original notion of "point of view" to enable the elicitation
and the sharing of knowledge between communities of users, without leading to
messy structures. The overall result consists in linking exportable private
metadata to archive documents and managing the sharing of the elicited
knowledge between users communities. Keywords: H.3.5[INFORMATION STORAGE AND RETRIEVAL]: Online Information Services - Data
bank sharing Design; Video annotation. Video indexing. Private workspaces.
Users communities. Knowledge sharing. | |||
| XSLT for Tailored Access to a Digital Video Library | | BIBAK | PDF | 290-299 | |
| Michael G. Christel; Bryan Maher; Andrew Begun | |||
| Surrogates, summaries, and visualizations have been developed and evaluated
for accessing a digital video library containing thousands of documents and
terabytes of data. These interfaces, formerly implemented within a monolithic
stand-alone application, are being migrated to XML and XSLT for delivery
through web browsers. The merits of these interfaces are presented, along with
a discussion of the benefits in using W3C recommendations such as XML and XSLT
for delivering tailored access to video over the web. Keywords: Information Systems -Information Interfaces and Presentation - Multimedia
Information Systems (H.5.1); Information Systems -Information Storage and
Retrieval - Digital Libraries (H.3.7): Standards; Information Systems
-Information Storage and Retrieval - Digital Libraries (H.3.7): Dissemination;
Information Systems -Information Storage and Retrieval - Digital Libraries
(H.3.7): User issues; Design, Human Factors, Standardization; XML, XSLT,
digital video library, surrogate | |||
| Design of a Digital Library for Human Movement | | BIBAK | PDF | 300-309 | |
| Jezekiel Ben-Arie; Purvin Pandit; ShyamSundar Rajaram | |||
| This paper is focused on a central aspect in the design of our planned
digital library for human movement, i.e. on the aspect of representation and
recognition of human activity from video data. The method of representation is
important since it has a major impact on the design of all the other building
blocks of our system such as the user interface/query block or the activity
recognition/storage block. In this paper we evaluate a representation method
for human movement that is based on sequences of angular poses and angular
velocities of the human skeletal joints, for storage and retrieval of human
actions in video databases. The choice of a representation method plays an
important role in the database structure, search methods, storage efficiency
etc. For this representation, we develop a novel approach for complex human
activity recognition by employing multidimensional indexing combined with
temporal or sequential correlation. This scheme is then evaluated with respect
to its efficiency in storage and retrieval.
For the indexing we use postures of humans in videos that are decomposed into a set of multidimensional tuples which represent the poses/velocities of human body parts such as arms, legs and torso. Three novel methods for human activity recognition are theoretically and experimentally compared. The methods require only a few sparsely sampled human postures. We also achieve speed invariant recognition of activities by eliminating the time factor and replacing it with sequence information. The indexing approach also provides robust recognition and an efficient storage/retrieval of all the activities in a small set of hash tables. Keywords: Computing Methodologies -Image Processing And Computer Vision - Scene
Analysis (I.4.8); Computing Methodologies -Image Processing And Computer Vision
- Scene Analysis (I.4.8): Motion; Computing Methodologies -Image Processing And
Computer Vision - Scene Analysis (I.4.8): Tracking; Computing Methodologies
-Pattern Recognition - Design Methodology (I.5.2); Computing Methodologies
-Pattern Recognition - Design Methodology (I.5.2): Pattern analysis; Data -
Data Storage Representations (E.2); Algorithms, Design; human activity
recognition, multi dimensional indexing, sequence recognition, temporal
correlation | |||
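The indexing idea in the abstract above (quantized pose tuples stored in hash tables, with temporal order standing in for absolute time) can be illustrated with a minimal sketch. This is not the authors' method: the function names `quantize`, `build_index`, and `recognize`, the 15-degree bin size, and the simple ordered-voting rule are all assumptions made for illustration.

```python
from collections import defaultdict


def quantize(pose, bin_size=15):
    """Map a tuple of joint angles (degrees) to a coarse hash key."""
    return tuple(int(angle // bin_size) for angle in pose)


def build_index(activities, bin_size=15):
    """Index poses by quantized key.

    activities: {name: [pose, pose, ...]} with poses in temporal order.
    Only the position in the sequence is stored, never a timestamp.
    """
    index = defaultdict(list)
    for name, poses in activities.items():
        for order, pose in enumerate(poses):
            index[quantize(pose, bin_size)].append((name, order))
    return index


def recognize(index, query_poses, bin_size=15):
    """Vote for the activity whose stored poses match the query in order."""
    last_order = {}            # last matched sequence position per activity
    votes = defaultdict(int)
    for pose in query_poses:
        best = {}
        for name, order in index.get(quantize(pose, bin_size), []):
            # Accept only matches that advance the sequence for this activity.
            if order > last_order.get(name, -1):
                if name not in best or order < best[name]:
                    best[name] = order
        for name, order in best.items():
            votes[name] += 1
            last_order[name] = order
    return max(votes, key=votes.get) if votes else None


# Hypothetical two-angle poses (for example shoulder and elbow, in degrees).
activities = {
    "wave": [(10, 90), (40, 120), (70, 150)],
    "sit": [(90, 90), (60, 60), (30, 30)],
}
index = build_index(activities)
result = recognize(index, [(12, 95), (72, 155)])
```

Because the query contributes only the order of its poses, a fast or slow performance of the same movement, or a sparsely sampled one, maps to the same votes in this sketch.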
| A Bucket Architecture for the Open Video Project | | BIBAK | PDF | 310-311 | |
| Michael L. Nelson; Gary Marchionini; Gary Geisler; Meng Yang | |||
| The Open Video project is a collection of public domain digital video
available for research and other purposes. The Open Video collection currently
consists of approximately 350 video segments, ranging in duration from 10
seconds to 1 hour. Rapid growth for the collection is planned through
agreements with other video repository projects and provision for user
contribution of video. To handle the increased accession, we are experimenting
with "buckets", aggregative intelligent publishing constructs for use in
digital libraries. Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries
(H.3.7); Information Systems -Information Storage and Retrieval - Digital
Libraries (H.3.7): Collection; Information Systems -Information Storage and
Retrieval - Digital Libraries (H.3.7): Dissemination; Information Systems
-Information Storage and Retrieval - Digital Libraries (H.3.7): Systems issues;
Design, Documentation, Experimentation, Management; buckets, digital objects,
digital video, open source | |||
| The Fischlar Digital Video System: A Digital Library of Broadcast TV Programmes | | BIBA | PDF | 312-313 | |
| A. F. Smeaton; N. Murphy; N. E. O'Connor; S. Marlow; H. Lee; K. McDonald; P. Browne; J. Ye | |||
| Fischlar is a system for recording, indexing, browsing and playback of broadcast TV programmes which has been operational on our University campus for almost 18 months. In this paper we give a brief overview of how the system operates, how TV programmes are organised for browse/playback and a short report on the system usage by over 900 users in our University. | |||
| Design Principles for the Information Architecture of a SMET Education Digital Library | | BIBAK | PDF | 314-321 | |
| Andy Dong; Alice M. Agogino | |||
| This implementation paper introduces principles for the information
architecture of an educational digital library, principles that address the
distinction between designing digital libraries for education and designing
digital libraries for information retrieval in general. Design is a key element
of any successful product. Good designers and their designs put technology
into the hands of the user, making the product's focus comprehensible and
tangible through design. As straightforward as this may appear, the design of
learning technologies is often masked by the enabling technology. In fact, they
often lack an explicitly stated instructional design methodology. While the
technologies are important hurdles to overcome, we advocate learning systems
that empower education-driven experiences rather than technology-driven
experiences. This work describes a concept for a digital library for science,
mathematics, engineering and technology education (SMETE), a library with an
information architecture designed to meet learners' and educators' needs.
Utilizing a constructivist model of learning, the authors present practical
approaches to implementing the information architecture and its technology
underpinnings. The authors propose the specifications for the information
architecture and a visual design of a digital library for communicating
learning to the audience. The design methodology indicates that a
scenario-driven design technique sensitive to the contextual nature of learning
offers a useful framework for tailoring technologies that help empower, not
hinder, the educational sector. Keywords: Design, Human Factors; education, engineering, learning technology,
mathematics, science, technology | |||
| Toward a Model of Self-Administering Data | | BIBAK | PDF | 322-330 | |
| ByungHoon Kang; Robert Wilensky | |||
| We describe a model of self-administering data. In this model, a declarative
description of how a data object should behave is attached to the object,
either by a user or by a data input device. A widespread infrastructure of
self-administering data handlers is presumed to exist; these handlers are
responsible for carrying out the specifications attached to the data.
Typically, the specifications express how and to whom the data should be
transferred, how it should be incorporated when it is received, what rights
recipients of the data will have with respect to it, and the kind of relation
that should exist between distributed copies of the object. Functions such as
distributed version control can be implemented on top of the basic handler
functions.
We suggest that this model can provide superior support for common cooperative functions. Because the model is declarative, users need only express their intentions once in creating a self-administering description, and need not be concerned with manually performing subsequent repetitious operations. Because the model is peer-to-peer, users are less dependent on additional, perhaps costly resources, at least when these are not critical. An initial implementation of the model has been created. We are experimenting with the model both as a tool to aid in digital library functions, and as a possible replacement for some server oriented functions. Keywords: asynchronous collaboration, data access model, data management, distributed
file system, file sharing, peer to peer, scalable update propagation,
self-administering data | |||
| PERSIVAL, A System for Personalized Search and Summarization over Multimedia Healthcare Information | | BIBAK | PDF | 331-340 | |
| Kathleen R. McKeown; Shih-Fu Chang; James Cimino; Steven Feiner; Carol Friedman; Luis Gravano; Vasileios Hatzivassiloglou; Steven Johnson; Desmond A. Jordan; Judith L. Klavans; Andre Kushniruk; Vimla Patel; Simone Teufel | |||
| In healthcare settings, patients need access to online information that can
help them understand their medical situation. Physicians need information that
is clinically relevant to an individual patient. In this paper, we present our
progress on developing a system, PERSIVAL, that is designed to provide
personalized access to a distributed patient care digital library. Using the
secure, online patient records at New York Presbyterian Hospital as a user
model, PERSIVAL's components tailor search, presentation and summarization of
online multimedia information to both patients and healthcare providers. Keywords: Computing Methodologies -Artificial Intelligence - Natural Language
Processing (I.2.7); Information Systems -Information Interfaces and
Presentation - User Interfaces (H.5.2); Information Systems -Information
Storage and Retrieval - Online Information Services (H.3.5): Web-based
services; Information Systems -Information Storage and Retrieval - Online
Information Services (H.3.5); medical digital library, multimedia, natural
language, personalization, query interface, search, summarization | |||
| An Approach to Search for the Digital Library | | BIBAK | PDF | 341-342 | |
| Elaine G. Toms; Joan C. Bartlett | |||
| The chief form of accessing the content of a digital library (DL) is its
search interface. While a DL needs an interface that integrates a range of
options from search to browse to serendipity, in this work we focus on
analytical search. We propose using Bates' search tactics as a basis for the
re-design of search interfaces. We believe this approach will help to identify
the types of tools that need to be supported by a DL interface. Keywords: Information Systems -Information Storage and Retrieval - Information Search
and Retrieval (H.3.3); digital libraries, search interface, search tactics,
searching | |||
| TilePic: A File Format for Tiled Hierarchical Data | | BIBA | PDF | 343-344 | |
| Jeff Anderson-Lee; Robert Wilensky | |||
| TilePic is a method for storing tiled data of arbitrary type in a hierarchical, indexed format for fast retrieval. It is useful for storing moderately large, static, spatial datasets in a manner that is suitable for panning and zooming over the data, especially in distributed applications. Because different data types may be stored in the same object, TilePic can support semantic zooming as well. It has proven suitable for a wide variety of applications involving the networked access and presentation of images, geographic data, and text. The TilePic format and its supporting tools are unencumbered, and available to all. | |||
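To make the panning-and-zooming use case above concrete, here is a minimal sketch of how a client might decide which tiles of a hierarchical, indexed tile store to fetch for a viewport. It does not describe the actual TilePic file layout: the `(level, col, row)` key, the 256-pixel tile size, and the convention that each level halves the resolution are assumptions of this example.

```python
import math


def tiles_for_viewport(x0, y0, x1, y1, level, tile_size=256):
    """Return (level, col, row) keys for tiles intersecting a viewport.

    The viewport [x0, x1) x [y0, y1) is given in full-resolution pixels.
    Level 0 is full resolution; each higher level halves the resolution
    (an assumption of this sketch, not a statement about TilePic).
    """
    span = tile_size * (2 ** level)  # full-resolution pixels per tile side
    c0, r0 = int(x0 // span), int(y0 // span)
    c1 = int(math.ceil(x1 / span)) - 1
    r1 = int(math.ceil(y1 / span)) - 1
    return [(level, c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


# A hypothetical in-memory index: tile key -> payload of any type.
store = {(0, 0, 0): b"img", (0, 1, 0): b"img", (1, 0, 0): "place names"}
needed = tiles_for_viewport(0, 0, 512, 256, level=0)
payloads = [store[key] for key in needed if key in store]
```

Since the payload type is not fixed by the key, a coarser level can hold a different kind of data (text labels here) than the finer one, which is one simple way to think about semantic zooming.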
| High Tech or High Touch: Automation and Human Mediation in Libraries | | BIBA | PDF | 345 | |
| David Levy; William Arms; Oren Etzioni; Diane Nester; Barbara Tillett | |||
| There are those who now think that traditional library services, such as cataloging and reference, will no longer be needed in the future, or at least will be fully automated. Others are equally adamant that human intervention is not only important but essential. Underlying such positions are a host of assumptions - about the continued existence and place of paper, the role of human intelligence and interpretation, the nature of research, and the significance of the human element. This panel brings together experts in libraries and digital technology to uncover such issues and assumptions and to discuss and debate the place of people and machines in cataloging and reference work. | |||
| Long Term Preservation of Digital Information | | BIBAK | PDF | 346-352 | |
| Raymond A. Lorie | |||
| The preservation of digital data for the long term presents a variety of
challenges from technical to social and organizational. The technical challenge
is to ensure that the information, generated today, can survive long term
changes in storage media, devices and data formats. This paper presents a novel
approach to the problem. It distinguishes between archiving of data files and
archiving of programs (so that their behavior may be reenacted in the future).
For the archiving of a data file, the proposal consists of specifying the processing that needs to be performed on the data (as physically stored) in order to return the information to a future client (according to a logical view of the data). The process specification and the logical view definition are archived with the data. For the archiving of a program behavior, the proposal consists of saving the original executable object code together with the specification of the processing that needs to be performed for each machine instruction of the original computer (emulation). In both cases, the processing specification is based on a Universal Virtual Computer that is general, yet basic enough as to remain relevant in the future. Keywords: Languages, Standardization; archival, digital documents, digital
information, digital library, emulation, preservation | |||
| Creating Trading Networks of Digital Archives | | BIBAK | PDF | 353-362 | |
| Brian Cooper; Hector Garcia-Molina | |||
| Digital archives can best survive failures if they have made several copies
of their collections at remote sites. In this paper, we discuss how autonomous
sites can cooperate to provide preservation by trading data. We examine the
decisions that an archive must make when forming trading networks, such as the
amount of storage space to provide and the best number of partner sites. We
also deal with the fact that some sites may be more reliable than others.
Experimental results from a data trading simulator illustrate which policies
are most reliable. Our techniques focus on preserving the "bits" of digital
collections; other services that focus on other archiving concerns (such as
preserving meaningful metadata) can be built on top of the system we describe
here. Keywords: data trading, digital archiving, fault tolerance, preservation, replication | |||
| Cost-Driven Design for Archival Repositories | | BIBA | PDF | 363-372 | |
| Arturo Crespo; Hector Garcia-Molina | |||
| Designing an archival repository is a complex task because there are many alternative configurations, each with different reliability levels and costs. In this paper we study the costs involved in an Archival Repository and we introduce a design framework for evaluating alternatives and choosing the best configuration in terms of reliability and cost. We also present a new version of our simulation tool, ArchSim/C that aids in the decision process. The design framework and the usage of ArchSim/C are illustrated with a case study of a hypothetical (yet realistic) archival repository shared between two universities. | |||
| Hermes: A Notification Service for Digital Libraries | | BIBAK | PDF | 373-380 | |
| D. Faensen; L. Faultstich; H. Schweppe; A. Hinze; A. Steidinger | |||
| The high publication rate of scholarly material makes searching and browsing
an inconvenient way to keep oneself up-to-date. Instead of being the active
part in information access, researchers want to be notified whenever a new
paper in their research area is published.
While more and more publishing houses or portal sites offer notification services, this approach has several disadvantages. We introduce the Hermes alerting service, a service that integrates a variety of different information providers, making their heterogeneity transparent for the users. Hermes offers sophisticated filtering capabilities, preventing the user from drowning in a flood of irrelevant information. From the user's point of view it integrates the providers into a single source. Its simple provider interface makes it easy for publishers to join the service and thus reach potential readers directly. This paper presents the architecture of the Hermes service and discusses the issues of heterogeneity of information sources. Furthermore, we discuss the benefits and disadvantages of message-oriented middleware for implementing such a service for digital libraries. Keywords: collaborative filtering, electronic publishing, recommender system | |||
| An Algorithm for Automated Rating of Reviewers | | BIBA | PDF | 381-387 | |
| Tracy Riggs; Robert Wilensky | |||
| The current system for scholarly information dissemination may be amenable
to significant improvement. In particular, going from the current system of
journal publication to one of self-distributed documents offers significant
cost and timeliness advantages. A major concern with such alternatives is how
to provide the value currently afforded by the peer review system.
Here we propose a mechanism that could plausibly supply such value. In the peer review system, papers are judged meritorious if good reviewers give them good reviews. In its place, we propose a collaborative filtering algorithm which automatically rates reviewers, and incorporates the quality of the reviewer into the metric of merit for the paper. Such a system seems to provide all the benefits of the current peer review system, while at the same time being much more flexible. We have implemented a number of parameterized variations of this algorithm, and tested them on data available from a quite different application. Our initial experiments suggest that the algorithm is in fact ranking reviewers reasonably. | |||
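The abstract above describes rating reviewers automatically and folding reviewer quality into a paper's merit. The sketch below shows one simple way such a mutual dependency can be iterated. It is not the algorithm from the paper: the function name `rate_reviewers`, the weighting rule `1 / (1 + mean absolute error)`, and the fixed iteration count are assumptions chosen for illustration.

```python
def rate_reviewers(ratings, iterations=20):
    """Jointly estimate paper merit and reviewer weight.

    ratings: {reviewer: {paper: score}}
    Returns (merit, weights). Both update rules are illustrative assumptions.
    """
    weights = {reviewer: 1.0 for reviewer in ratings}
    merit = {}
    for _ in range(iterations):
        # Paper merit: mean of received scores, weighted by reviewer weight.
        totals, norms = {}, {}
        for reviewer, scores in ratings.items():
            for paper, score in scores.items():
                totals[paper] = totals.get(paper, 0.0) + weights[reviewer] * score
                norms[paper] = norms.get(paper, 0.0) + weights[reviewer]
        merit = {paper: totals[paper] / norms[paper] for paper in totals}
        # Reviewer weight: higher when the reviewer's scores track the merit.
        for reviewer, scores in ratings.items():
            if not scores:
                continue
            err = sum(abs(s - merit[p]) for p, s in scores.items()) / len(scores)
            weights[reviewer] = 1.0 / (1.0 + err)
    return merit, weights


# Hypothetical data: reviewers "a" and "b" agree, "c" disagrees with both.
ratings = {
    "a": {"p1": 5, "p2": 1},
    "b": {"p1": 5, "p2": 1},
    "c": {"p1": 1, "p2": 5},
}
merit, weights = rate_reviewers(ratings)
```

In this toy data the two agreeing reviewers end up with a higher weight than the dissenting one, so the merit of `p1` is pulled toward their scores. A consensus-based rule like this one rewards agreement, which is a design choice with obvious limits and only one of many possible parameterizations.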
| HeinOnline: An Online Archive of Law Journals | | BIBAK | PDF | 388-394 | |
| Richard J. Marisa | |||
| HeinOnline is a new online archive of law journals. Development of
HeinOnline began in late 1997 through the cooperation of Cornell Information
Technologies, William S. Hein & Co., Inc. of Buffalo, NY, and the Cornell Law
Library. Built upon the familiar Dienst and new Open Archive Initiative
protocols, HeinOnline extends the reliable and well-established management
practices of open access archives like NCSTRL and CoRR to a subscription-based
collection. The decisions made in creating HeinOnline, Dienst architectural
extensions, and issues which have arisen during operation of HeinOnline are
described. Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries
(H.3.7): Collection; Information Systems -Information Storage and Retrieval -
Digital Libraries (H.3.7): Systems issues; Design, Experimentation, Management;
dienst, digital library, document structure, law journals, metadata, system
design | |||
| Digital Libraries Supporting Digital Government | | BIBA | PDF | 395-397 | |
| Gary Marchionini; Anne Craig; Larry Brandt; Judith Klavans; Hsinchun Chen | |||
| The needs of society have long been addressed through government research support for new technologies, the Internet being one example. Today, under the rubric of digital government, federal agencies as well as state and local units of government have begun to leverage the fruits of these research investments to better serve the needs of their constituencies. Government agencies apply these technologies in a variety of settings including emergency response, health and safety regulation, financial management, data gathering, and hosts of information dissemination needs. In addition, governments are investigating ways to use technology to encourage citizen participation. There is a growing digital government community of practice that strongly parallels the evolving digital library community. These parallel developments are not surprising because libraries and governments share service missions for their overlapping constituencies. | |||
| Designing a Digital Library for Young Children | | BIBAK | PDF | 398-405 | |
| Allison Druin; Benjamin B. Bederson; Juan Pablo Hourcade; Lisa Sherman; Glenda Revelle; Michele Platner; Stacy Weng | |||
| As more information resources become accessible using computers, our digital
interfaces to those resources need to be appropriate for all people. However,
when it comes to digital libraries, the interfaces have typically been designed
for older children or adults. Therefore, we have begun to develop a digital
library interface developmentally appropriate for young children (ages 5-10
years old). Our prototype system, which we now call SearchKids, offers a graphical
interface for querying, browsing and reviewing search results. This paper
describes our motivation for the research, the design partnership we
established between children and adults, our design process, the technology
outcomes of our current work, and the lessons we have learned. Keywords: Information Systems -Information Interfaces and Presentation - User
Interfaces (H.5.2): Graphical user interfaces (GUI); Information Systems
-Information Interfaces and Presentation - User Interfaces (H.5.2): Interaction
styles; Information Systems -Information Interfaces and Presentation - User
Interfaces (H.5.2): Screen design; Information Systems -Information Interfaces
and Presentation - User Interfaces (H.5.2): User-centered design; Information
Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): User
issues; Information Systems -Information Storage and Retrieval - Information
Search and Retrieval (H.3.3): Query formulation; Software -Software Engineering
- Requirements/Specifications (D.2.1): Elicitation methods (e.g., rapid
prototyping, interviews, JAD); Design, Human Factors; children, cooperative
inquiry, digital libraries, education applications, information retrieval
design techniques, intergenerational design team, participatory design,
zoomable user interfaces | |||
| Dynamic Digital Libraries for Children | | BIBAK | PDF | 406-415 | |
| Yin Leng Theng; Norliza Mohd-Nasir; George Buchanan; Bob Fields; Harold Thimbleby; Noel Cassidy | |||
| The majority of current digital libraries (DLs) are not designed for
children. For DLs to be popular with children, they need to be fun, easy-to-use
and empower them, whether as readers or authors. This paper describes a new
children's DL emphasizing its design and evaluation, working with the children
(11-14 year olds) as design partners and testers. A truly participatory process
was used, and observational study was used as a means of refinement to the
initial design of the DL prototype. In contrast with current DLs, the
children's DL provides both a static as well as a dynamic environment to
encourage active engagement of children in using it. Design, implementation and
security issues are also raised. Keywords: collaborative writing, design partners and testers, design process,
ethnography, observational study, participatory design | |||
| Looking at Digital Library Usability from a Reuse Perspective | | BIBAK | PDF | 416-425 | |
| Tamara Sumner; Melissa Dawe | |||
| The need for information systems to support the dissemination and reuse of
educational resources has sparked a number of large-scale digital library
efforts. This article describes usability findings from one such project - the
Digital Library for Earth System Education (DLESE) - focusing on its role in
the process of educational resource reuse. Drawing upon a reuse model developed
in the domain of software engineering, the reuse cycle is broken down into five
stages: formulation of a reuse intention, location, comprehension,
modification, and sharing. Using this model to analyze user studies in the
DLESE project, several implications for library system design and library
outreach activities are highlighted. One finding is that resource reuse occurs
at different stages in the educational design process, and each stage imposes
different and possibly conflicting requirements on digital library design.
Another finding is that reuse is a distributed process across several
artifacts, both within and outside of the library itself. In order for reuse to
be successful, a usability line cannot be drawn at the library boundary, but
instead must encompass both the library system and the educational resources
themselves. Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries
(H.3.7): Systems issues; Information Systems -Information Storage and Retrieval
- Digital Libraries (H.3.7): User issues; Computer Applications - Physical
Sciences and Engineering (J.2); Computer Applications - Physical Sciences and
Engineering (J.2): Earth and atmospheric sciences; Design, Human Factors;
comprehension, digital libraries, educational resources, learning impact,
location, modification, reuse, sharing | |||
| Building a Hypertextual Digital Library in the Humanities: A Case Study on London | | BIBAK | PDF | 426-434 | |
| Gregory Crane; David A. Smith; Clifford E. Wulfman | |||
| This paper describes the creation of a new humanities digital library
collection: 11,000,000 words and 10,000 images representing books, images and
maps on pre-twentieth century London and its environs. The London collection
contained far more dense and precise information than the materials from the
Greco-Roman world on which we had previously concentrated. The London
collection thus allowed us to explore new problems of data structure,
manipulation, and visualization. This paper contrasts our model for how
humanities digital libraries are best used with the assumptions that underlie
many academic digital libraries on the one hand and more literary hypertexts on
the other. Since encoding guidelines such as those from the TEI provide
collection designers with far more options than any one project can realize,
this paper describes what structures we used to organize the collection and
why. We particularly emphasize the importance of mining historical authority
lists (encyclopedias, gazetteers, etc.) and then generating automatic
span-to-span links within the collection. Keywords: automatic linking, browsing, collection development, document design,
reading | |||
| Document Quality Indicators and Corpus Editions | | BIBAK | PDF | 435-436 | |
| Jeffrey A. Rydberg-Cox; Anne Mahoney; Gregory R. Crane | |||
| Corpus editions can only be useful to scholars when users know what to
expect of the texts. We argue for text quality indicators, both general and
domain-specific. Keywords: Design, Documentation, Languages, Standardization, Theory; automatic
linking, browsing, collection development, document design, reading | |||
| The Digital Atheneum: New Approaches for Preserving, Restoring and Analyzing Damaged Manuscripts | | BIBAK | PDF | 437-443 | |
| Michael S. Brown; W. Brent Seales | |||
| This paper presents research focused on developing new techniques and
algorithms for the digital acquisition, restoration, and study of damaged
manuscripts. We present results from an acquisition effort in partnership with
the British Library, funded through the NSF DLI-2 program, designed to capture
3-D models of old and damaged manuscripts. We show how these 3-D facsimiles can
be analyzed and manipulated in ways that are tedious or even impossible if
confined to the physical manuscript. In particular, we present results from a
restoration framework we have developed for "flattening" the 3-D representation
of badly warped manuscripts. We expect these research directions to give
scholars more sophisticated methods to preserve, restore, and better understand
the physical objects they study. Keywords: digital libraries, digital preservation, document analysis, humanities
computing, restoration | |||
| Towards an Electronic Variorum Edition of Don Quixote | | BIBAK | PDF | 444-445 | |
| Richard Furuta; Shueh-Cheng Hu; Siddarth Kalasapur; Rajiv Kochumman; Eduardo Urbina; Ricardo Vivancos | |||
| known Don Quixote. This paper gives an overview of the computer-based tools
that we are using in this endeavor, and summarizes the current status of the
project. The Electronic Variorum Edition will join the other content elements
maintained by the project, which focuses on electronic resources in support of
the study of Cervantes, his works, and his times. Keywords: cervantes digital library, cervantes project, hispanic culture, humanities
digital libraries | |||
| Digital Music Libraries -- Research and Development | | BIBA | PDF | 446-448 | |
| David Bainbridge; Gerry Bernbom; Mary Wallace; Andrew P. Dillon; Matthew Dovey; Jon W. Dunn; Michael Fingerhut; Ichiro Fujinaga; Eric J. Isaacson | |||
| Digital music libraries provide enhanced access and functionality that facilitates scholarly research and education. This panel will present a report on the progress of several major research and development projects in digital music libraries. | |||
| Content Management for Digital Museum Exhibitions | | BIBAK | PDF | 450 | |
| Jen-Shin Hong; Bai-Hsuen Chen; Jieh Hsiang; Tien-Yu Hsu | |||
| An online exhibition of a digital museum often consists of a variety of
multimedia objects such as webpages, animation, and video clips. Ideally, there
should be different exhibitions on the same topic for users with different
needs. The difficulty is that it is time-consuming to produce illustrative and
intriguing online exhibitions. In this paper, we present a content management
system for producing exhibitions. This framework is a novel approach for
organizing digital collections and for quickly selecting, integrating, and
composing objects from the collection to produce exhibitions of different
presentation styles, one for each user group. A prototype based on our
framework has been implemented and successfully used in the production of a
Lanyu Digital Museum. Using our method, the Lanyu Digital Museum online
exhibition has several features: (1) It provides an easy way to compose
artifacts extracted from the digital collection into exhibitions. (2) It
provides an easy way to create different presentations of the same exhibition
content that are catered to users with different needs. (3) It provides
easy-to-use film-editing capability to re-arrange an exhibition and to produce
new exhibitions from existing ones. Keywords: XML, content management, digital museum, multipresentation | |||
| Demonstration of Hierarchical Document Clustering of Digital Library Retrieval Results | | BIBAK | PDF | 451 | |
| C. R. Palmer; J. Pesenti; R. E. Valdes-Perez; M. G. Christel; A. G. Hauptmann; D. Ng; H. D. Wactlar | |||
| As digital libraries grow in size, querying their contents will become as
frustrating as querying the web is now. One remedy is to hierarchically cluster
the results that are returned by searching a digital library. We demonstrate
the clustering of search results from Carnegie Mellon's Informedia database, a
large video library that supports indexing and retrieval with automatically
generated descriptors. Keywords: Information Systems -Information Storage and Retrieval - Information Search
and Retrieval (H.3.3): Clustering; Information Systems -Information Storage and
Retrieval - Digital Libraries (H.3.7): User issues; hierarchical document
clustering | |||
| Indiana University Digital Music Library Project | | BIBAK | PDF | 452 | |
| Jon W. Dunn; Eric J. Isaacson | |||
| The Indiana University Digital Music Library project plans to create a
digital library testbed system containing music in a variety of formats,
designed to support research and education in the field of music and to serve
as a platform for digital library research. Prototypes of user interfaces to
the system will be demonstrated. Keywords: Computer Applications - Arts and Humanities (J.5): Performing arts (e.g.,
dance, music); Information Systems -Information Storage and Retrieval - Digital
Libraries (H.3.7): Collection; Information Systems -Information Storage and
Retrieval - Digital Libraries (H.3.7): Dissemination; Information Systems
-Information Storage and Retrieval - Digital Libraries (H.3.7): Systems issues;
Information Systems -Information Storage and Retrieval - Digital Libraries
(H.3.7): User issues; Design; music digital libraries, music instruction | |||
| Interactive Visualization of Video Metadata | | BIBAK | PDF | 453 | |
| Mark Derthick | |||
| Much current research on digital libraries focuses on named entity
extraction and transformation into structured information. Examples include
entities like events, people, and places, and attributes like birth date or
latitude. This video demonstration illustrates the potential for finding
relationships among entities extracted from 50,000 news segments from CMU's
Informedia Digital Video Library. A visual query language is used to specify
relationships among entities. Data populate the query structure, which becomes
an interface for exploration that gives continuous feedback in the form of
visualizations of summary statistics. The target user is a data analyst
familiar with the domain from which the entities come, but not a computer
scientist. Keywords: Information Systems -Information Interfaces and Presentation - User
Interfaces (H.5.2): Graphical user interfaces (GUI); Information Systems
-Information Interfaces and Presentation - User Interfaces (H.5.2): Interaction
styles; Algorithms, Human Factors; information visualization | |||
| PERSIVAL Demo: Categorizing Hidden-Web Resources | | BIB | PDF | 454 | |
| Panagiotis G. Ipeirotis; Luis Gravano; Mehran Sahami | |||
| PERSIVAL: Personalized Summarization Over Multimedia Health-Care Information | | BIBA | PDF | 455 | |
| Noemie Elhadad; Min-Yen Kan; Simon Lok; Smaranda Muresan | |||
| In this demonstration, we present several integrated components of PERSIVAL
(PErsonalized Retrieval and Summarization of Image, Video And Language) [1], a
system designed to provide personalized access to a distributed digital library
of medical literature and consumer health information. The global system
architecture of PERSIVAL is best described as a two-stage processing pipeline.
The first stage is a retrieval system that matches user queries with relevant
multimedia data in the library. The second stage is a visualization system that
processes the multimedia data matched by the first stage for display.
Our demonstration focuses on the second stage of PERSIVAL's processing pipeline. Given a set of relevant documents for certain predefined queries, our integrated demonstration seeks to give a tailored response for either physicians or patients, featuring textual summaries, as well as relevant medical definitions. To visualize the summaries and definitions, we employ automated constraint-based layout of the user interface that allows for rich interaction between summaries and definitions. PERSIVAL's natural language processing and user interface modules make up the visualization portion of the system and illustrate state-of-the-art digital library technology. Following are the modules presented in our demonstration. | |||
| View Segmentation and Static/Dynamic Summary Generation for Echocardiogram Videos | | BIBA | PDF | 456 | |
| Shahram Ebadollahi; Shih-Fu Chang | |||
| The demonstration described here is a part of the PERSIVAL system [1]. In PERSIVAL the user of the echocardiogram video archives is able to access, browse, search and interact with the echocardiogram videos efficiently and effectively. Video data is also integrated with other modalities of information and presented to the right users in the right context. | |||
| Stanford Encyclopedia of Philosophy: A Dynamic Reference Work | | BIBA | PDF | 457 | |
| Edward N. Zalta; Colin Allen; Uri Nodelman | |||
| The primary goal of the Stanford Encyclopedia of Philosophy project (http://plato.stanford.edu/) is to produce an authoritative and comprehensive reference work devoted to the academic discipline of philosophy that will be kept up to date dynamically so as to remain useful to those in academia and the general public. To accomplish this goal we have designed and implemented web-based software by which academic philosophers can collaboratively write and maintain such a 'dynamic reference work'. Our implementation has features that are not found in any other online reference work in any discipline, and that enable the profession of philosophy to maintain such a reference work without the cost or level of staff support required for traditional reference work publishing. | |||
| A System for Adding Content-Based Searching to a Traditional Music Library Catalogue Server | | BIBA | PDF | 458 | |
| Matthew J. Dovey | |||
| Most online music library catalogues can only be searched by textual metadata. Whilst highly effective - since the rules for maintaining consistency have been refined over many years - this does not allow searching by musical content. Many music librarians are familiar with users humming their enquiries. Most systems providing a query by humming interface tend to run independently of music library catalogue systems and not offer similar textual metadata searching. This demonstration shows how we can integrate these two types of system based on work conducted as part of the NSF/JISC funded OMRAS project (http://www.omras.org). | |||
| Using the Repository Explorer to Achieve OAI Protocol Compliance | | BIBA | PDF | 459 | |
| Hussein Suleman | |||
| The Open Archives Initiative (OAI) is dedicated to solving problems of digital library interoperability by defining simple protocols, most recently the Open Archives Initiative Protocol for Metadata Harvesting [2], which was unveiled in January 2001. To support the adoption of this new interoperability technology, we have developed the Repository Explorer [1], a web-based tool to enforce compliance to the same interpretation of the protocol by the various different server implementations. This demonstration will show how the Repository Explorer can be used to perform either user-driven browsing or automatic testing of an implementation of the protocol. | |||
| An Atmospheric Visualization Collection for the NSDL | | BIBAK | PDF | 463 | |
| Christopher Klaus; Keith Andrew | |||
| In this poster, we describe visualization and educational efforts underway
to build an Atmospheric Visualization Collection for the NSDL. Keywords: Computing Methodologies -Image Processing And Computer Vision - General
(I.4.0): Image displays; Computer Applications - Physical Sciences and
Engineering (J.2): Earth and atmospheric sciences; Computing Methodologies
-Computer Graphics - Picture/Image Generation (I.3.3); Algorithms,
Experimentation, Human Factors, Measurement; atmospheric science, digital
library, visualization education | |||
| Breaking the Metadata Generation Bottleneck: Preliminary Findings | | BIBA | PDF | 464 | |
| Elizabeth D. Liddy; Stuart Sutton; Woojin Paik; Eileen Allen; Sarah Harwell; Michelle Monsour; Anne Turner; Jennifer Liddy | |||
| The goal of our 18-month NSDL-funded project is to develop Natural Language Processing and Machine Learning technology which will accomplish automatic metadata generation for individual educational resources in digital collections. The metadata tags that the system will be learning to automatically assign are the full complement of Gateway to Educational Materials (GEM) metadata tags -- from the nationally recognized consortium of organizations concerned with access to educational resources. The documents that comprise the sample for this research come from the Eisenhower National Clearinghouse on Science and Mathematics. | |||
| Building the Physical Sciences Information Infrastructure, A Phased Approach | | BIBAK | PDF | 465 | |
| Judy C. Gilmore; Valerie S. Allen | |||
| In 2000, a vision of a Physical Sciences Information Infrastructure - an
integrated network for the physical sciences - was captured and endorsed. Work
continues in 2001 as partnerships are formed and strategies are formulated to
move the vision forward. Keywords: federal agencies, physical sciences | |||
| Development of an Earth Environmental Digital Library System for Soil and Land-Atmospheric Data | | BIBAK | PDF | 466 | |
| Eiji Ikoma; Taikan Oki; Masaru Kitsuregawa | |||
| We propose and examine new methods for an automatic data loading system and
a flexible user interface system with many features, such as 3D visualization. We
implement the earth environmental digital library and operate it on the Web.
Although our system targets a limited group of users, such as earth environmental
researchers, it receives more than 8000 hits per month, which indicates its
practical usefulness. Keywords: Experimentation; VRML, digital library, user interface | |||
| Digital Facsimile Editions and On-Line Editing | | BIBA | PDF | 467 | |
| Harry Plantinga | |||
| Digitizing a large collection of books is an expensive and time-consuming task -- but there may be volunteers all over the world who are willing to do a small portion of the task. This poster describes a system for making digital facsimile editions -- e-books consisting of page images and OCRed but uncorrected text. The user can choose to view low or high resolution page images or text for each page or search the text. Authenticated users with little or no training can correct the text on-line, and the corrections are incorporated in the document. Source code is available for the described implementation, which is a part of the Christian Classics Ethereal Library (http://www.ccel.org). | |||
| DSpace at MIT: Meeting the Challenges | | BIBAK | PDF | 468 | |
| Michael J. Bass; Margret Branschofsky | |||
| DSpace is a joint development effort by HP and MIT to establish an
electronic system that will enable MIT faculty and researchers to capture,
preserve, manage, and disseminate their intellectual output, and that will
enable the Institute to maintain its intellectual heritage. The effort further
aims to facilitate sharing of intellectual content and metadata among
institutions by minimizing barriers to adoption and federation. This brief
paper describes the motivation behind the project, its goals, objectives,
progress, and references to detailed definition & design materials. Keywords: Information Systems -Information Storage and Retrieval - Library Automation
(H.3.6); Information Systems -Information Storage and Retrieval - Online
Information Services (H.3.5); Data - Data Structures (E.1); Design, Economics,
Experimentation, Legal Aspects, Management; application service platform,
architecture, archive, digital libraries, digital media, federation, metadata,
repository | |||
| Exploiting Image Semantics for Picture Libraries | | BIBAK | PDF | 469 | |
| Kobus Barnard; David Forsyth | |||
| We consider the application of a system for learning the semantics of image
collections to digital libraries. We discuss our approach to browsing and
search, and investigate the integration of both in more detail. Keywords: Information Systems - Information Storage and Retrieval (H.3); Algorithms,
Human Factors, Performance; digital libraries, hierarchical image clustering | |||
| Feature Extraction for Content-Based Image Retrieval in DARWIN | | BIB | PDF | 470 | |
| K. R. Debure; A. S. Russell | |||
| Guided Linking: Efficiently Making Image-to-Transcript Correspondence | | BIBAK | PDF | 471 | |
| Cheng Jiun Yuan; W. Brent Seales | |||
| The problem of annotating unstructured images is labor intensive and
difficult to automate. Linking is a type of annotation where an image region is
tagged by representing a correspondence between the region and other
information. Any serious effort at creating a digital edition of a manuscript
from nothing but images and their associated information, such as transcripts
and editorial remarks, must include the task of creating a large number of
links between image regions and the related information. We present an approach
to the problem of image linking, which concentrates on the fundamental and
labor-intensive task of associating image regions with their textual
counterparts. We assume the input to the system is a set of images representing
a manuscript, and that associated data, such as a transcript, is available to
provide guidance to the automated portion of the system. Our approach targets
collections that are damaged and difficult-to-read, such as manuscripts that
require intensive editorial annotation. It is essentially impossible to perform
fully automated techniques, such as optical character recognition (OCR) or
accurate handwriting analysis [2], on these kinds of manuscripts. Keywords: application, digital libraries, humanities computing, image analysis,
image/text correspondence | |||
| Integrating Digital Libraries by CORBA, XML and Servlet | | BIBA | PDF | 472 | |
| Wing Hang Cheung; Michael R. Lyu; Kam Wing Ng | |||
| In this paper, we describe how we use a mediator-based architecture for integrating digital libraries. We discuss how we tackle the obstacles of firewalls in the expansion of our system by using XML and Java Servlet, which are used to achieve CORBA general communications and callback features across the firewalls. | |||
| A National Digital Library for Undergraduate Mathematics and Science Teacher Preparation and Professional Development | | BIBAK | PDF | 473 | |
| Kimberly S. Roempler | |||
| The primary goal of the National Digital Library for Undergraduate
Mathematics and Science Teacher Preparation and Professional Development,
funded through the NSF Division of Undergraduate Education National Science
Digital Libraries Initiative, is to increase the use of best teaching practices
by faculty by providing the resources - tools, training, and data - needed to
build inquiry and discovery into all undergraduate science and mathematics
courses. Improving the math and science education of future and in-service K-12
teachers is one of the most important challenges facing college and university
faculties.
The preparation of future teachers is a fundamental element in the improvement of the learning experience of all students, from grades K-16. As teachers know, it is natural to teach as we have been taught ourselves. The standards in mathematics and science call for greater integration of inquiry-based techniques and more rigorous mathematical and science content. Teachers at all levels will be better equipped to meet these standards if they are taught using these approaches during their own education. Keywords: inservice teachers, mathematics education, pedagogy, preservice teachers,
science education, teacher preparation | |||
| Print to Electronic: Measuring the Operational and Economic Implications of an Electronic Journal Collection | | BIBAK | PDF | 474 | |
| Carol Hansen Montgomery; Linda S. Marion | |||
| In this poster, we report methodology and initial results from a study of an
academic library's migration to an all-electronic journal collection. Keywords: Computing Milieux -Management of Computing and Information Systems - General
(K.6.0): Economics; Economics, Measurement, Management; academic library,
digital library, electronic journals | |||
| Turbo Recognition: Decoding Page Layout | | BIB | PDF | 475 | |
| Taku A. Tokuyasu | |||
| Using Markov Models and Innovation-Diffusion as a Tool for Predicting Digital Library Access and Distribution | | BIBAK | PDF | 476 | |
| Bruce R. Barkstrom | |||
| This paper discusses a general approach to predicting data access rates and
user access patterns for planning distribution capacities and for monitoring
data usage. The approach uses a steady-state Markov model to describe user
activities and innovation-diffusion to describe the rate at which a naive
population adopts accessing data from a digital library. Keywords: EOSDIS, Markov models, innovation-diffusion, user access patterns, user
access rates, user modeling | |||
| A Versatile Facsimile and Transcription Service for Manuscripts and Rare Old Books at the Miguel de Cervantes Digital Library | | BIBA | PDF | 477 | |
| Alejandro Bia | |||
| The purpose of this poster is to describe our approach to providing facsimiles of manuscripts and old books as one of our DL services publicly available over the Internet. | |||
| The Virtual Naval Hospital: The Digital Library as Knowledge Management Tool for Nomadic Patrons | | BIBAK | PDF | 478 | |
| Michael P. D'Alessandro; Richard S. Bakalar; Donna M. D'Alessandro; Denis E. Ashley; Mary J. C. Hendrix | |||
| To meet the information needs of isolated primary care providers and their
patients in the United States (U.S.) Navy, a digital health sciences library -
Virtual Naval Hospital (http://www.vnh.org) - was created through a unique
partnership between academia and government. The creation of the digital
library was heavily influenced by the principles of user-centered design, and
made allowances for the nomadic nature of the digital library's patrons and the
heterogeneous access they have to Internet bandwidth. The result is a digital
library that has been in operation since 1997, that continues to expand in
size, that is heavily used, and that is highly regarded by its patrons. Over
time, the digital library has evolved into a knowledge-management system for
the U.S. Navy Bureau of Medicine and Surgery. A number of valuable technical,
personal, and political lessons have been learned about delivering digital
library and knowledge management services to nomadic patrons. They can be
summarized by stating that to succeed in the design and implementation of a
digital library that serves as a knowledge management tool, regardless of the
field of endeavor, one must focus initially and then consistently on the
population served and what their mission is, and tailor the digital library to
their needs. If this is done, the result will be a tool that is heavily used
and sincerely appreciated. These lessons learned will become increasingly
valuable as society moves towards a ubiquitous computing environment. Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries
(H.3.7): Collection; Information Systems -Information Storage and Retrieval -
Digital Libraries (H.3.7): Dissemination; Information Systems -Information
Storage and Retrieval - Digital Libraries (H.3.7): Systems issues; Information
Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): User
issues; Design, Human Factors, Measurement; case study, digital libraries,
knowledge management, lessons learned, nomadic computing | |||
| Tutorial 1: Practical Digital Libraries Overview | | BIB | PDF | 479 | |
| Edward A. Fox | |||
| Tutorial 2: Evaluating, Using, and Publishing eBooks | | BIB | PDF | 479 | |
| Gene Golovchinsky; Cathy Marshall; Elli Mylonas | |||
| Tutorial 3: Thesauri and Ontologies | | BIB | PDF | 479 | |
| Dagobert Soergel | |||
| Tutorial 4: How to Build a Digital Library Using Open-Source Software | | BIB | PDF | 480 | |
| Ian H. Witten | |||
| Hands-On Workshop: Build Your Own Digital Library Collections | | BIB | PDF | 480 | |
| Ian H. Witten; David Bainbridge | |||
| Tutorial 6: Building Interoperable Digital Libraries: A Practical Guide to Creating Open Archives | | BIB | PDF | 480 | |
| Hussein Suleman | |||
| Workshop 1: Visual Interfaces to Digital Libraries -- Its Past, Present, and Future | | BIBAK | PDF | 482 | |
| Katy Borner; Chaomei Chen | |||
| The design of easy-to-use and informative visual interfaces to digital
libraries is integral to the advancement of digital libraries. A wide
range of approaches have been developed from a diverse spectrum of perspectives
that focus on users and tasks to be supported, data to be modeled, and the
efficiency of algorithms. Information visualization aims to exploit the human
visual information processing system, especially with non-spatial data (such as
documents and images typically found in digital libraries). Generally,
information visualization examines semantic relationships intrinsic to an
abstract information space and how they can be spatially navigated and
memorized using similar cognitive processes to those that would apply during
interactions with the real world. This workshop promotes the convergence of
information visualization and digital libraries. It brings together researchers
and practitioners in the areas of information visualization, digital libraries,
human-computer interaction, library and information science, and computer
science to identify the most important issues in the past and the present, and
what should be done in the future. Keywords: cognitive psychology, digital libraries, human-computer interaction,
information visualization, usability studies | |||
| Workshop 2: The Technology of Browsing Applications | | BIBA | PDF | 483 | |
| Nina Wacholder; Craig Nevill Manning | |||
| Phrase browsing applications provide information seekers with access to text
content via structured lists of index terms. These lists provide a preview of
the content of a collection. The index terms, which may be identified by a
variety of techniques, are phrases that represent important concepts referred
to in a document or collection of documents. The browsing system supports
interactive navigation and organization of the phrases.
The goal of this workshop is to bring together researchers interested in any aspect of phrase browsing technology, including, but not limited to, identification of index terms, techniques for hierarchical organization of the terms, implementation of efficient systems, usability of browsing applications, and techniques for evaluating this technology. | |||
| Workshop 3: Classification Crosswalks | | BIBAK | PDF | 484 | |
| Paul Thompson; Traugott Koch; John Carter; Heike Neuroth; Ed O'Neill; Dagobert Soergel | |||
| Mapping between/among classification schemes is beneficial within an
organization that has a number of implicit schemes, between organizations
seeking to exchange information, and in a digital library context where
collections are organized by different classifications. This cross scheme
mapping could be done manually, but if many schemes are to be mapped, it may be
desirable to provide automated tools and techniques to support the process.
This workshop will present research and projects that identify the
state-of-the-practice and outline the research agenda.
In addition to the educational part of the program, the afternoon will be devoted to ongoing NKOS activities related to a vocabulary mark-up language, mechanisms for search and retrieval of online knowledge organization sources, and a typology for describing knowledge organization sources that supports the development of knowledge organization services on the Web. The program is available from the NKOS Web site at http://nkos.slis.kent.edu. Keywords: classification schemes, controlled vocabularies, digital libraries,
vocabulary integration tools | |||
| Workshop 4: Digital Libraries in Asian Languages | | BIB | -- | |
| Su-Shing Chen; Ching-chih Chen | |||
| Workshop 5: Information Visualization for Digital Libraries: Defining a Research Agenda for Heterogeneous Multimedia Collections | | BIBAK | -- | |
| Lucy Nowell; Elizabeth Hetzler | |||
| This workshop will emphasize small group discussion and brainstorming to
explore issues of visualization for heterogeneous digital libraries. The power
of visualization lies in its ability to convey information at the high
bandwidth of the human perceptual system, facilitating recognition of patterns
in the information space, and supporting navigation in large collections. How
do we extend these benefits to collections that span the range of digital
media? Participants will explore this issue, with the aim of identifying a
research agenda. Keywords: heterogeneous digital libraries, human computer interaction, multimedia,
visualization | |||