JCDL'01: Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries

Fullname: ACM/IEEE Joint Conference on Digital Libraries
Location: Roanoke, Virginia, USA
Dates: 2001-Jun-24 to 2001-Jun-28
Publisher: ACM
Standard No: ACM ISBN 1-58113-456-6; ACM Order Number 606012; hcibib: DL01
Papers: 122
Pages: 485
  1. Methods for Classifying & Organizing Content in Digital Libraries
  2. Digital Libraries for Education: Technology, Services, & User Studies
  3. Panel
  4. Approaches to Interoperability Among Digital Libraries
  5. Digital Libraries and the Web: Technology and Trust
  6. Panel
  7. Tools for Constructing and Using Digital Libraries
  8. Systems Design and Evaluation for Undergraduate Learning Environments
  9. Panel
  10. Studying the Users of Digital Libraries: Formative and Summative Evaluations
  11. Digital Library Collections: Policies and Practices
  12. Panel
  13. Studying the Users of Digital Libraries: Qualitative Approaches
  14. Techniques for Managing Distributed Collections
  15. Panel
  16. The Sound of Digital Libraries: Audio, Music, and Speech
  17. Information Search and Retrieval in Digital Libraries
  18. Panel
  19. Digital Video Libraries: Design and Access
  20. Systems Design and Architecture for Digital Libraries
  21. Panel
  22. Digital Preservation: Technology, Economics, and Policy
  23. Scholarly Communication and Digital Libraries
  24. Panel
  25. Designing Digital Libraries for Education: Technology, Services and User Studies
  26. Applications of Digital Libraries in the Humanities
  27. Panel
  28. Demonstrations
  29. Posters
  30. Tutorials
  31. Workshops

Methods for Classifying & Organizing Content in Digital Libraries

Integrating Automatic Genre Analysis into Digital Libraries 1-10
  Andreas Rauber; Alexander Muller-Kogler
With the number and types of documents in digital library systems increasing, tools for automatically organizing and presenting the content have to be found. While many approaches focus on topic-based organization and structuring, hardly any system incorporates automatic structural analysis and representation. Yet, genre information (unconsciously) forms one of the most distinguishing features in conventional libraries and in information searches. In this paper we present an approach to automatically analyze the structure of documents and to integrate this information into an automatically created content-based organization. In the resulting visualization, documents on similar topics, yet representing different genres, are depicted as books in differing colors. This representation supports users intuitively in locating relevant information presented in a relevant form.
Keywords: SOMLib, document clustering, genre analysis, metaphor graphics, self-organizing map (SOM), visualization
Text Categorization for Multi-Page Documents: A Hybrid Naive Bayes HMM Approach 11-20
  Paolo Frasconi; Giovanni Soda; Alessandro Vullo
Text categorization is typically formulated as a concept learning problem where each instance is a single isolated document. In this paper we are interested in a more general formulation where documents are organized as page sequences, as naturally occurring in digital libraries of scanned books and magazines. We describe a method for classifying pages of sequential OCR text documents into one of several assigned categories and suggest that taking into account contextual information provided by the whole page sequence can significantly improve classification accuracy. The proposed architecture relies on hidden Markov models whose emissions are bag-of-words according to a multinomial word event model, as in the generative portion of the Naive Bayes classifier. Our results on a collection of scanned journals from the Making of America project confirm the importance of using whole page sequences. Empirical evaluation indicates that the error rate (as obtained by running a plain Naive Bayes classifier on isolated pages) can be reduced by roughly half if contextual information is incorporated.
Keywords: Computing Methodologies - Artificial Intelligence - Learning (I.2.6); Information Systems - Information Storage and Retrieval - Digital Libraries (H.3.7); Computing Methodologies - Document and Text Processing - Miscellaneous (I.7.m); Algorithms, Performance; hidden Markov models, multi-page documents, naive Bayes classifier, text categorization
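As a rough illustration of the idea in the abstract above (not the authors' implementation), the following Python sketch treats page categories as the hidden states of an HMM whose emissions are multinomial bag-of-words models, as in Naive Bayes, and labels a whole page sequence with the Viterbi algorithm. The function names, the add-one smoothing, and the input layout are assumptions made for illustration only.

```python
import math
from collections import Counter


def train(sequences, alpha=1.0):
    """Estimate smoothed log-probabilities from labeled page sequences.

    sequences: list of documents; each document is a list of
    (words, label) pages. Hypothetical input layout for this sketch.
    """
    labels = sorted({lab for doc in sequences for _, lab in doc})
    vocab = sorted({w for doc in sequences for words, _ in doc for w in words})
    start_counts = Counter(doc[0][1] for doc in sequences if doc)
    trans_counts = Counter()
    word_counts = {lab: Counter() for lab in labels}
    for doc in sequences:
        for i, (words, lab) in enumerate(doc):
            word_counts[lab].update(words)
            if i > 0:
                trans_counts[(doc[i - 1][1], lab)] += 1

    n_docs = sum(start_counts.values())
    k = len(labels)
    start = {l: math.log((start_counts[l] + alpha) / (n_docs + alpha * k))
             for l in labels}
    trans = {}
    for a in labels:
        total = sum(trans_counts[(a, b)] for b in labels)
        for b in labels:
            trans[(a, b)] = math.log(
                (trans_counts[(a, b)] + alpha) / (total + alpha * k))
    emit = {}
    for l in labels:
        total = sum(word_counts[l].values())
        emit[l] = {w: math.log((word_counts[l][w] + alpha)
                               / (total + alpha * len(vocab)))
                   for w in vocab}
    return labels, start, trans, emit


def page_loglik(words, emit_l):
    """Multinomial (Naive Bayes style) log-likelihood of one page.
    Words outside the training vocabulary are skipped."""
    return sum(emit_l[w] for w in words if w in emit_l)


def viterbi(pages, model):
    """Return the most likely label sequence for a list of pages."""
    labels, start, trans, emit = model
    if not pages:
        return []
    score = {l: start[l] + page_loglik(pages[0], emit[l]) for l in labels}
    back = []
    for words in pages[1:]:
        new_score, pointers = {}, {}
        for b in labels:
            best_a = max(labels, key=lambda a: score[a] + trans[(a, b)])
            pointers[b] = best_a
            new_score[b] = (score[best_a] + trans[(best_a, b)]
                            + page_loglik(words, emit[b]))
        back.append(pointers)
        score = new_score
    path = [max(labels, key=lambda l: score[l])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

Classifying each page in isolation corresponds to dropping the transition term; the sequence model differs only in letting neighbouring pages influence each label.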
Automated Name Authority Control 21-22
  James W. Warner; Elizabeth W. Brown
This paper describes a system for the automated assignment of authorized names. A collaboration between a computer scientist and a librarian, the system provides for enhanced end-user searching of digital libraries without drastically increasing the cost and effort of creating a digital library. It is a part of the workflow management system of the Levy Sheet Music Project.
Keywords: Information Systems - Information Storage and Retrieval - Digital Libraries (H.3.7); automation, indexing, metadata, name authority control, workflow management
Automatic Event Generation from Multi-Lingual News Stories 23-24
  Kin Hui; Wai Lam; Helen M. Meng
We propose a novel approach for automatic generation of topically-related events from multi-lingual news sources. Named entity terms are extracted automatically from the news content. Together with the content terms, they constitute the basis of representing the story. We employ a transformation-based linguistic tagging approach for named entity extraction. Two methods of gross translation of the Chinese story representation into English have been implemented. The first method uses only a bilingual dictionary. The second method makes use of a parallel corpus as an additional resource. Unsupervised learning is employed to discover the events.
Keywords: event detection, event discovery, multilingual text processing

Digital Libraries for Education: Technology, Services, & User Studies

Linked Active Content: A Service for Digital Libraries for Education 25-32
  David Yaron; D. Jeff Milton; Rebecca Freeland
A service is described to help enable digital libraries for education, such as the NSDL, to serve as collaboration spaces for the creation, modification and use of active learning experiences. The goal is to redefine the line between those activities that fall within the domain of computer programming and those that fall within the domain of content authoring. The current location of this line, as defined by web technologies, is such that far too much of the design and development process is in the domain of software creation. This paper explores the definition and use of "linked active content", which builds on the hypertext paradigm by extending it to support active content. This concept has community development advantages, since it provides an authoring paradigm that supports contributions from a more diverse audience, including especially those who have substantial classroom and pedagogical expertise but lack programming expertise. It also promotes the extraction of content from software so that collections may be better organized and more easily repurposed to meet the needs of a diverse audience of educators and students.
Keywords: Computing Milieux - Computers and Education - Computer and Information Science Education (K.3.2); Experimentation, Human Factors; active learning, education, web authoring
A Component Repository for Learning Objects: A Progress Report 33-40
  Jean R. Laleuf; Anne Morgan Spalter
We believe that an important category of SMET digital library content will be highly interactive, explorable microworlds for teaching science, mathematics, and engineering concepts. Such environments have proved extraordinarily time-consuming and difficult to produce, however, threatening the goals of widespread creation and use.
   One proposed solution for accelerating production has been the creation of repositories of reusable software components or learning objects. Programmers would use such components to rapidly assemble larger-scale environments. Although many agree on the value of this approach, few repositories of such components have been successfully created. We suggest some reasons for the lack of expected results and propose two strategies for developing such repositories. We report on a case study that provides a proof of concept of these strategies.
Keywords: NSDL, components, design, digital library, education, learning objects, reuse, software engineering, standards
Designing E-Books for Legal Research 41-48
  Catherine C. Marshall; Morgan N. Price; Gene Golovchinsky; Bill N. Schilit
In this paper we report the findings from a field study of legal research in a first-tier law school and on the resulting redesign of XLibris, a next-generation e-book. We first characterize a work setting in which we expected an e-book to be a useful interface for reading and otherwise using a mix of physical and digital library materials, and explore what kinds of reading-related functionality would bring value to this setting. We do this by describing important aspects of legal research in a heterogeneous information environment, including mobility, reading, annotation, link following and writing practices, and their general implications for design. We then discuss how our work with a user community and an evolving e-book prototype allowed us to examine tandem issues of usability and utility, and to redesign an existing e-book user interface to suit the needs of law students. The study caused us to move away from the notion of a stand-alone reading device and toward the concept of a document laptop, a platform that would provide wireless access to information resources, as well as support a fuller spectrum of reading-related activities.
Keywords: digital libraries, e-books, field study, information appliances, legal education, legal research, physical and digital information resources

Panel

The Open Archives Initiative: Perspectives on Metadata Harvesting 49
  James B. Lloyd; Tim Cole; Donald Waters; Caroline Arms; Simeon Warner; Jeffrey Young
The Open Archives Initiative [www.openarchives.org] has developed a metadata harvesting protocol to further its aim of efficient dissemination of content through interoperability standards. In early 2001, at meetings in the U.S. and Europe, the version of the protocol to be used for beta testing was announced. The HTTP-based protocol uses URLs for queries and XML for responses. The default metadata record structure is unqualified Dublin Core using a specified XML Schema. This simple metadata record form is intended to support cross-domain discovery; other record structures for which XML Schemas are defined can also be made available. Developments during the beta test should include the creation of OAI-compliant repositories (data providers) and harvesters (service providers). This panel will explore the purpose and evolution of the Open Archives Initiative from the point of view of various stakeholders, with emphasis on developments during 2001.
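To make the "URLs for queries, XML for responses" design concrete, here is a small Python sketch. It assumes the `ListRecords` verb and the `oai_dc` metadata prefix as defined in the published OAI metadata harvesting specification; the beta-test version discussed by the panel may have differed in detail. The base URL and the sample record are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of the unqualified Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_request(base_url, verb, **params):
    """Encode a harvesting request as a plain HTTP GET URL."""
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


def dublin_core_fields(xml_text):
    """Collect Dublin Core elements from an XML fragment.

    Returns a dict mapping element name (e.g. 'title') to a list of
    values, since Dublin Core elements are repeatable.
    """
    root = ET.fromstring(xml_text)
    prefix = "{" + DC_NS + "}"
    fields = {}
    for elem in root.iter():
        if elem.tag.startswith(prefix):
            name = elem.tag[len(prefix):]
            fields.setdefault(name, []).append((elem.text or "").strip())
    return fields


# Hypothetical repository address, used only to show the URL shape.
EXAMPLE_URL = build_request("http://archive.example.org/oai",
                            "ListRecords", metadataPrefix="oai_dc")

# Hand-written sample record, not taken from any real repository.
SAMPLE_RECORD = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sample title</dc:title>
  <dc:creator>A. Author</dc:creator>
  <dc:creator>B. Author</dc:creator>
</record>
"""
```

A harvester (service provider) would issue such URLs against a repository (data provider) and feed each returned record through a parser like `dublin_core_fields`.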

Approaches to Interoperability Among Digital Libraries

Mapping the Interoperability Landscape for Networked Information Retrieval 50-51
  William E. Moen
Interoperability is a fundamental challenge for networked information discovery and retrieval. Often treated monolithically in the literature, interoperability is multifaceted and can be analyzed into different types and levels. This paper discusses an approach to map the interoperability landscape for networked information retrieval as part of an interoperability assessment research project.
Keywords: Information Systems - Information Storage and Retrieval - Digital Libraries (H.3.7); Information Systems - Information Storage and Retrieval - Digital Libraries (H.3.7): Systems issues; Information Systems - Information Storage and Retrieval - Digital Libraries (H.3.7): User issues; Standardization; interoperability, networked information discovery and retrieval, testbeds
Distributed Resource Discovery: Using Z39.50 to Build Cross-Domain Information Servers 52-53
  Ray R. Larson
This short paper describes the construction and application of Cross-Domain Information Servers using features of the standard Z39.50 information retrieval protocol [11]. We use the Z39.50 Explain Database to determine the databases and indexes of a given server, then use the SCAN facility to extract the contents of the indexes. This information is used to build "collection documents" that can be retrieved using probabilistic retrieval algorithms.
Keywords: cross-domain resource discovery, distributed information retrieval, distributed search
The Open Archives Initiative: Building a Low-Barrier Interoperability Framework 54-62
  Carl Lagoze; Herbert Van de Sompel
The Open Archives Initiative (OAI) develops and promotes interoperability solutions that aim to facilitate the efficient dissemination of content. The roots of the OAI lie in the E-Print community. Over the last year its focus has been extended to include all content providers. This paper describes the recent history of the OAI - its origins in promoting E-Prints, the broadening of its focus, the details of its technical standard for metadata harvesting, the applications of this standard, and future plans.
Keywords: Software - Software Engineering - Interoperability (D.2.12); Experimentation, Standardization; digital libraries, interoperability, metadata, protocols
Enforcing Interoperability with the Open Archives Initiative Repository Explorer 63-64
  Hussein Suleman
The Open Archives Initiative (OAI) is an organization dedicated to solving problems of digital library interoperability by defining simple protocols, most recently for the exchange of metadata. The success of such an activity requires vigilance in specification of the protocol as well as standardization of implementation. The lack of standardized implementation is a substantial barrier to interoperability in many existing client/server protocols. To avoid this pitfall we developed the Repository Explorer, a tool that supports manual and automated protocol testing. This tool has a significant impact on simplifying development of interoperability interfaces and increasing the level of confidence of early adopters of the technology, thus exemplifying the positive impact of exhaustive testing and quality assurance on interoperability ventures.
Keywords: Software - Software Engineering - Interoperability (D.2.12); Computer Systems Organization - Computer-Communication Networks - Network Protocols (C.2.2); Experimentation, Reliability, Standardization, Verification; interoperability, protocol, testing, validation
Arc: An OAI Service Provider for Cross-Archive Searching 65-66
  Xiaoming Liu; Kurt Maly; Mohammad Zubair; Michael L. Nelson
The usefulness of the many on-line journals and scientific digital libraries that exist today is limited by the lack of a service that can federate them through a unified interface. The Open Archives Initiative (OAI) is one major effort to address technical interoperability among distributed archives. The objective of OAI is to develop a framework to facilitate the discovery of content in distributed archives. In this paper, we describe our experience and lessons learned in building Arc, the first federated searching service based on the OAI protocol. Arc harvests metadata from several OAI-compliant archives, normalizes them, and stores them in a search service based on a relational database (MySQL or Oracle). At present we have over 165K metadata records from 16 data providers from various domains.
Keywords: Information Systems - Information Storage and Retrieval - Digital Libraries (H.3.7): Collection; Information Systems - Information Storage and Retrieval - Digital Libraries (H.3.7): Dissemination; Information Systems - Information Storage and Retrieval - Digital Libraries (H.3.7): Standards; Design, Experimentation, Languages, Standardization; digital library, open archive initiative

Digital Libraries and the Web: Technology and Trust

Managing Change on the Web 67-76
  Luis Francisco-Revilla; Frank Shipman; Richard Furuta; Unmil Karadkar; Avital Arora
Increasingly, digital libraries are being defined that collect pointers to World-Wide Web based resources rather than hold the resources themselves. Maintaining these collections is challenging due to distributed document ownership and high fluidity. Typically a collection's maintainer has to assess the relevance of changes with little system aid. In this paper, we describe the Walden's Paths Path Manager, which assists a maintainer in discovering when relevant changes occur to linked resources. The approach and system design were informed by a study of how humans perceive changes of Web pages. The study indicated that structural changes are key in determining the overall change and that presentation changes are considered irrelevant.
Keywords: Computing Methodologies - Computer Graphics - Three-Dimensional Graphics and Realism (I.3.7); Information Systems - Information Interfaces and Presentation - Hypertext/Hypermedia (H.5.4); Algorithms, Design, Experimentation, Management, Reliability, Verification; Walden's path, path maintenance
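The distinction between structural and presentation changes reported in the abstract above can be sketched in Python as follows. This is a hypothetical heuristic, not the Path Manager's algorithm: the set of "presentation" tags and the category names are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Tags treated as purely presentational in this sketch (an assumption).
PRESENTATION_TAGS = {"b", "i", "u", "font", "center", "em", "strong", "span"}


class _Signature(HTMLParser):
    """Collects the sequence of structural tags and the visible words."""

    def __init__(self):
        super().__init__()
        self.structure = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored; only the kind of element matters here.
        if tag not in PRESENTATION_TAGS:
            self.structure.append(tag)

    def handle_data(self, data):
        self.text.extend(data.split())


def signature(html):
    parser = _Signature()
    parser.feed(html)
    parser.close()
    return parser.structure, parser.text


def classify_change(old_html, new_html):
    """Label the difference between two versions of a page."""
    old_structure, old_text = signature(old_html)
    new_structure, new_text = signature(new_html)
    if old_structure != new_structure:
        return "structural"
    if old_text != new_text:
        return "content"
    if old_html != new_html:
        return "presentation"
    return "none"
```

Under this scheme a maintainer could be alerted only for "structural" and "content" results, while "presentation" differences are suppressed.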
Measuring the Reputation of Web Sites: A Preliminary Exploration 77-78
  Greg Keast; Elaine G. Toms; Joan Cherry
We describe the preliminary results from a pilot study, which assessed the perceived reputation - authority and trustworthiness - of the output from five WWW indexing/ranking tools. The tools are based on three techniques: external link structures, internal content, or human selection/indexing. Twenty-two participants reviewed the output from each tool and assessed the reputation of the retrieved sites.
Keywords: Information Systems - Information Storage and Retrieval - Information Search and Retrieval (H.3.3); Experimentation, Measurement, Performance, Reliability; Lycos, TOPIC, Yahoo, alta vista, authority, evaluation, google, reputation, web sites
Personalized Spiders for Web Search and Analysis 79-87
  Michael Chau; Daniel Zeng; Hsinchun Chen
Searching for useful information on the World Wide Web has become increasingly difficult. While Internet search engines have been helping people to search on the web, low recall rate and outdated indexes have become more and more problematic as the web grows. In addition, search tools usually present to the user only a list of search results, failing to provide further personalized analysis which could help users identify useful information and comprehend these results. To alleviate these problems, we propose a client-based architecture that incorporates noun phrasing and self-organizing map techniques. Two systems, namely CI Spider and Meta Spider, have been built based on this architecture. User evaluation studies have been conducted and the findings suggest that the proposed architecture can effectively facilitate web search and analysis.
Keywords: Information Systems - Information Storage and Retrieval - Information Search and Retrieval (H.3.3); Design, Experimentation; information retrieval, internet searching and browsing, internet spider, noun-phrasing, personalization, self-organizing map
Salticus: Guided Crawling for Personal Digital Libraries 88-89
  Robin Burke
In this paper, we describe Salticus, a web crawler that learns from users' web browsing activity. Salticus enables users to build a personal digital library by collecting documents and generalizing over the user's choices.
Keywords: business intelligence, crawling, document acquisition, personal digital library

Panel

Different Cultures Meet: Lessons Learned in Global Digital Library Development 90-93
  Ching Chen; Wen Gao; Hsueh-hua Chen; Li-Zhu Zhou; Von-Wun Soo
This panel is organized to share the experience gained and lessons learned in developing cutting-edge technology applications and digital libraries when different cultures meet. "Culture" is interpreted in different ways and in different contexts. These range from the interdisciplinary collaboration among professionals from different fields with their own cultures (such as library/information science, computer science, humanities, social sciences, and science and technology) to, more globally, the collaboration experienced in major international projects involving R&D professionals from two or more different cultures: the East and the West, or the North and the South.

Tools for Constructing and Using Digital Libraries

Power to the People: End-User Building of Digital Library Collections 94-103
  Ian H. Witten; David Bainbridge; Stefan J. Boddie
Naturally, digital library systems focus principally on the reader: the consumer of the material that constitutes the library. In contrast, this paper describes an interface that makes it easy for people to build their own library collections. Collections may be built and served locally from the user's own web server, or (given appropriate permissions) remotely on a shared digital library host. End users can easily build new collections styled after existing ones from material on the Web or from their local files, or both, and collections can be updated and new ones brought on-line at any time. The interface, which is intended for non-professional end users, is modeled after widely used commercial software installation packages. Lest one quail at the prospect of end users building their own collections on a shared system, we also describe an interface for the administrative user who is responsible for maintaining a digital library installation.
Web-Based Scholarship: Annotating the Digital Library 104-105
  Bruce Rosenstock; Michael Gertz
The DL offers the possibility of collaborative scholarship, but the appropriate tools must be integrated within the DL to serve this purpose. We propose a Web-based tool to guide controlled data annotations that link items in the DL to a domain-specific ontology and which provide an effective means to query a data collection in an abstract and uniform fashion.
Keywords: Information Systems - Information Interfaces and Presentation - Group and Organization Interfaces (H.5.3); data annotations, folk literature DL
A Multi-View Intelligent Editor for Digital Video Libraries 106-115
  Brad A. Myers; Juan P. Casares; Scott Stevens; Laura Dabbish; Dan Yocum; Albert Corbett
Silver is an authoring tool that aims to allow novice users to edit digital video. The goal is to make editing of digital video as easy as text editing. Silver provides multiple coordinated views, including project, source, outline, subject, storyboard, textual transcript and timeline views. Selections and edits in any view are synchronized with all other views. A variety of recognition algorithms are applied to the video and audio content and then are used to aid in the editing tasks. The Informedia Digital Library supplies the recognition algorithms and metadata used to support intelligent editing, and Informedia also provides search and a repository. The metadata includes shot boundaries and a time-synchronized transcript, which are used to support intelligent selection and intelligent cut/copy/paste.
Keywords: digital video editing, informedia, multimedia authoring, silver, video library
VideoGraph: A New Tool for Video Mining and Classification 116-117
  Jia-Yu Pan; Christos Faloutsos
This paper introduces VideoGraph, a new tool for video mining and visualizing the structure of the plot of a video sequence. The main idea is to "stitch" together similar scenes which are apart in time. We give a fast algorithm to do stitching and we show case studies, where our approach (a) gives good features for classification (91% accuracy), and (b) results in VideoGraphs which reveal the logical structure of the plot of the video clips.

Systems Design and Evaluation for Undergraduate Learning Environments

The Alexandria Digital Earth Prototype 118-119
  Terence R. Smith; Greg Janee; James Frew; Anita Coleman
This note summarizes the system development activities of the Alexandria Digital Earth Prototype (ADEPT) Project. ADEPT and the Alexandria Digital Library (ADL) are, respectively, the research and operational components of the Alexandria Digital Library Project. The goal of ADEPT is to build a distributed digital library (DL) of personalized collections of geospatially referenced information. This DL is characterized by: (1) services for building, searching, and using personalized collections; (2) collections of georeferenced multimedia information, including dynamic simulation models of spatially distributed processes; and (3) user interfaces employing the concept of a "Digital Earth". Important near-term objectives for ADEPT are to build prototype collections that support undergraduate learning in physical, human, and cultural geography and related disciplines, and then to evaluate whether using such resources helps students learn to reason scientifically. Collections and services developed by ADEPT researchers will migrate to ADL as they mature.
Iscapes: Digital Libraries Environments for the Promotion of Scientific Thinking by Undergraduates in Geography 120-121
  Anne J. Gilliland-Swetland; Gregory L. Leazer
This paper reviews considerations associated with implementing the Alexandria Digital Earth Prototype (ADEPT) in undergraduate geography education by means of Iscapes (or Information landscapes). In particular, we are interested in how Iscapes might be used to promote scientific thinking by undergraduate students. Based upon an ongoing educational needs assessment, we present a set of conceptual principles that might selectively be implemented in the design of educational digital library environments.
Keywords: digital libraries, geography, scientific thinking, undergraduate education
Project ANGEL: An Open Virtual Learning Environment with Sophisticated Access Management 122-123
  John MacColl
This paper describes a new project funded in the UK by the Joint Information Systems Committee, to develop a virtual learning environment which combines a new awareness of internet sources such as bibliographic databases and full-text electronic journals with a sophisticated access management component which permits single sign-on authentication.
Keywords: Design, Standardization; access management, authentication, virtual learning environments
NBDL: A CIS Framework for NSDL 124-125
  Joe Futrelle; Su-Shing Chen; Kevin C. Chang
In this paper, we describe the NBDL (National Biology Digital Library) project, one of the six CIS (Core Integration System) projects of the NSF NSDL (National SMETE Digital Library) Program.
Keywords: Information Systems - Information Storage and Retrieval - Digital Libraries (H.3.7); Information Systems - Information Storage and Retrieval - Digital Libraries (H.3.7): User issues; Information Systems - Information Storage and Retrieval - Digital Libraries (H.3.7): Dissemination; Algorithms, Design, Standardization; SMET education, digital library, federated search
Automatic Identification and Organization of Index Terms for Interactive Browsing 126-134
  Nina Wacholder; David K. Evans; Judith L. Klavans
The potential of automatically generated indexes for information access has been recognized for several decades (e.g., Bush 1945 [2], Edmundson and Wyllys 1961 [4]), but the quantity of text and the ambiguity of natural language processing have made progress at this task more difficult than was originally foreseen. Recently, a body of work on development of interactive systems to support phrase browsing has begun to emerge (e.g., Anick and Vaithyanathan 1997 [1], Gutwin et al. [10], Nevill-Manning et al. 1997 [17], Godby and Reighart 1998 [9]). In this paper, we consider two issues related to the use of automatically identified phrases as index terms in a dynamic text browser (DTB), a user-centered system for navigating and browsing index terms: 1) What criteria are useful for assessing the usefulness of automatically identified index terms? and 2) Is the quality of the terms identified by automatic indexing such that they provide useful access to document content?
   The terms that we focus on have been identified by LinkIT, a software tool for identifying significant topics in text [7]. Over 90% of the terms identified by LinkIT are coherent and therefore merit inclusion in the dynamic text browser. Terms identified by LinkIT are input to Intell-Index, a prototype DTB that supports interactive navigation of index terms. The distinction between phrasal heads (the most important words in a coherent term) and modifiers serves as the basis for a hierarchical organization of terms. This linguistically motivated structure helps users to efficiently browse and disambiguate terms. We conclude that the approach to information access discussed in this paper is very promising, and also that there is much room for further research. In the meantime, this research is a contribution to the establishment of a solid foundation for assessing the usability of terms in phrase browsing applications.
Keywords: browsing, genre, indexing, natural language processing, phrases
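The head/modifier organization described in the abstract above can be illustrated with a short Python sketch. It assumes, as a deliberate simplification, that the head of an English noun phrase is its last word; this is not how LinkIT or Intell-Index identify heads, and the function name is hypothetical.

```python
from collections import defaultdict


def group_by_head(phrases):
    """Build a two-level index: head word -> sorted phrases sharing it.

    Simplifying assumption: the last word of each phrase is its head.
    """
    index = defaultdict(set)
    for phrase in phrases:
        words = phrase.lower().split()
        if not words:
            continue  # skip empty strings
        index[words[-1]].add(" ".join(words))
    return {head: sorted(terms) for head, terms in sorted(index.items())}
```

Browsing by head first and then by modifier lets a user see, for instance, every kind of "library" mentioned in a collection under a single entry.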

Panel

Digital Library Collaborations in a World Community 135
  David Fulker; Sharon Dawes; Leonid Kalinichenko; Tamara Sumner; Constantino Thanos; Alex Ushakov
Digital libraries and their user communities are increasingly international in nature. However - though technological progress and global education have brought American and European communities closer - cross-cultural and other crosscutting issues impede the formation of world community on larger scales. The pertinent issues include: collaboration in the presence of language and cultural barriers, international copyrights, international revenue streams, and universal access. This panel will examine notions of "community" from a variety of theoretical and practical perspectives, and discuss lessons that can be gleaned from applications of the community concept. Topics are expected to include scalability, sustainability, regenerative cycles in healthy communities, and examples of digital-library efforts that have international potential or implications.

Studying the Users of Digital Libraries: Formative and Summative Evaluations

Public Use of Digital Community Information Systems: Findings from A Recent Study with Implications for System Design 136-143
  Karen E. Pettigrew; Joan C. Durrance
The Internet has considerably empowered libraries and changed common perception of what they entail. Public libraries, in particular, are using technological advancements to expand their range of services and enhance their civic roles. Providing community information (CI) in innovative, digital forms via community networks is one way in which public libraries are facilitating everyday information needs. These networks have been lauded for their potential to strengthen physical communities through increasing information flow about local services and events, and through facilitating civic interaction. However, little is known about how the public uses such digital services and what barriers they encounter. This paper presents findings about how digital CI systems benefit physical communities based on extensive case studies in three states. At each site, rich data were collected using online surveys, field observation, in-depth interviews and focus groups with Internet users, human service providers and library staff. Both the online survey and the follow-up interviews with respondents were based on sense-making theory. In our paper we discuss our findings regarding: (1) how the public is using digital CI systems for daily problem solving, and (2) the types of barriers they encounter. Suggestions for improving digital CI systems are provided.
Keywords: Human Factors, Measurement, Performance, Theory; barriers, community information, community networks, information behavior, qualitative methods, sensemaking
Evaluating the Distributed National Electronic Resource BIBAKPDF 144-145
  Peter Brophy; Shelagh Fisher
The UK's development of a Distributed National Electronic Resource (DNER) is being subjected to intensive formative evaluation by a multi-disciplinary team. In this paper the Project Director reports on initial actions designed to characterise the DNER from multi-stakeholder perspectives.
Keywords: Computer Systems Organization -Computer-Communication Networks - Network Architecture and Design (C.2.1); Computer Systems Organization -Computer-Communication Networks - Distributed Systems (C.2.4); Design, Economics, Human Factors, Measurement, Management, Performance, Reliability, Verification; distributed collections, evaluation, information environments
Collaborative Design with Use Case Scenarios BIBAKPDF 146-147
  Lynne Davis; Melissa Dawe
Digital libraries, particularly those with a community-based governance structure, are best designed in a collaborative setting. In this paper, we compare our experience using two design methods: a Task-centered method that draws upon a group's strength for eliciting and formulating tasks, and a Use Case method that tends to require a focus on defining an explicit process for tasks. We discuss how these methods did and did not work well in a collaborative setting.
Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7); Design, Experimentation, Human Factors; collaboration, design, methodology, task-centered, use case
Human Evaluation of Kea, An Automatic Keyphrasing System BIBAKPDF 148-156
  Steve Jones; Gordon W. Paynter
This paper describes an evaluation of the Kea automatic keyphrase extraction algorithm. Tools that automatically identify keyphrases are desirable because document keyphrases have numerous applications in digital library systems, but are costly and time consuming to manually assign. Keyphrase extraction algorithms are usually evaluated by comparison to author-specified keywords, but this methodology has several well-known shortcomings. The results presented in this paper are based on subjective evaluations of the quality and appropriateness of keyphrases by human assessors, and make a number of contributions. First, they validate previous evaluations of Kea that rely on author keywords. Second, they show Kea's performance is comparable to that of similar systems that have been evaluated by human assessors. Finally, they justify the use of author keyphrases as a performance metric by showing that authors generally choose good keywords.
Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7); Computing Methodologies -Artificial Intelligence - Natural Language Processing (I.2.7); Algorithms, Experimentation, Performance; author keyphrases, digital libraries, keyphrase extraction, subjective evaluation, user interface

Digital Library Collections: Policies and Practices

Community Design of DLESE's Collections Review Policy: A Technological Frames Analysis BIBAKPDF 157-164
  Michael Khoo
In this paper, I describe the design of a collection review policy for the Digital Library for Earth System Education (DLESE). A distinctive feature of DLESE as a digital library is the DLESE community, composed of voluntary members who contribute metadata and resource reviews to DLESE. As the DLESE community is open, the question of how to evaluate community contributions is a crucial part of the review policy design process. In this paper, technological frames theory is used to analyse this design process by looking at how the designers work with two differing definitions of the peer reviewer, (a) peer reviewer as arbiter or editor, and (b) peer reviewer as colleague. Content analysis of DLESE documents shows that these frames can in turn be related to two definitions that DLESE offers of itself: DLESE as a library, and DLESE as a digital artifact. The implications of the presence of divergent technological frames for the design process are summarised, and some suggestions for future research are outlined.
Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7); Design, Human Factors; content analysis, decision making, design, digital library, ethnography, peer review, technological frames
Legal Deposit of Digital Publications: A Review of Research and Development Activity BIBAKPDF 165-173
  Adrienne Muir
There is a global trend towards extending legal deposit to include digital publications in order to maintain comprehensive national archives. However, including digital publications in legal deposit regulation is not enough to ensure the long-term preservation of these publications. Concepts, principles and practices accepted and understood in the print environment may have new meanings or no longer be appropriate in a networked environment. Mechanisms for identifying, selecting and depositing digital material either do not exist, or are inappropriate, for some kinds of digital publication. Work on developing digital preservation strategies is at an early stage. National and other deposit libraries are at the forefront of research and development in this area, often working in partnership with other libraries, publishers and technology vendors. Most work is of a technical nature. There is some work on developing policies and strategies for managing digital resources. However, not all management issues or users' needs are being addressed.
Keywords: Computing Milieux -Legal Aspects of Computing - General (K.5.0); Legal Aspects, Management; digital preservation, digital publications, legal deposit
Comprehensive Access to Printed Materials (CAPM) BIBAKPDF 174-175
  G. Sayeed Choudhury; Mark Lorie; Erin Fitzpatrick; Ben Hobbs; Greg Chirikjian; Allison Okamura; Nicholas E. Flores
The CAPM Project features the development and evaluation of an automated, robotic on-demand scanning system for materials at remote locations. To date, we have developed a book retrieval robot and a valuation analysis framework for evaluating CAPM. We intend to augment CAPM by exploring approaches for automated page turning and improved valuation. These extensions will result in a more fully automated CAPM system and a valuation framework that will not only be useful for assessing CAPM specifically, but also for library services and functions generally.
Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7); Design, Economics, Experimentation, Measurement; browsing, digital conversion, digital preservation, evaluation methods, information economics, paper manipulation, robotics
Technology and Values: Lessons from Central and Eastern Europe BIBAKPDF 176-177
  Nadia Caidi
Technology does not develop independently of its social context. Rather, there is a range of social, cultural and economic factors (in addition to technical factors) that define the parameters for the development and use of technologies. This paper presents a case study of the social shaping of one aspect of digital libraries, the development of national union catalogs (NUC), in four countries of Central and Eastern Europe (CEE). It examines the specific choices and values that are embedded in the design of a NUC, and how these might be transferred to other cultural contexts.
Keywords: central and eastern europe, information infrastructure, national union catalogs, social shaping of technology

Panel

A Digital Strategy for the Library of Congress: Discussion of the LC21 Report and the Role of the Digital Library Community BIBAPDF 178
  Alan Inouye; Margaret Hedstrom; Dale Flecker; David Levy
Digital libraries challenge the core practices of libraries and archives in many respects, not only in terms of accommodating digital information and technology, but also through the need to develop new economic and organizational models. As the world's largest library, the Library of Congress (LC) perhaps faces the most profound questions of how to collect, catalog, preserve, and provide access to digital resources. LC asked the Computer Science and Telecommunications Board of the National Academies for advice in this area by commissioning the study that culminated with the publication of LC21: A Digital Strategy for the Library of Congress. The panelists at this session will provide a brief summary of the LC21 report, review developments subsequent to the publication of LC21, and offer their thoughts on how the library community and information industry could engage LC to the benefit of the nation.

Studying the Users of Digital Libraries: Qualitative Approaches

Use of Multiple Digital Libraries: A Case Study BIBAKPDF 179-188
  Ann Blandford; Hanna Stelmaszewska; Nick Bryan-Kinns
The aim of the work reported here was to better understand the usability issues raised when digital libraries are used in a natural setting. The method used was a protocol analysis of users working on a task of their own choosing to retrieve documents from publicly available digital libraries. Various classes of usability difficulties were found. Here, we focus on use in context - that is, usability concerns that arise from the fact that libraries are accessed in particular ways, under technically and organisationally imposed constraints, and that use of any particular resource is discretionary. The concepts from an Interaction Framework, which provides support for reasoning about patterns of interaction between users and systems, are applied to understand interaction issues.
Keywords: HCI, digital libraries, interaction modelling, video protocols
An Ethnographic Study of Technical Support Workers: Why We Didn't Build a Tech Support Digital Library BIBAKPDF 189-198
  Sally Jo Cunningham; Chris Knowles; Nina Reeves
In this paper we describe the results of an ethnographic study of the information behaviours of university technical support workers and their information needs. The study looked at how the group identified, located and used information from a variety of sources to solve problems arising in the course of their work. The results of the investigation are discussed in the context of the feasibility of developing a potential information base that could be used by all members of the group. Whilst a number of their requirements would easily be fulfilled by the use of a digital library, other requirements would not. The paper illustrates the limitations of a digital library with respect to the information behaviours of this group of subjects and focuses on why a digital library would not appear to be the ideal support tool for their work.
Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7); Software -Software Engineering - Requirements/Specifications (D.2.1); Design, Human Factors; ethnography, requirements analysis, user studies
Developing Recommendation Services for a Digital Library with Uncertain and Changing Data BIBAKPDF 199-200
  Gary Geisler; David McArthur; Sarah Giersch
In developing recommendation services for a new digital library called iLumina (www.ilumina-project.org), we are faced with several challenges related to the nature of the data we have available. The availability and consistency of data associated with iLumina is likely to be highly variable. Any recommendation strategy we develop must be able to cope with this fact, while also being robust enough to adapt to additional types of data available over time as the digital library develops. In this paper we describe the challenges we are faced with in developing a system that can provide our users with good, consistent recommendations under changing and uncertain conditions.
Keywords: digital library, recommender system, user services
Evaluation of DEFINDER: A System to Mine Definitions from Consumer-Oriented Medical Text BIBAKPDF 201-202
  Judith L. Klavans; Smaranda Muresan
In this paper we present DEFINDER, a rule-based system that mines consumer-oriented full text articles in order to extract definitions and the terms they define. This research is part of the Digital Library Project at Columbia University, entitled PERSIVAL (PErsonalized Retrieval and Summarization of Image, Video and Language resources) [5]. One goal of the project is to present information to patients in language they can understand. A key component of this stage is to provide accurate and readable lay definitions for technical terms, which may be present in articles of intermediate complexity.
   The focus of this short paper is on quantitative and qualitative evaluation of the DEFINDER system [3]. Our basis for comparison was definitions from Unified Medical Language System (UMLS), On-line Medical Dictionary (OMD) and Glossary of Popular and Technical Medical Terms (GPTMT). Quantitative evaluations show that DEFINDER obtained 87% precision and 75% recall and reveal the incompleteness of existing resources and the ability of DEFINDER to address gaps. Qualitative evaluation shows that the definitions extracted by our system are ranked higher in terms of user-based criteria of usability and readability than definitions from on-line specialized dictionaries. Thus the output of DEFINDER can be used to enhance existing specialized dictionaries, and also as a key feature in summarizing technical articles for non-specialist users.
Keywords: automatic dictionary creation, medical digital libraries, natural language processing, text data mining
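For readers unfamiliar with the measures quoted in the DEFINDER abstract, the following is a minimal Python sketch of how precision and recall are conventionally computed, and how the two reported figures (87% precision, 75% recall) combine into a single F1 value. The F1 figure is not reported in the abstract; it is derived here only as illustrative arithmetic, and the function names and the small counts in the example are hypothetical.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of extracted items that are correct."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of correct items that were extracted."""
    return true_pos / (true_pos + false_neg)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical counts, for illustration only (not data from the paper):
# 3 correct extractions, 1 spurious extraction, 1 missed definition.
toy_precision = precision(true_pos=3, false_pos=1)
toy_recall = recall(true_pos=3, false_neg=1)

# Figures quoted in the abstract, combined into an F1 value that the
# abstract itself does not report.
reported_f1 = f1_score(0.87, 0.75)
print(round(reported_f1, 3))
```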

Techniques for Managing Distributed Collections

Overview of the Virtual Data Center Project and Software BIBAKPDF 203-204
  Micah Altman; L. Andreev; M. Diggory; G. King; E. Kolster; A. Sone; S. Verba; Daniel Kiskis; M. Krot
In this paper, we present an overview of the Virtual Data Center (VDC) software, an open-source digital library system for the management and dissemination of distributed collections of quantitative data. The VDC functionality provides everything necessary to maintain and disseminate an individual collection of research studies, including facilities for the storage, archiving, cataloging, translation, and on-line analysis of a particular collection. Moreover, the system provides extensive support for distributed and federated collections including: location-independent naming of objects, distributed authentication and access control, federated metadata harvesting, remote repository caching, and distributed virtual collections of remote objects.
Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7); Design, Management, Standardization; numeric data, open-source, warehousing
Digital Libraries and Data Scholarship BIBAKPDF 205-206
  Bruce R. Barkstrom
In addition to preserving and retrieving digital information, digital libraries need to allow data scholars to create post-publication references to objects within files and across collections of files. Such references can serve as new metadata in their own right and should also provide methods for efficiently extracting the subset of the original data that belongs to the object. This paper discusses some ideas about the requirements for such references within the context of long-term, active archival, where neither the data format nor the institutional basis can be guaranteed to remain constant.
Keywords: EOSDIS, data scholarship, digital libraries, object data references, structural data reference
SDLIP + STARTS = SDARTS A Protocol and Toolkit for Metasearching BIBAKPDF 207-214
  Noah Green; Panagiotis G. Ipeirotis; Luis Gravano
In this paper we describe how we combined SDLIP and STARTS, two complementary protocols for searching over distributed document collections. The resulting protocol, which we call SDARTS, is simple yet expressive enough to enable building sophisticated metasearch engines. SDARTS can be viewed as an instantiation of SDLIP with metasearch-specific elements from STARTS. We also report on our experience building three SDARTS-compliant wrappers: for locally available plain-text document collections, for locally available XML document collections, and for external web-accessible collections. These wrappers were developed to be easily customizable for new collections. Our work was developed as part of Columbia University's Digital Libraries Initiative--Phase 2 (DLI2) project, which involves the departments of Computer Science, Medical Informatics, and Electrical Engineering, the Columbia University libraries, and a large number of industrial partners. The main goal of the project is to provide personalized access to a distributed patient-care digital library.
Keywords: Information Systems -Information Storage and Retrieval - Information Search and Retrieval (H.3.3); Information Systems -Information Storage and Retrieval - Online Information Services (H.3.5); Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7); Information Systems -Database Management - Systems (H.2.4): Database Manager; Information Systems -Database Management - Systems (H.2.4); Information Systems -Database Management - Systems (H.2.4): Distributed databases; Information Systems -Database Management - Heterogeneous Databases (H.2.5); Information Systems -Database Management - Heterogeneous Databases (H.2.5): Data translation
Database Selection for Processing k Nearest Neighbors Queries in Distributed Environments BIBAKPDF 215-222
  Clement Yu; Prasoon Sharma; Weiyi Meng; Yan Qin
We consider the processing of digital library queries, consisting of a text component and a structured component in distributed environments. The text component can be processed using techniques given in previous papers such as [7, 8, 11]. In this paper, we concentrate on the processing of the structured component of a distributed query. Histograms are constructed and algorithms are given to provide estimates of the desirabilities of the databases with respect to the given query. Databases are selected in descending order of desirability. An algorithm is also given to select tuples from the selected databases. Experimental results are given to show that the techniques provided here are effective and efficient.
Keywords: database selection, distributed databases, k nearest neighbors, query processing

Panel

The President's Information Technology Advisory Committee's February 2001 Digital Library Report and its Impact BIBAKPDF 223-225
  Sally E. Howe; David C. Nagel; Ching-chih Chen; Stephen M. Griffin; James Lightbourne; Walter L. Warnick
In February 2001 the Panel on Digital Libraries of the President's Information Technology Advisory Committee issued a report entitled "Digital Libraries: Universal Access to Human Knowledge". This JCDL panel, which consists of two members of the PITAC Panel on Digital Libraries and representatives of key Federal science and digital library agencies who had briefed the Panel, will discuss the report's findings and recommendations and how the report is and can be helpful in improving the development and use of digital libraries.
Keywords: Economics, Experimentation, Human Factors, Legal Aspects, Management, Security, Standardization, Verification; digital libraries, federal government, policy, research and development

The Sound of Digital Libraries: Audio, Music, and Speech

Building Searchable Collections of Enterprise Speech Data BIBAKPDF 226-234
  James W. Cooper; Mahesh Viswanathan; Donna Byron; Margaret Chan
We have applied speech recognition and text-mining technologies to a set of recorded outbound marketing calls and analyzed the results. Since speaker-independent speech recognition technology results in a significantly lower recognition rate than that found when the recognizer is trained for a particular speaker, we applied a number of post-processing algorithms to the output of the recognizer to render it suitable for the Textract text mining system.
   We indexed the call transcripts using a search engine and used Textract and associated Java technologies to place the relevant terms for each document in a relational database. Following a search query, we generated a thumbnail display of the results of each call with the salient terms highlighted. We illustrate these results and discuss their utility. We took the results of these experiments and continued this analysis on a set of talks and presentations.
   We describe a distinct document genre based on the note-taking concept of document content, and propose a significant new method for measuring speech recognition accuracy. This procedure is generally relevant to the problem of capturing meetings and talks and providing a searchable index of these presentations on the web.
Keywords: document display, search, speech analysis, speech retrieval, text mining
Transcript-Free Search of Audio Archives for the National Gallery of the Spoken Word BIBAKPDF 235-236
  John H. L. Hansen; J. R. Deller; Michael S. Seadle
The National Gallery of the Spoken Word (NGSW) project is creating a carefully organized on-line repository of spoken-word collections spanning the 20th century. Unprecedented technical challenges are inherent in the development of an archive of such extensive scale and diversity. This paper describes research on the development of text-free search-engine technology used to locate requested content in the audio records. A companion paper in these proceedings addresses watermarking technologies for copyright protection.
Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7); Data - Files (E.5)
Audio Watermarking Techniques for the National Gallery of the Spoken Word BIBAPDF 237-238
  J. R. Deller; Aparna Gurijala; Michael S. Seadle
This is one of two companion papers describing technical challenges faced in the development of the National Gallery of the Spoken Word (NGSW). The present paper describes watermarking technologies for intellectual property protection. Following an introduction to data watermarking, the paper focuses on a new algorithm called transform encryption coding (TEC) and its application to watermarking the NGSW archives. TEC has a number of flexible features that make it amenable to the NGSW development.
Music-Notation Searching and Digital Libraries BIBAPDF 239-246
  Donald Byrd
Almost all work on music information retrieval to date has concentrated on music in the audio and event (normally MIDI) domains. However, music in the form of notation, especially Conventional Music Notation (CMN), is of much interest to musically-trained persons, both amateurs and professionals, and searching CMN has great value for digital music libraries. One obvious reason little has been done on music retrieval in CMN form is the overwhelming complexity of CMN, which requires a very substantial investment in programming before one can even begin studying music IR. This paper reports on work adding music-retrieval capabilities to Nightingale, an existing professional-level music-notation editor.
Feature Selection for Automatic Classification of Musical Instrument Sounds BIBAKPDF 247-248
  Mingchun Liu; Chunru Wan
In this paper, we carry out a study on classification of musical instruments using a small set of features selected from a broad range of extracted ones by the sequential forward feature selection method. First, we extract 58 features for each record in the music database of 351 sound files. Then, the sequential forward selection method is adopted to choose the best feature set to achieve high classification accuracy. Three different classification techniques have been tested, and an accuracy of up to 93% can be achieved by using 19 features.
Keywords: classification, feature extraction, musical instrument, sequential forward feature selection
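The sequential forward selection procedure named in the abstract above can be sketched generically in Python as follows. This is a minimal illustration of the standard greedy technique, not the authors' implementation: the `score` callback, the early-stopping rule, and the toy scoring function are all assumptions introduced here. In practice `score` would typically be the cross-validated accuracy of a classifier trained on the candidate feature subset.

```python
from typing import Callable, List, Sequence


def sequential_forward_selection(
    n_features: int,
    score: Callable[[Sequence[int]], float],
    max_features: int,
) -> List[int]:
    """Greedily grow a feature subset, one feature at a time.

    At each step, the unused feature whose addition yields the highest
    score is added. Selection stops when `max_features` is reached or
    when no remaining feature improves the score (an assumed stopping
    rule; other variants always run to a fixed subset size).
    """
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Score every subset formed by adding one unused feature.
        scored = [(score(selected + [f]), f) for f in candidates]
        top_score, top_feature = max(scored)
        if top_score <= best_score:
            break  # no candidate improves on the current subset
        selected.append(top_feature)
        best_score = top_score
    return selected


def toy_score(subset: Sequence[int]) -> float:
    """Hypothetical score: three informative features, small size penalty."""
    usefulness = {0: 0.5, 3: 0.3, 7: 0.1}
    return sum(usefulness.get(f, 0.0) for f in subset) - 0.01 * len(subset)


chosen = sequential_forward_selection(n_features=10, score=toy_score, max_features=5)
print(chosen)  # [0, 3, 7]
```

With the toy score, the search adds features 0, 3 and 7 in that order and then stops, because every remaining feature only incurs the size penalty.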
Adding Content-Based Searching to a Traditional Music Library Catalogue Server BIBAKPDF 249-250
  Matthew J. Dovey
Most online music library catalogues can only be searched by textual metadata. Whilst highly effective - since the rules for maintaining consistency have been refined over many years - this does not allow searching by musical content. Many music librarians are familiar with users humming their enquiries. Most systems providing a "query by humming" interface tend to run independently of music library catalogue systems and do not offer similar textual metadata searching. This paper discusses the ongoing investigative work on integrating these two types of system conducted as part of the NSF/JISC funded OMRAS project (http://www.omras.org).
Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7); Algorithms, Design; Z39.50, music information retrieval

Information Search and Retrieval in Digital Libraries

Locating Question Difficulty through Explorations in Question Space BIBAKPDF 251-252
  Terry Sullivan
Three different search effectiveness measures were used to classify 50 question narratives as easy or hard. Each measure was then encoded onto a spatial representation of interquestion similarity. Discriminant analysis based on the resulting map was able to predict question difficulty with approximately 80% accuracy, robust across multiple measures. Implications for the design of digital document collections are discussed.
Keywords: information visualization, question classification
Browsing by Phrases: Terminological Information in Interactive Multilingual Text Retrieval BIBAKPDF 253-254
  Anselmo Penas; Julio Gonzalo; Felisa Verdejo
This paper presents an interactive search engine (Website Term Browser) which makes use of phrasal information to process queries and suggest relevant topics in a fully multilingual setting.
Keywords: interaction, multilingual information access, natural language processing, terminology extraction
Approximate Ad-Hoc Query Engine for Simulation Data BIBAKPDF 255-256
  Ghaleb Abdulla; Chuck Baldwin; Terence Critchlow; Roy Kamimura; Ida Lozares; Ron Musick; Nu Ai Tang; Byung S. Lee; Robert Snapp
In this paper, we describe AQSim, an ongoing effort to design and implement a system to manage terabytes of scientific simulation data. The goal of this project is to reduce data storage requirements and access times while permitting ad-hoc queries using statistical and mathematical models of the data. In order to facilitate data exchange between models based on different representations, we are evaluating using the ASCI common data model that is comprised of several layers of increasing semantic complexity. To support queries over the spatial-temporal mesh structured data we are in the process of defining and implementing a grammar for MeshSQL.
Keywords: data integration, data retrieval, mesh data, query, scientific data management, visualization
Extracting Taxonomic Relationships from On-Line Definitional Sources using LEXING BIBAKPDF 257-258
  Judith Klavans; Brian Whitman
We present a system which extracts the genus word and phrase from free-form definition text, entitled LEXING, for Lexical Information from Glossaries. The extractions will be used to build automatically a lexical knowledge base from on-line domain specific glossary sources. We combine statistical and semantic processes to extract these terms, and demonstrate that this combination allows us to predict the genus even in difficult situations such as empty head definitions or verb definitions. We also discuss the use of "linking prepositions" for use in skipping past empty head genus phrases. This system is part of a project to extract ontological information for energy glossary information.
Keywords: definitions, glossaries, information retrieval, lexical knowledge bases, natural language processing, ontologies
Hierarchical Indexing and Document Matching in BoW BIBAPDF 259-267
  Maayan Geffet; Dror G. Feitelson
BoW is an on-line bibliographical repository based on a hierarchical concept index to which entries are linked. Searching in the repository should therefore return matching topics from the hierarchy, rather than just a list of entries. Likewise, when new entries are inserted, a search for relevant topics to which they should be linked is required. We develop a vector-based algorithm that creates keyword vectors for the set of competing topics at each node in the hierarchy, and show how its performance improves when domain-specific features are added (such as special handling of topic titles and author names). The results of a 7-fold cross validation on a corpus of some 3,500 entries with a 5-level index are hit ratios in the range of 89-95%, and most of the misclassifications are indeed ambiguous to begin with.
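As a rough illustration of the vector-based matching idea described in the BoW abstract, the Python sketch below compares an entry against the competing topics at a single node using cosine similarity over keyword counts. It is an assumption-laden simplification, not the authors' algorithm: the tokenisation, the use of raw term counts, the topic names, and the topic keyword texts are all hypothetical, and the domain-specific features mentioned in the abstract (topic titles, author names) are not modelled. Descending a hierarchy would repeat the same choice at each level.

```python
import math
from collections import Counter
from typing import Dict


def keyword_vector(text: str) -> Counter:
    """Build a simple term-count vector from whitespace-separated words."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_topic(entry_text: str, topics: Dict[str, str]) -> str:
    """Return the competing topic whose keyword vector best matches the entry."""
    entry_vec = keyword_vector(entry_text)
    return max(topics, key=lambda name: cosine(entry_vec, keyword_vector(topics[name])))


# Hypothetical sibling topics at one node of a concept index.
sibling_topics = {
    "scheduling": "job scheduling parallel machines queue",
    "file systems": "file system storage disk cache",
}
match = best_topic("parallel job scheduling on clusters", sibling_topics)
print(match)  # scheduling
```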
Scalable Integrated Region-Based Image Retrieval using IRM and Statistical Clustering BIBAKPDF 268-277
  James Z. Wang; Yanping Du
Statistical clustering is critical in designing scalable image retrieval systems. In this paper, we present a scalable algorithm for indexing and retrieving images based on region segmentation. The method uses statistical clustering on region features and IRM (Integrated Region Matching), a measure developed to evaluate overall similarity between images that incorporates properties of all the regions in the images by a region-matching scheme. Compared with retrieval based on individual regions, our overall similarity approach (a) reduces the influence of inaccurate segmentation, (b) helps to clarify the semantics of a particular region, and (c) enables a simple querying interface for region-based image retrieval systems. The algorithm has been implemented as a part of our experimental SIMPLIcity image retrieval system and tested on large-scale image databases of both general-purpose images and pathology slides. Experiments have demonstrated that this technique maintains the accuracy and robustness of the original system while reducing the matching time significantly.
Keywords: clustering, content-based image retrieval, integrated region matching, segmentation, wavelets

Panel

The National SMETE Digital Library Program BIBAKPDF 278-281
  Brandon Muramatsu; Cathryn A. Manduca; Marcia Mardis; James H. Lightbourne; Flora P. McMartin
"To catalyze and support continual improvements in the quality of science, mathematics, engineering, and technology (SMET) education, the National Science Foundation (NSF) has established the National Science, Mathematics, Engineering, and Technology Education Digital Library (NSDL) program. The resulting digital library, a network of learning environments and resources for SMET education, will ultimately meet the needs of students and teachers at all levels-K-12, undergraduate, graduate, and lifelong learning-in both individual and collaborative settings, as well as formal and informal modes." -National Science Foundation, 2001
   The national in the NSDL program is quickly becoming a reality with the broad reach of the currently funded projects. This panel session will bring together the leaders developing the National SMETE Digital Library to provide a brief background and broad overview of the NSDL program. Panelists will discuss the overall vision and broad steps underway to develop the National SMETE Digital Library.
   Building the National SMETE Digital Library presents many challenges:
  • Developing a shared vision for the form and function of the NSDL;
  • Meeting the needs of diverse learners and of the many disciplines encompassed by the NSDL;
  • Acquiring input from the community of users to ensure that the NSDL is both used and useable;
  • Evaluating progress and impacts;
  • Integrating technologies that already exist, and the development of new technologies; and
  • Providing mechanisms for sharing and cooperation of knowledge and resources among NSDL collaborators.
Keywords: National SMETE Digital Library, NSDL, Education, Teaching and Learning

Digital Video Libraries: Design and Access

    Cumulating and Sharing End Users Knowledge to Improve Video Indexing in a Video Digital Library BIBAKPDF 282-289
      Marc Nanard; Jocelyne Nanard
    In this paper, we focus on a user-driven approach to improving video indexing. It consists of cumulating the large number of small, individual efforts made by the users who access information, and providing a community management mechanism that lets users share the elicited knowledge. This technique is currently being developed in the "OPALES" environment and tuned up at the "Institut National de l'Audiovisuel" (INA), a National Video Library in Paris, to increase the value of its patrimonial video archive collections. It relies on a portal providing private workspaces to end users, so that a large part of their work can be shared among them. The effort of interpreting documents is made directly by the expert users who work on the archives as part of their own jobs. OPALES provides an original notion of "point of view" to enable the elicitation and the sharing of knowledge between communities of users, without leading to messy structures. The overall result consists of linking exportable private metadata to archive documents and managing the sharing of the elicited knowledge between user communities.
    Keywords: H.3.5[INFORMATION STORAGE AND RETRIEVAL]: Online Information Services - Data bank sharing Design; Video annotation. Video indexing. Private workspaces. Users communities. Knowledge sharing.
    XSLT for Tailored Access to a Digital Video Library BIBAKPDF 290-299
      Michael G. Christel; Bryan Maher; Andrew Begun
    Surrogates, summaries, and visualizations have been developed and evaluated for accessing a digital video library containing thousands of documents and terabytes of data. These interfaces, formerly implemented within a monolithic stand-alone application, are being migrated to XML and XSLT for delivery through web browsers. The merits of these interfaces are presented, along with a discussion of the benefits in using W3C recommendations such as XML and XSLT for delivering tailored access to video over the web.
    Keywords: Information Systems -Information Interfaces and Presentation - Multimedia Information Systems (H.5.1); Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): Standards; Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): Dissemination; Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): User issues; Design, Human Factors, Standardization; XML, XSLT, digital video library, surrogate
    Design of a Digital Library for Human Movement BIBAKPDF 300-309
      Jezekiel Ben-Arie; Purvin Pandit; ShyamSundar Rajaram
    This paper is focused on a central aspect in the design of our planned digital library for human movement, i.e. on the aspect of representation and recognition of human activity from video data. The method of representation is important since it has a major impact on the design of all the other building blocks of our system such as the user interface/query block or the activity recognition/storage block. In this paper we evaluate a representation method for human movement that is based on sequences of angular poses and angular velocities of the human skeletal joints, for storage and retrieval of human actions in video databases. The choice of a representation method plays an important role in the database structure, search methods, storage efficiency, etc. For this representation, we develop a novel approach for complex human activity recognition by employing multidimensional indexing combined with temporal or sequential correlation. This scheme is then evaluated with respect to its efficiency in storage and retrieval.
       For the indexing we use postures of humans in videos that are decomposed into a set of multidimensional tuples which represent the poses/velocities of human body parts such as arms, legs and torso. Three novel methods for human activity recognition are theoretically and experimentally compared. The methods require only a few sparsely sampled human postures. We also achieve speed invariant recognition of activities by eliminating the time factor and replacing it with sequence information. The indexing approach also provides robust recognition and an efficient storage/retrieval of all the activities in a small set of hash tables.
    Keywords: Computing Methodologies -Image Processing And Computer Vision - Scene Analysis (I.4.8); Computing Methodologies -Image Processing And Computer Vision - Scene Analysis (I.4.8): Motion; Computing Methodologies -Image Processing And Computer Vision - Scene Analysis (I.4.8): Tracking; Computing Methodologies -Pattern Recognition - Design Methodology (I.5.2); Computing Methodologies -Pattern Recognition - Design Methodology (I.5.2): Pattern analysis; Data - Data Storage Representations (E.2); Algorithms, Design; human activity recognition, multi dimensional indexing, sequence recognition, temporal correlation
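    The indexing idea in the Ben-Arie, Pandit, and Rajaram abstract above (pose tuples stored in hash tables, with sequence order replacing timing) can be illustrated with a minimal sketch. This is not the authors' method: the function names, the 15-degree quantization step, the voting rule, and the two-angle sample poses are all assumptions made for illustration.

```python
from collections import defaultdict

def quantize(pose, step=15):
    """Map a tuple of joint angles (degrees) to a coarse, hashable cell."""
    return tuple(int(a // step) for a in pose)

def build_index(models, step=15):
    """models: {activity: [pose, ...]} -> {cell: [(activity, position), ...]}."""
    index = defaultdict(list)
    for name, poses in models.items():
        for pos, pose in enumerate(poses):
            index[quantize(pose, step)].append((name, pos))
    return index

def recognize(index, query, step=15):
    """Vote for activities whose stored poses match the query poses in order."""
    last = defaultdict(lambda: -1)  # last matched position per activity
    votes = defaultdict(int)
    for pose in query:
        hits = defaultdict(list)
        for name, pos in index.get(quantize(pose, step), []):
            hits[name].append(pos)
        for name, positions in hits.items():
            # Only the order of poses matters, not how fast they occur.
            later = [p for p in positions if p > last[name]]
            if later:
                last[name] = min(later)
                votes[name] += 1
    return max(votes, key=votes.get) if votes else None

# Hypothetical two-angle poses (e.g. shoulder, elbow) for two activities.
models = {
    "wave": [(0, 90), (45, 90), (90, 90)],
    "squat": [(0, 0), (0, 45), (0, 90)],
}
index = build_index(models)
result = recognize(index, [(2, 91), (92, 95)])  # two sparsely sampled poses
```

    Because each query pose only needs to match a later position than the previous match, skipping intermediate poses (a faster performance of the same activity) does not change the vote.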
    A Bucket Architecture for the Open Video Project BIBAKPDF 310-311
      Michael L. Nelson; Gary Marchionini; Gary Geisler; Meng Yang
    The Open Video project is a collection of public domain digital video available for research and other purposes. The Open Video collection currently consists of approximately 350 video segments, ranging in duration from 10 seconds to 1 hour. Rapid growth for the collection is planned through agreements with other video repository projects and provision for user contribution of video. To handle the increased accession, we are experimenting with "buckets", aggregative intelligent publishing constructs for use in digital libraries.
    Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7); Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): Collection; Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): Dissemination; Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): Systems issues; Design, Documentation, Experimentation, Management; buckets, digital objects, digital video, open source
    The Fischlar Digital Video System: A Digital Library of Broadcast TV Programmes BIBAPDF 312-313
      A. F. Smeaton; N. Murphy; N. E. O'Connor; S. Marlow; H. Lee; K. McDonald; P. Browne; J. Ye
    Fischlar is a system for recording, indexing, browsing and playback of broadcast TV programmes which has been operational on our University campus for almost 18 months. In this paper we give a brief overview of how the system operates, how TV programmes are organised for browse/playback and a short report on the system usage by over 900 users in our University.

    Systems Design and Architecture for Digital Libraries

    Design Principles for the Information Architecture of a SMET Education Digital Library BIBAKPDF 314-321
      Andy Dong; Alice M. Agogino
    This implementation paper introduces principles for the information architecture of an educational digital library, principles that address the distinction between designing digital libraries for education and designing digital libraries for information retrieval in general. Design is a key element of any successful product. Good designers and their designs put technology into the hands of the user, making the product's focus comprehensible and tangible through design. As straightforward as this may appear, the design of learning technologies is often masked by the enabling technology. In fact, they often lack an explicitly stated instructional design methodology. While the technologies are important hurdles to overcome, we advocate learning systems that empower education-driven experiences rather than technology-driven experiences. This work describes a concept for a digital library for science, mathematics, engineering and technology education (SMETE), a library with an information architecture designed to meet learners' and educators' needs. Utilizing a constructivist model of learning, the authors present practical approaches to implementing the information architecture and its technology underpinnings. The authors propose the specifications for the information architecture and a visual design of a digital library for communicating learning to the audience. The design methodology indicates that a scenario-driven design technique sensitive to the contextual nature of learning offers a useful framework for tailoring technologies that help empower, not hinder, the educational sector.
    Keywords: Design, Human Factors; education, engineering, learning technology, mathematics, science, technology
    Toward a Model of Self-Administering Data BIBAKPDF 322-330
      ByungHoon Kang; Robert Wilensky
    We describe a model of self-administering data. In this model, a declarative description of how a data object should behave is attached to the object, either by a user or by a data input device. A widespread infrastructure of self-administering data handlers is presumed to exist; these handlers are responsible for carrying out the specifications attached to the data. Typically, the specifications express how and to whom the data should be transferred, how it should be incorporated when it is received, what rights recipients of the data will have with respect to it, and the kind of relation that should exist between distributed copies of the object. Functions such as distributed version control can be implemented on top of the basic handler functions.
       We suggest that this model can provide superior support for common cooperative functions. Because the model is declarative, users need only express their intentions once in creating a self-administering description, and need not be concerned with manually performing subsequent repetitious operations. Because the model is peer-to-peer, users are less dependent on additional, perhaps costly resources, at least when these are not critical.
       An initial implementation of the model has been created. We are experimenting with the model both as a tool to aid in digital library functions, and as a possible replacement for some server oriented functions.
    Keywords: asynchronous collaboration, data access model, data management, distributed file system, file sharing, peer to peer, scalable update propagation, self-administering data
    PERSIVAL, A System for Personalized Search and Summarization over Multimedia Healthcare Information BIBAKPDF 331-340
      Kathleen R. McKeown; Shih-Fu Chang; James Cimino; Steven Feiner; Carol Friedman; Luis Gravano; Vasileios Hatzivassiloglou; Steven Johnson; Desmond A. Jordan; Judith L. Klavans; Andre Kushniruk; Vimla Patel; Simone Teufel
    In healthcare settings, patients need access to online information that can help them understand their medical situation. Physicians need information that is clinically relevant to an individual patient. In this paper, we present our progress on developing a system, PERSIVAL, that is designed to provide personalized access to a distributed patient care digital library. Using the secure, online patient records at New York Presbyterian Hospital as a user model, PERSIVAL's components tailor search, presentation and summarization of online multimedia information to both patients and healthcare providers.
    Keywords: Computing Methodologies -Artificial Intelligence - Natural Language Processing (I.2.7); Information Systems -Information Interfaces and Presentation - User Interfaces (H.5.2); Information Systems -Information Storage and Retrieval - Online Information Services (H.3.5): Web-based services; Information Systems -Information Storage and Retrieval - Online Information Services (H.3.5); medical digital library, multimedia, natural language, personalization, query interface, search, summarization
    An Approach to Search for the Digital Library BIBAKPDF 341-342
      Elaine G. Toms; Joan C. Bartlett
    The chief form of accessing the content of a digital library (DL) is its search interface. While a DL needs an interface that integrates a range of options from search to browse to serendipity, in this work we focus on analytical search. We propose using Bates' search tactics as a basis for the re-design of search interfaces. We believe this approach will help to identify the types of tools that need to be supported by a DL interface.
    Keywords: Information Systems -Information Storage and Retrieval - Information Search and Retrieval (H.3.3); digital libraries, search interface, search tactics, searching
    TilePic: A File Format for Tiled Hierarchical Data BIBAPDF 343-344
      Jeff Anderson-Lee; Robert Wilensky
    TilePic is a method for storing tiled data of arbitrary type in a hierarchical, indexed format for fast retrieval. It is useful for storing moderately large, static, spatial datasets in a manner that is suitable for panning and zooming over the data, especially in distributed applications. Because different data types may be stored in the same object, TilePic can support semantic zooming as well. It has proven suitable for a wide variety of applications involving the networked access and presentation of images, geographic data, and text. The TilePic format and its supporting tools are unencumbered, and available to all.
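    As a rough illustration of why a tiled, hierarchical layout suits panning and zooming, the sketch below computes a resolution pyramid and the tiles that cover a viewport. It does not reproduce the actual TilePic file layout, which the abstract does not describe; the 256-pixel tile size, the halving between levels, and both function names are assumptions.

```python
import math

def pyramid_levels(width, height, tile=256):
    """Return [(level, cols, rows), ...] from full resolution down to one tile."""
    levels = []
    level = 0
    while True:
        cols = math.ceil(width / tile)
        rows = math.ceil(height / tile)
        levels.append((level, cols, rows))
        if cols == 1 and rows == 1:
            break
        # Each coarser level halves the resolution in both dimensions.
        width, height = math.ceil(width / 2), math.ceil(height / 2)
        level += 1
    return levels

def tiles_for_view(x, y, w, h, tile=256):
    """Return the (col, row) tiles covering a viewport given in level pixels."""
    c0, r0 = x // tile, y // tile
    c1, r1 = (x + w - 1) // tile, (y + h - 1) // tile
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

levels = pyramid_levels(1000, 600)
visible = tiles_for_view(200, 0, 100, 100)
```

    A viewer built on such an index only fetches the tiles returned by `tiles_for_view` at the level matching the current zoom, which is what makes the approach attractive for networked access.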

    Panel

    High Tech or High Touch: Automation and Human Mediation in Libraries BIBAPDF 345
      David Levy; William Arms; Oren Etzioni; Diane Nester; Barbara Tillett
    There are those who now think that traditional library services, such as cataloging and reference, will no longer be needed in the future, or at least will be fully automated. Others are equally adamant that human intervention is not only important but essential. Underlying such positions are a host of assumptions - about the continued existence and place of paper, the role of human intelligence and interpretation, the nature of research, and the significance of the human element. This panel brings together experts in libraries and digital technology to uncover such issues and assumptions and to discuss and debate the place of people and machines in cataloging and reference work.

    Digital Preservation: Technology, Economics, and Policy

    Long Term Preservation of Digital Information BIBAKPDF 346-352
      Raymond A. Lorie
    The preservation of digital data for the long term presents a variety of challenges from technical to social and organizational. The technical challenge is to ensure that the information, generated today, can survive long term changes in storage media, devices and data formats. This paper presents a novel approach to the problem. It distinguishes between archiving of data files and archiving of programs (so that their behavior may be reenacted in the future).
       For the archiving of a data file, the proposal consists of specifying the processing that needs to be performed on the data (as physically stored) in order to return the information to a future client (according to a logical view of the data). The process specification and the logical view definition are archived with the data.
       For the archiving of a program behavior, the proposal consists of saving the original executable object code together with the specification of the processing that needs to be performed for each machine instruction of the original computer (emulation).
       In both cases, the processing specification is based on a Universal Virtual Computer that is general, yet basic enough to remain relevant in the future.
    Keywords: Languages, Standardization; archival, digital documents, digital information, digital library, emulation, preservation
    Creating Trading Networks of Digital Archives BIBAKPDF 353-362
      Brian Cooper; Hector Garcia-Molina
    Digital archives can best survive failures if they have made several copies of their collections at remote sites. In this paper, we discuss how autonomous sites can cooperate to provide preservation by trading data. We examine the decisions that an archive must make when forming trading networks, such as the amount of storage space to provide and the best number of partner sites. We also deal with the fact that some sites may be more reliable than others. Experimental results from a data trading simulator illustrate which policies are most reliable. Our techniques focus on preserving the "bits" of digital collections; other services that focus on other archiving concerns (such as preserving meaningful metadata) can be built on top of the system we describe here.
    Keywords: data trading, digital archiving, fault tolerance, preservation, replication
    Cost-Driven Design for Archival Repositories BIBAPDF 363-372
      Arturo Crespo; Hector Garcia-Molina
    Designing an archival repository is a complex task because there are many alternative configurations, each with different reliability levels and costs. In this paper we study the costs involved in an Archival Repository and we introduce a design framework for evaluating alternatives and choosing the best configuration in terms of reliability and cost. We also present a new version of our simulation tool, ArchSim/C that aids in the decision process. The design framework and the usage of ArchSim/C are illustrated with a case study of a hypothetical (yet realistic) archival repository shared between two universities.

    Scholarly Communication and Digital Libraries

    Hermes: A Notification Service for Digital Libraries BIBAKPDF 373-380
      D. Faensen; L. Faulstich; H. Schweppe; A. Hinze; A. Steidinger
    The high publication rate of scholarly material makes searching and browsing an inconvenient way to keep oneself up-to-date. Instead of being the active part in information access, researchers want to be notified whenever a new paper in their research area is published.
       While more and more publishing houses or portal sites offer notification services, this approach has several disadvantages. We introduce the Hermes alerting service, a service that integrates a variety of different information providers, making their heterogeneity transparent for the users. Hermes offers sophisticated filtering capabilities, preventing the user from drowning in a flood of irrelevant information. From the user's point of view it integrates the providers into a single source. Its simple provider interface makes it easy for publishers to join the service and thus reach potential readers directly.
       This paper presents the architecture of the Hermes service and discusses the issues of heterogeneity of information sources. Furthermore, we discuss the benefits and disadvantages of message-oriented middleware for implementing such a service for digital libraries.
    Keywords: collaborative filtering, electronic publishing, recommender system
    An Algorithm for Automated Rating of Reviewers BIBAPDF 381-387
      Tracy Riggs; Robert Wilensky
    The current system for scholarly information dissemination may be amenable to significant improvement. In particular, going from the current system of journal publication to one of self-distributed documents offers significant cost and timeliness advantages. A major concern with such alternatives is how to provide the value currently afforded by the peer review system.
       Here we propose a mechanism that could plausibly supply such value. In the peer review system, papers are judged meritorious if good reviewers give them good reviews. In its place, we propose a collaborative filtering algorithm which automatically rates reviewers, and incorporates the quality of the reviewer into the metric of merit for the paper. Such a system seems to provide all the benefits of the current peer review system, while at the same time being much more flexible.
       We have implemented a number of parameterized variations of this algorithm, and tested them on data available from a quite different application. Our initial experiments suggest that the algorithm is in fact ranking reviewers reasonably.
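    The general idea of letting reviewer quality feed into a paper's merit can be sketched as a simple fixed-point iteration. This is not the algorithm from the Riggs and Wilensky paper, whose details the abstract does not give: the function name `rate`, the weight formula (one minus mean absolute error), and the sample reviews are all assumptions chosen only to make the idea concrete.

```python
from collections import defaultdict

def rate(reviews, rounds=20):
    """reviews: list of (reviewer, paper, score), score in [0, 1].

    Returns (merit, weight) dicts. Hypothetical scheme for illustration.
    """
    weight = {r: 1.0 for r, _, _ in reviews}
    merit = {}
    for _ in range(rounds):
        # Paper merit: reviewer-weighted mean of the scores it received.
        num, den = defaultdict(float), defaultdict(float)
        for r, p, s in reviews:
            num[p] += weight[r] * s
            den[p] += weight[r]
        merit = {p: num[p] / den[p] for p in num}
        # Reviewer weight: shrinks as their scores stray from current merit.
        err, cnt = defaultdict(float), defaultdict(int)
        for r, p, s in reviews:
            err[r] += abs(s - merit[p])
            cnt[r] += 1
        weight = {r: max(1e-6, 1.0 - err[r] / cnt[r]) for r in err}
    return merit, weight

# Made-up data: "cat" consistently disagrees with the other two reviewers.
reviews = [
    ("ann", "p1", 0.9), ("bob", "p1", 0.8), ("cat", "p1", 0.1),
    ("ann", "p2", 0.2), ("bob", "p2", 0.3), ("cat", "p2", 0.9),
]
merit, weight = rate(reviews)
```

    In this toy scheme a reviewer who agrees with the weighted consensus gains influence, so the outlier's scores count for less in each paper's merit; a real system would also need to guard against collusion and sparse data.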
    HeinOnline: An Online Archive of Law Journals BIBAKPDF 388-394
      Richard J. Marisa
    HeinOnline is a new online archive of law journals. Development of HeinOnline began in late 1997 through the cooperation of Cornell Information Technologies, William S. Hein & Co., Inc. of Buffalo, NY, and the Cornell Law Library. Built upon the familiar Dienst and new Open Archive Initiative protocols, HeinOnline extends the reliable and well-established management practices of open access archives like NCSTRL and CoRR to a subscription-based collection. The decisions made in creating HeinOnline, Dienst architectural extensions, and issues which have arisen during operation of HeinOnline are described.
    Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): Collection; Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): Systems issues; Design, Experimentation, Management; dienst, digital library, document structure, law journals, metadata, system design

    Panel

    Digital Libraries Supporting Digital Government BIBAPDF 395-397
      Gary Marchionini; Anne Craig; Larry Brandt; Judith Klavans; Hsinchun Chen
    The needs of society have long been addressed through government research support for new technologies, the Internet being one example. Today, under the rubric of digital government, federal agencies as well as state and local units of governments at all levels have begun to leverage the fruits of these research investments to better serve the needs of their constituencies. Government agencies apply these technologies in a variety of settings including emergency response, health and safety regulation, financial management, data gathering, and hosts of information dissemination needs. In addition, governments are investigating ways to use technology to encourage citizen participation. There is a growing digital government community of practice that strongly parallels the evolving digital library community. These parallel developments are not surprising because libraries and governments share service missions for their overlapping constituencies.

    Designing Digital Libraries for Education: Technology, Services and User Studies

    Designing a Digital Library for Young Children BIBAKPDF 398-405
      Allison Druin; Benjamin B. Bederson; Juan Pablo Hourcade; Lisa Sherman; Glenda Revelle; Michele Platner; Stacy Weng
    As more information resources become accessible using computers, our digital interfaces to those resources need to be appropriate for all people. However, when it comes to digital libraries, the interfaces have typically been designed for older children or adults. Therefore, we have begun to develop a digital library interface developmentally appropriate for young children (ages 5-10 years old). Our prototype system, which we now call SearchKids, offers a graphical interface for querying, browsing and reviewing search results. This paper describes our motivation for the research, the design partnership we established between children and adults, our design process, the technology outcomes of our current work, and the lessons we have learned.
    Keywords: Information Systems -Information Interfaces and Presentation - User Interfaces (H.5.2): Graphical user interfaces (GUI); Information Systems -Information Interfaces and Presentation - User Interfaces (H.5.2): Interaction styles; Information Systems -Information Interfaces and Presentation - User Interfaces (H.5.2): Screen design; Information Systems -Information Interfaces and Presentation - User Interfaces (H.5.2): User-centered design; Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): User issues; Information Systems -Information Storage and Retrieval - Information Search and Retrieval (H.3.3): Query formulation; Software -Software Engineering - Requirements/Specifications (D.2.1): Elicitation methods (e.g., rapid prototyping, interviews, JAD); Design, Human Factors; children, cooperative inquiry, digital libraries, education applications, information retrieval design techniques, intergenerational design team, participatory design, zoomable user interfaces
    Dynamic Digital Libraries for Children BIBAKPDF 406-415
      Yin Leng Theng; Norliza Mohd-Nasir; George Buchanan; Bob Fields; Harold Thimbleby; Noel Cassidy
    The majority of current digital libraries (DLs) are not designed for children. For DLs to be popular with children, they need to be fun, easy-to-use and empower them, whether as readers or authors. This paper describes a new children's DL emphasizing its design and evaluation, working with the children (11-14 year olds) as design partners and testers. A truly participatory process was used, and observational study was used as a means of refinement to the initial design of the DL prototype. In contrast with current DLs, the children's DL provides both a static as well as a dynamic environment to encourage active engagement of children in using it. Design, implementation and security issues are also raised.
    Keywords: collaborative writing, design partners and testers, design process, ethnography, observational study, participatory design
    Looking at Digital Library Usability from a Reuse Perspective BIBAKPDF 416-425
      Tamara Sumner; Melissa Dawe
    The need for information systems to support the dissemination and reuse of educational resources has sparked a number of large-scale digital library efforts. This article describes usability findings from one such project - the Digital Library for Earth System Education (DLESE) - focusing on its role in the process of educational resource reuse. Drawing upon a reuse model developed in the domain of software engineering, the reuse cycle is broken down into five stages: formulation of a reuse intention, location, comprehension, modification, and sharing. Using this model to analyze user studies in the DLESE project, several implications for library system design and library outreach activities are highlighted. One finding is that resource reuse occurs at different stages in the educational design process, and each stage imposes different and possibly conflicting requirements on digital library design. Another finding is that reuse is a distributed process across several artifacts, both within and outside of the library itself. In order for reuse to be successful, a usability line cannot be drawn at the library boundary, but instead must encompass both the library system and the educational resources themselves.
    Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): Systems issues; Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): User issues; Computer Applications - Physical Sciences and Engineering (J.2); Computer Applications - Physical Sciences and Engineering (J.2): Earth and atmospheric sciences; Design, Human Factors; comprehension, digital libraries, educational resources, learning impact, location, modification, reuse, sharing

    Applications of Digital Libraries in the Humanities

    Building a Hypertextual Digital Library in the Humanities: A Case Study on London BIBAKPDF 426-434
      Gregory Crane; David A. Smith; Clifford E. Wulfman
    This paper describes the creation of a new humanities digital library collection: 11,000,000 words and 10,000 images representing books, images and maps on pre-twentieth century London and its environs. The London collection contained far more dense and precise information than the materials from the Greco-Roman world on which we had previously concentrated. The London collection thus allowed us to explore new problems of data structure, manipulation, and visualization. This paper contrasts our model for how humanities digital libraries are best used with the assumptions that underlie many academic digital libraries on the one hand and more literary hypertexts on the other. Since encoding guidelines such as those from the TEI provide collection designers with far more options than any one project can realize, this paper describes what structures we used to organize the collection and why. We particularly emphasize the importance of mining historical authority lists (encyclopedias, gazetteers, etc.) and then generating automatic span-to-span links within the collection.
    Keywords: automatic linking, browsing, collection development, document design, reading
    Document Quality Indicators and Corpus Editions BIBAKPDF 435-436
      Jeffrey A. Rydberg-Cox; Anne Mahoney; Gregory R. Crane
    Corpus editions can only be useful to scholars when users know what to expect of the texts. We argue for text quality indicators, both general and domain-specific.
    Keywords: Design, Documentation, Languages, Standardization, Theory; automatic linking, browsing, collection development, document design, reading
    The Digital Atheneum: New Approaches for Preserving, Restoring and Analyzing Damaged Manuscripts BIBAKPDF 437-443
      Michael S. Brown; W. Brent Seales
    This paper presents research focused on developing new techniques and algorithms for the digital acquisition, restoration, and study of damaged manuscripts. We present results from an acquisition effort in partnership with the British Library, funded through the NSF DLI-2 program, designed to capture 3-D models of old and damaged manuscripts. We show how these 3-D facsimiles can be analyzed and manipulated in ways that are tedious or even impossible if confined to the physical manuscript. In particular, we present results from a restoration framework we have developed for "flattening" the 3-D representation of badly warped manuscripts. We expect these research directions to give scholars more sophisticated methods to preserve, restore, and better understand the physical objects they study.
    Keywords: digital libraries, digital preservation, document analysis, humanities computing, restoration
    Towards an Electronic Variorum Edition of Don Quixote BIBAKPDF 444-445
      Richard Furuta; Shueh-Cheng Hu; Siddarth Kalasapur; Rajiv Kochumman; Eduardo Urbina; Ricardo Vivancos
    known Don Quixote. This paper gives an overview of the computer-based tools that we are using in this endeavor, and summarizes the current status of the project. The Electronic Variorum Edition will join the other content elements maintained by the project, which focuses on electronic resources in support of the study of Cervantes, his works, and his times.
    Keywords: cervantes digital library, cervantes project, hispanic culture, humanities digital libraries

    Panel

    Digital Music Libraries -- Research and Development BIBAPDF 446-448
      David Bainbridge; Gerry Bernbom; Mary Wallace; Andrew P. Dillon; Matthew Dovey; Jon W. Dunn; Michael Fingerhut; Ichiro Fujinaga; Eric J. Isaacson
    Digital music libraries provide enhanced access and functionality that facilitates scholarly research and education. This panel will present a report on the progress of several major research and development projects in digital music libraries.

    Demonstrations

    Content Management for Digital Museum Exhibitions BIBAKPDF 450
      Jen-Shin Hong; Bai-Hsuen Chen; Jieh Hsiang; Tien-Yu Hsu
    An online exhibition of a digital museum often consists of a variety of multimedia objects such as webpages, animation, and video clips. Ideally, there should be different exhibitions on the same topic for users with different needs. The difficulty is that it is time-consuming to produce illustrative and intriguing online exhibitions. In this paper, we present a content management system for producing exhibitions. This framework is a novel approach for organizing digital collections and for quickly selecting, integrating, and composing objects from the collection to produce exhibitions of different presentation styles, one for each user group. A prototype based on our framework has been implemented and successfully used in the production of the Lanyu Digital Museum. Using our method, the Lanyu Digital Museum online exhibition has several features: (1) It provides an easy way to compose artifacts extracted from the digital collection into exhibitions. (2) It provides an easy way to create different presentations of the same exhibition content that cater to users with different needs. (3) It provides easy-to-use film-editing capability to re-arrange an exhibition and to produce new exhibitions from existing ones.
    Keywords: XML, content management, digital museum, multipresentation
    Demonstration of Hierarchical Document Clustering of Digital Library Retrieval Results BIBAKPDF 451
      C. R. Palmer; J. Pesenti; R. E. Valdes-Perez; M. G. Christel; A. G. Hauptmann; D. Ng; H. D. Wactlar
    As digital libraries grow in size, querying their contents will become as frustrating as querying the web is now. One remedy is to hierarchically cluster the results that are returned by searching a digital library. We demonstrate the clustering of search results from Carnegie Mellon's Informedia database, a large video library that supports indexing and retrieval with automatically generated descriptors.
    Keywords: Information Systems -Information Storage and Retrieval - Information Search and Retrieval (H.3.3): Clustering; Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): User issues; hierarchical document clustering
    Indiana University Digital Music Library Project BIBAKPDF 452
      Jon W. Dunn; Eric J. Isaacson
    The Indiana University Digital Music Library project plans to create a digital library testbed system containing music in a variety of formats, designed to support research and education in the field of music and to serve as a platform for digital library research. Prototypes of user interfaces to the system will be demonstrated.
    Keywords: Computer Applications - Arts and Humanities (J.5): Performing arts (e.g., dance, music); Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): Collection; Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): Dissemination; Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): Systems issues; Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): User issues; Design; music digital libraries, music instruction
    Interactive Visualization of Video Metadata BIBAKPDF 453
      Mark Derthick
    Much current research on digital libraries focuses on named entity extraction and transformation into structured information. Examples include entities like events, people, and places, and attributes like birth date or latitude. This video demonstration illustrates the potential for finding relationships among entities extracted from 50,000 news segments from CMU's Informedia Digital Video Library. A visual query language is used to specify relationships among entities. Data populate the query structure, which becomes an interface for exploration that gives continuous feedback in the form of visualizations of summary statistics. The target user is a data analyst familiar with the domain from which the entities come, but not a computer scientist.
    Keywords: Information Systems -Information Interfaces and Presentation - User Interfaces (H.5.2): Graphical user interfaces (GUI); Information Systems -Information Interfaces and Presentation - User Interfaces (H.5.2): Interaction styles; Algorithms, Human Factors; information visualization
    PERSIVAL Demo: Categorizing Hidden-Web Resources BIBPDF 454
      Panagiotis G. Ipeirotis; Luis Gravano; Mehran Sahami
    PERSIVAL: Personalized Summarization Over Multimedia Health-Care Information BIBAPDF 455
      Noemie Elhadad; Min-Yen Kan; Simon Lok; Smaranda Muresan
    In this demonstration, we present several integrated components of PERSIVAL (PErsonalized Retrieval and Summarization of Image, Video And Language) [1], a system designed to provide personalized access to a distributed digital library of medical literature and consumer health information. The global system architecture of PERSIVAL is best described as a two-stage processing pipeline. The first stage is a retrieval system that matches user queries with relevant multimedia data in the library. The second stage is a visualization system that processes the multimedia data matched by the first stage for display.
       Our demonstration focuses on the second stage of PERSIVAL's processing pipeline. Given a set of relevant documents for certain predefined queries, our integrated demonstration seeks to give a tailored response for either physicians or patients, featuring textual summaries, as well as relevant medical definitions. To visualize the summaries and definitions, we employ automated constraint-based layout of the user interface that allows for rich interaction between summaries and definitions.
       PERSIVAL's natural language processing and user interface modules make up the visualization portion of the system and illustrate state-of-the-art digital library technology. Following are the modules presented in our demonstration.
    View Segmentation and Static/Dynamic Summary Generation for Echocardiogram Videos BIBAPDF 456
      Shahram Ebadollahi; Shih-Fu Chang
    The demonstration described here is a part of the PERSIVAL system [1]. In PERSIVAL the user of the echocardiogram video archives is able to access, browse, search and interact with the echocardiogram videos efficiently and effectively. Video data is also integrated with other modalities of information and presented to the right users in the right context.
    Stanford Encyclopedia of Philosophy: A Dynamic Reference Work BIBAPDF 457
      Edward N. Zalta; Colin Allen; Uri Nodelman
    The primary goal of the Stanford Encyclopedia of Philosophy project (http://plato.stanford.edu/) is to produce an authoritative and comprehensive reference work devoted to the academic discipline of philosophy that will be kept up to date dynamically so as to remain useful to those in academia and the general public. To accomplish this goal we have designed and implemented web-based software by which academic philosophers can collaboratively write and maintain such a 'dynamic reference work'. Our implementation has features that are not found in any other online reference work in any discipline, and that enable the profession of philosophy to maintain such a reference work without the cost or level of staff support required for traditional reference work publishing.
    A System for Adding Content-Based Searching to a Traditional Music Library Catalogue Server BIBAPDF 458
      Matthew J. Dovey
    Most online music library catalogues can only be searched by textual metadata. Whilst highly effective - since the rules for maintaining consistency have been refined over many years - this does not allow searching by musical content. Many music librarians are familiar with users humming their enquiries. Most systems providing a query by humming interface tend to run independently of music library catalogue systems and not offer similar textual metadata searching. This demonstration shows how we can integrate these two types of system based on work conducted as part of the NSF/JISC funded OMRAS project (http://www.omras.org).
    Using the Repository Explorer to Achieve OAI Protocol Compliance BIBAPDF 459
      Hussein Suleman
    The Open Archives Initiative (OAI) is dedicated to solving problems of digital library interoperability by defining simple protocols, most recently the Open Archives Initiative Protocol for Metadata Harvesting [2], which was unveiled in January 2001. To support the adoption of this new interoperability technology, we have developed the Repository Explorer [1], a web-based tool to enforce compliance to the same interpretation of the protocol by the various different server implementations. This demonstration will show how the Repository Explorer can be used to perform either user-driven browsing or automatic testing of an implementation of the protocol.

    Posters

    An Atmospheric Visualization Collection for the NSDL BIBAKPDF 463
      Christopher Klaus; Keith Andrew
    In this poster, we describe visualization and educational efforts underway to build an Atmospheric Visualization Collection for the NSDL.
    Keywords: Computing Methodologies -Image Processing And Computer Vision - General (I.4.0): Image displays; Computer Applications - Physical Sciences and Engineering (J.2): Earth and atmospheric sciences; Computing Methodologies -Computer Graphics - Picture/Image Generation (I.3.3); Algorithms, Experimentation, Human Factors, Measurement; atmospheric science, digital library, visualization education
    Breaking the Metadata Generation Bottleneck: Preliminary Findings BIBAPDF 464
      Elizabeth D. Liddy; Stuart Sutton; Woojin Paik; Eileen Allen; Sarah Harwell; Michelle Monsour; Anne Turner; Jennifer Liddy
    The goal of our 18-month NSDL-funded project is to develop Natural Language Processing and Machine Learning technology which will accomplish automatic metadata generation for individual educational resources in digital collections. The metadata tags that the system will be learning to automatically assign are the full complement of Gateway to Educational Materials (GEM) metadata tags -- from the nationally recognized consortium of organizations concerned with access to educational resources. The documents that comprise the sample for this research come from the Eisenhower National Clearinghouse on Science and Mathematics.
    Building the Physical Sciences Information Infrastructure, A Phased Approach BIBAKPDF 465
      Judy C. Gilmore; Valerie S. Allen
    In 2000, a vision of a Physical Sciences Information Infrastructure - an integrated network for the physical sciences - was captured and endorsed. Work continues in 2001 as partnerships are formed and strategies are formulated to move the vision forward.
    Keywords: federal agencies, physical sciences
    Development of an Earth Environmental Digital Library System for Soil and Land-Atmospheric Data BIBAKPDF 466
      Eiji Ikoma; Taikan Oki; Masaru Kitsuregawa
    We propose and examine new methods for an automatic data loading system and a flexible user interface system with features such as 3D visualization. We have implemented the earth environmental digital library and operate it on the Web. Although the system targets a limited audience, such as earth environmental researchers, it receives more than 8000 hits per month, which indicates its practical usefulness.
    Keywords: Experimentation; VRML, digital library, user interface
    Digital Facsimile Editions and On-Line Editing BIBAPDF 467
      Harry Plantinga
    Digitizing a large collection of books is an expensive and time-consuming task -- but there may be volunteers all over the world who are willing to do a small portion of the task. This poster describes a system for making digital facsimile editions -- e-books consisting of page images and OCRed but uncorrected text. The user can choose to view low or high resolution page images or text for each page or search the text. Authenticated users with little or no training can correct the text on-line, and the corrections are incorporated in the document. Source code is available for the described implementation, which is a part of the Christian Classics Ethereal Library (http://www.ccel.org).
    DSpace at MIT: Meeting the Challenges BIBAKPDF 468
      Michael J. Bass; Margret Branschofsky
    DSpace is a joint development effort by HP and MIT to establish an electronic system that will enable MIT faculty and researchers to capture, preserve, manage, and disseminate their intellectual output, and that will enable the Institute to maintain its intellectual heritage. The effort further aims to facilitate sharing of intellectual content and metadata among institutions by minimizing barriers to adoption and federation. This brief paper describes the motivation behind the project, its goals, objectives, progress, and references to detailed definition & design materials.
    Keywords: Information Systems -Information Storage and Retrieval - Library Automation (H.3.6); Information Systems -Information Storage and Retrieval - Online Information Services (H.3.5); Data - Data Structures (E.1); Design, Economics, Experimentation, Legal Aspects, Management; application service platform, architecture, archive, digital libraries, digital media, federation, metadata, repository
    Exploiting Image Semantics for Picture Libraries BIBAKPDF 469
      Kobus Barnard; David Forsyth
    We consider the application of a system for learning the semantics of image collections to digital libraries. We discuss our approach to browsing and search, and investigate the integration of both in more detail.
    Keywords: Information Systems - Information Storage and Retrieval (H.3); Algorithms, Human Factors, Performance; digital libraries, hierarchical image clustering
    Feature Extraction for Content-Based Image Retrieval in DARWIN BIBPDF 470
      K. R. Debure; A. S. Russell
    Guided Linking: Efficiently Making Image-to-Transcript Correspondence BIBAKPDF 471
      Cheng Jiun Yuan; W. Brent Seales
    The problem of annotating unstructured images is labor intensive and difficult to automate. Linking is a type of annotation where an image region is tagged by representing a correspondence between the region and other information. Any serious effort at creating a digital edition of a manuscript from nothing but images and their associated information, such as transcripts and editorial remarks, must include the task of creating a large number of links between image regions and the related information. We present an approach to the problem of image linking, which concentrates on the fundamental and labor-intensive task of associating image regions with their textual counterparts. We assume the input to the system is a set of images representing a manuscript, and that associated data, such as a transcript, is available to provide guidance to the automated portion of the system. Our approach targets collections that are damaged and difficult-to-read, such as manuscripts that require intensive editorial annotation. It is essentially impossible to perform fully automated techniques, such as optical character recognition (OCR) or accurate handwriting analysis [2], on these kinds of manuscripts.
    Keywords: application, digital libraries, humanities computing, image analysis, image/text correspondence
    Integrating Digital Libraries by CORBA, XML and Servlet BIBAPDF 472
      Wing Hang Cheung; Michael R. Lyu; Kam Wing Ng
    In this paper, we describe how we use a mediator-based architecture for integrating digital libraries. We discuss how we tackle the obstacles of firewalls in the expansion of our system by using XML and Java Servlet, which are used to achieve CORBA general communications and callback features across the firewalls.
    A National Digital Library for Undergraduate Mathematics and Science Teacher Preparation and Professional Development BIBAKPDF 473
      Kimberly S. Roempler
    The primary goal of the National Digital Library for Undergraduate Mathematics and Science Teacher Preparation and Professional Development, funded through the NSF Division of Undergraduate Education National Science Digital Libraries Initiative, is to increase the use of best teaching practices by faculty by providing the resources - tools, training, and data - needed to build inquiry and discovery into all undergraduate science and mathematics courses. Improving the math and science education of future and in-service K-12 teachers is one of the most important challenges facing college and university faculties.
       The preparation of future teachers is a fundamental element in the improvement of the learning experience of all students, from grades K-16. As teachers know, it is natural to teach as we have been taught ourselves. The standards in mathematics and science call for greater integration of inquiry-based techniques and more rigorous mathematical and science content. Teachers at all levels will be better equipped to meet these standards if they are taught using these approaches during their own education.
    Keywords: inservice teachers, mathematics education, pedagogy, preservice teachers, science education, teacher preparation
    Print to Electronic: Measuring the Operational and Economic Implications of an Electronic Journal Collection BIBAKPDF 474
      Carol Hansen Montgomery; Linda S. Marion
    In this poster, we report methodology and initial results from a study of an academic library's migration to an all-electronic journal collection.
    Keywords: Computing Milieux -Management of Computing and Information Systems - General (K.6.0): Economics; Economics, Measurement, Management; academic library, digital library, electronic journals
    Turbo Recognition: Decoding Page Layout BIBPDF 475
      Taku A. Tokuyasu
    Using Markov Models and Innovation-Diffusion as a Tool for Predicting Digital Library Access and Distribution BIBAKPDF 476
      Bruce R. Barkstrom
    This paper discusses a general approach to predicting data access rates and user access patterns for planning distribution capacities and for monitoring data usage. The approach uses a steady-state Markov model to describe user activities and innovation-diffusion to describe the rate at which a naive population adopts accessing data from a digital library.
    Keywords: EOSDIS, Markov models, innovation-diffusion, user access patterns, user access rates, user modeling
    A Versatile Facsimile and Transcription Service for Manuscripts and Rare Old Books at the Miguel de Cervantes Digital Library BIBAPDF 477
      Alejandro Bia
    The purpose of this poster is to describe our approach to providing facsimiles of manuscripts and old books as one of our DL services publicly available over the Internet.
    The Virtual Naval Hospital: The Digital Library as Knowledge Management Tool for Nomadic Patrons BIBAKPDF 478
      Michael P. D'Alessandro; Richard S. Bakalar; Donna M. D'Alessandro; Denis E. Ashley; Mary J. C. Hendrix
    To meet the information needs of isolated primary care providers and their patients in the United States (U.S.) Navy, a digital health sciences library - Virtual Naval Hospital (http://www.vnh.org) - was created through a unique partnership between academia and government. The creation of the digital library was heavily influenced by the principles of user-centered design, and made allowances for the nomadic nature of the digital library's patrons and the heterogeneous access they have to Internet bandwidth. The result is a digital library that has been in operation since 1997, that continues to expand in size, that is heavily used, and that is highly regarded by its patrons. Over time, the digital library has evolved into a knowledge-management system for the U.S. Navy Bureau of Medicine and Surgery. A number of valuable technical, personal, and political lessons have been learned about delivering digital library and knowledge management services to nomadic patrons. They can be summarized by stating that to succeed in the design and implementation of a digital library that serves as a knowledge management tool, regardless of the field of endeavor, one must focus initially and then consistently on the population served and what their mission is, and tailor the digital library to their needs. If this is done, the result will be a tool that is heavily used and sincerely appreciated. These lessons learned will become increasingly valuable as society moves towards a ubiquitous computing environment.
    Keywords: Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): Collection; Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): Dissemination; Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): Systems issues; Information Systems -Information Storage and Retrieval - Digital Libraries (H.3.7): User issues; Design, Human Factors, Measurement; case study, digital libraries, knowledge management, lessons learned, nomadic computing

    Tutorials

    Tutorial 1: Practical Digital Libraries Overview BIBPDF 479
      Edward A. Fox
    Tutorial 2: Evaluating, Using, and Publishing eBooks BIBPDF 479
      Gene Golovchinsky; Cathy Marshall; Elli Mylonas
    Tutorial 3: Thesauri and Ontologies BIBPDF 479
      Dagobert Soergel
    Tutorial 4: How to Build a Digital Library Using Open-Source Software BIBPDF 480
      Ian H. Witten
    Hands-On Workshop: Build Your Own Digital Library Collections BIBPDF 480
      Ian H. Witten; David Bainbridge
    Tutorial 6: Building Interoperable Digital Libraries: A Practical Guide to Creating Open Archives BIBPDF 480
      Hussein Suleman

    Workshops

    Workshop 1: Visual Interfaces to Digital Libraries -- Its Past, Present, and Future BIBAKPDF 482
      Katy Borner; Chaomei Chen
    The design of easy-to-use and informative visual interfaces to digital libraries is an integral part to the advances of digital libraries. A wide range of approaches have been developed from a diverse spectrum of perspectives that focus on users and tasks to be supported, data to be modeled, and the efficiency of algorithms. Information visualization aims to exploit the human visual information processing system, especially with non-spatial data (such as documents and images typically found in digital libraries). Generally, information visualization examines semantic relationships intrinsic to an abstract information space and how they can be spatially navigated and memorized using similar cognitive processes to those that would apply during interactions with the real world. This workshop promotes the convergence of information visualization and digital libraries. It brings together researchers and practitioners in the areas of information visualization, digital libraries, human-computer interaction, library and information science, and computer science to identify the most important issues in the past and the present, and what should be done in the future.
    Keywords: cognitive psychology, digital libraries, human-computer interaction, information visualization, usability studies
    Workshop 2: The Technology of Browsing Applications BIBAPDF 483
      Nina Wacholder; Craig Nevill-Manning
    Phrase browsing applications provide information seekers with access to text content via structured lists of index terms. These lists provide a preview of the content of a collection. The index terms, which may be identified by a variety of techniques, are phrases that represent important concepts referred to in a document or collection of documents. The browsing system supports interactive navigation and organization of the phrases.
       The goal of this workshop is to bring together researchers interested in any aspect of phrase browsing technology, including, but not limited to, identification of index terms, techniques for hierarchical organization of the terms, implementation of efficient systems, usability of browsing applications, and techniques for evaluating this technology.
    Workshop 3: Classification Crosswalks BIBAKPDF 484
      Paul Thompson; Traugott Koch; John Carter; Heike Neuroth; Ed O'Neill; Dagobert Soergel
    Mapping between/among classification schemes is beneficial within an organization that has a number of implicit schemes, between organizations seeking to exchange information, and in a digital library context where collections are organized by different classifications. This cross scheme mapping could be done manually, but if many schemes are to be mapped, it may be desirable to provide automated tools and techniques to support the process. This workshop will present research and projects that identify the state-of-the-practice and outline the research agenda.
       In addition to the educational part of the program, the afternoon will be devoted to ongoing NKOS activities related to a vocabulary mark-up language, mechanisms for search and retrieval of online knowledge organization sources, and a typology for describing knowledge organization sources that supports the development of knowledge organization services on the Web.
       The program is available from the NKOS Web site at http://nkos.slis.kent.edu.
    Keywords: classification schemes, controlled vocabularies, digital libraries, vocabulary integration tools
    Workshop 4: Digital Libraries in Asian Languages BIB --
      Su-Shing Chen; Ching-chih Chen
    Workshop 5: Information Visualization for Digital Libraries: Defining a Research Agenda for Heterogeneous Multimedia Collections BIBAK --
      Lucy Nowell; Elizabeth Hetzler
    This workshop will emphasize small group discussion and brainstorming to explore issues of visualization for heterogeneous digital libraries. The power of visualization lies in its ability to convey information at the high bandwidth of the human perceptual system, facilitating recognition of patterns in the information space, and supporting navigation in large collections. How do we extend these benefits to collections that span the range of digital media? Participants will explore this issue, with the aim of identifying a research agenda.
    Keywords: heterogeneous digital libraries, human computer interaction, multimedia, visualization