JCDL'02: Proceedings of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries

Fullname:ACM/IEEE Joint Conference on Digital Libraries
Editors:Gary Marchionini
Location:Portland, Oregon, USA
Dates:2002-Jul-14 to 2002-Jul-18
Standard No:ACM ISBN 1-58113-513-0; ACM Order Number:606022
  1. Building and using cultural digital libraries
  2. Summarization and question answering
  3. Studying users
  4. Classification and browsing
  5. Digital libraries for education
  6. Novel search environments
  7. Video and multimedia digital libraries
  8. OAI application
  9. Searching across language, time, and space
  10. NSDL
  11. Digital library communities and change
  12. Models and tools for generating digital libraries
  13. Novel user interfaces
  14. Federating and harvesting metadata
  15. Music digital libraries
  16. Preserving, securing, and assessing digital libraries
  17. Image and cultural digital libraries
  18. Digital libraries for spatial data
  19. Panels
  20. Demonstrations
  21. Posters
  22. Tutorials
  23. Workshops

Building and using cultural digital libraries

Primarily history: historians and the search for primary source materials 1-10
  Helen R. Tibbo
This paper describes the first phase of an international project that is exploring how historians locate primary resource materials in the digital age, what they are teaching their Ph.D. students about finding research materials, and what archivists are doing to facilitate access to these materials. Preliminary findings are presented from a survey of 300 historians studying American History from leading institutions of higher education in the U.S. Tentative conclusions indicate the need to provide multiple pathways of access to historical research materials, including paper-based approaches and newer digital ones. The need for user education, especially in regard to electronic search methodologies, is indicated.
Using the Gamera framework for the recognition of cultural heritage materials 11-17
  Michael Droettboom; Ichiro Fujinaga; Karl MacMillan; G. Sayeed Choudhury; Tim DiLauro; Mark Patton; Teal Anderson
This paper presents a new toolkit for the creation of customized structured document recognition applications by domain experts. This open-source system, called Gamera, allows a user, with particular knowledge of the documents to be recognized, to combine image processing and recognition tools in an easy-to-use, interactive, graphical scripting environment. Gamera is one of the key technology components in a proposed international project for the digitization of diverse types of humanities documents.
Supporting access to large digital oral history archives 18-27
  Samuel Gustman; Dagobert Soergel; Douglas Oard; William Byrne; Michael Picheny; Bhuvana Ramabhadran; Douglas Greenberg
This paper describes our experience with the creation, indexing, and provision of access to a very large archive of videotaped oral histories -- 116,000 hours of digitized interviews in 32 languages from 52,000 survivors, liberators, rescuers, and witnesses of the Nazi Holocaust. It goes on to identify a set of critical research issues that must be addressed if we are to provide full and detailed access to collections of this size: issues in user requirement studies, automatic speech recognition, automatic classification, segmentation, summarization, retrieval, and user interfaces. The paper ends by inviting others to discuss use of these materials in their own research.

Summarization and question answering

Using sentence-selection heuristics to rank text segments in TXTRACTOR 28-35
  Daniel McDonald; Hsinchun Chen
TXTRACTOR is a tool that uses established sentence-selection heuristics to rank text segments, producing summaries that contain a user-defined number of sentences. The purpose of identifying text segments is to maximize topic diversity, which is an adaptation of the Maximal Marginal Relevance criterion used by Carbonell and Goldstein [5]. Sentence-selection heuristics are then used to rank the segments. We hypothesize that ranking text segments via traditional sentence-selection heuristics produces a balanced summary with more useful information than one produced by using segmentation alone. The proposed summary is created in a three-step process, which includes 1) sentence evaluation, 2) segment identification, and 3) segment ranking. As the required length of the summary changes, low-ranking segments can then be dropped from (or higher-ranking segments added to) the summary. We compare the output of TXTRACTOR to the output of a segmentation tool based on the TextTiling algorithm to validate the approach.
Using librarian techniques in automatic text summarization for information retrieval 36-45
  Min-Yen Kan; Judith L. Klavans
A current application of automatic text summarization is to provide an overview of relevant documents coming from an information retrieval (IR) system. This paper examines how Centrifuser, one such summarization system, was designed with respect to methods used in the library community. We have reviewed these librarian expert techniques to assist information seekers and codified them into eight distinct strategies. We detail how we have operationalized six of these strategies in Centrifuser by computing an informative extract, indicative differences between documents, as well as navigational links to narrow or broaden a user's query. We conclude the paper with results from a preliminary evaluation.
QuASM: a system for question answering using semi-structured data 46-55
  David Pinto; Michael Branstein; Ryan Coleman; W. Bruce Croft; Matthew King; Wei Li; Xing Wei
This paper describes a system for question answering using semi-structured metadata, QuASM (pronounced "chasm"). Question answering systems aim to improve search performance by providing users with specific answers, rather than having users scan retrieved documents for these answers. Our goal is to answer factual questions by exploiting the structure inherent in documents found on the World Wide Web (WWW). Based on this structure, documents are indexed into smaller units and associated with metadata. Transforming table cells into smaller units associated with metadata is an important part of this task. In addition, we report on work to improve question classification using language models. The domain used to develop this system is documents retrieved from a crawl of www.fedstats.gov.

Studying users

Reading-in-the-small: a study of reading on small form factor devices 56-64
  Catherine C. Marshall; Christine Ruotolo
The growing ubiquity of small form factor devices such as Palm Pilots and Pocket PCs, coupled with widespread availability of digital library materials and users' increasing willingness to read on the screen, raises the question of whether people can and will read digital library materials on handhelds. We investigated this question by performing a field study based on a university library's technology deployment: two classes were conducted using materials that were available in e-book format on Pocket PCs in addition to other electronic and paper formats. The handheld devices, the course materials, and technical support were all provided to students in the courses to use as they saw fit. We found that the handhelds were a good platform for reading secondary materials, excerpts, and shorter readings; they were used in a variety of circumstances where portability is important, including collaborative situations such as the classroom. We also discuss the effectiveness of annotation, search, and navigation functionality on the small form factor devices. We conclude by defining a set of focal areas and issues for digital library efforts designed for access by handheld computers.
A graph-based recommender system for digital library 65-73
  Zan Huang; Wingyan Chung; Thian-Huat Ong; Hsinchun Chen
Research shows that recommendations comprise a valuable service for users of a digital library [11]. While most existing recommender systems rely either on a content-based approach or a collaborative approach to make recommendations, there is potential to improve recommendation quality by using a combination of both approaches (a hybrid approach). In this paper, we report how we tested the idea of using a graph-based recommender system that naturally combines the content-based and collaborative approaches. Due to the similarity between our problem and a concept retrieval task, a Hopfield net algorithm was used to exploit high-degree book-book, user-user and book-user associations. Sample hold-out testing and preliminary subject testing were conducted to evaluate the system, by which it was found that the system gained improvement with respect to both precision and recall by combining content-based and collaborative approaches. However, no significant improvement was observed by exploiting high-degree associations.
The effects of topic familiarity on information search behavior 74-75
  Diane Kelly; Colleen Cool
We describe results from a preliminary investigation of the relationship between topic familiarity and information search behavior. Two types of information search behaviors are considered: reading time and efficacy. Our results indicate that as one's familiarity with a topic increases, one's searching efficacy increases and one's reading time decreases. These results suggest that it may be possible to infer topic familiarity from information search behavior.

Classification and browsing

A language modelling approach to relevance profiling for document browsing 76-83
  David J. Harper; Sara Coulthard; Sun Yixing
This paper describes a novel tool, SmartSkim, for content-based browsing or skimming of documents. The tool integrates concepts from passage retrieval and from interfaces, such as TileBars, which provide a compact overview of query term hits within a document. We base our tool on the concept of relevance profiling, in which a plot of retrieval status values at each word position of a document is generated. A major contribution of this paper is applying language modelling to the task of relevance profiling. We describe in detail the design of the SmartSkim tool, and provide a critique of the design. Possible applications of the tool are described, and we consider how an operational version of SmartSkim might be designed.
Compound descriptors in context: a matching function for classifications and thesauri 84-93
  Douglas Tudhope; Ceri Binding; Dorothee Blocks; Daniel Cunliffe
There are many advantages for Digital Libraries in indexing with classifications or thesauri, but some current disincentive in the lack of flexible retrieval tools that deal with compound descriptors. This paper discusses a matching function for compound descriptors, or multi-concept subject headings, that does not rely on exact matching but incorporates term expansion via thesaurus semantic relationships to produce ranked results that take account of missing and partially matching terms. The matching function is based on a measure of semantic closeness between terms, which has the potential to help with recall problems. The work reported is part of the ongoing FACET project in collaboration with the National Museum of Science and Industry and its collections database. The architecture of the prototype system and its interface are outlined. The matching problem for compound descriptors is reviewed and the FACET implementation described. Results are discussed from scenarios using the faceted Getty Art and Architecture Thesaurus. We argue that automatic traversal of thesaurus relationships can augment the user's browsing possibilities. The techniques can be applied both to unstructured multi-concept subject headings and potentially to more syntactically structured strings. The notion of a focus term is used by the matching function to model AAT modified descriptors (noun phrases). The relevance of the approach to precoordinated indexing and matching faceted strings is discussed.
Structuring keyword-based queries for web databases 94-95
  Rodrigo C. Vieira; Pavel Calado; Altigran S. da Silva; Alberto H. F. Laender; Berthier A. Ribeiro-Neto
This paper describes a framework, based on Bayesian belief networks, for querying Web databases using keywords only. According to this framework, the user inputs a query through a simple search-box. From the input query, one or more plausible structured queries are derived and submitted to Web databases. The results are then retrieved and presented to the user as ranked answers. To evaluate our framework, an experiment using 38 example queries was carried out. We found out that 97% of the time, one of the top three resulting structured queries is the proper one. Further, when the user selects one of these three top queries for processing, the ranked answers present average precision figures of 92%.
An approach to automatic classification of text for information retrieval 96-97
  Hong Cui; P. Bryan Heidorn; Hong Zhang
In this paper, we explore an approach to make better use of semi-structured documents in information retrieval in the domain of biology. Using machine learning techniques, we make the inherent structures of these documents explicit with XML markup. This markup has great potential to improve task performance in specimen identification and the usability of online flora and fauna.

Digital libraries for education

Middle school children's use of the ARTEMIS digital library 98-105
  June Abbas; Cathleen Norris; Elliot Soloway
A case study of middle school students' interaction within a digital library, the differential use of interface features by students, and the issues of representation and retrieval obstacles are examined. A mechanism for evaluating users' search terms and questions is explained. Findings of a current case study indicate that students' interaction with the system varied between individual classes and between different achievement levels. Terms used by the system to represent the resources do not adequately represent the user groups' information needs.
Partnership reviewing: a cooperative approach for peer review of complex educational resources 106-114
  John Weatherley; Tamara Sumner; Michael Khoo; Michael Wright; Marcel Hoffmann
Review of digital educational resources, such as course modules, simulations, and data analysis tools, can differ from review of scholarly articles, in the heterogeneity and complexity of the resources themselves. The Partnership Review Model, as demonstrated in two cases, appears to promote cooperative interactions between distributed resource reviewers, enabling reviewers to effectively divide up the task of reviewing complex resources with little explicit coordination. The shared structural outline of the resource made visible in the review environment enables participants to monitor other reviewers' actions and to thus target their efforts accordingly. This reviewing approach may be effective in educational digital libraries that depend on community volunteers for most of their reviewing.
A digital library for geography examination resources 115-116
  Lian-Heong Chua; Dion Hoe-Lian Goh; Ee-Peng Lim; Zehua Liu; Rebecca Pei-Hui Ang
We describe a Web-based application developed above a digital library of geographical resources for Singapore students preparing to take a national examination in geography. The application provides an interactive, non-sequential approach to learning that supplements textbooks.
Digital library services for authors of learning materials 117-118
  Flora McMartin; Youki Terada
Digital libraries, particularly those designed to meet the needs of educators and students, focus their primary services on the needs of their end users [1]. In this paper, we introduce and discuss the types of services that authors of the materials cataloged within this type of digital library expect or may find useful. Results from a study of authors cataloged in NEEDS -- a national engineering education digital library -- guide this discussion.

Novel search environments

Integration of simultaneous searching and reference linking across bibliographic resources on the web 119-125
  William H. Mischo; Thomas G. Habing; Timothy W. Cole
Libraries and information providers are actively developing customized portals and gateway software designed to integrate secondary information resources such as A & I services, online catalogs, and publishers' full-text repositories. This paper reports on a project carried out at the Grainger Engineering Library at the University of Illinois at Urbana-Champaign to provide web-based asynchronous simultaneous searching of multiple secondary information resources and integrated reference linking between bibliographic resources.
   The project has tested two different approaches to simultaneous broadcast searching. One approach utilizes custom distributed searchbots and shared blackboard databases. The other approach uses event-driven asynchronous HTTP queries within a single web script.
   The reference linking implementation is built around the application of OpenURL and Digital Object Identifier (DOI) technologies and the CrossRef metadata database within a proxy server environment.
Exploring discussion lists: steps and directions 126-134
  Paula S. Newman
This paper describes some new facilities for exploring archived email-based discussion lists. The facilities exploit some specific properties of email messages to obtain improved archive overviews, and then use new tree visualizations, developed for the purpose, to obtain thread overviews and mechanisms to aid in the coherent reading of threads. We consider these approaches to be limited, but useful, approximations to more ideal facilities; a final section suggests directions for further work in this area.
Comparison of two approaches to building a vertical search tool: a case study in the nanotechnology domain 135-144
  Michael Chau; Hsinchun Chen; Jialun Qin; Yilu Zhou; Yi Qin; Wai-Ki Sung; Daniel McDonald
As the Web has been growing exponentially, it has become increasingly difficult to search for desired information. In recent years, many domain-specific (vertical) search tools have been developed to serve the information needs of specific fields. This paper describes two approaches to building a domain-specific search tool. We report our experience in building two different tools in the nanotechnology domain -- (1) a server-side search engine, and (2) a client-side search agent. The designs of the two search systems are presented and discussed, and their strengths and weaknesses are compared. Some future research directions are also discussed.

Video and multimedia digital libraries

A multilingual, multimodal digital video library system 145-153
  Michael R. Lyu; Edward Yau; Sam Sze
This paper presents the iVIEW system, a multi-lingual, multi-modal digital video content management system for intelligent searching and access of English and Chinese video content. iVIEW allows full content indexing, searching and retrieval of multi-lingual text, audio and video material. It consists of image processing techniques for scene and scene-change analysis, speech processing techniques for audio signal transcription, and multi-lingual natural language processing techniques for word relevance determination. iVIEW can host multi-lingual content and allows multi-modal search. It helps content developers perform multi-modal information processing of rich video media and construct XML-based multimedia representations that enhance multi-modal indexing and searching capabilities, so that end users can enjoy flexible and seamless delivery of multimedia content in various browsing tools and devices.
A digital library data model for music 154-155
  Natalia Minibayeva; Jon W. Dunn
In this paper, we introduce a data and metadata model being developed for use in a music digital library system to support search and navigation of music content in multiple formats.
Video-cuebik: adapting image search to video shots 156-157
  Alexander G. Hauptmann; Norman D. Papernick
We propose a new analysis for searching images in video libraries that goes beyond simple image search, which compares one still image frame to another. The key idea is to expand the definition of an image to account for the variability in the sequence of video frames that comprise a shot. A first implementation of this method for a QBIC-like image search engine shows a clear improvement over still image search. A combination of the traditional still image search and the new video image search provided the overall best results on the TREC video retrieval evaluation data.
Virtual multimedia libraries built from the web 158-159
  Neil C. Rowe
We have developed a tool, MARIE-4, for building virtual libraries of multimedia (images, video, and audio) by automatically exploring (crawling) a specified subdomain of the World Wide Web to create an index based on caption keywords. Our approach uses carefully researched criteria to identify and rate caption text, and employs both an expert system and a neural network. We have used it to create a keyword-based interface to nearly all nontrivial captioned publicly accessible U.S. Navy images (667,573), video (8,290), and audio (2,499), called the Navy Virtual Multimedia Library (NAVMULIB).
Multi-modal information retrieval from broadcast video using OCR and speech recognition 160-161
  Alexander G. Hauptmann; Rong Jin; Tobun Dorbin Ng
We examine multi-modal information retrieval from broadcast video where text can be read on the screen through OCR and speech recognition can be performed on the audio track. OCR and speech recognition are compared on the 2001 TREC Video Retrieval evaluation corpus. Results show that OCR is more important than speech recognition for video retrieval. OCR retrieval can be further improved through dictionary-based post-processing. We demonstrate how to utilize imperfect multi-modal metadata results to benefit multi-modal information retrieval.

OAI application

Extending SDARTS: extracting metadata from web databases and interfacing with the open archives initiative 162-170
  Panagiotis G. Ipeirotis; Tom Barry; Luis Gravano
SDARTS is a protocol and toolkit designed to facilitate metasearching. SDARTS combines two complementary existing protocols, SDLIP and STARTS, to define a uniform interface that collections should support for searching and exporting metasearch-related metadata. SDARTS also includes a toolkit with wrappers that are easily customized to make both local and remote document collections SDARTS-compliant. This paper describes two significant ways in which we have extended the SDARTS toolkit. First, we have added a tool that automatically builds rich content summaries for remote web collections by probing the collections with appropriate queries. These content summaries can then be used by a metasearcher to select the collections over which to evaluate a given query. Second, we have enhanced the SDARTS toolkit so that all SDARTS-compliant collections export their metadata under the emerging Open Archives Initiative (OAI) protocol. Conversely, the SDARTS toolkit now also allows all OAI-compliant collections to be made SDARTS-compliant with minimal effort. As a result, we implemented a bridge between SDARTS and OAI, which will facilitate easy interoperability among a potentially large number of collections. The SDARTS toolkit, with all related documentation and source code, is publicly available at http://sdarts.cs.columbia.edu.
Using the open archives initiative protocols with EAD 171-180
  Christopher J. Prom; Thomas G. Habing
The Open Archives Initiative Protocols present a promising opportunity to make metadata about archives, manuscript collections, and cultural heritage resources easier to locate and search. However, several technical barriers must be overcome before useful OAI records can be produced from the disparate metadata formats used to describe these resources. This paper examines Encoded Archival Description (EAD) as a test case of the issues to be addressed in transforming cultural heritage metadata to OAI. While EAD and OAI may appear to be incompatible, a mapping would be both useful and technically feasible. The authors suggest that it will be necessary to create numerous OAI records from one EAD file. In addition, the findings indicate that further standardization of EAD markup practices would enhance interoperability.
Preservation and transition of NCSTRL using an OAI-based architecture 181-182
  H. Anan; X. Liu; K. Maly; M. Nelson; M. Zubair; J. C. French; E. Fox; P. Shivakumar
NCSTRL (Networked Computer Science Technical Reference Library) is a federation of digital libraries providing computer science materials. The architecture of the original NCSTRL was based largely on the Dienst software. It was implemented and maintained by the digital library group at Cornell University until September 2001. At that time, we had an immediate goal of preserving the existing NCSTRL collection and a long-term goal of providing a framework where participating organizations could continue to disseminate technical publications. Moreover, we wanted the new NCSTRL to be based on OAI (Open Archives Initiative) principles that provide a framework to facilitate the discovery of content in distributed archives. In this paper, we describe our experience in moving towards an OAI-based NCSTRL.
Integrating harvesting into digital library content 183-184
  David A. Smith; Anne Mahoney; Gregory Crane
The Open Archives Initiative has gained success by aiming between complex federation schemes and low functionality web crawling. Much information still remains hidden inside documents catalogued by OAI metadata. We discuss how subdocument information can be exposed by data providers and exploited by service providers. We discuss services for citation reversal and name and term linking with harvested data in the Perseus Project's document management system and a proxy service for automatically adding these links to OAI documents outside Perseus.

Searching across language, time, and space

Harvesting translingual vocabulary mappings for multilingual digital libraries 185-190
  Ray R. Larson; Fredric Gey; Aitao Chen
This paper presents a method of information harvesting and consolidation to support the multilingual information requirements for cross-language information retrieval within digital library systems. We describe a way to create both customized bilingual dictionaries and multilingual query mappings from a source language to many target languages. We describe a multilingual conceptual mapping resource with broad coverage (over 100 written languages can be supported) that is truly multilingual, as opposed to the bilingual pairings usually derived from machine translation. This resource is derived from the 10+ million title online library catalog of the University of California. It is created statistically via maximum likelihood associations from words and phrases in book titles of many languages to human-assigned subject headings in English. The 150,000 subject headings can form interlingua mappings between pairs of languages or from one language to several languages. While our current demonstration prototype maps between ten languages (English, Arabic, Chinese, French, German, Italian, Japanese, Portuguese, Russian, Spanish), extensions to additional languages are straightforward. We also describe how this resource is being expanded for languages where linguistic coverage is limited in our initial database, by automatically harvesting new information from international online library catalogs using the Z39.50 networked library search protocol.
Detecting events with date and place information in unstructured text 191-196
  David A. Smith
Digital libraries of historical documents provide a wealth of information about past events, often in unstructured form. Once dates and place names are identified and disambiguated, using methods that can differ by genre, we examine collocations to detect events. Collocations can be ranked by several measures, which vary in effectiveness according to type of events, but the log-likelihood measure (-2 log λ) offers a reasonable balance between frequently and infrequently mentioned events and between larger and smaller spatial and temporal ranges. Significant date-place collocations can be displayed on timelines and maps as an interface to digital libraries. More detailed displays can highlight key names and phrases associated with a given event.
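As an illustration of the log-likelihood measure mentioned in the abstract above, the following minimal sketch computes -2 log λ for a 2x2 contingency table of date-place co-occurrence counts. It uses the standard likelihood-ratio statistic for independence; the function name, the table layout, and the sample counts are assumptions made for this example and are not taken from the paper.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """Return -2 log lambda for a 2x2 contingency table.

    Hypothetical layout for date-place collocations:
      k11: passages mentioning both the date and the place
      k12: passages mentioning the date but not the place
      k21: passages mentioning the place but not the date
      k22: passages mentioning neither
    """
    observed = ((k11, k12), (k21, k22))
    n = k11 + k12 + k21 + k22
    row_totals = (k11 + k12, k21 + k22)
    col_totals = (k11 + k21, k12 + k22)

    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                total += o * math.log(o / expected)
    return 2.0 * total


# Made-up counts for two candidate collocations; a higher score means
# the date and place co-occur more strongly than independence predicts.
candidates = {
    "pair A": (30, 70, 20, 9880),
    "pair B": (2, 98, 48, 9852),
}
ranked = sorted(candidates, key=lambda name: log_likelihood_ratio(*candidates[name]), reverse=True)
print(ranked)
```

A score near zero means the date and the place occur together about as often as chance would predict, so ranking by this score pushes the strongest associations to the top.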
Using sharable ontology to retrieve historical images 197-198
  Von-Wun Soo; Chen-Yu Lee; Jaw Jium Yeh; Ching-chih Chen
We present a framework for using a sharable domain ontology and thesaurus to help retrieve historical images of the First Emperor of China's terracotta warriors and horses. Incorporating the sharable domain ontology in RDF and RDF Schema of the semantic web and a thesaurus, we implement methods that make it easy to annotate images as RDF instances and to parse natural-language-like queries into a query schema in XML format. We also implement a partial structural matching algorithm to match the query schema with images at the level of semantic schemas. The historical images can therefore be retrieved with natural-language-like queries by users unfamiliar with the domain-specific history.
Towards an electronic variorum edition of Cervantes' Don Quixote: visualizations that support preparation 199-200
  Rajiv Kochumman; Carlos Monroy; Richard Furuta; Arpita Goenka; Eduardo Urbina; Erendira Melgoza
The Cervantes Project is creating an Electronic Variorum Edition (EVE) of Cervantes' well-known Don Quixote de la Mancha, published beginning in 1605. In this paper, we report on visualizations of features of a text collection that help us validate our text transcriptions and understand the relationships among the different printings of an edition.

NSDL

Core services in the architecture of the national science digital library (NSDL) 201-209
  Carl Lagoze; William Arms; Stoney Gan; Diane Hillmann; Christopher Ingram; Dean Krafft; Richard Marisa; Jon Phipps; John Saylor; Carol Terrizzi; Walter Hoehn; David Millman; James Allan; Sergio Guzman-Lara; Tom Kalt
We describe the core components of the architecture for the National Science Digital Library (NSDL). Over time the NSDL will include heterogeneous users, content, and services. To accommodate this, a design for a technical and organizational infrastructure has been formulated based on the notion of a spectrum of interoperability. This paper describes the first phase of the interoperability infrastructure, including the metadata repository, search and discovery services, rights management services, and user interface portal facilities.
Creating virtual collections in digital libraries: benefits and implementation issues 210-218
  Gary Geisler; Sarah Giersch; David McArthur; Marty McClelland
Digital libraries have the potential to not only duplicate many of the services provided by traditional libraries but to extend them. Basic finding aids such as search and browse are common in most of today's digital libraries. But just as a traditional library provides more than a card catalog and browseable shelves of books, an effective digital library should offer a wider range of services. Using the traditional library concept of special collections as a model, in this paper we propose that explicitly defining sub-collections in the digital library -- virtual collections -- can benefit both the library's users and contributors and increase its viability. We first introduce the concept of a virtual collection, outline the costs and benefits for defining such collections, and describe an implementation of collection-level metadata to create virtual collections for two different digital libraries. We conclude by discussing the implications of virtual collections for enhancing interoperability and sharing across digital libraries, such as those that are part of the National SMETE Digital Library.
Ontology services for curriculum development in NSDL BIBAFull-Text 219-220
  Amarnath Gupta; Bertram Ludascher; Reagan W. Moore
We describe our effort to develop an ontology service on top of an educational digital library. The ontology is developed by relating library holdings to the educational concepts they refer to. The ontology system supports basic services like ontology-based search and complex services such as comparison of multiple curricula.
Interactive digital library resource information system: a web portal for digital library education BIBAFull-Text 221-222
  Ahmad Rafee Che Kassim; Thomas R. Kochtanek
This paper describes a collaborative database project that focuses on access to materials on topics relating to digital libraries that are organized within an educational framework.

Digital library communities and change

Cross-cultural usability of the library metaphor BIBAFull-Text 223-230
  Elke Duncker
Computing metaphors have become an integral part of information systems design, yet they are deeply rooted in cultural practices. This paper presents an investigation of the cross-cultural use and usability of such metaphors by studying the library metaphor of digital libraries in the cultural context of the Maori, the indigenous population of New Zealand. The ethnographic study examines relevant features of the Maori culture, their form of knowledge transfer and their use of physical and digital libraries. On this basis, the paper points out why and when the library metaphor fails Maori and other indigenous users, and indicates how this knowledge can contribute to the improvement of future designs.
Trust and epistemic communities in biodiversity data sharing BIBAFull-Text 231-239
  Nancy A. Van House
Trust is a key element of knowledge work: what we know depends largely on others. This paper discusses the concepts of communities of practice and epistemic cultures, and their implication for design of digital libraries that support data sharing, with particular reference to practices of trust and credibility. It uses an empirical study of a biodiversity digital library of data from a variety of sources to illustrate implications for digital library design and operation. It concludes that diversity and uncomfortable boundary areas typify not only digital library user groups but also the design and operation of digital libraries.
Evaluation of digital community information systems BIBAFull-Text 240-241
  K. T. Unruh; K. E. Pettigrew; J. C. Durrance
Community information systems provide a critical link between local resources and residents. While online versions of these systems have potential benefits, a systematic evaluation framework is needed to analyze and document realized impacts. Based on data from a nation-wide study of digital community information systems, an evaluation framework is proposed.
Adapting digital libraries to continual evolution BIBAFull-Text 242-243
  Bruce R. Barkstrom; Melinda Finch; Michelle Ferebee; Calvin Mackey
In this paper, we describe five investment streams (data storage infrastructure, knowledge management, data production control, data transport and security, and personnel skill mix) that need to be balanced against short-term operating demands in order to maximize the probability of long-term viability of a digital library. Because of the rapid pace of information technology change, a digital library cannot be a static institution. Rather, it has to become a flexible organization adapted to continuous evolution of its infrastructure.

Models and tools for generating digital libraries

Localizing experience of digital content via structural metadata BIBAFull-Text 244-252
  Naomi Dushay
With the increasing technical sophistication of both information consumers and providers, there is increasing demand for more meaningful experiences of digital information. We present a framework that separates digital object experience, or rendering, from digital object storage and manipulation, so the rendering can be tailored to particular communities of users. Our framework also accommodates extensible digital object behaviors and interoperability. The two key components of our approach are 1) exposing structural metadata associated with digital objects -- metadata about labeled access points within a digital object and 2) information intermediaries called context brokers that match structural characteristics of digital objects with mechanisms that produce behaviors. These context brokers allow for localized rendering of digital information stored externally.
Collection synthesis BIBAFull-Text 253-262
  Donna Bergmark
The invention of the hyperlink and the HTTP transmission protocol caused an amazing new structure to appear on the Internet -- the World Wide Web. With the Web, there came spiders, robots, and Web crawlers, which go from one link to the next checking Web health, ferreting out information and resources, and imposing organization on the huge collection of information (and dross) residing on the net. This paper reports on the use of one such crawler to synthesize document collections on various topics in science, mathematics, engineering and technology. Such collections could be part of a digital library.
5SL: a language for declarative specification and generation of digital libraries BIBAFull-Text 263-272
  Marcos Andre Goncalves; Edward A. Fox
Digital libraries (DLs) are among the most complex kinds of information systems, due in part to their intrinsic multidisciplinary nature. Nowadays DLs are built within monolithic, tightly integrated, and generally inflexible systems -- or by assembling disparate components together in an ad hoc way, with resulting problems in interoperability and adaptability. More importantly, conceptual modeling, requirements analysis, and software engineering approaches are rarely supported, making it extremely difficult to tailor DL content and behavior to the interests, needs, and preferences of particular communities. In this paper, we address these problems. In particular, we present 5SL, a declarative language for specifying and generating domain-specific digital libraries. 5SL is based on the 5S formal theory for digital libraries and enables high-level specification of DLs in five complementary dimensions: the kinds of multimedia information the DL supports (Stream Model); how that information is structured and organized (Structural Model); different logical and presentational properties and operations of DL components (Spatial Model); the behavior of the DL (Scenario Model); and the different societies of actors and managers of services that act together to carry out the DL behavior (Societal Model). The practical feasibility of the approach is demonstrated by the presentation of a 5SL digital library generator for the MARIAN digital library system.

Novel user interfaces

A digital library of conversational expressions: helping profoundly disabled users communicate BIBAFull-Text 273-274
  Hayley Dunlop; Sally Jo Cunningham; Matt Jones
Digital libraries are for everyone. This paper describes the development of a digital library for a user who has a profound physical disability that means she cannot communicate verbally, and cannot use conventional communication tools.
Enhancing the ENVISION interface for digital libraries BIBAFull-Text 275-276
  Jun Wang; Abhishek Agrawal; Anil Bazaza; Supriya Angle; Edward A. Fox; Chris North
To enhance the ENVISION interface and facilitate user interaction, various techniques were considered for better rendering of search results with improved scalability. In this paper we discuss the challenges we encountered and our solutions to those problems.
A wearable digital library of personal conversations BIBAFull-Text 277-278
  Wei-hao Lin; Alexander G. Hauptmann
We have developed a wearable, personalized digital library system, which unobtrusively records the wearer's part of a conversation, recognizes the face of the current dialog partner and remembers his/her voice. The next time the system sees the same person and hears the same voice, it can replay parts of the last conversation in compressed form. Results from a prototype system show the effectiveness of combining face recognition and speaker identification for retrieving conversations.
Collaborative visual interfaces to digital libraries BIBAFull-Text 279-280
  Katy Borner; Ying Feng; Tamara McMahon
This paper argues for the design of collaborative visual interfaces to digital libraries that support social navigation. As an illustrative example we present work in progress on the design of a three-dimensional document space for a scholarly community -- namely faculty, staff, and students at the School of Library and Information Science, Indiana University.
Binding browsing and reading activities in a 3D digital library BIBAFull-Text 281-282
  Pierre Cubaud; Pascal Stokowski; Alexandre Topol
Browsing through digitized book collections and reading are separate activities in most current WWW-based user interfaces to digital libraries. This break in context lengthens both learning and navigation time within the interface. In this paper we study how 3D interaction metaphors provide a continuous navigation space for these two tasks.

Federating and harvesting metadata

DP9: an OAI gateway service for web crawlers BIBAFull-Text 283-284
  Xiaoming Liu; Kurt Maly; Mohammad Zubair; Michael L. Nelson
Many libraries and databases are closed to general-purpose Web crawlers, and they expose their content only through their own search engines. At the same time many researchers attempt to locate technical papers through general-purpose Web search engines. DP9 is an open source gateway service that allows general search engines (e.g. Google, Inktomi) to index OAI-compliant archives. DP9 does this by providing consistent URLs for repository records, and converting them to OAI queries against the appropriate repository when the URL is requested. This allows search engines that do not support the OAI protocol to index the "deep Web" contained within OAI-compliant repositories.
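The URL-to-query conversion described in the DP9 abstract can be illustrated with a minimal Python sketch. The gateway path layout (`/getrecord/<archive>/<identifier>`), the `ARCHIVES` table, and the function name are assumptions made for illustration and are not taken from DP9 itself; only the `GetRecord` verb and its `identifier` and `metadataPrefix` arguments come from the OAI protocol.

```python
from urllib.parse import urlencode, urlsplit

# Hypothetical table mapping archive nicknames to OAI base URLs.
ARCHIVES = {"example": "http://archive.example.org/oai"}


def gateway_to_oai(url: str, metadata_prefix: str = "oai_dc") -> str:
    """Translate a (hypothetical) stable gateway URL into an OAI GetRecord request."""
    # Assumed path layout: /getrecord/<archive>/<record identifier>
    parts = urlsplit(url).path.strip("/").split("/", 2)
    if len(parts) != 3 or parts[0] != "getrecord":
        raise ValueError(f"not a gateway record URL: {url}")
    _, archive, identifier = parts
    base_url = ARCHIVES[archive]  # KeyError if the archive is unknown
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


if __name__ == "__main__":
    print(gateway_to_oai("http://gateway.example.org/getrecord/example/oai:example:1234"))
```

A crawler that follows the stable gateway URL never needs to speak the OAI protocol; the gateway issues the query on its behalf and returns the rendered record.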
The Greenstone plugin architecture BIBAFull-Text 285-286
  Ian H. Witten; David Bainbridge; Gordon Paynter; Stefan Boddie
This note describes how the Greenstone digital library system uses "plugins" to import documents and metadata in different formats, and associate metadata with the appropriate documents. Plugins that import documents can perform their own format conversion internally, or take advantage of existing conversion programs. Metadata can be read from the input documents, or from separate metadata files, or computed from the documents themselves. New plugins can be written for novel situations.
Building FLOW: federating libraries on the web BIBAFull-Text 287-288
  Anna Keller Gold; Karen S. Baker; Jean-Yves LeMeur; Kim Baldridge
Individuals, teams, organizations, and networks can be thought of as tiers or classes within the complex grid of technology and practice in which research documentation is both consumed and generated. The panoply of possible classes share with the others a common need for document management tools and practices. The distinctive document management tools and practices used within each represent boundaries across which information could flow openly if technology and metadata standards were to provide an accessible digital framework. The CERN Document Server (CDS), implemented by a research partnership at the San Diego Supercomputer Center (SDSC), establishes a prototype tiered repository system for such a panoply. Research suggests modifications to enable cross-domain information flow and is represented as a metadata grid.
JAFER ToolKit project: interfacing Z39.50 and XML BIBAFull-Text 289-290
  Antony Corfield; Matthew Dovey; Richard Mawby; Colin Tatham
In this paper, we describe the JAFER ToolKit project which is developing a simplified XML based API above the Z39.50 protocol[1]. The ToolKit allows the development of both Z39.50 based applications (both clients and servers) without detailed knowledge of the complexities of the protocol.
Schema extraction from XML collections BIBAFull-Text 291-292
  Boris Chidlovskii
The XML Schema language has been proposed to replace Document Type Definitions (DTDs) as the schema mechanism for XML data. This language consistently extends grammar-based constructions with constraint- and pattern-based ones and has a higher expressive power than DTDs. As schemas remain optional for XML, we address the problem of XML Schema extraction. We model the XML schema as extended context-free grammars and develop a novel extraction algorithm inspired by methods of grammatical inference. The algorithm also copes with the schema determinism requirement imposed by XML DTDs and XML Schema languages.
Mirroring an OAI archive on the I2-DSI channel BIBAFull-Text 293-294
  Ashwini Pande; Malini Kothapalli; Ryan Richardson; Edward A. Fox
The Open Archives Initiative (OAI) promotes interoperability among digital libraries and has created a protocol for data providers to easily export their metadata. One problem with this approach is that some of the more popular servers quickly become heavily loaded. The obvious solution is replication. Fortunately, the Internet-2 Distributed Storage Infrastructure (I2-DSI) has begun to develop technology for highly distributed transparent replication of servers. This paper presents our solution for transparent mirroring of OAI repositories within the I2-DSI.

Music digital libraries

HMM-based musical query retrieval BIBAFull-Text 295-300
  Jonah Shifrin; Bryan Pardo; Colin Meek; William Birmingham
We have created a system for music search and retrieval. A user sings a theme from the desired piece of music. Pieces in the database are represented as hidden Markov models (HMMs). The query is treated as an observation sequence and a piece is judged similar to the query if its HMM has a high likelihood of generating the query. The top pieces are returned to the user in rank-order. This paper reports the basic approach for the construction of the target database of themes, encoding and transcription of user queries, and the results of initial experimentation with a small set of sung queries.
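The ranking step in the HMM abstract (score each piece by the likelihood that its HMM generates the query) can be sketched with the standard forward algorithm. This is a toy illustration with discrete observation symbols; the model layout `(start, trans, emit)` and the function names are assumptions, not the authors' encoding of themes or queries.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Forward algorithm with per-step scaling; returns log P(obs | HMM).

    start[s]    : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability that state s emits symbol o
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot generate this sequence
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p


def rank_pieces(query, models):
    """Return piece names ordered from most to least likely to generate the query."""
    scores = {name: log_likelihood(query, *hmm) for name, hmm in models.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    toy_models = {
        "piece_a": ([1.0], [[1.0]], [[0.9, 0.1]]),
        "piece_b": ([1.0], [[1.0]], [[0.1, 0.9]]),
    }
    print(rank_pieces([0, 0, 1], toy_models))
```

Working in log space with rescaling keeps long queries from underflowing, which matters once observation sequences run to hundreds of frames.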
A comparison of melodic database retrieval techniques using sung queries BIBAFull-Text 301-307
  Ning Hu; Roger B. Dannenberg
Query-by-humming systems search a database of music for good matches to a sung, hummed, or whistled melody. Errors in transcription and variations in pitch and tempo can cause substantial mismatch between queries and targets. Thus, algorithms for measuring melodic similarity in query-by-humming systems should be robust. We compare several variations of search algorithms in an effort to improve search precision. In particular, we describe a new frame-based algorithm that significantly outperforms note-by-note algorithms in tests using sung queries and a database of MIDI-encoded music.
Enhancing access to the levy sheet music collection: reconstructing full-text lyrics from syllables BIBAFull-Text 308-309
  Brian Wingenroth; Mark Patton; Tim DiLauro
The goal of the Lester S. Levy Sheet Music Collection, Phase Two project is to develop tools, processes, and systems that facilitate collection ingestion through automated processes that reduce, but do not necessarily eliminate, human intervention[1]. One of the major components of this project is an optical music recognition (OMR) system[2] that extracts musical information and lyric text from the page images that comprise each piece in a collection. It is often the case, as it is with the Levy Collection, that lyrics embedded in music notation are written in a syllabicated form so that each syllable lines up with the note or notes to which it corresponds. Searching the syllabicated form of words, however, would be counterintuitive and cumbersome for end-users. This paper describes the evolution of a tool that, using a simple algorithm, rebuilds complete words from lyric syllables and, in ambiguous cases, provides feedback to the collection builder. This system will be integrated into the workflow of the Levy Sheet Music Collection, but has broad applicability for any project ingesting musical scores with lyrics.
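As a rough illustration of rebuilding words from syllabicated lyrics, the sketch below assumes that a trailing hyphen marks a syllable that continues into the next token. That convention, the function name, and the handling of leftovers are assumptions for illustration; the abstract does not specify the tool's actual algorithm.

```python
def rebuild_words(syllables):
    """Join hyphen-terminated syllables into words.

    Returns (words, ambiguous), where `ambiguous` holds fragments that ended
    with a hyphen but had no following syllable to complete them, so a human
    collection builder can review them.
    """
    words, ambiguous, current = [], [], ""
    for token in syllables:
        if token.endswith("-"):
            current += token[:-1]  # word continues in the next token
        else:
            words.append(current + token)  # word is complete
            current = ""
    if current:
        ambiguous.append(current)  # dangling fragment at end of input
    return words, ambiguous


if __name__ == "__main__":
    print(rebuild_words(["beau-", "ti-", "ful", "dream-", "er"]))
```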
Evaluating automatic melody segmentation aimed at music information retrieval BIBAFull-Text 310-311
  Massimo Melucci; Nicola Orio
In this paper we investigate the effectiveness of a melody segmentation algorithm based on melodic features. Segmentations produced by experienced music scholars have been compared with those of the algorithm, a random segmenter, and an n-gram-based segmenter. Results showed that the algorithm is closer to manual segmentation than the other segmenters.

Preserving, securing, and assessing digital libraries

A methodology and system for preserving digital data BIBAFull-Text 312-319
  Raymond A. Lorie
This paper refers to a previous proposal made at the 1st Joint Conference on Digital Libraries, on a novel approach to the problem of the long-term archiving of digital data. It reports on ongoing work in refining the methodology and building an initial prototype. The method is based on the use of a Universal Virtual Computer (UVC) to specify the process that needs to be applied to the archived data in order to make it understandable for a future client. There is a certain amount of information (a Convention) that must be preserved for an indefinite time, to make sure that the client will be able to recover the information. A first version of this Convention is given here; it includes the architecture of the UVC. The paper also briefly mentions our current activities in implementation and evaluation.
Modeling web data BIBAFull-Text 320-321
  James C. French
We have created three testbeds of web data for use in controlled experiments in collection modeling. This short paper examines the applicability of Zipf's and Heaps' laws as applied to web data. We find extremely close agreement between observed vocabulary growth and Heaps' law. We find reasonable agreement with Zipf's law for medium to low frequency terms. Zipf's law is a poor predictor for high frequency terms. These findings hold for all three testbeds although we restrict ourselves to one here due to space limitations.
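For reference, the two laws named in this abstract are usually written as follows. These are the standard textbook forms; the constants are collection-dependent, and the values fitted in the paper are not given in this listing.

```latex
% Heaps' law: vocabulary size V as a function of collection size n (in tokens)
V(n) = K \, n^{\beta}, \qquad K > 0,\; 0 < \beta < 1

% Zipf's law: frequency f of the term with frequency rank r
f(r) \approx \frac{C}{r^{s}}, \qquad s \approx 1
```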
An evaluation model for a digital library services tool BIBAFull-Text 322-323
  Jim Dorward; Derek Reinke; Mimi Recker
This paper describes an evaluation model for a digital library tool, the Instructional Architect, which enables users to discover, select, reuse, sequence, and annotate digital library learning objects. By documenting our rapid-prototyping, iterative, and user-centered approach for evaluating a digital library service, we provide a model and set of methods that other developers may wish to employ. In addition, we provide preliminary results from our studies.
Why watermark?: the copyright need for an engineering solution BIBAFull-Text 324-325
  Michael Seadle; J. R. Deller, Jr.; Aparna Gurijala
An important research component in the creation of the National Gallery of the Spoken Word (NGSW) is the development of watermarking technologies for the audio library. In this paper we argue that audio watermarking is a particularly desirable means of intellectual property protection. There is evidence that the courts consider watermarks to be a legitimate form of copyright protection. Watermarking facilitates redress, and represents a form of copyright protection that universities can use without being inconsistent in their mission to disseminate knowledge.

Image and cultural digital libraries

Time as essence for photo browsing through personal digital libraries BIBAFull-Text 326-335
  Adrian Graham; Hector Garcia-Molina; Andreas Paepcke; Terry Winograd
We developed two photo browsers for collections with thousands of time-stamped digital images. Modern digital cameras record photo shoot times, and semantically related photos tend to occur in bursts. Our browsers exploit the timing information to structure the collections and to automatically generate meaningful summaries. The browsers differ in how users navigate and view the structured collections. We conducted user studies to compare the two browsers and an un-summarized image browser. Our results show that exploiting the time dimension and appropriately summarizing collections can lead to significant improvements. For example, for one task category, one of our browsers enabled a 33% improvement in speed of finding given images compared to the commercial browser. Similarly, users were able to complete 29% more tasks when using this same browser.
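The observation that related photos occur in bursts suggests grouping images by gaps between shoot times. The sketch below uses a fixed gap threshold, which is a simplifying assumption for illustration; the abstract does not state how the browsers actually segment a collection, and the function name and default threshold are hypothetical.

```python
from datetime import datetime, timedelta


def cluster_by_time(timestamps, max_gap=timedelta(hours=1)):
    """Group timestamps into bursts: a new cluster starts whenever the gap
    to the previous photo exceeds max_gap (an assumed, fixed threshold)."""
    clusters = []
    for ts in sorted(timestamps):
        if clusters and ts - clusters[-1][-1] <= max_gap:
            clusters[-1].append(ts)  # same burst as the previous photo
        else:
            clusters.append([ts])  # gap too large: start a new burst
    return clusters


if __name__ == "__main__":
    day = datetime(2002, 7, 14)
    shots = [day.replace(hour=h, minute=m) for h, m in [(10, 0), (10, 5), (10, 20), (15, 0), (15, 10)]]
    for burst in cluster_by_time(shots):
        print(len(burst), "photos starting at", burst[0])
```

One representative image per cluster would then give a simple summary view of the collection.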
Toward a distributed terabyte text retrieval system in China-US million book digital library BIBAFull-Text 336-337
  Bin Liu; Wen Gao; Ling Zhang; Tie-jun Huang; Xiao-ming Zhang; Jun Cheng
In the China-US Million Book Digital Library, the output of the digitization process is more than one terabyte of text in OEB and PDF format. To access these data quickly and accurately, we are developing a distributed terabyte text retrieval system. With the query cache, the system can search less data while maintaining acceptable retrieval accuracy. From the OEB package, we get its metadata and structural information to implement multi-scale indexing and retrieval. We plan to explore some new retrieval models and text clustering approaches in the Digital Library.
Enhanced perspectives for historical and cultural documentaries using informedia technologies BIBAFull-Text 338-339
  Howard D. Wactlar; Ching-chih Chen
Speech recognition, image processing, and language understanding technologies have successfully been applied to broadcast news corpora to automate the extraction of metadata and make use of it in building effective video news retrieval interfaces. This paper discusses how these technologies can be adapted to cultural documentaries as represented by the award-winning First Emperor of China videodisc and multimedia CD. Through automated means, efficient interfaces into documentary contents can be built dynamically based on user needs. Such interfaces enable the assemblage of large video documentary libraries from component videodisc, CD, and videotape projects, with alternate views into the material complementing the original sequences authored by the materials' producers.
Interfaces for palmtop image search BIBAFull-Text 340-341
  Mark Derthick
Will current technology support search for video news or entertainment on mobile platforms? An Ipaq palmtop version of the Informedia Digital Video Library interface has already been developed at the Chinese University of Hong Kong. For these displays, the desktop technique of showing a large grid of images in parallel is not feasible. Perceptual psychology experiments suggest that time-multiplexing may be as effective as space-multiplexing for this kind of primed recognition task. In fact, it has been specifically suggested that image retrieval interfaces using Rapid Serial Visual Presentation (RSVP) may perform significantly better than parallel presentation even on a desktop computer [2]. In our experiments, we did not find this to be true. An important difference between previous RSVP experiments and our own is that image search engines rank retrievals, and correct answers are more likely to occur early in the list of results. Thus we found that scrolling (and low RSVP presentation rates) led to better recognition of answers that occur early, but worse for answers that occur far down the list. This split confounded the global effects that we hypothesized, yet in itself is an important consideration for future interface designs, which must adapt as search technology improves.

Digital libraries for spatial data

The ADEPT digital library architecture BIBAFull-Text 342-350
  Greg Janee; James Frew
The Alexandria Digital Earth ProtoType (ADEPT) architecture is a framework for building distributed digital libraries of georeferenced information. An ADEPT system comprises one or more autonomous libraries, each of which provides a uniform interface to one or more collections, each of which manages metadata for one or more items. The primary standard on which the architecture is based is the ADEPT bucket framework, which defines uniform client-level metadata query services that are compatible with heterogeneous underlying collections. ADEPT functionality strikes a balance between the simplicity of Web document delivery and the richness of Z39.50. The current ADEPT implementation runs as servlet-based middleware and supports collections housed in arbitrary relational databases.
G-Portal: a map-based digital library for distributed geospatial and georeferenced resources BIBAFull-Text 351-358
  Ee-Peng Lim; Dion Hoe-Lian Goh; Zehua Liu; Wee-Keong Ng; Christopher Soo-Guan Khoo; Susan Ellen Higgins
As the World Wide Web evolves into an immense information network, it is tempting to build new digital library services and expand existing digital library services to make use of web content. In this paper, we present the design and implementation of G-Portal, a web portal that aims to provide digital library services over geospatial and georeferenced content found on the World Wide Web. G-Portal adopts a map-based user interface to visualize and manipulate the distributed geospatial and georeferenced content. Annotation capabilities are supported, allowing users to contribute geospatial and georeferenced objects as well as their associated metadata. The other features included in G-Portal's design are query support, content classification, and content maintenance. This paper will mainly focus on the architecture design, visualization and annotation capabilities of G-Portal.

Panels


You mean I have to do what with whom: statewide museum/library DIGI collaborative digitization projects -- the experiences of California, Colorado & North Carolina BIBFull-Text 359
  Nancy Allen; Liz Bishoff; Robin Chandler; Kevin Cherry
Overcoming impediments to effective health and biomedical digital libraries BIBAFull-Text 360
  William Hersh; Jan Velterop; Alexa McCray; Gunther Eysenbach; Mark Boguski
Digital libraries have great promise in the health and biomedical domains. Yet a variety of impediments exist to their more effective use. A series of panelists from a variety of backgrounds in health and biomedicine will explore these impediments and describe how they might be overcome.
The challenges of statistical digital libraries BIBAFull-Text 361
  Cathryn Dippo; Patricia Cruse; Ann Green; Carol Hert
What are statistical digital libraries? Who uses them? For what purpose? How do they differ from or resemble text-focused digital libraries? What are the research issues associated with their use and the implications for interface design?
   These are just some of the issues the panelists have been grappling with over the last few years as government agencies and academic libraries rush to make their holdings web-accessible to both the users they have always served and all kinds of new users with varying statistical and computing skills.
   The panelists represent a variety of user-oriented perspectives -- some are developers, some are intermediaries, some are users themselves. Their primary user focus varies from university students and faculty to government policy analysts, but the casual or first-time user must also be served.
   The panelists will focus their remarks on the challenges of statistical libraries on a multitude of dimensions, including technical, social, behavioral, economic, organizational, etc. The discussion should both inform and entice the audience to pursue some difficult and interesting problems in digital library research.
Biodiversity and biocomplexity informatics: policy and implementation science versus citizen science BIBAFull-Text 362-364
  P. Bryan Heidorn
Biological science is one of the top ten social trends and the twenty-first century has been defined as "The Age of Biology" [1]. One of the central themes of this age is biodiversity. Biodiversity is the richness of life. Biodiversity includes the variety of genes within one species through the complex interconnection of all life within an environment. One of the grand challenges of the twenty-first century is to document and understand the world's natural heritage. The management of the many kinds of information associated with this endeavor is "Biodiversity Informatics." There are many efforts developing worldwide to collect and distribute this information in digital collections. Some of these efforts are complementary; some are in conflict and some are just independent. There is a great need to integrate this information to increase its usefulness and value. Unfortunately, this integration is extremely difficult because of the diversity of the use and users of the information and the diversity of the information itself. The panelists will discuss different perspectives on the construction of global biodiversity digital libraries from the perspective of different goals and uses.
Panel on digital preservation BIBAFull-Text 365-367
  Joyce Ray; Robin Dale; Reagan Moore; Vicky Reich; William Underwood; Alexa T. McCray
Digital information in any form is at risk. Software and hardware become obsolete, and versions and file formats change, making data inaccessible. Data stored in even the simplest form are in danger due to computer media degradation and obsolescence. On-line information such as e-journals and databases are susceptible. They may become partially or entirely unreadable, and may not be recoverable by the time the problem is detected. Preservation strategies such as emulation (keeping alive the software and hardware needed to access a digital object), migration (converting the digital object to new versions and formats), and other long-term archival methods have been proposed [1-7]. Models such as the Open Archival Information System (OAIS) provide an architecture for conducting digital preservation research and experimentation [8-10]. The importance of preservation metadata has been recognized by a number of groups and efforts to develop and deploy metadata standards are underway [11-14].
   As more and more digital information is created, attention must be paid to what information should be preserved and how it can be preserved most economically and effectively. It is clear that for preservation to be successful, we need to pay attention not only to the format of digital objects, but also to the commitment we make to providing long-term access to the information. Thus, decisions about digital preservation will involve technical issues as well as economic, legal, social, and organizational ones. Is it possible or feasible to preserve all digital data automatically and in a cost-effective way? How much functionality can or must be preserved? What type of metadata will be needed to ensure both access and preservation? What metrics do we use to evaluate whether our methods will be successful?
   Panelists will make short presentations about work in which they have been involved and which reflect a variety of aspects of digital preservation. Reagan Moore will discuss the levels of abstraction that are needed to create infrastructure independent representations for data, information, and knowledge, and he will discuss a prototype persistent digital archive. The persistent archive infrastructure has been developed for use by the National Archives and Records Administration and other Federal agencies. William Underwood will report on lessons learned in preserving digital records created on personal computers. The records being examined are the digital records created on personal computers during the administration of President George Bush (1988-1992). Vicky Reich will present work on the LOCKSS (Lots of Copies Keep Stuff Safe) project, which is a permanent web publishing and access system. LOCKSS software allows libraries to retain local collection control of materials delivered through the web while preserving the functionality of the original web based content. Robin Dale will report on activities of the preservation program of the Research Libraries Group (RLG). She will focus on the joint work of RLG and OCLC (Online Computer Library Center) on preservation metadata. Following the presentations by the four panelists, Alexa McCray will provide brief comments and then open the discussion for audience participation.
NSDL: from prototype to production to transformational national resource BIBAFull-Text 368
  William Y. Arms; Edward Fox; Jeanne Narum; Ellen Hoffman
This panel will discuss the first release of the National Science Digital Library and plans for growing it into a very large, comprehensive library of digital materials relevant to science education.
How important is metadata? BIBAFull-Text 369
  Hector Garcia-Molina; Diane Hillmann; Carl Lagoze; Elizabeth Liddy; Stuart Weibel
Metadata is expensive. Information services and digital library researchers spend considerable time, effort, and money on metadata. It is time to ask a number of important questions:
  • How much metadata is really necessary and for what reason?
  • What are the right metrics for metadata: its correctness, appropriateness,
       and return on investment?
  • Is metadata harvesting really useful for the creation of digital library
  • Are the assumptions about the utility, or even necessity, of metadata a
       legacy of years of library science and practice?
  • Do these assumptions make sense in the current context of massive computing
       power and automatic analysis?
Clearly there is no one "correct" answer to these questions. The panel will provide a forum for practitioners and researchers from a number of areas to express their views and, hopefully, provoke stimulating discussion from the audience.
    Planning for future digital libraries programs BIBAFull-Text 370
      Stephen M. Griffin
    This panel will discuss alternatives for follow-on Federal program activities to the Interagency Digital Libraries Initiative -- Phase 2 (DLI-2). The current Digital Libraries Initiative -- Phase 2 awards receive final funding increments in FY 2003. The National Science Foundation and other interested agencies wish to begin informal planning for potential follow-on activities to DLI-2 at this time. As in the past, sponsoring agencies look to the various stakeholder communities to assist in the creation of funding programs that are responsive to values, needs and opportunities.
       Panelists will present viewpoints as to the most important topical elements and effective program structures in light of the continuing rapid evolution of digital libraries technologies, computing and communication infrastructures and dramatic increase in networked digital content. Audience remarks will be encouraged, particularly suggestions for enabling broad community involvement in the planning dialogue.


    The Miguel de Cervantes Digital Library: a wide diversity of content, media, functionality and services BIBAFull-Text 371
      Alejandro Bia
    This demo describes the philosophy behind what represents one of the most ambitious projects of its kind in the Spanish-speaking world: The Miguel de Cervantes Digital Library (http://cervantesvirtual.com/). It shows the new ground being explored in terms of the wide variety of contents, media, functionality and services it offers to a worldwide audience. These services are meant to be used in serious research, as teaching aids, or just for cultural amusement and enjoyment.
       We will also describe the technical underpinnings of this project and report on ongoing research and development activities.
    DSpace: durable digital documents BIBAFull-Text 372
      Margret Branschofsky; Daniel Chudnov
    The DSpace system for long-term management of institutional scholarly research repositories is now in use at the MIT Libraries; we will demonstrate the system and provide more information about its design, use at MIT, and other potential uses.
    NanoPort: a web portal for nanoscale science and technology BIBFull-Text 373
      Michael Chau; Hsinchun Chen; Jialun Qin; Yilu Zhou; Wai-Ki Sung; Yongchi Chen; Yi Qin; Daniel McDonald; Ann Lally; Matthew Landon
    Variations2: a digital music library system BIBAFull-Text 374
      Jon W. Dunn; Eric J. Isaacson
    This demonstration will show version 1.0 of the Variations2 digital library system developed by Indiana University. Variations2 is being built to provide access to music in a variety of formats (sound recordings, scanned musical scores, computer score notation files, and video) and is designed to support research and learning in the field of music.
    The learning matrix: cataloging resources with rich metadata BIBAFull-Text 375
      Lyndsay R. Greer
    Effective searching of a digital library requires that the library keep rich metadata about each of its resources. Entering complex metadata efficiently, accurately, and consistently can be confusing and time consuming. A demonstration of the Learning Matrix's Cataloging Tool, a web-based solution for creating metadata while uploading resources to the digital library, will be presented.
    Video retrieval with multiple image search strategies BIBFull-Text 376
      Alexander G. Hauptmann; Michael G. Christel; Norman D. Papernick
    Reprocessing paper-based reference materials for the digital environment BIBAFull-Text 377
      P. Bryan Heidorn
    One of the primary challenges for the creation of digital libraries is to enhance the value of paper-based publications by providing digital access to the materials. Simple full-text searching is just a first step in this process. Better functionality may be gained by exploiting the natural structure within text. The following paper describes the process of digital conversion and integration of encyclopedic publications, glossaries and thesauri. The Biological Information Browsing (http://www.biobrowser.org) team developed text-processing tools, and an information retrieval and visualization environment that provides greater functionality for these traditionally paper-based publications [1]. The process includes automatic text segmentation and structuring, automated XML markup, structure-based indexing, automatic thesaurus extraction for query expansion and on-line definitions. Very few other information systems provide complete services for publishing, indexing, XML query and retrieving documents.
    A framework for collaborative information environments and unified access to distributed digital content BIBAFull-Text 378
      Jon Herlocker; Janet Webster; Seikyung Jung; Anton Dragunov; Tim Holt; Tammy Culter; Sally Haerer
    In this demo, we will present two prototypes of digital information portals developed using a new common framework: The High Performance Computing Virtual Consultant and the Tsunami Digital Library. This framework supports the creation of digital library portals that include not only local data but also distributed content that is not under the control of the portal maintainers, such as remote web sites. The framework provides a common user interface across all resources, even if the resources are served by a remote web site. Furthermore, the framework contains features that support effective, low-maintenance operation as well as search and layout algorithms that learn intelligently.
    Active netlib: an active mathematical software collection for inquiry-based computational science & engineering education BIBAFull-Text 379
      Shirley Moore; A. J. Baker; Jack Dongarra
    A core subject in the undergraduate education of application scientists and engineers is the use of mathematical software to solve computational problems. To make effective use of mathematical software, application developers need a basic understanding of the underlying numerical methods and enough knowledge to be able to choose an appropriate solver, parameterize it correctly, and validate the computed results. Correct results are of course required, but good computational performance is desired as well. Most application scientists have neither the time nor the interest to read the current literature in numerical analysis. They solve numerical problems by relying on the methods and programs they learned about in previous coursework. This tendency has the unfortunate consequence that new methods with improved functionality and/or efficiency may go unused by practicing engineers. Application engineers need enough understanding of the underlying numerical methods to be able to detect and diagnose problems that occur and to modify or customize the methods if necessary.
       A large amount of mathematical software is both commercially and freely available. However, not all the software that is available is of high quality. It can also be difficult to locate the appropriate software by using web search engines, since the descriptions available for searching may be lacking or may not match the vocabulary used by the searcher. A good solution to these problems is to have experts in the field of numerical analysis maintain a moderated collection of high quality software which is organized and cataloged with appropriate metadata to enable easy searching. The Netlib mathematical software repository is such a collection that has been contributed to and managed by the numerical analysis community for the past fifteen years.
       Active Netlib provides an active collection of high-quality mathematical software resources in the context of an inquiry-based learning environment for computational science and engineering education. The Netlib collection is being extended in a number of ways to support the goals of this project. The NetSolve client-server system for accessing hardware and software resources over a network provides an active interface to the contents of Netlib. NetSolve essentially constructs network-accessible objects with executable content from the software packages in Netlib.
       By making the subroutines housed in Netlib available over the network on computational servers, NetSolve enables access to up-to-date mathematical software from a variety of client interfaces running on users' workstations, without requiring the users to download and install the software themselves.
       This demonstration will illustrate the following digital library technology:
    Stanford encyclopedia of philosophy: a dynamic reference work BIBAFull-Text 380
      Uri Nodelman; Colin Allen; Edward N. Zalta
    Recent work of the Stanford Encyclopedia of Philosophy project (http://plato.stanford.edu/) has focused on fostering and managing the growth of a dynamic reference work. Our particular project is to produce an authoritative and comprehensive dynamic reference work devoted to the academic discipline of philosophy that will be kept up to date dynamically so as to remain useful to those in academia and the general public.
    Selected component technologies in digital libraries BIBAFull-Text 381
      Joel Plutchak; Joe Futrelle; Jeff Gaynor
    The demonstration will illustrate digital library component technologies joined together to provide solutions to common data mining, parsing, and archiving problems.
    Digital library system: capture, analysis, query, and display 3D data BIBAFull-Text 382
      Jeremy Rowe; Anshuman Razdan
    This paper describes development of a storage, archival, and sketch-based query and retrieval system for 3D objects. The initial focus has been Native American ceramic vessels, scanned and defined as a set of three-dimensional triangulated meshes composed of points and triangles. The process involves modeling the data with parametric surfaces, and extracting relevant features to raise the level of abstraction of data. The project uses a class based XML schema to catalog and organize vessel data. A visual query process was developed to permit users to interact with the data using sketches or by selecting sample vessel shapes to augment text and metric search criteria to retrieve original and modeled data, and interactive 2D and 3D models.
    Souvenir: flexible note-taking tool to pinpoint and share media in digital libraries BIBAFull-Text 383
      Anselm Spoerri
    Digital media (audio and video) can be difficult to search and share in a personal way. Souvenir is a software system that offers users a flexible and comprehensive way to use their handwritten or text notes to retrieve and share specific media moments. Users can take notes on a variety of devices, such as the paper-based CrossPad, the Palm Pilot and standard keyboard devices. Souvenir segments handwritten notes into an effective media index without the need for handwriting recognition. Users can use their notes to create hyperlinks to random-access media stored in a digital library. Souvenir also has web publishing and email capabilities to enable anyone to access or email media moments directly from a web page. Souvenir annotations capture information that cannot be easily inferred by automatic media indexing tools.
    FACET: thesaurus retrieval with semantic term expansion BIBAFull-Text 384
      Douglas Tudhope; Ceri Binding; Dorothee Blocks; Daniel Cunliffe
    There are many advantages for Digital Libraries in indexing with classifications or thesauri, but the current lack of flexible retrieval tools that deal with compound descriptors is a disincentive. This demonstration of a research prototype illustrates a matching function for compound descriptors, or multi-concept subject headings, that does not rely on exact matching but incorporates term expansion via thesaurus semantic relationships to produce ranked results that take account of missing and partially matching terms [5]. The matching function is based on a measure of semantic closeness between terms [4]. The work is part of the EPSRC funded FACET project [2] in collaboration with the UK National Museum of Science and Industry (NMSI) which includes the National Railway Museum [3]. An export of NMSI's Collections Database is used as the dataset for the research. The J. Paul Getty Trust's Art and Architecture Thesaurus (AAT) [1] is the main thesaurus in the project. The AAT is a widely used thesaurus (over 120,000 terms). Descriptors are organised in 7 facets representing separate conceptual classes of terms.
       The FACET application has a multi-tiered architecture accessing a SQL Server database, with an OLE DB connection. The thesauri are stored as relational tables in the Server's database. However, a key component of the system is a parallel representation of the underlying semantic network as an in-memory structure of thesaurus concepts (corresponding to preferred terms). The structure models the hierarchical and associative interrelationships of thesaurus concepts via weighted poly-hierarchical links. Its primary purpose is real-time semantic expansion of query terms, achieved by a spreading activation semantic closeness algorithm. Queries with associated results are stored persistently using XML format data. A Visual Basic interface combines a thesaurus browser and an initial term search facility that takes into account equivalence relationships. Terms are dragged to a direct manipulation Query Builder which maintains the facet structure.
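       The semantic expansion described above can be illustrated with a minimal sketch. It is not the FACET implementation: the toy thesaurus, the link weights, the pruning threshold, and the function names `expand` and `match_score` are all assumptions made for illustration. Closeness is taken here to be the largest product of link weights along any path from the query term, and an item's score averages the best closeness found for each query term, so a missing term lowers the score without excluding the item.

```python
import heapq

# Toy thesaurus fragment: concept -> [(related concept, link weight in (0, 1])].
# Concepts and weights are invented for illustration only.
LINKS = {
    "locomotives": [("steam locomotives", 0.8), ("rolling stock", 0.7)],
    "steam locomotives": [("locomotives", 0.8), ("tank engines", 0.8)],
    "rolling stock": [("locomotives", 0.7), ("carriages", 0.7)],
    "tank engines": [("steam locomotives", 0.8)],
    "carriages": [("rolling stock", 0.7)],
}


def expand(seed, links, threshold=0.5):
    """Spread activation from seed; return {concept: closeness >= threshold}.

    Closeness to a concept is the largest product of link weights along
    any path from the seed, so it decays with each traversal step.
    """
    closeness = {}
    frontier = [(-1.0, seed)]  # max-heap emulated with negated activation
    while frontier:
        negative_activation, concept = heapq.heappop(frontier)
        activation = -negative_activation
        if concept in closeness:
            continue  # already reached with an equal or higher activation
        closeness[concept] = activation
        for neighbour, weight in links.get(concept, []):
            spread = activation * weight
            if spread >= threshold and neighbour not in closeness:
                heapq.heappush(frontier, (-spread, neighbour))
    return closeness


def match_score(query_terms, index_terms, links, threshold=0.5):
    """Average, over query terms, of the best closeness to any index term."""
    if not query_terms:
        return 0.0
    total = 0.0
    for term in query_terms:
        reachable = expand(term, links, threshold)
        total += max((reachable.get(t, 0.0) for t in index_terms), default=0.0)
    return total / len(query_terms)
```

       With these assumed weights, a query for "locomotives" reaches "tank engines" through two links, while "carriages" falls below the threshold and is pruned. A two-term query in which only one term can be matched still yields a non-zero, reduced score.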
    Visualizing the archive of a computer mediated communication process BIBAFull-Text 385
      Bin Zhu; Hsinchun Chen
    The archive of a computer-mediated communication (CMC) process contains shared knowledge and information about participants' behavior patterns. However, most CMC systems focus only on organizing the content of discussions. We propose to demo a prototype system that integrates a social visualization technique with existing information analysis technologies to graphically summarize both the content and behavior of a CMC process.
    MedTextus: an intelligent web-based medical meta-search system BIBAFull-Text 386
      Bin Zhu; Gondy Leroy; Hsinchun Chen; Yongchi Chen
    We propose to demonstrate a web-based prototype system that integrates the meta-search approach with existing information analysis and visualization technologies to facilitate concept-based searching in the medical domain. The system distinguishes itself from other meta-search engines through two features. It incorporates co-occurrence analysis and an existing ontology to understand the user's query. It also uses the self-organizing map (SOM) to categorize and visualize search results.
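       As a rough illustration of co-occurrence analysis in general (not of the MedTextus implementation), the sketch below counts how often pairs of terms appear in the same document and uses those counts to suggest terms related to a query term. The function names and the sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how often each unordered pair of terms shares a document."""
    counts = Counter()
    for terms in documents:
        # Deduplicate and sort so each pair is stored once as (a, b), a < b.
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts


def related_terms(term, counts, top_n=3):
    """Rank other terms by how often they co-occur with `term`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == term:
            scores[b] += n
        elif b == term:
            scores[a] += n
    return [t for t, _ in scores.most_common(top_n)]


# Hypothetical documents, each reduced to its list of index terms.
DOCUMENTS = [
    ["asthma", "inhaler", "steroid"],
    ["asthma", "inhaler"],
    ["asthma", "allergy"],
    ["diabetes", "insulin"],
]
```

       Calling `related_terms("asthma", cooccurrence_counts(DOCUMENTS))` ranks "inhaler" first because it shares two documents with "asthma"; such suggestions could then be offered to the user to refine a query.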
    Virtual Oregon: seamless access to distributed environmental information BIBAFull-Text 387
      Dylan Keon; Cherri Pancake; Dawn Wright
    Virtual Oregon is a new data coordination center established at Oregon State University (OSU) in order to: (1) archive environmental and other place-based data on Oregon and associated areas; (2) make those data accessible to a broad spectrum of agencies and individuals via innovative web interfaces; (3) identify key data sets that are not yet available and encourage their collection and dissemination; and (4) facilitate development of statewide standards for archiving, documenting, and disseminating data. Rather than co-locating researchers and data in a physical center, Virtual Oregon employs a distributed architecture that occupies multiple locations while users are presented with the illusion of a single, centralized facility. This approach was selected not just to maximize the impact on campus students, faculty, and staff but also to service broader interactions with extension agents and other members of OSU's statewide community.
       Virtual Oregon builds on regional GIS centers and databanks in a wide range of disciplines, providing decades of research data on topics as varied as climate, biodiversity, land ownership, water quality, wildfire, and agricultural production. Our proximity to agencies such as the Oregon Climate Service, Oregon Natural Heritage Program, Oregon Flora Project, OSU Herbarium, EPA, and Forest Service adds breadth to data type and availability. Designed as a distributed architecture, Virtual Oregon has four nodes, each of which serves as a center and clearinghouse for distinct types of information and services:


    A DL server with OAI capabilities: LOVE BIBAFull-Text 388
      Su-Shing Chen; Chee-Yoong Choo
    We describe integrating OAI (Open Archives Initiative) and DL (Digital Library) capabilities in the integrated prototype DL server: LOVE (Learning Object Virtual Exchange).
    Evaluating a digital video library web interface BIBFull-Text 389
      Michael G. Christel; Pedro Cubilo; Junius Gunaratne; William Jerome; Eun-Ju O; Sohini Solanki
    Puget sound's MARS (media asset retrieval systems) digital library BIBAFull-Text 390
      Efthimis N. Efthimiadis; Jens-Erik Mai
    The Corporation for Public Broadcasting (CPB) and public broadcasters consider Media Asset Management (MAM) of critical importance since without a concerted and cooperative plan to manage their vast library of content, broadcasters are unable to reach their potential for service in the digital age.
       The concerns surrounding Media Asset Management, which serves as the digital library for broadcasters, are myriad, both human and technical. Media Asset Management is the framework upon which many of the largest technology projects will be built, including the future interconnection system between and among CPB member stations. It is CPB's hope that its licensees and their partners in university, museum, and library communities will work together to contribute to Media Asset Management solutions.
       This poster will present some of the complex issues around Media Asset Management and possible solutions to the problems, as well as show the breadth of research projects in the area. The issues include metadata, indexing, controlled vocabularies, storage and access methods, rights management, technological infrastructure requirements, and interoperability.
       To highlight these issues we will use as an example the Media Asset Retrieval System (MARS) project. The goal of this project is to create a model for representing, organizing, storing, and facilitating access to public audio (radio) and audiovisual (television) broadcast material via the Internet. The immediate goal for MARS is to produce a digital online resource that will provide access to material produced by public broadcasters in the Puget Sound Region (KUOW and KCTS). The material will be made available to students, teachers, media, and the general public through the King County Library System and Seattle Public Library System. The mission of the Convergence Consortium guides the MARS project. The Convergence Consortium is a working collaborative of public broadcasters, public libraries, K-12 schools, and the Information School of the University of Washington that meets to assess the needs of its constituents and propose solutions to meet those needs.
       The MARS project is funded by a grant from the Corporation for Public Broadcasting Television Future Fund and it falls within the context of major decisions about Media Asset Management in public broadcasting. It is the intent that the MARS project will produce a reference document that will help public broadcast stations make decisions about media asset management as it relates to audio and audiovisual access as a community resource.
       The MARS team will analyze the current systems and their contexts and design a digital library system for the KUOW and KCTS broadcasters that will facilitate access to the broadcast material. The digital library will support advanced knowledge organization techniques and search algorithms to facilitate retrieval of the broadcast material. The MARS project ties together some of the critical issues in Digital Libraries and approaches these problems from a user-centered perspective.
    Enki: open infrastructure for adaptive digital libraries BIBFull-Text 391
      James Farrell; Hilary J. Holz
    Argentinean historical heritage project BIBAFull-Text 392
      Maria Feldgen; Osvaldo Clua; Fernando Boro; Juan Jose Santos
    The digitization effort of the Argentinean Heritage Project is described from its beginning up to its present-day form as a framework of automated, Web-operated [1], platform-independent tools that help historians build and maintain digital libraries suited to their research needs. We show how low-cost, labor-intensive digital library building is possible using standard formats and tools.
    Content-based filtering and personalization using structured metadata BIBAFull-Text 393
      A. Mufit Ferman; James H. Errico; Peter van Beek; M. Ibrahim Sezan
    Structured descriptions of multimedia content and automatically generated user profiles are used to filter content.
    Developing a digital library of reusable historical artifacts BIBAFull-Text 394
      Dion Hoe-Lian Goh; Schubert Shou-Boon Foo
    This paper discusses the design and implementation of a digital library of historical artifacts. A major goal of the project is to create an architecture in which artifacts are reusable across various digital library applications. Two such applications have been developed and are described: a virtual exhibition system and a reference helpdesk.
    Search facilities for internet relay chat BIBAFull-Text 395
      Taher H. Haveliwala
    The Internet encompasses a diverse array of information sources that have been indexed for efficient search, including the Web, Usenet, and email (both personal mail and specialized mailing lists). One information source, publicly accessible over the Internet, yet unarchived and unindexed, is the Internet Relay Chat (IRC) system. We are archiving several of the more useful technical-support-oriented IRC channels, with the goal of extracting, archiving, and indexing information that would help satisfy users' information needs.
    User uncertainties with tabular statistical data: identification and resolution BIBAFull-Text 396
      Naybell Hernandez; Carol A. Hert; Kristen Armstrong
    United States government services are becoming increasingly Web-based, creating opportunities to make useful, even vital, information and services more accessible to citizens than in the past. This opportunity has challenged Federal agencies as they work to provide information and services that are easy to use and understandable to an extremely diverse constituency. This paper reports the findings of a study examining the questions and uncertainties of users during investigation of statistical tables. The questions and uncertainties are categorized and mapped to an XML DTD for use in a table-browsing system. Implications of the approach and results are discussed.
    Using the internet to communicate with immigrant/refugee communities about health BIBAFull-Text 397
      Ellen Howard; Christine Wilson Owens
    Our poster describes the use of the Web for communication between ethnic communities and their care providers.
    Unicode for multilingual representation in digital libraries from the indian perspective BIBAFull-Text 398
      Devika P. Madalli
    One of the main issues in digital data sharing is tackling multilingual resources. This paper presents the problems of representing Indian languages on the Internet. It covers the efforts undertaken so far for multilingual data representation in India. It examines the applicability and advantages of adopting Unicode.
    Marine realms information bank: a distributed geolibrary for the ocean BIBAFull-Text 399
      Fausto Marincioni; Frances Lightsom
    The Marine Realms Information Bank (MRIB) is a prototype web-based distributed geolibrary that organizes, indexes, and delivers online information about the oceanic and coastal environments. The improvement in computer power and connectivity during the 1990s, which enabled very fast exchange of data online, showed that effective information management does not automatically result from faster connections or greater bandwidth. Millions of web sites have been set up to provide information on every subject, and various information-gathering systems have been developed to locate information online. Unfortunately, these search engines often produce exhaustive bibliographic lists that mix first-quality scientific knowledge with irrelevant materials. To be really useful, information banks require not only quality control but also classification systems that integrate and organize the information. In 1999 the National Research Council proposed the concept of distributed geolibraries, which are online digital libraries able to provide a simple mechanism for searching and retrieving information in response to topical and geographically defined needs. Distributed geolibraries are beneficial for various reasons, the most important of which is the authoritative role they would come to assume as subject gateways. To be referenced through a scientific geolibrary, information sources must meet quality standards set by the library gatekeeper. Another important benefit of a distributed geolibrary comes from its "distributed" attribute. Without the need to collect information in one physical location, local curators can serve and update online information without the requirement of maintaining consistency among multiple copies. The MRIB prototype implements the distributed geolibrary concept to organize, index, and deliver online information about the oceanic and coastal environments. MRIB provides access to information, but it is not an information repository.
       It incorporates information that exists in remote sources, without modifying formats or content. This system succeeds by building a central index that consists of Electronic Index Cards containing metadata about the information sources, their geographical areas, and their network locations. The ontology of MRIB is expressed in the classification system through which users can explore the available information. MRIB currently classifies information with 13 types of categories (facets): Location, Geologic Time, Features, Biota, Discipline, Scientific Method, Hot Topics, Project Name, Agency Name, Author, Class, Format, and Audience. Classifying information is not automatic but is performed by a librarian, which is both the major benefit and the major operating cost of MRIB. The significance of MRIB lies both in the utility of the information bank and in the implementation of the distributed geolibraries concept. Distributed information banks, such as MRIB, can be applied widely as unifying portals for extensive or rapidly developing information bases, for which a centralized repository would be impractical. In addition, MRIB has a modular structure that allows a classification system to be easily modified, to expedite the development and testing of suitable classification systems for existing information bases.
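       The idea of a central index of cards that describe remote sources by facet can be sketched as follows. This is only an illustration under stated assumptions: the `IndexCard` fields, the `search` function, the sample cards, and the example.org URLs are hypothetical and do not reflect the actual MRIB data model. Only the facet names (Location, Discipline, Biota, Audience) are taken from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class IndexCard:
    """Metadata about one remote source; the source itself stays remote."""
    title: str
    url: str  # network location of the information source
    facets: dict = field(default_factory=dict)  # facet name -> set of values


def search(cards, criteria):
    """Return cards whose facets contain every requested (facet, value) pair."""
    return [
        card for card in cards
        if all(value in card.facets.get(facet, set())
               for facet, value in criteria.items())
    ]


# Hypothetical cards, classified by hand as a librarian would.
CARDS = [
    IndexCard("Coastal erosion survey", "http://example.org/erosion",
              {"Location": {"Cape Cod"}, "Discipline": {"Geology"},
               "Audience": {"Researchers"}}),
    IndexCard("Tide pool field guide", "http://example.org/tidepools",
              {"Location": {"Cape Cod"}, "Biota": {"Invertebrates"},
               "Audience": {"Students"}}),
]
```

       A call such as `search(CARDS, {"Location": "Cape Cod", "Audience": "Students"})` returns only the cards that satisfy both facets, and each result carries the URL of the remote source rather than a local copy of it.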
    Democratic access to information in a rapidly changing society: the case of Brazil BIBAFull-Text 400
      Cavan McCarthy; Murilo Bastos da Cunha
    Identifies and characterizes the principal Brazilian digital library initiatives, which make available materials in two areas: Science and Research, and Literature and the Humanities.
    Active netlib: an active mathematical software collection for inquiry-based computational science & engineering education BIBAFull-Text 401
      Shirley Moore; A. J. Baker; Jack Dongarra
    A core subject in the undergraduate education of application scientists and engineers is the use of mathematical software to solve computational problems. To make effective use of mathematical software, application developers need a basic understanding of the underlying numerical methods and enough knowledge to be able to choose an appropriate solver, parameterize it correctly, and validate the computed results. Correct results are of course required, but good computational performance is desired as well.
    Representing pulaar digitally BIBAKFull-Text 402
      Bartek Plichta; David Robinson
    This paper outlines a methodology for digital representation and preservation of Pulaar language data.
    Keywords: West African online digital library, language digitization, linguistics, long-term language preservation, mark-up, pulaar
    Components for constructing open archives BIBAFull-Text 403
      Joel Plutchak; Joe Futrelle; Jeff Gaynor
    In this poster, we describe how components that implement emerging standards have been used to produce custom solutions to metadata archive problems.
    Question types in digital reference: an evaluation of question taxonomies BIBAFull-Text 404
      Jeffrey Pomerantz
    This study evaluates four taxonomies of question types to determine the expressiveness of each for questions received by digital reference services. The result is a faceted classification scheme that can be used as a basis for automating parts of the reference question answering process.
    Integrating expertise into the NSDL: putting a human face on the digital library BIBAFull-Text 405
      Jeffrey Pomerantz; R. David Lankes
    This paper describes work currently underway at the Information Institute of Syracuse to build an operational digital reference system to support the National Science, Mathematics, Engineering, and Technology Education Digital Library (NSDL).
    Automatic removal of advertising from web-page display BIBAFull-Text 406
      Neil C. Rowe; Jim Coffman; Yilmaz Degirmenci; Scott Hall; Shong Lee; Clifton Williams
    The usefulness of the World Wide Web as a digital library of precise and reliable information is reduced by the increasing presence of advertising on Web pages. But no one is required to read or see advertising, and this cognitive censorship can be automated by software. Such filters can be useful to the U.S. government which must permit its employees to use the Web but which is prohibited by law from endorsing commercial products. While the task would seem at first simpler than filtering of pornography or general firewalls, subtleties in recognizing advertising make full success daunting.
    Structured models of scientific concepts for organizing, accessing, and using learning materials BIBAFull-Text 407
      Terence R. Smith; Marcia L. Zeng; Olga Agapova; Olha Buchel; Michael Freeston; Jim Frew; Linda Hill; Laura Smart; Tim Tierney; Alex Ushakov
    Concepts and their interrelationships are the fundamental building blocks for representing the phenomena investigated in mathematics, science, and engineering (MSE). The knowledge represented in learning materials for the sciences is typically organized around term-based or "weakly-structured" models of concepts and their interrelationships. We introduce a "strongly-structured" model of scientific concepts that provides the foundation for a knowledge base (KB) of concept representations. It focuses on such attributes as the objective representations, operational semantics, use, and interrelationships of concepts, all of which play important roles in constructing representations of phenomena that further understanding of MSE domains of knowledge.
       We have developed a strongly-structured model of concepts for MSE domains in terms of a frame-based KRS with slots and attribute-value fillers. The model, whose framework is shown in Figure 1, is implemented as an XML schema. This schema is used as the basis for creating domain-specific KBs containing XML records of concepts.
       The Alexandria Digital Library (ADL) Digital Earth Testbed system (http://www.alexandria.ucsb.edu) has been extended with: (1) a KB of scientific concepts, from the domain of physical geography, that are represented in terms of our XML schema for concept representation; (2) a collection of heterogeneous learning materials exemplifying the concepts and their properties in various contexts; and (3) services that provide a variety of views of the content of the KB and associated collection. (Please refer to the JCDL paper "The ADEPT Digital Library Architecture" by Janee and Frew.) This extension to ADL is being deployed in teaching an introductory course in physical geography in Fall, 2002.
    StandardConnection: correlating educational resources in digital libraries to content standards BIBAFull-Text 408
      Stuart A. Sutton; Elizabeth D. Liddy; John Kendall
    The goal of our two-year NSF National Science Digital Library-funded project is to develop Natural Language Processing technology that will automatically produce metadata values that correlate individual educational resources in digital libraries to content standards. The goal is to assign this metadata to the descriptive metadata records for resources in support of standards-based discovery and retrieval. The project will utilize the Achieve/McREL Compendix, a comprehensive knowledgebase of K-12 content standards derived from over 137 state, national and international content standards documents. The test collection of educational resources being analyzed is drawn from the more than 400 Web-based collections represented in the Gateway to Educational Materials catalog.
       The significance of this project in terms of the Digital Library movement is that high-quality automatic correlation of educational resources to content standards is essential to meet the demands for searching and retrieving such resources based on those correlations. This demand will increase as the national focus on greater accountability in our K-12 institutions increases. While human correlations of resources to content standards characterize current practice, it is clear that the scale of the need for such correlations calls for sophisticated means for automatic mapping. This project is intended to provide an NLP-based solution to the problem.
       Briefly, our NLP approach in this project is to analyze language utilizing all the levels through which humans extract meaning: morphological, lexical, syntactic, semantic, discourse, and pragmatic. The extent to which an individual technology includes these levels, particularly the higher-level ones, determines the capability and sophistication of the resultant application. Having incorporated each of these levels into our baseline NLP document-processing module, we are extending the system's capabilities in this project to the task of learning the linguistic features that can be relied on to indicate what content standard an educational resource supports.
       We are applying a sublanguage analysis framework to automatically identify clues that can be recognized in the mathematics and science educational materials to indicate to which standards the resources apply. Based on the discourse model, the system learns from recognizing these linguistic clues in the training set. The system will then be able to process new resources as they are added to the digital library and appropriately assign to the metadata for those resources the learning standards to which they are applicable.
       This work is a continuation of our NSF NSDL project "Breaking the Metadata Generation Bottleneck" where we were successful in processing text to automatically assign metadata tags for the descriptive and subject aspects of educational resources.
    Quantitatively evaluating the influence of online social interactions in the community-assisted digital library BIBAFull-Text 409
      YongHong Tian; TieJun Huang; Wen Gao
    Online social interactions are useful in information seeking from digital libraries, but how to measure their influence on the user's information access actions has not yet been established. Studies on this problem give us interesting insights into the workings of human dynamics in the context of information access from digital libraries. On this basis, we wish to improve the technological support for more intelligent services in the ongoing China-America Million Books Digital Library so that it can reach its potential in serving human needs.
    Shuhai Wenyuan interactive internet worktable: studying ancient chinese philosophy online BIBAFull-Text 410
      Mary Tiles; Brian Bruya
    There are four major digital library projects in East Asia that publish digital versions of parts of the vast pre-modern Chinese corpus on the World Wide Web. All of these are targeted at professional sinologists, with no accommodation for the user who is not expertly proficient in Chinese. As a result, anyone interested in seriously engaging Chinese thought must either set aside a few years to learn Classical Chinese or remain beholden to the sinologist for both information and interpretation. At Shuhai Wenyuan, a project funded by the National Science Foundation's Digital Libraries Initiative (Phase II), we strive to capitalize on the advantages of digitization to allow the non-sinologist entry into the conceptual world of ancient Chinese thought.
    Breathing life into digital archives: use of natural language processing to revitalize the grey literature of public health BIBAFull-Text 411
      Anne M. Turner; Elizabeth D. Liddy; Jana Bradley
    The goal of our 2-year Robert Wood Johnson-funded project is to apply Natural Language Processing (NLP) technology to improve access to and use of the digitized public health "grey" literature. Much public health information, such as meeting notes, think-tank reports, policy statements, and data sets, is not available through traditional commercial pathways and is considered grey or fugitive literature. Although grey literature documents are increasingly posted in digital archives on the Web, the unstructured and varied nature of grey literature makes accessing useful content difficult at best.
       In an effort to help make the content of public health digital collections more accessible to public health providers, we will use proven NLP techniques to identify and extract key elements of digital documents. NLP techniques can be used to identify and tag key elements from full-text documents. Once tagged, the content of various documents can be extracted and summarized in tables and charts for comparison and review. The ability of NLP to recognize and represent both the explicit and implicit content of full text documents makes it a powerful technique for interpreting the language of text documents. Our NLP information access system has been used in other domains to extract individual entities and events, as well as draw relationships between entities and events to build a content representation.
       The goal is to develop a model of public health interventions and identify key entities and events from these digital archives. Key elements may include type of study and population demographics as well as more traditional bibliographic elements such as author, title and publication date.
       NLP technology will be used to search, identify and extract key elements based on the user's request. Key elements can be extracted across multiple documents for summary and comparison. For example, the user can extract key elements from annual reports about "teenage smoking cessation programs" to compare method of intervention, demographic population, and outcome. Such comparisons will help public health professionals to determine how a particular intervention fits with similar interventions reported in the grey literature. This system holds great promise for improving access to public health information through digital archives.
    Building a digital collection of manuscripts from the library of the royal palace of Spain BIBAFull-Text 412
      Soledad Velez; Manuel Sanchez-Quero; Juan Carlos Garcia; Alejandro Bia
    With the aim of bringing cultural content to cyberspace and spreading some unknown aspects of the history of the Americas, the Miguel de Cervantes Digital Library has embarked on a joint effort with the Library of the Royal Palace of Spain to develop the digital web publication of the Manuscripts of the Americas in the Royal Collections. In this joint venture, the Library of the Royal Palace supplied its invaluable contents for digitization, and the Miguel de Cervantes DL its technology and experience as a digital publisher. The goal was to join the ancient and the new, the most precious and carefully preserved documents with the new electronic publishing technologies. The result was to make freely available to a worldwide public those otherwise unreachable treasures of the Royal Collections.
    By the people now, for the people later: using transitory metadata to anchor a digital archive BIBAFull-Text 413
      Anne Washington
    The Congressional Research Service (CRS) serves Congress by providing timely, objective and non-partisan research, analysis and information services. The Legislative Information Office within CRS fulfills that mandate by maintaining a digital library of legislative documents known as the Legislative Information System.
       An ongoing challenge is designing these full text and structured databases for both promptness and permanence. This is accomplished by metadata and interface design. This foundation prepares for the impending incorporation of more complex born-digital formats such as XML, audio and video.
       Legislative Information System (LIS) clients are those who have access to Capitol Hill intranet systems. (The public version is the THOMAS site http://thomas.loc.gov.) LIS adds value to public domain legislative data with advanced text retrieval tools and the benefit of a portal site.
       The content of LIS is composed of several collections of documents. The House of Representatives and the Senate create and update legislative information, which is the bulk of LIS. The Government Printing Office publishes text and PDF formats of congressional documents, building up LIS's catalog of full-text documents. The CRS Bill Digest section, established by law in 1935 to index and summarize bills, adds additional metadata. In partnership with the user community, a high degree of data quality control is actively maintained.
       The primary entry point is a database called Bill Summary and Status. From there, links are available into the full text collections: text of bills, committee reports and the Congressional Record. These are large-scale collections. LIS maintains every published bill version since 1989. This session, Congress already has introduced more than 6000 bills and important bills often have up to five versions. The search pages have been carefully crafted to accommodate both novice and advanced users. A number of prepared searches are built underneath the search page interface. These gather bills on nebulous topics such as "national security."
       LIS's primary role is as a prompt deliverer of legislative events and documents. For instance, a senator needs a current list of all co-sponsors on her bill. A staffer needs to read yesterday's floor debate. Legislative events in LIS are updated regularly to accommodate this fast-paced need for data. Searchers are able to specify narrowly defined legislative status steps in order to track current legislation. The LIS alert service sends out email notification when selected legislation is updated.
       LIS data is also used retrospectively to piece together a legislative history. A legislative history attempts to establish the intent of a current law by compiling documents created during the legislative process. A legislative history could include conference reports, congressional hearings, debates and early drafts of bills. All of these documents are available in electronic format in LIS. Each legislative step has embedded links to LIS full-text documents, creating an easy-to-maintain web of legislative history. In addition, a text analysis tool has been developed which allows paragraph level comparison of legislation. This tool allows us to closely track the evolution of legislative language.
       Future developments of LIS involve displaying full-text documents in XML, linking to video of congressional proceedings, and completing a retrospective conversion of past congressional content.
       The prompt recording of current legislative events provides the key to long-term access into our growing digital archive of legislative documents. Our primary designated community is the congressional staff, but the results benefit individual citizens, the courts and businesses who are trying to interpret the rules that govern how we live.


    Introduction to the open archives initiative protocol for metadata harvesting BIBAFull-Text 414
      Hussein Suleman
    The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [1] is a relatively new interoperability standard that is gaining much attention from existing and new digital libraries. It is currently advocated by many communities (including NDLTD, NCSTRL, and NSDL) to fulfill their metadata interoperability requirements.
       This tutorial is aimed at introducing individuals to the concepts underlying the OAI and the harvesting protocol, as well as providing sufficient information to allow attendees to almost immediately implement the current standard (OAI-PMH v2.0) on their own archives or in their own communities. Attendees will be introduced to both organizational and technical issues that need to be addressed when building new systems or extending existing systems, either in the capacity of being providers of data, users of data, or both. Wherever appropriate, references will be made to best practices that have emerged in the community of OAI implementers since the initial release of the protocol. Attendees will also be familiarized with tools developed within the OAI community to support the implementation of the protocol.
    Thesauri and ontologies in digital libraries: 1. structure and use in knowledge-based assistance to users BIBAFull-Text 415
      Dagobert Soergel
    This introductory tutorial is intended for anyone concerned with subject access to digital libraries. It provides a bridge by presenting methods of subject access as treated in an information studies program for those coming to digital libraries from other fields. It will elucidate through examples the conceptual and vocabulary problems users face when searching digital libraries. It will then show how a well-structured thesaurus / ontology can be used as the knowledge base for an interface that can assist users with search topic clarification (for example through browsing well-structured hierarchies and guided facet analysis) and with finding good search terms (through query term mapping and query term expansion -- synonyms and hierarchic inclusion). It will touch on cross-database and cross-language searching as natural extensions of these functions. The workshop will cover the thesaurus structure needed to support these functions: concept-term relationships for vocabulary control and synonym expansion, and conceptual structure (semantic analysis, facets, and hierarchy) for topic clarification and hierarchic query term expansion. It will introduce a few sample thesauri and some thesaurus-supported digital libraries and Web sites to illustrate these principles.
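       The query term expansion mentioned above (synonyms plus hierarchic inclusion) can be illustrated with a minimal sketch. This is not taken from the tutorial: the toy thesaurus, the dictionary names SYNONYMS and NARROWER, and the function expand_query_term are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical toy thesaurus; every term and relationship is illustrative only.
# Entry (non-preferred) term -> preferred term, for vocabulary control.
SYNONYMS = {
    "automobile": "car",
    "motorcar": "car",
}
# Preferred term -> narrower preferred terms, for hierarchic inclusion.
NARROWER = {
    "vehicle": ["car", "truck"],
    "car": ["sedan", "hatchback"],
}


def preferred(term):
    """Map a user's term to its preferred term (query term mapping)."""
    term = term.lower()
    return SYNONYMS.get(term, term)


def expand_query_term(term):
    """Return the term's concept, all narrower concepts, and their synonyms."""
    root = preferred(term)
    concepts = {root}
    queue = deque([root])
    # Breadth-first walk down the hierarchy (hierarchic inclusion).
    while queue:
        concept = queue.popleft()
        for narrower in NARROWER.get(concept, []):
            if narrower not in concepts:
                concepts.add(narrower)
                queue.append(narrower)
    # Add every entry term that points at one of the collected concepts.
    entry_terms = {entry for entry, pref in SYNONYMS.items() if pref in concepts}
    return sorted(concepts | entry_terms)


if __name__ == "__main__":
    print(expand_query_term("Automobile"))
```

       In this sketch a search for "Automobile" is first mapped to the preferred term "car" and then widened to its narrower terms and synonyms; a real thesaurus would carry many more relationship types than the two shown here.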
    How to build a digital library using open-source software BIBKFull-Text 416
      Ian H. Witten
    Keywords: Greenstone software, digital library, open source software
    Overview of digital libraries BIBAFull-Text 417
      Edward A. Fox
    This tutorial will start with an overview of definitions, foundations, scenarios, and perspectives. It will cover a variety of issues, including search, retrieval, and resource discovery; multimedia/hypermedia; metadata (e.g., Dublin Core); electronic publishing; document models and representations; SGML and XML; database approaches; agents and distributed processing; 2D and 3D interfaces and visualizations; metrics; architectures and interoperability; educational and social concerns; commerce and intellectual property rights, among others.
    Advanced overview of version 2.0 of the open archives initiative protocol for metadata harvesting BIBAFull-Text 418
      Michael L. Nelson; Herbert Van de Sompel; Simeon Warner
    This tutorial is a follow-on to "Introduction to the Open Archives Initiative Protocol for Metadata Harvesting" (OAI-PMH), given earlier the same day. It is appropriate for those who have completed the earlier tutorial or are already familiar with OAI-PMH. The tutorial will begin by highlighting the differences between versions 1.1 and 2.0 of the OAI-PMH, and then discuss possible migration strategies for 1.1 harvesters and repositories. Advanced topics and deployment scenarios will also be discussed, including: flow control, load balancing, error recovery, hierarchical harvesting, sets and alternate metadata formats.
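       As a rough illustration of the flow control, sets, and metadata formats listed above, the sketch below builds OAI-PMH 2.0 ListRecords request URLs. It is not material from the tutorial: the base URL http://example.org/oai, the set name, and the function name list_records_url are hypothetical. It assumes the protocol's rule that a resumptionToken is an exclusive argument, sent with the verb alone.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     from_date=None, until_date=None, resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH 2.0 repository."""
    params = {"verb": "ListRecords"}
    if resumption_token is not None:
        # Flow control: a follow-up request carries only the token
        # returned with the previous incomplete list.
        params["resumptionToken"] = resumption_token
    else:
        # First request: metadata format plus optional selective harvesting.
        params["metadataPrefix"] = metadata_prefix
        if set_spec:
            params["set"] = set_spec
        if from_date:
            params["from"] = from_date
        if until_date:
            params["until"] = until_date
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    base = "http://example.org/oai"  # hypothetical repository address
    print(list_records_url(base, set_spec="physics", from_date="2002-07-14"))
    print(list_records_url(base, resumption_token="abc123"))
```

       A harvester would issue the first URL, read any resumptionToken from the response, and repeat with the second form until the repository returns no further token; fetching and parsing the XML responses is left out of this sketch.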
    Thesauri and ontologies in digital libraries: 2. design, evaluation, and development BIBAFull-Text 419
      Dagobert Soergel
    This tutorial is intended for people who have a basic familiarity with the function and structure of thesauri and ontologies. It will introduce criteria for the design and evaluation of thesauri and ontologies and then deal with methods and tools for their development: locating sources; collecting concepts, terms, and relationships to reuse existing knowledge; developing and refining thesaurus/ontology structure; software and database structure for the development and maintenance of thesauri and ontologies; collaborative development of thesauri and ontologies; developing crosswalks / mappings between thesauri/ontologies. In summing up, the tutorial will address the question of the amount of resources needed to develop and maintain a thesaurus or ontology.
    Hands-on workshop: build your own digital library collections BIBFull-Text 420
      Ian H. Witten; David Bainbridge
    Bioinformatics and digital libraries BIBAFull-Text 421
      William Hersh; Christopher Dubay
    The goal of this tutorial is to provide a basic introduction to bioinformatics and electronic biological data resources for the digital library community.


    Document search interface design for large-scale collections and intelligent access BIBAKFull-Text 422
      Javed Mostafa
    As the universe of documents has enlarged from those available via the online catalog to a larger cluster of databases and web-accessible resources, interfaces are being created that can search multiple document collections simultaneously. Also, searching for document surrogates is losing favor as more documents are digitized and distributed in full-text form. Availability of full-text makes it possible for document components such as tables, illustrations, citations, and references -- components that traditionally remained outside the scope of document searching -- to be considered and exploited by search interface designers. Additionally, due to the popularity of web hyperlinking, people are beginning to expect linking of documents across different collections based on common semantic or non-semantic attributes. Increased research activity on artificial intelligence techniques for document access is leading to more fundamental changes in document searching. It is now possible to delegate 100% of the search effort to online search agents. Agents have also been created for performing tasks such as selecting appropriate collections, refining queries, and sorting results to assist with searches conducted in distributed environments. The broad scope of the workshop is the impact of the above technological changes on search interface design.
    Keywords: agents, distributed collections, intelligent access, interface, retrieval, search
    Developing digital libraries education and training programs BIBAFull-Text 423
      Javed Mostafa; Kris Brancolini
    Gaining education and training in the field of Digital Libraries is a difficult prospect. Relevant courses and experiences are usually scattered among different programs and institutions. Often, course content does not include the necessary mix of theoretical and practical treatment. The workshop is aimed at developers, researchers, educators, and administrators interested in educational programs for training the next generation of digital library professionals -- both information technologists and librarians.
    Usability for digital libraries BIBAFull-Text 424
      Ann Blandford; George Buchanan
    As digital libraries are becoming increasingly available to, and used by, diverse user communities who do not have background or training in information sciences, the need to ensure that such libraries are usable and useful is becoming increasingly urgent. Usability issues can be tackled from various directions -- technical, cognitive, social, design-oriented -- and it is important to bring these different perspectives together, to share views, experiences and insights.
    Visual interfaces to digital libraries BIBAFull-Text 425
      Katy Borner; Chaomei Chen
    Today's digital libraries (DLs) are content rich, multimedia, multilingual collections that are distributed and accessed worldwide. Designing useful interfaces to access, understand, and manage this knowledge has become an active and challenging field of study. Visual interfaces to DLs aim to shift the user's mental load from slow reading to faster perceptual processes such as visual pattern recognition. They draw on progress in the new field of Information Visualization.
       The workshop in 2002 continues the theme started at JCDL 2001. In addition, the growth of the field warrants new perspectives on some of the issues we addressed last year.
    Workshop on the creation of standardized test collections, tasks and metrics for music information retrieval (MIR) and music digital library (MDL) evaluation BIBAFull-Text 426
      J. Stephen Downie
    This workshop is designed to engage the participation of all those interested in MIR and MDL research and evaluation. Interested parties have been encouraged to submit formal "White Papers" outlining their individual perspectives on what needs to be done to create meaningful MIR and MDL test collections, retrieval tasks and evaluation metrics. Interested parties include musicologists, music theorists, audio-retrieval experts, symbolic-retrieval experts, librarians, lawyers, and business representatives. The compilation of these perspectives and the discussion that follows at the workshop are intended to form the bedrock upon which a solid foundation of future research can be built.
    Digital gazetteers: integration into distributed digital library services BIBAFull-Text 427
      Linda L. Hill; Gail Hodge; David Smith
    This NKOS workshop for JCDL (the 5th in the series: http://nkos.slis.kent.edu/) will focus on work-in-progress on gazetteer services and gazetteer-related projects in connection with distributed digital library services. It builds on the Digital Gazetteer Information Exchange (DGIE) workshop funded by the NSF in October 1999. Digital gazetteers are specialized KOS that map placenames and types of places to map-based locations and thus integrate word-based georeferencing with map-based georeferencing. The format consists of invited and selected presentations and discussion sessions, with the goal of developing collaborations for future research and development.
       Participants may provide handouts describing their own gazetteer and other NKOS related projects and will be given the opportunity to introduce their work and their interests briefly.
    Text retrieval conference (TREC) genomics pre-track workshop BIBAFull-Text 428
      William Hersh
    The goal of this workshop is to allow individuals interested in the Text Retrieval Conference (TREC, trec.nist.gov) Genomics Pre-Track to come together to discuss common goals and interests for the pre-track. The workshop will be designed to generate a plan for developing a common set of tasks, databases, and evaluation measures for the pre-track. The morning will be devoted to presentations by attendees, with the topics to be covered determined by selection by the program committee. The afternoon will be geared towards developing a plan for the pre-track, with the structure based on the number of attendees (i.e., if attendance is large, we will break into small groups).