JCDL'05: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries

Fullname: ACM/IEEE Joint Conference on Digital Libraries
Note: Cyberinfrastructure for Research and Education
Editors: Tamara Sumner; Frank Shipman
Location: Denver, CO, USA
Dates: 2005-Jun-07 to 2005-Jun-11
Standard No: ISBN 1-58113-876-8; ACM Order Number 606052; hcibib: DL05
  1. Digital libraries and cyberinfrastructure: use of digital libraries in education
  2. Tools & techniques: frameworks for building libraries
  3. Digital libraries and cyberinfrastructure: use of digital libraries in education
  4. Users and interaction: interacting in media
  5. Tools & techniques: searching and IR
  6. Digital libraries and cyberinfrastructure: use of digital libraries in the humanities
  7. Tools & techniques: recommending and alerting
  8. Tools & techniques: supporting classification
  9. Digital libraries and cyberinfrastructure: cyberinfrastructure panel
  10. Users and interaction: understanding user needs and perceptions
  11. Tools & techniques: automatically managing media
  12. Digital libraries and cyberinfrastructure: creating information representations for education
  13. Users and interaction: understanding user needs and perceptions
  14. Tools & techniques: browsing and visualizing collections
  15. Digital libraries and cyberinfrastructure: creating information representations for the humanities (part 1)
  16. Users and interaction: memex and hypertext
  17. Tools & techniques: applying machine learning to collection development
  18. Digital libraries and cyberinfrastructure: creating information representations for the humanities (part 2)
  19. Tools & techniques: identifying names of people and places
  20. Posters
  21. Demonstrations
  22. Tutorials
  23. Workshops

Digital libraries and cyberinfrastructure: use of digital libraries in education

You can lead a horse to water: teacher development and use of digital library resources, pp. 1-8
  Mimi Recker; Jim Dorward; Deonne Dawson; Sam Halioris; Ye Liu; Xin Mao; Bart Palmer; Jaeyang Park
This article presents findings from approximately 150 users who created instructional projects using educational digital library resources. One hundred of these users were teachers participating in professional development workshops on the topic of digital libraries. Our iterative approach to tool and workshop development and implementation was based on a framework that characterizes several input, output, and process variables affecting dissemination of such technologies in educational contexts. Data sources involved a mix of qualitative and quantitative methods, including electronic surveys, interviews, participant observations, and server log file and artifact analyses. These multiple and complementary levels of analyses reveal that despite teachers reporting great value in learning resources and educational digital libraries, significant and lasting impact on teaching practice remains difficult to obtain.
Comprehensive personalized information access in an educational digital library, pp. 9-18
  Peter Brusilovsky; Rosta Farzan; Jae-wook Ahn
This paper explores two ways to help students locate the most relevant resources in educational digital libraries. One method gives more comprehensive access to educational resources through multiple pathways of information access, including browsing and information visualization. The second method is to access personalized information through social navigation support. This paper presents the details of the Knowledge Sea II system for comprehensive personalized access to educational resources and also presents the results of a classroom study. The study delivered a convincing argument for the importance of providing multiple information presentation modes, showing that only about 10% of all resource accesses were made through the traditional search interface. We have also collected some solid evidence in favor of the social navigation support.
Facilitating middle school students' sense making process in digital libraries, pp. 19-20
  Meilan Zhang; Chris Quintana
Previous research on using digital libraries in science classrooms indicated that middle school students tend to passively find answers rather than actively make sense of information they find in digital libraries. In response to this challenge, we designed a scaffolded software tool, the Digital IdeaKeeper, to support middle school students in making sense of digital library resources during online inquiry. This paper describes preliminary results from a study of how middle school students use different IdeaKeeper features. Initial data analysis indicates that IdeaKeeper can help online learners engage in the sense-making process during online inquiry.
Evaluating G-portal for geography learning and teaching, pp. 21-22
  Chew-Hung Chang; John G. Hedberg; Yin-Leng Theng; Ee-Peng Lim; Tiong-Sa Teh; Dion Hoe-Lian Goh
This paper describes G-Portal, a geospatial digital library of geographical assets, providing an interactive platform to engage students in active manipulation and analysis of information resources and collaborative learning activities. Using a G-Portal application in which students conducted a field study of an environmental problem of beach erosion and sea level rise, we describe a pilot study to evaluate usefulness and usability issues to support the learning of geographical concepts, and in turn teaching.

Tools & techniques: frameworks for building libraries

A new framework for building digital library collections, pp. 23-31
  George Buchanan; David Bainbridge; Katherine J. Don; Ian H. Witten
This paper introduces a new framework for building digital library collections and contrasts it with existing systems. It describes a significant new step in the development of a widely-used open-source digital library system, Greenstone, which has evolved over many years. It is supported by a fresh implementation, which forced us to rethink the entire design rather than making incremental improvements. The redesign capitalizes on the best ideas from the existing system, which have been refined and developed to open new avenues through which digital librarians can tailor their collections. We demonstrate its flexibility by showing how digital library collections can be extended and altered to satisfy new requirements.
Using collection descriptions to enhance an aggregation of harvested item-level metadata, pp. 32-41
  Muriel Foulonneau; Timothy W. Cole; Thomas G. Habing; Sarah L. Shreeves
As an increasing number of digital library projects embrace the harvesting of item-level descriptive metadata, issues of description granularity and concerns about potential loss of context when harvesting item-level metadata take on greater significance. Collection-level description can provide valuable context for item-level metadata records harvested from disparate and heterogeneous providers. This paper describes an ongoing experiment using collection-level description in concert with item-level metadata to improve quality of search and discovery across an aggregation of metadata describing resources held by a consortium of large academic research libraries. We present details of approaches implemented so far and preliminary analyses of the potential utility of these approaches. The paper concludes with a brief discussion of related issues and future work plans.
A web service framework for embedding discovery services in distributed library interfaces, pp. 42-43
  John Weatherley
Significant barriers deter web page designers and developers from incorporating dynamic content from web services into their page designs. Web services typically require designers to learn service protocols and have access to and knowledge of dynamic application servers or CGI in order to incorporate dynamic content into their pages. This paper describes a framework for embedding discovery services in distributed interfaces that seeks to simplify this process and eliminate these barriers, making the use of the dynamic content available to a wider audience and increasing its potential for adoption and use in educational design.
xTagger: a new approach to authoring document-centric XML, pp. 44-45
  Ionut E. Iacob; Alex Dekhtyar
The process of authoring document-centric XML documents in humanities disciplines is very different from the approach espoused by standard XML editing software, which takes a data-centric view of XML. Where data-centric XML is generated by first describing a tree structure of the encoding and then providing the content for the leaf elements, document-centric encodings start with content which is then marked up. In this paper we describe our approach to authoring document-centric XML documents and the tool, xTagger, originally developed for this purpose within the Electronic Boethius project [2] and later enhanced within the ARCHway project [5], an interdisciplinary project devoted to the development of methods and software for the preparation of image-based electronic editions of historic manuscripts.

Digital libraries and cyberinfrastructure: use of digital libraries in education

Enhancing access to research data: the challenge of crystallography, pp. 46-55
  Monica Duke; Michael Day; Rachel Heery; Leslie A. Carr; Simon J. Coles
This paper describes an ongoing collaborative effort across digital library and scientific communities in the UK to improve access to research data. A prototype demonstrator service supporting the discovery and retrieval of detailed results of crystallography experiments has been deployed within an Open Archives digital library service model. Early challenges include the understanding of requirements in this specialized area of chemistry and reaching consensus on the design of a metadata model and schema. Future plans encompass the exploration of commonality and overlap with other schemas and across disciplines, working with publishers to develop mutually beneficial service models, and investigation of the pedagogical benefits. The potential for improved access to experimental data to enrich scholarly communication, from the perspective of both research and learning, provides the driving force to continue exploring these issues.
Information synthesis: a new approach to explore secondary information in scientific literature, pp. 56-64
  Catherine Blake
Advances in both technology and publishing practices continue to increase the quantity of scientific literature that is available electronically. In this paper, we introduce the Information Synthesis process, a new approach that enables scientists to visualize, explore, and resolve contradictory findings that are inevitable when multiple empirical studies explore the same natural phenomena. Central to the Information Synthesis approach is a cyber-infrastructure that provides a scientist with both primary and secondary information from an article and structured information resources. To demonstrate this approach, we have developed the Multi-User, Information Extraction for Information Synthesis (METIS) System. METIS is an interactive system that automates critical tasks within the Information Synthesis process. We provide two case-studies that demonstrate the utility of the Information Synthesis approach.
Comparative interoperability project: configurations of community, technology, organization, pp. 65-66
  David Ribes; Karen S. Baker; Florence Millerand; Geoffrey C. Bowker
In this paper we describe the methods, goals and early findings of the research endeavor 'Comparative Interoperability Project' (CIP). The CIP is an extended interdisciplinary collaboration of information and social scientists with the shared goal of understanding the diverse range of interoperability strategies within information infrastructure building activities. We take interoperability strategies to be the simultaneous mobilization of community, organizational and technical resources to enable data integration. The CIP draws together work with three ongoing collaborative scientific projects (GEON, LTER, Ocean Informatics) that are building information infrastructures for the natural sciences.
Visualizing aggregated biological pathway relations, pp. 67-68
  Byron Marshall; Karin Quinones; Hua Su; Shauna Eggers; Hsinchun Chen
The Genescene development team has constructed an aggregation interface for automatically-extracted biomedical pathway relations that is intended to help researchers identify and process relevant information from the vast digital library of abstracts found in the National Library of Medicine's PubMed collection. Users view extracted relations at various levels of relational granularity in an interactive and visual node-link interface. Anecdotal feedback reported here suggests that this multi-granular visual paradigm aligns well with various research tasks, helping users find relevant articles and discover new information.

Users and interaction: interacting in media

Addressing the challenge of visual information access from digital image and video libraries, pp. 69-78
  Michael G. Christel; Ronald M. Conescu
While it would seem that digital video libraries should benefit from access mechanisms directed to their visual contents, years of TREC Video Retrieval Evaluation (TRECVID) research have shown that text search against transcript narrative text provides almost all the retrieval capability, even with visually oriented generic topics. A within-subjects study involving 24 novice participants on TRECVID 2004 tasks again confirms this result. The study shows that satisfaction is greater and performance is significantly better on specific and generic information retrieval tasks from news broadcasts when transcripts are available for search. Additional runs with 7 expert users reveal different novice and expert interaction patterns with the video library interface, helping explain the novices' lack of success with image search and visual feature browsing for visual information needs. Analysis of TRECVID visual features well suited for particular tasks provides additional insights into the role of automated feature classification for digital image and video libraries.
Assessing tools for use with webcasts, pp. 79-88
  Elaine G. Toms; Christine Dufour; Jonathan Lewis; Ron Baecker
This research assessed the effectiveness of selected interface tools in helping people respond to classic information tasks with webcasts. Rather than focus on a classic search/browse task to locate an appropriate webcast to view, our work takes place at the level of an individual webcast to assess interactivity within the contents of a single webcast. The questions guiding our work are: 1) Which tool(s) are the most effective in achieving the best response? 2) How do users use those tools for task completion? In this study, 16 participants responded to a standard set of information tasks using ePresence, a webcasting system that handles both live and stored video, and provides multiple techniques for accessing content. Using questionnaires, screen capture and interviews, we evaluated the interaction, assessed the tools, and based on our results, make suggestions for improving access to the content of stored webcasts.
Exploring user perceptions of digital image similarity, pp. 89-90
  Unmil P. Karadkar; Richard Furuta; Jeevan Joseph John; Jin-Cheon Na
The MIDAS project is developing infrastructure and policies for optimal display of digital information on devices with diverse characteristics. In this paper we present the preliminary results of a study that explored the effects of scaling and color-depth variation in digital photographs on user perceptions of similarity. Our results indicate general trends in user preferences and can serve as guidelines for designing policies and systems that display digital images optimally on various information devices.

Tools & techniques: searching and IR

Detecting and supporting known item queries in online public access catalogs, pp. 91-99
  Min-Yen Kan; Danny C. C. Poo
When users seek to find specific resources in a digital library, they often use the library catalog to locate them. These catalog queries are defined as known item queries. As known item queries search for specific resources, it is important to manage them differently from other search types, such as area searches. We study how to identify known item queries in the context of a large academic institution's online public access catalog (OPAC), in which queries are issued via a simple keyword interface. We also examine how to recognize when a known item query has retrieved the item in question. Our approach combines techniques from machine learning, language modeling, and machine translation evaluation metrics to build a classifier that distinguishes known item queries and determines whether a retrieved title is the known item sought, with 80% and 95% correlation to human performance on the respective tasks. To our knowledge, this is the first report of such work, which has the potential to streamline the user interface of both OPACs and digital libraries in support of known item searches.
Downloading textual hidden web content through keyword queries, pp. 100-109
  Alexandros Ntoulas; Petros Zerfos; Junghoo Cho
An ever-increasing amount of information on the Web today is available only through search interfaces: the users have to type in a set of keywords in a search form in order to access the pages from certain Web sites. These pages are often referred to as the Hidden Web or the Deep Web. Since there are no static links to the Hidden Web pages, search engines cannot discover and index such pages and thus do not return them in the results. However, according to recent studies, the content provided by many Hidden Web sites is often of very high quality and can be extremely valuable to many users.
   In this paper, we study how we can build an effective Hidden Web crawler that can autonomously discover and download pages from the Hidden Web. Since the only "entry point" to a Hidden Web site is a query interface, the main challenge that a Hidden Web crawler has to face is how to automatically generate meaningful queries to issue to the site. Here, we provide a theoretical framework to investigate the query generation problem for the Hidden Web and we propose effective policies for generating queries automatically. Our policies proceed iteratively, issuing a different query in every iteration. We experimentally evaluate the effectiveness of these policies on 4 real Hidden Web sites and our results are very promising. For instance, in one experiment, one of our policies downloaded more than 90% of a Hidden Web site (that contains 14 million documents) after issuing fewer than 100 queries.
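The iterative query generation described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's policies, so the greedy rule used here (issue the not-yet-used term that occurs in the most downloaded documents) is an assumption chosen for illustration, and `search`, `crawl`, and the in-memory `site` dictionary are hypothetical stand-ins for a real query interface.

```python
from collections import Counter


def search(site, keyword):
    """Hypothetical query interface: ids of documents containing the keyword."""
    return {doc_id for doc_id, text in site.items() if keyword in text.split()}


def crawl(site, seed, max_queries):
    """Issue a different keyword query in every iteration, starting from a seed.

    The next query is chosen greedily (an assumed heuristic, not the paper's
    policy): the unused term with the highest document frequency among the
    documents downloaded so far.
    """
    downloaded = {}  # doc_id -> text retrieved so far
    issued = []      # queries already sent, in order
    query = seed
    while query is not None and len(issued) < max_queries:
        issued.append(query)
        for doc_id in search(site, query):
            downloaded[doc_id] = site[doc_id]
        # Count, for each term, how many downloaded documents contain it.
        doc_freq = Counter()
        for text in downloaded.values():
            doc_freq.update(set(text.split()))
        candidates = [(n, term) for term, n in doc_freq.items() if term not in issued]
        # Ties are broken by the lexicographically largest term.
        query = max(candidates)[1] if candidates else None
    return downloaded, issued


if __name__ == "__main__":
    toy_site = {
        1: "digital library search",
        2: "library catalog",
        3: "catalog metadata",
        4: "metadata harvest",
    }
    docs, queries = crawl(toy_site, seed="digital", max_queries=10)
    print(sorted(docs), queries)
```

The loop stops when the query budget is spent or no unused term remains; a real crawler would also have to handle result paging and per-query cost, which this sketch leaves out.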
SpidersRUs: automated development of vertical search engines in different domains and languages, pp. 110-111
  Michael Chau; Jialun Qin; Yilu Zhou; Chunju Tseng; Hsinchun Chen
In this paper we discuss the architecture of a tool designed to help users develop vertical search engines in different domains and different languages. We present the design of the tool and an evaluation study showing that the system is easier to use than other existing tools.
Grid-based digital libraries: cheshire3 and distributed retrieval, pp. 112-113
  Ray R. Larson; Robert Sanderson
The University of California, Berkeley and the University of Liverpool are developing an Information Retrieval and Digital Library system (Cheshire3) that operates in both single-processor and "Grid" distributed computing environments. This paper discusses the architecture of the system and how it performs Digital Library tasks in a Grid computing environment.

Digital libraries and cyberinfrastructure: use of digital libraries in the humanities

Integrating digital libraries and electronic publishing in the DART project, pp. 114-120
  Gordon Dahlquist; Brian Hoffman; David Millman
The Digital Anthropology Resources for Teaching (DART) project integrates the content acquisition and cataloging initiatives of a federated digital repository with the development of scholarly publications and the creation of digital tools to facilitate classroom teaching. The project's technical architecture and unique publishing model create a teaching context where students move easily between primary and secondary source material and between authored environments and independent research, and raise specific issues with regard to metadata, object referral, rights, and exporting content. The model also addresses the loss of provenance and catalog information for digital objects embedded in "born-digital" publications. The DART project presents a practical methodology to combine repository and publication that is both exportable and discipline-neutral.
Annotating illuminated manuscripts: an effective tool for research and education, pp. 121-130
  Maristella Agosti; Nicola Ferro; Nicola Orio
The aim of this paper is to report the research results of an ongoing project that deals with the exploitation of a digital archive of drawings and illustrations of historic documents for research and educational purposes. Based on the results of a study of user requirements, we have designed tools to provide researchers with innovative ways of accessing the digital manuscripts and of sharing and transferring knowledge in a collaborative environment. We have found that the results of scientific research on the relationships between images of manuscripts produced over the centuries can be rendered explicit by using annotations. For this purpose, a taxonomy for linking annotations is introduced, together with a conceptual schema which represents annotations and links them to digital objects.

Tools & techniques: recommending and alerting

A generic alerting service for digital libraries, pp. 131-140
  George Buchanan; Annika Hinze
Users of modern digital libraries (DLs) can keep themselves up-to-date by searching and browsing their favorite collections, or more conveniently by resorting to an alerting service. The alerting service notifies its clients about new or changed documents. Proprietary and mediating alerting services fail to fluidly integrate information from differing collections. This paper analyses the conceptual requirements of this much sought-after service for digital libraries. We demonstrate that the differing concepts of digital libraries and their underlying technical designs have extensive influence (a) on the expectations, needs and interests of users regarding an alerting service, and (b) on the technical possibilities of the implementation of the service. Our findings will show that the range of issues surrounding alerting services for digital libraries, their design and use is greater than one may anticipate. We also show that, conversely, the requirements for an alerting service have considerable impact on the concepts of DL design. Our findings should be of interest to librarians as well as system designers. We highlight and discuss the far-reaching implications for the design of, and interaction with, libraries. This paper discusses the lessons learned from building such a distributed alerting service. We present our prototype implementation as a proof-of-concept for an alerting service for open DL software.
Link prediction approach to collaborative filtering, pp. 141-142
  Zan Huang; Xin Li; Hsinchun Chen
Recommender systems can provide valuable services in a digital library environment, as demonstrated by their commercial success in the book, movie, and music industries. One of the most commonly-used and successful recommendation algorithms is collaborative filtering, which explores the correlations within user-item interactions to infer user interests and preferences. However, the recommendation quality of collaborative filtering approaches is greatly limited by the data sparsity problem. To alleviate this problem we have previously proposed graph-based algorithms to explore transitive user-item associations. In this paper, we extend the idea of analyzing user-item interactions as graphs and employ link prediction approaches proposed in the recent network modeling literature for making collaborative filtering recommendations. We have adapted a wide range of linkage measures for making recommendations. Our preliminary experimental results based on a book recommendation dataset show that some of these measures achieved significantly better performance than standard collaborative filtering algorithms.
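To make the idea of treating user-item interactions as a graph concrete, here is a minimal sketch of one possible linkage score. The abstract does not list the measures the authors adapted, so the score used here (the number of length-3 paths user-item-user-item between a user and a candidate item) is an assumption, and the function name `recommend` and the toy data are hypothetical.

```python
from collections import defaultdict


def recommend(interactions, user, top_n=3):
    """Rank items the user has not interacted with by a simple linkage score.

    interactions: iterable of (user, item) pairs, viewed as edges of a
    bipartite graph. The score of a candidate item is the number of paths
    user -> shared item -> other user -> candidate item (an assumed measure,
    not necessarily one used in the paper).
    """
    items_of = defaultdict(set)  # user -> items linked to that user
    users_of = defaultdict(set)  # item -> users linked to that item
    for u, i in interactions:
        items_of[u].add(i)
        users_of[i].add(u)

    scores = defaultdict(int)
    for shared_item in items_of[user]:
        for other in users_of[shared_item]:
            if other == user:
                continue
            # Every unseen item of a neighbouring user closes one path.
            for candidate in items_of[other] - items_of[user]:
                scores[candidate] += 1

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


if __name__ == "__main__":
    toy = [
        ("ann", "b1"), ("ann", "b2"),
        ("bob", "b1"), ("bob", "b2"), ("bob", "b3"),
        ("cat", "b2"), ("cat", "b4"),
    ]
    print(recommend(toy, "ann"))
```

Because the score follows transitive associations rather than requiring direct rating overlap on the candidate item, it can still rank items when the interaction data are sparse, which is the problem the abstract highlights.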
Sentiment-based search in digital libraries, pp. 143-144
  Jin-Cheon Na; Christopher S. G. Khoo; Syin Chan; Norraihan Bte Hamzah
Several researchers have developed tools for classifying/clustering Web search results into different topic areas (such as sports, movies, travel, etc.) to help users identify relevant results quickly in the area of interest. This study follows a similar approach, but is in the area of sentiment classification -- automatically classifying on-line review documents according to the overall sentiment expressed in them. This paper presents a prototype system that has been developed to perform sentiment categorization of Web search results. It helps users focus quickly on recommended (or non-recommended) information by classifying Web search results into four categories: positive, negative, neutral, and non-review documents, using an automatic classifier based on a supervised machine learning algorithm, Support Vector Machine (SVM).

Tools & techniques: supporting classification

Automatic extraction of titles from general documents using machine learning, pp. 145-154
  Yunhua Hu; Hang Li; Yunbo Cao; Dmitriy Meyerzon; Qinghua Zheng
In this paper, we propose a machine learning approach to title extraction from general documents. By general documents, we mean documents that can belong to any one of a number of specific genres, including presentations, book chapters, technical papers, brochures, reports, and letters. Previously, methods have been proposed mainly for title extraction from research papers. It has not been clear whether automatic title extraction from general documents is possible. As a case study, we consider extraction from Office documents, including Word and PowerPoint. In our approach, we annotate titles in sample documents (for Word and PowerPoint respectively) and take them as training data, train machine learning models, and perform title extraction using the trained models. Our method is unique in that we mainly utilize formatting information such as font size as features in the models. It turns out that the use of formatting information can lead to quite accurate extraction from general documents. Precision and recall for title extraction from Word are 0.810 and 0.837 respectively, and precision and recall for title extraction from PowerPoint are 0.875 and 0.895 respectively, in an experiment on intranet data. Other important new findings in this work include that we can train models in one domain and apply them to another domain, and, more surprisingly, that we can even train models in one language and apply them to another language. Moreover, we can significantly improve search ranking results in document retrieval by using the extracted titles.
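For readers unfamiliar with the two measures reported in this abstract, the following sketch shows how precision and recall are commonly computed for an extraction task. The abstract does not say how a match was judged, so exact string equality is an assumption here, and the function name and toy data are hypothetical.

```python
def precision_recall(extracted, gold):
    """Compute precision and recall for extracted titles.

    extracted: dict mapping doc_id -> title produced by the extractor
               (documents with no extracted title are simply absent).
    gold:      dict mapping doc_id -> annotated (true) title.
    A title counts as correct only on exact string match (an assumption).
    """
    correct = sum(1 for doc_id, title in extracted.items() if gold.get(doc_id) == title)
    # Precision: share of extracted titles that are correct.
    precision = correct / len(extracted) if extracted else 0.0
    # Recall: share of annotated titles that were correctly extracted.
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    gold_titles = {1: "A", 2: "B", 3: "C", 4: "D"}
    extracted_titles = {1: "A", 2: "B", 3: "X"}
    print(precision_recall(extracted_titles, gold_titles))
```

In the toy data, two of three extracted titles are correct and two of four annotated titles are recovered, so precision is higher than recall.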
HiBO: a system for automatically organizing bookmarks, pp. 155-156
  Pavlos Kokosis; Vlassis Krikos; Sofia Stamou; Dimitris Christodoulakis
In this paper, we introduce the HiBO bookmark management system. HiBO aims at extending the populated personal repositories (aka bookmarks) by automatically organizing their contents into topics, through the use of a built-in subject hierarchy. HiBO offers customized personalized services, such as the meaningful grouping and ordering of bookmarks within the hierarchy's topics in terms of the bookmarks' conceptual similarity to each other. HiBO also provides a framework that allows the user to customize and assist the categorization process.
Automated text classification using a multi-agent framework, pp. 157-158
  Yueyu Fu; Weimao Ke; Javed Mostafa
Automatic text classification is an important operational problem in digital library practice. Most text classification efforts so far have concentrated on developing centralized solutions. However, centralized classification approaches are often limited by constraints on knowledge and computing resources. In addition, centralized approaches are more vulnerable to attacks or system failures and less robust in dealing with them. We present a de-centralized approach and system implementation (named MACCI) for text classification using a multi-agent framework. Experiments are conducted to compare our multi-agent approach with a centralized approach. The results show multi-agent classification can achieve promising classification results while maintaining its other advantages.

Digital libraries and cyberinfrastructure: cyberinfrastructure panel

Is digital preservation an oxymoron?, p. 159
  Taylor Surface; Priscilla Caplan; Robert Horton; Martin Halbert
Techniques for the long-term preservation of digital materials are increasingly critical as more and more intellectual content is meaningful only in electronic form. Terry Kuny, in his often quoted paper "A Digital Dark Ages?" concluded, "Digital collections facilitate access, but do not facilitate preservation. ... Although tremendous work has been undertaken in defining the problems and challenges, much more remains to be done, and the tough task of actually doing digital preservation (and digital rescue) remains ahead."[1]

Users and interaction: understanding user needs and perceptions

Digital libraries' support for the user's 'information journey', pp. 160-169
  Anne Adams; Ann Blandford
The temporal elements of users' information requirements are a continually confounding aspect of digital library design. No sooner have users' needs been identified and supported than they change. This paper evaluates the changing information requirements of users through their 'information journey' in two different domains (health and academia). In-depth analysis of findings from interviews, focus groups and observations of 150 users has identified three stages to this journey: information initiation, facilitation (or gathering) and interpretation. The study shows that, although digital libraries are supporting aspects of users' information facilitation, there are still requirements for them to better support users' overall information work in context. Users are poorly supported in the initiation phase, as they recognize their information needs, especially with regard to resource awareness; in this context, interactive press-alerts are discussed. Some users (especially clinicians and patients) also require support in the interpretation of information, both satisfying themselves that the information is trustworthy and understanding what it means for a particular individual.
Interviews with NSDL grantees on core values and service perspectives, pp. 170-171
  David W. Fulker
We analyze information from interviews with NSDL awardees. One purpose is to inform potential NSDL membership models, and a second is to better understand infrastructure needs, including capacity for integrating services. Our results shed light on social and architectural aspects of the NSDL as a distributed library-building endeavor.
Developing the DigiQUAL protocol for digital library evaluation, pp. 172-173
  Martha Kyrillidou; Sarah Giersch
The distributed, project-oriented nature of digital libraries (DLs) has made them difficult to evaluate in aggregate. By modifying the methods and tools used to evaluate physical libraries' content and services, measures can be developed whose results can be used across a variety of DLs. The DigiQUAL protocol being developed by the Association of Research Libraries (ARL) has the potential to provide the National Science Digital Library (NSDL) with a standardized methodology and survey instrument with which not only to evaluate its distributed projects but also to gather data to assess the value and impact of the NSDL.
Language preference in a bi-language digital library BIBAFull-Text 174-175
  Te Taka Keegan; Sally Jo Cunningham
This paper examines user choice of interface language in a bi-language digital library (English and Maori, the language of the indigenous people of New Zealand). The majority of collection documents are in Maori, and the interface is available in both Maori and English. Log analysis shows three categories of preference for interface language: primarily English, primarily Maori, and bilingual (switching back and forth between the two).
A usability evaluation study of a digital library self-archiving service BIBAFull-Text 176-177
  Lena Veiga e Silva; Alberto H. F. Laender; Marcos Andre Goncalves
In this paper, we describe an evaluation study of a self-archiving service for the Brazilian Digital Library of Computing (BDBComp). We conducted an extensive usability experiment with several potential users, including graduate students, professors, and archivists/librarians. The results of the study are described and analyzed, following sound statistical principles.

Tools & techniques: automatically managing media

Leveraging context to resolve identity in photo albums BIBAFull-Text 178-187
  Mor Naaman; Ron B. Yeh; Hector Garcia-Molina; Andreas Paepcke
Our system suggests likely identity labels for photographs in a personal photo collection. Instead of using face recognition techniques, the system leverages automatically available context, like the time and location where the photos were taken.
   Based on time and location, the system automatically computes event and location groupings of photos. As the user annotates some of the identities of people in their collection, patterns of re-occurrence and co-occurrence of different people in different locations and events emerge. The system uses these patterns to generate label suggestions for identities that were not yet annotated. These suggestions can greatly accelerate the process of manual annotation and improve the quality of retrieval from the collection.
   We obtained ground-truth identity annotation for four different photo albums, and used them to test our system. The system proved effective, making very accurate label suggestions, even when the number of suggestions for each photo was limited to five names, and even when only a small subset of the photos was annotated.
Meaningful presentations of photo libraries: rationale and applications of bi-level radial quantum layouts BIBAFull-Text 188-196
  Jack Kustanowitz; Ben Shneiderman
Searching photo libraries can be made more satisfying and successful if search results are presented in a way that allows users to gain an overview of the photo categories. Since photo layouts on computer displays are the primary way that users get an overview, we propose a novel approach to show more photos in meaningful groupings. Photo layouts can be linear strips, or zoomable three dimensional arrangements, but the most common form is the two-dimensional grid. This paper introduces a novel bi-level hierarchical layout with motivating examples. In a bi-level hierarchy, one region is designated for primary content -- an image, text, or combination. Adjacent to that region, groups of photos are placed radially in an ordered fashion, such that the relationship of the single primary region to its many secondary regions is apparent. A compelling aspect is the interactive experience in which the layout is dynamically resized, allowing users to rapidly, incrementally, and reversibly alter the dimensions and content. It can accommodate hundreds of photos in dozens of regions, can be customized in a corner or center layout, and can scale from an element on a web page to a large poster size. On typical displays (1024 x 1280 or 1200 x 1600 pixels), bi-level radial quantum layouts can conveniently accommodate 2-20 regions with tens or hundreds of photos per region.
On the extraction of vocal-related information to facilitate the management of popular music collections BIBAFull-Text 197-206
  Wei-Ho Tsai; Hsin-Min Wang
With the explosive growth of networked collections of musical material, there is a need to establish a mechanism like a digital library to manage music data. This paper presents a content-based processing paradigm of popular song collections to facilitate the realization of a music digital library. The paradigm is built on the automatic extraction of information of interest from music audio signals. Because the vocal part is often the heart of a popular song, we focus on developing techniques to exploit the solo vocal signals underlying an accompanied performance. This supports the necessary functions of a music digital library, namely, music data organization, music information retrieval/recommendation, and copyright protection.

Digital libraries and cyberinfrastructure: creating information representations for education

From playful exhibits to LOM: lessons from building an Exploratorium digital library BIBAFull-Text 207-212
  Holly Fait; Sherry Hsi
The Exploratorium, an interactive hands-on science museum, is developing an online collection of science learning and teaching resources to better serve educators' needs for pedagogically-rich instructional resources via the Web. Several challenges arise when designing a digital library for formal K12 education audiences using the Learning Object Metadata standard. These problems are multiplied when attempting to catalog the wide variety of informal learning digital resources from the Exploratorium's ever-growing website and exhibit-based resource collections. This paper shares key challenges and early solutions for the creation of an educational metadata scheme, new vocabularies, and strategies for retrofitting existing informal learning science resources into learning objects.
Tacit user and developer frames in user-led collection development: the case of the Digital Water Education Library BIBAFull-Text 213-222
  Michael Khoo
This paper discusses the impact that developers' and users' tacit understandings can have on digital library development. It draws on three years of ethnographic research with the Digital Water Education Library (DWEL) that focused on the observation, collection, and analysis of the project's face-to-face and electronic organizational communication. The DWEL project involved formal and informal educators in the development of its collection, and experienced problems at the start of the project with getting these educators to complete their cataloguing tasks. The research showed that despite having spent several days in face-to-face workshops, the project's PIs and the educators had different tacit understandings of what digital libraries were, that were impeding the project's organizational communication and workflow. I describe how these differences were identified and analyzed, and subsequently addressed and mediated through the design and development of online tools that acted as boundary objects between the PIs and the educators.
Experimenting with the automatic assignment of educational standards to digital library content BIBAFull-Text 223-224
  Anne R. Diekema; Jiangping Chen
This paper describes exploratory research concerning the automatic assignment of educational standards to lesson plans. An information retrieval based solution was proposed, and the results of several experiments are discussed. Results suggest the optimal solution would be a recommender tool where catalogers receive suggestions from the system but humans make the final decision.

Users and interaction: understanding user needs and perceptions

Turning the page on navigation BIBAFull-Text 225-234
  Catherine C. Marshall; Sara Bly
In this paper, we discuss the findings of an in-depth observational study of reading and within-document navigation and add to these findings the results of a second analysis of how people read comparable digital materials on the screen, given limited navigational functionality. We chose periodicals as our initial foil since they represent a type of material that invites many different kinds of reading and strategies for navigation. Using multiple sources of evidence from the data, we first characterize readers' navigation strategies and specific practices as they make their way through the magazines. We then focus on two observed phenomena that occur when people read paper magazines, but are absent in their digital equivalents: the lightweight navigation that readers use unselfconsciously when they are reading a particular article and the approximate navigation readers engage in when they flip multiple pages at a time. Because page-turning is so basic and seems deceptively simple, we dissect the turn of a page, and use it to illustrate the importance and invisibility of lightweight navigation. Finally, we explore the significance of our results for navigational interfaces to digital library materials.
In the company of readers: the digital library book as "practiced place" BIBAFull-Text 235-243
  Nancy Kaplan; Yoram Chisik
Most digital libraries (DLs) necessarily focus on the complex issues that arise when library collections are freed from their physical anchors in buildings and on paper. Typical investigations look at supporting adults in work settings, such as school or research. Much less attention has been paid to younger generations of readers. As ever more digital venues cater to youngsters' attentions, a role for the DL as a catalyst of social interactions around traditional literacy practices begins to take shape. Based on prior research on annotation systems, constructive hypertexts, and computer support for cooperative work coupled with our contextual inquiries with children, we have developed a prototype for a digital book that supports social interactions through annotations. By placing and sharing notes, groups of readers transform the book from an artifact into a living record of communal experience. A system of support for marks and notes in the context of reading for pleasure can turn the digital library book into a "practiced place," a location that is not only accessible, but also welcoming, engaging and supportive of the activities children are interested in and therefore likely to engage in. Our experience with Alph, a prototype book-reader supporting a range of rhetorical marks and note-writing, suggests that future DLs need to look beyond augmenting work-based literacy practices by creating dynamic and social reading environments.
Digitization and 3D modeling of movable books BIBAFull-Text 244-245
  Pierre Cubaud; Jerome Dupire; Alexandre Topol
Movable books provide interesting challenges for digitization and user interface design. In this paper, we report some preliminary results from building a 3D visualization workbench for such books.

Tools & techniques: browsing and visualizing collections

An initial evaluation of automated organization for digital library browsing BIBAFull-Text 246-255
  Aaron Krowne; Martin Halbert
In this article we present an evaluation of text clustering and classification methods for creating digital library browse interfaces, focusing on the particular case of collections made up of heterogeneous metadata records. This situation is common in "portal" style digital libraries, which are built by harvesting content from many disparate sources, typically using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). By studying the activity of users in an experimental system, we find that taxonomies built or populated using machine-learning (or "AI") techniques provide a potentially useful avenue for browsing in this digital library scenario.
Using concept maps in digital libraries as a cross-language resource discovery tool BIBAFull-Text 256-257
  Ryan Richardson; Edward A. Fox
The concept map, first suggested by Joseph Novak, has been extensively studied as a way for learners to increase understanding. We are automatically generating and translating concept maps from electronic theses and dissertations, for both English and Spanish, as a DL aid to discovery and summarization.
Collection understanding for OAI-PMH compliant repositories BIBAFull-Text 258-259
  TeongJoo Ong; John J. Leggett
We briefly discuss the architecture and design of a collection understanding tool that utilizes information visualization and the Open Archives Initiative Protocol for Metadata Harvesting to help users in understanding the essence of image collections in OAI-PMH compliant repositories.
A focus-context browser for multiple timelines BIBAFull-Text 260-261
  Robert B. Allen
Events may be best understood in the context of other events. We can call a set of related events a "timeline", because of the temporal ordering. Such timelines are themselves best understood in the context of other timelines. To facilitate the exploration of a collection of events and timelines, a visualization tool has been developed that facilitates the user's ability to compare and browse across events and timelines. In this model, each event is accompanied by a text description and links to related resources such as articles from digitized historical newspapers.

Digital libraries and cyberinfrastructure: creating information representations for the humanities (part 1)

Semantics and syntax of Dublin Core usage in Open Archives Initiative data providers of cultural heritage materials BIBAFull-Text 262-270
  Arwen Hutt; Jenn Riley
This study analyzes metadata shared by cultural heritage institutions via the Open Archives Initiative Protocol for Metadata Harvesting. The syntax and semantics of metadata appearing in the Dublin Core fields creator, contributor, and date are examined. Preliminary conclusions are drawn regarding the effectiveness of Dublin Core in the Open Archives Initiative environment for cultural heritage materials.
Finding a catalog: generating analytical catalog records from well-structured digital texts BIBAFull-Text 271-280
  David Mimno; Alison Jones; Gregory Crane
One of the criticisms library users often make of catalogs is that they rarely include information below the bibliographic level. It is generally impossible to search a catalog for the titles and subjects of particular chapters or volumes. There has been no way to add this information to catalog records without exponentially increasing the workload of catalogers. At the same time, well-structured full-text XML transcriptions of printed works are becoming increasingly available. This paper describes how existing investments in full text digitization and structural markup combined with current named-entity extraction technology can efficiently generate the detailed level of catalog data that users want, at no significant additional cost. This system is demonstrated on an existing digital collection within the Perseus Digital Library.

Users and interaction: memex and hypertext

To grow in wisdom: Vannevar Bush, information overload, and the life of leisure BIBAFull-Text 281-286
  David M. Levy
It has been nearly sixty years since Vannevar Bush's essay, "As We May Think," was first published in The Atlantic Monthly, an article that foreshadowed and possibly invented hypertext. While much has been written about this seminal piece, little has been said about the argument Bush presented to justify the creation of the memex, his proposed personal information device. This paper revisits the article in light of current technological and social trends. It notes that Bush's argument centered around the problem of information overload and observes that in the intervening years, despite massive technological innovation, the problem has only become more extreme. It goes on to argue that today's manifestation of information overload will require not just better management of information but the creation of space and time for thinking and reflection, an objective that is consonant with Bush's original aims.
Integrating collections at the Cervantes Project BIBAFull-Text 287-288
  Neal Audenaert; Richard Furuta; Eduardo Urbina; Jie Deng; Carlos Monroy; Rosy Saenz; Doris Careaga
Unlike many efforts that focus on supporting scholarly research by developing large-scale, general resources for a wide range of audiences, we at the Cervantes Project have chosen to focus more narrowly on developing resources in support of ongoing research about the life and works of a single author, Miguel de Cervantes Saavedra (1547-1616). This has led to a group of hypertextual archives, tightly integrated around the narrative and thematic structure of Don Quixote. This project is typical of many humanities research efforts and we discuss how our experiences inform the broader challenge of developing resources to support humanities research.
Icon abacus: positional display of document attributes BIBAFull-Text 289-290
  Eric A. Bier; Adam Perer
This paper presents icon abacus, a space-efficient technique for displaying document attributes by automatic positioning of document icons. It displays the value of an attribute by using position on a single axis, allowing the other axis to display different metadata simultaneously. The layout is stable enough to support navigation using spatial memory.

Tools & techniques: applying machine learning to collection development

Developing practical automatic metadata assignment and evaluation tools for internet resources BIBAFull-Text 291-300
  Gordon W. Paynter
This paper describes the development of practical automatic metadata assignment tools to support automatic record creation for virtual libraries, metadata repositories and digital libraries, with particular reference to library-standard metadata. The development process is incremental in nature, and depends upon an automatic metadata evaluation tool to objectively measure its progress. The evaluation tool is based on and informed by the metadata created and maintained by librarian experts at the INFOMINE Project, and uses different metrics to evaluate different metadata fields. In this paper, we describe the form and function of common metadata fields, and identify appropriate performance measures for these fields. The automatic metadata assignment tools in the iVia virtual library software are described, and their performance is measured. Finally, we discuss the limitations of automatic metadata evaluation, and cases where we choose to ignore its evidence in favor of human judgment.
What's there and what's not?: focused crawling for missing documents in digital libraries BIBAFull-Text 301-310
  Ziming Zhuang; Rohit Wagle; C. Lee Giles
Some large scale topical digital libraries, such as CiteSeer, harvest online academic documents by crawling open-access archives, university and author homepages, and authors' self-submissions. While these approaches have so far built reasonable size libraries, they can suffer from having only a portion of the documents from specific publishing venues. We propose to use alternative online resources and techniques that maximally exploit other resources to build the complete document collection of any given publication venue.
   We investigate the feasibility of using publication metadata to guide the crawler towards authors' homepages to harvest what is missing from a digital library collection. We collect a real-world dataset from two Computer Science publishing venues, involving a total of 593 unique authors over a time frame of 1998 to 2004. We then identify the missing papers that are not indexed by CiteSeer. Using a fully automatic heuristic-based system that has the capability of locating authors' homepages and then using focused crawling to download the desired papers, we demonstrate that it is practical to use a focused crawler to harvest academic papers that are missing from our digital library. Our harvester achieves a performance with an average recall level of 0.82 overall and 0.75 for those missing documents. Evaluation of the crawler's performance based on the harvest rate shows definite advantages over other crawling approaches and consistently outperforms a defined baseline crawler on a number of measures.

Digital libraries and cyberinfrastructure: creating information representations for the humanities (part 2)

Resolving the unencoded character problem for Chinese digital libraries BIBAFull-Text 311-319
  Derming Juang; Jenq-Haur Wang; Chen-Yu Lai; Ching-Chun Hsieh; Lee-Feng Chien; Jan-Ming Ho
Constructing a Chinese digital library, especially one for archiving historical articles, is often hampered by the small character sets supported by current computer systems. This paper aims to resolve the unencoded character problem with a practical and composite approach for Chinese digital libraries. The proposed approach consists of the glyph expression model, the glyph structure database, and supporting tools. With this approach, the following problems can be resolved. First, the extensibility of Chinese characters can be preserved. Second, it would be as easy to generate, input, display, and search unencoded characters as existing ones. Third, it is compatible with existing encoding schemes that most computers use.
   This approach has been utilized by organizations and projects in various application domains including archeology, linguistics, ancient texts, calligraphy and paintings, and stone and bronze rubbings. For example, in Academia Sinica, a very large full-text database of ancient texts called Scripta Sinica has been created using this approach. The Union Catalog of National Digital Archives Project (NDAP) dealt with the unencoded characters encountered when merging the metadata of 12 different thematic domains from various organizations. Also, in the Bronze Inscriptions Research Team (BIRT) of Academia Sinica, 3,459 Bronze Inscriptions were added, which is very helpful to education and research in historical linguistics.
E-library of medieval chant manuscript transcriptions BIBAFull-Text 320-329
  Louis W. G. Barton; John A. Caldwell; Peter G. Jeavons
In this paper we present our rationale and design principles for a distributed e-library of medieval chant manuscript transcriptions. We describe the great variety in neumatic notations, in order to motivate a standardised data representation that is lossless and universal with respect to these musical artefacts. We present some details of the data representation and an XML Schema for describing and delivering transcriptions via the Web. We argue against proposed data formats that look simpler, on the grounds that they will inevitably lead to fragmentation of digital libraries. We plan to develop applications software that will allow users to take full advantage of the carefully designed representation we describe, while shielding users from its complexity. We argue that a distributed e-library of this kind will greatly facilitate scholarship, education, and public appreciation of these artefacts.
Toward a metadata standard for digitized historical newspapers BIBAFull-Text 330-331
  Ray L. Murray
This paper is a case study of metadata development in the early stages of the National Digital Newspaper Program, a twenty-year digital initiative to expand access to historical newspapers in support of research and education. Some of the issues involved in newspaper metadata are examined, and a new XML-based standard is described that is suited to the large volume of data, while remaining flexible into the future.
The challenges in developing digital collections of phonograph records BIBAFull-Text 332-333
  Catherine Lai; Ichiro Fujinaga; Cynthia A. Leive
To facilitate long-term preservation and sustain the utility of phonograph records, an efficient and economical workflow management system for digitization is necessary. We describe in this paper the digitization process for building an online digital collection of phonograph records and our procedure for creating the ground-truth data, which is essential for developing an efficient metadata and content capturing system. We also discuss the challenges of defining metadata for phonograph records and their packaging to enhance access and use across traditional boundaries.

Tools & techniques: identifying names of people and places

Name disambiguation in author citations using a K-way spectral clustering method BIBAFull-Text 334-343
  Hui Han; Hongyuan Zha; C. Lee Giles
An author may have multiple names and multiple authors may share the same name simply due to name abbreviations, identical names, or name misspellings in publications or bibliographies. This can produce name ambiguity which can affect the performance of document retrieval, web search, and database integration, and may cause improper attribution of credit. Proposed here is an unsupervised learning approach using K-way spectral clustering that disambiguates authors in citations. The approach utilizes three types of citation attributes: co-author names, paper titles, and publication venue titles. The approach is illustrated with 16 name datasets with citations collected from the DBLP database bibliography and author home pages and shows that name disambiguation can be achieved using these citation attributes.
Comparative study of name disambiguation problem using a scalable blocking-based framework BIBAFull-Text 344-353
  Byung-Won On; Dongwon Lee; Jaewoo Kang; Prasenjit Mitra
In this paper, we consider the problem of ambiguous author names in bibliographic citations, and comparatively study alternative approaches to identify and correct such name variants (e.g., "Vannevar Bush" and "V. Vush"). Our study is based on a scalable two-step framework, where step 1 is to substantially reduce the number of candidates via blocking, and step 2 is to measure the distance of two names via coauthor information. Combining four blocking methods and seven distance measures on four data sets, we present extensive experimental results, and identify combinations that are scalable and effective to disambiguate author names in citations.
On assigning place names to geography related web pages BIBAFull-Text 354-362
  Wenbo Zong; Dan Wu; Aixin Sun; Ee-Peng Lim; Dion Hoe-Lian Goh
In this paper, we attempt to give spatial semantics to web pages by assigning them place names. The entire assignment task is divided into three sub-problems, namely place name extraction, place name disambiguation and place name assignment. We propose our approaches to address these sub-problems. In particular, we have modified GATE, a well-known named entity extraction software, to perform place name extraction using a US Census gazetteer. A rule-based place name disambiguation method and a place name assignment method capable of assigning place names to web page segments have also been proposed. We have evaluated our proposed disambiguation and assignment methods on a web page collection referenced by the DLESE metadata collection. The results returned by our methods are compared with manually disambiguated place names and place name assignment. It is shown that our proposed place name disambiguation method works well for geo/geo ambiguities. The preliminary results of our place name assignment method indicate promising results given the existence of geo/non-geo ambiguities among place names.

Posters


A reciprocal platform for archiving interview videos about arts and crafts BIBAFull-Text 363
  Kenro Aihara; Atsuhiro Takasu
This paper proposes a platform for portal and local repositories. Our methodology aims not only at constructing a portal site but also at supporting the capture of digital content transformed from interview videos with intellectuals.
BEN collaborative poster BIBAFull-Text 364
  Linda Akli; Cal Collins; Yolanda George
In 1999, the American Association for the Advancement of Science (AAAS) Directorate for Education and Human Resources (EHR) Programs and Science's Signal Transduction Knowledge Environment (STKE) -- with 11 other professional societies and coalitions for biological sciences -- established the BiosciEdNet (BEN) Collaborative with limited funding from the US National Science Foundation (NSF) National Sciences Education Digital Library Program (NSDL). Since its inception, BEN has grown from its original 11 to 24 Collaborators.
   Currently, the digital library collections of BEN Collaborators provide a rich array of materials for undergraduate biological sciences educators, including ones that prepare K-12 teachers. The materials that users find via the BEN portal are unique in several ways.
  • First, BEN resources have been reviewed by the individual societies for standards of quality and accuracy. Although each BEN collaborator has unique review criteria, overall users find resources that are scientifically accurate and educationally sound.
  • Second, the BEN portal provides an extended set of search parameters to allow more productive searches by users.
  • Finally, due to the collaborative establishment of its metadata structure, the user can easily conduct productive interdisciplinary searches across the diverse biological sciences topics covered by the BEN Collaborators.
  • In general contributors can submit resources through the digital library of a BEN collaborator or directly to the BEN portal. Submissions to the BEN portal are sent to the Collaborators that are maintaining discipline specific resources. AAAS catalogs submissions that do not fit discipline specific Collaborators. Contributors enter the cataloging information at the time of submission and a BEN digital library provider or AAAS validates the
  • PDLib: personal digital libraries with universal access BIBAFull-Text 365
      Francisco Alvarez-Cavazos; David A. Garza-Salazar; Juan C. Lavariega-Jarquin
    We propose a universally available personal digital library system. It is "personal" in the sense that each user is provided with a general purpose document repository (i.e. a personal digital library). It is "universally available" in the sense that it allows the user to access her/his personal personal digital library from most computing devices connected to the Internet, including mobile phones, PDAs and laptops, therefore granting access "from anyplace at anytime."
    Large introductory science courses & digital libraries BIBAFull-Text 366
      Laura M. Bartolo; Cathy S. Lowe; Donald R. Sadoway; Patrick E. Trapa
    Student self-assessment survey results indicate that a virtual lab experience improved understanding of many key laboratory learning objectives and that the Materials Digital Library (MatDL) has potential value in supporting a virtual lab.
    aDORe: a modular and standards-based digital object repository at the los alamos national laboratory BIBAFull-Text 367
      Jeroen Bekaert; Xiaoming Liu; Herbert Van de Sompel
    This paper describes the aDORe repository architecture, designed and implemented for ingesting, storing, and accessing a vast collection of Digital Objects at the Research Library of the Los Alamos National Laboratory.
    A signal/semantic framework for image retrieval BIBAFull-Text 368
      Mohammed Belkhatir; Yves Chiaramella; Philippe Mulhem
    This poster presents an approach for integrating perceptual signal features (i.e. color and texture) and semantic information within an integrated architecture for image retrieval. It relies on an expressive knowledge representation formalism handling high-level image descriptions and a full-text query framework. It consequently brings the level of image retrieval closer to users' needs by translating low-level signal features to high-level data and coupling it with semantics within index and query structures.
    Video recommendations for the open video project BIBAFull-Text 369
      Johan Bollen; Michael L. Nelson; Raquel Araujo; Gary Geisler
    We describe a DL multimedia recommender system implemented for the Open Video project. Recommendations are generated by a spreading activation algorithm operating on a video network created from log download sequences. We compared the system's recommendations to those generated by a collaborative filtering technique.
    The DLESE evaluation services group: a framework for evaluation within a digital library BIBAFull-Text 370
      Susan Buhr; Lecia Barker; Thomas C. Reeves
    The Digital Library for Earth System Education (DLESE) Evaluation Services Core (ESC) team has been established to:
  • Provide evaluation support for DLESE core team activities.
  • Establish pilot studies with key user audiences (e.g. K-12 teachers and undergraduate faculty).
  • Implement studies designed to better characterize DLESE users and user needs.
  • Offer evaluation support and opportunities for the DLESE community through workshops and grant opportunities.
    Using strand maps to engage digital library users with science content BIBAFull-Text 371
      Kirsten R. Butcher; Sonal Bhushan
    Our research examined whether using strand maps as an interface for digital library search tasks would change learners' cognitive processes when seeking educational resources. Results demonstrated that strand maps can help learners engage with science content and that semantic-spatial interfaces can support meaningful search processes.
    Impact: the last frontier in digital library evaluation BIBAFull-Text 372
      Anita S. Coleman; Laura M. Bartolo; Casey Jones
    The NSF-funded National Science Digital Library (NSDL) is engaged in an ongoing discourse about digital library evaluation. The Educational Impact and Evaluation Standing Committee (EIESC) has successfully identified desirable features in digital libraries such as usability and usage, but the hardest measure is impact. What is the impact of a DL? Members of the EIESC have engaged in pilots and feasibility studies using bricolage (a blend of qualitative and quantitative approaches to evaluation), and these activities are moving NSDL toward a richer understanding of impact.
    An approach to modeling content for digital repositories BIBKFull-Text 373
      Robert Chavez; Nikolai Schwertner
    Keywords: content model, digital repository, fedora, shared content
    Take note: academic note-taking and annotation behavior BIBAFull-Text 374
      Sally Jo Cunningham; Chris Knowles
    This paper describes an exploratory study of note taking at academic conferences.
    Teaching boxes and web services: optimizing the digital library for earth system education for the classroom BIBAFull-Text 375
      Lynne Davis; Shelley Olds
    A recent pilot program pioneered the development of Teaching Boxes by the Digital Library for Earth System Education (DLESE) with the University of California, Berkeley Museum of Paleontology, San Francisco State University, USGS, and seven San Francisco area middle/high school teachers. This poster shares the current DLESE Teaching Box effort, explains the pilot program, and highlights the use of web services to create a context within which teachers can find, immediately use, or adapt relevant teaching and learning resources.
    Music-to-knowledge (M2K): a prototyping and evaluation environment for music digital library research BIBKFull-Text 376
      J. Stephen Downie; Andreas F. Ehmann; Xiao Hu
    Keywords: Design
    Real-time genre classification for music digital libraries BIBAFull-Text 377
      J. Stephen Downie; Andreas F. Ehmann; David Tcheng
    This poster describes a real-time audio-based automatic music genre classifier for use in organizing, browsing, and searching musical digital libraries. A decision tree classifier trained on a 40-dimension feature space is used to categorize music into one of 14 different genres, with the results displayed in a continuously updating user interface.
    CQE: a collaborative querying environment BIBKFull-Text 378
      Lin Fu; Dion Hoe-Lian Goh; Schubert Shou-Boon Foo
    Keywords: collaborative querying, information retrieval, user studies
    Musica colonial: 18th century music score meets 21st century digitalization technology BIBAFull-Text 379
      Ting Gan
    The Musica Colonial project is an initiative to preserve, digitize, and provide online access to the sole copy, held in microfilm format, of a collection of handwritten colonial-era cathedral music scores with Spanish lyrics, archived at the Mesoamerican Center for Regional Research (CIRMA) in Antigua, Guatemala. Because of its multi-faceted and culturally rich content, the collection lends itself to research in various fields and with various methodologies. This poster mainly focuses on the music aspect of the collection and illustrates the different stages of making the musical content of the collection accessible to the public. Digitization is the most effective and promising way to display this special collection to the rest of the world.
    MyPDL: a web-based personal digital library BIBAFull-Text 380
      Wu He; Demei Shen
    In recent years we have witnessed a dramatic increase in the volume of electronic and digital information that has been produced, and as a result we feel increasingly overwhelmed by the amount of digital information that needs to be managed. This paper describes a Web-based personal digital library system, myPDL, which was designed and developed to allow individual users to create, store, organize, and retrieve their own personal digital information collections. The tool differs from digital libraries in general chiefly in the emphasis it places on user customization and personalization.
    JISC metadata schema registry BIBKFull-Text 381
      Rachel Heery; Pete Johnston; Dave Beckett; Nikki Rogers
    Keywords: Dublin core, learning object metadata, metadata, registry
    Applying verification, validation, and accreditation processes to digital libraries BIBAFull-Text 382
      David Joiner; Steven Gordon; Scott Lathrop; Marilyn McClelland; D. E. Stevenson
    We propose to address the issue of quality of digital library objects in the Computational Science Education Reference Desk by applying a verification, validation, and accreditation workflow to the review of learning objects.
    Task difficulty in information searching behavior: expected difficulty and experienced difficulty BIBAFull-Text 383
      Jeonghyun Kim
    The purpose of the work is to better understand the issue of task difficulty from the perspective of the user. To investigate the relationship between task difficulty and information searching behavior, two types of task difficulty are considered: expected difficulty and experienced difficulty. Information searching behavior was observed via time spent, pages viewed, pages saved, search efficiency and the number of query reformulations.
    An information network overlay architecture for the NSDL BIBAFull-Text 384
      Carl Lagoze; Dean B. Krafft; Susan Jesuroga; Tim Cornwell; Ellen J. Cramer; Edwin Shin
    We describe the underlying data model and implementation of a new architecture for the National Science Digital Library (NSDL) by the Core Integration Team (CI). The architecture is based on the notion of an information network overlay. This network, implemented as a graph of digital objects in a Fedora repository, allows the representation of multiple information entities and their relationships. The architecture provides the framework for contextualization and reuse of resources, which we argue is essential for the utility of the NSDL as a tool for teaching and learning.
    Metadata for phonograph records: facilitating new forms of use and access to analog sound recordings BIBAFull-Text 385
      Catherine Lai; Ichiro Fujinaga; Cynthia A. Leive
    A new metadata schema for analog sound recordings is described.
    Facilitating the effective use of earth science data in education through digital libraries: bridging the gap between scientists and educators BIBAFull-Text 386
      Tamara Shapiro Ledley; LuAnn Dahlman; Ben Domenico; Michael R. Taber
    Creating learning modules that utilize Earth data is a difficult task, requiring knowledge about the data, science, curriculum design, and the educational context. The Digital Library for Earth System Education (DLESE) Data Services group hosted a workshop to bridge the knowledge gap between data providers and educators by assembling teams of experts in these areas to create data-rich learning modules. Face-to-face collaboration allowed the sharing of perspectives and encouraged the contribution of individual expertise, facilitating development of data-rich modules.
    Digital libraries on handhelds for autistic children BIBAFull-Text 387
      Gondy Leroy; Serena Chuang; John Huang; Marjorie H. Charlop-Christy
    Autism is a wide spectrum developmental disorder. Its prevalence has increased enormously. Each autistic child has different needs and requires individual therapy. Information technology can help these children and their families by augmenting therapy and providing communication tools. We are developing a digital library that provides such a communication tool on a Pocket PC. The tool will be integrated into the therapy sessions and the children will be able to use it daily. Additionally, all the interactions with the tool are logged, allowing the therapist a detailed view of the effects of therapy on the children's communication.
    A study of annotations for a consumer health portal BIBAFull-Text 388
      Lili Luo; David West; Gary Marchionini; Catherine Blake
    This paper presents a study of annotations made by cataloguers of consumer health websites in order to better understand the website cataloging process.
    Motivating and supporting faculty use of educational digital libraries: an example from the geosciences BIBFull-Text 389
      Cathy A. Manduca; Ellen R. Iverson; Sean Fox; Flora McMartin
    Innovative training solutions for digitization BIBAFull-Text 390
      Amy Lynn Maroso
    The benefits of digitizing library collections are important and diverse. Patrons outside traditional geographic boundaries can be served alongside local patrons. Access to local history, a vibrant part of many library holdings, and original, rare, and/or valuable materials can be greatly expanded. However, the practice of digitization can fall short of its promise due to poor planning and a lack of digitization skills. Moreover, quality digitization training is not always accessible or feasible given the time, expense, and staff limitations for many institutions.
       The Basics and Beyond digitization training program offers a novel solution for this problem. Funded by an Institute of Museum and Library Services National Leadership grant and administered by the University of Illinois at Urbana-Champaign Library, the Illinois State Library, and the Illinois Heritage Association, Basics and Beyond offers three digitization training options to cultural heritage institutions: one-day on-site workshops, three-week online training, and three-week online training followed by a hands-on workshop.
       The content of the workshops provides participants with an overview of the digitization process. Topics presented include: project planning, equipment selection, metadata, and standards and best practices for digitizing materials. The online courses expand greatly on the material covered in the workshops and provide the participants with an in-depth look at the digitization process. They cover material such as using digitization as a preservation practice, selection of materials for digitization, project planning, metadata schemes, equipment needs, and standards and best practices. The online courses are accessible to anyone with a Web connection and provide institutions around the world with access to innovative digitization training. The online training is affordable to most organizations, and its asynchronous format allows librarians and staff to easily fit the course into their work schedules.
       The effectiveness of the workshops and courses has been determined by participant evaluations conducted both during the workshops and courses and several months after completion. Evaluations include pre- and post-course surveys, essay questions, and quizzes as well as follow-up telephone interviews conducted several months after course or workshop completion. Evaluations are designed to determine the quality of training and to what degree the training assisted in the implementation and practice of newly formed or revamped digital projects.
       Evaluation results have been overwhelmingly positive for all courses and workshops. Over 200 people have taken the one-day workshop. A nine-question pre-workshop quiz indicates that 80% of participants missed three or more questions before being exposed to the workshop material. As expected, participants score significantly higher when given the same quiz after the workshop: only 14% missed more than two questions. Over 130 people have taken one of the online courses. In addition to quizzes, objective evaluations are also done in the online courses and show that students are able to apply the information they learn in "real world" situations, such as equipment purchases and metadata creation. Evaluations and participants' comments indicate that the workshops and courses are highly successful and have accomplished the chief goal of the project: to educate cultural heritage institution professionals on the best practices for digitization to ensure the success and longevity of their digitized collections.
    Down on the OCR farm: how we produced searchable PDFs for 7 million documents in a student computer lab BIBAFull-Text 391
      Robert Mason; Heidi Schmidt; Richard Trott
    Utilizing idle workstations in a student computer lab, 7 million searchable PDF documents were generated from 42 million TIF page images.
    The climate change collection: a case study on digital library collection review and the integration of research, education and evaluation BIBAFull-Text 392
      Mark McCaffrey; Tim Weston
    Validating the scientific quality and potential of digital resources for use in classroom settings has become a major focus of recent digital library efforts such as the Digital Library for Earth System Education (DLESE). The Climate Change Collection is a thematic collection of digital resources relating to the topic of global climate change and natural climate variability, designed as a pilot project for reviewing the scientific quality and pedagogical potential of selected digital resources using a focused and streamlined approach. The collection offers a case study in integrating research and education through the collaborative efforts of an interdisciplinary review team made up of professionals from the fields of climate research, geoscience education, cognitive psychology, and evaluation. Each participant received a stipend for their involvement in the process. Designed as an experiment in streamlined collection development, it is anticipated that the experience of the Climate Change Collection effort will help inform future digital library review and collection-building efforts.
    If you harvest arXiv.org, will they come? BIBAFull-Text 393
      Michael L. Nelson; Johan Bollen
    We examine which NASA Technical Report Server (NTRS) repositories have received the most downloads during 15 months of operation. In particular, we explore the collection development policy of including non-NASA scientific, technical and medical (STM) repositories. We found that three of the four non-NASA repositories included in NTRS contributed little to the overall download totals.
    Creating the infrastructure for collaboration between digital reference services and researchers: the digital reference electronic warehouse (DREW) project BIBAFull-Text 394
      Scott Nicholson; R. David Lankes
    The Digital Reference Electronic Warehouse (DREW) project is a collection of digital reference transactions from different services and different communication channels that live in a single space. Reference services work with DREW to submit transactions using the DREW schema, which is conceptually similar to the MARC record format for bibliographic materials. Researchers can then receive records from DREW to improve our knowledge of digital reference. These researchers then use the results of their research to create tools, reports, and models based on the DREW schema, and place those items into a management information system (MIS). The services can then access the MIS and apply those tools to their own archives. The result is that services can benefit directly and rapidly from research, and are then more likely to continue their involvement with the project. This infrastructure creates a collaborative space where researchers and practitioners can benefit from the work of each other and aid us in advancing the field of digital reference.
    Building lite-weight EAD repositories BIBAFull-Text 395
      Terry Reese
    University archives and museums traditionally haven't been viewed as a bastion for innovative technology development, but for a number of years now, it has been university archives and museums that have made the most significant moves towards adopting and utilizing XML-based metadata schemas for bibliographic description. Unlike the general library profession, which has been able to rely on MARC throughout the years as its primary vehicle for bibliographic description, museums and archives have traditionally had no formal metadata structure to create portable metadata records. It is likely for this reason that museums and archives were quick to embrace EAD (Encoded Archival Description) as the de facto method for creating portable finding aids. However, while EAD has provided a vehicle for portability, institutions have found that actually using these finding aids within a database or for display can be quite daunting. Oftentimes, individual institutions creating finding aids have no idea how to make use of them within their institution's current infrastructure -- often relying on larger organizations like the California Digital Library or the Northwest Digital Archive (NWDA) to provide a vehicle for distributing their materials. And while the consortia method of distribution offers a number of benefits, it does come at a cost of individual institutional identity.
       Oregon State University currently is one of the participating libraries in the NWDA. This has been a tremendous relationship in terms of creating a resource that provides exposure to a wide number of collections at OSU in relation to other archive collections at other participating NWDA libraries. However, the cost has been a loss of identity with regard to how OSU presents its finding aids to its own user community. As a result, the OSU Archives was creating multiple finding aids -- an EAD record for use within the NWDA and a static HTML page for use at OSU.
       In January 2005, OSU Valley Library started work on a lite-weight EAD repository with the following goals in mind: 1) that the solution be portable to other NWDA institutions, 2) that it be built on top of current open source technologies (like MySQL, PHP, etc.), 3) that it provide a method for federated searching of the repository and 4) that it be flexible enough to eventually include other metadata schemas. This lite-weight EAD repository is the first product of this development, which uses PHP, MySQL and Saxon to produce an easily replicable database environment for querying and serving EAD records on the web. Likewise, by building SRU functionality into the EAD repository, this lite-weight solution can also expose its own resources to the outside world through a standard query and retrieval language.
       This poster session will discuss the method used to generate a lite-weight search and discovery solution and demonstrate the resulting finished product. This poster session will also discuss the implications of this type of repository solution -- particularly the ability for institutions to quickly move EAD elements into a searchable, web-based environment as well as the ability for institutions to include their EAD repositories in a federated search infrastructure using a standard search and retrieval language.
    Osprey: peer-to-peer enabled content distribution BIBAFull-Text 396
      John Reuning; Paul Jones
    As the size of data and files increases, digital repositories face a growing problem in content distribution. High quality multimedia and research data sets can range from hundreds of megabytes to over a terabyte. Web-based digital repositories may experience a substantial increase in operating and bandwidth costs when providing materials to the public. Peer-to-peer networks are sometimes suggested as an alternative to traditional centralized repositories [2]. However, critical issues such as data integrity, access control, and content availability exist when using peer-to-peer technologies [1].
       Osprey (http://osprey.ibiblio.org) addresses these problems by combining a flexible metadata management system with the BitTorrent peer-to-peer protocol. A Web database application provides searching and browsing of collection objects, and the peer-to-peer component lowers the bandwidth costs by employing distributed downloading. The Permaseed application supplies reliable, persistent peer-to-peer access to files in the digital repository.
    Integrating image-rich biological information with a web search tool: the inside wood model BIBAFull-Text 397
      Shirley Rodgers; Elisabeth Wheeler; Troy Simpson; Jeff Bartlett
    North Carolina State University is collaborating with global partners to produce a comprehensive Internet-accessible wood anatomy reference, research, and teaching tool incorporating images, taxonomy, and anatomical information sets. With its multiple search capabilities, content types, and user options, InsideWood serves as a model for image-intensive, searchable biological collections. http://insidewood.lib.ncsu.edu/search/
    What type of page is this?: genre as web descriptor BIBAFull-Text 398
      Mark A. Rosso
    Many have suggested the use of genres to ameliorate the problem of web search, e.g. [1,3,4,5,6,7]. A central issue in the implementation of this idea is the choice of genres to be used as web page descriptors. Several studies have explored user terminology for and recognition of several types of digital documents, e.g., various types of office documents [8], personal homepages [2], and pages returned by user web searches [4,6]. This poster reports on a series of three user studies with the purpose of developing a genre "palette" for use in web retrieval. Pages viewed by participants in these studies were limited to the edu domain, as in [5].
       In the first study, three participants, an information technology professional, an oncology social worker and a computer science professor, in separate sessions, were given a stack of 102 web page printouts, and were asked to separate the pages into piles according to genre. They were also asked to name the genres by writing the names on sticky notes and placing them on the piles. After the piles were complete, participants were asked to provide a short, one or two sentence, description of each genre, and then to describe the page characteristics that led them to place a page in that genre.
       A list of 49 genre names and definitions was developed from the work of the three participants, keeping the terminology as similar as possible to the original, while combining definitions which were nearly identical in wording. In a second user study, each of ten participants was given this list of genre name/definition pairs, the same stack of 102 printed web pages (arranged in a different random order for each participant), and a data collection form on which he/she recorded a genre for each web page. For each of the 102 web pages, the participant was given the option to either write a number from the list corresponding to a genre/definition pair which best described the page; or to provide his/her own suggestion for a genre name and definition, if none of those in the list seemed adequate. The participants were drawn from a convenience sample of approximately 10 college graduates of various occupations. Given that participants chose genres from a list of 48, many of which were extremely similar in nature, the resulting level of agreement (half or more of the participants agreeing on one genre for a given page in 60% of the instances) is quite acceptable. A set of five principles for creating a genre palette from individuals' sortings was developed. Based on those principles, the original list was trimmed down to 18 genres.
       The third study was an online experiment in which 257 college faculty, students, and staff from two schools categorized a new set of 55 pages using the 18 genres. On average, over 70% agreed on the genre of each page. No study of this scale is known to report user recognition of web genres. This user validation is necessary to set upper bounds for machine categorization efforts. Also, because genre is usually considered to be "socially defined", genre studies using researcher-defined a priori categories (e.g., [5]) may not be able to show genres' usefulness for web search.
       Interestingly, the genres in this palette, although developed independently, are similar to 7 of 8 Internet-wide genres based on user input reported in [7], and similar to 8 of 11 Internet-wide genres as reported in [3]. Based on these observations, one might infer that some substantial amount of genre knowledge exists among users, even from different cultures (in this case, the United States, Germany, and Sweden).
    Negotiating identity in the math forum's online mentoring project BIBAFull-Text 399
      Wesley Shumar; Craig Bach
    Drawing on current thinking about identity, social group boundary and information technology, the research presented in this poster discusses a unique online interactive project at the Math Forum called The Online Mentoring Project. The importance of this work for digital libraries will be highlighted.
    User centred interactive search in the humanities BIBAFull-Text 400
      Claire Warwick; Jon Rimmer; Ann Blandford; George Buchanan
    This poster describes research on the needs and behaviours of Humanities users of both digital libraries and more traditional information environments.
    Tools for managing collaboration, communication, and website content development in a distributed digital library community BIBAFull-Text 401
      Marianne Weingroff; Sonal Bhushan
    This poster showcases tools that the Digital Library for Earth System Education (DLESE) has developed to address the needs of its distributed community members to manage collaboration and communication, as well as the development of their own websites using DLESE templates. DLESE has customized several open source content management systems and has integrated its suite of Web services into them. These services enable developers to add customized search services, smart links, RSS feeds, etc. to their sites.
    Studying the presence of terrorism on the web: a knowledge portal approach BIBKFull-Text 402
      Yilu Zhou; Jialun Qin; Edna Reid; Guanpi Lai; Hsinchun Chen
    Keywords: Algorithms, Design, Security
    Personalized project space for managing metadata of geography learning objects BIBKFull-Text 403
      Wenbo Zong; Dan Wu; Aixin Sun; Ee-Peng Lim; Dion Hoe-Lian Goh; Yin-Leng Theng; John Hedberg; Chew-Hung Chang
    Keywords: geography learning objects, personalized project space

Demonstrations
    Icon abacus and ghost icons BIBAFull-Text 404
      Eric A. Bier; Adam Perer
    We present two techniques that make document collection visualizations more informative. Icon abacus uses the horizontal position of icon groups to communicate document attributes. Ghost icons show linked documents by adding temporary icons and by highlighting or dimming existing ones.
    Measuring the quality of network visualization BIBAFull-Text 405
      Chaomei Chen
    A quantitative method is developed for measuring the quality of network visualizations in terms of log-likelihood metrics resulting from Expectation Maximization (EM) clustering of intrinsic and extrinsic attributes of network nodes.
    Building image-based electronic editions using the edition production technology BIBAFull-Text 406
      Alex Dekhtyar; Ionut E. Iacob; Jerzy Jaromczyk; Kevin Kiernan; Neil Moore; Dorothy C. Porter
    We demonstrate the Edition Production Technology (EPT), an integrated development environment for building Image-based Electronic Editions (IBEE). EPT is developed in Java on top of the Eclipse platform and benefits from the openness of Eclipse's plugin architecture and its portability (currently EPT runs on Windows XP, Linux, and Mac OS X). EPT provides software support for building image-based digital libraries of historic documents. Starting with high resolution images of manuscripts and transcriptions of them, EPT tools provide support for creating XML encoding of the electronic edition, searching the electronic edition, linking text and images, and publishing the electronic edition (using filters and XSLT).
    EVIADA: ethnomusicological video for instruction and analysis digital archive BIBAFull-Text 407
      Jon W. Dunn; William G. Cowan
    The field of ethnomusicology depends heavily on ethnographic research or "fieldwork" by researchers that often involves the capture and subsequent analysis of audio and video information, to help document and understand the musical practices of people all over the world. Ethnomusicologists have used a variety of recording technologies over the years to capture film and video, and much of this footage lies in researchers' offices and home basements. No systematic mechanism exists for preserving and providing access to this video for other students and scholars.
       The Ethnomusicological Video for Instruction and Analysis Digital Archive (EVIADA) [1] is a multi-year collaborative project between Indiana University and the University of Michigan to create a digital archive for field video recordings captured by ethnomusicology researchers. This digital archive will serve both to preserve this content for future generations of scholars and also to provide a resource to support teaching and learning in ethnomusicology, anthropology, and related disciplines. The creation of EVIADA has involved a unique collaboration between ethnomusicologists, librarians, archivists, and technologists in carrying out all stages of the project, including video digitization, metadata creation, and system and user interface design.
       As part of the project, we are developing several software tools: The Segmentation/Annotation Tool is a Java Swing application written using Apple's QuickTime for Java API. It allows an ethnomusicologist who is contributing a video collection to the archive to divide that video into a hierarchy of segments, attach free-text descriptions and controlled vocabulary terms to each segment, and output this information in the form of a METS [3] XML document incorporating MODS [2] descriptive metadata records. This METS document can then be ingested into downstream archival and delivery systems. We hope to evolve this software into a more general-purpose tool for the creation of METS documents for video objects.
       We are also building a web-based user interface on top of the Fedora digital repository system to allow users to search and browse video content in the collection via the descriptive metadata and annotations, making appropriate use of controlled vocabulary thesauri to increase search recall.
    A fluid treemap interface for personal digital libraries BIBAFull-Text 408
      Lance Good; Ashok C. Popat; William C. Janssen; Eric Bier
    The UC system employs hybrid quantum/continuous treemaps for fluidly interacting with documents in a personal digital library. By incorporating a document reader application within the visualization workspace, UC supports multi-document reading tasks that have been traditionally accomplished by laying out documents on a physical desk. One of the overall goals of the system is to eliminate the boundary between acquiring and using documents.
    Processing XML documents with overlapping hierarchies BIBAFull-Text 409
      Ionut E. Iacob; Alex Dekhtyar
    The problem of overlapping markup hierarchies, first mentioned in the context of SGML, often occurs in XML text encoding applications for the humanities. Previous solutions to the problem rely on manual maintenance of the markup and address only the problem of representing overlapping features in XML, leaving the issues of automated maintenance and querying open. As a consequence, traditional XML tools are of little practical use when dealing with overlapping markup. In this work we demonstrate the implementation of our framework for management of concurrent XML hierarchies from a computer science perspective. We propose an underlying model, data structures, APIs, and algorithms so that most of the burden of managing concurrent XML hierarchies is borne by the software.
    The UpLib personal digital library system BIBAFull-Text 410
      William C. Janssen
    We demonstrate the operation of UpLib, a visually-oriented personal digital library system.
    Evaluation of mobile information retrieval strategies BIBAFull-Text 411
      Joemon M. Jose; Stephen Downes
    In this paper we describe and evaluate three strategies for information retrieval on mobile devices. Results show the effectiveness of our adaptive approach.
    Media matrix: a digital library research tool BIBAFull-Text 412
      Mark Kornbluh; Michael Fegan; Dean Rehberger
    Media Matrix (version 1.0) is an online, server-side tool that helps users find, segment, annotate, organize, and publish streaming media found in digital libraries on the Internet.
    Visual understanding environment BIBAFull-Text 413
      Anoop Kumar; Ranjani Saigal
    The Visual Understanding Environment (VUE) project at Tufts' Academic Technology department provides faculty and students with tools to successfully integrate digital resources into their teaching and learning. VUE provides a visual environment for structuring, presenting, and sharing digital information and an OKI-compliant software bridge for connecting to FEDORA-based digital repositories. Using VUE's concept mapping interface, faculty and students design customized semantic networks of digital resources drawing from digital libraries, local files and the Web. The resulting content maps can then be viewed and exchanged online.
    Schema mapper: a visualization tool for DL integration BIBAFull-Text 414
      Ananth Raghavan; Divya Rangarajan; Rao Shen; Marcos Andre Goncalves; Naga Srinivas Vemuri; Weiguo Fan; Edward A. Fox
    Schema mapping is a challenging problem. It has come to the fore in recent years; there are important applications like database schema integration and, more recently, digital library merging of heterogeneous data. Previous studies have approached the schema mapping process either from algorithmic or visualization perspectives, with few integrating both. With Schema Mapper we demonstrate a semi-automatic tool for schema integration that combines a novel visual interface with an algorithm-based recommendation engine. Schemas are visualized as hyperbolic trees (see Fig. 1), thus allowing more schema nodes to be displayed at one time. Matches to selections are recommended to the user, which makes the mapping operation easier and faster.
    Using concept maps as a cross-language resource discovery tool for large documents in digital libraries BIBAFull-Text 415
      Ryan Richardson; Edward A. Fox
    Project Gutenberg, the Million Book Project, the Networked Digital Library of Theses and Dissertations, Amazon's book search service, and the recently announced collaboration of Google and leading libraries, all aim to make available large numbers of book-length objects, in a variety of languages. Traditional approaches to discovering a suitable book for a particular purpose have generally relied on catalog records, sometimes enhanced with abstracts. Full-text searching -- popular, e.g., with legal and government documents -- and passage retrieval techniques, suitable for encyclopedias and reference works, have not been adequately tested with large collections of large objects.
    Terror tracker system: a web portal for terrorism research BIBKFull-Text 416
      Robert P. Schumaker; Hsinchun Chen; Tao Wang; Jerod Wilkerson
    Keywords: news retrieval, security, terrorism
    Mining and analyzing digital archive usage data to support collection development decisions BIBAFull-Text 417
      Jewel Ward; Johan Bollen; Jeffrey Pearson; Shing-Cheung Chan; Hui-Hsien Chi; Marie Chi; Kristine Guevara; Hsiao-han Huang; Genesan Kim; Maks Krivokon; Bo H. Lee; Pei-Han Li; Fenny Muliawan; Vu Nguyen; Barry W. Boehm; A. Winsor Brown; Edward Colbert; Alex Lam; Mayur Patel
    We demonstrate a "collection development decision support tool" that mines digital archive usage data. We want to better understand the University of Southern California (USC) Digital Archive's collection structure by analyzing the objects' characteristics, by analyzing the relationships between viewed objects, and by understanding usage trends over time. By relying on implicit patterns of usage data, such as co-retrievals, rather than explicit data, such as hit counts, we believe we can make more informed decisions about where to expend our resources.
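    The abstract does not define how co-retrievals are computed. As a minimal sketch, assuming a co-retrieval means two objects viewed within the same user session, pair counts could be derived from session logs as follows. The function name co_retrieval_counts and the sample object identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_retrieval_counts(sessions):
    """Count how often each unordered pair of objects appears in one session.

    Assumption for this sketch: each session is a list of object identifiers.
    """
    counts = Counter()
    for viewed in sessions:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for pair in combinations(sorted(set(viewed)), 2):
            counts[pair] += 1
    return counts


# Hypothetical usage log: three sessions over three objects.
sessions = [
    ["map1", "photo7", "map2"],
    ["photo7", "map1"],
    ["map2"],
]
for pair, n in co_retrieval_counts(sessions).most_common():
    print(pair, n)
```

    Pairs with high counts suggest objects that users treat as related, which is information that per-object hit counts alone do not reveal.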
    BioPortal: a case study in infectious disease informatics BIBAFull-Text 418
      Daniel Zeng; Hsinchun Chen; Chunju Tseng; Wei Chang; Millicent Eidson; Ivan Gotham; Cecil Lynch
    We present the BioPortal system, an integrated cross-jurisdictional data sharing and analysis environment to facilitate detection, prevention, and management of infectious disease outbreaks.

Tutorials


    Introduction to (teaching/learning about) digital libraries BIBAFull-Text 419
      Edward A. Fox; Marcos Andre Goncalves
    This tutorial provides a thorough and deep introduction to the DL field, building upon a firm theoretical foundation (starting with "5S": Streams, Structures, Spaces, Scenarios, Societies [1]), giving careful definitions and explanations of all the key parts of a "minimal digital library", and expanding from that basis to cover key DL issues, illustrated with a well-chosen set of case studies.
    Evaluating digital libraries BIBAFull-Text 420
      Thomas C. Reeves; Susan Buhr; Lecia Barker
    "So far, evaluation has not kept pace with efforts in digital libraries (or with digital libraries themselves), has not become part of their integral activity, and has not been even specified as to what it means, and how to do it." -- [1] Conducting a comprehensive evaluation of a digital library requires a "triangulation" approach including multiple models, procedures, and tools. Carrying out valid evaluations of digital libraries in a timely and efficient manner is the focus of this tutorial. Why is evaluation of digital libraries so important? Each year sees the introduction of new digital libraries promoted as valuable resources for education and other needs. Yet systematic evaluation of the implementation and efficacy of these digital library systems is often lacking. This tutorial is specifically designed to establish evaluation as a key strategy throughout the design, development, and implementation of digital libraries. The tutorial focuses on a decision-oriented model for evaluating digital libraries using multiple methods such as: service evaluation, usability evaluation, information retrieval, biometrics evaluation, transaction log analysis, survey methods, interviews and focus groups, observations, and experimental methods. Participants in this tutorial will learn how to implement models and procedures for evaluating digital libraries at all levels of education. The tutorial includes presentations with actual case studies that are focused on a variety of digital library evaluation strategies. Participants will also receive a copy of Evaluating Digital Libraries: A User-Friendly Guide.
    Thesauri and ontologies in digital libraries BIBKFull-Text 421
      Dagobert Soergel
    Keywords: ontologies, taxonomies, user orientation
    Copyright transfer agreements and self-archiving BIBAFull-Text 422
      Anita S. Coleman; Cheryl Knott Malone
    Concerns about intellectual property rights are a significant barrier to the practice of scholarly self-archiving in institutional and other types of digital repositories. This introductory-level, half-day tutorial will demystify the journal copyright transfer agreements (CTAs) that are often the source of scholars' concerns about these rights. In addition, participants will be introduced to the deposit processes of self-archiving in an interdisciplinary repository and open access archive (OAA), such as DLIST, the Digital Library for Information Science and Technology.
    Using standards in digital library design & development BIBAFull-Text 423
      Jeroen Bekaert; Xiaoming Liu; Herbert Van de Sompel
    This tutorial will cover a set of Standards and de facto Standards that can play a role in the design and development of Digital Library applications. The Standards that will be discussed are the ISO MPEG-21 Digital Item Declaration, the ISO MPEG-21 Digital Item Identification, the ISO MPEG-21 Digital Item Processing, the Open Archives Initiative Protocol for Metadata Harvesting, the Internet Archive ARC file format, the NISO OpenURL Framework for Context-Sensitive Services, and the proposed info URI scheme. The tutorial will discuss these Standards by illustrating how they have been used in the context of the aDORe Digital Object repository. aDORe [8] has been designed and implemented for ingesting, storing, and accessing a vast collection of Digital Objects at the Research Library of the Los Alamos National Laboratory. Since aDORe is not a product, the tutorial is not a product advertisement. Rather, it is an opportunity for designers and developers to learn about Standards that can help address real-life challenges in DL design and development, and help increase interoperability across systems. The presenters are actively involved in all of the standardization efforts that are discussed.
    Building preservation environments BIBAFull-Text 424
      Reagan W. Moore; Richard Marciano
    The preservation of digital entities requires data management technologies that are provided by digital libraries and data grids. Digital libraries provide standard data organization and presentation mechanisms. Data grids provide support for infrastructure independence, the ability to incorporate new technology as it becomes available. Preservation environments integrate these technologies to assure the authenticity and integrity of digital entities. We will describe the concepts behind preservation and illustrate the concepts with three data preservation environments based on the NARA research prototype persistent archive, the NHPRC Persistent Archive Testbed, and the NSF NSDL persistent archive.
    Building digital library collections with Greenstone BIBAFull-Text 425
      Ian H. Witten; David Bainbridge
    This tutorial will demonstrate how to build many kinds of digital library collections with the Greenstone digital library software, a comprehensive, open-source system for constructing, presenting, and maintaining information collections. Collections will be built from HTML documents; Word, PDF and PostScript documents; images in various formats; MP3 and MIDI audio; MARC records; and more. For each collection, various full-text search indexes and metadata-based browsers will be created.
       Attendees are encouraged to bring their laptops, install Greenstone and various sample files from a CD-ROM that we will provide, and follow along with the demonstrations on their own machines.
    Practical digital library interoperability standards BIBAFull-Text 426
      David Bainbridge; Ian H. Witten
    As the field of digital libraries matures and new systems and standards develop, the ability to interoperate between systems becomes paramount. This tutorial gives a practical introduction to many recent standards and de facto standards for interoperability, and illustrates them using open source digital library software, including online demonstrations of interoperation issues and solutions. Core standards that are discussed include Dublin Core, OAI-PMH, METS, and MODS. We use interoperation between Greenstone and DSpace as a motivating case study.
       For those demonstrations that involve Greenstone, attendees who wish to may bring their laptops, install Greenstone from a CD-ROM that we will provide, along with various sample files, and follow along with the demonstrations on their own machine.

Workshops


    Developing a digital library education program BIBKFull-Text 427
      Javed Mostafa; Kristine Brancolini; Linda C. Smith; William Mischo
    Keywords: IMLS, curriculum development, digital libraries education, institute of museum and library services, needs assessment
    International scientific data, standards, & digital libraries BIBAFull-Text 428
      Laura M. Bartolo; John R. Rumble
    This workshop explores the various models used successfully to develop international standards for languages and tools, as well as scientific & technical information for use of data on the emerging Semantic Web. The advantages and disadvantages of the models will be highlighted in a manner that allows emerging standards to benefit from existing experience.
    Studying digital library users in the wild: theories, methods, and analytical approaches BIBAFull-Text 429
      Michael Khoo; David Ribes
    As digital libraries continue the transition from research to operational status, understanding how they impact on the educational and learning practices of their users becomes an increasingly important objective for both library developers and evaluators. This workshop will examine the theoretical and methodological issues involved in the qualitative, naturalistic, and/or longitudinal study of the users of digital libraries. It will focus on the methodologies that can be used to capture the behaviors of digital library users, and the theoretical frameworks that can be used to analyze these behaviors, including ethnography, ethnomethodology, grounded theory, discourse analysis, scenarios, in-depth interviews and focus groups.
    Next generation knowledge organization systems: integration challenges and strategies BIBAFull-Text 430
      Gail Hodge; Linda Hill; Marcia Lei Zeng; Jian Qin; Douglas Tudhope
    This year's Networked Knowledge Organization Systems (NKOS) workshop built on seven years of workshops in the U.S. and Europe on issues regarding enabling networked knowledge organization systems (KOS), such as classification systems, thesauri, gazetteers, taxonomies, and ontologies, to support the description, retrieval, and use of diverse information resources. Now, many efforts are underway to research the issues and implement solutions to the challenges of networking and integrating KOS across somewhat isolated domains: indexing services and thesaurus builders; computer scientists and systems integrators; ontologists; taxonomists; and others. In many cases, requirements to solve these integration issues have become mission critical; the need to support computational, programmatic integration to handle masses of data from independent sources is pushing the research and development agenda. The need to move forward to meet these challenges while at the same time applying the best practices and "wisdom" developed through years of practical experience is acute.
       The JCDL-NKOS workshop for 2005 brought together researchers and implementers from diverse international communities who are developing new models, conducting research, and implementing practical solutions for networking KOS and integrating the associated information and data resources.