JCDL'07: Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries

Fullname: ACM/IEEE Joint Conference on Digital Libraries
Note: Building and Sustaining the Digital Environment
Editors: Ray Larson; Edie Rasmussen; Shigeo Sugimoto; Elaine Toms
Location: Vancouver, BC, Canada
Dates: 2007-Jun-18 to 2007-Jun-23
Publisher: ACM
Standard No: ISBN 978-1-59593-644-8; ACM Order Number 606072; ACM DL: Table of Contents; hcibib: DL07
Papers: 117
Pages: 518
Links: Conference Home Page (expired)
  1. Visualization
  2. Digital curation and preservation
  3. Information extraction 1
  4. Panel 1
  5. Information extraction 2
  6. Social networks
  7. Systems
  8. Educational digital libraries
  9. Information retrieval and extraction 1
  10. Panel 2
  11. Information extraction 3
  12. Models and case studies
  13. Architecture and ontologies
  14. Music digital libraries
  15. User studies and user interfaces
  16. Information retrieval and extraction 2
  17. Large-scale collections
  18. Metadata
  19. Historical digital libraries
  20. Automatic classification
  21. Search behavior and personalization
  22. Posters
  23. Demos

Visualization

World explorer: visualizing aggregate data from unstructured text in geo-referenced collections 1-10
  Shane Ahern; Mor Naaman; Rahul Nair; Jeannie Hui-I Yang
The availability of map interfaces and location-aware devices makes a growing amount of unstructured, geo-referenced information available on the Web. This type of information can be valuable not only for browsing, finding and making sense of individual items, but also in aggregate form to help understand data trends and features. In particular, over twenty million geo-referenced photos are now available on Flickr, a photo-sharing website -- the first major collection of its kind. These photos are often associated with user-entered unstructured text labels (i.e., tags). We show how we analyze the tags associated with the geo-referenced Flickr images to generate aggregate knowledge in the form of "representative tags" for arbitrary areas in the world. We use these tags to create a visualization tool, World Explorer, that can help expose the content of the data, using a map interface to display the derived tags and the original photo items. We perform a qualitative evaluation of World Explorer that outlines the visualization's benefits in browsing this type of content. We provide insights regarding the aggregate versus individual-item requirements in browsing digital geo-referenced material.
Categorization and analysis of text in computer mediated communication archives using visualization 11-18
  Ahmed Abbasi; Hsinchun Chen
Digital libraries (DLs) for online discourse contain large amounts of valuable information that is difficult to navigate and analyze. Visualization systems developed to facilitate improved CMC archive analysis and navigation primarily focus on interaction information, with little emphasis on textual content. In this paper we present a system that provides DL exploration services such as visualization, categorization, and analysis for CMC text. The system incorporates an extended feature set comprised of stylistic, topical, and sentiment related features to enable richer content representation. The system also includes the Ink Blot technique which utilizes decision tree models and text overlay to visualize CMC messages. Ink Blots can be used for text categorization and analysis across forums, authors, threads, messages, and over time. The proposed system's analysis capabilities were evaluated with a series of examples and a qualitative user study. Empirical categorization experiments comparing the Ink Blot technique against a benchmark support vector machine classifier were also conducted. The results demonstrated the efficacy of the Ink Blot technique for text categorization and also highlighted the effectiveness of the extended feature set for improved text categorization.
Delineating the citation impact of scientific discoveries 19-28
  Chaomei Chen; Jian Zhang; Weizhong Zhu; Michael Vogeley
Identifying the significance of specific concepts in the diffusion of scientific knowledge is a challenging issue concerning many theoretical and practical areas. We introduce an innovative visual analytic approach to integrate microscopic and macroscopic perspectives of a rapidly growing scientific knowledge domain. Specifically, our approach focuses on statistically unexpected phrases extracted from unstructured text of titles and abstracts at the microscopic level in association with the magnitude and timeliness of their citation impact at the macroscopic level. The H-index, originally defined to measure individual scientists' productivity in terms of their citation profiles, is extended in two ways: 1) to papers and terms as a means of dividing these items into two groups so as to replace the less optimal threshold-based divisions, and 2) to take into account the timeliness of the impact of knowledge diffusion in terms of the timing of citations and publications so that attention is particularly drawn towards potentially significant and timely papers. The selected terms are connected to higher-level performance indicators, such as measures derived from the H-index, in the form of decision trees. A top-down traversal of such decision trees provides an intuitive walkthrough of concepts and phrases that may underlie potentially significant but currently still latent scientific discoveries. Timeliness measures can also help to identify institutions that are at the forefront of a research field. We illustrate how widely accessible tools such as Google Earth can be utilized to disseminate such insights. The practical significance for digital libraries and fostering scientific discoveries is demonstrated through the astronomical literature related to the Sloan Digital Sky Survey (SDSS).
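For reference, the standard H-index that the abstract above builds on (the largest h such that h items each have at least h citations) can be sketched in a few lines of Python. This is only the textbook definition; the function name and the sample counts are illustrative assumptions, and the paper's extensions to terms and timeliness are not reproduced here.

```python
def h_index(citations):
    """Return the largest h such that h items have at least h citations each."""
    # Rank items from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` items all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for five papers.
print(h_index([10, 8, 5, 4, 3]))
```

The same function applies unchanged to any list of per-item citation counts, which is what allows the measure to be carried over from authors to papers or terms.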

Digital curation and preservation

How to choose a digital preservation strategy: evaluating a preservation planning procedure 29-38
  Stephan Strodl; Christoph Becker; Robert Neumayer; Andreas Rauber
An increasing number of institutions throughout the world face legal obligations or business needs to collect and preserve digital objects over several decades. A range of tools exists today to support the variety of preservation strategies such as migration or emulation. Yet, different preservation requirements across institutions and settings make the decision on which solution to implement very difficult.
   This paper presents the PLANETS Preservation Planning approach. It provides an approved way to make informed and accountable decisions on which solution to implement in order to optimally preserve digital objects for a given purpose. It is based on Utility Analysis to evaluate the performance of various solutions against well-defined requirements and goals. The viability of this approach is shown in a range of case studies for different settings. We present its application to two scenarios of web archives, two collections of electronic publications, and a collection of multimedia art. This work focuses on the different requirements and goals in the various preservation settings.
Factors affecting website reconstruction from the web infrastructure 39-48
  Frank McCown; Norou Diawara; Michael L. Nelson
When a website is suddenly lost without a backup, it may be reconstituted by probing web archives and search engine caches for missing content. In this paper we describe an experiment where we crawled and reconstructed 300 randomly selected websites on a weekly basis for 14 weeks. The reconstructions were performed using our web-repository crawler named Warrick which recovers missing resources from the Web Infrastructure (WI), the collective preservation effort of web archives and search engine caches. We examine several characteristics of the websites over time including birth rate, decay and age of resources. We evaluate the reconstructions when compared to the crawled sites and develop a statistical model for predicting reconstruction success from the WI. On average, we were able to recover 61% of each website's resources. We found that Google's PageRank, number of hops and resource age were the three most significant factors in determining if a resource would be recovered from the WI.
Defining what digital curators do and what they need to know: the DigCCurr project 49-50
  Christopher A. Lee; Helen R. Tibbo; John C. Schaefer
The DigCCurr (Digital Curation Curriculum) project is developing a graduate level curricular framework, course modules, and experiential components to prepare students for digital curation in various environments. This paper summarizes a draft and guiding principles behind a matrix of digital curation knowledge and competencies, which are serving as the basis for our curriculum design efforts.
Generating best-effort preservation metadata for web resources at time of dissemination 51-52
  Joan A. Smith; Michael L. Nelson
HTTP and MIME, while sufficient for contemporary webpage access, do not provide enough forensic information to enable the long-term preservation of the resources they describe and transport. But what if the originating web server automatically provided preservation metadata encapsulated with the resource at time of dissemination? Perhaps the ingestion process could be streamlined, with additional forensic metadata available to future information archeologists. We have adapted an Apache web server implementation of OAI-PMH which can utilize third-party metadata analysis tools to provide a metadata-rich description of each resource. The resource and its forensic metadata are packaged together as a complex object, expressed in plain ASCII and XML. The result is a CRATE: a self-contained preservation-ready version of the resource, created at time of dissemination.

Information extraction 1

Document clustering using small world communities 53-62
  Brant W. Chee; Bruce Schatz
Words in natural language documents exhibit a small world network structure. Thus the physics community provides us with an extensive supply of algorithms for extracting community structure. We present a novel method for semantically clustering a large collection of documents using small world communities. This method combines modified physics algorithms with traditional information retrieval techniques. A term network is generated from the document collection, the terms are clustered into small world communities, and the semantic term clusters are used to generate overlapping document clusters. The algorithm combines the speed of single link with the quality of complete link. Clustering takes place in nearly real-time and the results are judged to be coherent by expert users. Our algorithm occupies a middle ground between speed and quality of document clustering.
Efficient summarization-aware search for online news articles 63-72
  Wisam Dakka; Luis Gravano
News portals gather and organize news articles published daily on the Internet. Typically, news articles are clustered into 'events' and each cluster is displayed with a short description of its contents. A particularly interesting choice for describing the contents of a cluster is a machine-generated multi-document summary of the articles in the cluster. Such summaries are informative and help news readers to identify and explore only clusters of interest. Naturally, multi-document clusters and summaries are also valuable to help users navigate the results of keyword-search queries. Unfortunately, current document summarizers are still slow; as a result, search strategies that define document clusters and their multi-document summaries online, in a query-specific manner, are prohibitively expensive. In contrast, search strategies that only return offline, query-independent document clusters are efficient, but might return clusters whose (query-independent) summaries are of little relevance to the queries. In this paper, we present an efficient Hybrid search strategy to address the limitations of fully online and fully offline summarization-aware search approaches. Extensive experiments involving user relevance judgments and real news articles show that the quality of our Hybrid results is high, and that these results are computed in substantially less time than with the fully online strategy. We have implemented our strategy and made it available on the Newsblaster news summarization system, which crawls and summarizes news articles from a variety of web sources on a daily basis.
Integrating data and text mining processes for digital library applications 73-79
  Robert Sanderson; Paul Watry
This paper explores the integration of text mining and data mining techniques, digital library systems, and computational and data grid technologies with the objective of developing an online classification service exemplar. We discuss the current research issues relating to the use of data mining algorithms and toolkits for textual data; the necessary changes within the Cheshire3 Information Framework to accommodate analysis workflows; the outcomes of a demonstrator based on the National Library of Medicine's Medline dataset; and the provision of comparable metrics for evaluation purposes. The prototype has resulted in extremely accurate online classification services and offers a novel method of supporting text mining and data mining within a highly scaled computational environment, integrated seamlessly into the digital library architecture.

Panel 1

The OAI-ORE effort: progress, challenges, synergies 80
  Cliff Lynch; Savas Parastatidis; Neil Jacobs; Herbert Van de Sompel; Carl Lagoze
The panel will discuss various aspects of the ongoing Object Re-Use and Exchange (ORE) effort of the Open Archives Initiative (OAI). OAI-ORE is funded by the Andrew W. Mellon Foundation and is a result of the "Augmenting Interoperability across Scholarly Repositories" meeting that took place in April 2006 at the Mellon Foundation. A panel at JCDL 2006 reported on this meeting. The goal of OAI-ORE is to develop, identify, and profile extensible standards and protocols that allow repositories, agents, and services to interoperate in the context of use and reuse of compound digital objects beyond the boundaries of the holding repositories.

Information extraction 2

SlideSeer: a digital library of aligned document and presentation pairs 81-90
  Min-Yen Kan
Research findings are often transmitted both as written documents and narrated slide presentations. As these two forms of media contain both unique and replicated information, it is useful to combine and align these two views to create a single synchronized medium. We introduce SlideSeer, a digital library that discovers, aligns and presents such presentation and document pairs. We discuss the three major system components of the SlideSeer DL: 1) the resource discovery, 2) the fine-grained alignment and 3) the user interface. For resource discovery, we have bootstrapped our collection building process using metadata from DBLP and CiteSeer. For alignment, we modify maximum similarity alignment to favor monotonic alignments and incorporate a classifier to handle slides which should not be aligned. For the user interface, we allow the user to seamlessly switch between four carefully motivated views of the resulting synchronized media pairs.
TableSeer: automatic table metadata extraction and searching in digital libraries 91-100
  Ying Liu; Kun Bai; Prasenjit Mitra; C. Lee Giles
Tables are ubiquitous in digital libraries. In scientific documents, tables are widely used to present experimental results or statistical data in a condensed fashion. However, current search engines do not support table search. The difficulty of automatically extracting tables from un-tagged documents, the lack of a universal table metadata specification, and the limitations of the existing ranking schemes make the table search problem challenging. In this paper, we describe TableSeer, a search engine for tables. TableSeer crawls digital libraries, detects tables from documents, extracts table metadata, indexes and ranks tables, and provides a user-friendly search interface. We propose an extensive set of medium-independent metadata for tables that scientists and other users can adopt for representing table information. In addition, we devise a novel page box-cutting method to improve the performance of the table detection. Given a query, TableSeer ranks the matched tables using an innovative ranking algorithm -- TableRank. TableRank rates each (query, table) pair with a tailored vector space model and a specific term weighting scheme. Overall, TableSeer eliminates the burden of manually extracting table data from digital libraries and enables users to automatically examine tables. We demonstrate the value of TableSeer with empirical studies on scientific documents.
CiteSearch: next-generation citation analysis 101-102
  Kiduk Yang; Lokman Meho
The coverage of citations in today's citation databases is disjoint and incomplete, which can result in conflicting quality assessment outcomes across different data sources. A fusion approach to quality assessment that employs a range of citation-based methods to analyze data from multiple sources is one way to address this limitation. The paper discusses a citation analysis pilot study that measured the impact of scholarly publications based on the data mined from Web of Science, Scopus, and Google Scholar.
Retrieval effectiveness of table of contents and subject headings 103-104
  Youngok Choi; Ingrid Hsieh-Yee; Bill Kules
The effectiveness of two modes of subject representation -- table of contents (TOC) and subject headings -- in subject searching in an online public access catalog (OPAC) system was investigated. The retrieval difference between TOC and the Library of Congress subject headings (LCSH) was statistically significant; the effect of subject domain was not statistically significant; users had better success matching their keywords to TOC than to LCSH; but their keywords often failed to retrieve items similar to the target items. These findings underscore the need to bridge user keywords to both TOC and LCSH.
Mining a digital library for influential authors 105-106
  David Mimno; Andrew McCallum
When browsing a digital library of research papers, it is natural to ask which authors are most influential in a particular topic. We present a probabilistic model that ranks authors based on their influence in particular areas of scientific research. This model combines several sources of information: citation information between documents as represented by PageRank scores, authorship data gathered through automatic information extraction, and the words in paper abstracts. We compare the performance of a topic model versus a smoothed language model by assessing the number of major award winners in the resulting ranked list of researchers.

Social networks

Can social bookmarking enhance search in the web? 107-116
  Yusuke Yanbe; Adam Jatowt; Satoshi Nakamura; Katsumi Tanaka
Social bookmarking is an emerging type of Web service that helps users share, classify, and discover interesting resources. In this paper, we explore the concept of an enhanced search, in which data from social bookmarking systems is exploited for enhancing search in the Web. We propose combining the widely used link-based ranking metric with the one derived using social bookmarking data. First, this increases the precision of a standard link-based search by incorporating popularity estimates from aggregated data of bookmarking users. Second, it provides an opportunity for extending the search capabilities of existing search engines. Individual contributions of bookmarking users as well as the general statistics of their activities are used here for a new kind of complex search where contextual, temporal or sentiment-related information is used. We investigate the usefulness of social bookmarking systems for the purpose of enhancing Web search through a series of experiments done on datasets obtained from social bookmarking systems. Next, we show the prototype system that implements the proposed approach and we present some preliminary results.
Task-based interaction with an integrated multilingual, multimedia information system: a formative evaluation 117-126
  Pengyi Zhang; Lynne Plettenberg; Judith L. Klavans; Douglas W. Oard; Dagobert Soergel
This paper describes a formative evaluation of an integrated multilingual, multimedia information system, a series of user studies designed to guide system development. The system includes automatic speech recognition for English, Chinese, and Arabic, automatic translation from Chinese and Arabic into English, and query-based and profile-based search options. The study design emphasizes repeated evaluation with the same (increasingly experienced) participants, exploration of alternative task designs, rich qualitative and quantitative data collection, and rapid analysis to provide the timely feedback needed to support iterative and responsive development. Results indicate that users presented with materials in a language that they do not know can generate remarkably useful work products, but that integration of transcription, translation, search and profile management poses challenges that would be less evident were each technology to be evaluated in isolation.
Modeling personal and social network context for event annotation in images 127-134
  Bageshree Shevade; Hari Sundaram; Lexing Xie
This paper describes a framework to annotate images using personal and social network contexts. The problem is important as the correct context reduces the number of image annotation choices. Social network context is useful as real-world activities of members of the social network are often correlated within a specific context. The correlation can serve as a powerful resource to effectively increase the ground truth available for annotation. There are three main contributions of this paper: (a) development of an event context framework and definition of quantitative measures for contextual correlations based on concept similarity in each facet of event context; (b) recommendation algorithms based on spreading activations that exploit personal context as well as social network context; (c) experiments on real-world, everyday images that verified both the existence of inter-user semantic disagreement and the improvement in annotation when incorporating both the user and social network context. We have conducted two user studies, and our quantitative and qualitative results indicate that context (both personal and social) facilitates effective image annotation.
Longitudinal study of changes in blogs 135-136
  Paul Logasa Bogen II; Luis Francisco-Revilla; Richard Furuta; Takeisha Hubbard; Unmil P. Karadkar; Frank Shipman
Web-based distributed collections often include links to documents that are expected to change frequently, such as blogs. The study reported here demonstrates that blog changes follow specific patterns. The results also illustrate the substantial role of standardized templates in blog pages. These results extend our earlier models that assess the significance of Web page change from a human perspective. These improved models will enable software systems to assist human collection managers in identifying unexpected changes and aberrant events.

Systems

SearchGen: a synthetic workload generator for scientific literature digital libraries and search engines 137-146
  Huajing Li; Wang-Chien Lee; Anand Sivasubramaniam; Lee Giles
Due to the popularity of web applications and their heavy usage, it is important to obtain a good understanding of their workloads in order to improve performance of search services. Existing works have typically focused on generic web workloads without putting emphasis on specific domains. In this paper, we analyze the usage logs of CiteSeer, a scientific literature digital library and search engine, to characterize workloads for both robots and users. Essential ingredients that contribute to workloads are proposed. Among them, we find that access intervals show high variance and thus cannot be predicted well with time-series models. On the other hand, client visiting paths and semantics can be well captured with probabilistic models and Zipf's law. Based on the findings, we propose SearchGen, a synthetic workload generator to output traces for scientific literature digital libraries and search engines. A comparison between synthetic workloads and actual logged traces suggests that the synthetic workload fits well.
A retrospective look at Greenstone: lessons from the first decade 147-156
  Ian H. Witten; David Bainbridge
The Greenstone Digital Library Software has helped spread the practical impact of digital library technology throughout the world, with particular emphasis on developing countries. As Greenstone enters its second decade, this article takes a retrospective look at its development, the challenges that have been faced, and the lessons that have been learned in developing and deploying a comprehensive open-source system for the construction of digital libraries internationally. Not surprisingly, the most difficult challenges have been political, educational, and sociological, echoing that old programmers' blessing "may all your problems be technical ones."
A unified platform for archival description and access 157-166
  Christopher J. Prom; Christopher A. Rishel; Scott W. Schwartz; Kyle J. Fox
The archival community has developed content and data structure standards to facilitate access to the diverse and unique sets of archival records, personal papers, and manuscript collections that are held by archival repositories and special collections libraries. However, these standards are difficult for archivists to use and are often implemented in ways that negatively affect materials-handling workflows, depriving archival users of the best possible access to the totality of materials available within an individual repository. The authors propose that archival descriptive problems can be addressed by implementing a web/database application that is tailored specifically to archival needs and can be implemented with little technical knowledge. This paper describes the system architecture of one such tool, the Archon software package, which was developed at the University of Illinois at Urbana-Champaign. Archon automates many technical tasks, such as producing a searchable website, an EAD instance or a MARC record. Although the system utilizes sophisticated algorithms and optimizations, it is easily extensible because most development takes place in an easy-to-use, object-oriented environment.

Educational digital libraries

Children's interests and concerns when using the international children's digital library: a four-country case study 167-176
  Allison Druin; Ann Weeks; Sheri Massey; Benjamin B. Bederson
This paper presents a case study of 12 children who used the International Children's Digital Library (ICDL) over four years and live in one of four countries: Germany, Honduras, New Zealand, and the United States. By conducting interviews, along with collecting drawings and book reviews, this study describes these children's interests in books, libraries, technology and the world around them. Findings from this study include: these young people increased the variety of books they read online; still valued their physical libraries as spaces for social interaction and reading; showed increased reading motivation; and showed interest in exploring different cultures.
Digital library education in computer science programs 177-178
  Jeffrey Pomerantz; Sanghee Oh; Barbara M. Wildemuth; Seungwon Yang; Edward A. Fox
In an effort to identify the "state of the art" in digital library education in computer science (CS) programs, we analyzed CS courses on digital libraries and digital library-related topics. Fifteen courses that mention digital libraries in the title or short description were identified; of these, five are concerned with digital libraries as the primary topic of the course. The readings from these five courses were analyzed further, in terms of their authors and the journals in which they were published.
A study of how online learning resource are used 179-180
  Mimi Recker; Sarah Giersch; Andrew Walker; Sam Halioris; Xin Mao; Bart Palmer
This paper defines a model of teacher practice ("teaching as design"), and describes a professional development curriculum in which K-12 teachers design learning activities using resources and tools from education digital libraries. It then presents preliminary findings from an application of this model in which teachers' artifacts are analyzed to learn how online learning resources are used in situ. Initial results suggest that learning resources of a smaller granularity are more likely to be adapted or improvised upon in teacher-designed learning activities, which further supports teachers' becoming contributors of online resources and active participants in an education cyberinfrastructure.
Standards or semantics for curriculum search? 181-182
  Byron B. Marshall; René F. Reitsma; Martha N. Cyr
Aligning digital library resources with national and state educational standards to help K-12 teachers search for relevant curriculum is an important issue in the digital library community. Aligning standards from different states promises to help teachers in one state find appropriate materials created and cataloged elsewhere. Although such alignments provide a powerful means for crosswalking standards and curriculum across states, alignment matrices are intrinsically sparse. Hence, we hypothesize that such sparseness may cause significant numbers of false negatives when used for searching curriculum. Our preliminary results confirm the false negative hypothesis, demonstrate the usefulness of term-based techniques in addressing the false negative problem, and explore ways to combine term occurrence data with standards correlations.
Information behavior of small groups: implications for design of digital libraries 183-184
  Nan Zhou; Gerry Stahl
We report findings of a study that investigates the information behavior of online small groups engaged in math problem solving and discuss the implications for designing digital libraries that can support learning of younger students and their broader information practices.

Information retrieval and extraction 1

Adaptive sorted neighborhood methods for efficient record linkage 185-194
  Su Yan; Dongwon Lee; Min-Yen Kan; C. Lee Giles
Traditionally, record linkage algorithms have played an important role in maintaining digital libraries -- i.e., identifying matching citations or authors for consolidation in updating or integrating digital libraries. As such, a variety of record linkage algorithms have been developed and deployed successfully. Often, however, existing solutions have a set of parameters whose values are set by human experts off-line and are fixed during the execution. Since finding the ideal values of such parameters is not straightforward, or no such single ideal value even exists, the applicability of existing solutions to new scenarios or domains is greatly hampered. To remedy this problem, we argue that one can achieve significant improvement by adaptively and dynamically changing such parameters of record linkage algorithms. To validate our hypothesis, we take a classical record linkage algorithm, the sorted neighborhood method (SNM), and demonstrate how we can achieve improved accuracy and performance by adaptively changing its fixed sliding window size. Our claim is analytically and empirically validated using both real and synthetic data sets of digital libraries and other domains.
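To make the baseline concrete, the classical fixed-window sorted neighborhood method named in the abstract above can be sketched as follows: sort the records by a key, slide a window over the sorted list, and compare only the records that fall inside the same window. This is a minimal illustration under stated assumptions, not the paper's adaptive variant. The function name, the default window and threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are all choices made for this sketch.

```python
from difflib import SequenceMatcher


def sorted_neighborhood(records, key, window=3, threshold=0.85):
    """Fixed-window sorted neighborhood method (baseline sketch).

    records   -- iterable of records to link
    key       -- function mapping a record to its sorting/blocking string
    window    -- window size w; each record is compared with the next w - 1
    threshold -- minimum similarity ratio for a pair to count as a match
    """
    ordered = sorted(records, key=key)
    matches = []
    for i, rec in enumerate(ordered):
        # Only records inside the sliding window are compared.
        for other in ordered[i + 1 : i + window]:
            score = SequenceMatcher(None, key(rec), key(other)).ratio()
            if score >= threshold:
                matches.append((rec, other))
    return matches


# Hypothetical author strings; the key here is simply the lowercased string.
names = ["Giles, C. Lee", "Kan, Min-Yen", "Giles, C Lee", "Lee, Dongwon"]
print(sorted_neighborhood(names, key=str.lower))
```

The fixed `window` argument is the parameter that the abstract argues should be changed adaptively; a larger window finds more matches at the cost of more comparisons.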
Distributed web search efficiency by truncating results BIBAFull-Text 195-203
  Christopher T. Fallen; Gregory B. Newby
A large set of Web documents (the TREC GOV2 collection) comes from many separate Internet hosts, such as www.nih.gov and travel.state.gov. There is considerable variability in the number of Web pages (i.e., documents) from each host. In this paper, we present and evaluate a method for setting a maximum number of "hits" that may be presented for each web host. Federated search environments are increasingly common components of digital libraries and in these environments, the benefit of such a maximum is that it can reduce the number of possibly relevant documents presented by each subcollection, without hurting early precision measures such as P@20. Derivation of a maximum number, which is proportional to the subcollection size but not sensitive to different search topics, is made possible by an analysis of patterns of relevance judgment across approximately 17,000 web hosts in GOV2.
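The idea of a per-host maximum can be sketched in a few lines of Python. This is only an assumption-laden illustration: the rule used here (one allowed hit per 100 pages on the host, with a floor of one) and the host sizes are invented for the example and are not the derivation reported in the paper.

```python
from collections import defaultdict


def truncate_per_host(ranked_hits, host_sizes, pages_per_hit=100, floor=1):
    """Keep at most (host size // pages_per_hit) hits per host, in rank order."""
    kept = []
    seen = defaultdict(int)
    for host, doc_id in ranked_hits:
        # Hypothetical cap: proportional to the host's page count, never below floor.
        cap = max(floor, host_sizes[host] // pages_per_hit)
        if seen[host] < cap:
            kept.append((host, doc_id))
            seen[host] += 1
    return kept


# Hypothetical subcollection sizes and a ranked result list of (host, document) pairs.
host_sizes = {"www.nih.gov": 200, "travel.state.gov": 50}
ranked = [
    ("www.nih.gov", "d1"),
    ("travel.state.gov", "d2"),
    ("www.nih.gov", "d3"),
    ("travel.state.gov", "d4"),
    ("www.nih.gov", "d5"),
]
result = truncate_per_host(ranked, host_sizes)
```

Because the cap depends only on the host size, it can be computed once per subcollection and reused for every query, which matches the abstract's requirement that the maximum not be sensitive to the search topic.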
Adaptive graphical approach to entity resolution BIBAFull-Text 204-213
  Zhaoqi Chen; Dmitri V. Kalashnikov; Sharad Mehrotra
Entity resolution is a very common Information Quality (IQ) problem with many different applications. In digital libraries, it is related to problems of citation matching and author name disambiguation; in Natural Language Processing, it is related to coreference matching and object identity; in Web applications, it is related to Web page disambiguation. The problem of Entity Resolution arises because objects/entities in real world datasets are often referred to by descriptions, which might not be unique identifiers of these entities, leading to ambiguity. The goal is to group all the entity descriptions that refer to the same real world entities. In this paper we present a graphical approach for entity resolution. It complements the traditional methodology with the analysis of the entity-relationship graph constructed for the dataset being analyzed. The paper demonstrates that a technique that measures the degree of interconnectedness between various pairs of nodes in the graph can significantly improve the quality of entity resolution. Furthermore, the paper presents an algorithm for making that technique self-adaptive to the underlying data, thus minimizing the required participation from the domain-analyst and potentially further improving the disambiguation quality.

Panel 2

Cyberinfrastructure for the humanities and social sciences: advancing the humanities research agenda BIBAFull-Text 214
  Joyce Ray; Clifford Lynch; Brett Bobley; Gregory Crane; Steven Wheatley
In 2006 the American Council of Learned Societies (ACLS) released Our Cultural Commonwealth, the final report of the Commission on Cyberinfrastructure for the Humanities and Social Sciences. The report, based on a study funded by the Mellon Foundation, explored how research environments might be created for the humanities and social sciences to complement those being developed to support scientific research. The report includes key recommendations addressed to universities, funding agencies, scholarly societies, academic libraries, publishers, Congress, state legislatures, and others. Implementation of the recommendations could potentially transform scholarship and exponentially increase access to resources and new scholarship in the humanities and social sciences. But the report has not been universally embraced. How will humanities scholarship be advanced by new technologies and research practices, and how will the academic community recognize new forms of scholarship? How will funding agencies respond to the challenges and issues raised? What does cyberinfrastructure mean for different domains within the humanities? These questions will be addressed by panelists and discussed by participants.

Information extraction 3

FLUX-CIM: flexible unsupervised extraction of citation metadata BIBAFull-Text 215-224
  Eli Cortez; Altigran S. da Silva; Marcos André Gonçalves; Filipe Mesquita; Edleno S. de Moura
In this paper we propose a knowledge-based approach to help extract the correct components of citations in any given format. Unlike related approaches that rely on manually built knowledge bases (KBs) for recognizing the components of a citation, in our case such a KB is automatically constructed from an existing set of sample metadata records from a given area (e.g., computer science or health sciences). Our approach does not rely on patterns encoding specific delimiters of a particular citation style. It is also unsupervised, in the sense that it does not rely on a learning method that requires a training phase. These features give our technique a high degree of automation and flexibility. To demonstrate the effectiveness and applicability of our proposed approach, we have run experiments in which we applied it to extract information from citations in papers of two different domains. Results of these experiments indicate precision and recall levels above 94% and perfect extraction for the large majority of citations tested.
Measuring conference quality by mining program committee characteristics BIBAFull-Text 225-234
  Ziming Zhuang; Ergin Elmacioglu; Dongwon Lee; C. Lee Giles
Bibliometrics are important measures for venue quality in digital libraries. Impacts of venues are usually the major consideration for subscription decision-making, and for ranking and recommending high-quality venues and documents. For digital libraries in the Computer Science literature domain, conferences play a major role as an important publication and dissemination outlet. However, with a recent profusion of conferences and rapidly expanding fields, it is increasingly challenging for researchers and librarians to assess the quality of conferences. We propose a set of novel heuristics to automatically discover prestigious (and low-quality) conferences by mining the characteristics of Program Committee members. We examine the proposed cues both in isolation and combination under a classification scheme. Evaluation on a collection of 2,979 conferences and 16,147 PC members shows that our heuristics, when combined, correctly classify about 92% of the conferences, with a low false positive rate of 0.035 and a recall of more than 73% for identifying reputable conferences. Furthermore, we demonstrate empirically that our heuristics can also effectively detect a set of low-quality conferences, with a false positive rate of merely 0.002. We also report our experience of detecting two previously unknown low-quality conferences. Finally, we apply the proposed techniques to the entire quality spectrum by ranking conferences in the collection.
Toward alternative measures for ranking venues: a case of database research community BIBAFull-Text 235-244
  Su Yan; Dongwon Lee
Ranking of publication venues is often closely related to important issues such as evaluating the contributions of individual scholars/research groups, or subscription decision making. The development of large-scale digital libraries and the availability of various metadata provide the possibility of building new measures more efficiently and accurately. In this work, we propose two novel measures for ranking the impacts of academic venues: an easy-to-implement seed-based measure that does not use citation analysis, and a realistic browsing-based measure that takes an article reader's behavior into account. Both measures are computationally efficient yet mimic the results of the widely accepted Impact Factor. In particular, our proposal exploits the fact that: (1) in most disciplines, there are "top" venues that most people agree on; and (2) articles that appeared in good venues are more likely to be viewed by readers. Our proposed measures are extensively evaluated on a test case of the Database research community using two real bibliography data sets -- ACM and DBLP. Finally, ranks of venues by our proposed measures are compared against the Impact Factor using Spearman's rank correlation coefficient, and their positive rank order relationship is confirmed with a statistical significance test.
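For readers unfamiliar with the comparison step mentioned in this abstract, here is a minimal Python sketch of Spearman's rank correlation coefficient, computed as the Pearson correlation of the two rank vectors with ties given average ranks. The venue scores are invented for illustration and do not come from the paper, and the significance test is not shown.

```python
def ranks(values):
    """Assign 1-based average ranks, so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for five venues under two measures.
impact_factor = [3.1, 2.4, 1.9, 1.2, 0.8]
seed_based = [0.90, 0.70, 0.75, 0.30, 0.20]
rho = spearman(impact_factor, seed_based)
```

In this made-up example the two measures disagree only on the order of the second and third venues, so the coefficient is close to, but below, 1.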

Models and case studies

A model for inclusive design of digital libraries BIBAFull-Text 245-246
  Sambhavi Chandrashekar; Nadia Caidi
Digital libraries (DLs) must cater not only to the varied needs of their target users but also to their differing abilities, and to the adaptive technologies used by persons whose computing capabilities are restricted due to disabilities. This paper proposes a model for DL design that includes optimization of the usability of the search process and ensures accessibility of the content for users of DLs with disabilities.
Representing aggregate works in the digital library BIBAFull-Text 247-256
  George Buchanan; Jeremy Gow; Ann Blandford; Jon Rimmer; Claire Warwick
This paper studies the challenge of representing aggregate works such as encyclopedias, collected poems and journals in heterogeneous digital library collections. Reflecting on the materials used by humanities academics, we demonstrate the varied range of aggregate types and the problems of faithfully representing these in the DL interface. Aggregates are complex and pervasive, challenge common assumptions and confuse boundaries within organisational structures. Existing DL systems can only provide imperfect representation of aggregates, and alterations to document encoding are insufficient to create a faithful reproduction of the physical library. The challenge is amplified through concrete examples, and solutions are demonstrated in a well-known DL system and related to standard DL architecture.
StoryBank: an Indian village community digital library BIBAFull-Text 257-258
  Matt Jones; Will Harwood; George Buchanan; Mounia Lalmas
This paper considers information access styles for a community digital library in an Indian village. We present our impressions of the community gathered during a field-study and show how these have influenced the interaction design. The prototype aims to overcome low-textual literacy and lack of computing experience by combining touch-based interaction, engaging visual presentations and drawing on villagers' familiarity with radio listening.
The Gray Lady gets a new dress: a field study of the Times News Reader BIBAFull-Text 259-268
  Catherine C. Marshall
Increasingly, individuals are turning to online sources for their daily news. Traditional newspapers have developed significant web presences to compete with newer services such as news aggregators and emerging genres such as blogs and other forms of citizen journalism. This paper reports the results of a field study to investigate the use of a new RSS-driven, template-based presentation mechanism that delivers a daily newspaper to subscribers' laptops and desktops; the Times News Reader hybridizes elements of print newspapers with aspects of online news. We explore how this application compares with print and web-based news reading and evaluate functionality developed to draw in readers from both audiences. Finally, we examine three general technological implications drawn from current use: how the news reader may adapt to different styles of reading; how the news reader's functionality may be extended to highlight the timeliness of the content and to personalize the application; and how long-term use of the news reader can result in a personal news archive.

Architecture and ontologies

Drowning in data: digital library architecture to support scientific use of embedded sensor networks BIBAFull-Text 269-277
  Christine L. Borgman; Jillian C. Wallis; Matthew S. Mayernik; Alberto Pepe
New technologies for scientific research are producing a deluge of data that is overwhelming traditional tools for data capture, analysis, storage, and access. We report on a study of scientific practices associated with dynamic deployments of embedded sensor networks to identify requirements for data digital libraries. As part of continuing research on scientific data management, we interviewed 22 participants in 5 environmental science projects to identify data types and uses, stages in their data life cycle, and requirements for digital library architecture. We found that scientists need continuous access to their data from the time that field experiments are designed through final analysis and publication, thus reflecting a broader notion of "digital library." Six categories of requirements are discussed: the ability to obtain and maintain data in the field, verify data in the field, document data context for subsequent interpretation, integrate data from multiple sources, analyze data, and preserve data. Three digital library efforts currently underway within the Center for Embedded Networked Sensing are addressing these requirements, with the goal of a tightly coupled interoperable framework that, in turn, will be a component of cyberinfrastructure for science.
A practical ontology for the large-scale modeling of scholarly artifacts and their usage BIBAFull-Text 278-287
  Marko A. Rodriguez; Johan Bollen; Herbert Van de Sompel
The large-scale analysis of scholarly artifact usage is constrained primarily by current practices in usage data archiving, privacy issues concerned with the dissemination of usage data, and the lack of a practical ontology for modeling the usage domain. As a remedy to the third constraint, this article presents a scholarly ontology that was engineered to represent those classes for which large-scale bibliographic and usage data exists, to support usage research, and to scale in its instantiation to the order of 50 million articles along with their associated artifacts (e.g., authors and journals) and an accompanying 1 billion usage events. The real world instantiation of the presented abstract ontology is a semantic network model of the scholarly community which lends the scholarly process to statistical analysis and computational support. We present the ontology, discuss its instantiation, and provide some example inference rules for calculating various scholarly artifact metrics.
A dynamic ontology for a dynamic reference work BIBAFull-Text 288-297
  Mathias Niepert; Cameron Buckner; Colin Allen
The successful deployment of digital technologies by humanities scholars presents computer scientists with a number of unique scientific and technological challenges. The task seems particularly daunting because issues in the humanities are presented in abstract language demanding the kind of subtle interpretation often thought to be beyond the scope of artificial intelligence, and humanities scholars themselves often disagree about the structure of their disciplines. The future of humanities computing depends on having tools for automatically discovering complex semantic relationships among different parts of a corpus. Digital library tools for the humanities will need to be capable of dynamically tracking the introduction of new ideas and interpretations and applying them to older texts in ways that support the needs of scholars and students.
   This paper describes the design of new algorithms and the adjustment of existing algorithms to support the automated and semi-automated management of domain-rich metadata for an established digital humanities project, the Stanford Encyclopedia of Philosophy. Our approach starts with a "hand-built" formal ontology that is modified and extended by a combination of automated and semi-automated methods, thus becoming a "dynamic ontology". We assess the suitability of current information retrieval and information extraction methods for the task of automatically maintaining the ontology. We describe a novel measure of term-relatedness that appears to be particularly helpful for predicting hierarchical relationships in the ontology. We believe that our project makes a further contribution to information science by being the first to harness the collaboration inherent in an expert-maintained dynamic reference work to the task of maintaining and verifying a formal ontology. We place special emphasis on the task of bringing domain expertise to bear on all phases of the development and deployment of the system, from the initial design of the software and ontology to its dynamic use in a fully operational digital reference work.

Music digital libraries

Preparing resource discovery for digitized music: an analysis of an Australian application BIBAFull-Text 298-302
  Jennifer A. Thomas; Michael R. Middleton; Margaret Warren
This paper examines procedures for the creation and delivery of digital music that are being undertaken by contributors to the National Library of Australia's federated music gateway MusicAustralia. The case study discusses access to and preservation of digital material as key drivers of the digitization movement, and compares projects being undertaken worldwide. Also analyzed are the underlying digitization principles and standards, and metadata schemas for the description and exchange of digital objects which facilitate record exchange and improve audience reach.
   The paper provides an overview of some individual contributing institutions; however, particular focus is placed on the State Library of Queensland's (SLQ) approach to preparing its unique Queensland music collection for digital resource discovery in MusicAustralia. A detailed analysis of SLQ's strategy is presented, including its risk management approach to copyright implications, and consideration of infrastructure issues affecting the creation, preservation and online delivery of its digital music objects.
   Whilst SLQ's current digital music collection is relatively small, it has become core business of SLQ's Arts and Humanities branch, and the collection will expand with the continued incorporation of music material unique to Queensland into the collection. SLQ has developed a sound foundation for digitization based on widely endorsed principles and standards which should allow this to effectively occur.
Goal-directed evaluation for the improvement of optical music recognition on early music prints BIBAFull-Text 303-304
  Laurent Pugin; John Ashley Burgoyne; Ichiro Fujinaga
Optical music recognition (OMR) systems are promising tools for the creation of searchable digital music libraries. Using an adaptive OMR system for early music prints based on hidden Markov models, we leverage an edit distance evaluation metric to improve recognition accuracy. Baseline results are computed with new labeled training and test sets drawn from a diverse group of prints. We present two experiments based on this evaluation technique. The first resulted in a significant improvement to the feature extraction function for these images. The second is a goal-directed comparison of several popular adaptive binarization algorithms, which are often evaluated only subjectively. Accuracy increased by as much as 55% for some pages, and the experiments suggest several avenues for further research.
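To make the notion of an edit distance evaluation metric concrete, the Python sketch below computes the standard Levenshtein distance between a ground-truth symbol sequence and a recognizer's output. The symbol labels and the accuracy formula are assumptions made for illustration; the abstract does not specify the exact metric or symbol encoding the authors used.

```python
def edit_distance(reference, recognized):
    """Levenshtein distance between two symbol sequences (row-by-row DP)."""
    previous = list(range(len(recognized) + 1))
    for i, ref_symbol in enumerate(reference, start=1):
        current = [i]
        for j, rec_symbol in enumerate(recognized, start=1):
            substitution = previous[j - 1] + (ref_symbol != rec_symbol)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


# Hypothetical symbol labels for one staff of a printed page.
truth = ["clef", "C4", "D4", "rest", "E4"]
output = ["clef", "C4", "rest", "F4"]

errors = edit_distance(truth, output)
# Hypothetical accuracy score: fraction of reference symbols not in error.
accuracy = 1 - errors / len(truth)
```

Here the recognizer dropped one note and misread another, giving two edits against five reference symbols. A score of this kind gives an objective basis for comparing preprocessing choices such as binarization algorithms.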
Annotation functionality for digital libraries supporting collaborative performance: an example of musical scores BIBAFull-Text 305-306
  Megan Winget
This paper describes the findings of an ethnographic study that examined the annotation behaviors of musicians working with musical scores for the purpose of performance. Annotation was found to be an important part of the rehearsal process, and specific annotation functionalities are recommended for future digital library development.
Toward an understanding of similarity judgments for music digital library evaluation BIBAFull-Text 307-308
  J. Stephen Downie; Jin Ha Lee; Anatoliy A. Gruzd; M. Cameron Jones
This paper presents an analysis of 7,602 similarity judgments collected for the Symbolic Melodic Similarity (SMS) and Audio Music Similarity and Retrieval (AMS) evaluation tasks in the 2006 Music Information Retrieval Evaluation eXchange (MIREX). We discuss the influence of task definitions, as well as evaluation metrics on user perceptions of music similarity, and provide recommendations for future Music Digital Library/Music Information Retrieval research pertaining to music similarity.

User studies and user interfaces

Agreeing to disagree: search engines and their public interfaces BIBAFull-Text 309-318
  Frank McCown; Michael L. Nelson
Google, Yahoo and MSN all provide both web user interfaces (WUIs) and application programming interfaces (APIs) to their collections. Whether building collections of resources or studying the search engines themselves, the search engines request that researchers use their APIs and not "scrape" the WUIs. However, anecdotal evidence suggests the interfaces produce different results. We provide the first in-depth quantitative analysis of the results produced by the Google, MSN and Yahoo API and WUI interfaces. We have queried both interfaces for five months and found significant discrepancies between the interfaces in several categories. In general, we found MSN to produce the most consistent results between their two interfaces. Our findings suggest that the API indexes are not older, but they are probably smaller for Google and Yahoo. We also examined how search results decay over time and built predictive models based on the observed decay rates. Based on our findings, it can take over a year for half of the top 10 results to a popular query to be replaced in Google and Yahoo; for MSN it may take only 2-3 months.
Static reformulation: a user study of static hypertext for query-based reformulation BIBAFull-Text 319-328
  Michael Huggett; Joel Lanir
Hypertext allows users to navigate between related materials in digital libraries. The most fundamental automated hypertexts are those constructed on the basis of semantic similarity. Such hypertexts have been evaluated by a variety of means, but seldom by real users given simulated real-world tasks. We claim that while other methods exist, one of the best ways to prove the usefulness of hypertext is to show the benefits for users performing realistic tasks. We compare the reformulation of queries that users perform in keyword searching, to the query reformulation implicit in browsing between documents linked by similarity of content. We find that a static automatically-constructed similarity hypertext provides useful linking between related items, improving the retrieval of targets when used to augment standard keyword search.
A rich OPAC user interface with AJAX BIBAFull-Text 329-330
  Jesse Prabawa Gozali; Min-Yen Kan
Online Public Access Catalogs (OPACs) provide patrons with a user interface (UI) to help with their information-seeking tasks. Even though many OPAC UIs are now web-based, their architectures are often static, which does not allow them to integrate user assistance modules dynamically. We report on a UI that supports integration of such modules, while providing a usable and rich environment. We explore how Asynchronous JavaScript + XML (AJAX) can be employed to create an OPAC UI that offers a better user experience and task support. Our developed UI features a modular architecture that combines several Natural Language Processing (NLP) modules employed to enhance information seeking. Our UI manages queries in a novel way with a tabbed interface featuring an overview/details presentation model, and an AJAX query results data grid. Preliminary user testing results are also presented.
Constructing digital library interfaces BIBAFull-Text 331-332
  David M. Nichols; David Bainbridge; Michael B. Twidale
The software technologies used to create web interfaces for digital libraries are discussed using examples from Greenstone 3.

Information retrieval and extraction 2

Retrieval in text collections with historic spelling using linguistic and spelling variants BIBAFull-Text 333-341
  Andrea Ernst-Gerlach; Norbert Fuhr
We present a new approach for the retrieval of texts with non-standard spelling, which is important for historic texts, e.g., in English or German. In this paper, we describe the overall architecture of our system, followed by its evaluation. Given a search term as lemma, we use a dictionary of contemporary German for finding all inflected and derived forms of the lemma. Then we apply transformation rules (derived from training data) for generating historic spelling variants. For the evaluation, we assess the resulting retrieval quality. The experimental results show that we can improve the retrieval quality for historic collections substantially.
Efficient topic-based unsupervised name disambiguation BIBAFull-Text 342-351
  Yang Song; Jian Huang; Isaac G. Councill; Jia Li; C. Lee Giles
Name ambiguity is a special case of identity uncertainty where one person can be referenced by multiple name variations in different situations or even share the same name with other people. In this paper, we focus on the problem of disambiguating person names within web pages and scientific documents. We present an efficient and effective two-stage approach to disambiguate names. In the first stage, two novel topic-based models are proposed by extending two hierarchical Bayesian text models, namely Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA). Our models explicitly introduce a new variable for persons and learn the distribution of topics with regard to persons and words. After learning an initial model, the topic distributions are treated as feature sets and names are disambiguated by leveraging a hierarchical agglomerative clustering method. Experiments on web data and scientific documents from CiteSeer indicate that our approach consistently outperforms other unsupervised learning methods such as spectral clustering and DBSCAN clustering and could be extended to other research fields. We empirically addressed the issue of scalability by disambiguating authors in over 750,000 papers from the entire CiteSeer dataset.
Using bilingual ETD collections to mine phrase translations BIBAFull-Text 352-353
  Ryan Richardson; Edward A. Fox
Phrase translation lists can enhance cross-language information retrieval. However, finding translations for technical phrases is difficult. Bilingual dictionaries have limited coverage for specialized fields, and even more limited coverage of technical phrases. Since phrases can have very specific meanings in technical fields, this limits the quality of translations produced by generic machine translation systems. We hypothesize that digital libraries of electronic theses and dissertations (ETDs) are a good source of technical phrase translations. We have acquired a collection of 3,086 Spanish ETDs about computer science from Scirus, the Universidad Nacional Autonoma de Mexico (Mexico City), and Universidad de las Americas (Puebla). By using English ETDs from NDLTD, we have a comparable corpus of computing-related documents from which to mine phrase translations. We describe our method and its formative evaluation.
Evaluation of kernel-based link analysis measures on research paper recommendation BIBAFull-Text 354-355
  Masashi Shimbo; Takahiko Ito; Yuji Matsumoto
We compare various kernel-based link analysis measures on graph nodes to evaluate their utility as a research paper recommendation system. The compared measures include Kandola et al.'s von Neumann kernel, its extension that takes communities into account, and Smola and Kondor's regularized Laplacian. Chebotarev and Shamis' matrix forest-based algorithm, Kleinberg's HITS authority ranking, and classic co-citation coupling are also evaluated. The experimental results show that kernel-based methods outperform HITS and co-citation coupling, with the community-based von Neumann kernel achieving the highest score.

Large-scale collections

A new generation of textual corpora: mining corpora from very large collections BIBAFull-Text 356-365
  Gordon Stewart; Gregory Crane; Alison Babeu
While digital libraries based on page images and automatically generated text have made possible massive projects such as the Million Book Library, Open Content Alliance, Google, and others, humanists still depend upon textual corpora expensively produced with labor-intensive methods such as double-keyboarding and manual correction. This paper reports the results from an analysis of OCR-generated text for classical Greek source texts. Classicists have depended upon specialized manual keyboarding that costs two or more times as much as keyboarding of English both for accuracy and because classical Greek OCR produced no usable results. We found that we could produce texts by OCR that, in some cases, approached the 99.95% professional data entry accuracy rate. In most cases, OCR-generated text yielded results that, by including the variant readings that digital corpora traditionally have left out, provide better recall and, we argue, can better serve many scholarly needs than the expensive corpora upon which classicists have relied for a generation. As digital collections expand, we will be able to collate multiple editions against each other, identify quotations of primary sources, and provide a new generation of services.
Subject metadata enrichment using statistical topic models BIBAFull-Text 366-375
  David Newman; Kat Hagedorn; Chaitanya Chemudugunta; Padhraic Smyth
Creating a collection of metadata records from disparate and diverse sources often results in uneven, unreliable and variable quality subject metadata. Having uniform, consistent and enriched subject metadata allows users to more easily discover material, browse the collection, and limit keyword search results by subject. We demonstrate how statistical topic models are useful for subject metadata enrichment. We describe some of the challenges of metadata enrichment on a huge scale (10 million metadata records from 700 repositories in the OAIster Digital Library) when the metadata is highly heterogeneous (metadata about images and text, and both cultural heritage material and scientific literature). We show how to improve the quality of the enriched metadata, using both manual and statistical modeling techniques. Finally, we discuss some of the challenges of the production environment, and demonstrate the value of the enriched metadata in a prototype portal.
Organizing the OCA: learning faceted subjects from a library of digital books BIBAFull-Text 376-385
  David Mimno; Andrew McCallum
Large-scale library digitization projects such as the Open Content Alliance are producing vast quantities of text, but little has been done to organize this data. Subject headings inherited from card catalogs are useful but limited, while full-text indexing is most appropriate for readers who already know exactly what they want. Statistical topic models provide a complementary function. These models can identify semantically coherent "topics" that are easily recognizable and meaningful to humans, but they have been too computationally intensive to run on library-scale corpora. This paper presents DCM-LDA, a topic model based on Dirichlet Compound Multinomial distributions. This model is simultaneously better able to represent observed properties of text and more scalable to extremely large text collections. We train individual topic models for each book based on the co-occurrence of words within pages. We then cluster topics across books. The resulting topical clusters can be interpreted as subject facets, allowing readers to browse the topics of a collection quickly, find relevant books using topically expanded keyword searches, and explore topical relationships between books. We demonstrate this method finding topics on a corpus of 1.49 billion words from 42,000 books in less than 20 hours, and it easily could scale well beyond this.

Metadata

Trends in metadata practices: a longitudinal study of collection federation BIBAFull-Text 386-395
  Carole L. Palmer; Oksana L. Zavalina; Megan Mustafoff
With the increasing focus on interoperability for distributed digital content, resource developers need to take into consideration how they will contribute to large federated collections, potentially at the national and international level. At the same time, their primary objectives are usually to meet the needs of their own institutions and user communities. This tension between local practices and needs and the more global potential of digital collections has been an object of study for the IMLS Digital Collections and Content (IMLS DCC) project. Our practical aim has been to provide integrated access to over 160 IMLS-funded digital collections through a centralized collection registry and metadata repository. During the course of development, the research team has investigated how collections and items can best be represented to meet the needs of local resource developers and aggregators of distributed content, as well as the diverse user communities they may serve. This paper presents results from a longitudinal analysis of IMLS DCC development trends between 2003 and 2006. Changes in metadata applications have not been pronounced. However, multi-scheme use has become less common, and use of Dublin Core remains high, even as recognition of its limitations grows. Locally developed schemes are used as much as MARC, and may be on the increase as new collections are incorporating less traditional library and museum materials, and more interactive and multimedia content. Based on our empirical understanding of metadata use in practice, patterns in new content development, and user community indicators, our research has turned toward identifying metadata relationships between items and collections to preserve context and enhance functionality and usefulness for scholarly user communities.
Induced tagging: promoting resource discovery and recommendation in digital libraries BIBAFull-Text 396-397
  J. Alfredo Sánchez; Adriana Arzamendi-Pétriz; Omar Valdiviezo
We introduce the notion of "induced tagging" in the context of learning communities that are supported by digital libraries. We also describe an environment aimed to foster discovery and recommendation of digital library resources based on induced tagging.
Standards alignment for metadata assignment BIBAFull-Text 398-399
  Anne R. Diekema; Ozgur Yilmazel; Jennifer Bailey; Sarah C. Harwell; Elizabeth D. Liddy
The research in this paper describes a Machine Learning technique called hierarchical text categorization which is used to solve the problem of finding equivalents from among different state and national education standards. The approach is based on a set of manually aligned standards and utilizes the hierarchical structure present in the standards to achieve a more accurate result. Details of this approach and its evaluation are presented.
Identifying personal photo digital library features BIBAFull-Text 400-401
  Sally Jo Cunningham; Masood Masoodian
At present, little evidence is available about how people want to interact with their photos in a personal photo digital library. Analysis of a set of 22 user needs summaries and critiques of existing photo management systems provides insight into potentially useful features.

Historical digital libraries

Locating thematic pinpoints in narrative texts with short phrases: a test study on Don Quixote BIBAFull-Text 402-410
  Jie Deng; Richard Furuta; Eduardo Urbina
Traditional implementations provide only limited assistance for locating the information in narrative texts relevant to a certain point of interest. We are investigating providing a "reading wheel" for such purposes. The first step of the bigger picture, as inspired by the editorial compilation of a textbook's index, is an attempt to locate thematically coherent sentences to a given short phrase. In this paper, we propose a two-step methodology to increase the search performance and examine its effectiveness in a test study. We describe the experimental setup and report on the quantitative evaluation of the techniques involved.
Digital Donne: workflow, editing tools, and the reader's interface of a collection of 17th-century English poetry BIBAFull-Text 411-412
  Carlos Monroy; Richard Furuta; Gary Stringer
We describe a multidisciplinary effort in the creation of an electronic repository of poems of John Donne -- the renowned 17th-century English poet. We discuss the workflow we have adopted and the Web-based tools we have developed for maintaining a collection of transcriptions and images, a concordance of poems, a list of press variants, and a browsing interface that enables readers to access these materials. A complement to the multi-volume Variorum Edition of the Poetry of John Donne, this endeavor shows how a traditional scholarly edition can be enhanced by resources made available by computers and the Internet.
A multilingual approach to technical manuscripts: 16th and 17th-century Portuguese shipbuilding treatises BIBAFull-Text 413-414
  Carlos Monroy; Richard Furuta; Filipe Castro
Shipbuilding treatises are technical manuscripts written in a variety of languages and spanning several centuries that describe the construction of ships. Given their technical content, understanding terms, concepts, and construction sequences is a challenging task. In this paper we describe a scalable approach and a multilingual web-based interface for enabling a group of scholars to edit a glossary of nautical terms in multiple languages.
First class objects and indexes for chant manuscripts BIBAFull-Text 415-416
  Louis W. G. Barton; Peter G. Jeavons; John A. Caldwell; Koon Shan Barry Ng
We discuss a crucial part of infrastructure for the Web-delivery of medieval chant resources. Although widely accepted by software professionals, the distributed-content model is sharply opposed by some chant scholars. We advocate for a paradigm of the Web as a massive database where each "first class object" acts like a record; metadata about, and links to such objects are compiled in virtual libraries. Scholarly-edited indexes determine which objects are in libraries, and unreliable content is excluded. Special metadata ontologies can be defined without modifying the primary content.
Recommending related papers based on digital library access records BIBAFull-Text 417-418
  Stefan Pohl; Filip Radlinski; Thorsten Joachims
An important goal for digital libraries is to enable researchers to more easily explore related work. While citation data is often used as an indicator of relatedness, in this paper we demonstrate that digital access records (e.g. http-server logs) can be used as indicators as well. In particular, we show that measures based on co-access provide better coverage than co-citation, that they are available much sooner, and that they are more accurate for recent papers.

Automatic classification

Automatic patent classification using citation network information: an experimental study in nanotechnology BIBAFull-Text 419-427
  Xin Li; Hsinchun Chen; Zhu Zhang; Jiexun Li
Classifying and organizing documents in repositories is an active research topic in digital library studies. Manually classifying the large volume of patents and patent applications managed by patent offices is a labor-intensive task. Many previous studies have employed patent contents for patent classification with the aim of automating this process. In this research we propose to use patent citation information, especially the citation network structure information, to address the patent classification problem. We adopt a kernel-based approach and design kernel functions to capture content information and various citation-related information in patents. These kernels' performances are evaluated on a testbed of patents related to nanotechnology. Evaluation results show that our proposed labeled citation graph kernel, which utilized citation network structures, significantly outperforms the kernels that use no citation information or only direct citation information.
Collaborative classifier agents: studying the impact of learning in distributed document classification BIBAFull-Text 428-437
  Weimao Ke; Javed Mostafa; Yueyu Fu
We developed a multi-agent framework where agents had limited/distributed knowledge for document classification and collaborated with each other to overcome the knowledge distribution. Each agent was equipped with a certain learning algorithm for predicting potential collaborators, or helping agents. We conducted experimental research on a standard news corpus to examine the impact of two learning algorithms: Pursuit Learning and Nearest Centroid Learning. For a fundamental retrieval operation, namely classification, both algorithms achieved competitive classification effectiveness and efficiency. Subsequently, the impact of the learning exploration rate and the maximum collaboration range on classification effectiveness and efficiency were examined. Close investigation of agent learning dynamics revealed increasing and stabilizing patterns that were enhanced by the learning algorithms.
UpdateNews: a news clustering and summarization system using efficient text processing BIBAFull-Text 438-439
  Takaharu Takeda; Atsuhiro Takasu
This paper proposes a news articles clustering and summarization system. It provides integrated access to news articles from various news sites. The system consists of a crawler, topic detector, and summarizer. This paper describes its efficient summarization technique to handle large amounts of crawled news articles.
Automatic syllabus classification BIBAFull-Text 440-441
  Xiaoyan Yu; Manas Tungare; Weiguo Fan; Manuel Perez-Quinones; Edward A. Fox; William Cameron; GuoFang Teng; Lillian Cassel
Syllabi are important educational resources. However, searching for a syllabus on the Web using a generic search engine is an error-prone process and often yields too many non-relevant links. In this paper, we present a syllabus classifier to filter noise out from search results. We discuss various steps in the classification process, including class definition, training data preparation, feature selection, and classifier building using SVM and Naive Bayes. Empirical results indicate that the best version of our method achieves a high classification accuracy, i.e., an F value of 83% on average.

Search behavior and personalization

Effects of structure and interaction style on distinct search tasks BIBAFull-Text 442-451
  Robert Capra; Gary Marchionini; Jung Sun Oh; Fred Stutzman; Yan Zhang
In this paper we present the results of a study that investigates the relationships between search tasks, information architecture, and interaction style. Three kinds of search tasks (simple lookup, complex lookup and exploratory) were performed using three different user interfaces (standard web site, hierarchical text-based faceted interface, and dynamic query faceted interface) for a large-scale public corpus containing semi-structured statistical data and reports. Twenty-eight people conducted the three kinds of searches in a between-subjects study and twelve others conducted the three kinds of searches on all three systems in a within-subjects study. Quantitative results demonstrate that the alternative general-purpose user interfaces that accept automated structuring of data offer comparable effectiveness, efficiency, and aesthetics to manually constructed architectures. Qualitative results demonstrate the manual architectures are favored.
Towards automatic conceptual personalization tools BIBAFull-Text 452-461
  Faisal Ahmad; Sebastian de la Chica; Kirsten Butcher; Tamara Sumner; James H. Martin
This paper describes the results of a study designed to validate the use of domain competency models to diagnose student scientific misconceptions and to generate personalized instruction plans using digital libraries. Digital library resources provided the content base for human experts to construct a domain competency model for earthquakes and plate tectonics encoded as a knowledge map. The experts then assessed student essays using comparisons against the constructed domain competency model and prepared personalized instruction plans using the competency model and digital library resources. The results from this study indicate that domain competency models generated from select digital library resources may provide the desired degree of content coverage to support both automated diagnosis and personalized instruction in the context of nationally-recognized science learning goals. These findings serve to inform the design of personalized instruction tools for digital libraries.
Mobile G-Portal supporting collaborative sharing and learning in geography fieldwork: an empirical study BIBAFull-Text 462-471
  Yin-Leng Theng; Kuah-Li Tan; Ee-Peng Lim; Jun Zhang; Dion Hoe-Lian Goh; Kalyani Chatterjea; Chew Hung Chang; Aixin Sun; Han Yu; Nam Hai Dang; Yuanyuan Li; Minh Chanh Vo
Integrated with G-Portal, a Web-based geospatial digital library of geography resources, this paper describes the implementation of Mobile G-Portal, a group of mobile devices as learning assistant tools supporting collaborative sharing and learning for geography fieldwork. Based on a modified Technology Acceptance Model and a Task-Technology Fit model, an initial study with Mobile G-Portal was conducted involving 39 students in a local secondary school. The findings suggested positive indication of acceptance of Mobile G-Portal for geography fieldwork. The paper concludes with a discussion on technological challenges, recommendations for refinement of Mobile G-Portal, and design implications in general for digital libraries and personal digital assistants supporting mobile learning.

Posters

Highly structured scientific publications BIBAFull-Text 472
  Robert B. Allen
Science is a complex, but highly structured, activity. We propose that reports about science would benefit by reflecting that structure. We provide an example based on the research paradigm and we explore more complex examples in which workflow models describe the conceptual model, the research procedure, the data analysis, and the conclusions.
Cooperative collection building in NSDL MatDL pathway through iVia data fountains BIBAFull-Text 473
  Laura M. Bartolo; Cathy S. Lowe; Johannes Ruscheinski; Diane Bisom
This poster describes a collaboration involving two NSDL projects: the Materials Digital Library Pathway (MatDL) and the iVia Data Fountains Project. MatDL is testing and providing feedback for refinement of the iVia tools while streamlining its metadata assignment process.
MESUR: usage-based metrics of scholarly impact BIBKFull-Text 474
  Johan Bollen; Marko A. Rodriguez; Herbert Van de Sompel
Keywords: digital libraries, impact factor, scholarly evaluation, semantic networks, usage data
A publisher of last resort: enduring document access BIBAFull-Text 475
  George Buchanan
Ensuring long-term access to valuable online content is complicated by legal constraints and practical difficulties. We introduce a new technique for ensuring the long-term availability of digital content on the internet. The technique combines legal and technical measures to guarantee that a document remains available when its original goes offline, either permanently or long-term: a "publisher of last resort".
Educational application integration with digital repository BIBAFull-Text 476
  Robert Chavez; Anoop Kumar; Nikolai Schwertner
The value of a digital repository increases tremendously when applications use the content in innovative ways. Tufts University has developed its repository based on the Fedora framework using the principles of service oriented architecture. The repository features innovative content models allowing the digital objects within the Tufts Digital Repository to be accessible through a variety of applications, including Perseus, Artifact, Tufts Digital Library (TDL) and Visual Understanding Environment (VUE). The poster will present the underlying architecture including latest services and their use in Educational Applications.
Blogger perceptions on digital preservation BIBAFull-Text 477
  Carolyn Hank; Songphan Choemprayong; Laura Sheble
Blogs have emerged as valuable records of current social and political events. In response, calls in the literature have advocated that these new vehicles of communication and information dissemination are valuable additions to the human record worthy of stewardship [1,2,3]. The intent of this research is to study the requirements and feasibility of impacting stewardship of blogs at the level of creation. This will be accomplished by surveying blogger perceptions on digital preservation. Expected outcomes of this study include the development of a framework for constructing a digital preservation program for blogs.
   A survey will be administered to bloggers to assess perceptions of digital preservation issues as related to their own blogging activities and the blogosphere in general. The instrument is organized into five categories: demographics, awareness, appraisal, impact, and investment. Participants will be recruited through established contacts in the blogging community, with the intent of a resulting snowball effect for gathering additional participation.
   The demographics section collects basic characteristics of respondents, characteristics of their blogs (e.g., topic areas, platforms, linkages, content types, permissions for reuse), and their blogging practices (e.g., motivations, frequency of updates). The awareness section surveys current preservation-related activities performed by bloggers such as measures taken to ensure duplication of blog content; and whether, why, and how bloggers engage in practices that result in post-publication content changes. The appraisal section assesses perceptions of issues related to persistent storage and access. Respondents are asked to evaluate the importance of researcher-supplied blog characteristics that could be used to appraise blogs and their components. These characteristics include social and cultural factors such as perceived blog popularity, social linkages, and artifactual significance as well as structural components and content types. In addition to seeking clarification of the types and components of blogs that are perceived to be important with respect to preservation, the appraisal section addresses issues related to content ownership. The impact section focuses on the perceived importance of blogs to authors, preserving access to blogs, and blogs as a part of the human record. In the investment section, respondents are asked to quantify resources that they would be willing to expend to preserve their own blogs and their willingness to extend these expenditures to the blogs of others.
   Data collection will begin April 2007 and continue for one month. Following closure of the survey, data will be analyzed using descriptive statistics and qualitative evaluation methods. An initial assessment report will include a summary analysis of results and initial calls for recommendations. Future work includes further development of these recommendations, development of benchmarks for planning ingest of blogs into a repository system, and the design and pilot testing of a user interface for deposit, storage, and access.
   This research is intended to promote digital preservation activities for continued access to blogs and to raise awareness of digital preservation issues among a population of users removed from the walls of academia and research. Bloggers constitute a significant producer type in that they have produced culturally and socially significant works, including those that contribute to wider public discourse. Furthermore, bloggers have the potential to become significant contributors to the dissemination of preservation awareness because they are vital actors in networks of communities that often span the borders of institutional, commercial, grassroots and personal communications.
Evolution of a data archive BIBKFull-Text 478
  Jonathan D. Crabtree; David Sheaves
Keywords: alliances, digital archives, federation, social science data
Examining perception of digital information space BIBAFull-Text 479
  John A. D'Ignazio; Joseph D. Ryan; Sarah C. Harwell; Anne R. Diekema; Elizabeth D. Liddy
A study using a modified think aloud protocol of University of Rochester undergraduate students' interactions with a general, humanities scholarly database helped a research team gain insight into their information-seeking behavior and thus the impact of the digital library.
Tagging video: conventions and strategies of the YouTube community BIBAFull-Text 480
  Gary Geisler; Sam Burns
This poster summarizes the results from a quantitative analysis of the tags and associated metadata used to describe more than one million videos by 537,246 contributors at the YouTube video sharing site. Results from this work suggest methodological and design considerations that could enhance the effectiveness of sharing within communities devoted to online video.
DRIADE: a data repository for evolutionary biology BIBAFull-Text 481
  Jed Dube; Sarah Carrier; Jane Greenberg
NESCent (The National Evolutionary Synthesis Center) is developing DRIADE (Digital Repository of Information and Data for Evolution) to address synthetic research challenges fundamental to advancing the field of evolutionary biology. This poster highlights results from a survey of selected repositories' functionalities, DRIADE's functional requirements, and DRIADE's functional model. We also summarize ongoing research activities, studying evolutionary biologists' data preservation practices and use requirements.
AlouetteCanada metadata toolkit BIBAKFull-Text 482
  Mark Jordan
This poster provides an overview of the AlouetteCanada Metadata Toolkit.
Keywords: application profiles, digitization, metadata, standards, tools
Building a digital library of traditional Mongolian historical documents BIBAFull-Text 483
  Garmaabazar Khaltarkhuu; Akira Maeda
This paper describes a technique for converting modern Mongolian text input to traditional Mongolian script and integrating the result into the Greenstone Digital Library (GSDL). This work is part of ongoing research to create a digital library of traditional Mongolian historical documents.
Evaluating digital libraries with webmetrics BIBAFull-Text 484
  Michael Khoo; Robert A. Donahue
We report preliminary lessons from a year of webmetrics research with two digital libraries. Despite the apparent 'plug-and-play-and-report' nature of webmetrics tools, much work was required to extract useful data from the tools used.
Tagging for health information organisation and retrieval BIBKFull-Text 485
  Margaret E. I. Kipp
Keywords: health information, social bookmarking, tagging
Augmenting OAI-PMH repository holdings using search engine APIs BIBAFull-Text 486
  Martin Klein; Michael L. Nelson; Juliet Z. Pao
In this poster, we give the preliminary results of our project to acquire Atmospheric Science Data Center (ASDC) project-related web resources, not with focused crawling, but by using the search engine (SE) APIs directly. We aggregate the results and create archive-ready complex objects.
The cyberinfrastructure for scholars project: componentized architecture for sustainable scholarly portals BIBKFull-Text 487
  Aaron Krowne; Stacey Martin; Urvashi Gadi; Micah Wedemeyer; Martin Halbert
Keywords: OAI, OXF, componentized architecture, web services
Social bookmarking in digital library systems: framework and case study BIBFull-Text 488
  Fiftarina Puspitasari; Ee-Peng Lim; Dion Hoe-Lian Goh; Chew-Hung Chang; Jun Zhang; Aixin Sun; Yin-Leng Theng; Kalyani Chatterjea; Yuanyuan Li
Automated collection strength analysis BIBAFull-Text 489
  Clare Llewellyn; Robert Sanderson; Brian Rea
The strengths within six library collections were automatically determined through automated enrichment and analysis of bibliographic level metadata records, with a view towards efficient resource sharing and collaborative collection management. This involved very large scale deduplication, enrichment and automatic reclassification of records using machine learning processes.
Digital library education: some international course structure comparisons BIBAFull-Text 490
  Yongqing Ma; Ann O'Brien; Warwick Clegg
Following our recent review of progress in Digital Library (DL) education [1], we present here a brief overview of current work to investigate the commonality/diversity of course structure between ten institutions outside North America which offer DL education in their library schools. The weighting of specifically DL module topic credits as a proportion of the overall course taught credits varies between 13% and 63%, and coverage of a proposed core topic set [2] is as high as 85%.
PIM through a 5S perspective BIBKFull-Text 491
  Yi Ma; Edward A. Fox; Marcos A. Gonçalves
Keywords: 5S model, cognitive science, intrapersonal communication, personal information management
Use vs. access: design and use in educational digital libraries BIBAFull-Text 492
  Flora McMartin; Brandon Muramatsu
The poster will compare and contrast the design and usage assumptions of existing educational digital libraries and repositories to challenge digital library developers to meet the needs of their increasingly sophisticated users. Traditionally, assumptions have focused on access to a site and discovery of content, whereas we define use as content and its application (context, audience, etc.).
   In this poster we will review the assumptions that have driven the design of digital libraries, their services and evaluation. Measures of success such as page views of metadata rest on assumptions associated with access, i.e., the number of times a metadata record is displayed. This measure provides a very limited view of how a digital library is used. We believe that educational digital libraries need to go beyond such a limited view and think about what people actually do with material: Are they using it? Are they returning to it? Are they modifying it? Are they sharing it with others? We will explore an alternate set of metrics for determining the success (or failure) of educational digital libraries by examining metrics focused on use of the contents of educational digital libraries.
What do faculty need and want from digital libraries? BIBAFull-Text 493
  Flora P. McMartin; Alan Wolf; Ellen Iverson; Cathryn Manduca; Glenda Morgan; Joshua Morrill
In this paper, we report on the results of a national survey of faculty members concerning their use of digital resources (DRs), collections of resources and digital libraries (DLs). The research reported here explored issues such as: value of DRs, motivation for using DRs and barriers to use of these resources in teaching. The results have implications for how DLs might develop education and outreach efforts to increase visibility and use of their collections.
Building cross-browser interfaces for digital libraries with scalable vector graphics (SVG) BIBAFull-Text 494
  Francis Molina; Brian Sweeney; Ted Willard; André Winter
We share our experience with developing interactive, cross-browser strand maps using SVG. These maps will provide educators with free and easy access to carefully selected instructional resources linked to national and state learning goals. We will show the interface in at least two browsers, Internet Explorer with Adobe's SVG Viewer plug-in and Mozilla Firefox.
Understanding target users of a digital reference library BIBAFull-Text 495
  Daniela K. Rosner; John Mark Josling; Andrea Moed; Elisa Oreglia
Through an investigation of the needs and practices of researchers in the humanities and social sciences, we identify key issues in the use of an online digital reference library, the Electronic Cultural Atlas Initiative's "Support for the Learner: What, When, Where and Who". In this poster we present an examination of results from survey data and user tasks, and discuss implications for future design and implementation based on our findings.
Capturing relevant information for digital curation BIBKFull-Text 496
  Chirag Shah; Gary Marchionini
Keywords: contextual information, digital curation, digital preservation
Merging the Norwegian gazetteer with the ADL gazetteer BIBAFull-Text 497
  Øyvind Vestavik; Ingeborg T. Sølvberg
We report on work in progress on the merging of the Norwegian Gazetteer and the ADL gazetteer to create a gazetteer with both detailed local coverage and global scope suitable for indexing articles from a local newspaper. We describe a mapping on the schema level, a strategy for identifying duplicates in the merged gazetteer and some identified challenges.
Information system media education (ISM): cooperating for media literacy BIBKFull-Text 498
  Heike vom Orde
Keywords: media education, media literacy, network
Digitizing & providing access to contextual cultural materials: the liner notes digitization project BIBAFull-Text 499
  Megan Winget
This paper describes a digitization project focused on developing a metadata modeling schema for album liner notes.
Use of online digital learning materials and digital libraries: comparison by discipline BIBAFull-Text 500
  Alan J. Wolf; Ellen R. Iverson; Cathryn Manduca; Flora McMartin; Glenda Morgan; Joshua Morrill
In this paper, we describe the results of a national survey of higher education faculty concerning their use of digital resources and collections of these resources. We explore the differences in resource use by discipline groups and suggest implications for development of discipline specific libraries and faculty development practices.

Demos

XML as the articulation between information retrieval and multimedia in a musical heritage dissemination BIBAFull-Text 501
  Rodolphe Bailly
The Cite de la musique in Paris has recently opened a new media Library. One of the Library's assignments is the dissemination of the Cite de la musique's collection of recorded concerts. This paper presents the concert's description model implemented into the MARC Catalogue and emphasizes the central position in the library information system architecture of automatically generated XML representations of each concert.
Lightweight realistic books: the Greenstone connection BIBKFull-Text 502
  Veronica Liesaputra; Ian H. Witten; David Bainbridge
Keywords: electronic book, flash application
Rapid document navigation for information triage support BIBAFull-Text 503
  George Buchanan
This paper introduces a novel interaction for supporting rapid between-page comparison of texts within a limited screen estate. In comparison with existing interfaces and interactions, it gives a high degree of visual feedback and allows rapid between-page flicking, similar to what is readily achieved in physical environments using the fingers and thumbs of a reader as they flip between related pages.
Fluid interaction for the document in context BIBAFull-Text 504
  Pierre Cubaud; Jérôme Dupire; Alexandre Topol
We explore in this paper the interface requirements for user's navigation within a mixed collection of 3D digitalized objects and textual documents. A specific application is history of technology where 3D and 2D documents are most of the time inter-related.
Demonstrating the semantic growbag: automatically creating topic facets for facetedDBLP BIBAFull-Text 505
  Jörg Diederich; Wolf-Tilo Balke; Uwe Thaden
The FacetedDBLP demonstrator allows users to search computer science publications starting from some keyword and shows the result set along with a set of facets, e.g., distinguishing publication years, authors, or conferences. Furthermore, it uses GrowBag graphs, i.e., automatically created categorization systems, to create a topic facet, with which a user can characterize the result set in terms of main research topics and filter it according to certain subtopics.
The David L. Bassett stereoscopic atlas of human anatomy: developing a specialized collection within the Stanford MediaServer digital library BIBAFull-Text 506
  Jeremy C. Durack; Amy L. Ladd; Shyh-Yuan Kung; Margaret Krebs; Robert A. Chase; Parvati Dev
We describe the creation of a specialized media collection in the Stanford MediaServer highlighting the David L. Bassett Stereoscopic Atlas of Human Anatomy. Previous reports have outlined the underlying architecture and features of the MediaServer developed to support biomedical media-based education [1,2]. Here we focus on specific design principles and technical aspects of a focused project that may be beneficial to those developing digital media collections.
Evalutron 6000: collecting music relevance judgments BIBKFull-Text 507
  Anatoliy A. Gruzd; J. Stephen Downie; M. Cameron Jones; Jin Ha Lee
Keywords: MIREX, music digital libraries, music information retrieval, music similarity
VCenter: a digital video management system with mobile search service BIBAFull-Text 508
  Jen-Hao Hsiao; Yu-Zheng Wang
Digital video data have proliferated in recent years due to the rapid development of multimedia computing and computer technologies. Management of video data is thus becoming an indispensable part of digital libraries. However, most current digital video library systems lack support for content-based video search and an easy-to-use query interface. In this work, we develop a digital video management system called VCenter, which provides lightweight mobile search functionality based on images taken with a camera phone. With the proposed framework, both end users and content owners can more easily enjoy the multimedia content in digital video libraries.
Creativity support: the mixed-initiative composition space BIBAFull-Text 509
  Andruid Kerne; Eunyee Koh
Creativity support is an important and challenging emerging area of research. combinFormation is being developed as a tool to support creativity through a mixed-initiative composition space. The system combines searches and information feeds, representing relevant information collections as compositions of image and text surrogates. The composition space affords human manipulation. This method has been shown to support information discovery in The Design Process, an interdisciplinary undergraduate course. In this demo, we demonstrate how combinFormation can be used to explore and discover information in digital libraries such as ACM Digital Library and the International Children's Digital Library.
Visual understanding environment BIBAFull-Text 510
  Anoop Kumar
The Visual Understanding Environment (VUE) project at Tufts' Academic Technology department aims to provide faculty and students with tools to successfully integrate digital resources into their teaching and learning. VUE provides a visual environment not only for structuring, presenting, and sharing digital information but also for viewing digital resources along with their metadata. The demonstration will showcase the federated search capabilities of VUE that enable users to search across multiple digital repositories. We will also present concept maps created using digital objects from repositories.
Mobile digital libraries for geography education BIBFull-Text 511
  Minh-Chanh Vo; Fiftarina Puspitasari; Ee-Peng Lim; Chew-Hung Chang; Yin-Leng Theng; Dion Hoe-Lian Goh; Kalyani Chatterjea; Jun Zhang; Aixin Sun; Yuanyuan Li
The internet public library: an online learning laboratory for digital libraries BIBAFull-Text 512
  Lorri Mon; Larry Dennis; Kyunghye Kim
This demonstration explores the Internet Public Library (www.ipl.org), a shared online facility for testing innovations in digital libraries and for training a skilled work force in digital library services, systems, and collections. Hypatia 2.0 and QRC software used in IPL's digital library collections and services are shown, with discussion of IPL in education, digital collections, digital reference services, digital library systems, and research.
5SQual: a quality assessment tool for digital libraries BIBKFull-Text 513
  Bárbara Lagoeiro Moreira; Marcos André Gonçalves; Alberto Henrique Frade Laender; Edward A. Fox
Keywords: 5S, 5SQual, digital libraries, quality assessment
ContextMiner: a tool for digital library curators BIBKFull-Text 514
  Chirag Shah; Gary Marchionini
Keywords: contextual information, digital curation, digital preservation
From kinescope to rich media: 50 years (ago) with Mike Wallace BIBAFull-Text 515
  Quinn Stewart; Grete Pasch; Rodrigo Arias
What do Eleanor Roosevelt, Frank Lloyd Wright, Margaret Sanger, and Henry Kissinger have in common? All of them, and 55 other celebrities, were interviewed by Mike Wallace in 1957-58, and the corresponding kinescopes have resided in the Harry Ransom Humanities Research Center at the University of Texas since the early 1960s. This demonstration will showcase an online searchable video digital library of the Wallace interviews created by researchers, staff, and students at the University of Texas School of Information and the Universidad Francisco Marroquin.