| World explorer: visualizing aggregate data from unstructured text in geo-referenced collections | | BIBA | Full-Text | 1-10 | |
| Shane Ahern; Mor Naaman; Rahul Nair; Jeannie Hui-I Yang | |||
| The availability of map interfaces and location-aware devices makes a growing amount of unstructured, geo-referenced information available on the Web. This type of information can be valuable not only for browsing, finding and making sense of individual items, but also in aggregate form to help understand data trends and features. In particular, over twenty million geo-referenced photos are now available on Flickr, a photo-sharing website -- the first major collection of its kind. These photos are often associated with user-entered unstructured text labels (i.e., tags). We show how we analyze the tags associated with the geo-referenced Flickr images to generate aggregate knowledge in the form of "representative tags" for arbitrary areas in the world. We use these tags to create a visualization tool, World Explorer, that can help expose the content of the data, using a map interface to display the derived tags and the original photo items. We perform a qualitative evaluation of World Explorer that outlines the visualization's benefits in browsing this type of content. We provide insights regarding the aggregate versus individual-item requirements in browsing digital geo-referenced material. | |||
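The abstract above does not spell out its tag-scoring function; as a loose illustration only, the sketch below picks representative tags for a map cell with a TF-IDF-style weight (frequent in the cell, rare across cells). The weighting is an assumption, not the scoring World Explorer actually uses.

```python
from collections import Counter, defaultdict
import math

def representative_tags(photos, top_k=5):
    """photos: list of (cell_id, [tags]) pairs; returns top tags per cell.
    Illustrative TF-IDF-style scoring only -- not World Explorer's method."""
    tag_counts = defaultdict(Counter)   # per-cell tag frequencies
    cells_with_tag = Counter()          # number of cells in which a tag appears
    for cell, tags in photos:
        tag_counts[cell].update(tags)
    for counts in tag_counts.values():
        for tag in counts:
            cells_with_tag[tag] += 1
    n_cells = len(tag_counts)
    result = {}
    for cell, counts in tag_counts.items():
        scored = {t: f * math.log(n_cells / cells_with_tag[t])
                  for t, f in counts.items()}
        result[cell] = sorted(scored, key=scored.get, reverse=True)[:top_k]
    return result

print(representative_tags([("sf", ["bridge", "fog", "goldengate"]),
                           ("sf", ["goldengate", "bridge"]),
                           ("nyc", ["bridge", "brooklyn"])]))
```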
| Categorization and analysis of text in computer mediated communication archives using visualization | | BIBA | Full-Text | 11-18 | |
| Ahmed Abbasi; Hsinchun Chen | |||
| Digital libraries (DLs) for online discourse contain large amounts of valuable information that is difficult to navigate and analyze. Visualization systems developed to facilitate improved CMC archive analysis and navigation primarily focus on interaction information, with little emphasis on textual content. In this paper we present a system that provides DL exploration services such as visualization, categorization, and analysis for CMC text. The system incorporates an extended feature set comprised of stylistic, topical, and sentiment related features to enable richer content representation. The system also includes the Ink Blot technique which utilizes decision tree models and text overlay to visualize CMC messages. Ink Blots can be used for text categorization and analysis across forums, authors, threads, messages, and over time. The proposed system's analysis capabilities were evaluated with a series of examples and a qualitative user study. Empirical categorization experiments comparing the Ink Blot technique against a benchmark support vector machine classifier were also conducted. The results demonstrated the efficacy of the Ink Blot technique for text categorization and also highlighted the effectiveness of the extended feature set for improved text categorization. | |||
| Delineating the citation impact of scientific discoveries | | BIBA | Full-Text | 19-28 | |
| Chaomei Chen; Jian Zhang; Weizhong Zhu; Michael Vogeley | |||
| Identifying the significance of specific concepts in the diffusion of scientific knowledge is a challenging issue concerning many theoretical and practical areas. We introduce an innovative visual analytic approach to integrate microscopic and macroscopic perspectives of a rapidly growing scientific knowledge domain. Specifically, our approach focuses on statistically unexpected phrases extracted from unstructured text of titles and abstracts at the microscopic level in association with the magnitude and timeliness of their citation impact at the macroscopic level. The H-index, originally defined to measure individual scientists' productivity in terms of their citation profiles, is extended in two ways: 1) to papers and terms as a means of dividing these items into two groups so as to replace the less optimal threshold-based divisions, and 2) to take into account the timeliness of the impact of knowledge diffusion in terms of the timing of citations and publications so that attention is particularly drawn towards potentially significant and timely papers. The selected terms are connected to higher-level performance indicators, such as measures derived from the H-index, in the form of decision trees. A top-down traversal of such decision trees provides an intuitive walkthrough of concepts and phrases that may underlie potentially significant but currently still latent scientific discoveries. Timeliness measures can also help to identify institutions that are at the forefront of a research field. We illustrate how widely accessible tools such as Google Earth can be utilized to disseminate such insights. The practical significance for digital libraries and fostering scientific discoveries is demonstrated through the astronomical literature related to the Sloan Digital Sky Survey (SDSS). | |||
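The paper extends the H-index from authors to papers and terms; the base definition it builds on is simple enough to sketch. The following shows only the classic computation, not the extensions described in the abstract.

```python
def h_index(citation_counts):
    """Classic h-index: the largest h such that h items each have >= h citations.
    The paper generalizes this idea to papers and terms; this sketch shows only
    the base definition."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for i, c in enumerate(counts, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # -> 4
```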
| How to choose a digital preservation strategy: evaluating a preservation planning procedure | | BIBA | Full-Text | 29-38 | |
| Stephan Strodl; Christoph Becker; Robert Neumayer; Andreas Rauber | |||
| An increasing number of institutions throughout the world face legal obligations or business needs to collect and preserve digital objects over several decades. A range of tools exists today to support the variety of preservation strategies such as migration or emulation. Yet, different preservation requirements across institutions and settings make the decision on which solution to implement very difficult. This paper presents the PLANETS Preservation Planning approach. It provides an approved way to make informed and accountable decisions on which solution to implement in order to optimally preserve digital objects for a given purpose. It is based on Utility Analysis to evaluate the performance of various solutions against well-defined requirements and goals. The viability of this approach is shown in a range of case studies for different settings. We present its application to two scenarios of web archives, two collections of electronic publications, and a collection of multimedia art. This work focuses on the different requirements and goals in the various preservation settings. | |||
| Factors affecting website reconstruction from the web infrastructure | | BIBA | Full-Text | 39-48 | |
| Frank McCown; Norou Diawara; Michael L. Nelson | |||
| When a website is suddenly lost without a backup, it may be reconstituted by probing web archives and search engine caches for missing content. In this paper we describe an experiment where we crawled and reconstructed 300 randomly selected websites on a weekly basis for 14 weeks. The reconstructions were performed using our web-repository crawler named Warrick which recovers missing resources from the Web Infrastructure (WI), the collective preservation effort of web archives and search engine caches. We examine several characteristics of the websites over time including birth rate, decay and age of resources. We evaluate the reconstructions when compared to the crawled sites and develop a statistical model for predicting reconstruction success from the WI. On average, we were able to recover 61% of each website's resources. We found that Google's PageRank, number of hops and resource age were the three most significant factors in determining if a resource would be recovered from the WI. | |||
| Defining what digital curators do and what they need to know: the DigCCurr project | | BIBA | Full-Text | 49-50 | |
| Christopher A. Lee; Helen R. Tibbo; John C. Schaefer | |||
| The DigCCurr (Digital Curation Curriculum) project is developing a graduate level curricular framework, course modules, and experiential components to prepare students for digital curation in various environments. This paper summarizes a draft and guiding principles behind a matrix of digital curation knowledge and competencies, which are serving as the basis for our curriculum design efforts. | |||
| Generating best-effort preservation metadata for web resources at time of dissemination | | BIBA | Full-Text | 51-52 | |
| Joan A. Smith; Michael L. Nelson | |||
| HTTP and MIME, while sufficient for contemporary webpage access, do not provide enough forensic information to enable the long-term preservation of the resources they describe and transport. But what if the originating web server automatically provided preservation metadata encapsulated with the resource at time of dissemination? Perhaps the ingestion process could be streamlined, with additional forensic metadata available to future information archeologists. We have adapted an Apache web server implementation of OAI-PMH which can utilize third-party metadata analysis tools to provide a metadata-rich description of each resource. The resource and its forensic metadata are packaged together as a complex object, expressed in plain ASCII and XML. The result is a CRATE: a self-contained preservation-ready version of the resource, created at time of dissemination. | |||
| Document clustering using small world communities | | BIBA | Full-Text | 53-62 | |
| Brant W. Chee; Bruce Schatz | |||
| Words in natural language documents exhibit a small world network structure. Thus the physics community provides us with an extensive supply of algorithms for extracting community structure. We present a novel method for semantically clustering a large collection of documents using small world communities. This method combines modified physics algorithms with traditional information retrieval techniques. A term network is generated from the document collection, the terms are clustered into small world communities, and the semantic term clusters are used to generate overlapping document clusters. The algorithm combines the speed of single link with the quality of complete link. Clustering takes place in nearly real-time and the results are judged to be coherent by expert users. Our algorithm occupies a middle ground between speed and quality of document clustering. | |||
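As a rough sketch of the pipeline described above (term network, term communities, overlapping document clusters), the code below uses networkx's greedy modularity communities as a stand-in for the paper's modified physics algorithms; the data and parameters are illustrative only.

```python
import itertools
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def cluster_documents(docs):
    """docs: dict of doc_id -> set of terms.  Builds a term co-occurrence graph,
    finds term communities, then assigns each document to every community it
    shares terms with (overlapping clusters).  A sketch only; the paper uses
    its own modified community-detection algorithms."""
    g = nx.Graph()
    for terms in docs.values():
        for a, b in itertools.combinations(sorted(terms), 2):
            w = g.get_edge_data(a, b, {"weight": 0})["weight"]
            g.add_edge(a, b, weight=w + 1)
    clusters = []
    for comm in greedy_modularity_communities(g, weight="weight"):
        members = [d for d, terms in docs.items() if terms & set(comm)]
        clusters.append((set(comm), members))
    return clusters

docs = {"d1": {"galaxy", "star", "survey"},
        "d2": {"star", "telescope", "survey"},
        "d3": {"gene", "protein", "cell"}}
for terms, members in cluster_documents(docs):
    print(sorted(terms), members)
```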
| Efficient summarization-aware search for online news articles | | BIBA | Full-Text | 63-72 | |
| Wisam Dakka; Luis Gravano | |||
| News portals gather and organize news articles published daily on the Internet. Typically, news articles are clustered into 'events' and each cluster is displayed with a short description of its contents. A particularly interesting choice for describing the contents of a cluster is a machine-generated multi-document summary of the articles in the cluster. Such summaries are informative and help news readers to identify and explore only clusters of interest. Naturally, multi-document clusters and summaries are also valuable to help users navigate the results of keyword-search queries. Unfortunately, current document summarizers are still slow; as a result, search strategies that define document clusters and their multi-document summaries online, in a query-specific manner, are prohibitively expensive. In contrast, search strategies that only return offline, query-independent document clusters are efficient, but might return clusters whose (query-independent) summaries are of little relevance to the queries. In this paper, we present an efficient Hybrid search strategy to address the limitations of fully online and fully offline summarization-aware search approaches. Extensive experiments involving user relevance judgments and real news articles show that the quality of our Hybrid results is high, and that these results are computed in substantially less time than with the fully online strategy. We have implemented our strategy and made it available on the Newsblaster news summarization system, which crawls and summarizes news articles from a variety of web sources on a daily basis. | |||
| Integrating data and text mining processes for digital library applications | | BIBA | Full-Text | 73-79 | |
| Robert Sanderson; Paul Watry | |||
| This paper explores the integration of text mining and data mining techniques, digital library systems, and computational and data grid technologies with the objective of developing an online classification service exemplar. We discuss the current research issues relating to the use of data mining algorithms and toolkits for textual data; the necessary changes within the Cheshire3 Information Framework to accommodate analysis workflows; the outcomes of a demonstrator based on the National Library of Medicine's Medline dataset; and the provision of comparable metrics for evaluation purposes. The prototype has resulted in extremely accurate online classification services and offers a novel method of supporting text mining and data mining within a highly scaled computational environment, integrated seamlessly into the digital library architecture. | |||
| The OAI-ORE effort: progress, challenges, synergies | | BIBA | Full-Text | 80 | |
| Cliff Lynch; Savas Parastatidis; Neil Jacobs; Herbert Van de Sompel; Carl Lagoze | |||
| The panel will discuss various aspects of the ongoing Object Re-Use and Exchange (ORE) effort of the Open Archives Initiative (OAI). OAI-ORE is funded by the Andrew W. Mellon Foundation and is a result of the "Augmenting Interoperability across Scholarly Repositories" meeting that took place in April 2006 at the Mellon Foundation. A panel at JCDL 2006 reported on this meeting. The goal of OAI-ORE is to develop, identify, and profile extensible standards and protocols that allow repositories, agents, and services to interoperate in the context of use and reuse of compound digital objects beyond the boundaries of the holding repositories. | |||
| SlideSeer: a digital library of aligned document and presentation pairs | | BIBA | Full-Text | 81-90 | |
| Min-Yen Kan | |||
| Research findings are often transmitted both as written documents and narrated slide presentations. As these two forms of media contain both unique and replicated information, it is useful to combine and align these two views to create a single synchronized medium. We introduce SlideSeer, a digital library that discovers, aligns and presents such presentation and document pairs. We discuss the three major system components of the SlideSeer DL: 1) the resource discovery, 2) the fine-grained alignment and 3) the user interface. For resource discovery, we have bootstrapped our collection building process using metadata from DBLP and CiteSeer. For alignment, we modify maximum similarity alignment to favor monotonic alignments and incorporate a classifier to handle slides which should not be aligned. For the user interface, we allow the user to seamlessly switch between four carefully motivated views of the resulting synchronized media pairs. | |||
| TableSeer: automatic table metadata extraction and searching in digital libraries | | BIBA | Full-Text | 91-100 | |
| Ying Liu; Kun Bai; Prasenjit Mitra; C. Lee Giles | |||
| Tables are ubiquitous in digital libraries. In scientific documents, tables are widely used to present experimental results or statistical data in a condensed fashion. However, current search engines do not support table search. The difficulty of automatically extracting tables from untagged documents, the lack of a universal table metadata specification, and the limitations of existing ranking schemes make the table search problem challenging. In this paper, we describe TableSeer, a search engine for tables. TableSeer crawls digital libraries, detects tables from documents, extracts table metadata, indexes and ranks tables, and provides a user-friendly search interface. We propose an extensive set of medium-independent metadata for tables that scientists and other users can adopt for representing table information. In addition, we devise a novel page box-cutting method to improve the performance of table detection. Given a query, TableSeer ranks the matched tables using an innovative ranking algorithm -- TableRank. TableRank rates each (query, table) pair with a tailored vector space model and a specific term weighting scheme. Overall, TableSeer eliminates the burden of manually extracting table data from digital libraries and enables users to automatically examine tables. We demonstrate the value of TableSeer with empirical studies on scientific documents. | |||
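The abstract names a tailored vector space model and term weighting scheme but does not give them here; the sketch below scores a (query, table) pair with a generic TF-IDF cosine similarity purely to show the shape of such a computation, not TableRank itself.

```python
import math
from collections import Counter

def cosine(q_terms, t_terms, idf):
    """Score a (query, table) pair with a plain TF-IDF vector space model.
    TableRank uses its own tailored term weighting; this is only a generic
    stand-in."""
    q, t = Counter(q_terms), Counter(t_terms)
    qv = {w: f * idf.get(w, 1.0) for w, f in q.items()}
    tv = {w: f * idf.get(w, 1.0) for w, f in t.items()}
    dot = sum(qv[w] * tv[w] for w in qv if w in tv)
    norm = math.sqrt(sum(v * v for v in qv.values())) * \
           math.sqrt(sum(v * v for v in tv.values()))
    return dot / norm if norm else 0.0

idf = {"throughput": 2.3, "latency": 2.0, "table": 0.2}
print(cosine(["throughput", "latency"],
             ["table", "throughput", "latency", "latency"], idf))
```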
| CiteSearch: next-generation citation analysis | | BIBA | Full-Text | 101-102 | |
| Kiduk Yang; Lokman Meho | |||
| The coverage of citations in citation databases of today is disjoint and incomplete, which can result in conflicting quality assessment outcomes across different data sources. A fusion approach to quality assessment that employs a range of citation-based methods to analyze data from multiple sources is one way to address this limitation. The paper discusses a citation analysis pilot study that measured the impact of scholarly publications based on the data mined from Web of Science, Scopus, and Google Scholar. | |||
| Retrieval effectiveness of table of contents and subject headings | | BIBA | Full-Text | 103-104 | |
| Youngok Choi; Ingrid Hsieh-Yee; Bill Kules | |||
| The effectiveness of two modes of subject representation -- table of contents (TOC) and subject headings -- in subject searching in an online public access catalog (OPAC) system was investigated. The retrieval difference between TOC and the Library of Congress subject headings (LCSH) was statistically significant; the effect of subject domain was not statistically significant; users had better success matching their keywords to TOC than to LCSH; but their keywords often failed to retrieve items similar to the target items. These findings underscore the need to bridge user keywords to both TOC and LCSH. | |||
| Mining a digital library for influential authors | | BIBA | Full-Text | 105-106 | |
| David Mimno; Andrew McCallum | |||
| When browsing a digital library of research papers, it is natural to ask which authors are most influential in a particular topic. We present a probabilistic model that ranks authors based on their influence in particular areas of scientific research. This model combines several sources of information: citation information between documents as represented by PageRank scores, authorship data gathered through automatic information extraction, and the words in paper abstracts. We compare the performance of a topic model versus a smoothed language model by assessing the number of major award winners in the resulting ranked list of researchers. | |||
| Can social bookmarking enhance search in the web? | | BIBA | Full-Text | 107-116 | |
| Yusuke Yanbe; Adam Jatowt; Satoshi Nakamura; Katsumi Tanaka | |||
| Social bookmarking is an emerging type of Web service that helps users share, classify, and discover interesting resources. In this paper, we explore the concept of an enhanced search, in which data from social bookmarking systems is exploited for enhancing search in the Web. We propose combining the widely used link-based ranking metric with the one derived using social bookmarking data. First, this increases the precision of a standard link-based search by incorporating popularity estimates from aggregated data of bookmarking users. Second, it provides an opportunity for extending the search capabilities of existing search engines. Individual contributions of bookmarking users as well as the general statistics of their activities are used here for a new kind of complex search where contextual, temporal or sentiment-related information is used. We investigate the usefulness of social bookmarking systems for the purpose of enhancing Web search through a series of experiments done on datasets obtained from social bookmarking systems. Next, we show the prototype system that implements the proposed approach and we present some preliminary results. | |||
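One simple way to combine a link-based score with a bookmarking-derived popularity estimate, as the abstract proposes, is a weighted sum of normalized scores; the sketch below assumes that combination and an arbitrary weight, which may differ from the paper's own scheme.

```python
def combined_rank(pages, alpha=0.5):
    """pages: dict url -> (link_score, bookmark_count).  Ranks URLs by a
    weighted sum of a normalized link-based score and a normalized
    social-bookmark popularity estimate.  The weighting is an assumption made
    for illustration; the paper evaluates its own combination."""
    max_link = max(s for s, _ in pages.values()) or 1.0
    max_bm = max(b for _, b in pages.values()) or 1.0
    def score(item):
        s, b = item[1]
        return alpha * (s / max_link) + (1 - alpha) * (b / max_bm)
    return [url for url, _ in sorted(pages.items(), key=score, reverse=True)]

print(combined_rank({"a.example": (0.9, 12), "b.example": (0.4, 480)}))
```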
| Task-based interaction with an integrated multilingual, multimedia information system: a formative evaluation | | BIBA | Full-Text | 117-126 | |
| Pengyi Zhang; Lynne Plettenberg; Judith L. Klavans; Douglas W. Oard; Dagobert Soergel | |||
| This paper describes a formative evaluation of an integrated multilingual, multimedia information system, a series of user studies designed to guide system development. The system includes automatic speech recognition for English, Chinese, and Arabic, automatic translation from Chinese and Arabic into English, and query-based and profile-based search options. The study design emphasizes repeated evaluation with the same (increasingly experienced) participants, exploration of alternative task designs, rich qualitative and quantitative data collection, and rapid analysis to provide the timely feedback needed to support iterative and responsive development. Results indicate that users presented with materials in a language that they do not know can generate remarkably useful work products, but that integration of transcription, translation, search and profile management poses challenges that would be less evident were each technology to be evaluated in isolation. | |||
| Modeling personal and social network context for event annotation in images | | BIBA | Full-Text | 127-134 | |
| Bageshree Shevade; Hari Sundaram; Lexing Xie | |||
| This paper describes a framework to annotate images using personal and social network contexts. The problem is important as the correct context reduces the number of image annotation choices. Social network context is useful as real-world activities of members of the social network are often correlated within a specific context. The correlation can serve as a powerful resource to effectively increase the ground truth available for annotation. There are three main contributions of this paper: (a) development of an event context framework and definition of quantitative measures for contextual correlations based on concept similarity in each facet of event context; (b) recommendation algorithms based on spreading activations that exploit personal context as well as social network context; (c) experiments on real-world, everyday images that verified both the existence of inter-user semantic disagreement and the improvement in annotation when incorporating both the user and social network context. We have conducted two user studies, and our quantitative and qualitative results indicate that context (both personal and social) facilitates effective image annotation. | |||
| Longitudinal study of changes in blogs | | BIBA | Full-Text | 135-136 | |
| Paul Logasa Bogen, II; Luis Francisco-Revilla; Richard Furuta; Takeisha Hubbard; Unmil P. Karadkar; Frank Shipman | |||
| Web-based distributed collections often include links to documents that are expected to change frequently, such as blogs. The study reported here demonstrates that blog changes follow specific patterns. The results also illustrate the substantial role of standardized templates in blog pages. These results extend our earlier models that assess the significance of Web page change from a human perspective. These improved models will enable software systems to assist human collection managers in identifying unexpected changes and aberrant events. | |||
| SearchGen: a synthetic workload generator for scientific literature digital libraries and search engines | | BIBA | Full-Text | 137-146 | |
| Huajing Li; Wang-Chien Lee; Anand Sivasubramaniam; Lee Giles | |||
| Due to the popularity of web applications and their heavy usage, it is important to obtain a good understanding of their workloads in order to improve performance of search services. Existing works have typically focused on generic web workloads without putting emphasis on specific domains. In this paper, we analyze the usage logs of CiteSeer, a scientific literature digital library and search engine, to characterize workloads for both robots and users. Essential ingredients that contribute to workloads are proposed. Among them, we find that the access intervals show high variance, and thus cannot be predicted well with time-series models. On the other hand, client visiting path and semantics can be well captured with probabilistic models and Zipf's law. Based on the findings, we propose SearchGen, a synthetic workload generator to output traces for scientific literature digital libraries and search engines. A comparison between synthetic workloads and actual logged traces suggests that the synthetic workload fits well. | |||
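The abstract reports that access semantics are well captured by Zipf's law; the sketch below draws a synthetic request trace from a Zipf distribution as an illustration, with parameters that are placeholders rather than values fitted by SearchGen.

```python
import numpy as np

def synthetic_requests(n_docs=10000, n_requests=100, a=1.5, seed=42):
    """Draw document ids for a synthetic request trace from a Zipf
    distribution, reflecting the paper's finding that access popularity is
    Zipf-like.  Parameters are illustrative, not those fitted by SearchGen."""
    rng = np.random.default_rng(seed)
    ranks = rng.zipf(a, size=n_requests)
    # fold anything beyond the catalogue back onto the last document id
    return np.minimum(ranks, n_docs) - 1

print(synthetic_requests(n_requests=10))
```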
| A retrospective look at Greenstone: lessons from the first decade | | BIBA | Full-Text | 147-156 | |
| Ian H. Witten; David Bainbridge | |||
| The Greenstone Digital Library Software has helped spread the practical impact of digital library technology throughout the world, with particular emphasis on developing countries. As Greenstone enters its second decade, this article takes a retrospective look at its development, the challenges that have been faced, and the lessons that have been learned in developing and deploying a comprehensive open-source system for the construction of digital libraries internationally. Not surprisingly, the most difficult challenges have been political, educational, and sociological, echoing that old programmers' blessing "may all your problems be technical ones." | |||
| A unified platform for archival description and access | | BIBA | Full-Text | 157-166 | |
| Christopher J. Prom; Christopher A. Rishel; Scott W. Schwartz; Kyle J. Fox | |||
| The archival community has developed content and data structure standards to facilitate access to the diverse and unique sets of archival records, personal papers, and manuscript collections that are held by archival repositories and special collections libraries. However, these standards are difficult for archivists to use and are often implemented in ways that negatively affect materials-handling workflows, depriving archival users of the best possible access to the totality of materials available within an individual repository. The authors propose that archival descriptive problems can be addressed by implementing a web/database application that is tailored specifically to archival needs and can be implemented with little technical knowledge. This paper describes the system architecture of one such tool, the Archon software package, which was developed at the University of Illinois at Urbana-Champaign. Archon automates many technical tasks, such as producing a searchable website, an EAD instance or a MARC record. Although the system utilizes sophisticated algorithms and optimizations, it is easily extensible because most development takes place in an easy-to-use, object-oriented environment. | |||
| Children's interests and concerns when using the international children's digital library: a four-country case study | | BIBA | Full-Text | 167-176 | |
| Allison Druin; Ann Weeks; Sheri Massey; Benjamin B. Bederson | |||
| This paper presents a case study of 12 children who used the International Children's Digital Library (ICDL) over four years and live in one of four countries: Germany, Honduras, New Zealand, and the United States. By conducting interviews, along with collecting drawings and book reviews, this study describes these children's interests in books, libraries, technology and the world around them. Findings from this study include: these young people increased the variety of books they read online; still valued their physical libraries as spaces for social interaction and reading; showed increased reading motivation; and showed interest in exploring different cultures. | |||
| Digital library education in computer science programs | | BIBA | Full-Text | 177-178 | |
| Jeffrey Pomerantz; Sanghee Oh; Barbara M. Wildemuth; Seungwon Yang; Edward A. Fox | |||
| In an effort to identify the "state of the art" in digital library education in computer science (CS) programs, we analyzed CS courses on digital libraries and digital library-related topics. Fifteen courses that mention digital libraries in the title or short description were identified; of these, five are concerned with digital libraries as the primary topic of the course. The readings from these five courses were analyzed further, in terms of their authors and the journals in which they were published. | |||
| A study of how online learning resources are used | | BIBA | Full-Text | 179-180 | |
| Mimi Recker; Sarah Giersch; Andrew Walker; Sam Halioris; Xin Mao; Bart Palmer | |||
| This paper defines a model of teacher practice ("teaching as design"), and describes a professional development curriculum in which K-12 teachers design learning activities using resources and tools from education digital libraries. It then presents preliminary findings from an application of this model in which teachers' artifacts are analyzed to learn how online learning resources are used in situ. Initial results suggest that learning resources of a smaller granularity are more likely to be adapted or improvised upon in teacher-designed learning activities, which further supports teachers' becoming contributors of online resources and active participants in an education cyberinfrastructure. | |||
| Standards or semantics for curriculum search? | | BIBA | Full-Text | 181-182 | |
| Byron B. Marshall; René F. Reitsma; Martha N. Cyr | |||
| Aligning digital library resources with national and state educational standards to help K-12 teachers search for relevant curriculum is an important issue in the digital library community. Aligning standards from different states promises to help teachers in one state find appropriate materials created and cataloged elsewhere. Although such alignments provide a powerful means for crosswalking standards and curriculum across states, alignment matrices are intrinsically sparse. Hence, we hypothesize that such sparseness may cause significant numbers of false negatives when used for searching curriculum. Our preliminary results confirm the false negative hypothesis, demonstrate the usefulness of term-based techniques in addressing the false negative problem, and explore ways to combine term occurrence data with standards correlations. | |||
| Information behavior of small groups: implications for design of digital libraries | | BIBA | Full-Text | 183-184 | |
| Nan Zhou; Gerry Stahl | |||
| We report findings of a study that investigates the information behavior of online small groups engaged in math problem solving and discuss the implications for designing digital libraries that can support learning of younger students and their broader information practices. | |||
| Adaptive sorted neighborhood methods for efficient record linkage | | BIBA | Full-Text | 185-194 | |
| Su Yan; Dongwon Lee; Min-Yen Kan; C. Lee Giles | |||
| Traditionally, record linkage algorithms have played an important role in maintaining digital libraries -- i.e., identifying matching citations or authors for consolidation in updating or integrating digital libraries. As such, a variety of record linkage algorithms have been developed and deployed successfully. Often, however, existing solutions have a set of parameters whose values are set by human experts off-line and are fixed during the execution. Since finding the ideal values of such parameters is not straightforward, or no such single ideal value even exists, the applicability of existing solutions to new scenarios or domains is greatly hampered. To remedy this problem, we argue that one can achieve significant improvement by adaptively and dynamically changing such parameters of record linkage algorithms. To validate our hypothesis, we take a classical record linkage algorithm, the sorted neighborhood method (SNM), and demonstrate how we can achieve improved accuracy and performance by adaptively changing its fixed sliding window size. Our claim is analytically and empirically validated using both real and synthetic data sets of digital libraries and other domains. | |||
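A minimal sketch of the sorted neighborhood method with a crudely adaptive window is shown below; the expansion heuristic is an assumption for illustration and differs from the adaptation strategy evaluated in the paper.

```python
def adaptive_snm(records, key, match, w_min=2, w_max=10):
    """Sorted neighborhood with an adaptive window: sort by a blocking key,
    then compare each record with a window of successors that expands while
    matches keep being found.  A sketch of the idea only; the paper's
    adaptation heuristics differ."""
    recs = sorted(records, key=key)
    pairs = []
    for i, r in enumerate(recs):
        w = w_min
        j = i + 1
        while j < len(recs) and j - i <= w:
            if match(r, recs[j]):
                pairs.append((r, recs[j]))
                w = min(w + 1, w_max)   # found a match: look a bit further
            j += 1
    return pairs

names = ["smith j", "smith jo", "smyth j", "zhang w"]
print(adaptive_snm(names, key=str, match=lambda a, b: a[:4] == b[:4]))
```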
| Distributed web search efficiency by truncating results | | BIBA | Full-Text | 195-203 | |
| Christopher T. Fallen; Gregory B. Newby | |||
| A large set of Web documents (the TREC GOV2 collection) comes from many separate Internet hosts, such as www.nih.gov and travel.state.gov. There is considerable variability in the number of Web pages (i.e., documents) from each host. In this paper, we present and evaluate a method for setting a maximum number of "hits" that may be presented for each web host. Federated search environments are increasingly common components of digital libraries and in these environments, the benefit of such a maximum is that it can reduce the number of possibly relevant documents presented by each subcollection, without hurting early precision measures such as P@20. Derivation of a maximum number, which is proportional to the subcollection size but not sensitive to different search topics, is made possible by an analysis of patterns of relevance judgment across approximately 17,000 web hosts in GOV2. | |||
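A per-host cap proportional to subcollection size, as described above, can be expressed in a few lines; the proportionality constant below is a placeholder, whereas the paper derives its maximum from relevance-judgment patterns across GOV2 hosts.

```python
def host_caps(host_sizes, k=0.001, floor=1):
    """Assign each web host a maximum number of hits proportional to the
    number of documents it contributes.  The constant k is an arbitrary
    placeholder, not the value derived in the paper."""
    return {h: max(floor, int(k * n)) for h, n in host_sizes.items()}

print(host_caps({"www.nih.gov": 250000, "travel.state.gov": 3000}))
# truncate each host's result list to its cap before merging ranked lists
```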
| Adaptive graphical approach to entity resolution | | BIBA | Full-Text | 204-213 | |
| Zhaoqi Chen; Dmitri V. Kalashnikov; Sharad Mehrotra | |||
| Entity resolution is a very common Information Quality (IQ) problem with many different applications. In digital libraries, it is related to problems of citation matching and author name disambiguation; in Natural Language Processing, it is related to coreference matching and object identity; in Web applications, it is related to Web page disambiguation. The problem of Entity Resolution arises because objects/entities in real world datasets are often referred to by descriptions, which might not be unique identifiers of these entities, leading to ambiguity. The goal is to group all the entity descriptions that refer to the same real world entities. In this paper we present a graphical approach for entity resolution. It complements the traditional methodology with the analysis of the entity-relationship graph constructed for the dataset being analyzed. The paper demonstrates that a technique that measures the degree of interconnectedness between various pairs of nodes in the graph can significantly improve the quality of entity resolution. Furthermore, the paper presents an algorithm for making that technique self-adaptive to the underlying data, thus minimizing the required participation from the domain-analyst and potentially further improving the disambiguation quality. | |||
| Cyberinfrastructure for the humanities and social sciences: advancing the humanities research agenda | | BIBA | Full-Text | 214 | |
| Joyce Ray; Clifford Lynch; Brett Bobley; Gregory Crane; Steven Wheatley | |||
| In 2006 the American Council of Learned Societies (ACLS) released Our Cultural Commonwealth, the final report of the Commission on Cyberinfrastructure for the Humanities and Social Sciences. The report, based on a study funded by the Mellon Foundation, explored how research environments might be created for the humanities and social sciences to complement those being developed to support scientific research. The report includes key recommendations addressed to universities, funding agencies, scholarly societies, academic libraries, publishers, Congress, state legislatures, and others. Implementation of the recommendations could potentially transform scholarship and exponentially increase access to resources and new scholarship in the humanities and social sciences. But the report has not been universally embraced. How will humanities scholarship be advanced by new technologies and research practices, and how will the academic community recognize new forms of scholarship? How will funding agencies respond to the challenges and issues raised? What does cyberinfrastructure mean for different domains within the humanities? These questions will be addressed by panelists and discussed by participants. | |||
| FLUX-CIM: flexible unsupervised extraction of citation metadata | | BIBA | Full-Text | 215-224 | |
| Eli Cortez; Altigran S. da Silva; Marcos André Gonçalves; Filipe Mesquita; Edleno S. de Moura | |||
| In this paper we propose a knowledge-based approach to help extract the correct components of citations in any given format. Unlike related approaches that rely on manually built knowledge bases (KBs) for recognizing the components of a citation, in our case, such a KB is automatically constructed from an existing set of sample metadata records from a given area (e.g., computer science or health sciences). Our approach does not rely on patterns encoding specific delimiters of a particular citation style. It is also unsupervised, in the sense that it does not rely on a learning method that requires a training phase. These features give our technique a high degree of automation and flexibility. To demonstrate the effectiveness and applicability of our proposed approach we have run experiments in which we applied it to extract information from citations in papers of two different domains. Results of these experiments indicate precision and recall levels above 94% and perfect extraction for the large majority of citations tested. | |||
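The core knowledge-base idea, recognizing citation components by look-up against values harvested from existing metadata records, can be sketched as below; FLUX-CIM adds further matching and correction steps beyond this bare outline.

```python
def label_segments(segments, kb):
    """Label citation segments by look-up in a knowledge base built from
    existing metadata records (field name -> set of known values).  A bare
    outline of the KB idea only, not the full FLUX-CIM pipeline."""
    labels = []
    for seg in segments:
        hits = [field for field, values in kb.items() if seg.lower() in values]
        labels.append((seg, hits[0] if hits else "unknown"))
    return labels

kb = {"author": {"j. smith", "a. lee"},
      "venue": {"jcdl", "sigir"},
      "year": {str(y) for y in range(1990, 2008)}}
print(label_segments(["A. Lee", "Some Paper Title", "JCDL", "2007"], kb))
```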
| Measuring conference quality by mining program committee characteristics | | BIBA | Full-Text | 225-234 | |
| Ziming Zhuang; Ergin Elmacioglu; Dongwon Lee; C. Lee Giles | |||
| Bibliometrics are important measures for venue quality in digital libraries. Impacts of venues are usually the major consideration for subscription decision-making, and for ranking and recommending high-quality venues and documents. For digital libraries in the Computer Science literature domain, conferences play a major role as an important publication and dissemination outlet. However, with a recent profusion of conferences and rapidly expanding fields, it is increasingly challenging for researchers and librarians to assess the quality of conferences. We propose a set of novel heuristics to automatically discover prestigious (and low-quality) conferences by mining the characteristics of Program Committee members. We examine the proposed cues both in isolation and combination under a classification scheme. Evaluation on a collection of 2,979 conferences and 16,147 PC members shows that our heuristics, when combined, correctly classify about 92% of the conferences, with a low false positive rate of 0.035 and a recall of more than 73% for identifying reputable conferences. Furthermore, we demonstrate empirically that our heuristics can also effectively detect a set of low-quality conferences, with a false positive rate of merely 0.002. We also report our experience of detecting two previously unknown low-quality conferences. Finally, we apply the proposed techniques to the entire quality spectrum by ranking conferences in the collection. | |||
| Toward alternative measures for ranking venues: a case of database research community | | BIBA | Full-Text | 235-244 | |
| Su Yan; Dongwon Lee | |||
| Ranking of publication venues is often closely related to important issues such as evaluating the contributions of individual scholars/research groups, or subscription decision making. The development of large-scale digital libraries and the availability of various metadata provide the possibility of building new measures more efficiently and accurately. In this work, we propose two novel measures for ranking the impacts of academic venues: an easy-to-implement seed-based measure that does not use citation analysis, and a realistic browsing-based measure that takes an article reader's behavior into account. Both measures are computationally efficient yet mimic the results of the widely accepted Impact Factor. In particular, our proposal exploits the fact that: (1) in most disciplines, there are "top" venues that most people agree on; and (2) articles that appeared in good venues are more likely to be viewed by readers. Our proposed measures are extensively evaluated on a test case of the Database research community using two real bibliography data sets -- ACM and DBLP. Finally, ranks of venues by our proposed measures are compared against the Impact Factor using the Spearman's rank correlation coefficient, and their positive rank order relationship is confirmed with a statistical significance test. | |||
| A model for inclusive design of digital libraries | | BIBA | Full-Text | 245-246 | |
| Sambhavi Chandrashekar; Nadia Caidi | |||
| Digital libraries (DLs) must cater not only to the varied needs of their target users but also to their differing abilities, and to the adaptive technologies used by persons whose computing capabilities are restricted due to disabilities. This paper proposes a model for DL design that includes optimization of the usability of the search process and ensures accessibility of the content for DL users with disabilities. | |||
| Representing aggregate works in the digital library | | BIBA | Full-Text | 247-256 | |
| George Buchanan; Jeremy Gow; Ann Blandford; Jon Rimmer; Claire Warwick | |||
| This paper studies the challenge of representing aggregate works such as encyclopedias, collected poems and journals in heterogeneous digital library collections. Reflecting on the materials used by humanities academics, we demonstrate the varied range of aggregate types and the problems of faithfully representing this in the DL interface. Aggregates are complex and pervasive, challenge common assumptions and confuse boundaries within organisational structures. Existing DL systems can only provide imperfect representation of aggregates, and alterations to document encoding are insufficient to create a faithful reproduction of the physical library. The challenge is amplified through concrete examples, and solutions are demonstrated in a well-known DL system and related to standard DL architecture. | |||
| StoryBank: an Indian village community digital library | | BIBA | Full-Text | 257-258 | |
| Matt Jones; Will Harwood; George Buchanan; Mounia Lalmas | |||
| This paper considers information access styles for a community digital library in an Indian village. We present our impressions of the community gathered during a field-study and show how these have influenced the interaction design. The prototype aims to overcome low-textual literacy and lack of computing experience by combining touch-based interaction, engaging visual presentations and drawing on villagers' familiarity with radio listening. | |||
| The gray lady gets a new dress: a field study of the Times News Reader | | BIBA | Full-Text | 259-268 | |
| Catherine C. Marshall | |||
| Increasingly individuals are turning to online sources for their daily news. Traditional newspapers have developed significant web presences to compete with newer services such as news aggregators and emerging genres such as blogs and other forms of citizen journalism. This paper reports the results of a field study to investigate the use of a new RSS-driven, template-based presentation mechanism that delivers a daily newspaper to subscribers' laptops and desktops; the Times News Reader hybridizes elements of print newspapers with aspects of online news. We explore how this application compares with print and web-based news reading and evaluate functionality developed to draw in readers from both audiences. Finally we examine three general technological implications drawn from current use: how the news reader may adapt to different styles of reading; how the news reader's functionality may be extended to highlight the timeliness of the content and to personalize the application; and how long-term use of the news reader can result in a personal news archive. | |||
| Drowning in data: digital library architecture to support scientific use of embedded sensor networks | | BIBA | Full-Text | 269-277 | |
| Christine L. Borgman; Jillian C. Wallis; Matthew S. Mayernik; Alberto Pepe | |||
| New technologies for scientific research are producing a deluge of data that is overwhelming traditional tools for data capture, analysis, storage, and access. We report on a study of scientific practices associated with dynamic deployments of embedded sensor networks to identify requirements for data digital libraries. As part of continuing research on scientific data management, we interviewed 22 participants in 5 environmental science projects to identify data types and uses, stages in their data life cycle, and requirements for digital library architecture. We found that scientists need continuous access to their data from the time that field experiments are designed through final analysis and publication, thus reflecting a broader notion of "digital library." Six categories of requirements are discussed: the ability to obtain and maintain data in the field, verify data in the field, document data context for subsequent interpretation, integrate data from multiple sources, analyze data, and preserve data. Three digital library efforts currently underway within the Center for Embedded Networked Sensing are addressing these requirements, with the goal of a tightly coupled interoperable framework that, in turn, will be a component of cyberinfrastructure for science. | |||
| A practical ontology for the large-scale modeling of scholarly artifacts and their usage | | BIBA | Full-Text | 278-287 | |
| Marko A. Rodriguez; Johan Bollen; Herbert Van de Sompel | |||
| The large-scale analysis of scholarly artifact usage is constrained primarily by current practices in usage data archiving, privacy issues concerned with the dissemination of usage data, and the lack of a practical ontology for modeling the usage domain. As a remedy to the third constraint, this article presents a scholarly ontology that was engineered to represent those classes for which large-scale bibliographic and usage data exists, supports usage research, and whose instantiation is scalable to the order of 50 million articles along with their associated artifacts (e.g. authors and journals) and an accompanying 1 billion usage events. The real world instantiation of the presented abstract ontology is a semantic network model of the scholarly community which lends the scholarly process to statistical analysis and computational support. We present the ontology, discuss its instantiation, and provide some example inference rules for calculating various scholarly artifact metrics. | |||
| A dynamic ontology for a dynamic reference work | | BIBA | Full-Text | 288-297 | |
| Mathias Niepert; Cameron Buckner; Colin Allen | |||
| The successful deployment of digital technologies by humanities scholars presents computer scientists with a number of unique scientific and technological challenges. The task seems particularly daunting because issues in the humanities are presented in abstract language demanding the kind of subtle interpretation often thought to be beyond the scope of artificial intelligence, and humanities scholars themselves often disagree about the structure of their disciplines. The future of humanities computing depends on having tools for automatically discovering complex semantic relationships among different parts of a corpus. Digital library tools for the humanities will need to be capable of dynamically tracking the introduction of new ideas and interpretations and applying them to older texts in ways that support the needs of scholars and students. This paper describes the design of new algorithms and the adjustment of existing algorithms to support the automated and semi-automated management of domain-rich metadata for an established digital humanities project, the Stanford Encyclopedia of Philosophy. Our approach starts with a "hand-built" formal ontology that is modified and extended by a combination of automated and semi-automated methods, thus becoming a "dynamic ontology". We assess the suitability of current information retrieval and information extraction methods for the task of automatically maintaining the ontology. We describe a novel measure of term-relatedness that appears to be particularly helpful for predicting hierarchical relationships in the ontology. We believe that our project makes a further contribution to information science by being the first to harness the collaboration inherent in an expert-maintained dynamic reference work to the task of maintaining and verifying a formal ontology. We place special emphasis on the task of bringing domain expertise to bear on all phases of the development and deployment of the system, from the initial design of the software and ontology to its dynamic use in a fully operational digital reference work. | |||
| Preparing resource discovery for digitized music: an analysis of an Australian application | | BIBA | Full-Text | 298-302 | |
| Jennifer A. Thomas; Michael R. Middleton; Margaret Warren | |||
| This paper examines procedures for the creation and delivery of digital music that are being undertaken by contributors to the National Library of Australia's federated music gateway MusicAustralia. The case study discusses access to and preservation of digital material as key drivers of the digitization movement, and compares projects being undertaken worldwide. Also analyzed are the underlying digitization principles and standards, and metadata schemas for the description and exchange of digital objects which facilitate record exchange and improve audience reach. The paper provides an overview of some individual contributing institutions; however, particular focus is placed on the State Library of Queensland's (SLQ) approach to preparing its unique Queensland music collection for digital resource discovery in MusicAustralia. A detailed analysis of SLQ's strategy is presented, including its risk management approach to copyright implications, and consideration of infrastructure issues affecting the creation, preservation and online delivery of its digital music objects. Whilst SLQ's current digital music collection is relatively small, it has become core business of SLQ's Arts and Humanities branch, and the collection will expand with the continued incorporation of music material unique to Queensland into the collection. SLQ has developed a sound foundation for digitization based on widely endorsed principles and standards which should allow this to effectively occur. | |||
| Goal-directed evaluation for the improvement of optical music recognition on early music prints | | BIBA | Full-Text | 303-304 | |
| Laurent Pugin; John Ashley Burgoyne; Ichiro Fujinaga | |||
| Optical music recognition (OMR) systems are promising tools for the creation of searchable digital music libraries. Using an adaptive OMR system for early music prints based on hidden Markov models, we leverage an edit distance evaluation metric to improve recognition accuracy. Baseline results are computed with new labeled training and test sets drawn from a diverse group of prints. We present two experiments based on this evaluation technique. The first resulted in a significant improvement to the feature extraction function for these images. The second is a goal-directed comparison of several popular adaptive binarization algorithms, which are often evaluated only subjectively. Accuracy increased by as much as 55% for some pages, and the experiments suggest several avenues for further research. | |||
| Annotation functionality for digital libraries supporting collaborative performance: an example of musical scores | | BIBA | Full-Text | 305-306 | |
| Megan Winget | |||
| This paper describes the findings of an ethnographic study that examined the annotation behaviors of musicians working with musical scores for the purpose of performance. Annotation was found to be an important part of the rehearsal process, and specific annotation functionalities are recommended for future digital library development. | |||
| Toward an understanding of similarity judgments for music digital library evaluation | | BIBA | Full-Text | 307-308 | |
| J. Stephen Downie; Jin Ha Lee; Anatoliy A. Gruzd; M. Cameron Jones | |||
| This paper presents an analysis of 7,602 similarity judgments collected for the Symbolic Melodic Similarity (SMS) and Audio Music Similarity and Retrieval (AMS) evaluation tasks in the 2006 Music Information Retrieval Evaluation eXchange (MIREX). We discuss the influence of task definitions, as well as evaluation metrics on user perceptions of music similarity, and provide recommendations for future Music Digital Library/Music Information Retrieval research pertaining to music similarity. | |||
| Agreeing to disagree: search engines and their public interfaces | | BIBA | Full-Text | 309-318 | |
| Frank McCown; Michael L. Nelson | |||
| Google, Yahoo and MSN all provide both web user interfaces (WUIs) and application programming interfaces (APIs) to their collections. Whether building collections of resources or studying the search engines themselves, the search engines request that researchers use their APIs and not "scrape" the WUIs. However, anecdotal evidence suggests the interfaces produce different results. We provide the first in-depth quantitative analysis of the results produced by the Google, MSN and Yahoo API and WUI interfaces. We have queried both interfaces for five months and found significant discrepancies between the interfaces in several categories. In general, we found MSN to produce the most consistent results between their two interfaces. Our findings suggest that the API indexes are not older, but they are probably smaller for Google and Yahoo. We also examined how search results decay over time and built predictive models based on the observed decay rates. Based on our findings, it can take over a year for half of the top 10 results to a popular query to be replaced in Google and Yahoo; for MSN it may take only 2-3 months. | |||
| Static reformulation: a user study of static hypertext for query-based reformulation | | BIBA | Full-Text | 319-328 | |
| Michael Huggett; Joel Lanir | |||
| Hypertext allows users to navigate between related materials in digital libraries. The most fundamental automated hypertexts are those constructed on the basis of semantic similarity. Such hypertexts have been evaluated by a variety of means, but seldom by real users given simulated real-world tasks. We claim that while other methods exist, one of the best ways to prove the usefulness of hypertext is to show the benefits for users performing realistic tasks. We compare the reformulation of queries that users perform in keyword searching, to the query reformulation implicit in browsing between documents linked by similarity of content. We find that a static automatically-constructed similarity hypertext provides useful linking between related items, improving the retrieval of targets when used to augment standard keyword search. | |||
| A rich OPAC user interface with AJAX | | BIBA | Full-Text | 329-330 | |
| Jesse Prabawa Gozali; Min-Yen Kan | |||
| Online Public Access Catalogs (OPACs) provide patrons with a user interface (UI) to help their information seeking tasks. Even though many OPAC UIs are now web-based, their architectures are often static, which does not allow them to integrate user assistance modules dynamically. We report on a UI that supports integration of such modules, while providing a usable and rich environment. We explore how Asynchronous JavaScript + XML (AJAX) can be employed to create an OPAC UI that offers a better user experience and task support. Our developed UI features a modular architecture that combines several Natural Language Processing (NLP) modules employed to enhance information seeking. Our UI manages queries in a novel way with a tabbed interface featuring an overview/details presentation model, and an AJAX query results data grid. Preliminary user testing results are also presented. | |||
| Constructing digital library interfaces | | BIBA | Full-Text | 331-332 | |
| David M. Nichols; David Bainbridge; Michael B. Twidale | |||
| The software technologies used to create web interfaces for digital libraries are discussed using examples from Greenstone 3. | |||
| Retrieval in text collections with historic spelling using linguistic and spelling variants | | BIBA | Full-Text | 333-341 | |
| Andrea Ernst-Gerlach; Norbert Fuhr | |||
| We present a new approach for the retrieval of texts with non-standard spelling, which is important for historic texts, e.g., in English or German. In this paper, we describe the overall architecture of our system, followed by its evaluation. Given a search term as lemma, we use a dictionary of contemporary German for finding all inflected and derived forms of the lemma. Then we apply transformation rules (derived from training data) for generating historic spelling variants. For the evaluation, we consider the resulting retrieval quality. The experimental results show that we can improve the retrieval quality for historic collections substantially. | |||
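Applying learned character-level transformation rules to a contemporary word form to generate candidate historic spellings might look like the sketch below; the rules shown are hand-picked examples, whereas the paper derives its rules from training data.

```python
def spelling_variants(word, rules):
    """Generate candidate historic spellings by applying character-level
    transformation rules (modern -> historic) to a contemporary form.
    Rules here are hand-picked examples; the paper learns its rules from
    training data."""
    variants = {word}
    for modern, historic in rules:
        for v in list(variants):
            if modern in v:
                variants.add(v.replace(modern, historic))
    return variants

rules = [("t", "th"), ("ei", "ey")]   # e.g. "teil" -> "theil", "teyl", "theyl"
print(spelling_variants("teil", rules))
```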
| Efficient topic-based unsupervised name disambiguation | | BIBA | Full-Text | 342-351 | |
| Yang Song; Jian Huang; Isaac G. Councill; Jia Li; C. Lee Giles | |||
| Name ambiguity is a special case of identity uncertainty where one person can be referenced by multiple name variations in different situations or even share the same name with other people. In this paper, we focus on the problem of disambiguating person names within web pages and scientific documents. We present an efficient and effective two-stage approach to disambiguate names. In the first stage, two novel topic-based models are proposed by extending two hierarchical Bayesian text models, namely Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA). Our models explicitly introduce a new variable for persons and learn the distribution of topics with regard to persons and words. After learning an initial model, the topic distributions are treated as feature sets and names are disambiguated by leveraging a hierarchical agglomerative clustering method. Experiments on web data and scientific documents from CiteSeer indicate that our approach consistently outperforms other unsupervised learning methods such as spectral clustering and DBSCAN clustering and could be extended to other research fields. We empirically addressed the issue of scalability by disambiguating authors in over 750,000 papers from the entire CiteSeer dataset. | |||
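The second, clustering stage of such an approach can be illustrated with off-the-shelf components. The sketch below substitutes scikit-learn's plain LDA for the authors' extended person-aware topic models and uses invented toy documents; it only shows how topic mixtures can serve as features for hierarchical agglomerative clustering, not the paper's models or data.

```python
# Minimal sketch under simplifying assumptions: documents mentioning an
# ambiguous name are clustered by their topic mixtures.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import AgglomerativeClustering

docs = [
    "support vector machines for text classification",   # hypothetical data
    "kernel methods and machine learning theory",
    "galaxy surveys and stellar spectra",
    "redshift estimation from photometric data",
]

counts = CountVectorizer().fit_transform(docs)
theta = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)

# Each document's topic mixture becomes its feature vector; the resulting
# clusters are read as distinct persons behind the same name string.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(theta)
print(labels)
```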
| Using bilingual ETD collections to mine phrase translations | | BIBA | Full-Text | 352-353 | |
| Ryan Richardson; Edward A. Fox | |||
| Phrase translation lists can enhance cross-language information retrieval. However, finding translations for technical phrases is difficult. Bilingual dictionaries have limited coverage for specialized fields, and even more limited coverage of technical phrases. Since phrases can have very specific meanings in technical fields, this limits the quality of translations produced by generic machine translation systems. We hypothesize that digital libraries of electronic theses and dissertations (ETDs) are a good source of technical phrase translations. We have acquired a collection of 3,086 Spanish ETDs about computer science from Scirus, the Universidad Nacional Autonoma de Mexico (Mexico City), and Universidad de las Americas (Puebla). By using English ETDs from NDLTD, we have a comparable corpus of computing-related documents from which to mine phrase translations. We describe our method and its formative evaluation. | |||
| Evaluation of kernel-based link analysis measures on research paper recommendation | | BIBA | Full-Text | 354-355 | |
| Masashi Shimbo; Takahiko Ito; Yuji Matsumoto | |||
| We compare various kernel-based link analysis measures on graph nodes to evaluate their utility for a research paper recommendation system. The compared measures include Kandola et al.'s von Neumann kernel, its extension that takes communities into account, and Smola and Kondor's regularized Laplacian. Chebotarev and Shamis' matrix forest-based algorithm, Kleinberg's HITS authority ranking, and classic co-citation coupling are also evaluated. The experimental results show that kernel-based methods outperform HITS and co-citation coupling, with the community-based von Neumann kernel achieving the highest score. | |||
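For readers unfamiliar with these measures, the snippet below computes one of them, the regularized Laplacian kernel K = (I + beta*L)^{-1}, on an invented four-node graph; the adjacency matrix and the beta value are assumptions for illustration, not the paper's experimental setup.

```python
# Back-of-the-envelope sketch on a toy graph (assumed data).
import numpy as np

A = np.array([  # hypothetical undirected citation/co-citation adjacency matrix
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

L = np.diag(A.sum(axis=1)) - A          # combinatorial graph Laplacian
beta = 0.5                               # regularization strength (assumed)
K = np.linalg.inv(np.eye(len(A)) + beta * L)  # regularized Laplacian kernel

seed = 0                                 # recommend papers related to node 0
ranking = np.argsort(-K[seed])           # nodes ordered by decreasing kernel value
print(ranking)
```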
| A new generation of textual corpora: mining corpora from very large collections | | BIBA | Full-Text | 356-365 | |
| Gordon Stewart; Gregory Crane; Alison Babeu | |||
| While digital libraries based on page images and automatically generated text have made possible massive projects such as the Million Book Library, Open Content Alliance, Google, and others, humanists still depend upon textual corpora expensively produced with labor-intensive methods such as double-keyboarding and manual correction. This paper reports the results from an analysis of OCR-generated text for classical Greek source texts. Classicists have depended upon specialized manual keyboarding that costs two or more times as much as keyboarding of English both for accuracy and because classical Greek OCR produced no usable results. We found that we could produce texts by OCR that, in some cases, approached the 99.95% professional data entry accuracy rate. In most cases, OCR-generated text yielded results that, by including the variant readings that digital corpora traditionally have left out, provide better recall and, we argue, can better serve many scholarly needs than the expensive corpora upon which classicists have relied for a generation. As digital collections expand, we will be able to collate multiple editions against each other, identify quotations of primary sources, and provide a new generation of services. | |||
| Subject metadata enrichment using statistical topic models | | BIBA | Full-Text | 366-375 | |
| David Newman; Kat Hagedorn; Chaitanya Chemudugunta; Padhraic Smyth | |||
| Creating a collection of metadata records from disparate and diverse sources often results in uneven, unreliable and variable quality subject metadata. Having uniform, consistent and enriched subject metadata allows users to more easily discover material, browse the collection, and limit keyword search results by subject. We demonstrate how statistical topic models are useful for subject metadata enrichment. We describe some of the challenges of metadata enrichment on a huge scale (10 million metadata records from 700 repositories in the OAIster Digital Library) when the metadata is highly heterogeneous (metadata about images and text, and both cultural heritage material and scientific literature). We show how to improve the quality of the enriched metadata, using both manual and statistical modeling techniques. Finally, we discuss some of the challenges of the production environment, and demonstrate the value of the enriched metadata in a prototype portal. | |||
| Organizing the OCA: learning faceted subjects from a library of digital books | | BIBA | Full-Text | 376-385 | |
| David Mimno; Andrew McCallum | |||
| Large scale library digitization projects such as the Open Content Alliance are producing vast quantities of text, but little has been done to organize this data. Subject headings inherited from card catalogs are useful but limited, while full-text indexing is most appropriate for readers who already know exactly what they want. Statistical topic models provide a complementary function. These models can identify semantically coherent "topics" that are easily recognizable and meaningful to humans, but they have been too computationally intensive to run on library-scale corpora. This paper presents DCM-LDA, a topic model based on Dirichlet Compound Multinomial distributions. This model is simultaneously better able to represent observed properties of text and more scalable to extremely large text collections. We train individual topic models for each book based on the co-occurrence of words within pages. We then cluster topics across books. The resulting topical clusters can be interpreted as subject facets, allowing readers to browse the topics of a collection quickly, find relevant books using topically expanded keyword searches, and explore topical relationships between books. We demonstrate this method by finding topics on a corpus of 1.49 billion words from 42,000 books in less than 20 hours, and it could easily scale well beyond this. | |||
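The per-book-then-cluster workflow can be sketched with standard tools. The toy example below substitutes ordinary LDA for DCM-LDA and clusters the resulting topic-word vectors across two invented "books" with k-means; it illustrates the workflow only, not the paper's model, data, or scale.

```python
# Rough sketch under strong simplifications: per-book topic models, then
# cross-book clustering of topics into shared subject facets.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

books = {                      # hypothetical books, each a list of "pages"
    "book1": ["whales and ocean currents", "ships and ocean navigation"],
    "book2": ["planets stars and orbits", "telescopes and stars"],
}

vectorizer = CountVectorizer()
vectorizer.fit([page for pages in books.values() for page in pages])

topic_vectors = []
for pages in books.values():
    X = vectorizer.transform(pages)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
    # Each row of components_ is one topic's (unnormalized) word distribution.
    topic_vectors.append(lda.components_ / lda.components_.sum(axis=1, keepdims=True))

topics = np.vstack(topic_vectors)
facets = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(topics)
print(facets)                  # cluster ids act as cross-book subject facets
```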
| Trends in metadata practices: a longitudinal study of collection federation | | BIBA | Full-Text | 386-395 | |
| Carole L. Palmer; Oksana L. Zavalina; Megan Mustafoff | |||
| With the increasing focus on interoperability for distributed digital content, resource developers need to take into consideration how they will contribute to large federated collections, potentially at the national and international level. At the same time, their primary objectives are usually to meet the needs of their own institutions and user communities. This tension between local practices and needs and the more global potential of digital collections has been an object of study for the IMLS Digital Collections and Content (IMLS DCC) project. Our practical aim has been to provide integrated access to over 160 IMLS-funded digital collections through a centralized collection registry and metadata repository. During the course of development, the research team has investigated how collections and items can best be represented to meet the needs of local resource developers and aggregators of distributed content, as well as the diverse user communities they may serve. This paper presents results from a longitudinal analysis of IMLS DCC development trends between 2003 and 2006. Changes in metadata applications have not been pronounced. However, multi-scheme use has become less common, and use of Dublin Core remains high, even as recognition of its limitations grows. Locally developed schemes are used as much as MARC, and may be on the increase as new collections are incorporating less traditional library and museum materials, and more interactive and multimedia content. Based on our empirical understanding of metadata use in practice, patterns in new content development, and user community indicators, our research has turned toward identifying metadata relationships between items and collections to preserve context and enhance functionality and usefulness for scholarly user communities. | |||
| Induced tagging: promoting resource discovery and recommendation in digital libraries | | BIBA | Full-Text | 396-397 | |
| J. Alfredo Sánchez; Adriana Arzamendi-Pétriz; Omar Valdiviezo | |||
| We introduce the notion of "induced tagging" in the context of learning communities that are supported by digital libraries. We also describe an environment aimed at fostering discovery and recommendation of digital library resources based on induced tagging. | |||
| Standards alignment for metadata assignment | | BIBA | Full-Text | 398-399 | |
| Anne R. Diekema; Ozgur Yilmazel; Jennifer Bailey; Sarah C. Harwell; Elizabeth D. Liddy | |||
| This paper describes a machine learning technique called hierarchical text categorization, which is used to solve the problem of finding equivalents among different state and national education standards. The approach is based on a set of manually aligned standards and utilizes the hierarchical structure present in the standards to achieve a more accurate result. Details of this approach and its evaluation are presented. | |||
| Identifying personal photo digital library features | | BIBA | Full-Text | 400-401 | |
| Sally Jo Cunningham; Masood Masoodian | |||
| At present, little evidence is available about how people want to interact with their photos in a personal photo digital library. Analysis of a set of 22 user needs summaries and critiques of existing photo management systems provides insight into potentially useful features. | |||
| Locating thematic pinpoints in narrative texts with short phrases: a test study on Don Quixote | | BIBA | Full-Text | 402-410 | |
| Jie Deng; Richard Furuta; Eduardo Urbina | |||
| Traditional implementations provide only limited assistance for locating the information in narrative texts relevant to a certain point of interest. We are investigating providing a "reading wheel" for such purposes. The first step of the bigger picture, as inspired by the editorial compilation of a textbook's index, is an attempt to locate sentences thematically coherent with a given short phrase. In this paper, we propose a two-step methodology to increase the search performance and examine its effectiveness in a test study. We describe the experimental setup and report on the quantitative evaluation of the techniques involved. | |||
| Digital Donne: workflow, editing tools, and the reader's interface of a collection of 17th-century English poetry | | BIBA | Full-Text | 411-412 | |
| Carlos Monroy; Richard Furuta; Gary Stringer | |||
| We describe a multidisciplinary effort in the creation of an electronic repository of poems of John Donne -- the renowned 17th-century English poet. We discuss the workflow we have adopted and the Web-based tools we have developed for maintaining a collection of transcriptions and images, a concordance of poems, a list of press variants, and a browsing interface that enables readers to access these materials. A complement to the multi-volume Variorum Edition of the Poetry of John Donne, this endeavor shows how a traditional scholarly edition can be enhanced by resources made available by computers and the Internet. | |||
| A multilingual approach to technical manuscripts: 16th and 17th-century Portuguese shipbuilding treatises | | BIBA | Full-Text | 413-414 | |
| Carlos Monroy; Richard Furuta; Filipe Castro | |||
| Shipbuilding treatises are technical manuscripts written in a variety of languages and spanning several centuries that describe the construction of ships. Given their technical content, understanding terms, concepts, and construction sequences is a challenging task. In this paper we describe a scalable approach and a multilingual web-based interface for enabling a group of scholars to edit a glossary of nautical terms in multiple languages. | |||
| First class objects and indexes for chant manuscripts | | BIBA | Full-Text | 415-416 | |
| Louis W. G. Barton; Peter G. Jeavons; John A. Caldwell; Koon Shan Barry Ng | |||
| We discuss a crucial part of infrastructure for the Web-delivery of medieval chant resources. Although widely accepted by software professionals, the distributed-content model is sharply opposed by some chant scholars. We advocate for a paradigm of the Web as a massive database where each "first class object" acts like a record; metadata about, and links to such objects are compiled in virtual libraries. Scholarly-edited indexes determine which objects are in libraries, and unreliable content is excluded. Special metadata ontologies can be defined without modifying the primary content. | |||
| Recommending related papers based on digital library access records | | BIBA | Full-Text | 417-418 | |
| Stefan Pohl; Filip Radlinski; Thorsten Joachims | |||
| An important goal for digital libraries is to enable researchers to more easily explore related work. While citation data is often used as an indicator of relatedness, in this paper we demonstrate that digital access records (e.g. http-server logs) can be used as indicators as well. In particular, we show that measures based on co-access provide better coverage than co-citation, that they are available much sooner, and that they are more accurate for recent papers. | |||
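The co-access idea can be approximated very simply from server logs. The sketch below, using an invented session-annotated access log, counts papers downloaded in the same session and ranks related papers by that count; it is an illustration of the general idea, not the authors' exact measure or data.

```python
# Minimal sketch, assuming a toy access log of (session_id, paper_id) pairs.
from collections import defaultdict
from itertools import combinations

access_log = [                 # hypothetical log entries
    ("s1", "paperA"), ("s1", "paperB"), ("s1", "paperC"),
    ("s2", "paperA"), ("s2", "paperB"),
    ("s3", "paperB"), ("s3", "paperD"),
]

sessions = defaultdict(set)
for session, paper in access_log:
    sessions[session].add(paper)

co_access = defaultdict(int)
for papers in sessions.values():
    for a, b in combinations(sorted(papers), 2):
        co_access[(a, b)] += 1   # count each co-accessed pair once per session

def related(paper, k=3):
    """Rank other papers by how often they were accessed with `paper`."""
    scores = {(set(pair) - {paper}).pop(): n
              for pair, n in co_access.items() if paper in pair}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(related("paperA"))       # e.g. ['paperB', 'paperC']
```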
| Automatic patent classification using citation network information: an experimental study in nanotechnology | | BIBA | Full-Text | 419-427 | |
| Xin Li; Hsinchun Chen; Zhu Zhang; Jiexun Li | |||
| Classifying and organizing documents in repositories is an active research topic in digital library studies. Manually classifying the large volume of patents and patent applications managed by patent offices is a labor-intensive task. Many previous studies have employed patent contents for patent classification with the aim of automating this process. In this research we propose to use patent citation information, especially the citation network structure information, to address the patent classification problem. We adopt a kernel-based approach and design kernel functions to capture content information and various citation-related information in patents. These kernels' performances are evaluated on a testbed of patents related to nanotechnology. Evaluation results show that our proposed labeled citation graph kernel, which utilized citation network structures, significantly outperforms the kernels that use no citation information or only direct citation information. | |||
| Collaborative classifier agents: studying the impact of learning in distributed document classification | | BIBA | Full-Text | 428-437 | |
| Weimao Ke; Javed Mostafa; Yueyu Fu | |||
| We developed a multi-agent framework in which agents had limited, distributed knowledge for document classification and collaborated with each other to overcome this limitation. Each agent was equipped with a learning algorithm for predicting potential collaborators, or helping agents. We conducted experimental research on a standard news corpus to examine the impact of two learning algorithms: Pursuit Learning and Nearest Centroid Learning. For a fundamental retrieval operation, namely classification, both algorithms achieved competitive classification effectiveness and efficiency. Subsequently, the impact of the learning exploration rate and the maximum collaboration range on classification effectiveness and efficiency was examined. Close investigation of agent learning dynamics revealed increasing and stabilizing patterns that were enhanced by the learning algorithms. | |||
| UpdateNews: a news clustering and summarization system using efficient text processing | | BIBA | Full-Text | 438-439 | |
| Takaharu Takeda; Atsuhiro Takasu | |||
| This paper proposes a news article clustering and summarization system. It provides integrated access to news articles from various news sites. The system consists of a crawler, a topic detector, and a summarizer. This paper describes an efficient summarization technique for handling large amounts of crawled news articles. | |||
| Automatic syllabus classification | | BIBA | Full-Text | 440-441 | |
| Xiaoyan Yu; Manas Tungare; Weiguo Fan; Manuel Perez-Quinones; Edward A. Fox; William Cameron; GuoFang Teng; Lillian Cassel | |||
| Syllabi are important educational resources. However, searching for a syllabus on the Web using a generic search engine is an error-prone process and often yields too many non-relevant links. In this paper, we present a syllabus classifier to filter noise out from search results. We discuss various steps in the classification process, including class definition, training data preparation, feature selection, and classifier building using SVM and Naive Bayes. Empirical results indicate that the best version of our method achieves a high classification accuracy, i.e., an F value of 83% on average. | |||
| Effects of structure and interaction style on distinct search tasks | | BIBA | Full-Text | 442-451 | |
| Robert Capra; Gary Marchionini; Jung Sun Oh; Fred Stutzman; Yan Zhang | |||
| In this paper we present the results of a study that investigates the relationships between search tasks, information architecture, and interaction style. Three kinds of search tasks (simple lookup, complex lookup and exploratory) were performed using three different user interfaces (standard web site, hierarchical text-based faceted interface, and dynamic query faceted interface) for a large-scale public corpus containing semi-structured statistical data and reports. Twenty-eight people conducted the three kinds of searches in a between-subjects study and twelve others conducted the three kinds of searches on all three systems in a within-subjects study. Quantitative results demonstrate that the alternative general-purpose user interfaces that accept automated structuring of data offer effectiveness, efficiency, and aesthetics comparable to manually constructed architectures. Qualitative results demonstrate that the manually constructed architectures are favored. | |||
| Towards automatic conceptual personalization tools | | BIBA | Full-Text | 452-461 | |
| Faisal Ahmad; Sebastian de la Chica; Kirsten Butcher; Tamara Sumner; James H. Martin | |||
| This paper describes the results of a study designed to validate the use of domain competency models to diagnose student scientific misconceptions and to generate personalized instruction plans using digital libraries. Digital library resources provided the content base for human experts to construct a domain competency model for earthquakes and plate tectonics encoded as a knowledge map. The experts then assessed student essays using comparisons against the constructed domain competency model and prepared personalized instruction plans using the competency model and digital library resources. The results from this study indicate that domain competency models generated from select digital library resources may provide the desired degree of content coverage to support both automated diagnosis and personalized instruction in the context of nationally-recognized science learning goals. These findings serve to inform the design of personalized instruction tools for digital libraries. | |||
| Mobile G-Portal supporting collaborative sharing and learning in geography fieldwork: an empirical study | | BIBA | Full-Text | 462-471 | |
| Yin-Leng Theng; Kuah-Li Tan; Ee-Peng Lim; Jun Zhang; Dion Hoe-Lian Goh; Kalyani Chatterjea; Chew Hung Chang; Aixin Sun; Han Yu; Nam Hai Dang; Yuanyuan Li; Minh Chanh Vo | |||
| This paper describes the implementation of Mobile G-Portal, a group of mobile devices serving as learning assistant tools that support collaborative sharing and learning in geography fieldwork, integrated with G-Portal, a Web-based geospatial digital library of geography resources. Based on a modified Technology Acceptance Model and a Task-Technology Fit model, an initial study with Mobile G-Portal was conducted involving 39 students in a local secondary school. The findings suggested a positive indication of acceptance of Mobile G-Portal for geography fieldwork. The paper concludes with a discussion of technological challenges, recommendations for refinement of Mobile G-Portal, and design implications in general for digital libraries and personal digital assistants supporting mobile learning. | |||
| Highly structured scientific publications | | BIBA | Full-Text | 472 | |
| Robert B. Allen | |||
| Science is a complex, but highly structured, activity. We propose that reports about science would benefit by reflecting that structure. We provide an example based on the research paradigm and we explore more complex examples in which workflow models describe the conceptual model, the research procedure, the data analysis, and the conclusions. | |||
| Cooperative collection building in NSDL MatDL pathway through iVia data fountains | | BIBA | Full-Text | 473 | |
| Laura M. Bartolo; Cathy S. Lowe; Johannes Ruscheinski; Diane Bisom | |||
| This poster describes a collaboration involving two NSDL projects: the Materials Digital Library Pathway (MatDL) and the iVia Data Fountains Project. MatDL is testing and providing feedback for refinement of the iVia tools while streamlining its metadata assignment process. | |||
| MESUR: usage-based metrics of scholarly impact | | BIBK | Full-Text | 474 | |
| Johan Bollen; Marko A. Rodriguez; Herbert Van de Sompel | |||
Keywords: digital libraries, impact factor, scholarly evaluation, semantic networks, usage data | |||
| A publisher of last resort: enduring document access | | BIBA | Full-Text | 475 | |
| George Buchanan | |||
| Ensuring long-term access to valuable online content is complicated by legal constraints and practical difficulties. We introduce a new technique for ensuring the long-term availability of digital content on the internet. The technique combines legal and technical measures to guarantee that a document remains available when its original goes offline, either permanently or long-term: a "publisher of last resort". | |||
| Educational application integration with digital repository | | BIBA | Full-Text | 476 | |
| Robert Chavez; Anoop Kumar; Nikolai Schwertner | |||
| The value of a digital repository increases tremendously when applications use the content in innovative ways. Tufts University has developed its repository based on the Fedora framework using the principles of service-oriented architecture. The repository features innovative content models allowing the digital objects within the Tufts Digital Repository to be accessible through a variety of applications, including Perseus, Artifact, Tufts Digital Library (TDL) and Visual Understanding Environment (VUE). The poster will present the underlying architecture, including the latest services and their use in educational applications. | |||
| Blogger perceptions on digital preservation | | BIBA | Full-Text | 477 | |
| Carolyn Hank; Songphan Choemprayong; Laura Sheble | |||
| Blogs have emerged as valuable records of current social and political events. In response, calls in the literature have advocated that these new vehicles of communication and information dissemination are valuable additions to the human record worthy of stewardship [1,2,3]. The intent of this research is to study the requirements and feasibility of impacting stewardship of blogs at the level of creation. This will be accomplished by surveying blogger perceptions on digital preservation. Expected outcomes of this study include the development of a framework for constructing a digital preservation program for blogs. A survey will be administered to bloggers to assess perceptions of digital preservation issues as related to their own blogging activities and the blogosphere in general. The instrument is organized into five categories: demographics, awareness, appraisal, impact, and investment. Participants will be recruited through established contacts in the blogging community, with the intent of a resulting snowball effect for gathering additional participation. The demographics section collects basic characteristics of respondents, characteristics of their blogs (e.g., topic areas, platforms, linkages, content types, permissions for reuse), and their blogging practices (e.g., motivations, frequency of updates). The awareness section surveys current preservation-related activities performed by bloggers such as measures taken to ensure duplication of blog content; and whether, why, and how bloggers engage in practices that result in post-publication content changes. The appraisal section assesses perceptions of issues related to persistent storage and access. Respondents are asked to evaluate the importance of researcher-supplied blog characteristics that could be used to appraise blogs and their components. These characteristics include social and cultural factors such as perceived blog popularity, social linkages, and artifactual significance as well as structural components and content types. In addition to seeking clarification of the types and components of blogs that are perceived to be important with respect to preservation, the appraisal section addresses issues related to content ownership. The impact section focuses on the perceived importance of blogs to authors, preserving access to blogs, and blogs as a part of the human record. In the investment section, respondents are asked to quantify resources that they would be willing to expend to preserve their own blogs and their willingness to extend these expenditures to the blogs of others. Data collection will begin April 2007 and continue for one month. Following closure of the survey, data will be analyzed using descriptive statistics and qualitative evaluation methods. An initial assessment report will include a summary analysis of results and initial calls for recommendations. Future works include further development of these recommendations, development of benchmarks for planning ingest of blogs into a repository system, and the design and pilot testing of a user interface for deposit, storage, and access. This research is intended to promote digital preservation activities for continued access to blogs and to raise awareness of digital preservation issues among a population of users removed from the walls of academia and research. Bloggers constitute a significant producer type in that they have produced culturally and socially significant works, including those that contribute to wider public discourse. Furthermore, bloggers have the potential to become significant contributors to the dissemination of preservation awareness because they are vital actors in networks of communities that often span the borders of institutional, commercial, grassroots and personal communications. | |||
| Evolution of a data archive | | BIBK | Full-Text | 478 | |
| Jonathan D. Crabtree; David Sheaves | |||
Keywords: alliances, digital archives, federation, social science data | |||
| Examining perception of digital information space | | BIBA | Full-Text | 479 | |
| John A. D'Ignazio; Joseph D. Ryan; Sarah C. Harwell; Anne R. Diekema; Elizabeth D. Liddy | |||
| A study using a modified think-aloud protocol examined University of Rochester undergraduate students' interactions with a general humanities scholarly database, helping a research team gain insight into their information-seeking behavior and thus into the impact of the digital library. | |||
| Tagging video: conventions and strategies of the YouTube community | | BIBA | Full-Text | 480 | |
| Gary Geisler; Sam Burns | |||
| This poster summarizes the results from a quantitative analysis of the tags and associated metadata used to describe more than one million videos by 537,246 contributors at the YouTube video sharing site. Results from this work suggest methodological and design considerations that could enhance the effectiveness of sharing within communities devoted to online video. | |||
| DRIADE: a data repository for evolutionary biology | | BIBA | Full-Text | 481 | |
| Jed Dube; Sarah Carrier; Jane Greenberg | |||
| NESCent (The National Evolutionary Synthesis Center) is developing DRIADE (Digital Repository of Information and Data for Evolution) to address synthetic research challenges fundamental to advancing the field of evolutionary biology. This poster highlights results from a survey of selected repositories' functionalities, DRIADE's functional requirements, and DRIADE's functional model. We also summarize ongoing research activities, studying evolutionary biologists' data preservation practices and use requirements. | |||
| AlouetteCanada metadata toolkit | | BIBAK | Full-Text | 482 | |
| Mark Jordan | |||
| This poster provides an overview of the AlouetteCanada Metadata Toolkit. Keywords: application profiles, digitization, metadata, standards, tools | |||
| Building a digital library of traditional Mongolian historical documents | | BIBA | Full-Text | 483 | |
| Garmaabazar Khaltarkhuu; Akira Maeda | |||
| This paper describes a technique for converting modern Mongolian text input to traditional Mongolian script and integrating the result into the Greenstone Digital Library (GSDL). This work is part of ongoing research to create a digital library of traditional Mongolian historical documents. | |||
| Evaluating digital libraries with webmetrics | | BIBA | Full-Text | 484 | |
| Michael Khoo; Robert A. Donahue | |||
| We report preliminary lessons from a year of webmetrics research with two digital libraries. Despite the apparent 'plug-and-play-and-report' nature of webmetrics tools, much work was required to extract useful data from the tools used. | |||
| Tagging for health information organisation and retrieval | | BIBK | Full-Text | 485 | |
| Margaret E. I. Kipp | |||
Keywords: health information, social bookmarking, tagging | |||
| Augmenting OAI-PMH repository holdings using search engine APIs | | BIBA | Full-Text | 486 | |
| Martin Klein; Michael L. Nelson; Juliet Z. Pao | |||
| In this poster, we give the preliminary results of our project to acquire Atmospheric Science Data Center (ASDC) project-related web resources, not with focused crawling, but by using the search engine (SE) APIs directly. We aggregate the results and create archive-ready complex objects. | |||
| The cyberinfrastructure for scholars project: componentized architecture for sustainable scholarly portals | | BIBK | Full-Text | 487 | |
| Aaron Krowne; Stacey Martin; Urvashi Gadi; Micah Wedemeyer; Martin Halbert | |||
Keywords: OAI, OXF, componentized architecture, web services | |||
| Social bookmarking in digital library systems: framework and case study | | BIB | Full-Text | 488 | |
| Fiftarina Puspitasari; Ee-Peng Lim; Dion Hoe-Lian Goh; Chew-Hung Chang; Jun Zhang; Aixin Sun; Yin-Leng Theng; Kalyani Chatterjea; Yuanyuan Li | |||
| Automated collection strength analysis | | BIBA | Full-Text | 489 | |
| Clare Llewellyn; Robert Sanderson; Brian Rea | |||
| The strengths within six library collections were automatically determined through automated enrichment and analysis of bibliographic-level metadata records, with a view towards efficient resource sharing and collaborative collection management. This involved very large-scale deduplication, enrichment and automatic reclassification of records using machine learning processes. | |||
| Digital library education: some international course structure comparisons | | BIBA | Full-Text | 490 | |
| Yongqing Ma; Ann O'Brien; Warwick Clegg | |||
| Following our recent review of progress in Digital Library (DL) education [1], we present here a brief overview of current work to investigate the commonality/diversity of course structure between ten institutions outside North America which offer DL education in their library schools. The weighting of specifically DL module topic credits as a proportion of the overall course taught credits varies between 13% and 63%, and coverage of a proposed core topic set [2] is as high as 85%. | |||
| PIM through a 5S perspective | | BIBK | Full-Text | 491 | |
| Yi Ma; Edward A. Fox; Marcos A. Gonçalves | |||
Keywords: 5S model, cognitive science, intrapersonal communication, personal information management | |||
| Use vs. access: design and use in educational digital libraries | | BIBA | Full-Text | 492 | |
| Flora McMartin; Brandon Muramatsu | |||
| The poster will compare and contrast the design and usage assumptions of existing educational digital libraries and repositories to challenge digital library developers to meet the needs of their increasingly sophisticated users. Traditionally, assumptions have focused on access to a site and discovery of content, whereas we define use as content and its application (context, audience, etc.). In this poster we will review the assumptions that have driven the design of digital libraries, their services and evaluation. Measures of success such as page views of metadata rest on assumptions associated with access, i.e., the number of times a metadata record is displayed. This measure provides a very limited view of how a digital library is used. We believe that educational digital libraries need to go beyond such a limited view and think about what people actually do with material: Are they using it? Are they returning to it? Are they modifying it? Are they sharing it with others? We will explore an alternate set of metrics for determining the success (or failure) of educational digital libraries by examining metrics focused on use of the contents of educational digital libraries. | |||
| What do faculty need and want from digital libraries? | | BIBA | Full-Text | 493 | |
| Flora P. McMartin; Alan Wolf; Ellen Iverson; Cathrynn Manduca; Glenda Morgan; Joshua Morrill | |||
| In this paper, we report on the results of a national survey of faculty members concerning their use of digital resources (DRs), collections of resources and digital libraries (DLs). The research reported here explored issues such as: value of DRs, motivation for using DRs and barriers to use of these resources in teaching. The results have implications for how DLs might develop education and outreach efforts to increase visibility and use of their collections. | |||
| Building cross-browser interfaces for digital libraries with scalable vector graphics (SVG) | | BIBA | Full-Text | 494 | |
| Francis Molina; Brian Sweeney; Ted Willard; André Winter | |||
| We share our experience with developing interactive, cross-browser strand maps using SVG. These maps will provide educators with free and easy access to carefully selected instructional resources linked to national and state learning goals. We will show the interface in at least two browsers, Internet Explorer with Adobe's SVG Viewer plug-in and Mozilla Firefox. | |||
| Understanding target users of a digital reference library | | BIBA | Full-Text | 495 | |
| Daniela K. Rosner; John Mark Josling; Andrea Moed; Elisa Oreglia | |||
| Through an investigation of the needs and practices of researchers in the humanities and social sciences, we identify key issues in the use of an online digital reference library, the Electronic Cultural Atlas Initiative's "Support for the Learner: What, When, Where and Who". In this poster we present an examination of results from survey data and user tasks, and discuss implications for future design and implementation based on our findings. | |||
| Capturing relevant information for digital curation | | BIBK | Full-Text | 496 | |
| Chirag Shah; Gary Marchionini | |||
Keywords: contextual information, digital curation, digital preservation | |||
| Merging the Norwegian gazetteer with the ADL gazetteer | | BIBA | Full-Text | 497 | |
| Øyvind Vestavik; Ingeborg T. Sølvberg | |||
| We report on work in progress on the merging of the Norwegian Gazetteer and the ADL gazetteer to create a gazetteer with both detailed local coverage and global scope, suitable for indexing articles from a local newspaper. We describe a mapping at the schema level, a strategy for identifying duplicates in the merged gazetteer, and some identified challenges. | |||
| Information system media education (ISM): cooperating for media literacy | | BIBK | Full-Text | 498 | |
| Heike vom Orde | |||
Keywords: media education, media literacy, network | |||
| Digitizing & providing access to contextual cultural materials: the liner notes digitization project | | BIBA | Full-Text | 499 | |
| Megan Winget | |||
| This paper describes a digitization project focused on developing a metadata modeling schema for album liner notes. | |||
| Use of online digital learning materials and digital libraries: comparison by discipline | | BIBA | Full-Text | 500 | |
| Alan J. Wolf; Ellen R. Iverson; Cathryn Manduca; Flora McMartin; Glenda Morgan; Joshua Morrill | |||
| In this paper, we describe the results of a national survey of higher education faculty concerning their use of digital resources and collections of these resources. We explore the differences in resource use by discipline groups and suggest implications for development of discipline specific libraries and faculty development practices. | |||
| XML as the articulation between information retrieval and multimedia in a musical heritage dissemination | | BIBA | Full-Text | 501 | |
| Rodolphe Bailly | |||
| The Cité de la musique in Paris has recently opened a new media library. One of the library's missions is the dissemination of the Cité de la musique's collection of recorded concerts. This paper presents the concert description model implemented in the MARC catalogue and emphasizes the central position of automatically generated XML representations of each concert in the library's information system architecture. | |||
| Lightweight realistic books: the greenstone connection | | BIBK | Full-Text | 502 | |
| Veronica Liesaputra; Ian H. Witten; David Bainbridge | |||
Keywords: electronic book, flash application | |||
| Rapid document navigation for information triage support | | BIBA | Full-Text | 503 | |
| George Buchanan | |||
| This paper introduces a novel interaction for supporting rapid between-page comparison of texts within limited screen real estate. In comparison with existing interfaces and interactions, it gives a high degree of visual feedback and allows rapid between-page flicking, similar to what is readily achieved in physical environments using the fingers and thumbs of a reader as they flip between related pages. | |||
| Fluid interaction for the document in context | | BIBA | Full-Text | 504 | |
| Pierre Cubaud; Jérôme Dupire; Alexandre Topol | |||
| In this paper we explore the interface requirements for users' navigation within a mixed collection of 3D digitized objects and textual documents. A specific application is the history of technology, where 3D and 2D documents are frequently inter-related. | |||
| Demonstrating the semantic growbag: automatically creating topic facets for facetedDBLP | | BIBA | Full-Text | 505 | |
| Jörg Diederich; Wolf-Tilo Balke; Uwe Thaden | |||
| The FacetedDBLP demonstrator allows users to search computer science publications starting from a keyword and shows the result set along with a set of facets, e.g., distinguishing publication years, authors, or conferences. Furthermore, it uses GrowBag graphs, i.e., automatically created categorization systems, to create a topic facet with which a user can characterize the result set in terms of main research topics and filter it according to certain subtopics. | |||
| The David L. Bassett stereoscopic atlas of human anatomy: developing a specialized collection within the stanford mediaserver digital library | | BIBA | Full-Text | 506 | |
| Jeremy C. Durack; Amy L. Ladd; Shyh-Yuan Kung; Margaret Krebs; Robert A. Chase; Parvati Dev | |||
| We describe the creation of a specialized media collection in the Stanford MediaServer highlighting the David L. Bassett Stereoscopic Atlas of Human Anatomy. Previous reports have outlined the underlying architecture and features of the MediaServer developed to support biomedical media-based education [1,2]. Here we focus on specific design principles and technical aspects of a focused project that may be beneficial to those developing digital media collections. | |||
| Evalutron 6000: collecting music relevance judgments | | BIBK | Full-Text | 507 | |
| Anatoliy A. Gruzd; J. Stephen Downie; M. Cameron Jones; Jin Ha Lee | |||
Keywords: MIREX, music digital libraries, music information retrieval, music similarity | |||
| VCenter: a digital video management system with mobile search service | | BIBA | Full-Text | 508 | |
| Jen-Hao Hsiao; Yu-Zheng Wang | |||
| Digital video data have proliferated in recent years due to the rapid development of multimedia computing and computer technologies. Management of video data is thus becoming an indispensable part of digital libraries. However, most current digital video library systems lack support for content-based video search and an easy-to-use query interface. In this work, we develop a digital video management system called VCenter, which provides lightweight mobile search functionality based on images taken with a camera phone. With the proposed framework, both end users and content owners can more easily enjoy the multimedia content in digital video libraries. | |||
| Creativity support: the mixed-initiative composition space | | BIBA | Full-Text | 509 | |
| Andruid Kerne; Eunyee Koh | |||
| Creativity support is an important and challenging emerging area of research. combinFormation is being developed as a tool to support creativity through a mixed-initiative composition space. The system combines searches and information feeds, representing relevant information collections as compositions of image and text surrogates. The composition space affords human manipulation. This method has been shown to support information discovery in The Design Process, an interdisciplinary undergraduate course. In this demo, we demonstrate how combinFormation can be used to explore and discover information in digital libraries such as ACM Digital Library and the International Children's Digital Library. | |||
| Visual understanding environment | | BIBA | Full-Text | 510 | |
| Anoop Kumar | |||
| The Visual Understanding Environment (VUE) project at Tufts' Academic Technology department aims at providing faculty and students with tools to successfully integrate digital resources into their teaching and learning. VUE not only provides a visual environment for structuring, presenting, and sharing digital information, but also supports viewing digital resources along with their metadata. The demonstration will showcase the federated search capabilities of VUE that enable users to search across multiple digital repositories. We will also present concept maps created using digital objects from repositories. | |||
| Mobile digital libraries for geography education | | BIB | Full-Text | 511 | |
| Minh-Chanh Vo; Fiftarina Puspitasari; Ee-Peng Lim; Chew-Hung Chang; Yin-Leng Theng; Dion Hoe-Lian Goh; Kalyani Chatterjea; Jun Zhang; Aixin Sun; Yuanyuan Li | |||
| The internet public library: an online learning laboratory for digital libraries | | BIBA | Full-Text | 512 | |
| Lorri Mon; Larry Dennis; Kyunghye Kim | |||
| This demonstration explores the Internet Public Library (www.ipl.org), a shared online facility for testing innovations in digital libraries and for training a skilled work force in digital library services, systems, and collections. Hypatia 2.0 and QRC software used in IPL's digital library collections and services are shown, with discussion of IPL in education, digital collections, digital reference services, digital library systems, and research. | |||
| 5SQual: a quality assessment tool for digital libraries | | BIBK | Full-Text | 513 | |
| Bárbara Lagoeiro Moreira; Marcos André Gonçalves; Alberto Henrique Frade Laender; Edward A. Fox | |||
Keywords: 5S, 5SQual, digital libraries, quality assessment | |||
| ContextMiner: a tool for digital library curators | | BIBK | Full-Text | 514 | |
| Chirag Shah; Gary Marchionini | |||
Keywords: contextual information, digital curation, digital preservation | |||
| From kinescope to rich media: 50 years (ago) with Mike Wallace | | BIBA | Full-Text | 515 | |
| Quinn Stewart; Grete Pasch; Rodrigo Arias | |||
| What do Eleanor Roosevelt, Frank Lloyd Wright, Margaret Sanger, and Henry Kissinger have in common? All of them, along with 55 other celebrities, were interviewed by Mike Wallace in 1957-58, and the corresponding kinescopes have resided in the Harry Ransom Humanities Research Center at the University of Texas since the early 1960s. This demonstration will showcase an online searchable video digital library of the Wallace interviews created by researchers, staff, and students at the University of Texas School of Information and the Universidad Francisco Marroquin. | |||