
JCDL'11: Proceedings of the 2011 Joint International Conference on Digital Libraries

Fullname: Proceedings of the 2011 joint international conference on Digital libraries
Editors: Glen Newton; Michael Wright; Lillian Cassel
Location: Ottawa, Ontario, Canada
Dates: 2011-Jun-13 to 2011-Jun-17
Publisher: ACM
Standard No: ISBN 1-4503-0744-2, 978-1-4503-0744-4
Papers: 102
Pages: 482
Links: Conference Home Page
  1. Automated methods to help our understanding of texts
  2. Improving collection management
  3. Approaches to preservation and archiving
  4. Methods for extracting new order from analyzing content and use
  5. Finding web content again -- is it possible?
  6. First, the news followed by art beat
  7. How understanding rights impacts access and use
  8. On the formality, or not, of annotations
  9. Show me a New way to view and discover
  10. Author linkages, a necessary foundation for collaboration
  11. Improving impact by understanding users' information needs and strategies
  12. Continuing the work on improving recommendation
  13. Can domain practice Inform better digital library query interaction?
  14. Methods improving information extracted from collections
  15. Dealing with data repositories: approaches and issues to consider
  16. Poster session
  17. Demonstration session

Automated methods to help our understanding of texts

Measuring historical word sense variation BIBAFull-Text 1-10
  David Bamman; Gregory Crane
We describe here a method for automatically identifying word sense variation in a dated collection of historical books in a large digital library. By leveraging a small set of known translation book pairs to induce a bilingual sense inventory and labeled training data for a WSD classifier, we are able to automatically classify the Latin word senses in a 389 million word corpus and track the rise and fall of those senses over a span of two thousand years. We evaluate the performance of seven different classifiers both in a tenfold test on 83,892 words from the aligned parallel corpus and on a smaller, manually annotated sample of 525 words, measuring both the overall accuracy of each system and how well that accuracy correlates (via mean square error) to the observed historical variation.
Structure extraction from PDF-based book documents BIBAFull-Text 11-20
  Liangcai Gao; Zhi Tang; Xiaofan Lin; Ying Liu; Ruiheng Qiu; Yongtao Wang
Nowadays PDF documents have become a dominant knowledge repository for both academia and industry, largely because they are very convenient to print and exchange. However, methods for automated structure information extraction are yet to be fully explored, and the lack of effective methods hinders information reuse of PDF documents. To enhance the usability of PDF-formatted electronic books, we propose a novel computational framework to analyze the underlying physical structure and logical structure. The analysis is conducted at both page level and document level, including global typographies, reading order, logical elements, chapter/section hierarchy and metadata. Moreover, two characteristics of PDF-based books, i.e., style consistency in the whole book document and natural rendering order of PDF files, are fully exploited in this paper to improve the conventional image-based structure extraction methods. This paper employs the bipartite graph as a common structure for modeling various tasks, including reading order recovery, figure and caption association, and metadata extraction. Based on the graph representation, the optimal matching (OM) method is utilized to find the global optima in those tasks. Extensive benchmarking using real-world data validates the high efficiency and discrimination ability of the proposed method.
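For readers unfamiliar with optimal matching on a bipartite graph, the sketch below illustrates the general idea on the figure-caption association task; the Euclidean cost function and the coordinates are invented assumptions, not the authors' layout-based costs.

```python
# Minimal sketch of optimal matching (OM) on a bipartite graph, here for
# figure-caption association. The distance-based cost is an assumption; the
# paper derives its costs from PDF layout features.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(figures, captions):
    """figures, captions: lists of (x, y) page coordinates (illustrative)."""
    cost = np.zeros((len(figures), len(captions)))
    for i, (fx, fy) in enumerate(figures):
        for j, (cx, cy) in enumerate(captions):
            # Euclidean distance as a stand-in for a layout-based cost.
            cost[i, j] = ((fx - cx) ** 2 + (fy - cy) ** 2) ** 0.5
    rows, cols = linear_sum_assignment(cost)   # globally optimal assignment
    return [(int(r), int(c)) for r, c in zip(rows, cols)]

print(associate([(100, 200), (100, 500)], [(110, 260), (105, 560)]))
# -> [(0, 0), (1, 1)]
```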
Word order matters: measuring topic coherence with lexical argument structure BIBAFull-Text 21-24
  Steve Spagnola; Carl Lagoze
Topic models are emerging tools for improved browsing and searching within digital libraries. These techniques collapse words within documents into unordered "bags of words," ignoring word order. In this paper, we present a method that examines syntactic dependency parse trees from Wikipedia article titles to learn expected patterns between relative lexical arguments. This process is highly dependent on the global word ordering of a sentence, modeling how each word interacts with other words to gain an aggregate perspective on how words interact over all 3.2 million titles. Using this information, we analyze how coherent a given topic is by comparing the relative usage vectors between the top 5 words in a topic. Results suggest that this technique can identify poor topics based on how well the relative usages align with each other within a topic, potentially aiding digital library indexing.
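A minimal sketch of the coherence scoring step described above, assuming toy usage vectors in place of the dependency-derived argument-usage counts used in the paper:

```python
# Score a topic's coherence as the mean pairwise cosine similarity of the
# "relative usage" vectors of its top 5 words. The vectors are toy stand-ins.
import numpy as np
from itertools import combinations

def coherence(usage):
    """usage: dict mapping each top word to its usage-count vector."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    pairs = list(combinations(list(usage), 2))
    return sum(cos(usage[a], usage[b]) for a, b in pairs) / len(pairs)

topic_top5 = {w: np.random.rand(50) for w in
              ["library", "digital", "archive", "metadata", "search"]}
print(round(coherence(topic_top5), 3))  # low scores would flag incoherent topics
```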
Phrases as subtopical concepts in scholarly text BIBAFull-Text 25-28
  Asif-ul Haque; Paul Ginsparg
Retrieval of subtopical concepts from scholarly communication systems is now possible through a combination of text and metadata analysis, augmented by user search queries and click logs. Here we investigate how a "phrase", defined as a variable length sequence of vocabulary words, can be used to represent a concept. We present a method to extract such phrases from a text corpus, and rank them using a citation network measure, the compensated normalized link count (CNLC), which measures the extent to which they are propagated along the citation structure of articles. We validate the ranking with actively and passively determined metrics: comparison with human-assigned keywords, and comparison with passively harvested terms from search query logs. This method is demonstrated on full texts and abstracts from 7 years of high energy physics articles from the arXiv preprint database.

Improving collection management

Game development documentation and institutional collection development policy BIBAFull-Text 29-38
  Megan A. Winget; William Walker Sampson
Videogames and other new media artifacts constitute an important part of our cultural and economic landscape, and collecting institutions have a responsibility to collect and preserve these materials for future access. Unfortunately, these kinds of materials present unique challenges for collecting institutions, including problems of collection development, technological preservation, and access. This paper presents findings from a grant-funded project focused on examining documentation of the creative process in game development. Data includes twelve qualitative interviews conducted with individuals involved in the game development process, spanning a number of different roles and institution types. The most pressing findings are related to the nature of documentation in the videogame industry: project interviews indicate that the game development process does produce significant and important documentation as traditionally conceived by collecting institutions, ranging from game design documents to email correspondence and business reports. However, while it does exist, traditional documentation does not adequately, or even, at times, truthfully represent the project or the game creation process as a whole. In order to adequately represent the development process, collecting institutions also need to seek out and procure numerous versions of games and game assets, as well as those game assets that are natural byproducts of the design process, for example gamma and beta versions of the game, vertical slices, or different renderings of graphical elements.
That's 'é' not 'þ' '?' or '◓': a user-driven context-aware approach to erroneous metadata in digital libraries BIBAFull-Text 39-48
  David Bainbridge; Michael B. Twidale; David M. Nichols
In this paper we present a novel system for user-driven integration of name variants when interacting with web-based information systems. The growth and diversity of online information means that many users experience disambiguation and collocation errors in their information searching. We approach these issues via a client-side JavaScript browser extension that can reorganise web content and also integrate remote data sources. The system is illustrated through three worked examples using existing digital libraries.
FRBR and facets provide flexible, work-centric access to items in library collections BIBAFull-Text 49-52
  Kelley McGrath; Bill Kules; Chris Fitzpatrick
This paper explores a technique to improve searcher access to library collections by providing a faceted search interface built on a data model based on the Functional Requirements for Bibliographic Records (FRBR). The prototype provides a Work-centric view of a moving image collection that is integrated with bibliographic and holdings data. Two sets of facets address important user needs: "what do you want?" and "how/where do you want it?" enabling patrons to narrow, broaden and pivot across facet values instead of limiting them to the tree-structured hierarchy common with existing FRBR applications. The data model illustrates how FRBR is being adapted and applied beyond the traditional library catalog.
What do you call it?: a comparison of library-created and user-created tags BIBAFull-Text 53-56
  Catherine E. Hall; Michael A. Zarro
In this paper, we describe an exploratory study comparing the abstracting and indexing practices of a semi-expert LIS community (metadata creators for the digital library, ipl2) and the social tags generated by Delicious users for the same corpus of materials. We find over 88% of the resources in the ipl2 History collection were tagged at least once in Delicious. Overlap between the tags applied to ipl2 resources and indexing terms shows that the two groups are similar enough to be useful, yet dissimilar enough to provide new access points and description.
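The kind of coverage and overlap comparison reported above can be sketched as follows; the resource records and tags are invented:

```python
# Coverage of library resources by user tags, plus per-resource term overlap
# (Jaccard) between library-created index terms and Delicious tags.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

ipl2_terms = {"res1": ["history", "civil war", "primary sources"],
              "res2": ["ancient rome", "history"]}
delicious_tags = {"res1": ["history", "civilwar", "education"],
                  "res2": []}

tagged = [r for r, tags in delicious_tags.items() if tags]
print("coverage:", len(tagged) / len(ipl2_terms))        # share tagged at least once
for r in tagged:
    print(r, "overlap:", round(jaccard(ipl2_terms[r], delicious_tags[r]), 2))
```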

Approaches to preservation and archiving

Extending digital repository architectures to support disk image preservation and access BIBAFull-Text 57-66
  Kam Woods; Christopher A. Lee; Simson Garfinkel
Disk images (bitstreams extracted from physical media) can play an essential role in the acquisition and management of digital collections by serving as containers that support data integrity and chain of custody, while ensuring continued access to the underlying bits without depending on physical carriers. Widely used today by practitioners of digital forensics, disk images can serve as baselines for comparison for digital preservation activities, as they provide fail-safe mechanisms when curatorial actions make unexpected changes to data; enable access to potentially valuable data that resides below the file system level; and provide options for future analysis. We discuss established digital forensics techniques for acquiring, preserving and annotating disk images, provide examples from both research and educational collections, and describe specific forensic tools and techniques, including an object-oriented data packaging framework called the Advanced Forensic Format (AFF) and the Digital Forensics XML (DFXML) metadata representation.
Preservation decisions: terms and conditions apply BIBAFull-Text 67-76
  Christoph Becker; Andreas Rauber
Decisions in digital preservation pose the delicate mission of balancing desired goals of authentic long-term access with the technical means available to date. Organisations with a commitment to the long-term value of information and knowledge have to take decisions on several levels to achieve their business goals with the evolving technology of the day. This article explores the decision space in digital preservation, with a focus on what can be called the core decision: how to preserve content information. We undertake a critical analysis of the challenges, constraints and objectives of decision making, and discuss the experience in applying the Planets preservation planning method, supported by the planning tool Plato, to real-world business decisions. Based on this methodology and substantial real-world experience in decision making, we present a set of observation points that address issues frequently raised in decision making. The conclusions shall contribute to a clarified understanding of the state of the art and future challenges in scalable decision making for long-term preservation.
Ember: a case study of a digital memorial museum of born-digital artifacts BIBAFull-Text 77-80
  Paul Logasa Bogen, II; Richard Furuta
This paper discusses the creation of Ember, a collection of born-digital artifacts generated in the aftermath of the 1999 Aggie Bonfire collapse. Ember is an example of a previously unexamined class of cultural heritage digital libraries, which we describe as a digital memorial museum. Ember's artifacts consist of emails, photos, documents, and web pages that the communities surrounding the tragedy created. Due to the community investment and the personal nature of the artifacts, concerns arise about how the collection should be properly handled, which leads us to propose "Sensitivity" as an addition to the 5S model. Initially, we are focusing on the email portion of the collection, which can be viewed as the basis of an emerging oral tradition surrounding the Bonfire tragedy.
SimDL: a model ontology driven digital library for simulation systems BIBAFull-Text 81-84
  Jonathan Leidig; Edward A. Fox; Kevin Hall; Madhav Marathe; Henning Mortveit
We propose a digital library design to support epidemic and public health simulation experiments in which model ontologies direct collection organization, user interface construction, and discovery. We have developed a SimDL instantiation of the ontological design tailored for a typical experimentation workflow. SimDL relies on an XML Schema description of a simulation model to form a domain and model specific ontology. We show this approach useful in building digital libraries to support collaborative simulation efforts.

Methods for extracting new order from analyzing content and use

Eliminating the redundancy in blocking-based entity resolution methods BIBAFull-Text 85-94
  George Papadakis; Ekaterini Ioannou; Claudia Niederée; Themis Palpanas; Wolfgang Nejdl
Entity resolution is the task of identifying entities that refer to the same real-world object. It has important applications in the context of digital libraries, such as citation matching and author disambiguation. Blocking is an established methodology for efficiently addressing this problem; it clusters similar entities together, and compares solely entities inside each cluster. In order to effectively deal with the current large, noisy and heterogeneous data collections, novel blocking methods that rely on redundancy have been introduced: they associate each entity with multiple blocks in order to increase recall, thus increasing the computational cost, as well.
   In this paper, we introduce novel techniques that remove the superfluous comparisons from any redundancy-based blocking method. They improve the time-efficiency of the latter without any impact on the end result. We present the optimal solution to this problem that discards all redundant comparisons at the cost of quadratic space complexity. For applications with space limitations, we also present an alternative, lightweight solution that operates at the abstract level of blocks in order to discard a significant part of the redundant comparisons. We evaluate our techniques on two large, real-world data sets and verify the significant improvements they convey when integrated into existing blocking methods.
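A minimal sketch of the core idea, discarding redundant comparisons produced by redundancy-based blocking by remembering already-scheduled pairs (the quadratic-space variant described above); the block assignments are invented:

```python
# Each entity pair is compared at most once, even when the two entities
# co-occur in several blocks.
from itertools import combinations

blocks = {"smith": [1, 2, 3], "j_smith": [2, 3], "ny": [1, 3, 4]}

def compare(a, b):
    print("comparing", a, b)       # stand-in for a real similarity function

seen = set()
for entities in blocks.values():
    for a, b in combinations(sorted(entities), 2):
        if (a, b) not in seen:     # skip pairs already scheduled in another block
            seen.add((a, b))
            compare(a, b)
```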
Detecting and exploiting stability in evolving heterogeneous information spaces BIBAFull-Text 95-104
  George Papadakis; George Giannakopoulos; Claudia Niederée; Themis Palpanas; Wolfgang Nejdl
Individuals contribute content on the Web at an unprecedented rate, accumulating immense quantities of (semi-)structured data. Wisdom of the Crowds theory advocates that such information (or parts of it) is constantly overwritten, updated, or even deleted by other users, with the goal of rendering it more accurate, or up-to-date. This is particularly true for the collaboratively edited, semi-structured data of entity repositories, whose entity profiles are consistently kept fresh. Therefore, the core information that remains stable with the passage of time, despite being reviewed by numerous users, is particularly useful for the description of an entity.
   Based on the above hypothesis, we introduce a classification scheme that predicts, on the basis of statistical and content patterns, whether an attribute (i.e., name-value pair) is going to be modified in the future. We apply our scheme on a large, real-world, versioned dataset and verify its effectiveness. Our thorough experimental study also suggests that reducing entity profiles to their stable parts conveys significant benefits to two common tasks in computer science: information retrieval and information integration.
Classification of user interest patterns using a virtual folksonomy BIBAFull-Text 105-108
  Ricardo Kawase; Eelco Herder
User interest in topics and resources is known to be recurrent and to follow specific patterns, depending on the type of topic or resource. Traditional methods for predicting reoccurring patterns are based on ranking and associative models. In this paper we identify several 'canonical' patterns by clustering keywords related to visited resources, making use of a large repository of Web usage data. The keywords are derived from a 'virtual' folksonomy of tags assigned to these resources using a collaborative bookmarking system.
Tags in domain-specific sites: new information? BIBAFull-Text 109-112
  Jeremy Steinhauer; Lois M. L. Delcambre; David Maier; Marianne Lykke; Vu H. Tran
If researchers use tags in retrieval applications they might assume, implicitly, that tags represent novel information, e.g., when they attribute performance improvement in their retrieval algorithm(s) to the use of tags. In this work, we investigate whether this assumption is true. We focus on the use of tags in domain-specific websites because such websites are more likely to have a coherent, discernible website structure and because the users that are searching for and tagging pages in such a site may have specific information needs (as opposed to the broad range of information needs that users have when browsing/searching the Internet at large). For this study, we assume that the application of the same tag to multiple pages provides an indication that those pages are related. To determine whether this indication of relatedness is contributing new information, we first measure whether pages with common tag(s) could have been deemed as related based on site structure as measured by shortest navigational distance between pages. Second, we measure whether or not tags could have been determined algorithmically based on standard tf-idf scores of terms on the page. Based on our analysis of two different sites, we found that tags contribute novel information that is not discernible from site structure or site/page content.
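The two checks described above can be sketched on toy data as follows, using shortest navigational distance in the site graph and the top tf-idf terms of each page; the site structure, page texts, and tag are invented:

```python
# (1) Are pages sharing a tag already close in the site's link structure?
# (2) Is the tag already a high tf-idf term on the page?
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer

site = nx.Graph([("home", "about"), ("home", "research"), ("research", "p1"),
                 ("research", "p2"), ("about", "staff")])
pages = {"p1": "deep learning for protein folding models",
         "p2": "protein structure prediction with neural networks"}
tag, tagged_pages = "protein", ["p1", "p2"]

print("navigational distance:",
      nx.shortest_path_length(site, tagged_pages[0], tagged_pages[1]))

vec = TfidfVectorizer()
X = vec.fit_transform(pages.values())
terms = vec.get_feature_names_out()
for name, row in zip(pages, X.toarray()):
    top = {terms[i] for i in row.argsort()[-3:]}    # top-3 tf-idf terms
    print(name, "tag already implied by content:", tag in top)
```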

Finding web content again -- is it possible?

Archiving the web using page changes patterns: a case study BIBAFull-Text 113-122
  Myriam Ben Saad; Stéphane Gançarski
A pattern is a model or a template used to summarize and describe the behavior (or the trend) of data that generally exhibits some recurrent events. Patterns have received considerable attention in recent years and have been widely studied in the data mining field. Various pattern mining approaches have been proposed and used for different applications such as network monitoring, moving object tracking, financial or medical data analysis, scientific data processing, etc. In these different contexts, discovered patterns have been useful to detect anomalies, to predict data behavior (or trends), or more generally, to simplify data processing or to improve system performance. However, to the best of our knowledge, patterns have never been used in the context of web archiving. Web archiving is the process of continuously collecting and preserving portions of the World Wide Web for future generations. In this paper, we show how patterns of page changes can be useful tools to efficiently archive web sites. We first define our pattern model that describes the changes of pages. Then, we present the strategy used to (i) extract the temporal evolution of page changes, (ii) discover patterns and (iii) exploit them to improve web archives. We choose the archive of French public TV channels « France Télévisions » as a case study in order to validate our approach. Our experimental evaluation based on real web pages shows the utility of patterns to improve archive quality and to optimize indexing and storage.
On identifying academic homepages for digital libraries BIBAFull-Text 123-132
  Sujatha Das Gollapalli; C. Lee Giles; Prasenjit Mitra; Cornelia Caragea
Academic homepages are rich sources of information on scientific research and researchers. Most researchers provide information about themselves and links to their research publications on their homepages. In this study, we address the following questions related to academic homepages: (1) How many academic homepages are there on the web? (2) Can we accurately discriminate between academic homepages and other webpages? and (3) What information can be extracted about researchers from their homepages? For addressing the first question, we use mark-recapture techniques commonly employed in biometrics to estimate animal population sizes. Our results indicate that academic homepages comprise a small fraction of the Web making automatic methods for discriminating them crucial. We study the performance of content-based features for classifying webpages. We propose the use of topic models for identifying content-based features for classification and show that a small set of LDA-based features out-perform term features selected using traditional techniques such as aggregate term frequencies or mutual information. Finally, we deal with the extraction of name and research interests information from an academic homepage. Term-topic associations obtained from topic models are used to design a novel, unsupervised technique to identify short segments corresponding to research interests of the researchers specified in academic homepages. We show the efficacy of our proposed methods on all the three tasks by experimentally evaluating them on multiple publicly-available datasets.
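For readers unfamiliar with mark-recapture estimation, a basic Lincoln-Petersen estimator illustrates the idea; the sample sizes below are invented, and the paper's actual estimator and samples may differ:

```python
# Two independent samples of academic homepages; the size of their overlap
# yields a Lincoln-Petersen estimate of the total population.
def lincoln_petersen(n1, n2, overlap):
    """n1, n2: sizes of two independent samples; overlap: items found in both."""
    return n1 * n2 / overlap

# e.g. 5,000 homepages in sample A, 4,000 in sample B, 400 seen in both
print(int(lincoln_petersen(5000, 4000, 400)))   # ~50,000 homepages estimated
```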
How much of the web is archived? BIBAFull-Text 133-136
  Scott G. Ainsworth; Ahmed Alsum; Hany SalahEldeen; Michele C. Weigle; Michael L. Nelson
The Memento Project's archive access additions to HTTP have enabled development of new web archive access user interfaces. After experiencing this web time travel, the inevitable question that comes to mind is "How much of the Web is archived?" This question is studied by approximating the Web via sampling URIs from DMOZ, Delicious, Bitly, and search engine indexes and measuring the number of archived copies available in various public web archives. The results indicate that 35%-90% of URIs have at least one archived copy, 17%-49% have two to five copies, 1%-8% have six to ten copies, and 8%-63% have at least ten copies. The number of URI copies varies as a function of time, but only 14.6-31.3% of URIs are archived more than once per month.
Rediscovering missing web pages using link neighborhood lexical signatures BIBAFull-Text 137-140
  Martin Klein; Jeb Ware; Michael L. Nelson
For discovering the new URI of a missing web page, lexical signatures, which consist of a small number of words chosen to represent the "aboutness" of a page, have been previously proposed. However, prior methods relied on computing the lexical signature before the page was lost, or on using cached or archived versions of the page to calculate a lexical signature. We demonstrate a system for constructing a lexical signature for a page from its link neighborhood, that is, the "backlinks", or pages that link to the missing page. After testing various methods, we show that one can construct a lexical signature for a missing web page using only ten backlink pages. Further, we show that only the first level of backlinks is useful in this effort. The text that the backlinks use to point to the missing page is used as input for the creation of a four-word lexical signature. That lexical signature is shown to successfully find the target URI in more than half of the test cases.
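A minimal sketch of deriving a four-word lexical signature from backlink anchor text; the simple term-frequency ranking and the anchor strings are assumptions, and the paper's ranking may differ:

```python
# Build a four-word lexical signature for a missing page from the anchor text
# that backlink pages use to point to it.
from collections import Counter

anchors = ["digital library conference 2011",
           "JCDL 2011 proceedings",
           "joint conference on digital libraries",
           "2011 digital libraries proceedings"]
stop = {"on", "of", "the", "and"}

terms = Counter(w for a in anchors for w in a.lower().split() if w not in stop)
signature = [w for w, _ in terms.most_common(4)]
print(signature)   # the four most frequent anchor-text terms
```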

First, the news followed by art beat

That's news to me: the influence of perceived gratifications and personal experience on news sharing in social media BIBAFull-Text 141-144
  Long Ma; Chei Sian Lee; Dion Hoe-Lian Goh
Sharing news is a popular activity in social media and influences both individuals and society. However, little empirical research has been conducted to explore the motivations underlying users' news sharing behavior. Adopting the uses and gratifications perspective, this study examined the role of gratification factors and user experience in explaining users' news sharing intention on social media. Hierarchical regression was employed to analyze the data collected from 144 undergraduate and graduate students. The results show that status seeking was the strongest motivation in predicting news sharing intention, followed by sociality and informativeness. However, entertainment/escapism was not a significant predictor, in contrast to prior work. Further, we examined user experience in predicting news sharing intention and identified it as a significant factor.
Facilitating content creation and content research in building the city of lit digital library BIBAFull-Text 145-148
  Haowei Hsieh; Bridget Draxler; Nicole Dudley; Jim Cremer; Lauren Haldeman; Dat Nguyen; Peter Likarish; Jon Winet
In conjunction with Iowa City's designation as a UNESCO "City of Literature," an interdisciplinary research team at The University of Iowa collaborated to develop a digital library featuring important Iowa City authors and locations. The "City of Lit" digital library consists of a mobile application for the general public and a set of web-based interfaces for researchers and content creators. This paper explains the motivation and describes the design and implementation of the digital library, its framework, the user-side mobile app and our future plans. We also outline a pilot study, in which undergraduate students conducted scholarly research and created content for the digital collection.
Towards a new reading experience via semantic fusion of text and music BIBAFull-Text 149-152
  Ling Zhuang; Zhenchao Ye; Jiangqin Wu; Feng Zhou; Jian Shao
CADAL preserves many Chinese classical literary works, including graceful prose and verse. These works, written in ancient Chinese, are comparatively concise in vocabulary and sentence patterns, yet they express rich feelings and convey a wealth of information. Although they can be explained in modern Chinese, the aesthetic sense of the originals is lost. We therefore aim to illustrate the feeling in these works using Chinese traditional music, which is another part of Chinese culture. This is an interesting and challenging task. In this paper, the correlation between text and music is studied. A novel approach is proposed to model the latent semantic association underlying the two media. Based on the correlation model learned from training data, we can automatically associate a literary work (mainly verse and prose in our digital library) with a few music pieces. When a reader is appreciating a literary work, a piece of background music plays at the same time, so that the information and emotion implied by the work and the music blend together. The reader may become immersed in the emotion and obtain intense aesthetic enjoyment. We implement the proposed method and design experiments to evaluate its performance. The experimental results substantiate the feasibility of the proposed approach.
Indexing musical pieces using their major repetition BIBAFull-Text 153-156
  Benjamin Martin; Pierre Hanna; Matthias Robine; Pascal Ferraro
With the growing presence of large collections of musical content, methods for facilitating efficient browsing and fast comparisons of audio pieces become more and more useful. Notably, methods that isolate relevant parts of audio pieces give insight into the musical content and can be used to improve similarity evaluation systems. In this context, we propose an indexing method that retrieves particular parts of audio signals, namely a major repetition. We use harmonic representations together with string matching techniques to strictly define and isolate such segments. Experiments on state-of-the-art structural datasets show a strong correlation between the retrieved parts and the perceived structure of pieces.

How understanding rights impacts access and use

The ownership and reuse of visual media BIBAFull-Text 157-166
  Catherine C. Marshall; Frank M. Shipman
This paper presents the results of a study of the ownership and reuse of visual media. A survey was administered to 250 social media-savvy respondents to investigate their attitudes about saving, sharing, publishing, and removing online photos; the survey also explored participants' current photo-sharing and reuse practices, and their general expectations of photo reuse. Our probe of respondent attitudes revealed that respondents felt: (1) people should be able to save visual media, regardless of its source; (2) people have slightly less right to reuse photos than they do to save them; (3) a photo's subject has a slightly greater right than the photographer to reuse the photo in non-commercial situations; (4) removal is controversial, but trends more positive when it involves only metadata (e.g. tags); and (5) access to institutional archives of personal photos is better deferred for 50 years. Participants explained their own reuse of online photos in pragmatic terms that included the nature of the content, the aim and circumstances of reuse, their sense of the photo's original use, and their understanding of existing laws and restrictions. In the abstract, the same general question revealed a 'reuse paradox'; while respondents trust themselves to make this judgment, they do not trust the reciprocal judgment of unknown others.
Using national bibliographies for rights clearance BIBAFull-Text 167-174
  Nuno Freire; Andreas Juffinger
In the process of digitizing a book, a library needs to clear the rights associated with it. Rights clearance is a time consuming process, possibly with higher costs than the actual digitization. To analyze the rights situation, a range of information is required, which is distributed across several national databases hosted by national libraries, publishers and collective rights organizations. National bibliographies are key data sources in these processes, as they are the only source to identify all the publications of a specific intellectual work per country. However, national bibliographies are not built for rights clearance purposes. The information in bibliographic records results from cataloguing practices with users and library management in mind, and links between different publications of a single intellectual work are not available. This paper presents a study on the implications of data quality problems in national bibliographies for the identification of all publications of a work. It also presents an approach for work data extraction and matching based on similarity of the most discriminatory attributes of works. Evaluation has shown that the data quality problems are difficult to overcome, as our best approach achieved an F0.5-measure of 0.91. These results help to speed up the process of discovering all relevant publications per work significantly, with sufficient recall.

On the formality, or not, of annotations

SharedCanvas: a collaborative model for medieval manuscript layout dissemination BIBAFull-Text 175-184
  Robert Sanderson; Benjamin Albritton; Rafael Schwemmer; Herbert Van de Sompel
In this paper we present a model based on the principles of Linked Data that can be used to describe the interrelationships of images, texts and other resources to facilitate the interoperability of repositories of medieval manuscripts or other culturally important handwritten documents. The model is designed from a set of requirements derived from the real world use cases of some of the largest digitized medieval content holders, and instantiations of the model are intended as the input to collection-independent page turning and scholarly presentation interfaces. A canvas painting paradigm, such as in PDF and SVG, was selected based on the lack of a one to one correlation between image and page, and to fulfill complex requirements such as when the full text of a page is known, but only fragments of the physical object remain. The model is implemented using technologies such as OAI-ORE Aggregations and OAC Annotations, as the fundamental building blocks of emerging Linked Digital Libraries. The model and implementation are evaluated through prototypes of both content providing and consuming applications. Although the system was designed from requirements drawn from the medieval manuscript domain, it is applicable to any layout-oriented presentation of images of text.
Use of subimages in fish species identification: a qualitative study BIBAFull-Text 185-194
  Uma Murthy; Lin Tzy Li; Eric Hallerman; Edward A. Fox; Manuel A. Perez-Quinones; Lois M. Delcambre; Ricardo da S. Torres
Many scholarly tasks involve working with subdocuments, or contextualized fine-grain information, i.e., with information that is part of some larger unit. A digital library (DL) facilitates management, access, retrieval, and use of collections of data and metadata through services. However, most DLs do not provide infrastructure or services to support working with subdocuments. Superimposed information (SI) refers to new information that is created to reference subdocuments in existing information resources. We combine this idea of SI with traditional DL services, to define and develop a DL with SI (SI-DL). We explored the use of subimages and evaluated the use of SuperIDR, a prototype SI-DL, in fish species identification, a scholarly task that involves working with subimages. The contexts and strategies of working with subimages in SuperIDR suggest new and enhanced support (SI-DL services) for scholarly tasks that involve working with subimages, including new ways of querying and searching for subimages and associated information. The main conceptual contributions of our work are the insights gained from these findings of the use of subimages and of SuperIDR, which lead to recommendations for the design of digital libraries with superimposed information.
Persistent annotations deserve new URIs BIBAFull-Text 195-198
  Abdulla Alasaadi; Michael L. Nelson
Some digital libraries support annotations, but sharing these annotations with other systems or across the web is difficult because of the need for special applications to read and decode these annotations. Due to the frequent change of web resources, an annotation's meaning can change if the underlying resources change. This project concentrates on minting a new URI for every annotation and creating a persistent and independent archived version of all resources. Users should be able to select a segment of an image or a video to be part of the annotation. The media fragment URIs described in the Open Annotation Collaboration data model can be used, but in practice they have limits and lack support in browsers. So in this project, segments of images and videos can be used in annotations without using media fragment URIs.
Semantically augmented annotations in digitized map collections BIBAFull-Text 199-202
  Rainer Simon; Bernhard Haslhofer; Werner Robitza; Elaheh Momeni
Historic maps are valuable scholarly resources that record information often retained by no other written source. With the YUMA Map Annotation Tool we want to facilitate collaborative annotation for scholars studying historic maps, and allow for semantic augmentation of annotations with structured, contextually relevant information retrieved from Linked Open Data sources. We believe that the integration of Web resource linkage into the scholarly annotation process is not only relevant for collaborative research, but can also be exploited to improve search and retrieval. In this paper, we introduce the COMPASS Experiment, an ongoing crowdsourcing effort in which we are collecting data that can serve as a basis for evaluating our assumption. We discuss the scope and setup of the experiment framework and report on lessons learned from the data collected so far.

Show me a New way to view and discover

Integrating implicit structure visualization with authoring promotes ideation BIBAFull-Text 203-212
  Andrew M. Webb; Andruid Kerne
We need to harness the growing wealth of information in digital libraries to support intellectual work involving creative and exploratory processes. Prior research on hypertext authoring shifted the focus from explicit structure to direct presentation of content aided by "implicit" spatial representation of structure. We likewise shift the field of information visualization. Using hypertext's rubric, we redefine what most people think of as "information visualization" as explicit structure visualization. We alternatively address implicit structure visualization, presenting content directly, representing structure with spatiality and other visual features. We integrate authoring to emphasize the role of human thought in learning and ideation. Prior research has shown that people iteratively collect and organize information by clipping magazines, piling clippings in somewhat messy ways, and organizing them. MessyOrganizer is an iterative implicit structure visualization algorithm which, like human practice, gradually collects and organizes information clippings. Content is depicted directly. Structural relationships are visualized implicitly through spatial positioning of related elements, with overlap and translucence. The simulated annealing algorithm is applied to a model of semantic relatedness over a spatial grid. We develop an experiment comparing products created with the integrated environment versus separated visualization and authoring spaces. Results reveal that participants have more novel and varied ideas when visualization is integrated with authoring.
Visualizing collaboration networks implicit in digital libraries using OntoStarFish BIBAFull-Text 213-222
  J. Alfredo Sánchez; Ofelia Cervantes; Alfredo Ramos; María Auxilio Medina; Juan Carlos Lavariega; Eric Balam
This paper presents the design rationale and initial findings derived from preliminary usage of OntoStarFish, a visualization technique aimed at taking advantage of implicit relationships that can be inferred from large collections of documents in digital libraries. OntoStarFish makes such relationships explicit so users may visualize them and detect potential collaboration networks. Users that may be interested in exploring collaboration networks include researchers looking for partners for specific projects as well as funding agencies concerned with the strength of associations among participants of competing proposals. OntoStarFish is based upon the use of multiple fisheye views that can be placed on top of starfields, dynamic scatter plots for which each axis is determined by a lightweight ontology of attributes associated to potential collaborators.
A link-based visual search engine for Wikipedia BIBAFull-Text 223-226
  David N. Milne; Ian H. Witten
This paper introduces HMpara, a new search engine that aims to make Wikipedia easier to explore. It works on top of the encyclopedia's existing link structure, abstracting away from document content and allowing users to navigate the resource at a higher level. It utilizes semantic relatedness measures to emphasize articles and connections that are most likely to be of interest, visualization to expose the structure of how the available information is organized, and lightweight information extraction to explain itself.
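The abstract does not give the relatedness measure; a plausible stand-in is the authors' earlier Wikipedia Link-based Measure computed from shared inlinks, sketched here with toy inlink sets and article count:

```python
# Link-based semantic relatedness of two Wikipedia articles from the overlap
# of the sets of articles that link to them (toy values throughout).
from math import log

def relatedness(inlinks_a, inlinks_b, total_articles):
    a, b = set(inlinks_a), set(inlinks_b)
    common = a & b
    if not common:
        return 0.0
    dist = (log(max(len(a), len(b))) - log(len(common))) / \
           (log(total_articles) - log(min(len(a), len(b))))
    return max(0.0, 1.0 - dist)    # convert the distance into a 0..1 relatedness

print(relatedness(range(0, 400), range(200, 700), total_articles=3_500_000))
```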
Supporting revisitation with contextual suggestions BIBAFull-Text 227-230
  Ricardo Kawase; George Papadakis; Eelco Herder
Web browsers provide little support for users to revisit pages that they do not visit very often. We developed a browser toolbar that reminds users of visited pages related to the page that they are currently viewing. The recommendation method combines ranking with propagation methods. A user evaluation shows that on average 22.7% of the revisits were triggered by the toolbar, a considerable change in the participants' revisitation routines. In this paper we discuss the value of the recommendations and the implications derived from the evaluation.

Author linkages, a necessary foundation for collaboration

CollabSeer: a search engine for collaboration discovery BIBAFull-Text 231-240
  Hung-Hsuan Chen; Liang Gou; Xiaolong Zhang; Clyde Lee Giles
Collaborative research has been increasingly popular and important in academic circles. However, there is no open platform available for scholars or scientists to effectively discover potential collaborators. This paper discusses CollabSeer, an open system to recommend potential research collaborators for scholars and scientists. CollabSeer discovers collaborators based on the structure of the coauthor network and a user's research interests. Currently, three different network structure analysis methods that use vertex similarity are supported in CollabSeer: Jaccard similarity, cosine similarity, and our relation strength similarity measure. Users can also request a recommendation by selecting a topic of interest. The topic of interest list is determined by CollabSeer's lexical analysis module, which analyzes the key phrases of previous publications. The CollabSeer system is highly modularized making it easy to add or replace the network analysis module or users' topic of interest analysis module. CollabSeer integrates the results of the two modules to recommend collaborators to users. Initial experimental results over a subset of the CiteSeerX database show that CollabSeer can efficiently discover prospective collaborators.
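The first two vertex-similarity options named above can be sketched on a toy co-author graph as follows; the relation strength measure is the paper's own and is not reproduced here:

```python
# Jaccard and cosine similarity of neighbour sets in a co-author network,
# used to suggest candidate collaborators who have not yet co-authored.
import networkx as nx
from math import sqrt

g = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")])

def jaccard(g, u, v):
    nu, nv = set(g[u]), set(g[v])
    return len(nu & nv) / len(nu | nv)

def cosine(g, u, v):
    nu, nv = set(g[u]), set(g[v])
    return len(nu & nv) / sqrt(len(nu) * len(nv))

print(jaccard(g, "A", "D"), cosine(g, "A", "D"))   # A and D share all co-authors
```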
Resolving author name homonymy to improve resolution of structures in co-author networks BIBAFull-Text 241-250
  Theresa A. Velden; Asif-ul Haque; Carl Lagoze
We investigate how author name homonymy distorts clustered large-scale co-author networks, and present a simple, effective, scalable and generalizable algorithm to ameliorate such distortions. We evaluate the performance of the algorithm to improve the resolution of mesoscopic network structures, that is those meso-level structures of a network resulting from groupings of nodes and their interlinking. To this end, we establish the ground truth for a sample of author names that is statistically representative of different types of nodes in the co-author network, distinguished by their role for the connectivity of the network. We finally observe that this distinction of node roles based on the mesoscopic structure of the network, in combination with a quantification of the commonality of last names, suggests a new approach to assess network distortion by homonymy and to analyze the reduction of distortion in the network after disambiguation, without requiring ground truth sampling.
Ranking authors in digital libraries BIBAFull-Text 251-254
  Sujatha Das Gollapalli; Prasenjit Mitra; C. Lee Giles
Searching for people with expertise on a particular topic, also known as expert search, is a common task in digital libraries. Most models for this task use only documents as evidence for expertise while ranking people. In digital libraries, other sources of evidence are available, such as a document's association with venues and citation links with other documents. We propose graph-based models that accommodate multiple sources of evidence in a PageRank-like algorithm for ranking experts. Our studies on two publicly-available datasets indicate that our model, despite being general enough to be directly useful for ranking other types of objects, performs on par with probabilistic models commonly used for expert ranking.
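A minimal sketch of a PageRank-style ranking over a heterogeneous graph mixing authors, papers, venues and citations; the toy graph, edge directions and damping factor are illustrative assumptions, not the authors' model:

```python
# Rank authors by running PageRank over a graph whose edges carry authorship,
# venue association, and citation evidence, then keep the author scores.
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([("paper1", "alice"), ("paper1", "bob"),      # authorship
                  ("paper2", "alice"),
                  ("paper1", "venueX"), ("paper2", "venueX"),  # venue association
                  ("paper2", "paper1")])                       # citation
scores = nx.pagerank(g, alpha=0.85)
authors = {n: s for n, s in scores.items() if n in {"alice", "bob"}}
print(sorted(authors.items(), key=lambda x: -x[1]))   # expert ranking
```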
Comparative evaluation of text- and citation-based plagiarism detection approaches using guttenplag BIBAFull-Text 255-258
  Bela Gipp; Norman Meuschke; Joeran Beel
Various approaches for plagiarism detection exist. All are based on more or less sophisticated text analysis methods such as string matching, fingerprinting or style comparison. In this paper a new approach called Citation-based Plagiarism Detection is evaluated using a doctoral thesis, in which a volunteer crowd-sourcing project called GuttenPlag identified substantial amounts of plagiarism through careful manual inspection. This new approach is able to identify similar and plagiarized documents based on the citations used in the text. It is shown that citation-based plagiarism detection performs significantly better than text-based procedures in identifying strong paraphrasing, translation and some idea plagiarism. Detection rates can be improved by combining citation-based with text-based plagiarism detection.

Improving impact by understanding users' information needs and strategies

Understanding digital library adoption: a use diffusion approach BIBAFull-Text 259-268
  Keith E. Maull; Manuel Gerardo Saldivar; Tamara Sumner
With the growth in operational digital libraries, the need for automatic methods capable of characterizing adoption and use has grown. We describe a computational methodology for producing two inter-related user typologies based on use diffusion. Use diffusion theory views technology adoption as a process that can lead to widely different patterns of use across a given population of potential users; these models use measures of frequency and variety to characterize and describe these usage patterns. The methodology uses computational techniques such as clickstream entropy and clustering to produce both coarse-grained and fine-grained user typologies. A case study demonstrates the utility and applicability of the method: it is used to understand how middle and high school science teachers participating in an academic year-long field trial adopted and integrated digital library resources into their instructional planning and teaching. The resulting fine-grained user typology identified five different types of teacher-users, including "interactive resource specialists" and "community seeker specialists". This typology was validated through comparison with qualitative and quantitative data collected using traditional educational field research methods.
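A rough sketch of the two computational techniques named above, clickstream entropy as a variety measure and clustering of frequency/variety features into a typology; the data and the choice of k-means are invented assumptions:

```python
# Per-user clickstream entropy (variety of resource types visited) plus
# k-means clustering of (frequency, variety) features into a user typology.
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def entropy(clicks):
    counts = np.array(list(Counter(clicks).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

users = {"t1": ["lesson", "lesson", "sim", "forum", "sim"],
         "t2": ["lesson"] * 6,
         "t3": ["sim", "forum", "lesson", "quiz", "sim", "forum"]}
features = np.array([[len(c), entropy(c)] for c in users.values()])  # frequency, variety
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(dict(zip(users, labels)))
```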
In the bookshop: examining popular search strategies BIBAFull-Text 269-278
  George Buchanan; Dana McKay
Users' search tactics often appear naïve. Much research has endeavored to understand the rudimentary queries typically seen in log analyses and user studies. Researchers have tested a number of approaches to supporting query development, including information literacy training and interaction design; these have tried, and often failed, to induce users to use more complex search strategies. To further investigate this phenomenon, we combined established HCI methods with models from cultural studies, and observed customers' mediated searches for books in bookstores. Our results suggest that sophisticated search techniques demand mental models that many users lack.
World vs. method: educational standard formulation impacts document retrieval BIBAFull-Text 279-282
  Byron Marshall; René Reitsma
Although initiatives are underway in the educational community to consolidate disparate collections of educational standards, little has been done to explore the impact of educational standard formulation on information retrieval. Recent research contrasts two categories of educational standards: 'World' (topical domain-related concepts) and 'Method' (investigative and epistemological principles). This paper explores the information retrieval implications of the World vs. Method distinction. We find that experts are more likely to agree about which educational resources align with a Method standard but that a typical automatic standard assignment tool is more likely to assign a World standard to an educational resource. Further, a text-based information retrieval system is more likely to be accurate in retrieving documents relevant to a World standard as compared to a Method standard. These findings have implications both for educational standard formulation (combining World and Method components in a standard may improve retrieval) and for digital library builders who want to help teachers identify useful, standards-aligned learning objects.
Automating open educational resources assessments: a machine learning generalization study BIBAFull-Text 283-286
  Heather Leary; Mimi Recker; Andrew Walker; Philipp Wetzler; Tamara Sumner; James Martin
Assessing the quality of online educational resources in a cost effective manner is a critical issue for educational digital libraries. This study reports on the approach for extending the Open Educational Resource Assessments (OPERA) algorithm from assessing vetted to peer-produced content. This article reports details of changes to the algorithm, comparisons between human raters and the algorithm, and the extent the algorithm can automate the review process.

Continuing the work on improving recommendation

A social network-aware top-N recommender system using GPU BIBAFull-Text 287-296
  Ruifeng Li; Yin Zhang; Haihan Yu; Xiaojun Wang; Jiangqin Wu; Baogang Wei
A book recommender system is very useful for a digital library. Good book recommender systems can effectively help users find interesting and relevant books among massive resources, by providing an individual book recommendation list for each end-user. By now, a variety of collaborative filtering algorithms have been invented, and they form the core of most recommender systems. However, because of the explosion of information, especially on the Internet, improving the efficiency of collaborative filtering (CF) algorithms becomes more and more important. In this paper, we first propose a parallel Top-N recommendation algorithm in CUDA (Compute Unified Device Architecture) which combines collaborative filtering and a trust-based approach to deal with the cold-start user problem. Then, based on this algorithm, we present a parallel book recommender system on a GPU (Graphics Processing Unit) for the CADAL digital library platform. Our experimental results show that our algorithm processes large-scale datasets efficiently with good accuracy, and we report the impact of different parameter values on the recommendation performance.
A source independent framework for research paper recommendation BIBAFull-Text 297-306
  Cristiano Nascimento; Alberto H. F. Laender; Altigran S. da Silva; Marcos André Gonçalves
As the number of research papers available on the Web has increased enormously over the years, paper recommender systems have been proposed to help researchers automatically find works of interest. The main problem with current approaches is that they assume that recommending algorithms are provided with a rich set of evidence (e.g., document collections, citations, profiles) which is normally not widely available. In this paper we propose a novel source independent framework for research paper recommendation. The framework requires as input only a single research paper and generates several potential queries by using terms in that paper, which are then submitted to existing Web information sources that hold research papers. Once a set of candidate papers for recommendation is generated, the framework applies content-based recommending algorithms to rank the candidates in order to recommend the ones most related to the input paper. This is done by using only publicly available metadata (i.e., title and abstract). We evaluate our proposed framework by performing an extensive experimental study in which we analyzed several strategies for query generation and several ranking strategies for paper recommendation. Our results show that good recommendations can be obtained with simple and low cost strategies.
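A minimal sketch of the two stages, query generation from a paper's terms and content-based ranking of candidate metadata; the term-pair query strategy, the placeholder candidates and the tf-idf ranking are assumptions, not necessarily the strategies evaluated in the paper:

```python
# Stage 1: build candidate queries from the input paper's title terms.
# Stage 2: rank candidate papers by cosine similarity of their public
# metadata (title + abstract) to the input paper.
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def generate_queries(title, n_terms=2):
    terms = [t for t in title.lower().split() if len(t) > 3]
    return [" ".join(pair) for pair in combinations(terms, n_terms)]

def rank(input_text, candidates):
    vec = TfidfVectorizer(stop_words="english")
    m = vec.fit_transform([input_text] + [c["title"] + " " + c["abstract"]
                                          for c in candidates])
    sims = cosine_similarity(m[0], m[1:]).ravel()
    return sorted(zip(candidates, sims), key=lambda x: -x[1])

print(generate_queries("source independent framework for paper recommendation"))
cands = [{"title": "recommending research papers",
          "abstract": "content based ranking of candidate papers"},
         {"title": "image segmentation in videos",
          "abstract": "pixels edges and regions"}]
print(rank("research paper recommendation from title and abstract", cands)[0][0]["title"])
```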
Serendipitous recommendation for scholarly papers considering relations among researchers BIBAFull-Text 307-310
  Kazunari Sugiyama; Min-Yen Kan
Serendipity occurs when one finds an interesting discovery while searching for something else. While search engines seek to report work relevant to a targeted query, recommendation engines are particularly well-suited for serendipitous recommendations as such processes do not need to fulfill a targeted query. Junior researchers can use such an engine to broaden their horizon and learn new areas, while senior researchers can discover interdisciplinary frontiers to apply integrative research. We adapt a state-of-the-art scholarly paper recommendation system's user profile construction to make use of information drawn from 1) dissimilar users and 2) co-authors to specifically target serendipitous recommendation.
Product review summarization from a deeper perspective BIBAFull-Text 311-314
  Duy Khang Ly; Kazunari Sugiyama; Ziheng Lin; Min-Yen Kan
With product reviews growing in depth and becoming more numerous, it is a growing challenge to acquire a comprehensive understanding of their contents, for both customers and product manufacturers. We built a system that automatically summarizes a large collection of product reviews to generate a concise summary. Importantly, our system not only extracts the review sentiments but also the underlying justification for those opinions. We solve this problem through a novel application of clustering and validate our approach through an empirical study, obtaining good performance as judged by F-measure (the harmonic mean of purity and inverse purity).
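For reference, the reported metric, the harmonic mean of purity and inverse purity, can be computed as sketched below on a toy clustering (the review groups are invented):

```python
# Purity, inverse purity, and their harmonic mean (F) for a clustering of
# review sentences against gold opinion groups.
from collections import Counter

def purity(clusters, gold):
    n = sum(len(c) for c in clusters)
    return sum(max(Counter(gold[i] for i in c).values()) for c in clusters) / n

def f_measure(clusters, gold):
    # inverse purity = purity of the gold classes with respect to the clusters
    classes = {}
    for cid, c in enumerate(clusters):
        for i in c:
            classes.setdefault(gold[i], []).append(cid)
    inv = sum(max(Counter(v).values()) for v in classes.values()) / len(gold)
    p = purity(clusters, gold)
    return 2 * p * inv / (p + inv)

gold = {0: "battery", 1: "battery", 2: "screen", 3: "screen", 4: "battery"}
clusters = [[0, 1, 4], [2, 3]]
print(f_measure(clusters, gold))   # 1.0 for this perfect toy clustering
```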

Can domain practice Inform better digital library query interaction?

Do graphical search interfaces support effective search for and evaluation of digital library resources BIBAFull-Text 315-324
  Kirsten R. Butcher; Sarah Davies; Ashley Crockett; Aaron Dewald; Robert Zheng
This paper explores the cognitive processes and online behaviors in which preservice teachers engage when seeking educational resources for classroom instruction. Participants used graphical and keyword search interfaces provided by a large-scale digital library (NSDL.org) and a keyword search interface from a large, commercial search engine (Google.com) to complete searches for online materials that would support classroom instruction. Overall, findings from the current work indicate that a graphical search interface can support comprehension by providing a conceptual organization of domain content during digital search and evaluation. Findings also show that digital libraries allow users to offload processing related to resource trustworthiness, thereby increasing cognitive capacity for other purposes.
Taking chemistry to the task: personalized queries for chemical digital libraries BIBAFull-Text 325-334
  Sascha Tönnies; Benjamin Köhncke; Wolf-Tilo Balke
Nowadays, information access is conducted almost exclusively via the Web. Simple keyword based Web search engines, e.g. Google or Yahoo!, offer suitable retrieval and ranking features. In contrast, for highly specialized domains, represented by digital libraries, these features are insufficient. Consider the domain of chemistry, where searching for relevant literature is essentially centered on chemical entities. Besides commercial information providers such as Chemical Abstract Service (CAS), numerous groups are working on building free chemical search engines to overcome the expensive access to chemical literature. However, due to the nature of chemical queries, these are often overspecialized, and meaningful similarity measures for chemical entities are needed for query relaxation. In chemistry, more than 40 similarity measures are available, each focusing on different aspects of chemical entities. This variety is natural, because the desired search results depend strongly on the chemist's field of work. In this paper we present a personalized retrieval system for chemical documents taking into account the background knowledge of the individual chemist. This is done by a query relaxation for chemical entities using similar substances. We evaluate our approach extensively by analyzing the correlation of commonly used chemical similarity measures and fingerprint representations. All uncorrelated measures are finally used by our feedback engine to learn preferred similarity measures for each user. We also conducted a user study with domain experts showing that our system can assign a unique similarity measure for 75% of the users after only 10 feedback cycles.
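One of the many chemical similarity measures alluded to above is the Tanimoto coefficient on binary fingerprints, sketched here with invented bit vectors; real fingerprints would come from a cheminformatics toolkit:

```python
# Tanimoto (Jaccard on set bits) similarity between two binary fingerprints,
# the kind of measure a query-relaxation step could use to find similar substances.
def tanimoto(fp_a, fp_b):
    a = {i for i, bit in enumerate(fp_a) if bit}
    b = {i for i, bit in enumerate(fp_b) if bit}
    return len(a & b) / len(a | b)

substance_x = [1, 0, 1, 1, 0, 0, 1, 0]   # invented fingerprints
substance_y = [1, 0, 1, 0, 0, 0, 1, 1]
print(round(tanimoto(substance_x, substance_y), 2))   # 0.6
```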
Physics pathway: a digital library filled with synthetic interviews BIBAFull-Text 335-338
  Michael G. Christel; Scott M. Stevens; Dean Zollman
Physics Pathway is a digital library available through an Adobe Flash portal whose contents are a series of interviews with four experts who answer questions about the pedagogy of teaching physics. The answers were collected over a broad time span, but are presented to the user as if he or she is conducting the personal interview, similar to naturally conversing with the expert. This "synthetic interview" style is discussed in this paper, with a mixed methods evaluation by 19 high school teachers who used Physics Pathway for a 14-week period at the end of 2010. The evaluation with teachers showed that the synthetic interviews validated or reinforced their ideas on their course materials and delivery. As these are teachers who are relatively new to physics instruction, confirmation that they are teaching well is important. Physics Pathway is linked with comPADRE, a member of the National Science Digital Library.

Methods improving information extracted from collections

A metadata geoparsing system for place name recognition and resolution in metadata records BIBAFull-Text 339-348
  Nuno Freire; José Borbinha; Pável Calado; Bruno Martins
This paper describes an approach for recognition and resolution of place names mentioned in the descriptive metadata records of typical digital libraries. Our approach exploits evidence provided by the existing structured attributes within the metadata records to support place name recognition and resolution, in order to achieve better results than by using only lexical evidence from the textual values of these attributes. In metadata records, lexical evidence is very often insufficient for this task, since short sentences and simple expressions predominate. Our implementation uses a dictionary-based technique for recognition of place names (with names provided by Geonames), and machine learning for reasoning over the evidence and choosing a resolution candidate. The evaluation of our approach was performed on data sets with a metadata schema rich in Dublin Core elements. Two evaluation methods were used. First, we used cross-validation, which showed that our solution achieves a very high precision of 0.99 at 0.55 recall, or a recall of 0.79 at 0.86 precision. Second, we used a comparative evaluation with an existing commercial service, where our solution performed better at every confidence level (p<0.001).
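A minimal Python sketch of the two stages described above, with a toy Geonames-style gazetteer and a hand-written evidence score standing in for the learned resolution model; the entries, metadata fields, and weights are illustrative assumptions.
  import math

  # Toy gazetteer: place name -> candidate resolutions (id, country code, population).
  GAZETTEER = {
      "paris": [("geo:2988507", "FR", 2138551), ("geo:4717560", "US", 25171)],
  }

  def recognize(text):
      """Dictionary-based recognition: tokens that match a gazetteer name."""
      return [tok for tok in text.lower().replace(",", " ").split() if tok in GAZETTEER]

  def resolve(name, record):
      """Score candidates with structured evidence (here a dc:coverage country code
      plus a population prior); a trained classifier would replace this rule."""
      coverage = record.get("coverage", "").upper()
      scored = [(0.7 * (country == coverage) + 0.3 * math.log10(pop) / 7.0, (pid, country, pop))
                for pid, country, pop in GAZETTEER[name]]
      return max(scored)[1]

  record = {"title": "Street scenes of Paris", "coverage": "FR"}
  for name in recognize(record["title"]):
      print(name, "->", resolve(name, record))  # paris -> ('geo:2988507', 'FR', 2138551)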
Event detection with spatial latent Dirichlet allocation BIBAFull-Text 349-358
  Chi-Chun Pan; Prasenjit Mitra
A large number of news articles are generated every day on the Web. Automatically identifying events from a large document collection is a challenging problem. In this paper, we propose two event detection approaches using generative models. We combine the popular LDA model with temporal segmentation and spatial clustering. In addition, we adapt an image segmentation model, SLDA, for spatial-temporal event detection on text. The results of our experiments show that both approaches outperform the traditional content-based clustering approaches on our datasets.
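A minimal scikit-learn sketch of the simpler of the two approaches (plain LDA within a time window, followed by grouping on dominant topic and coarse location); the SLDA adaptation is not shown and the articles are toy data.
  from collections import defaultdict
  from sklearn.feature_extraction.text import CountVectorizer
  from sklearn.decomposition import LatentDirichletAllocation

  articles = [
      {"text": "earthquake damage downtown buildings", "city": "Ottawa", "day": 1},
      {"text": "earthquake aftershocks downtown", "city": "Ottawa", "day": 1},
      {"text": "hockey final crowd celebration", "city": "Ottawa", "day": 1},
  ]

  window = [a for a in articles if a["day"] == 1]            # temporal segmentation
  vectorizer = CountVectorizer()
  X = vectorizer.fit_transform(a["text"] for a in window)
  lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
  dominant = lda.transform(X).argmax(axis=1)                 # dominant topic per article

  events = defaultdict(list)                                 # (topic, location) buckets
  for article, topic in zip(window, dominant):
      events[(int(topic), article["city"])].append(article["text"])
  print(dict(events))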
A new video text detection method BIBAFull-Text 359-362
  Jie Yuan; Baogang Wei; Weiming Lu; Lidong Wang
Nowadays, digital libraries contain more and more videos, and organizing and retrieving those videos effectively has become an urgent problem. Text in videos is a very meaningful clue for video semantic understanding, and it can be used for video organization and retrieval. However, existing text recognition methods cannot deal well with multilingual text or text embedded in complex backgrounds. In this paper, we propose a novel video text detection method. Edge detection and candidate region extraction are first used to obtain rough candidate text regions, and region refinement is then used to obtain the accurate location of each region. Based on our observation that the non-zero pixels of a real text region are distributed roughly uniformly in its binary image, an entropy filter is used to remove non-text regions. Experiments on various videos show that our method is effective and robust across different languages, background complexities, and font styles.
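A minimal numpy sketch of the entropy filter idea, assuming candidate regions have already been binarized; the uniformity threshold of 0.7 is illustrative, not the value used in the paper.
  import numpy as np

  def column_entropy(binary_region):
      """Entropy of the distribution of non-zero pixels over columns, normalized to [0, 1]."""
      counts = binary_region.sum(axis=0).astype(float)
      if counts.sum() == 0:
          return 0.0
      p = counts / counts.sum()
      p = p[p > 0]
      return float(-(p * np.log2(p)).sum() / np.log2(binary_region.shape[1]))

  def looks_like_text(binary_region, threshold=0.7):
      return column_entropy(binary_region) >= threshold      # 1.0 == perfectly uniform

  text_like = np.ones((8, 40), dtype=np.uint8)               # pixels spread across all columns
  blob_like = np.zeros((8, 40), dtype=np.uint8)
  blob_like[:, :4] = 1                                       # pixels bunched in one corner
  print(looks_like_text(text_like), looks_like_text(blob_like))  # True False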

Dealing with data repositories: approaches and issues to consider

Retrieval and exploratory search in multivariate research data repositories using regressional features BIBAFull-Text 363-372
  Maximilian Scherer; Jürgen Bernard; Tobias Schreck
Increasing amounts of data are collected in many areas of research and application. The degree to which this data can be accessed, retrieved, and analyzed is decisive for progress in fields such as scientific research or industrial production. We present a novel method supporting content-based retrieval and exploratory search in repositories of multivariate research data. In particular, functional dependencies are a key characteristic of data that researchers are often interested in. Our method describes the functional form of such dependencies, e.g., the relationship between inflation and unemployment in economics. The basic idea is to use feature vectors based on the goodness-of-fit of a set of regression models to describe the data mathematically. We call this approach Regressional Features and use it for content-based search and, since it motivates an intuitive definition of interestingness, for exploring the most interesting data. We apply our method to substantial real-world research datasets, showing the usefulness of our approach for user-centered access to research data in a digital library system.
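A minimal Python sketch of the regressional-features idea, assuming a fixed model set of low-degree polynomials plus a logarithmic model; the authors' actual model set may differ.
  import numpy as np

  def r_squared(y, y_hat):
      ss_res = np.sum((y - y_hat) ** 2)
      ss_tot = np.sum((y - y.mean()) ** 2)
      return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0

  def regressional_features(x, y):
      """Goodness-of-fit of a fixed model set, used as a descriptor of the (x, y) relationship."""
      feats = []
      for degree in (1, 2, 3):                               # polynomial models
          coeffs = np.polyfit(x, y, degree)
          feats.append(r_squared(y, np.polyval(coeffs, x)))
      log_coeffs = np.polyfit(np.log(x), y, 1)               # logarithmic model
      feats.append(r_squared(y, np.polyval(log_coeffs, np.log(x))))
      return np.array(feats)

  x = np.linspace(1, 10, 50)
  y = 2 * x ** 2 + np.random.default_rng(0).normal(0, 1, 50)
  print(regressional_features(x, y).round(3))                # quadratic and cubic fits dominate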
A research agenda for data curation cyberinfrastructure BIBAFull-Text 373-382
  Carl Lagoze; Karin Patzke
In 2008, the National Science Foundation released the DataNet solicitation, which presents an ambitious vision for a comprehensive data curation cyberinfrastructure in support of fourth paradigm science. The program subsequently funded two projects, DataONE and the Data Conservancy. The authors put forth an uncertainty framework for understanding the larger socio-cultural issues that influence the progress of DataNet projects and cyberinfrastructure projects in general. This framework highlights the key technical, organizational, scientific, and institutional contexts that the projects must consider as they mature.
When use cases are not useful: data practices, astronomy, and digital libraries BIBAFull-Text 383-386
  Laura Wynholds; David S., Jr. Fearon; Christine L. Borgman; Sharon Traweek
As science becomes more dependent upon digital data, the need for data curation and for data digital libraries becomes more urgent. Questions remain about what researchers consider to be their data, their criteria for selecting and trusting data, and their orientation to data challenges. This paper reports findings from the first 18 months of research on astronomy data practices from the Data Conservancy. Initial findings suggest that issues for data production, use, preservation, and sharing revolve around factors that are rarely accommodated in use cases for digital library system design, including trust in data, funding structures, communication channels, and perceptions of scientific value.

Poster session

An exploration of pattern-based subtopic modeling for search result diversification BIBAFull-Text 387-388
  Wei Zheng; Xuanhui Wang; Hui Fang; Hong Cheng
Traditional information retrieval models do not necessarily provide users with an optimal search experience, because the top-ranked documents may contain the same piece of relevant information, i.e., the same subtopic of a query. The goal of search result diversification is to return search results that are not only relevant to the query but also cover different subtopics. Subtopic modeling is therefore an important research topic in search result diversification. In this paper, we propose a novel pattern-based method to extract subtopics from retrieved documents. The basic idea is to explicitly model a query subtopic as a semantically meaningful text unit in relevant documents. We apply a frequent pattern mining algorithm to efficiently extract these text units (patterns) from retrieved documents. We then model a query subtopic with a single pattern and rank subtopics by their similarity to the query. These pattern-based subtopics are then used to diversify search results.
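A minimal Python sketch of the pattern idea, using frequent term pairs with a support threshold as a stand-in for a full frequent pattern miner; the documents and the Jaccard-based ranking are illustrative.
  from collections import Counter
  from itertools import combinations

  def frequent_patterns(docs, min_support=2, size=2):
      """Term sets (here: pairs) that occur in at least min_support retrieved documents."""
      counts = Counter()
      for doc in docs:
          terms = sorted(set(doc.lower().split()))
          counts.update(combinations(terms, size))
      return [set(p) for p, c in counts.items() if c >= min_support]

  def rank_subtopics(patterns, query):
      q = set(query.lower().split())
      return sorted(patterns, key=lambda p: len(p & q) / len(p | q), reverse=True)

  docs = ["apple iphone battery life", "iphone battery drain fix",
          "apple pie recipe", "easy apple pie dessert"]
  print(rank_subtopics(frequent_patterns(docs), "apple"))
  # [{'apple', 'pie'}, {'battery', 'iphone'}] -- candidate subtopics for diversification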
A very efficient approach to news title and content extraction on the web BIBAFull-Text 389-390
  Hualiang Yan; Jianwu Yang
We consider the problem of efficient and template-independent news extraction on the Web. Popular news extraction methods are based on visual information and can achieve good accuracy, but their computational efficiency is poor because rendering a web page to obtain visual information is very time-consuming. In this paper we propose an efficient and effective news extraction approach based on novel features. Our approach requires neither training nor visual information, so it is simple and very efficient, and it can extract news information from various news sites without using templates. In our experiments, the proposed approach achieves 99% accuracy over 5,671 news pages from 20 different news sites, and it is much faster than a baseline machine learning method that uses visual information.
Web video search by mutual boosting between the inside and outside text of video BIBAFull-Text 391-392
  Yuxin Peng; Zhiguo Yang; Jian Yi
This paper proposes a new idea and approach for web video search. Instead of using only the surrounding text of a video in its webpage, our approach mutually boosts and jointly utilizes the inside and outside text of a video to support video-based and frame-based search. Here, the inside text is the video caption, while the outside text is the surrounding text of the video in the webpage. In our view, the caption text, although it contains some wrong characters and words caused by automatic caption recognition, is a useful indicator of the video's content; the relevant surrounding text, although difficult to locate and confirm, contains correct characters and words that usually indicate the video content, especially the video's title and introduction. We integrate their advantages and alleviate their disadvantages through mutual boosting: the inside text is employed to confirm the relevance of the outside text, and the relevant outside text is utilized to correct the inside text. Mutual boosting not only enhances query-by-text video search but also supports query-by-text frame search with the corrected captions. Our approach has three phases: first, we propose a new approach for automatic caption detection and extraction; then, we extract the surrounding text candidates of the video; finally, mutual boosting is employed to obtain the relevant and accurate text of the web video. Experiments show that the proposed approach achieves good performance.
Visual interfaces for stimulating exploratory search BIBAFull-Text 393-394
  Ralf Krestel; Gianluca Demartini; Eelco Herder
Exploration is an activity that people undertake to broaden their knowledge of a certain topic. In contrast to regular search, which is typically aimed at obtaining a specific answer to a specific question, exploratory search should give a more complete overview of a topic. Further, it should enable the discovery of related aspects, such as people, places, and times. Exploration demands more time, effort, and creativity from the user, but rewards the user with deeper knowledge. Therefore, users need to be stimulated to bring exploration into regular goal-directed search activities. In this paper we present a user study in which we investigate different kinds of exploratory behavior and goals, as well as different kinds of visualizations to support exploration.
Designing interconnected distributed resources for collaborative inquiry based science education BIBAFull-Text 395-396
  Anne Adams; Tim Coughlan; John Lea; Yvonne Rogers; Sarah Davies; Trevor Collins
This paper describes the design and evaluation of a distributed information resource system (IRS) shared between field and laboratory settings for higher education geology students. An investigation of geo-science scholarship and technical pilot studies highlighted the importance of situation-specific and distributed information usage. To advance our understanding of novel resource approaches (i.e., from tabletops to tablets) and collaborative learning, two in-depth field trials evaluated 21 students' information journeys (i.e., initiating information needs, facilitating information, and collaborative interpretation). Analysis identified how designing for a varied device ecology supported information filtering and empathy between locations, provoking deeper reflection and abstract understanding in the field, while live collaborative remote interaction provided an engaging yet distinct learning experience for those in the laboratory.
Retrieving attributes using web tables BIBAFull-Text 397-398
  Arlind Kopliku; Karen Pinel-Sauvagnat; Mohand Boughanem
In this paper we propose an attribute retrieval approach that extracts and ranks attributes from Web tables. We combine simple heuristics to filter out improbable attributes, and we rank attributes based on frequencies and a table match score. Ranking is reinforced with external evidence from Web search, DBpedia, and Wikipedia. Our approach can be applied to any instance (e.g., Canada) to retrieve its attributes (capital, GDP). We show that it has much higher recall than DBpedia and Wikipedia and that it works better than lexico-syntactic rules for the same purpose.
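A minimal Python sketch of the frequency-plus-match-score ranking, assuming table captions and headers have already been extracted; the match score is a crude stand-in, and the external evidence from Web search, DBpedia, and Wikipedia is omitted.
  from collections import Counter

  # Tables already extracted as (caption, header row); table extraction itself is out of scope.
  TABLES = [
      ("Canada facts", ["country", "capital", "gdp", "population"]),
      ("G8 members", ["country", "capital", "currency"]),
      ("Canada 2010 census", ["province", "population", "area"]),
  ]

  def table_match(instance, caption, headers):
      """Crude match score: does the instance appear in the caption or headers?"""
      text = (caption + " " + " ".join(headers)).lower()
      return 1.0 if instance.lower() in text else 0.3

  def retrieve_attributes(instance, top_k=5):
      scores = Counter()
      for caption, headers in TABLES:
          weight = table_match(instance, caption, headers)
          for header in headers:
              scores[header] += weight       # frequency weighted by table match
      return scores.most_common(top_k)

  print(retrieve_attributes("Canada"))       # population, country, capital, ... rank highest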
Towards a model of the e-science data environment BIBAFull-Text 399-400
  Stacy T. Kowalczyk
Building digital libraries that preserve scientific digital data and ensure its continued accessibility has emerged as a major initiative for funding agencies and academic institutions. Understanding the environments in which data is created, quality is assessed, and data is managed is a necessary antecedent to developing appropriate technologies to support the preservation of and ongoing access to data. This paper reports on a study of 11 laboratories and research centers at three U.S. universities. Using the grounded theory methodology, this paper develops a new, generalized view of the e-Science data environment that attempts to explain how data is created, managed, documented, and preserved.
Social reference: aggregating online usage of scientific literature in CiteULike for clustering academic resources BIBAFull-Text 401-402
  Jiepu Jiang; Daqing He; Chaoqun Ni
Citation-based methods have been widely studied and employed for clustering academic resources and mapping science. Although effective, these methods suffer from citation delay. In this study, we extend reference and citation analysis to a broader notion from a social perspective. We coin the term "social reference" to refer to the referencing of literature in social academic web environments. We propose clustering methods using social reference information from CiteULike, and we experiment with journal clustering and author clustering, comparing against citation-based methods. Our experiments indicate, first, that social references imply connections among the literature that are as effective as citations for clustering academic resources; second, that in practical settings social reference-based clustering is less effective than citation-based clustering due to the sparseness of social reference data, but it can outperform citation-based methods in clustering new resources that have few citations.
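A minimal scikit-learn sketch of clustering journals by the users who bookmark them, standing in for the authors' CiteULike-based social reference data; journals, users, and the clustering choice are illustrative.
  import numpy as np
  from sklearn.cluster import AgglomerativeClustering

  # Toy "social references": which users bookmarked papers from each journal.
  journals = ["JASIST", "IP&M", "Cell", "Nature"]
  bookmarks = {"JASIST": {1, 2, 3}, "IP&M": {2, 3, 4}, "Cell": {7, 8}, "Nature": {7, 8, 9}}

  users = sorted(set().union(*bookmarks.values()))
  X = np.array([[1 if u in bookmarks[j] else 0 for u in users] for j in journals])

  labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
  print(dict(zip(journals, labels)))         # information-science vs. biology journals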
Digital archives of Taiwan agricultural history during the Japanese colonial period BIBAFull-Text 403-404
  Li-Ping Chen; Chunsheng Huang; Yi-Hui Chang
The National Chung Hsing University Library has built a digital archive of Taiwan agricultural history. The content is unique in that it covers historical materials about Taiwan during the Japanese colonial period from 1895 to 1945, and in that the materials are all available in full text, in addition to metadata. To make these materials more accessible to the research and education community, a user-centered retrieval system incorporating multi-language support, subject browsing, cluster analysis, topic maps, and post-query classification was developed to help users find the inter-relationships among documents and the collective meaning of a sub-collection. Such a system is anticipated to help advance research in Taiwan agricultural history and to serve as a model for similar endeavors.
Developing a concept extraction technique with ensemble pathway BIBAFull-Text 405-406
  Prat Tanapaisankit; Min Song; Edward A. Fox
In this paper, we describe our Concept Extraction technique for Educational Digital libraries (CEED) which applies Conditional Random Fields (CRFs) to extract concepts from the Ensemble Pathway collection. In addition, we discuss how we implement RESTful APIs for concept extraction.
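A minimal sketch of CRF-based sequence labeling with the sklearn-crfsuite package; the features, labels, and single training sentence are illustrative and do not reflect the CEED configuration.
  import sklearn_crfsuite  # pip install sklearn-crfsuite

  def token_features(sent, i):
      w = sent[i]
      return {"lower": w.lower(), "is_title": w.istitle(), "suffix3": w[-3:],
              "prev": sent[i - 1].lower() if i else "<s>"}

  train_sents = [["Conditional", "Random", "Fields", "label", "sequences"]]
  train_labels = [["B-CONCEPT", "I-CONCEPT", "I-CONCEPT", "O", "O"]]

  X_train = [[token_features(s, i) for i in range(len(s))] for s in train_sents]
  crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
  crf.fit(X_train, train_labels)

  test = ["Hidden", "Markov", "Models", "also", "label", "sequences"]
  print(crf.predict([[token_features(test, i) for i in range(len(test))]]))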
SCOR/IODE/MBLWHOI library collaboration on data publication BIBAFull-Text 407-408
  Lisa Raymond; Linda Pikula; Roy Lowry; Ed Urban; Gwenaëlle Moncoiffé; Peter Pissierssens; Cathy Norton
This poster describes the development of international standards to publish oceanographic datasets. Research areas include the assignment of persistent identifiers, tracking provenance, linking datasets to publications, attributing credit to data providers, and best practices for the physical composition and semantic description of the content.
A content analysis of institutional data policies BIBAFull-Text 409-410
  Kayleigh Ayn Bohémier; Thea Atwood; Andreas Kuehn; Jian Qin
The newly issued requirement for a data management plan in proposals submitted to the U.S. National Science Foundation and other federal funding agencies has prompted many institutions to develop their own policies, both to conform to this new requirement and to more effectively manage, share, publish, and provide access to research data. While the need for guidelines or a framework for developing such data policies is pressing, research in this area is lacking. The study reported here addresses this need through a content analysis of 58 policy documents from 20 institutions. Our preliminary findings reveal an uneven distribution of data policies among the institutions and disciplines included in this study. We are currently analyzing our results.
Are learned topics more useful than subject headings BIBAFull-Text 411-412
  Youn Noh; Katrina Hagedorn; David Newman
Topic models, through their ability to automatically learn and assign topics to documents in a collection, have the potential to greatly improve how content is organized and searched in digital libraries. However, much remains to be done to assess the value of topic models in digital library applications. In this work, we present results from a user study, in which participants evaluated the similarity of books clustered using matched topics and Library of Congress Subject Headings (LCSH). Topics outperformed LCSH in 11 cases; LCSH outperformed topics in 4. These results suggest that topics are a viable alternative to LCSH.
Detecting academic papers on the web BIBAFull-Text 413-414
  Emi Ishita; Teru Agata; Atsushi Ikeuchi; Miyata Yosuke; Shuichi Ueda
Our research goal is to develop a search engine for open access to academic papers. English and Japanese test sets were built for detection of academic papers from 20,000 PDF files in each language using five annotators. Six classifiers were trained using similar features for each language. We report F1 of 0.74 for English and 0.54 for Japanese and argue that similar features could easily be generated for other languages as well.
Improving scalability by self-archiving BIBAFull-Text 415-416
  Zhiwu Xie; Jinyang Liu; Herbert Van de Sompel; Johann van Reenen; Ramiro Jordan
The newer generation of web browsers supports client-side databases, making it possible to run full web application stacks entirely in the web client. Still, the server-side database is indispensable as the central hub for exchanging persistent data between web clients. Assuming this characterization, we propose a novel web application framework in which the server archives its database states at predefined intervals and then makes the archives available on the web. The clients then use these archives to synchronize their local databases. Although the main purpose is to reduce the database scalability bottleneck, this approach also promotes self-archiving and can be used for time traveling. We discuss the consistency properties provided by this framework, as well as the tradeoffs imposed.
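A toy Python simulation of the core loop described above: the server publishes numbered archives at a fixed period and each client replays only the archives it has not yet applied. This is a sketch of the idea under simplifying assumptions (append-only writes, whole-batch archives), not the authors' implementation.
  class ArchivingServer:
      """Publishes an immutable archive of accumulated writes every `period` writes."""
      def __init__(self, period=2):
          self.period, self.pending, self.archives = period, [], []

      def write(self, record):
          self.pending.append(record)
          if len(self.pending) >= self.period:
              self.archives.append(list(self.pending))   # expose batch as a numbered archive
              self.pending = []

  class Client:
      def __init__(self):
          self.local_db, self.version = [], 0

      def sync(self, server):
          for batch in server.archives[self.version:]:   # fetch only unseen archives
              self.local_db.extend(batch)
          self.version = len(server.archives)

  server, client = ArchivingServer(period=2), Client()
  for record in ["r1", "r2", "r3", "r4", "r5"]:
      server.write(record)
  client.sync(server)
  print(client.local_db)   # ['r1', 'r2', 'r3', 'r4'] -- 'r5' is not yet archived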
An analysis of personal collections among users of social media BIBAFull-Text 417-418
  Paul Logasa, II Bogen; Frank Shipman; Richard Furuta
We have been developing a system to support the management of collections of web-based resources, called the Distributed Collection Manager (DCM). As work on DCM has progressed, questions about the characteristics of people's collections of web pages have arisen. At the same time, work in the area of social media technology has largely ignored how people try to maintain their collections. To address these concerns, we performed an online user study of 125 individuals from a variety of online and offline communities. From this study we were able to examine the needs for a system to manage web-based distributed collections, how current tools affect maintenance, and the characteristics of current practices and problems in maintaining web-based collections.
WPv4: a re-imagined Walden's paths to support diverse user communities BIBAFull-Text 419-420
  Paul Logasa, II Bogen; Daniel Pogue; Faryaneh Poursardar; Yuangling Li; Richard Furuta; Frank Shipman
The Walden's Paths Project, as part of our philosophy of continual evaluation, seeks out user communities who may find our tool useful. However, our users, in the last few years, have reported a series of common issues and desired features. In order to support our users, we initiated a redesign of Walden's Paths to solve these problems and enable us to rapidly prototype and experiment with features and interfaces. In order to accomplish these goals, we have created a web service that handles the storage and representation of our Path data structure. This service is isolated from user interface layers, allowing multiple interface designs to be implemented on top of the same Path data structures. Our prototype interfaces also represent new areas for Paths such as collaborative work, offline presentation, and mobile computing.
Is tagging multilingual?: a case study with BibSonomy BIBAFull-Text 421-422
  Juliane Stiller; Maria Gäde; Vivien Petras
This paper investigates the occurrence of tags in different languages in a collaborative bookmarking and publication sharing service -- BibSonomy. Social tags assigned to URLs in multiple languages and users tagging these URLs multilingually are the main focus of this study. The results show that multilingual tags occur for the same URL and that users tag in different languages. Furthermore, the results give indications that the language of the content of a URL does not imply that its tags are in the same language.
Creating meta-indexes for digital domains BIBAFull-Text 423-424
  Michael Huggett; Edie Rasmussen
The back-of-book indexes for test collections of digital books in the domains of Economics and Geology have been deconstructed and analyzed, and the entries aggregated to create domain-level meta-indexes. Metrics comparing the two domains are presented.
Analytic potential of data: assessing reuse value BIBAFull-Text 425-426
  Carole L. Palmer; Nicholas M. Weber; Melissa H. Cragin
Realizing the vision of networked data collections and services requires large bodies of scientific data that can be used in new ways. Adapting the concept of epistemological potential, we illustrate an approach for assessing the value of data for reuse in new domains. Two criteria for this analytic potential -- integrity and fit-for-purpose -- are recognized aspects of data curation; however, identifying potential domains of interest for reuse requires knowledge of practices and needs across disciplines. Evaluating analytic potential will become increasingly important for libraries and repositories to make informed decisions about the recruitment and curation of data for interdisciplinary science.
Building a research social network from an individual perspective BIBAFull-Text 427-428
  Alberto H. F. Laender; Mirella M. Moro; Marcos André Gonçalves; Clodoveu A., Jr. Davis; Altigran S. da Silva; Allan J. C. Silva; Carolina A. S. Bigonha; Daniel Hasan Dalip; Eduardo M. Barbosa; Eli Cortez; Peterson S., Jr. Procópio; Rafael Odon de Alencar; Thiago N. C. Cardoso; Thiago Salles
In this poster paper, we present an overview of CiênciaBrasil, a research social network involving researchers within the Brazilian INCT program. We describe its architecture and the solutions adopted for data collection, extraction, and deduplication, and for materializing and visualizing the network.
Designing map-based visualizations for collection understanding BIBAFull-Text 429-430
  Olga (Olha) Buchel
This paper describes a conceptualization of the collection understanding task and its implementation in a map-based visualization (MBV) prototype that represents a library collection. Unlike previous conceptualizations that treat a collection as a whole composed of documents, our conceptualization is grounded in the widely researched concepts of "collection" and "understanding."
How children find books for leisure reading: implications for the digital library BIBAFull-Text 431-432
  Sally Jo Cunningham
Finding a good book can be difficult, particularly for young readers. This paper adds to our understanding of how children select books for recreational reading by exploring the 'native' strategies (both successful and ineffective) that children employ in bookstores and libraries.
Repurposing data across disciplines: a study of data reuse issues between climate science and social science BIBAFull-Text 433-434
  Lynne K. Davis; Peter Alston; John D'Ignazio
Repurposing of data raises a number of issues for use across the disciplines of climate science and social science. The issues we present are results of the work we are currently carrying out as part of the Data Conservancy Project, which aims to preserve data for long-term known and unanticipated use over time, across disciplines, and over a variety of spatial, temporal, and organizational scales.
Improving simulation management systems through ontology generation and utilization BIBAFull-Text 435-436
  Jonathan Leidig; Edward A. Fox; Kevin Hall; Madhav Marathe; Henning Mortveit
Content from simulation systems is useful in defining domain ontologies. We describe a digital library process to generate and leverage domain ontologies to support simulation systems tasks. Workflow ontologies may be used to define compositions of simulation-related services. Simulation model ontologies may be used in customizing collection management systems for tasks such as organization, interface construction, and metadata record generation.
CTRnet DL for disaster information services BIBAFull-Text 437-438
  Seungwon Yang; Andrea Kavanaugh; Nádia P. Kozievitch; Lin Tzy Li; Venkat Srinivasan; Steven D. Sheetz; Travis Whalen; Donald Shoemaker; Ricardo da S Torres; Edward A. Fox
We describe our work in collecting, analyzing, and visualizing online information (e.g., Web documents, images, tweets) that is to be maintained by the Crisis, Tragedy and Recovery Network (CTRnet) digital library. We have been collecting resources about disaster events, as well as campus and other major shooting events, in collaboration with the Internet Archive (IA). Social media data (e.g., tweets, Facebook data) also have been collected and analyzed. Analyzed results are visualized using graphs and tag clouds. Exploratory content-based image retrieval has been applied to one of our image collections. We explain our CTR ontology development methodology and our collaboration with Arlington County, VA and IBM in a project funded by the Center for Community Security and Resilience.
Representing educational content in digital library resources BIBAFull-Text 439-440
  Kirsten R. Butcher; Ashley Crockett; Sarah Davies
A rubric for representing the educational content of digital resources was developed and tested in an experiment with preservice teachers. The ADMIRE (Analyzing Digital Materials In Resources for Education) rubric codes 11 types of digital content organized into five major categories; codes and categories are drawn from learning science and instructional research. The ADMIRE rubric was used to analyze the types of content present in digital resources that preservice teachers accepted or rejected for classroom use during instructional planning. Results show that the ADMIRE rubric provides a useful method to understand teachers' success in online search and the types of educational content that they value during instructional planning.
Units of evidence for analyzing subdisciplinary difference in data practice studies BIBAFull-Text 441-442
  Melissa H. Cragin; Tiffany C. Chao; Carole L. Palmer
Digital libraries (DLs) are adapting to accommodate research data and related services. The complexities of this new content span the elements of DL development, and there are questions concerning data selection, service development, and how best to align these with local, institutional initiatives for cyberinfrastructure, data-intensive research, and data stewardship. Small science disciplines are of particular relevance due to the prevalence of this mode of research in the academy and the anticipated magnitude of data production. To support data acquisition into DLs -- and subsequent data reuse -- there is a need for new knowledge on the range and complexities inherent in practice-data-curation arrangements for small science research. We present a flexible methodological approach crafted to generate data units to analyze these relationships and facilitate cross-disciplinary comparisons.
E-informing the public: communicative intents in the production of online government information BIBAFull-Text 443-444
  Luanne Freund; Justyna Berzowska; Leah Hopton
Governments produce vast amounts of electronic information geared for the public, but research points to a mismatch between the communicative intents of the government and the information needs of the public. Initial results of semi-structured interviews with government content creators suggest that learning more about why and how government information is produced may lead to the establishment of greater common ground.
Using a hidden Markov model to transcribe handwritten bushman texts BIBAFull-Text 445-446
  Kyle Williams; Hussein Suleman
The Bushman texts in the Bleek and Lloyd Collection contain complex diacritics that make automatic transcription difficult. Transcriptions of these texts would allow enhanced digital library services to be created for interacting with the collection. In this study, an investigation into automatic transcription of the Bushman texts was performed using the popular approach of Hidden Markov Model-based text line recognition. The results show that while this technique may be well suited to well-constrained and well-understood scripts, its application to more complex scripts introduces a number of difficulties that need to be overcome.
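For illustration of the HMM recognition step, here is a minimal numpy Viterbi decoder over discrete observations; real text line recognition works on image feature vectors and per-character models, which are not shown, and all probabilities here are made up.
  import numpy as np

  def viterbi(obs, start_p, trans_p, emit_p):
      """Most likely hidden state sequence for a discrete observation sequence."""
      n_states = len(start_p)
      V = np.full((len(obs), n_states), -np.inf)
      back = np.zeros((len(obs), n_states), dtype=int)
      V[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
      for t in range(1, len(obs)):
          for s in range(n_states):
              scores = V[t - 1] + np.log(trans_p[:, s])
              back[t, s] = scores.argmax()
              V[t, s] = scores.max() + np.log(emit_p[s, obs[t]])
      path = [int(V[-1].argmax())]
      for t in range(len(obs) - 1, 0, -1):
          path.append(int(back[t, path[-1]]))
      return path[::-1]

  # Two hidden "glyph" states; three quantized stroke features as observation symbols.
  start = np.array([0.6, 0.4])
  trans = np.array([[0.7, 0.3], [0.4, 0.6]])
  emit = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
  print(viterbi([0, 1, 2, 2], start, trans, emit))   # [0, 0, 1, 1]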
Connecting research data and indigenous communities BIBAFull-Text 447-448
  Elizabeth Mulhollann; Kirsten Thorpe; Gabrielle Gardiner
This poster demonstrates the program of consultation and associated technical workflow developed by the Aboriginal and Torres Strait Islander Data Archive (ATSIDA) to support the digital return of research data to Indigenous Australian communities, while also facilitating data preservation and reuse in the research community and by the general public.
RFV: interactive geographical visualization for citation network exploration BIBAFull-Text 449-450
  Christopher Aikens; George Lucchese; Patrick Webster; Andruid Kerne
Research Field Visualizer (RFV) is a system designed to visualize citation chains for publications within specific disciplines. RFV overlays publication information on geographical displays to aid in locating dense areas of research based on different search criteria, thus exposing a network of interconnected ideas across the world. Within this geographical context, RFV provides an interactive visualization of citation networks, allowing the user to explore chains of publications. Our intention is to help people who want to learn about a field, such as prospective graduate students, understand current research directions, track research history, and ultimately find the places where authors are conducting relevant and interesting research.

Demonstration session

Liquid benchmarks: benchmarking-as-a-service BIBAFull-Text 451-452
  Sherif Sakr; Fabio Casati
Experimental evaluation and comparison of techniques, algorithms, or complete systems is a crucial requirement for assessing the practical impact of research results. The quality of published experimental results is usually limited for several reasons, such as limited time, the unavailability of standard benchmarks, or a shortage of computing resources. Moreover, achieving an independent, consistent, complete, and insightful assessment of different alternatives in the same domain is a time- and resource-consuming task. We demonstrate Liquid Benchmark, a cloud-based service that provides collaborative platforms to simplify the task of peer researchers in performing high-quality experimental evaluations and to guarantee a transparent scientific crediting process. The service allows building repositories of competing research implementations, sharing testing computing platforms, collaboratively building the specifications of standard benchmarks, and allowing end-users to easily create and run testing experiments and share their results.
Exploring Wikipedia with HMpara BIBFull-Text 453-454
  David N. Milne; Ian H. Witten
FRBRPedia: a tool for FRBRizing web products and linking FRBR entities to DBpedia BIBAFull-Text 455-456
  Fabien Duchateau; Naimdjon Takhirov; Trond Aalberg
The FRBR model has received much attention due to its potential for greatly improving user interaction with digital libraries. However, the amount of information found on the Web is far larger than in digital libraries. In this demo, we present an approach to transform Web-based resources to a FRBR compatible form, a process known as FRBRization. The FRBRized collection is then linked to DBpedia, thus providing a basis for information sharing and verification.
An interactive flash website for oral histories BIBAFull-Text 457-458
  Michael G. Christel; Bryan S. Maher; Julieanna Richardson
Automatic speech alignment and natural language processing technologies provide full content search and retrieval access into oral history collections. These tools have been field-tested with The HistoryMakers and Harrisburg Living Legacy oral history archives, showing the value of an Adobe Flash front-end interface. Built with Adobe Flex 3, the interface works across browsers and operating systems, supports deep linking and browser-based navigation, provides synchronized transcripts that can be fully searched and tracked while watching the interviews, and incorporates filtering by facets, a menu bar breadcrumb interface, and a user play list to collect stories of interest. Refinements to the interface are discussed following the first six months of web deployment, with suggestions offered for other digital video libraries, particularly oral histories.
When personalization meets socialization: an iCADAL approach BIBAFull-Text 459-460
  Yin Zhang; Xiaojun Wang; Haihan Yu; Ruifeng Li; Baogang Wei; Jing Pan
CADAL is a large-scale non-profit digital library. In addition to the various search facilities in the CADAL portal, we designed and implemented the iCADAL system to provide user-oriented micro-content services for one million books in CADAL. Users of iCADAL receive a stream of short messages about lending, annotation, and other reading activities shared by the users they follow, in a Twitter-like way that combines socialization with personalization. Our implementation makes extensive use of open source software (e.g., Pylons and Cassandra) to support high-traffic online micro-content services.
Supporting creative work in educational digital libraries BIBAFull-Text 461-462
  Naimdjon Takhirov
Educational digital libraries have become an effective means of sharing information and disseminating knowledge. Like paintings, personal stories containing pictures, music, and text will exist for a long time to come and will be enjoyed long after their creation. In this demo, we present a system called Creaza Education. The system offers an engaging suite of user-friendly web-based applications in which users can exercise their imagination by creating, publishing, and sharing digital stories.
Introducing Mr. DLib: a machine-readable digital library BIBAFull-Text 463-464
  Joeran Beel; Bela Gipp; Stefan Langer; Marcel Genzmehr; Erik Wilde; Andreas Nürnberger; Jim Pitman
In this demonstration-paper we present Mr. DLib, a machine-readable digital library. Mr. DLib provides access to several millions of articles in full-text and their metadata in XML and JSON format via a RESTful Web Service. In addition, Mr. DLib provides related documents for given academic articles. The service is intended to serve researchers who need bibliographic data and full-text of scholarly literature for their analyses (e.g. impact and trend analysis); providers of academic services who need additional information to enhance their own services (e.g. literature recommendations); and providers who want to build their own services based on data from Mr. DLib.
Docear: an academic literature suite for searching, organizing and creating academic literature BIBAFull-Text 465-466
  Joeran Beel; Bela Gipp; Stefan Langer; Marcel Genzmehr
In this demonstration-paper we introduce Docear, an 'academic literature suite'. Docear offers to scientists what an office suite like Microsoft Office offers to office workers. While an office suite bundles various applications for office workers (word processing, spreadsheets, presentation software, etc.), Docear bundles several applications for scientists: academic search engine, PDF reader, reference manager, word processor, mind mapping module, and recommender system. Besides Docear's general concept, its special features are presented in this paper, namely a modular composition, free full-text access to literature, information management as mind map, automatic metadata extraction of PDFs and recommendations.
The digital library of historical cartography of the University of São Paulo BIBAFull-Text 467-468
  Rogerio Toshiaki Kondo; Maria de Lourdes Rebucci Lirani; Anderson Canale Garcia; Iris Kantor; Caetano, Jr. Traina
The Digital Library of Historical Cartography of the University of São Paulo makes available a set of high-resolution digital versions of maps printed between the 15th and 19th centuries belonging to the University's collections. Each map is available along with extensive carto-bibliographic and biographic references, as well as relevant technical, editorial, and historical information for the analysis of cartographic documents. The Digital Library was also conceived to gather data from other similar sites, constituting a useful research tool. It provides facilities to gather relevant information on the production, circulation, and appropriation of historical maps in different contexts and media.
GreenWiki: a tool to support users' assessment of the quality of Wikipedia articles BIBAFull-Text 469-470
  Daniel Hasan Dalip; Raquel Lara Santos; Diogo Rennó Oliveira; Valéria Freitas Amaral; Marcos André Gonçalves; Raquel Oliveira Prates; Raquel C. M. Minardi; Jussara Marques de Almeida
In this work, we present GreenWiki, which is a wiki with a panel of quality indicators to assist the reader of a Wikipedia article in assessing its quality.
Perambulating libraries: demonstrating how a Victorian idea can help OLPC users share books BIBAFull-Text 471-472
  David Bainbridge; Ian H. Witten
In this extended abstract we detail how the open source digital library toolkit Greenstone [4] can help users of the XO-laptop, produced by the One Laptop Per Child Foundation, manage and share electronic documents. The idea draws its inspiration from mobile libraries (bookmobiles), which first appeared in Victorian times. The implemented technique works by building on the mesh network that is instrumental to the XO-laptop approach. To use the technique, a version of Greenstone is installed on each portable XO-laptop, allowing the owner to develop and manage their own set of books. This version of Greenstone has been adapted to support a form of interoperability we have called Digital Library Talkback. On the mesh, when two XO-laptops "see" each other, the two users can search and browse each other's digital library; when they see a book they like, they can have it transferred to their own library with a single click using the Digital Library Talkback mechanism.
SNAC: the social networks and archival context project BIBAFull-Text 473-474
  Ray R. Larson
This demonstration will show the prototype access and search system for the Social Networks and Archival Context project. The system is built on a database of merged Encoded Archival Context -- Corporate Bodies, Persons, and Families (EAC-CPF) records derived from Encoded Archival Description (EAD) records held by the Library of Congress, the California Digital Library, the Northwest Digital Archives, and Virginia Heritage, combined with information from name authority files from the Library of Congress, OCLC Research, and the Getty Vocabulary Program. The database merges information from each instance of an individual name in these resources, along with variant names, biographical notes and their topical descriptions. The prototype interface makes this information searchable while retaining links to the various data sources, other resources (such as books by or about a person) and to other individuals, families and organizations associated with that name.
Synchronicity: automatically rediscover missing web pages in real time BIBAFull-Text 475-476
  Martin Klein; Moustafa Aly; Michael L. Nelson
Missing web pages (pages that return the 404 "Page Not Found" error) are part of the browsing experience. The manual use of search engines to rediscover such pages can be frustrating and unsuccessful. We introduce Synchronicity, a Mozilla Firefox add-on that supports the Internet user in (re-)discovering missing web pages in real time.
The Iowa City UNESCO City of literature digital library BIBAFull-Text 477-478
  Haowei Hsieh; Bridget Draxler; Nicole Dudley; Jim Cremer; Lauren Haldeman; Dat Nguyen; Peter Likarish; Jon Winet
Iowa City is one of only four cities worldwide designated by UNESCO as a City of Literature. To highlight the city's rich local literary history, a University of Iowa interdisciplinary research team developed a digital library featuring Iowa City authors and locations. The Iowa City UNESCO City of Literature ("City of Lit") digital library consists of a mobile application for the general public and a web-based information system for researchers/content creators.
Exploiting music structures for digital libraries BIBAFull-Text 479-480
  Andreas F. Ehmann; Mert Bay; J. Stephen Downie; Ichiro Fujinaga; David De Roure
This demonstration presents a music structure-based audio/visual interface for the navigation of very large scale music digital libraries. This work is a product of the Structural Analysis of Large Amounts of Music Information (SALAMI) project.