JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries

Fullname:Proceedings of the 2009 joint international conference on Digital libraries
Editors:Fred Heath; Mary Lynn Rice-Lively; Richard Furuta
Location:Austin, Texas
Dates:2009-Jun-15 to 2009-Jun-19
Standard No:ISBN 1-60558-322-7, 978-1-60558-322-8; ACM Order Number: 606092
  1. Session 1
  2. Session 2
  3. Session 3
  4. Session 4
  5. Session 5: best paper nominees 1
  6. Session 6: best paper nominees 2
  7. Session 7
  8. Session 8: best paper finalists
  9. Session 9
  10. Session 10
  11. Session 11
  12. Session 12
  13. Posters
  14. Demos

Session 1

Science teachers' use of online resources and the digital library for Earth system education 1-10
  Lecia J. Barker
A three-part study of teachers' use of online resources and of the Digital Library for Earth System Education (DLESE) was conducted from 2004 through summer 2006. The first two phases were qualitative and informed a survey administered to 622 science teachers across the U.S., one-fifth of whom had used DLESE. The findings present a profile of teachers and their access to Internet-connected computers and other hardware/electronic media devices in their classrooms; and teachers' preferences for resource formats (e.g., customizability) and educational web site features (e.g., tagged reading level). Analysis of variance showed that teachers with more than one working computer and teachers with more other devices valued the Internet more highly for teaching than did their less equipped peers. DLESE users valued the Internet more highly for their teaching, had more years teaching experience, and valued customizable resources more than their non-DLESE using peers. Most believed that resources catalogued in DLESE were scientifically accurate. Teachers used DLESE most often for finding hands-on activities, still images and other visual aids, and hand-outs; they were least likely to seek people, games, or assessment tools. The findings provide guidance for developers of K12 educational resources.
Keywords: K12, educational digital libraries, empirical, evaluation, mixed-method, teaching
Dimensional standard alignment in K-12 digital libraries: assessment of self-found vs. recommended curriculum 11-14
  Byron Marshall; René Reitsma; Malinda Zarske
Enhancing the experience of digital library users depends, in part, on recognizing and understanding user tasks. In the context of K-12 educational libraries this means that we must understand how K-12 teachers interact with such libraries and how they assess the relevance of documents found or encountered. This paper presents the results of an experiment in which K-12 teachers scored the relevance of curriculum they found themselves and the relevance of documents their colleagues found and recommended. We found that teachers apply a significantly more detailed notion of relevance, both qualitatively and quantitatively, when searching for as compared to evaluating recommended curricula. Differences were observed in both relevance judgments and system interaction logs. These variations may be useful in identifying user intent and in dynamically adapting the behavior of digital libraries of educational material.
Keywords: context-specific measurement, curriculum-standard alignment, digital library, inter-rater reliability, relevance
Helping students with information fragmentation, assimilation and notetaking 15-18
  Yolanda Jacobs Reimer; Melissa Bubnash; Matthew Hagedal; Peter Wolf
The problem of information fragmentation is especially acute for today's college students who manage and assimilate information in various forms while completing many of their academic tasks, and who must do so within the confines of standard software applications. The goal of this research is to provide students with a novel information assimilation and notetaking tool that helps them more efficiently manage their electronic information and overcome some of the fragmentation challenges they routinely experience. Our Global Information Gatherer prototype allows students to view, edit and store files of different types from within a single interface, and provides an integrated web browser and notetaking functionality.
Keywords: PIM, information assimilation, information fragmentation, notetaking, students in higher education, user interface design
Topic model methods for automatically identifying out-of-scope resources 19-28
  Steven Bethard; Soumya Ghosh; James H. Martin; Tamara Sumner
Recent years have seen the rise of subject-themed digital libraries, such as the NSDL pathways and the Digital Library for Earth System Education (DLESE). These libraries often need to manually verify that contributed resources cover topics that fit within the theme of the library. We show that such scope judgments can be automated using a combination of text classification techniques and topic modeling. Our models address two significant challenges in making scope judgments: only a small number of out-of-scope resources are typically available, and the topic distinctions required for digital libraries are much more subtle than classic text classification problems. To meet these challenges, our models combine support vector machine learners optimized to different performance metrics and semantic topics induced by unsupervised statistical topic models. Our best model is able to distinguish resources that belong in DLESE from resources that don't with an accuracy of around 70%. We see these models as the first steps towards increasing the scalability of digital libraries and dramatically reducing the workload required to maintain them.
Keywords: digital libraries, machine learning, relevance, scope, topics
Automatically generating high quality metadata by analyzing the document code of common file types 29-38
  Lars Fredrik Høimyr Edvardsen; Ingeborg Torvik Sølvberg; Trond Aalberg; Hallvard Trætteberg
A major challenge for content management in intranets and other large scale document storage and retrieval services is the generation of high quality metadata. Manual generation of metadata is resource demanding and is often viewed by collection managers and document authors as an inefficient use of their time, and there is a desire for other ways to create the needed metadata. Automatic Metadata Generation (AMG) comprises methods for generating metadata without manual interaction, using computer programs to interpret the document and possibly the document context. Current AMG research has been limited to collections of similarly formatted documents. The research presented in this paper expands the field of AMG by presenting an approach that is independent of a common visualization scheme: AMG based on document code analysis. This is done by showing AMG possibilities from LaTeX, Word and PowerPoint documents and how this approach can significantly increase the quality of the generated metadata. It does so by giving AMG algorithms direct access to the user-specified intellectual content and the file formatting, which avoids common quality-reducing factors such as incompleteness, low accuracy, and lack of logical consistency, coherence and timeliness. This research shows how this AMG approach can be combined with other AMG approaches, drawing on their strengths in order to achieve the desired high quality metadata entities.
Keywords: PDF, automatic metadata generation, document code, extraction, harvesting, latex, metadata quality, openXML, powerpoint, word

Session 2

Disambiguating authors in academic publications using random forests 39-48
  Pucktada Treeratpituk; C. Lee Giles
Users of digital libraries usually want to know the exact author or authors of an article. But different authors may share the same names, either as full names or as initials and last names (complete name change examples are not considered here). In such a case, the user would like the digital library to differentiate among these authors. Name disambiguation can help in many cases; one being a user in a search of all articles written by a particular author. Disambiguation also enables better bibliometric analysis by allowing a more accurate counting and grouping of publications and citations. In this paper, we describe an algorithm for pair-wise disambiguation of author names based on a machine learning classification algorithm, random forests. We define a set of similarity profile features to assist in author disambiguation. Our experiments on the Medline database show that the random forest model outperforms other previously proposed techniques such as those using support-vector machines (SVM). In addition, we demonstrate that the variable importance produced by the random forest model can be used in feature selection with little degradation in the disambiguation accuracy. In particular, the inverse document frequency of author last name and the middle name's similarity alone achieves an accuracy of almost 90%.
Keywords: author disambiguation, medline, random forests
Using web information for author name disambiguation 49-58
  Denilson Alves Pereira; Berthier Ribeiro-Neto; Nivio Ziviani; Alberto H. F. Laender; Marcos André Gonçalves; Anderson A. Ferreira
In digital libraries, ambiguous author names may occur due to the existence of multiple authors with the same name (polysemes) or different name variations for the same author (synonyms). We propose here a new method that uses information available on the Web to deal with both problems at the same time. Our idea consists of gathering information from input citations and submitting queries to a Web search engine, aiming at finding curricula vitae and Web pages containing publications of the ambiguous authors. From the content of documents in the answer sets returned by the Web search engine, useful information that can help in the disambiguation process is extracted. Using this information, author names are disambiguated by leveraging a hierarchical clustering method that groups citations in the same document together in a bottom-up fashion. Experimental results show that our method yields results that outperform those of two state-of-the-art unsupervised methods and are statistically comparable with those of a supervised one, while requiring no training. We observe gains of up to 65.2% in the pairwise F1 metric when compared with our best unsupervised baseline method.
Keywords: author name disambiguation, bibliographic citation, search engine
Whetting the appetite of scientists: producing summaries tailored to the citation context 59-68
  Stephen Wan; Cécile Paris; Robert Dale
The amount of scientific material available electronically is forever increasing. This makes reading the published literature, whether to stay up-to-date on a topic or to get up to speed on a new topic, a difficult task. Yet, this is an activity in which all researchers must be engaged on a regular basis. Based on a user requirements analysis, we developed a new research tool, called the Citation-Sensitive In-Browser Summariser (CSIBS), which supports researchers in this browsing task. CSIBS enables readers to obtain information about a citation at the point at which they encounter it. This information is aimed at enabling the reader to determine whether or not to invest the time in exploring the cited article further, thus alleviating information overload. CSIBS builds a summary of the cited document, bringing together meta-data about the document and a citation-sensitive preview that exploits the citation context to retrieve the sentences from the cited document that are relevant at this point. This paper briefly presents our user requirements analysis, then describes the system and, finally, discusses the observations from an initial pilot study. We found that CSIBS facilitates the relevancy judgment task, by increasing the users' self-reported confidence in making such judgements.
Keywords: biomedical researchers, information browsing, information needs, scientific literature, summarization, user modeling and interactive ir
Finding topic trends in digital libraries 69-72
  Levent Bolelli; Seyda Ertekin; Ding Zhou; C. Lee Giles
We propose a generative model based on latent Dirichlet allocation for mining distinct topics in document collections by integrating the temporal ordering of documents into the generative process. The document collection is divided into time segments, where the topics discovered in each segment are propagated to influence the topic discovery in the subsequent time segments. We conduct experiments on the collection of academic papers from the CiteSeer repository. We augment the text corpus with user queries and tags and integrate the citation graph to boost the weight of the topical terms. The experimental results show that the segmented topic model can effectively detect distinct topics and their evolution over time.
Keywords: latent dirichlet allocation, topic detection, trend analysis
CEBBIP: a parser of bibliographic information in chinese electronic books 73-76
  Liangcai Gao; Zhi Tang; Xiaofan Lin
Bibliographic information is essential for many digital library applications, such as citation analysis, academic searching and topic discovery, and bibliographic data extraction has attracted a great deal of attention in recent years. In this paper, we address the problem of automatic extraction of bibliographic data from Chinese electronic books and propose a tool called CEBBIP for the task, which includes three main systems: data preprocessing, data parsing and data postprocessing. In the data preprocessing system, the tool adopts a rule-based method to locate citation data in a book and to segment the citation data into citation strings for individual references. A learning-based approach, Conditional Random Fields (CRF), is employed to parse citation strings in the data parsing system. Finally, the tool takes advantage of the local format consistency intrinsic to a document to enhance citation data segmentation and parsing through clustering techniques. CEBBIP has been used in a commercial E-book production system. Experimental results show that CEBBIP's precision rate is very high. More specifically, exploiting the document's intrinsic local format consistency clearly improves the accuracy of citation data segmentation and parsing.
Keywords: bibliography, chinese electronic book, digital library, machine learning, metadata extraction

Session 3

Query parameters for harvesting digital video and associated contextual information 77-86
  Gary Marchionini; Chirag Shah; Christopher A. Lee; Robert Capra
Video is increasingly important to digital libraries and archives as both primary content and as context for the primary objects in collections. Services like YouTube not only offer large numbers of videos but also usage data such as comments and ratings that may help curators today make selections and aid future generations to interpret those selections. A query-based harvesting strategy is presented, and results from daily harvests for six topics defined by 145 queries over a 20-month period are discussed with respect to query specification parameters, topic, and contribution patterns. The limitations of the strategy and these data are considered and suggestions are offered for curators who wish to use query-based harvesting.
Keywords: digital curation, harvesting, video mining
ViGOR: a grouping oriented interface for search and retrieval in video libraries 87-96
  Martin Halvey; David Vallet; David Hannah; Joemon M. Jose
In this paper, we present ViGOR (Video Grouping, Organisation and Retrieval) a video retrieval system that allows users to group videos in order to facilitate video retrieval tasks. In this way users are able to visualise and conceptualise many aspects of their search tasks and carry out a localised search in order to solve a more global search problem. The main objective of this work is to aid users while carrying out explorative video retrieval tasks; these tasks can be often ambiguous and multi-faceted. Two user evaluations were carried out in order to evaluate the usefulness of this grouping paradigm for assisting users. The first evaluation involved users carrying out broad tasks on YouTube, and gave insights into the application of our interface to a vast online video collection. The second evaluation involved users carrying out focused tasks on the TRECVID 2007 video collection, allowing a comparison over a local collection, on which we could extract a number of content-based features. The results of our evaluations show that the use of the ViGOR system results in an increase in user performance and user satisfaction, showing the potential of a grouping paradigm for video search for various tasks in a variety of diverse video collections.
Keywords: search, user studies, video, visualisation
Developing a flexible content model for media repositories: a case study 97-100
  Christopher A. Beer; Peter D. Pinch; Karen Cariani
This article describes the process and challenges of developing a content model that can support the content and metadata present in a complex media archive. Media archives have some of the most diverse requirements in an effort to catalog, preserve, and make accessible a wide range of content with multifaceted relationships between works. We focus particularly on the design and implementation of the WGBH Media Library and Archives' Fedora digital access repository for scholars, educational users and the public. It is our hope that the process and findings from this work can support the architecture and development of other media archives.
Keywords: content model, digital libraries, fedora, media
An alignment based system for chord sequence retrieval 101-104
  Pierre Hanna; Matthias Robine; Thomas Rocher
Music retrieval systems for Western tonal music digital libraries have to consider rhythmic, timbral, melodic and harmonic information. Most existing retrieval systems only take into account melodies. Melody comparison may induce errors since two musical pieces can be very similar even though their melodies differ in a significant way. In this paper, we propose to investigate and experiment with a retrieval system based on the comparison of chord progressions. The definition of chords may be ambiguous but their properties can be precisely described and represented. We detail the adaptations of alignment algorithms, successfully applied for the estimation of symbolic melodic similarity, for chord progression retrieval. Several experiments, performed on symbolic databases, show that the system described is robust to variations and outperforms a recent chord retrieval system.
Keywords: music information retrieval
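To make the idea of alignment-based chord comparison in the abstract above concrete, the following is a minimal sketch of a standard local alignment (Smith-Waterman style) over chord symbols. It is not the authors' system: the function name, the exact-match comparison of chord labels, and the match, mismatch and gap scores are all assumptions chosen for illustration.

```python
def local_alignment_score(query, target, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two chord sequences.

    Chords are compared as plain labels (an assumption); a real system
    would likely score chords by harmonic similarity instead.
    """
    cols = len(target) + 1
    prev = [0] * cols          # previous row of the DP table
    best = 0
    for i in range(1, len(query) + 1):
        cur = [0] * cols
        for j in range(1, cols):
            same = query[i - 1] == target[j - 1]
            diag = prev[j - 1] + (match if same else mismatch)
            # Local alignment: a cell never drops below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


query = ["C", "F", "G", "C"]
exact = local_alignment_score(query, ["Am", "C", "F", "G", "C", "Dm"])
varied = local_alignment_score(query, ["C", "F", "Em", "G", "C"])
```

With these assumed scores, a progression that contains the query intact scores higher than one with an inserted chord, while the inserted chord costs only a single gap penalty rather than breaking the match.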

Session 4

Query-page intention matching using clicked titles and snippets to boost search rankings 105-114
  Masaya Murata; Hiroyuki Toda; Yumiko Matsuura; Ryoji Kataoka
Users of text retrieval systems input only a few keywords, or sometimes just one keyword, even if they have complex information needs. Due to the lack of query keywords, it becomes hard to return relevant search results that satisfy the demands of each user. Because digital documents, in contrast to queries, are generally composed of many kinds of keywords, it is also difficult to estimate the main topic or grasp the inherent intentions of the documents. In this paper, we present techniques to represent users' search intentions and the intentions that digital documents can satisfy by making use of clicked titles and snippets acquired from a click log analysis. We then present a method to match these intentions to boost search result rankings. Through experiments that use click logs and indexes of a commercial search engine, we verified our method's capability of significantly improving search precision.
Keywords: click logs analysis, implicit relevance feedback, representation of intention, search result rankings
Supporting analysis of future-related information in news archives and the web 115-124
  Adam Jatowt; Kensuke Kanazawa; Satoshi Oyama; Katsumi Tanaka
A lot of future-related information is available in news articles or Web pages. This information can, however, differ to a large extent and may fluctuate over time. It is therefore difficult for users to manually compare and aggregate it, and to reconstruct the most probable course of future events. In this paper we approach the problem of automatically generating summaries of future events related to queries using data obtained from news archive collections or from the Web. We propose two methods, explicit and implicit future-related information detection. The former is based on analyzing the context of future temporal expressions in documents, while the latter relies on detecting periodical patterns in historical document collections. We present a graph-based visualization of future-related information and demonstrate its usefulness through several examples.
Keywords: event prediction, future-related information retrieval, temporal information analysis

Session 5: best paper nominees 1

Generalized formal models for faceted user interfaces 125-134
  Edward C. Clarkson; Shamkant B. Navathe; James D. Foley
Faceted metadata and navigation have become major topics in library science, information retrieval and Human-Computer Interaction (HCI). This work surveys a range of extant approaches in this design space, classifying systems along several relevant dimensions. We use that survey to analyze the organization of data and its querying within faceted browsing systems. We contribute formal entity-relationship (ER) and relational data models that explain that organization and relational query models that explain systems' browsing functionality. We use these types of models since they are widely used to conceptualize data and to model back-end data stores. Their structured nature also suggests ways in which both the models and faceted systems might be extended.
Keywords: ER model, design survey, entity-relationship model, faceted metadata, faceted navigation, relational model, tuple relational calculus
Large-scale ETD repositories: a case study of a digital library application 135-144
  Adam Mikeal; James Creel; Alexey Maslov; Scott Phillips; John Leggett; Mark McFarland
We describe the implementation of a statewide system for managing and preserving electronic theses and dissertations (ETDs) from Texas universities. We further explain the theoretical, technical and political issues that arose during the implementation of this system. These issues range from technical components developed by TDL (such as a customized workflow management application and adding OAI-ORE capabilities to DSpace) to human-centered issues such as stakeholder engagement and participation. Our experiences reflect the challenges, expected and unexpected, that others will face when attempting to build digital library applications to scale.
Keywords: digital library infrastructure, electronic document workflow, electronic theses and dissertations, scalable systems

Session 6: best paper nominees 2

Style-consistency calligraphy synthesis system in digital library 145-152
  Kai Yu; Jiangqin Wu; Yueting Zhuang
There are many digitized calligraphy works written by famous ancient calligraphists in the CADAL (China-America Digital Academic Library) digital library. To make use of these resources, users want to generate a tablet or a piece of calligraphic work in the hand of a famous ancient calligraphist. However, some characters in the tablet or the calligraphic work were never written by the calligraphist, or were written but are hard to read because of long-term weathering. In this paper, a novel approach is proposed to synthesize Chinese calligraphic characters in the style of a given calligraphist, and a corresponding system is developed for calligraphy work generation and tablet design.
   Calligraphic character is represented by a three-level hierarchical model. A novel approach for determining the character structure is proposed, which takes advantage of both the structure of the same characters of different styles and the structure of similar characters of the same style. A style evaluation model (SEM) is presented to evaluate whether the calligraphic character generated is in the same style of the specified calligraphist and to adjust the calligraphic character generated. Our experiments show that this system is effective.
Keywords: structure determination, style evaluation model (SEM), style-consistency calligraphy synthesis
Generative model-based metasearch for data fusion in information retrieval 153-162
  Miles Efron
"Data fusion" refers to the problem in information retrieval (IR) where several lists of documents ranked against a query are to be merged into a single ranked list for presentation to a user. Data fusion is also known as "metasearch." In a digital library setting data fusion may support operations such as federated search based on multiple repository representations. This paper presents a novel approach to the fusion problem: generative model-based Metasearch (GeM). We suggest viewing the appearance of documents in a return set as the outcome of a probabilistic process; some documents are likely to occur in the model, while others are unlikely. Using Bayesian parameter estimation to fit a multinomial distribution based on the return sets to be merged, GeM achieves a final ranking by listing documents in decreasing probability of generation under the induced model. We also introduce what we call "the impatient reader" approach to normalizing document ranks in service to the fusion operation. We report results from several experiments on TREC data suggesting that GeM, informed with impatient reader document scores, operates at state-of-the-art levels of effectiveness.
Keywords: data fusion, digital libraries, generative models, information retrieval, metasearch, probabilistic models
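As a rough illustration of the fusion idea described in the abstract above (fit a multinomial over documents from the return sets, then rank by probability under that model), here is a minimal sketch. It is not GeM itself: the function name is hypothetical, the 1/rank weighting is only a stand-in for the "impatient reader" normalization (which the abstract does not define), and the symmetric Dirichlet prior is an assumed choice of Bayesian estimate.

```python
from collections import defaultdict


def fuse(ranked_lists, prior=0.5):
    """Merge ranked lists by ranking documents under a smoothed multinomial.

    Each appearance of a document adds a pseudo-count of 1/rank (assumed
    weighting). The posterior mean under a symmetric Dirichlet prior
    gives each document's probability of being "generated".
    """
    counts = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            counts[doc] += 1.0 / rank
    total = sum(counts.values()) + prior * len(counts)
    probs = {doc: (c + prior) / total for doc, c in counts.items()}
    # Decreasing probability; ties broken by document id for determinism.
    return sorted(probs.items(), key=lambda kv: (-kv[1], kv[0]))


fused = fuse([["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]])
order = [doc for doc, _ in fused]
```

Here document b is ranked first because it appears near the top of all three lists, while d, seen once at rank 3, comes last.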
EnTag: enhancing social tagging for discovery 163-172
  Koraljka Golub; Jim Moon; Douglas Tudhope; Catherine Jones; Brian Matthews; Bartłomiej Puzoń; Marianne Lykke Nielsen
The EnTag (Enhanced Tagging for Discovery) project investigated the effect on indexing and retrieval when using only social tagging versus when using social tagging in combination with suggestions from a controlled vocabulary. Two different contexts were explored: tagging by readers of a digital collection and tagging by authors in an institutional repository; also two different controlled vocabularies were examined, Dewey Decimal Classification and ACM Computing Classification Scheme. For each context a separate demonstrator was developed and a user study conducted. The results showed the importance of controlled vocabulary suggestions for both indexing and retrieval: to help produce ideas of tags to use, to make it easier to find focus for the tagging, as well as to ensure consistency and increase the number of access points in retrieval. The value and usefulness of the suggestions proved to be dependent on the quality of the suggestions, both in terms of conceptual relevance to the user and in appropriateness of the terminology. The participants themselves could also see the advantages of controlled vocabulary terms for retrieval if the terms used were from an authoritative source.
Keywords: ACM computing classification scheme, controlled vocabularies, dewey decimal classification, digital collection, folksonomies, institutional repository, intute, social tagging, subject indexing
Review-oriented metadata enrichment: a case study 173-182
  Liang Zhang; Jiangqin Wu; Yueting Zhuang; Yin Zhang; Chenxing Yang
Book reviews contributed by readers on social sites contain valuable information on books' content, style and merit, and many of their informative words can be used to enrich the metadata of books in the China-US Million Book Digital Library. In this paper, we present a system for review-oriented metadata enrichment and propose a Book-Centric Diverse Random Walk algorithm on a four-partite graph containing three kinds of relations among authors, books, reviews and words, in order to produce highly relevant as well as diverse keywords for a book. Experimental results of a user study show that our approach significantly outperforms other methods in terms of relevance and diversity. In the coverage experiments, the metadata generated by our approach also has a large overlap with popular social tags and brief introductions from DouBan for the same books.
Keywords: book review, digital libraries, diversity, graph-based scoring, keyword extraction, metadata, metadata enrichment

Session 7

Using timed-release cryptography to mitigate the preservation risk of embargo periods 183-192
  Rabia Haq; Michael L. Nelson
Due to temporary access restrictions, embargoed data cannot be refreshed to unlimited parties during the embargo time interval. A solution to mitigate the risk of data loss has been developed that uses a data dissemination framework, the Timed-Locked Embargo Framework (TLEF), that allows data refreshing of encrypted instances of embargoed content in an open, unrestricted scholarly community. TLEF exploits implementations of existing technologies to "time-lock" data using timed-release cryptology so that TLEF can be deployed as digital resources encoded in a complex object format suitable for metadata harvesting. The framework successfully demonstrates dynamic record identification, time-lock puzzle encryption, encapsulation and dissemination as XML documents. We implement TLEF and provide a quantitative analysis of its successful data harvest of time-locked embargoed data with minimum time overhead without compromising data security and integrity.
Keywords: cryptography, repositories, time lock, timed release
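The "time-lock puzzle" named in the abstract above can be illustrated with the classic repeated-squaring construction. This is a minimal sketch, not TLEF's implementation: the function names, the toy primes, and the hash-derived XOR keystream are assumptions for illustration, and the parameters are far too small to be secure.

```python
import hashlib


def _keystream(seed: int, length: int) -> bytes:
    """Derive a keystream from the puzzle solution (illustrative only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(f"{seed}:{counter}".encode()).digest()
        counter += 1
    return out[:length]


def _xor(data: bytes, seed: int) -> bytes:
    return bytes(d ^ k for d, k in zip(data, _keystream(seed, len(data))))


def lock(message: bytes, p: int, q: int, t: int, base: int = 2):
    """Create a puzzle whose solution needs t sequential squarings."""
    n = p * q
    phi = (p - 1) * (q - 1)
    # The creator knows phi(n), so 2**t can be reduced modulo phi(n).
    secret = pow(base, pow(2, t, phi), n)
    return n, base, t, _xor(message, secret)


def unlock(n: int, base: int, t: int, ciphertext: bytes) -> bytes:
    """Solve the puzzle without phi(n): square t times, one after another."""
    value = base % n
    for _ in range(t):
        value = value * value % n
    return _xor(ciphertext, value)


n, base, t, sealed = lock(b"embargoed record", 10007, 10009, 1000)
recovered = unlock(n, base, t, sealed)
```

The creator takes a shortcut through Euler's theorem, while anyone holding only the published values (n, base, t, ciphertext) has to perform the squarings in sequence, so t sets the approximate delay before the content can be read.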
Learning to assess the quality of scientific conferences: a case study in computer science 193-202
  Waister Silva Martins; Marcos André Gonçalves; Alberto H. F. Laender; Gisele L. Pappa
Assessing the quality of scientific conferences is an important and useful service that can be provided by digital libraries and similar systems. This is especially true for fields such as Computer Science and Electrical Engineering, where conference publications are crucial. However, the majority of the existing approaches for assessing the quality of publication venues have been proposed for journals. In this paper, we characterize a large number of features that can be used as criteria to assess the quality of scientific conferences and study how these features can be automatically combined by means of machine learning techniques to effectively perform this task. Among the features studied are citations, submission and acceptance rates, tradition of the conference, and reputation of the program committee members. Among our findings are that: (1) separating high quality conferences from medium and low quality ones can be performed quite effectively, but separating the last two types is a much harder task; and (2) citation features, followed by those associated with the tradition of the conference, are the most important ones for the task.
Keywords: classification, conference assessment, digital library, machine learning
CARES: a ranking-oriented CADAL recommender system 203-212
  Chenxing Yang; Baogang Wei; Jiangqin Wu; Yin Zhang; Liang Zhang
A recommender system is useful for a digital library to suggest the books that are likely preferred by a user. Most recommender systems using collaborative filtering approaches leverage the explicit user ratings to make personalized recommendations. However, many users are reluctant to provide explicit ratings, so ratings-oriented recommender systems do not work well. In this paper, we present a recommender system for CADAL digital library, namely CARES, which makes recommendations using a ranking-oriented collaborative filtering approach based on users' access logs, avoiding the problem of the lack of user ratings. Our approach employs mean AP correlation coefficients for computing similarities among users' implicit preference models and a random walk based algorithm for generating a book ranking personalized for the individual. Experimental results on real access logs from the CADAL web site show the effectiveness of our system and the impact of different values of parameters on the recommendation performance.
Keywords: collaborative filtering, digital library, recommendation system
Recommendation as link prediction: a graph kernel-based machine learning approach 213-216
  Xin Li; Hsinchun Chen
Recommender systems have demonstrated commercial success in multiple industries. In digital libraries they have the potential to be used as a support tool for traditional information retrieval functions. Among the major recommendation algorithms, the successful collaborative filtering (CF) methods explore the use of user-item interactions to infer user interests. Based on the finding that transitive user-item associations can alleviate the data sparsity problem in CF, multiple heuristic algorithms were designed to take advantage of the user-item interaction networks with both direct and indirect interactions. However, the use of such graph representation was still limited in learning-based algorithms. In this paper, we propose a graph kernel-based recommendation framework. For each user-item pair, we inspect its associative interaction graph (AIG) that contains the users, items, and interactions n steps away from the pair. We design a novel graph kernel to capture the AIG structures and use them to predict possible user-item interactions. The framework demonstrates improved performance on an online bookstore dataset, especially when a large number of suggestions are needed.
Keywords: collaborative filtering, kernel methods, recommender system
A polyrepresentational approach to interactive query expansion 217-220
  Abdigani Diriye; Ann Blandford; Anastasios Tombros
Interactive Query Expansion (IQE) presents suggested terms to the user during their search to enable better Information Retrieval (IR). However, IQE terms are poorly used, and tend to lack information meaningful to the user. The lack of cognitive and functional support during query refinement is a well-documented problem, and despite the work carried out, it is still an under-researched area. This stagnation in progress has been partly due to the long-held belief that users are able to make good IQE term selections, and that the de facto way IQE terms are presented is effective. In this paper, we introduce a novel method to improve the presentation of IQE terms by providing supplementary information alongside them. We describe a user study that compared our novel polyrepresentational approach to IQE against a conventional IQE system and a baseline system. Our findings show that a polyrepresentational approach to IQE can address the ambiguity and uncertainty surrounding IQE, and improve the perceived usefulness of the terms.
Keywords: query formulation, interactive query expansion

Session 8: best paper finalists

Automatically characterizing resource quality for educational digital libraries 221-230
  Steven Bethard; Philipp Wetzer; Kirsten Butcher; James H. Martin; Tamara Sumner
With the rise of community-generated web content, the need for automatic characterization of resource quality has grown, particularly in the realm of educational digital libraries. We demonstrate how identifying concrete factors of quality for web-based educational resources can make machine learning approaches to automating quality characterization tractable. Using data from several previous studies of quality, we gathered a set of key dimensions and indicators of quality that were commonly identified by educators. We then performed a mixed-method study of digital library curation experts, showing that our characterization of quality captured the subjective processes used by the experts when assessing resource quality for classroom use. Using key indicators of quality selected from a statistical analysis of our expert study data, we developed a set of annotation guidelines and annotated a corpus of 1000 digital resources for the presence or absence of these key quality indicators. Agreement among annotators was high, and initial machine learning models trained from this corpus were able to identify some indicators of quality with as much as an 18% improvement over the baseline.
Keywords: educational digital library, learning resource, machine learning, quality
Improving optical character recognition through efficient multiple system alignment 231-240
  William B. Lund; Eric K. Ringger
Individual optical character recognition (OCR) engines vary in the types of errors they commit in recognizing text, particularly poor quality text. By aligning the output of multiple OCR engines and taking advantage of the differences between them, the error rate based on the aligned lattice of recognized words is significantly lower than the individual OCR word error rates. This lattice error rate constitutes a lower bound among aligned alternatives from the OCR output. Results from a collection of poor quality mid-twentieth century typewritten documents demonstrate an average reduction of 55.0% in the error rate of the lattice of alternatives and a realized word error rate (WER) reduction of 35.8% in a dictionary-based selection process. As an important precursor, an innovative admissible heuristic for the A* algorithm is developed, which results in a significant reduction in state space exploration to identify all optimal alignments of the OCR text output, a necessary step toward the construction of the word hypothesis lattice. On average 0.0079% of the state space is explored to identify all optimal alignments of the documents.
Keywords: A* algorithm, OCR error rate reduction, text alignment
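The word error rate (WER) cited in the abstract above is conventionally computed as the word-level edit distance between the recognized text and a reference transcription, divided by the number of reference words. The following is a minimal sketch under that standard definition; the function name and sample strings are illustrative assumptions and are not taken from the paper, whose alignment and lattice construction are considerably more involved.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("the quick brown fox", "the quick brwn fox"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.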
No bull, no spin: a comparison of tags with other forms of user metadata 241-250
  Catherine C. Marshall
User-contributed tags have shown promise as a means of indexing multimedia collections by harnessing the combined efforts and enthusiasm of online communities. But tags are only one way of describing multimedia items. In this study, I compare the characteristics of public tags with other forms of descriptive metadata (titles and narrative captions) that users have assigned to a collection of very similar images gathered from the photo-sharing service Flickr. The study shows that tags converge on different descriptions than the other forms of metadata do, and that narrative metadata may be more effective than tags for capturing certain aspects of images that may influence their subsequent retrieval and use. The study also examines how photographers use people's names to personalize the different types of metadata and how they tell stories across short sequences of images. The study results are then brought to bear on design recommendations for user tagging tools and automated tagging algorithms and on using photo sharing sites as de facto art and architecture resources.
Keywords: collaborative information management, image collection, metadata, study, tags

Session 9

What happens when facebook is gone? 251-254
  Frank McCown; Michael L. Nelson
Web users are spending more of their time and creative energies within online social networking systems. While many of these networks allow users to export their personal data or expose themselves to third-party web archiving, some do not. Facebook, one of the most popular social networking websites, is one example of a "walled garden" where users' activities are trapped. We examine a variety of techniques for extracting users' activities from Facebook (and by extension, other social networking systems) for the personal archive and for the third-party archiver. Our framework could be applied to any walled garden where personal user data is being locked.
Keywords: digital preservation, personal archiving, social networks
Improving historical research by linking digital library information to a global genealogical database 255-258
  Douglas J. Kennard; William B. Lund; Bryan S. Morse
Journals, letters, and other writings are of great value to historians and those who research their own family history; however, it can be difficult to find writings by specific people, and even harder to find what others wrote about them. We present a prototype web-based system that enables users to discover information about historical people (including their own ancestors) by linking digital library content to unique PersonIDs from a genealogical database. Users can contribute content such as scanned journals or information about where items can be found. They can also transcribe content and tag it with PersonIDs to identify who it is about. Additional features provide tools for users to explore historical contexts and relationships. These include the ability to tag places and to create a historical social network by specifying non-family relationships or by using a mechanism we call rosters to imply participation in some group or event.
Keywords: authority control, diaries, family history, genealogy, historical social networks, journals, tagging
Collecting fragmentary authors in a digital library 259-262
  Monica Berti; Matteo Romanello; Alison Babeu; Gregory Crane
This paper discusses new work to represent, in a digital library of classical sources, authors whose works themselves are lost and who survive only where surviving authors quote, paraphrase or allude to them. It describes initial works from a digital collection of such fragmentary authors designed not only to capture but to extend the ontologies that traditional scholarship has developed over generations: the aim is representing every nuance of print conventions while using the capabilities of digital libraries to extend our ability to identify fragments, to represent what we have identified, and to render the results of that work intellectually and physically more accessible than was possible in print culture.
Keywords: digital libraries, fragmentary authors, greek fragmentary historians, tei p5 guidelines, xml
Robust registration of manuscript images 263-266
  Ryan Baumann; W. Brent Seales
In this paper we present an application of image registration techniques to the specific domain of manuscript images. We show the application of this technique to images of the Venetus A, a 10th century manuscript of Homer's Iliad. The same algorithm is used to register images of the MS across time (including photographs separated by over a century), as well as across imaging modalities.
Keywords: image registration, image warping, manuscript restoration, multispectral imaging

Session 10

Cost and benefit analysis of mediated enterprise search 267-276
  Mingfang Wu; James A. Thom; Andrew Turpin; Ross Wilkinson
The utility of an enterprise search system is determined by three key players: the information retrieval (IR) system (the search engine), the enterprise users, and the service provider who delivers the tailored IR service to its designated enterprise users. Evaluations of enterprise search have so far focused largely on IR system effectiveness and efficiency, with only a relatively small amount of effort devoted to the user's involvement and hardly any to the service provider's role. This paper investigates the role of the service provider. We propose a method that evaluates the cost and benefit for a service provider of using a mediated search engine -- in particular, where domain experts intervene on the ranking of the search results from a search engine. We test our cost and benefit evaluation method in a case study and conduct user experiments to demonstrate it. Our study shows that: 1) by making use of domain experts' relevance assessments in search result ranking, the precision and the discounted cumulated gain of ranked lists have been improved significantly (144% and 40% respectively); 2) the service provider gains substantial return on investment and higher search success rate by investing in domain experts' relevance assessments; and 3) the cost and benefit evaluation also indicates the type of queries to be selected from a query log for evaluating an enterprise search engine.
Keywords: cost and benefit analysis, enterprise search, evaluation, information retrieval, mediated search, relevance feedback
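The discounted cumulated gain mentioned in the abstract above rewards relevant documents more when they appear near the top of a ranked list. The sketch below assumes the widely used log2(rank + 1) discount; the paper may use a different logarithm base or variant, and the function name and graded gain values are illustrative only.

```python
import math
from typing import Sequence


def discounted_cumulated_gain(gains: Sequence[float]) -> float:
    """Sum graded relevance gains, each divided by log2(rank + 1).

    `gains` lists the relevance grade of each result in ranked order,
    starting with the top-ranked result (rank 1).
    """
    return sum(
        gain / math.log2(rank + 1)
        for rank, gain in enumerate(gains, start=1)
    )


if __name__ == "__main__":
    original_ranking = [1, 2, 3]   # most relevant document ranked last
    mediated_ranking = [3, 2, 1]   # same documents, most relevant first
    print(discounted_cumulated_gain(original_ranking))
    print(discounted_cumulated_gain(mediated_ranking))
```

Both lists contain the same documents, so any difference in the two scores comes purely from the ordering, which is the effect an expert-mediated re-ranking is meant to improve.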
Document relevance assessment via term distribution analysis using fourier series expansion 277-284
  Patricio Galeas; Ralph Kretschmer; Bernd Freisleben
In addition to the frequency of terms in a document collection, the distribution of terms plays an important role in determining the relevance of documents for a given search query. In this paper, term distribution analysis using Fourier series expansion as a novel approach for calculating an abstract representation of term positions in a document corpus is introduced. Based on this approach, two methods for improving the evaluation of document relevance are proposed: (a) a function-based ranking optimization representing a user defined document region, and (b) a query expansion technique based on overlapping the term distributions in the top-ranked documents. Experimental results demonstrate the effectiveness of the proposed approach in providing new possibilities for optimizing the retrieval process.
Keywords: fourier series, query expansion, ranked retrieval, term distribution
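One way to picture a Fourier series representation of term positions, as described in the abstract above, is to treat each occurrence of a term as a unit impulse at its normalized position in the document and keep only the first few harmonics. The sketch below follows that assumption; it is not the authors' formulation, and the function names, the normalization to the interval [0, 1), and the number of harmonics are all hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple

Coefficients = Tuple[float, List[float], List[float]]


def fourier_coefficients(
    positions: Sequence[int], doc_length: int, harmonics: int
) -> Coefficients:
    """Fourier coefficients of unit impulses placed at the term positions.

    Positions are scaled to [0, 1) so documents of different lengths
    share one period. Returns (a0, [a_1..a_n], [b_1..b_n]).
    """
    xs = [p / doc_length for p in positions]
    a0 = 2.0 * len(xs)
    a = [2.0 * sum(math.cos(2 * math.pi * k * x) for x in xs)
         for k in range(1, harmonics + 1)]
    b = [2.0 * sum(math.sin(2 * math.pi * k * x) for x in xs)
         for k in range(1, harmonics + 1)]
    return a0, a, b


def evaluate(coeffs: Coefficients, x: float) -> float:
    """Evaluate the truncated series at normalized position x."""
    a0, a, b = coeffs
    total = a0 / 2.0
    for k, (ak, bk) in enumerate(zip(a, b), start=1):
        angle = 2 * math.pi * k * x
        total += ak * math.cos(angle) + bk * math.sin(angle)
    return total


if __name__ == "__main__":
    # A term occurring once, a quarter of the way through a 100-word document.
    coeffs = fourier_coefficients([25], doc_length=100, harmonics=3)
    print(evaluate(coeffs, 0.25), evaluate(coeffs, 0.75))
```

The truncated series peaks near the positions where the term occurs, so two terms can be compared, or a document region weighted, through a fixed number of coefficients rather than raw position lists.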

Session 11

How do you feel about "dancing queen"?: deriving mood & theme annotations from user tags 285-294
  Kerstin Bischoff; Claudiu S. Firan; Wolfgang Nejdl; Raluca Paiu
Web 2.0 enables information sharing, collaboration among users and most notably supports active participation and creativity of the users. As a result, a huge amount of manually created metadata describing all kinds of resources is now available. Such semantically rich user generated annotations are especially valuable for digital libraries covering multimedia resources such as music, where these metadata enable retrieval relying not only on content-based (low level) features, but also on the textual descriptions represented by tags. However, if we analyze the annotations users generate for music tracks, we find them heavily biased towards genre. Previous work investigating the types of user provided annotations for music tracks showed that the types of tags which would be really beneficial for supporting retrieval -- usage (theme) and opinion (mood) tags -- are often neglected by users in the annotation process. In this paper we address exactly this problem: in order to support users in tagging and to fill these gaps in the tag space, we develop algorithms for recommending mood and theme annotations. Our methods exploit the available user annotations, the lyrics of music tracks, as well as combinations of both. We also compare the results for our recommended mood / theme annotations against genre and style recommendations -- a much easier and already studied task. Besides evaluating against an expert (AllMusic.com) ground truth, we evaluate the quality of our recommended tags through a Facebook-based user study. Our results are very promising both in comparison to experts as well as users and provide interesting insights into possible extensions for music tagging systems to support music search.
Keywords: collaborative tagging, high-level music descriptors, metadata enrichment, mood and theme tag recommendation
Automatic quality assessment of content created collaboratively by web communities: a case study of wikipedia 295-304
  Daniel Hasan Dalip; Marcos André Gonçalves; Marco Cristo; Pável Calado
The old dream of a universal repository containing all human knowledge and culture is becoming possible through the Internet and the Web. Moreover, this is happening with the direct, collaborative participation of people. Wikipedia is a great example. It is an enormous repository of information with free access and editing, created by the community in a collaborative manner. However, this large amount of information, made available democratically and virtually without any control, raises questions about its relative quality. In this work we explore a significant number of quality indicators, some of them proposed by us and used here for the first time, and study their capability to assess the quality of Wikipedia articles. Furthermore, we explore machine learning techniques to combine these quality indicators into one single assessment judgment. Through experiments, we show that the most important quality indicators are the easiest ones to extract, namely, textual features related to length, structure and style. We were also able to determine which indicators did not contribute significantly to the quality assessment. These were, coincidentally, the most complex features, such as those based on link analysis. Finally, we compare our combination method with a state-of-the-art solution and show significant improvements in terms of effective quality prediction.
Keywords: SVM, machine learning, quality assessment, wikipedia
Designing the reading experience for scanned multi-lingual picture books on mobile phones 305-308
  Benjamin B. Bederson; Alex Quinn; Allison Druin
This paper reports on an adaptation of the existing PopoutText and ClearText display techniques to mobile phones. It explains the design rationale for a freely available iPhone application to read books from the International Children's Digital Library. Through a combination of applied image processing, a zoomable user interface, and a process of working with children to develop the detailed design, we present an interface that supports clear reading of scanned picture books in multiple languages on a mobile phone.
Keywords: books, children, digital libraries, iPhone, interaction design, interface design, mobile phones, readability
Mobility, digital libraries and a rural indian village 309-312
  Matt Jones; Emma Thom; David Bainbridge; David Frohlich
Millions of people in developed countries routinely create and share digital content; but what about the billions of others on the wrong side of what has been called the 'global digital divide'? This paper considers three mobile platforms to illustrate their potential in enabling rural Indian villagers to make and share digital stories. We describe our experiences in creating prototypes using mobile phones, high-end media-players, and paper. Interaction designs are discussed along with findings from various trials within the village and elsewhere. Our approach has been to develop prototypes that can work together in an integrated fashion so that content can flow freely and in interesting ways through the village. While our work has particular relevance to those users in emerging world contexts, we see it also informing needs and practices in the developed world for user-generated content.
Keywords: digital libraries, digital-divide, information ecologies, mobility, user-generated content
What do exploratory searchers look at in a faceted search interface? 313-322
  Bill Kules; Robert Capra; Matthew Banta; Tito Sierra
This study examined how searchers interacted with a web-based, faceted library catalog when conducting exploratory searches. It applied eye tracking, stimulated recall interviews, and direct observation to investigate important aspects of gaze behavior in a faceted search interface: what components of the interface searchers looked at, for how long, and in what order. It yielded empirical data that will be useful for both practitioners (e.g., for improving search interface designs), and researchers (e.g., to inform models of search behavior). Results of the study show that participants spent about 50 seconds per task looking at (fixating on) the results, about 25 seconds looking at the facets, and only about 6 seconds looking at the query itself. These findings suggest that facets played an important role in the exploratory search process.
Keywords: OPAC, exploratory search, eye tracking, faceted search, online public access catalogs, task design, user studies

Session 12

Aligning METS with the OAI-ORE data model 323-330
  Jerome P. McDonough
The Open Archives Initiative -- Object Reuse and Exchange (OAI-ORE) specifications provide a flexible set of mechanisms for transferring complex data objects between different systems. In order to serve as an exchange syntax, OAI-ORE must be able to support the import of information from localized data structures serving various communities of practice. In this paper, we examine the Metadata Encoding & Transmission Standard (METS) and the issues that arise when trying to map from a localized structural metadata schema into the OAI-ORE data model and serialization syntaxes.
Keywords: METS, OAI-ORE, aggregation, modeling, structural metadata
EverLast: a distributed architecture for preserving the web 331-340
  Avishek Anand; Srikanta Bedathur; Klaus Berberich; Ralf Schenkel; Christos Tryfonopoulos
The World Wide Web has become a key source of knowledge pertaining to almost every walk of life. Unfortunately, much of the data on the Web is highly ephemeral in nature, with an estimated 50-80% of content changing within a short time. Continuing the pioneering efforts of many national (digital) libraries, organizations such as the International Internet Preservation Consortium (IIPC), the Internet Archive (IA) and the European Archive (EA) have been tirelessly working towards preserving the ever changing Web.
   However, while these web archiving efforts have paid significant attention to long term preservation of Web data, they have paid little attention to developing a global-scale infrastructure for collecting, archiving, and performing historical analyses on the collected data. Based on insights from our recent work on building text analytics for Web Archives, we propose EverLast, a scalable distributed framework for next generation Web archival and temporal text analytics over the archive. Our system is built on a loosely-coupled distributed architecture that can be deployed over large-scale peer-to-peer networks. In this way, we allow the integration of many archival efforts taken mainly at a national level by national digital libraries. Key features of EverLast include support of time-based text search & analysis and the use of human-assisted archive gathering. In this paper, we outline the overall architecture of EverLast, and present some promising preliminary results.
Keywords: crawling, indexing, time-travel search, web archives
A framework for describing web repositories 341-344
  Frank McCown; Michael L. Nelson
In prior work we have demonstrated that search engine caches and archiving projects like the Internet Archive's Wayback Machine can be used to "lazily preserve" websites and reconstruct them when they are lost. We use the term "web repositories" for collections of automatically refreshed and migrated content, and collectively we refer to these repositories as the "web infrastructure". In this paper we present a framework for describing web repositories and the status of web resources in them. This includes an abstract API for web repository interaction, the concepts of deep vs. flat and light/dark/grey repositories, and terminology for describing the recoverability of a web resource. Our API may serve as a foundation for future web repository interfaces.
Keywords: preservation, web repositories, web resources
Preserving digital data in heterogeneous environments 345-348
  Gonçalo Antunes; José Barateiro; Manuel Cabral; José Borbinha; Rodrigo Rodrigues
Digital preservation aims at maintaining digital objects accessible over a long period of time, regardless of the challenges of organizational or technological changes or failures. In particular, data produced in e-Science domains could be reliably stored in today's data grids, taking advantage of the natural properties of this kind of infrastructure to support redundancy. However, to achieve reliability we must take into account failure interdependency. Taking into account the fact that correlated failures can affect multiple components and potentially cause complete loss of data, we propose a solution to evaluate redundancy strategies in the context of heterogeneous environments such as data grids. This solution is based on a simulation engine that can be used not only to support the process of designing the preservation environment and related policies, but also later on to observe and control the deployed system.
Keywords: data grids, dependability, digital libraries, digital preservation
Unsupervised creation of small world networks for the preservation of digital objects 349-352
  Charles L. Cartledge; Michael L. Nelson
The prevailing model for digital preservation is that archives should be similar to a "fortress": a large, protective infrastructure built to defend a relatively small collection of data from attack by external forces. Such projects are a luxury, suitable only for limited collections of known importance and requiring significant institutional commitment for sustainability. In previous research, we have shown the web infrastructure (i.e., search engine caches, web archives) refreshes and migrates web content in bulk as side-effects of their user-services, and these results can be mined as a useful, but passive preservation service. Our current research involves a number of questions resulting from removing the implicit assumption that web-based data objects must passively await curatorial services: What if data objects were not tethered to repositories? What are the implications if the content were actively seeking out and injecting itself into the web infrastructure (i.e., search engine caches, web archives)? All of this leads to our primary research question: Can we create objects that preserve themselves more effectively than repositories or web infrastructure can?
Keywords: digital preservation, small world
Towards a virtual organization for data cyberinfrastructure 353-356
  Christine L. Borgman; Geoffrey C. Bowker; Thomas A. Finholt; Jillian C. Wallis
We report on the exploratory stages of a multi-university, multi-research-site, multi-year effort to investigate and compare data practices in multiple cyberinfrastructure projects and their emerging virtual organizations. Our long-term goal is to understand the data practices and data management requirements of virtual organizations and their implications for the design and development of data digital libraries. We have constructed our own virtual organization as a participant-observer approach to the research. Results to date suggest that collaborative technologies are emergent and that defining and scoping the data products of collaborations continues to be problematic.
Keywords: collaborative work, cyberinfrastructure, scientific data, sensor networks

Posters

Expanding the search for digital preservation solutions: adopting PREMIS in cultural heritage institutions 357-358
  Daniel Gelaw Alemneh
This paper presents some preliminary results on factors that affect the adoption of PREMIS (Preservation Metadata Implementation Strategies) in cultural heritage institutions. The study employed a web-based survey to collect data from 123 participants in 20 countries as well as a semi-structured, follow-up telephone interview with a smaller sample of the survey respondents. Rogers' diffusion of innovation theory was used as a theoretical framework. The main constructs considered for the study were relative advantage, compatibility, complexity, trialability, observability, and institution readiness. The study yielded both qualitative and quantitative data, and preliminary analysis showed that all six factors influence the adoption of PREMIS in varying degrees.
Keywords: diffusion of innovation, digital preservation, metadata, premis
Collaborative digital library: enhancing digital collections to improve learning in educational programs 359-360
  Ali Sajedi Badashian; Asghar Dehghani Firouzabadi; Iman Khalkhali; Hamidreza Afzali; Morteza Ashurzad Delcheh; Mohammad Shoja Shafiei; Mahdi Alipour
In this article, a universal collaborative and competitive approach is introduced for deployment of digital collections in an ideal Digital Library (DL) for the educational system of the future. The collaborative and open-source aspects of the system guarantee its growth and the competitive aspects guarantee the accuracy.
Keywords: collection development, curriculum development, digital libraries, educational resources, exploring, information visualization, integration, knowledge sharing
Digitizing the flea market: eBay as a data source for historic collections 361-362
  Snowden Becker
The online auction site eBay has overtaken face-to-face transactions as the primary means of doing business for collectors and sellers of unique and ephemeral materials. Historical societies, museums, and archives also increasingly collect ephemera as records of social and cultural history. This presentation argues that the digitized flea market, as epitomized by eBay, replaces in-person sales while also providing a stream of rich information about a previously invisible, unquantifiable marketplace. Furthermore, identifying factors that influence collectibles buyers' behavior in online auction sales can also shed light on factors affecting user behaviors in digital libraries. Data from a survey of over 1,000 recent home movie auction listings on eBay suggest how eBay may be used as a data source by collectors, as well as the users and designers of digital libraries.
Keywords: collectibles, eBay, ephemera, home movies, online auctions
Semantic alerting for digital libraries 363-364
  George Buchanan; Annika Hinze
We previously investigated the support of alerting services across networks of heterogeneous digital libraries. We now report the first generation of semantically enhanced digital library alerting systems. Where previous alerting services have provided users with notifications of new library content using traditional metadata, we demonstrate the advantages and challenges of using semantic technologies. This uncovers key issues that are not yet fully understood in general event-based systems (including alerting systems).
Keywords: FRBR, aggregate documents, alerting, digital libraries, semantics
Addressing researchers' needs through the data curation profile 365-366
  Jake Carlson; Deborah Leiter
This poster describes a study currently in progress that seeks to identify and address the needs of researchers from multiple disciplines in managing, curating and preserving their data. One output of this study, which is still in its early stages, will be the "data curation profile," a methodological tool designed to enable the comparison of needs across disciplines and help librarians build digital libraries that accurately reflect and address the needs of data producers.
Keywords: data curation, data sharing, repositories
Implementation and evaluation of palm leaf manuscript metadata schema (PLMM) 367-368
  Nisachol Chamnongsri; Lampang Manmart; Vilas Wuwongse
The evaluation of the Palm Leaf Manuscripts Metadata Schema (PLMM) aims to examine whether the PLMM satisfactorily meets user requirements in searching for PLMs and managing the PLM collection. It comprises (1) an examination of the PLMM's capability in describing the particular characteristics of Northeastern Thai palm leaf manuscripts, and its usefulness in palm leaf manuscript preservation and rights control management; and (2) an investigation of users' satisfaction when using the PLMM to search for PLMs and manage the PLM collection. The evaluation process began with the development of a prototype PLM management system to implement the PLMM. Then, more than 200 metadata records describing all types of sample PLMs (with variations in sizes, scripts, languages, titles, and number of content subjects contained in a fascicle) were provided in Extensible Markup Language (XML) format, while system interfaces and queries were developed with Hypertext Preprocessor (PHP). This was followed by trials with end users and staff in their workplace in order to evaluate the usefulness of the PLMM in user tasks according to the FRBR tasks (find, identify, select, and obtain) and in collection development tasks. The research found that participants perceived the efficiency of the PLMM as 'somewhat high' in the two tasks. The findings also suggest that perceived efficiency of the PLMM was significantly higher among users with more years of experience with PLMs. The status of users is another factor that positively affected the perceived efficiency of the PLMM.
Keywords: cultural heritage, metadata schema, palm leaf manuscript
A personalized learning environment BIBAKFull-Text 369-370
  Sebastian de la Chica; Faisal Ahmad; Qianyi Gu; Ifi Okoye; Keith Maull; Tamara Sumner; Kirsten R. Butcher
We report on the current research activities and results obtained through the Concept Learning service for Concept Knowledge (CLICK) and present a demonstration of the system. This poster session will focus on a demonstration of the CLICK system and the results of the learning study.
Keywords: competency models, digital library resources, knowledge models, personalization, student misconceptions
Analysis of transaction logs for insights into use of life oral histories BIBAKFull-Text 371-372
  Michael G. Christel; Bryan S. Maher; Huan Li
A digital video library of over 900 hours of video and 18000 stories from The HistoryMakers was used by 214 students, faculty, librarians, and life-long learners interacting with a system providing multiple search and viewing capabilities over a trial period of several months. User demographics and actions were logged, providing metrics on how the system was used. This poster overviews a few highlights from these transaction logs of the Informedia digital video library system for life oral histories.
Keywords: digital video library, oral histories, video retrieval
Summarizing user-generated reviews in digital libraries: a visual clustering approach BIBAKFull-Text 373-374
  Wingyan Chung
In this paper, we describe a visual clustering approach to summarizing user-generated reviews of digital library items and services. The approach consists of the steps of sentence extraction, aspect identification, opinion classification, and review summarization. Our work augments existing work by considering non-standard input and by incorporating clustering and visualization in summarization.
Keywords: aspect analysis, clustering, sentiment analysis, text classification, text summarization, user-generated review, visualization
An interoperability service framework for high-resolution image applications BIBAKFull-Text 375-376
  Ryan Chute; Stephan Dresher; Luda Balakireva; Herbert Van de Sompel
This poster presents a prototype architecture and potential use-cases for a standards-based service framework to simplify development of high-resolution image viewing clients.
Keywords: JPEG 2000, JSON, OAI-ORE, architecture, digital imaging, digital libraries, interoperability, openurl, standards
Tailoring greenstone for seniors BIBAKFull-Text 377-378
  Sally Jo Cunningham; Erin K. Bennett
We present a re-design of Greenstone to support seniors (aged over 65) in managing documents reflecting their life history.
Keywords: home archiving, personal history, senior users
A mixed digital / physical snapshot of early internet / web usage in New Zealand BIBAKFull-Text 379-380
  Sally Jo Cunningham; Jillene Bydder
We are in the early stages of developing a unique physical and digital record of New Zealand's early experience of the Internet.
Keywords: digital museum, history of the web, internet archive
Mashing up life science literature resources BIBAKFull-Text 381-382
  Richard Easty; Nikolay Nikolov
In the life sciences, one of the pronounced problems is the deluge of new results and data produced on a daily basis. This data can take many different forms (e.g. microarray probes, gene sequences, protein structures) and is added by hundreds of research centers world-wide in a largely uncoordinated fashion. Thus integration of life science data is growing in importance. Unfortunately, most research centers have no particular incentive to spend effort on integrating their data with data produced by others. This task is largely left to large publicly-sponsored institutions like the US National Library of Medicine and similar institutions in other countries. Despite their work in this area, the integration of web-based life science resources is still an open issue (and one ever growing in importance), as these organizations cannot cope with the information deluge that is happening on a daily basis in the life sciences. Thus it becomes essential that as many third parties as possible are engaged in the process. Here we demonstrate a simple prototype of a browser plugin that creates a platform for third parties to contribute to cross-linking related online life science data resources, thus improving the search experience and the productivity of the life science community. The plugin creates a convenient programming interface that minimizes the effort required of such third-party contributors. We have provided reference implementations using the plugin that cross-link life science literature resources and illustrate the potential for third parties to create mashups that could also be applied in areas other than the life sciences.
Keywords: browser plugin, data integration, life science literature
Representing publication and distribution practices for scholarly materials: a cross-disciplinary comparison BIBAKFull-Text 383-384
  Phillip M. Edwards
This poster presents a pluralistic approach for representing discipline-specific, cross-disciplinary, and discipline-independent work practices related to scholarly communication. This approach has been applied to qualitative analysis from an investigation of publication and distribution practices of scholars within the biological sciences and the field of communication. The resulting representations illustrate shared work practices and areas where diverse practices exist, both of which can guide the development of digital collections of scholarly materials. This poster also considers challenges related to aligning data collection methods with the application of these representational techniques.
Keywords: scholarly communication, scholarly publication, work practices
Inferring intra-organizational collaboration from cosine similarity distributions in text documents BIBAKFull-Text 385-386
  Maria Esteva; Hai Bi
We present a method that uses text mining methods and statistical distributions to infer degrees of collaboration between staff members in an organization, based on the similarity of the documents that they wrote and exchanged over time.
Keywords: digital archives, statistical distributions, text mining
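The abstract above rests on pairwise document similarity. A minimal sketch of cosine similarity over term-frequency vectors follows; it is an illustration only, not the authors' pipeline. The whitespace tokenizer and the function names `term_vector` and `cosine_similarity` are assumptions, and the step of fitting statistical distributions to the resulting scores is not shown.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a term-frequency vector from lowercased, whitespace-split text.

    Assumption: a real system would use a proper tokenizer and weighting.
    """
    return Counter(text.lower().split())


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc_1 = term_vector("budget report draft")
    doc_2 = term_vector("budget report final")
    print(cosine_similarity(doc_1, doc_2))
```

Scores computed this way for every pair of documents exchanged between two staff members could then be collected into the distributions the abstract refers to.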
Personal name-matching through name transformation BIBAKFull-Text 387-388
  Jun Gong; Lidan Wang; Douglas W. Oard
A graph-theory-based method is proposed to exploit name transformation for personal name-matching. Experimental results on three personal name datasets show that the method is effective.
Keywords: name transformation, personal name-matching, string distance
EMU: the emory user behavior data management system for automatic library search evaluation BIBAKFull-Text 389-390
  Qi Guo; Ryan P. Kelly; Selden Deemer; Arthur Murphy; Joan A. Smith; Eugene Agichtein
We describe EMU, a system for collecting, managing, and mining the behavior data collected in the Emory libraries search system. We describe the data capture system based on the LibX browser plugin, the database management system for successfully storing, searching and exploring millions of resulting user interactions, and preliminary results of interesting queries and statistics that we are using to evaluate the effectiveness of library search tools.
Keywords: data exploration, library search evaluation, user behavior modeling
Building a thailand researcher network based on a bibliographic database BIBAKFull-Text 391-392
  Choochart Haruechaiyasak; Alisa Kongthon; Santipong Thaiprayoon
Among many practical and domain-specific tasks, expertise retrieval (ER) has recently gained increasing attention in the information retrieval and knowledge management communities. This paper describes our ongoing project to design and implement an expert retrieval system scoped to researchers who work in Thailand. In our current system prototype, we assume that the areas of expertise among researchers can be extracted from bibliographic databases. We use the Science Citation Index (SCI) database to provide the information for representing the expert profiles. From the SCI database, we queried and retrieved publications covering the years 2001 to 2008 by specifying the affiliation equal to "Thailand". The results contain a set of approximately 23,000 publications. We downloaded and extracted four related fields: authors (denoted by AU), controlled terms (denoted by ID), keywords (denoted by DE) and subject category (denoted by SC). To build a researcher network, we consider two types of relationships: direct and indirect. The direct (or social) relationship is defined as the co-authoring degree between one researcher and others. The co-authoring degree between two researchers, co-authoring(A,B), can be calculated based on the co-occurrence frequency between A and B found in the field AU of the 23,000 retrieved records. The indirect (or topical) relationship is defined when two researchers have publications under the same topics. The topical degree between two researchers, topical(A,B), can be calculated based on the similarity measure between two sets of extracted keywords, keyword(A) and keyword(B), representing researchers A and B, respectively. The keyword set can be extracted from the fields ID, DE and SC. An author with high frequencies on particular keywords is considered an expert in the corresponding research topics.
Keywords: R&D management, expertise retrieval, social network
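The two relationship measures described in the abstract above can be sketched roughly as follows. This is a hedged illustration: the record layout (dicts with list-valued AU, ID, DE and SC fields), the function names, and the choice of Jaccard similarity for topical(A,B) are assumptions, since the abstract does not name the similarity measure.

```python
from collections import Counter
from itertools import combinations


def coauthoring_degrees(records):
    """Count, for each author pair, the records in which both appear in AU."""
    counts = Counter()
    for record in records:
        authors = sorted(set(record["AU"]))
        for a, b in combinations(authors, 2):
            counts[(a, b)] += 1  # key is the alphabetically ordered pair
    return counts


def keyword_sets(records):
    """Collect keyword(A) for every author from the ID, DE and SC fields."""
    keywords = {}
    for record in records:
        terms = set()
        for field in ("ID", "DE", "SC"):
            terms.update(record.get(field, []))
        for author in record["AU"]:
            keywords.setdefault(author, set()).update(terms)
    return keywords


def topical_degree(keywords_a, keywords_b):
    """Jaccard similarity of two keyword sets (an assumed measure)."""
    union = keywords_a | keywords_b
    if not union:
        return 0.0
    return len(keywords_a & keywords_b) / len(union)


if __name__ == "__main__":
    # Hypothetical records, not data from the SCI database.
    sample = [
        {"AU": ["A", "B"], "DE": ["x", "y"]},
        {"AU": ["A", "B", "C"], "DE": ["y", "z"]},
    ]
    print(coauthoring_degrees(sample)[("A", "B")])
    kw = keyword_sets(sample)
    print(topical_degree(kw["A"], kw["C"]))
```

Keeping the pair key in sorted order means co-authoring(A,B) and co-authoring(B,A) resolve to the same counter entry.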
Building a MARC-to-OLAC crosswalk: repurposing library catalog data for the language resources community BIBAKFull-Text 393-394
  Christopher Hirt; Gary Simons; Joan Spanne
The Open Language Archives Community (OLAC) is an international partnership of institutions which are building a network of interoperating repositories and services to create a worldwide virtual library of language resources (that is, resources that document, describe, or develop the more than 7,000 known languages of the world). OLAC uses a community-specific refinement of qualified Dublin Core [http://www.language-archives.org/OLAC/metadata.htm] along with a community-specific refinement of the OAI Protocol for Metadata Harvesting [http://www.language-archives.org/OLAC/repositories.htm] to maintain an aggregated catalog of the holdings of the 35 participating archives. OLAC recognizes that the language resources of interest to the community come not only from sources within the community but also from many sources outside the community. This poster describes one approach we have developed for addressing this issue, namely, a crosswalk that transforms the MARC21 catalog for a library or archive into an OAI static repository that holds an OLAC metadata record for each MARC record identified as describing a language resource.
Keywords: ISO 639, language identification
Locating text in scanned books BIBAKFull-Text 395-396
  Chang Hu; Anne Rose; Benjamin B. Bederson
In this paper, we describe a workflow to extract and verify text locations using commercial software, along with free software products and human proofing. To help mid-sized digital libraries, we are making our solution available as open source software.
Keywords: adobe acrobat, book readers, digital libraries, readability, word location
Remote usability testing: a practice BIBAKFull-Text 397-398
  Sheng-Cheng Huang; Randolph G. Bias; Tanya L. Payne; Jay B. Rogers
As remote users make increasingly frequent use of library resources, remote usability testing has become a valuable tool for those who would pursue an empirical, user-centered design of the interfaces to their electronic resources and services. This paper describes our implementation of remote usability tests to evaluate prototypes of a web content management application developed by Vignette Corporation, and reports sample results to illustrate the utility of such an approach, which can help design and improve the interfaces of digital library projects and their usability.
Keywords: collaborative design, remote testing, usability testing
Scientific digital libraries, interoperability, and ontologies BIBAKFull-Text 399-400
  J. Steven Hughes; Daniel J. Crichton; Chris A. Mattmann
Scientific digital libraries serve complex and evolving research communities. Justifications for the development of scientific digital libraries include the desire to preserve science data and the promises of information interconnectedness, correlative science, and system interoperability. Research [1] suggests single shared ontologies are fundamental to fulfilling these promises. We present a tool framework, a set of principles, and a real world case study where shared ontologies are used to develop and manage science information models and subsequently guide the implementation of scientific digital libraries. The tool framework, based on an ontology modeling tool as illustrated in Figure 1, was configured to develop, manage, and keep shared ontologies relevant within changing domains and to promote the interoperability, interconnectedness, and correlation desired by scientists.
Keywords: digital library, information model, interoperability, ontology, science data, science metadata
The landscape of information science: 1996-2008 BIBAKFull-Text 401-402
  Fidelia Ibekwe-SanJuan; Eric SanJuan
We propose a methodology combining symbolic and numeric information to map the structure of research in Information Science between 1996-2008. The visualization of the resulting maps showed that while the two-camp structure of Information Science observed in previous studies is still valid, other research poles like web and user-oriented studies are building bridges between the two hitherto isolated poles.
Keywords: clustering, information visualization, knowledge domain mapping, text mining
Forging the future: new tools for variable media art preservation BIBKFull-Text 403-404
  Jon Ippolito; Richard Rinehart; Marilyn Lutz; Sharon Fitzgerald
Keywords: metadata, new media, preservation strategies, variable media art
Analyzing OPAC use with screen views and eye tracking BIBAKFull-Text 405-406
  Emi Ishita; Shinji Mine; Masanori Koizumi; Yosuke Miyata; Chihiro Kunimoto; Junko Shiozaki; Keiko Kurata; Shuichi Ueda
Eye tracking was used to analyze which elements of which screens were viewed by users searching an Online Public Access Catalog (OPAC). Eye tracking data was obtained for 32 participants performing a known-item search task. The results show that more than 30% of participants did not make effective use of screens offering additional details, and that participants who did, and found the correct answer, gazed at specific screen elements more frequently than participants who gave incorrect answers.
Keywords: OPAC use, eye tracking, viewing patterns
A user-friendly metadata quality control tool for the internet public library BIBAKFull-Text 407-408
  Michael Khoo; Xia Lin; Jung-ran Park
The Internet Public Library (IPL) is crosswalking its metadata to Dublin Core. The quality of the crosswalked metadata will be unknown. The IPL is therefore developing a tool for metadata quality control suitable for use by LIS students who have little previous metadata quality control experience.
Keywords: HCI, LIS instruction, dublin core, evaluation, internet public library, metadata, metadata quality control, user-centered design
Using an institutional repository for personal digital collections of retired faculty members BIBAKFull-Text 409-410
  Sarah Kim
In this poster, I address practical issues related to using IRs for personal digital collections of retired faculty members.
Keywords: archival collection, archiving, institutional repository
Exploitation of the wikipedia category system for enhancing the value of LCSH BIBAKFull-Text 411-412
  Yoji Kiyota; Hiroshi Nakagawa; Satoshi Sakai; Tatsuya Mori; Hidetaka Masuda
This paper addresses an approach that integrates two different types of information resources: the Web and libraries. Our method begins from any keywords in Wikipedia, and induces related subject headings of LCSH through the Wikipedia category system.
Keywords: LCSH, subject headings, wikipedia categories
Inter-search engine lexical signature performance BIBAKFull-Text 413-414
  Martin Klein; Michael L. Nelson
We generate lexical signatures (LSs) from web pages and acquire the mandatory document frequency values from three different search engine (SE) indexes. We cross-query the LSs against the two SEs they were not generated from and compare the retrieval performance by parsing the result set and analyzing the rank of the source URL.
Keywords: lexical signature, performance, search engine
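A lexical signature, as used in the abstract above, is a small set of terms meant to identify a page, and it depends on document frequency values. A minimal sketch follows, assuming TF-IDF scoring with a top-k cutoff; the function name, the `doc_freq` lookup table and the `corpus_size` estimate are hypothetical stand-ins for values that would come from a search engine index.

```python
import math
from collections import Counter


def lexical_signature(text, doc_freq, corpus_size, k=5):
    """Return the k terms of `text` with the highest TF-IDF score.

    doc_freq:    hypothetical mapping term -> number of indexed documents
                 containing the term
    corpus_size: assumed total number of indexed documents
    """
    term_freq = Counter(text.lower().split())
    scores = {}
    for term, freq in term_freq.items():
        df = doc_freq.get(term)
        if not df:
            continue  # skip terms with no known document frequency
        scores[term] = freq * math.log(corpus_size / df)
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


if __name__ == "__main__":
    page = "the palm leaf manuscript the palm"
    df_table = {"the": 1000, "palm": 10, "leaf": 20, "manuscript": 5}
    print(lexical_signature(page, df_table, corpus_size=1000, k=3))
```

Swapping in a different `doc_freq` table per search engine would give one signature per index, which is the kind of comparison the abstract describes.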
Correlation of music charts and search engine rankings BIBAKFull-Text 415-416
  Martin Klein; Olena Hunsicker; Michael L. Nelson
We investigate whether expert rankings of real-world entities correlate with search engine (SE) rankings of corresponding web resources. We compare Billboard's "Hot 100 Airplay" music charts with SE rankings of associated web resources. Out of nine comparisons we found two strong, two moderate, two weak and one negative correlation. The remaining two comparisons were inconclusive.
Keywords: correlation, real-world objects, search engine
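The abstract above does not say which correlation coefficient was used. As one possible illustration, Spearman's rank correlation between a chart ranking and a search engine ranking of the same items can be sketched as below; the function name and the input layout (item-to-rank dicts without ties) are assumptions.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two rankings of the same items, assuming no ties.

    rank_a, rank_b: dicts mapping item -> rank position (1 = top).
    """
    if rank_a.keys() != rank_b.keys():
        raise ValueError("both rankings must cover the same items")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two items to correlate")
    # Sum of squared rank differences over all items.
    d_squared = sum((rank_a[item] - rank_b[item]) ** 2 for item in rank_a)
    return 1 - (6 * d_squared) / (n * (n * n - 1))


if __name__ == "__main__":
    # Hypothetical rankings, not data from the study.
    chart = {"song1": 1, "song2": 2, "song3": 3}
    engine = {"song1": 3, "song2": 2, "song3": 1}
    print(spearman_rho(chart, engine))
```

A value near 1 indicates the two rankings agree, a value near -1 indicates they are reversed, and values near 0 indicate little monotonic relationship.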
Toward automatic generation of image-text document surrogates to optimize cognition BIBAKFull-Text 417-418
  Eunyee Koh; Andruid Kerne; Jon Moeller
The representation of information collections needs to be optimized for human cognition. Growing information collections play a crucial role in human experiences. While documents often include rich visual components, collections, including personal collections and those generated by search engines, are typically represented as lists of text-only surrogates. By concurrently invoking complementary components of human cognition, combined image-text surrogates help people to more effectively see, understand, think about, and remember information collections. This research develops algorithmic methods that use the structural context of images in HTML documents to associate meaningful text and thus derive combined image-text surrogates.
Keywords: information extraction, search representation, surrogates
Designing exploratory search tasks for user studies of information seeking support systems BIBAKFull-Text 419-420
  Bill Kules; Robert Capra
This poster describes a procedure for designing exploratory tasks for use in laboratory evaluations of information seeking interfaces. This procedure is grounded in the literature on information seeking and information retrieval and has been refined by an evaluation of four tasks designed for a study of a faceted library catalog. The procedure is intended to be extensible to generate exploratory tasks for other types of interfaces and domains.
Keywords: n/a
Developing a review rubric for learning resources in digital libraries BIBKFull-Text 421-422
  Heather Leary; Sarah Giersch; Andrew Walker; Mimi Recker
Keywords: education digital library, instructional architect, national science digital library, review rubric
From harvesting to cultivating: transformation of a web collecting system into a robust curation environment BIBAKFull-Text 423-424
  Christopher A. Lee; Richard Marciano; Chien-yi Hou; Chirag Shah
Much has been written about the lifecycle of digital objects. This study is instead concerned with the lifecycle of collections and associated services. Online collection environments are built to fulfill specific collecting objectives and constraints. If a collection proves useful within its original hosting environment, it will often be necessary or desirable to move the collection to new environments, in order to support new forms of use and re-aggregation or extract resources from legacy data environments. Such a transformation can be extremely expensive, challenging and prone to error, especially if the collections include complex internal structures and services. When "services make the repository" [1], moving raw data from one location to another will often not be sufficient. Digital curators can preempt costly and problematic system migration efforts by integrating collections into environments specifically designed to support long-term preservation, scalability and interoperability [2]. We report on an integration of content and functionality of a feature-rich collecting environment (ContextMiner) into a robust data curation environment (iRODS).
   ContextMiner is a web-based service for building collections, through the execution and management of "campaigns" (i.e. sets of associated queries and parameters to harvest content over time). As a part of the VidArch project, we have been using the ContextMiner framework and services for harvesting YouTube videos and associated contextual information on a variety of topics. In July 2008, we released a public beta of ContextMiner, allowing anyone to run similar crawls. There are now more than 100 users. The current implementation -- based on a single MySQL database and associated code -- has served its intended purposes very well, but it is not a scalable or sustainable basis for offering wide-scale collecting services in support of the diverse array of potential users and use cases.
   iRODS (integrated Rule-Oriented Data System) is adaptive, policy-driven data grid middleware that addresses aspects of growth, evolution, openness, and closure -- fundamental requirements for digital preservation [3]. iRODS currently scales to hundreds of millions of files, tens of thousands of users, and petabytes of data. It operates in a highly distributed environment with heterogeneous storage resources and allows for growth through federation. It supports evolution through the virtualization of the underlying technology and supports changing business requirements through customization of repository behaviors. It supports openness through a data type agnostic treatment of content. iRODS can be instrumented with policies that support the management of the lifecycle of digital assets and will serve as a unique platform to study repository integration. One key feature is the automation of policy enforcement across distributed data that have been organized into a shared collection. The coupling of other open repositories and iRODS can create greater efficiencies and new types of repository services.
   We discuss various repository integration scenarios, their potential benefits, and implications for collection life cycles. The approaches co-locate metadata and content in varied ways and rely on efficiencies found in one repository only, or on the ability to combine policies in both spaces: (1) iRODS to ContextMiner data migration, (2) Policy-based data management for ContextMiner collections, and (3) Policy interchange between ContextMiner and iRODS collections.
Keywords: interoperable repositories
A semi-automatic system for managing multiple digital preservation risks of digital libraries in china BIBAKFull-Text 425-426
  Chao Li; Chunxiao Xing; Li Dong; Michael Bailou Huang
While many research projects in the world have been addressing challenges posed by digital preservation, digital libraries in China have their own native problems that have never been addressed before. Similar problems may occur in other countries, and their memory institutions may be less prepared to handle them. This poster analyses the requirements and challenges of digital libraries in China and describes an integrated and flexible digital preservation system -- AOMS.
Keywords: XML, digital preservation, integration, web service
What patrons want: supporting interaction for novice information seeking scholars BIBAKFull-Text 427-428
  Fernando Loizides; George R. Buchanan
In this paper, we undertake a study of inexperienced information seeking scholars, identifying areas for improvement in their electronic information seeking and document triage process [3]. We propose a software aid, currently under development.
Keywords: document triage, information seeking, novice users
Selective harvesting of regional digital libraries and national metadata aggregators BIBAKFull-Text 429-430
  Cezary Mazurek; Marcin Mielnicki; Marcin Werla
The poster presents the concept, implementation and practical application of the OAI-PMH protocol extension which allows OAI-PMH service providers to dynamically create and harvest sets of items from OAI-PMH data providers. The implementation of the presented concept is based on the encoding of dynamic set specifications in OAI-PMH requests with the CQL language. The extension was developed and widely applied in Poland and now it is used in several projects funded by the European Commission.
Keywords: CQL, OAI-PMH, interoperability, metadata access and distribution, metadata aggregation, selective metadata harvesting
User search behaviors within a library gateway BIBAKFull-Text 431-432
  William H. Mischo; Mary C. Schlembach; Michael A. Norman
This poster reports on user searching behavior within two information gateways developed at the University of Illinois at Urbana-Champaign Library. These gateways are built around a locally developed metasearch engine and are designed to assist users with search query formulation and modification. Search behavior data is being collected in custom transaction logs that gather user search arguments along with any system actions and contextual search assistance suggestions.
Keywords: metasearch, transaction logs, user searching behaviors
Users' adjustments to unsuccessful queries in biomedical search BIBAKFull-Text 433-434
  G. Craig Murray; Jimmy Lin; John Wilbur; Zhiyong Lu
Biomedical researchers depend on on-line databases and digital libraries for up to date information. We introduce a pilot project aimed at characterizing adjustments made to biomedical queries that improve search results. Specifically we focus on queries submitted to PubMed®, a large sophisticated search engine that facilitates Web access to abstracts of articles in over 5,200 biomedical journals. On average 2 million users search PubMed each day. During their search, nearly 20% will experience a result page from one of their queries that has zero results. In some cases there really is no document or abstract that will satisfy a particular query. However, in analyzing one month of queries submitted to PubMed, we find that more often than not, queries that retrieved no results are queries that would retrieve something relevant if they were constructed differently. This paper describes a new effort to identify some of the characteristics of a query that produces zero results, and the changes that users most often apply in constructing new, "corrected" queries. Zero-result queries afford us an opportunity to examine changes made to queries that we know did not return relevant data, because they did not return any data. An investigation of the changes users make under these circumstances can yield insight into users' search processes.
Keywords: PubMed, medical search, query reformulation, user modeling
Species identification: fish images with CBIR and annotations BIBKFull-Text 435-436
  Uma Murthy; Edward A. Fox; Yinlin Chen; Eric Hallerman; Ricardo Torres; Evandro J. Ramos; Tiago R. C. Falcao
Keywords: CBIR, fish species identification, image annotation, image retrieval, user study
Kindle usage among LIS students: an exploratory study BIBKFull-Text 437-438
  Debbie L. Rabina; Maria Cristina Pattuelli
Keywords: e-books, electronic publishing, social issues, user needs
Metababble: a clash of metadata cultures BIBAKFull-Text 439-440
  Monica Rivero; Geneva Henry
A tension exists between making digitized resources available to users quickly and providing detailed, item-level metadata and semantic markup that make those resources more discoverable. The Our Americas Archive Partnership (OAAP) project, funded by IMLS in the fall of 2007, is facing these challenges as the project progresses. This poster presents a summary of our approach and future thoughts about descriptive approaches for digital resources.
Keywords: TEI, digital library, metadata, minimal processing, social tagging
Evaluation of OAI-ORE via large-scale information topology visualization BIBAKFull-Text 441-442
  Robert Sanderson; Clare Llewellyn; Richard Jones
This poster evaluates the OAI-ORE specifications through experiments providing access to the JSTOR digital archive and the Flickr website. A browser-based dynamic graph visualization tool was designed and tested to determine if making the topology of the information available would provide end-user benefits in terms of navigation and discovery.
Keywords: OAI-ORE, linked data, visualization, web 2.0
Empirical analysis on chinese academic plagiarism BIBAKFull-Text 443-444
  Yang Shen; Huijuan Fu; Zitao Liu; Pengpeng Liu; Qingchuan Fu
This poster applies the self-developed ROST Anti-plagiarism Software to check 3781 papers and surveys 450 students in order to quantitatively analyze academic plagiarism in China from the angles of subjects, authors' social network, authors' combination, and students' plagiarism law, and draws several conclusions.
Keywords: ROST anti-plagiarism software, plagiarism law, social network
Adaptive personalized eLearning on top of existing LCMS BIBAKFull-Text 445-446
  Naimdjon Takhirov; Ingeborg T. Sølvberg
The next generation of eLearning systems should tailor the learning experience to each individual's learning needs and preferences. PEDAL-NG is a system that supports personalization in an existing, operational eLearning environment, based on prior knowledge and the learning style of users. It is built as a front-end of an existing LMS. The prototype is tested by a group of students. The test results are favorable regarding the personalized course and give valuable feedback for future research.
Keywords: eLearning, learning objects, personalization
User search characteristics on a specialized digital collection for domain- and task-specific information BIBAKFull-Text 447-448
  Xiaoya Tang
Domain-specialized digital collections have been growing rapidly in recent years. A good understanding of how users interact with such collections to accomplish domain-specific information tasks would help inform the design of effective systems. This study investigates users' interaction with a Web-based botanical collection by examining search logs recorded during an experiment. The findings indicate that while users' interactions with such collections demonstrate similar characteristics to those with general purpose search systems, they also demonstrate a domain- and task-specific nature.
Keywords: keyword search, query, terms, user study
MetRe: supporting the metadata revision process BIBAKFull-Text 449-450
  Emma Tonkin
MetRe is a prototype interface and service designed to support the metadata revision process. Improving consistency of metadata records within an environment is a common repository management task, due to potential for user error when submitting, as well as of other sources of error, such as systematic error resulting from the chosen deposit process. Evidence to support the metadata correction process may be gathered by automated metadata extraction tools, evidence from within the repository, or by comparison with best practice across the repository landscape. MetRe (Metadata Revision) is a prototype demonstrator that is able to identify several characteristic classes of error, twinned with an interface able to highlight several types of individual and systematic error, including a notion of local (intra-repository) and general (inter-repository) best practice.
Keywords: metadata
Finding centuries-old hyperlinks with a novel semi-supervised learning technique BIBAKFull-Text 451-452
  Xiaoyue Wang; Eamonn Keogh
Hyperlinks are so useful for searching and browsing modern digital collections that researchers have long wondered if it is possible to retroactively add hyperlinks to digitized historical documents. There has already been significant research into this endeavor for historical text; however, in this work we consider the problem of adding hyperlinks among graphic elements. While such a system would not have the ubiquitous utility of text-based hyperlinks, there are several domains where it can potentially significantly augment textual information.
Keywords: historical digital libraries, historical manuscripts, hyperlinks, semi-supervised learning
Journal ranking based on social information BIBAKFull-Text 453-454
  Jinlong Wang; Ke Gao; Yongli Ren; Gang Li
Recently, literature analysis has become a hot issue in academic studies. In order to quantify the importance of journals and provide researchers with target vehicles for their work, this poster proposes a novel approach based on social information, considering the potential relationship between journal quality and authors' affiliations. Based on the formula proposed in this work, the importance of journals can be estimated and ranked.
Keywords: journal ranking, mining, social information
The variety of ways in which instructors implement a modular digital library curriculum BIBAKFull-Text 455-456
  Barbara M. Wildemuth; Jeffrey P. Pomerantz; Sanghee Oh; Seungwon Yang; Edward A. Fox
With support from the National Science Foundation, researchers at Virginia Tech and the University of North Carolina developed a curriculum framework and a number of modules for instruction in the area of digital libraries. In 2008, 15 different modules were field tested by 11 instructors at 10 different institutions. As might be expected, instructors adapted these modules to fit the context of their courses, some of which are described here.
Keywords: computer science, curriculum development, digital libraries, education, instruction, library and information science
GRE: hybrid recommendations for NSDL collections BIBAKFull-Text 457-458
  Todd C. Will; Anand Srinivasan; Michael Bieber; Il Im; Vincent Oria; Yi-Fang (Brook) Wu
Recommendation systems have been proven to reduce the time and effort required by users to find relevant items, but there are only sporadic reports on their application in digital libraries. The General Recommendation Engine (GRE) combines the text search system Lucene with the well-understood content-based and collaborative filtering techniques and with the first application of knowledge-based recommendation in digital libraries, and it recommends items from 22 National Science Digital Library collections. In a study of 60 subjects, the GRE outperformed the baseline system Lucene in all areas of evaluation.
Keywords: collaborative filtering, content based, digital libraries, knowledge based recommendation, recommendation systems, text search engine, user interface
Archiving the videogame industry: collecting primary materials of new media artifacts BIBAKFull-Text 459-460
  Megan A. Winget
This paper describes the initial deposits in The Videogame Archive at the Center for American History at the University of Texas at Austin.
Keywords: collection development, new media, video games
Analyzing user's book-loan behaviors in Peking university library from social network perspective BIBAKFull-Text 461-462
  Fei Yan; Ming Zhang; Tao Sun; Yang Lu; Naiyue Zhang; Long Xiao
In a university library, students from different backgrounds are connected by co-borrowing behaviors, which form a knowledge-sharing network. This poster presents a novel idea: studying users' book-loan behavior patterns (knowledge-sharing patterns) from the social network perspective, which enables us to understand the patterns through both macro-level and micro-level analysis.
Keywords: digital library, log mining, social network analysis

Demos

An ajax-based digital music stand for greenstone BIBAKFull-Text 463-464
  David Bainbridge; Tim C. Bell
This extended abstract describes a digital music stand integrated with the Greenstone digital library software. It features text annotation and an animated fast-to-slow page wipe. Figure 1 illustrates both these features, although it is best appreciated in a live demonstration. Digital annotation provides a non-destructive alternative to a musician's habit of penciling in notes. In Figure 1, slightly over half way down the page, there is a note to watch the fingering. A user can have as many of these as they like, positioned anywhere on the page.
   The animated page wipe alleviates (somewhat) the issue of when to turn to the next page. Unlike its physical counterpart, where turning to the next page means you can no longer see the current page, with a digital music stand the next page can gradually be overlaid. The page transition occurring in Figure 1 can be seen as a marked horizontal bar not quite half-way down the page. The speed of the wipe is initially fast, but when it reaches the point where the scroll-bar marker is on the right-hand side of the page, it slows down significantly. This is to give the musician time to finish playing the last line of the current page. In the event they have already finished playing that line, they will have naturally moved on to playing the top of the next page (which is already displayed).
   Rather than adopt a traditional client-side "helper" application for the digital music stand, we have integrated it within Greenstone using AJAX. For instance: next and previous pages are asynchronously loaded in the background; when generating a page, the dimensions of the user's screen are sent to the DL server so it can produce a version that maximizes the available space; and interactions such as adding an annotation, or altering the position of the animation-break, are immediately stored as metadata associated with that document. Initially the animated page breaks are set to be between the last two staff systems. This is accomplished as part of the DL ingest process, leveraging the staff detection step of Optical Music Recognition software.
Keywords: digital library integration, digital music stand
Accessing the densho and historymakers oral history collections via informedia technologies BIBAKFull-Text 465-466
  Michael G. Christel; Robert V. Baron; Geoff Froh; Dan Benson; Julieanna Richardson
Densho is a nonprofit organization started in 1996 with the goal of documenting oral histories from Japanese Americans who were incarcerated during World War II. The HistoryMakers is a nonprofit established in 1999 with the goal of documenting video life oral history interviews highlighting the accomplishments of individual African Americans and African-American-led groups and movements. Both collections share the goal of broader, deeper use of the oral history content through digitization and automated processing where appropriate. This demonstration showcases the application of Carnegie Mellon Informedia digital video library processing and interfaces to enhance access into the interview segments.
Keywords: digital video library, oral histories, video retrieval
Text mining for indexing BIBAKFull-Text 467-468
  Judith Gelernter; Michael Lesk
This paper describes techniques for automatically extracting and classifying maps found within articles. The process uses image analysis to find text in maps, document structure to find captions and titles, and then text mining to assign each map to a subject category, a geographical place, and a time period. The text analysis is based on authority lists taken from gazetteers and from library classifications.
Keywords: automatic classification, content analysis and indexing, text mining
Our Americas archive partnership demonstration BIBAKFull-Text 469-470
  Geneva Henry; Monica Rivero
The Our Americas Archive Partnership (OAAP) project is in year 2 of a 3-year IMLS-funded grant led by Rice University in partnership with the University of Maryland's Maryland Institute for Technology in the Humanities (MITH). Designed to meet the needs of American studies scholars researching the Americas from a hemispheric perspective, OAAP is developing an integrated framework for the discovery of digital resources that are managed in heterogeneous distributed repositories. This demonstration will show the current state of the project's common interface to support resource discovery.
Keywords: OAI, TEI, american studies, dspace, harvesting, repository, semantic markup, social tagging, tag cloud
Mapping life events: temporal and geographic context for biographical information BIBAKFull-Text 471-472
  Ray R. Larson; Ryan Shaw
Digital libraries often fail to connect their contents to the wider context of available information resources about the same persons, related persons, places, or time periods, and about the events that happen to those persons, at those places, and in a given time period. This demonstration will show prototype systems that can perform these tasks, linking the user to relevant contextual information.
Keywords: geographic information retrieval
Virtual DL poster sessions in second life BIBAKFull-Text 473-474
  Spencer J. Lee; Edward A. Fox; Gary Marchionini; Javier Velacso; Gonçalo Antunes; José Borbinha
In Second Life (SL), a popular general-purpose 3D virtual world, we are supporting the Digital Library community in a variety of ways, including through virtual poster sessions. This brings together the interests of those involved in JCDL 2009, IEEE-TCDL, and NSF-supported work in SL aimed at assisting education, training, and dissemination in the digital preservation area.
Keywords: 3D, digital preservation, second life, tele-presence, virtual world
ContextMiner: building context-rich digital collections BIBKFull-Text 475-476
  Chirag Shah
Keywords: contextual information, digital curation, digital preservation
Using university collections in digital library education BIBKFull-Text 477-478
  Quinn Stewart; David Todd
Keywords: digital library curriculum, digitization, rich-media
A curriculum customization service BIBAKFull-Text 479-480
  Tamara Sumner; Holly Devaul; Lynne Davis; John Weatherley
We demonstrate a prototype Curriculum Customization Service designed and developed with significant teacher input. This prototype illustrates a model for embedding digital library resources into mainstream classroom use. A 10-week pilot study suggests that this Service can increase teachers' use of digital library resources in their classes, and encourage them to use resources to customize instruction.
Keywords: customizing instruction, differentiated instruction, educational digital libraries, personalization, science education
XEB: a markup language document container format suitable for handheld devices BIBAKFull-Text 481-482
  Zhi Tang; Liangcai Gao; Aixia Jia; Xiaofan Lin
We propose a new document container format (XEB, eXtensible Electronic Book) based on a block mechanism to efficiently process markup language documents on handheld devices. The format also supports random document access through a pagination mechanism. It has already been applied to Chinese E-book readers on a number of handheld devices, and XEB documents can be downloaded from a Chinese E-book store.
Keywords: document parsing, handheld device, markup language document
AskDragon: a redundancy-based factoid question answering system with lightweight local context analysis BIBAKFull-Text 483-484
  Xiaohua Zhou; Palakorn Achananuparp; E. K. Park; Xiaohua Hu; Xiaodan Zhang
We introduce our QA system AskDragon, which employs a novel lightweight local context analysis technique to handle two broad classes of factoid questions: entity and numeric questions. The local context analysis module dramatically improves the efficiency of QA systems without sacrificing accuracy.
Keywords: answer generation, answer scoring, local context analysis, question answering, redundancy-based approach
Knowledge extraction and integration for semi-structural information in digital libraries BIBKFull-Text 485-486
  Wenhao Zhu; Baogang Wei; Jiangqin Wu; Shaomin Shi; Yan Yang
Keywords: digital libraries, digitized textbook, information extraction