%M C.DL.09.1 %T Science teachers' use of online resources and the digital library for Earth system education %S Session 1 %A Barker, Lecia J. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 1-10 %K K12, educational digital libraries, empirical, evaluation, mixed-method, teaching %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555402 %X A three-part study of teachers' use of online resources and of the Digital Library for Earth System Education (DLESE) was conducted from 2004 through summer 2006. The first two phases were qualitative and informed a survey administered to 622 science teachers across the U.S., one-fifth of whom had used DLESE. The findings present a profile of teachers and their access to Internet-connected computers and other hardware/electronic media devices in their classrooms; and teachers' preferences for resource formats (e.g., customizability) and educational web site features (e.g., tagged reading level). Analysis of variance showed that teachers with more than one working computer and teachers with more other devices valued the Internet more highly for teaching than did their less equipped peers. DLESE users valued the Internet more highly for their teaching, had more years teaching experience, and valued customizable resources more than their non-DLESE using peers. Most believed that resources catalogued in DLESE were scientifically accurate. Teachers used DLESE most often for finding hands-on activities, still images and other visual aids, and hand-outs; they were least likely to seek people, games, or assessment tools. The findings provide guidance for developers of K12 educational resources. %M C.DL.09.11 %T Dimensional standard alignment in K-12 digital libraries: assessment of self-found vs. 
recommended curriculum %S Session 1 %A Marshall, Byron %A Reitsma, René %A Zarske, Malinda %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 11-14 %K context-specific measurement, curriculum-standard alignment, digital library, inter-rater reliability, relevance %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555403 %X Enhancing the experience of digital library users depends, in part, on recognizing and understanding user tasks. In the context of K-12 educational libraries this means that we must understand how K-12 teachers interact with such libraries and how they assess the relevance of documents found or encountered. This paper presents the results of an experiment in which K-12 teachers scored the relevance of curriculum they found themselves and the relevance of documents their colleagues found and recommended. We found that teachers apply a significantly more detailed notion of relevance, both qualitatively and quantitatively, when searching for as compared to evaluating recommended curricula. Differences were observed in both relevance judgments and system interaction logs. These variations may be useful in identifying user intent and in dynamically adapting the behavior of digital libraries of educational material. 
%M C.DL.09.15 %T Helping students with information fragmentation, assimilation and notetaking %S Session 1 %A Reimer, Yolanda Jacobs %A Bubnash, Melissa %A Hagedal, Matthew %A Wolf, Peter %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 15-18 %K PIM, information assimilation, information fragmentation, notetaking, students in higher education, user interface design %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555404 %X The problem of information fragmentation is especially acute for today's college students who manage and assimilate information in various forms while completing many of their academic tasks, and who must do so within the confines of standard software applications. The goal of this research is to provide students with a novel information assimilation and notetaking tool that helps them more efficiently manage their electronic information and overcome some of the fragmentation challenges they routinely experience. Our Global Information Gatherer prototype allows students to view, edit and store files of different types from within a single interface, and provides an integrated web browser and notetaking functionality. %M C.DL.09.19 %T Topic model methods for automatically identifying out-of-scope resources %S Session 1 %A Bethard, Steven %A Ghosh, Soumya %A Martin, James H. %A Sumner, Tamara %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 19-28 %K digital libraries, machine learning, relevance, scope, topics %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555405 %X Recent years have seen the rise of subject-themed digital libraries, such as the NSDL pathways and the Digital Library for Earth System Education (DLESE). These libraries often need to manually verify that contributed resources cover topics that fit within the theme of the library. 
We show that such scope judgments can be automated using a combination of text classification techniques and topic modeling. Our models address two significant challenges in making scope judgments: only a small number of out-of-scope resources are typically available, and the topic distinctions required for digital libraries are much more subtle than those in classic text classification problems. To meet these challenges, our models combine support vector machine learners optimized to different performance metrics and semantic topics induced by unsupervised statistical topic models. Our best model is able to distinguish resources that belong in DLESE from resources that don't with an accuracy of around 70%. We see these models as the first steps towards increasing the scalability of digital libraries and dramatically reducing the workload required to maintain them. %M C.DL.09.29 %T Automatically generating high quality metadata by analyzing the document code of common file types %S Session 1 %A Edvardsen, Lars Fredrik Høimyr %A Sølvberg, Ingeborg Torvik %A Aalberg, Trond %A Trætteberg, Hallvard %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 29-38 %K PDF, automatic metadata generation, document code, extraction, harvesting, latex, metadata quality, openXML, powerpoint, word %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555406 %X A major challenge for content management in intranets and other large scale document storage and retrieval services is the generation of high quality metadata. Manual generation of metadata is resource demanding and is often viewed by collection managers and document authors as an inefficient use of their time, and there is a desire for other ways to create the needed metadata. Automatic Metadata Generation (AMG) comprises methods for generating metadata without manual interaction, using computer programs to interpret the document and possibly the document context. 
Current AMG research has been limited to collections of similarly formatted documents. The research presented in this paper expands the field of AMG by presenting an approach that is independent of a common visualization scheme: AMG based on document code analysis. This is done by showing AMG possibilities from LaTeX, Word and PowerPoint documents and how this approach can significantly increase the quality of the generated metadata. The improvement comes from avoiding common quality-reducing factors, such as shortcomings in completeness, accuracy, logical consistency and coherence, and timeliness, by giving AMG algorithms direct access to the user specified intellectual content and the file formatting. This research shows how this AMG approach can be combined with other AMG approaches, drawing on their strengths in order to achieve the desired high quality metadata entities. %M C.DL.09.39 %T Disambiguating authors in academic publications using random forests %S Session 2 %A Treeratpituk, Pucktada %A Giles, C. Lee %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 39-48 %K author disambiguation, medline, random forests %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555408 %X Users of digital libraries usually want to know the exact author or authors of an article. But different authors may share the same names, either as full names or as initials and last names (complete name change examples are not considered here). In such a case, the user would like the digital library to differentiate among these authors. Name disambiguation can help in many cases, for example when a user searches for all articles written by a particular author. Disambiguation also enables better bibliometric analysis by allowing a more accurate counting and grouping of publications and citations. In this paper, we describe an algorithm for pair-wise disambiguation of author names based on a machine learning classification algorithm, random forests. 
We define a set of similarity profile features to assist in author disambiguation. Our experiments on the Medline database show that the random forest model outperforms other previously proposed techniques such as those using support-vector machines (SVM). In addition, we demonstrate that the variable importance produced by the random forest model can be used in feature selection with little degradation in the disambiguation accuracy. In particular, the inverse document frequency of the author's last name and the middle name's similarity alone achieve an accuracy of almost 90%. %M C.DL.09.49 %T Using web information for author name disambiguation %S Session 2 %A Pereira, Denilson Alves %A Ribeiro-Neto, Berthier %A Ziviani, Nivio %A Laender, Alberto H. F. %A Gonçalves, Marcos André %A Ferreira, Anderson A. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 49-58 %K author name disambiguation, bibliographic citation, search engine %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555409 %X In digital libraries, ambiguous author names may occur due to the existence of multiple authors with the same name (polysemes) or different name variations for the same author (synonyms). We propose here a new method that uses information available on the Web to deal with both problems at the same time. Our idea consists of gathering information from input citations and submitting queries to a Web search engine, aiming at finding curricula vitae and Web pages containing publications of the ambiguous authors. From the content of documents in the answer sets returned by the Web search engine, useful information that can help in the disambiguation process is extracted. Using this information, author names are disambiguated by leveraging a hierarchical clustering method that groups citations in the same document together in a bottom-up fashion. 
Experimental results show that our method yields results that outperform those of two state-of-the-art unsupervised methods and are statistically comparable with those of a supervised one, while requiring no training. We observe gains of up to 65.2% in the pairwise F1 metric when compared with our best unsupervised baseline method. %M C.DL.09.59 %T Whetting the appetite of scientists: producing summaries tailored to the citation context %S Session 2 %A Wan, Stephen %A Paris, Cécile %A Dale, Robert %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 59-68 %K biomedical researchers, information browsing, information needs, scientific literature, summarization, user modeling and interactive ir %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555410 %X The amount of scientific material available electronically is forever increasing. This makes reading the published literature, whether to stay up-to-date on a topic or to get up to speed on a new topic, a difficult task. Yet, this is an activity in which all researchers must be engaged on a regular basis. Based on a user requirements analysis, we developed a new research tool, called the Citation-Sensitive In-Browser Summariser (CSIBS), which supports researchers in this browsing task. CSIBS enables readers to obtain information about a citation at the point at which they encounter it. This information is aimed at enabling the reader to determine whether or not to invest the time in exploring the cited article further, thus alleviating information overload. CSIBS builds a summary of the cited document, bringing together meta-data about the document and a citation-sensitive preview that exploits the citation context to retrieve the sentences from the cited document that are relevant at this point. This paper briefly presents our user requirements analysis, then describes the system and, finally, discusses the observations from an initial pilot study. 
We found that CSIBS facilitates the relevancy judgment task by increasing the users' self-reported confidence in making such judgements. %M C.DL.09.69 %T Finding topic trends in digital libraries %S Session 2 %A Bolelli, Levent %A Ertekin, Seyda %A Zhou, Ding %A Giles, C. Lee %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 69-72 %K latent dirichlet allocation, topic detection, trend analysis %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555411 %X We propose a generative model based on latent Dirichlet allocation for mining distinct topics in document collections by integrating the temporal ordering of documents into the generative process. The document collection is divided into time segments where the discovered topics in each segment are propagated to influence the topic discovery in the subsequent time segments. We conduct experiments on the collection of academic papers from the CiteSeer repository. We augment the text corpus with the addition of user queries and tags and integrate the citation graph to boost the weight of the topical terms. The experiment results show that the segmented topic model can effectively detect distinct topics and their evolution over time. %M C.DL.09.73 %T CEBBIP: a parser of bibliographic information in chinese electronic books %S Session 2 %A Gao, Liangcai %A Tang, Zhi %A Lin, Xiaofan %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 73-76 %K bibliography, chinese electronic book, digital library, machine learning, metadata extraction %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555412 %X Bibliographic information is essential for many digital library applications, such as citation analysis, academic searching and topic discovery. Bibliographic data extraction has attracted a great deal of attention in recent years. 
In this paper, we address the problem of automatic extraction of bibliographic data in Chinese electronic books and propose a tool called CEBBIP for the task, which includes three main systems: data preprocessing, data parsing and data postprocessing. In the data preprocessing system, the tool adopts a rules-based method to locate citation data in a book and to segment citation data into citation strings of individual referencing literature. A learning-based approach, Conditional Random Fields (CRF), is employed to parse citation strings in the data parsing system. Finally, the tool takes advantage of document intrinsic local format consistency to enhance citation data segmentation and parsing through clustering techniques. CEBBIP has been used in a commercial E-book production system. Experimental results show that CEBBIP's precision rate is very high. More specifically, exploiting the document intrinsic local format consistency clearly improves the citation data segmenting and parsing accuracy. %M C.DL.09.77 %T Query parameters for harvesting digital video and associated contextual information %S Session 3 %A Marchionini, Gary %A Shah, Chirag %A Lee, Christopher A. %A Capra, Robert %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 77-86 %K digital curation, harvesting, video mining %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555414 %X Video is increasingly important to digital libraries and archives as both primary content and as context for the primary objects in collections. Services like YouTube not only offer large numbers of videos but also usage data such as comments and ratings that may help curators today make selections and aid future generations to interpret those selections. 
A query-based harvesting strategy is presented and results from daily harvests for six topics defined by 145 queries over a 20-month period are discussed with respect to query specification parameters, topic, and contribution patterns. The limitations of the strategy and these data are considered and suggestions are offered for curators who wish to use query-based harvesting. %M C.DL.09.87 %T ViGOR: a grouping oriented interface for search and retrieval in video libraries %S Session 3 %A Halvey, Martin %A Vallet, David %A Hannah, David %A Jose, Joemon M. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 87-96 %K search, user studies, video, visualisation %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555415 %X In this paper, we present ViGOR (Video Grouping, Organisation and Retrieval), a video retrieval system that allows users to group videos in order to facilitate video retrieval tasks. In this way users are able to visualise and conceptualise many aspects of their search tasks and carry out a localised search in order to solve a more global search problem. The main objective of this work is to aid users while carrying out explorative video retrieval tasks; these tasks can often be ambiguous and multi-faceted. Two user evaluations were carried out in order to evaluate the usefulness of this grouping paradigm for assisting users. The first evaluation involved users carrying out broad tasks on YouTube, and gave insights into the application of our interface to a vast online video collection. The second evaluation involved users carrying out focused tasks on the TRECVID 2007 video collection, allowing a comparison over a local collection, on which we could extract a number of content-based features. 
The results of our evaluations show that the use of the ViGOR system results in an increase in user performance and user satisfaction, showing the potential of a grouping paradigm for video search for various tasks in a variety of diverse video collections. %M C.DL.09.97 %T Developing a flexible content model for media repositories: a case study %S Session 3 %A Beer, Christopher A. %A Pinch, Peter D. %A Cariani, Karen %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 97-100 %K content model, digital libraries, fedora, media %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555416 %X This article describes the process and challenges of developing a content model that can support the content and metadata present in a complex media archive. Media archives have some of the most diverse requirements in an effort to catalog, preserve, and make accessible a wide range of content with multifaceted relationships between works. We focus particularly on the design and implementation of the WGBH Media Library and Archives' Fedora digital access repository for scholars, educational users and the public. It is our hope that the process and findings from this work can support the architecture and development of other media archives. %M C.DL.09.101 %T An alignment based system for chord sequence retrieval %S Session 3 %A Hanna, Pierre %A Robine, Matthias %A Rocher, Thomas %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 101-104 %K music information retrieval %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555417 %X Music retrieval systems for Western tonal music digital libraries have to consider rhythmic, timbral, melodic and harmonic information. Most existing retrieval systems only take into account melodies. 
Melody comparison may induce errors since two musical pieces can be very similar even though their melodies differ significantly. In this paper, we propose to investigate and experiment with a retrieval system based on the comparison of chord progressions. The definition of chords may be ambiguous but their properties can be precisely described and represented. We detail the adaptations of alignment algorithms, successfully applied for the estimation of symbolic melodic similarity, for chord progression retrieval. Several experiments, performed on symbolic databases, show that the system described is robust to variations and outperforms a recent chord retrieval system. %M C.DL.09.105 %T Query-page intention matching using clicked titles and snippets to boost search rankings %S Session 4 %A Murata, Masaya %A Toda, Hiroyuki %A Matsuura, Yumiko %A Kataoka, Ryoji %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 105-114 %K click logs analysis, implicit relevance feedback, representation of intention, search result rankings %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555419 %X Users of text retrieval systems input only a few keywords or sometimes just one keyword to the systems even if they have complex information needs. Due to the lack of query keywords, it becomes hard to return relevant search results that satisfy the demands of each user. Because digital documents, in contrast to queries, are generally composed of many kinds of keywords, it is also difficult to estimate the main topic or grasp the inherent intentions of the documents. In this paper, we present techniques to represent users' search intentions and the intentions that digital documents can satisfy by making use of clicked titles and snippets acquired from a click log analysis. We then present a method to match these intentions to boost search result rankings. 
Through experiments that use click logs and indexes of a commercial search engine, we verified our method's capability of significantly improving search precision. %M C.DL.09.115 %T Supporting analysis of future-related information in news archives and the web %S Session 4 %A Jatowt, Adam %A Kanazawa, Kensuke %A Oyama, Satoshi %A Tanaka, Katsumi %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 115-124 %K event prediction, future-related information retrieval, temporal information analysis %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555420 %X A lot of future-related information is available in news articles or Web pages. This information can, however, differ to a large extent and may fluctuate over time. It is therefore difficult for users to manually compare and aggregate it, and to re-construct the most probable course of future events. In this paper we address the problem of automatically generating summaries of future events related to queries using data obtained from news archive collections or from the Web. We propose two methods, explicit and implicit future-related information detection. The former is based on analyzing the context of future temporal expressions in documents, while the latter relies on detecting periodical patterns in historical document collections. We present a graph-based visualization of future-related information and demonstrate its usefulness through several examples. %M C.DL.09.125 %T Generalized formal models for faceted user interfaces %S Session 5: best paper nominees 1 %A Clarkson, Edward C. %A Navathe, Shamkant B. %A Foley, James D. 
%B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 125-134 %K ER model, design survey, entity-relationship model, faceted metadata, faceted navigation, relational model, tuple relational calculus %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555422 %X Faceted metadata and navigation have become major topics in library science, information retrieval and Human-Computer Interaction (HCI). This work surveys a range of extant approaches in this design space, classifying systems along several relevant dimensions. We use that survey to analyze the organization of data and its querying within faceted browsing systems. We contribute formal entity-relationship (ER) and relational data models that explain that organization and relational query models that explain systems' browsing functionality. We use these types of models since they are widely used to conceptualize data and to model back-end data stores. Their structured nature also suggests ways in which both the models and faceted systems might be extended. %M C.DL.09.135 %T Large-scale ETD repositories: a case study of a digital library application %S Session 5: best paper nominees 1 %A Mikeal, Adam %A Creel, James %A Maslov, Alexey %A Phillips, Scott %A Leggett, John %A McFarland, Mark %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 135-144 %K digital library infrastructure, electronic document workflow, electronic theses and dissertations, scalable systems %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555423 %X We describe the implementation of a statewide system for managing and preserving electronic theses and dissertations (ETDs) from Texas universities. We further explain the theoretical, technical and political issues that arose during the implementation of this system. 
These issues range from technical components developed by TDL (such as a customized workflow management application and adding OAI-ORE capabilities to DSpace) to human-centered issues such as stakeholder engagement and participation. Our experiences reflect the challenges, expected and unexpected, that others will face when attempting to build digital library applications to scale. %M C.DL.09.145 %T Style-consistency calligraphy synthesis system in digital library %S Session 6: best paper nominees 2 %A Yu, Kai %A Wu, Jiangqin %A Zhuang, Yueting %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 145-152 %K structure determination, style evaluation model (SEM), style-consistency calligraphy synthesis %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555425 %X The CADAL (China-America Digital Academic Library) digital library holds many digitized calligraphy works written by famous ancient calligraphists. To make use of these resources, users want to generate a tablet or a piece of calligraphic work in the hand of a particular famous ancient calligraphist. However, some characters needed for the tablet or the calligraphic work were never written by that calligraphist, or were written but are now hard to read because of long-term weathering. In this paper, a novel approach is proposed to synthesize Chinese calligraphic characters in the style of a given calligraphist, and a corresponding system is developed for generating calligraphy works and designing tablets. A calligraphic character is represented by a three-level hierarchical model. A novel approach for determining the character structure is proposed, which takes advantage of both the structure of the same characters of different styles and the structure of similar characters of the same style. 
A style evaluation model (SEM) is presented to evaluate whether the calligraphic character generated is in the same style of the specified calligraphist and to adjust the calligraphic character generated. Our experiments show that this system is effective. %M C.DL.09.153 %T Generative model-based metasearch for data fusion in information retrieval %S Session 6: best paper nominees 2 %A Efron, Miles %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 153-162 %K data fusion, digital libraries, generative models, information retrieval, metasearch, probabilistic models %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555426 %X "Data fusion" refers to the problem in information retrieval (IR) where several lists of documents ranked against a query are to be merged into a single ranked list for presentation to a user. Data fusion is also known as "metasearch." In a digital library setting data fusion may support operations such as federated search based on multiple repository representations. This paper presents a novel approach to the fusion problem: generative model-based Metasearch (GeM). We suggest viewing the appearance of documents in a return set as the outcome of a probabilistic process; some documents are likely to occur in the model, while others are unlikely. Using Bayesian parameter estimation to fit a multinomial distribution based on the return sets to be merged, GeM achieves a final ranking by listing documents in decreasing probability of generation under the induced model. We also introduce what we call "the impatient reader" approach to normalizing document ranks in service to the fusion operation. We report results from several experiments on TREC data suggesting that GeM, informed with impatient reader document scores, operates at state-of-the-art levels of effectiveness. 
%M C.DL.09.163 %T EnTag: enhancing social tagging for discovery %S Session 6: best paper nominees 2 %A Golub, Koraljka %A Moon, Jim %A Tudhope, Douglas %A Jones, Catherine %A Matthews, Brian %A Puzoń, Bartłomiej %A Nielsen, Marianne Lykke %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 163-172 %K ACM computing classification scheme, controlled vocabularies, dewey decimal classification, digital collection, folksonomies, institutional repository, intute, social tagging, subject indexing %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555427 %X The EnTag (Enhanced Tagging for Discovery) project investigated the effect on indexing and retrieval when using only social tagging versus when using social tagging in combination with suggestions from a controlled vocabulary. Two different contexts were explored: tagging by readers of a digital collection and tagging by authors in an institutional repository; also two different controlled vocabularies were examined, Dewey Decimal Classification and ACM Computing Classification Scheme. For each context a separate demonstrator was developed and a user study conducted. The results showed the importance of controlled vocabulary suggestions for both indexing and retrieval: to help produce ideas of tags to use, to make it easier to find focus for the tagging, as well as to ensure consistency and increase the number of access points in retrieval. The value and usefulness of the suggestions proved to be dependent on the quality of the suggestions, both in terms of conceptual relevance to the user and in appropriateness of the terminology. The participants themselves could also see the advantages of controlled vocabulary terms for retrieval if the terms used were from an authoritative source. 
%M C.DL.09.173 %T Review-oriented metadata enrichment: a case study %S Session 6: best paper nominees 2 %A Zhang, Liang %A Wu, Jiangqin %A Zhuang, Yueting %A Zhang, Yin %A Yang, Chenxing %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 173-182 %K book review, digital libraries, diversity, graph-based scoring, keyword extraction, metadata, metadata enrichment %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555428 %X Book reviews contributed by readers on social sites contain valuable information on books' content, style and merit, and many of their informative words can be used to enrich the metadata of books in the China-US Million Book Digital Library. In this paper, we present a system for review-oriented metadata enrichment and propose a Book-Centric Diverse Random Walk algorithm on a four-partite graph containing three kinds of relations among authors, books, reviews and words, in order to produce highly relevant as well as diverse keywords for a book. Experimental results of a user study show that our approach significantly outperforms other methods in terms of relevance and diversity. The metadata generated by our approach also has a large overlap with popular social tags and brief introductions from DouBan for books in the coverage experiments. %M C.DL.09.183 %T Using timed-release cryptography to mitigate the preservation risk of embargo periods %S Session 7 %A Haq, Rabia %A Nelson, Michael L. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 183-192 %K cryptography, repositories, time lock, timed release %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555430 %X Due to temporary access restrictions, embargoed data cannot be refreshed to unlimited parties during the embargo time interval. 
A solution to mitigate the risk of data loss has been developed that uses a data dissemination framework, the Timed-Locked Embargo Framework (TLEF), that allows data refreshing of encrypted instances of embargoed content in an open, unrestricted scholarly community. TLEF exploits implementations of existing technologies to "time-lock" data using timed-release cryptology so that TLEF can be deployed as digital resources encoded in a complex object format suitable for metadata harvesting. The framework successfully demonstrates dynamic record identification, time-lock puzzle encryption, encapsulation and dissemination as XML documents. We implement TLEF and provide a quantitative analysis of its successful data harvest of time-locked embargoed data with minimum time overhead without compromising data security and integrity. %M C.DL.09.193 %T Learning to assess the quality of scientific conferences: a case study in computer science %S Session 7 %A Martins, Waister Silva %A Gonçalves, Marcos André %A Laender, Alberto H. F. %A Pappa, Gisele L. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 193-202 %K classification, conference assessment, digital library, machine learning %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555431 %X Assessing the quality of scientific conferences is an important and useful service that can be provided by digital libraries and similar systems. This is especially true for fields such as Computer Science and Electrical Engineering, where conference publications are crucial. However, the majority of the existing approaches for assessing the quality of publication venues have been proposed for journals. In this paper, we characterize a large number of features that can be used as criteria to assess the quality of scientific conferences and study how these several features can be automatically combined by means of machine learning techniques to effectively perform this task. 
Among the features studied are citations, submission and acceptance rates, tradition of the conference, and reputation of the program committee members. Among our several findings, we can cite that: (1) separating high quality conferences from medium and low quality ones can be performed quite effectively, but separating the last two types is a much harder task; and (2) citation features followed by those associated with the tradition of the conference are the most important ones for the task. %M C.DL.09.203 %T CARES: a ranking-oriented CADAL recommender system %S Session 7 %A Yang, Chenxing %A Wei, Baogang %A Wu, Jiangqin %A Zhang, Yin %A Zhang, Liang %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 203-212 %K collaborative filtering, digital library, recommendation system %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555432 %X A recommender system is useful for a digital library to suggest the books that are likely to be preferred by a user. Most recommender systems using collaborative filtering approaches leverage the explicit user ratings to make personalized recommendations. However, many users are reluctant to provide explicit ratings, so ratings-oriented recommender systems do not work well. In this paper, we present a recommender system for the CADAL digital library, namely CARES, which makes recommendations using a ranking-oriented collaborative filtering approach based on users' access logs, avoiding the problem of the lack of user ratings. Our approach employs mean AP correlation coefficients for computing similarities among users' implicit preference models and a random walk based algorithm for generating a book ranking personalized for the individual. Experimental results on real access logs from the CADAL web site show the effectiveness of our system and the impact of different values of parameters on the recommendation performance. 
%M C.DL.09.213 %T Recommendation as link prediction: a graph kernel-based machine learning approach %S Session 7 %A Li, Xin %A Chen, Hsinchun %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 213-216 %K collaborative filtering, kernel methods, recommender system %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555433 %X Recommender systems have demonstrated commercial success in multiple industries. In digital libraries they have the potential to be used as a support tool for traditional information retrieval functions. Among the major recommendation algorithms, the successful collaborative filtering (CF) methods explore the use of user-item interactions to infer user interests. Based on the finding that transitive user-item associations can alleviate the data sparsity problem in CF, multiple heuristic algorithms were designed to take advantage of the user-item interaction networks with both direct and indirect interactions. However, the use of such graph representation was still limited in learning-based algorithms. In this paper, we propose a graph kernel-based recommendation framework. For each user-item pair, we inspect its associative interaction graph (AIG) that contains the users, items, and interactions n steps away from the pair. We design a novel graph kernel to capture the AIG structures and use them to predict possible user-item interactions. The framework demonstrates improved performance on an online bookstore dataset, especially when a large number of suggestions are needed. 
%M C.DL.09.217 %T A polyrepresentational approach to interactive query expansion %S Session 7 %A Diriye, Abdigani %A Blandford, Ann %A Tombros, Anastasios %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 217-220 %K query formulation, interactive query expansion %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555434 %X Interactive Query Expansion (IQE) presents suggested terms to the user during their search to enable better Information Retrieval (IR). However, IQE terms are poorly used, and tend to lack information meaningful to the user. The lack of cognitive and functional support during query refinement is a well documented problem, and despite the work carried out, it is still an under researched area. This stagnation in progress has been partly due to the long held belief that users are able to make good IQE term selections, and that the de facto way IQE terms are presented is effective. In this paper, we introduce a novel method to improve the presentation of IQE terms by providing supplementary information alongside them. We describe a user study that compared our novel polyrepresentational approach to IQE against a conventional IQE system and a baseline system. Our findings have shown that a polyrepresentational approach to IQE can address the ambiguity and uncertainty surrounding IQE, and improve the perceived usefulness of the terms. %M C.DL.09.221 %T Automatically characterizing resource quality for educational digital libraries %S Session 8: best paper finalists %A Bethard, Steven %A Wetzer, Philipp %A Butcher, Kirsten %A Martin, James H. 
%A Sumner, Tamara %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 221-230 %K educational digital library, learning resource, machine learning, quality %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555436 %X With the rise of community-generated web content, the need for automatic characterization of resource quality has grown, particularly in the realm of educational digital libraries. We demonstrate how identifying concrete factors of quality for web-based educational resources can make machine learning approaches to automating quality characterization tractable. Using data from several previous studies of quality, we gathered a set of key dimensions and indicators of quality that were commonly identified by educators. We then performed a mixed-method study of digital library curation experts, showing that our characterization of quality captured the subjective processes used by the experts when assessing resource quality for classroom use. Using key indicators of quality selected from a statistical analysis of our expert study data, we developed a set of annotation guidelines and annotated a corpus of 1000 digital resources for the presence or absence of these key quality indicators. Agreement among annotators was high, and initial machine learning models trained from this corpus were able to identify some indicators of quality with as much as an 18% improvement over the baseline. %M C.DL.09.231 %T Improving optical character recognition through efficient multiple system alignment %S Session 8: best paper finalists %A Lund, William B. %A Ringger, Eric K. 
%B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 231-240 %K A* algorithm, OCR error rate reduction, text alignment %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555437 %X Individual optical character recognition (OCR) engines vary in the types of errors they commit in recognizing text, particularly poor quality text. By aligning the output of multiple OCR engines and taking advantage of the differences between them, the error rate based on the aligned lattice of recognized words is significantly lower than the individual OCR word error rates. This lattice error rate constitutes a lower bound among aligned alternatives from the OCR output. Results from a collection of poor quality mid-twentieth century typewritten documents demonstrate an average reduction of 55.0% in the error rate of the lattice of alternatives and a realized word error rate (WER) reduction of 35.8% in a dictionary-based selection process. As an important precursor, an innovative admissible heuristic for the A* algorithm is developed, which results in a significant reduction in state space exploration to identify all optimal alignments of the OCR text output, a necessary step toward the construction of the word hypothesis lattice. On average 0.0079% of the state space is explored to identify all optimal alignments of the documents. %M C.DL.09.241 %T No bull, no spin: a comparison of tags with other forms of user metadata %S Session 8: best paper finalists %A Marshall, Catherine C. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 241-250 %K collaborative information management, image collection, metadata, study, tags %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555438 %X User-contributed tags have shown promise as a means of indexing multimedia collections by harnessing the combined efforts and enthusiasm of online communities. 
But tags are only one way of describing multimedia items. In this study, I compare the characteristics of public tags with other forms of descriptive metadata -- titles and narrative captions -- that users have assigned to a collection of very similar images gathered from the photo-sharing service Flickr. The study shows that tags converge on different descriptions than the other forms of metadata do, and that narrative metadata may be more effective than tags for capturing certain aspects of images that may influence their subsequent retrieval and use. The study also examines how photographers use people's names to personalize the different types of metadata and how they tell stories across short sequences of images. The study results are then brought to bear on design recommendations for user tagging tools and automated tagging algorithms and on using photo sharing sites as de facto art and architecture resources. %M C.DL.09.251 %T What happens when facebook is gone? %S Session 9 %A McCown, Frank %A Nelson, Michael L. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 251-254 %K digital preservation, personal archiving, social networks %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555440 %X Web users are spending more of their time and creative energies within online social networking systems. While many of these networks allow users to export their personal data or expose themselves to third-party web archiving, some do not. Facebook, one of the most popular social networking websites, is one example of a "walled garden" where users' activities are trapped. We examine a variety of techniques for extracting users' activities from Facebook (and by extension, other social networking systems) for the personal archive and for the third-party archiver. Our framework could be applied to any walled garden where personal user data is being locked. 
%M C.DL.09.255 %T Improving historical research by linking digital library information to a global genealogical database %S Session 9 %A Kennard, Douglas J. %A Lund, William B. %A Morse, Bryan S. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 255-258 %K authority control, diaries, family history, genealogy, historical social networks, journals, tagging %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555441 %X Journals, letters, and other writings are of great value to historians and those who research their own family history; however, it can be difficult to find writings by specific people, and even harder to find what others wrote about them. We present a prototype web-based system that enables users to discover information about historical people (including their own ancestors) by linking digital library content to unique PersonIDs from a genealogical database. Users can contribute content such as scanned journals or information about where items can be found. They can also transcribe content and tag it with PersonIDs to identify who it is about. Additional features provide tools for users to explore historical contexts and relationships. These include the ability to tag places and to create a historical social network by specifying non-family relationships or by using a mechanism we call rosters to imply participation in some group or event. 
%M C.DL.09.259 %T Collecting fragmentary authors in a digital library %S Session 9 %A Berti, Monica %A Romanello, Matteo %A Babeu, Alison %A Crane, Gregory %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 259-262 %K digital libraries, fragmentary authors, greek fragmentary historians, tei p5 guidelines, xml %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555442 %X This paper discusses new work to represent, in a digital library of classical sources, authors whose works themselves are lost and who survive only where surviving authors quote, paraphrase or allude to them. It describes initial works from a digital collection of such fragmentary authors designed not only to capture but to extend the ontologies that traditional scholarship has developed over generations: the aim is representing every nuance of print conventions while using the capabilities of digital libraries to extend our ability to identify fragments, to represent what we have identified, and to render the results of that work intellectually and physically more accessible than was possible in print culture. %M C.DL.09.263 %T Robust registration of manuscript images %S Session 9 %A Baumann, Ryan %A Seales, W. Brent %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 263-266 %K image registration, image warping, manuscript restoration, multispectral imaging %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555443 %X In this paper we present an application of image registration techniques to the specific domain of manuscript images. We show the application of this technique to images of the Venetus A, a 10th century manuscript of Homer's Iliad. The same algorithm is used to register images of the MS across time (including photographs separated by over a century), as well as across imaging modalities. 
%M C.DL.09.267 %T Cost and benefit analysis of mediated enterprise search %S Session 10 %A Wu, Mingfang %A Thom, James A. %A Turpin, Andrew %A Wilkinson, Ross %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 267-276 %K cost and benefit analysis, enterprise search, evaluation, information retrieval, mediated search, relevance feedback %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555445 %X The utility of an enterprise search system is determined by three key players: the information retrieval (IR) system (the search engine), the enterprise users, and the service provider who delivers the tailored IR service to its designated enterprise users. Currently, evaluations of enterprise search have been focused largely on the IR system effectiveness and efficiency, only a relatively small amount of effort on the user's involvement, and hardly any effort on the service provider's role. This paper will investigate the role of the service provider. We propose a method that evaluates the cost and benefit for a service provider of using a mediated search engine -- in particular, where domain experts intervene on the ranking of the search results from a search engine. We test our cost and benefit evaluation method in a case study and conduct user experiments to demonstrate it. Our study shows that: 1) by making use of domain experts' relevance assessments in search result ranking, the precision and the discount cumulated gain of ranked lists have been improved significantly (144% and 40% respectively); 2) the service provider gains substantial return on investment and higher search success rate by investing in domain experts' relevance assessments; and 3) the cost and benefit evaluation also indicates the type of queries to be selected from a query log for evaluating an enterprise search engine. 
%M C.DL.09.277 %T Document relevance assessment via term distribution analysis using fourier series expansion %S Session 10 %A Galeas, Patricio %A Kretschmer, Ralph %A Freisleben, Bernd %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 277-284 %K fourier series, query expansion, ranked retrieval, term distribution %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555446 %X In addition to the frequency of terms in a document collection, the distribution of terms plays an important role in determining the relevance of documents for a given search query. In this paper, term distribution analysis using Fourier series expansion as a novel approach for calculating an abstract representation of term positions in a document corpus is introduced. Based on this approach, two methods for improving the evaluation of document relevance are proposed: (a) a function-based ranking optimization representing a user defined document region, and (b) a query expansion technique based on overlapping the term distributions in the top-ranked documents. Experimental results demonstrate the effectiveness of the proposed approach in providing new possibilities for optimizing the retrieval process. %M C.DL.09.285 %T How do you feel about "dancing queen"?: deriving mood & theme annotations from user tags %S Session 11 %A Bischoff, Kerstin %A Firan, Claudiu S. %A Nejdl, Wolfgang %A Paiu, Raluca %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 285-294 %K collaborative tagging, high-level music descriptors, metadata enrichment, mood and theme tag recommendation %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555448 %X Web 2.0 enables information sharing, collaboration among users and most notably supports active participation and creativity of the users. 
As a result, a huge amount of manually created metadata describing all kinds of resources is now available. Such semantically rich user generated annotations are especially valuable for digital libraries covering multimedia resources such as music, where these metadata enable retrieval relying not only on content-based (low level) features, but also on the textual descriptions represented by tags. However, if we analyze the annotations users generate for music tracks, we find them heavily biased towards genre. Previous work investigating the types of user provided annotations for music tracks showed that the types of tags which would be really beneficial for supporting retrieval -- usage (theme) and opinion (mood) tags -- are often neglected by users in the annotation process. In this paper we address exactly this problem: in order to support users in tagging and to fill these gaps in the tag space, we develop algorithms for recommending mood and theme annotations. Our methods exploit the available user annotations, the lyrics of music tracks, as well as combinations of both. We also compare the results for our recommended mood / theme annotations against genre and style recommendations -- a much easier and already studied task. Besides evaluating against an expert (AllMusic.com) ground truth, we evaluate the quality of our recommended tags through a Facebook-based user study. Our results are very promising both in comparison to experts as well as users and provide interesting insights into possible extensions for music tagging systems to support music search. 
%M C.DL.09.295 %T Automatic quality assessment of content created collaboratively by web communities: a case study of wikipedia %S Session 11 %A Dalip, Daniel Hasan %A Gonçalves, Marcos André %A Cristo, Marco %A Calado, Pável %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 295-304 %K SVM, machine learning, quality assessment, wikipedia %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555449 %X The old dream of a universal repository containing all the human knowledge and culture is becoming possible through the Internet and the Web. Moreover, this is happening with the direct, collaborative participation of people. Wikipedia is a great example. It is an enormous repository of information with free access and editing, created by the community in a collaborative manner. However, this large amount of information, made available democratically and virtually without any control, raises questions about its relative quality. In this work we explore a significant number of quality indicators, some of them proposed by us and used here for the first time, and study their capability to assess the quality of Wikipedia articles. Furthermore, we explore machine learning techniques to combine these quality indicators into one single assessment judgment. Through experiments, we show that the most important quality indicators are the easiest ones to extract, namely, textual features related to length, structure and style. We were also able to determine which indicators did not contribute significantly to the quality assessment. These were, coincidentally, the most complex features, such as those based on link analysis. Finally, we compare our combination method with a state-of-the-art solution and show significant improvements in terms of effective quality prediction. %M C.DL.09.305 %T Designing the reading experience for scanned multi-lingual picture books on mobile phones %S Session 11 %A Bederson, Benjamin B. 
%A Quinn, Alex %A Druin, Allison %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 305-308 %K books, children, digital libraries, iPhone, interaction design, interface design, mobile phones, readability %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555450 %X This paper reports on an adaptation of the existing PopoutText and ClearText display techniques to mobile phones. It explains the design rationale for a freely available iPhone application to read books from the International Children's Digital Library. Through a combination of applied image processing, a zoomable user interface, and a process of working with children to develop the detailed design, we present an interface that supports clear reading of scanned picture books in multiple languages on a mobile phone. %M C.DL.09.309 %T Mobility, digital libraries and a rural indian village %S Session 11 %A Jones, Matt %A Thom, Emma %A Bainbridge, David %A Frohlich, David %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 309-312 %K digital libraries, digital-divide, information ecologies, mobility, user-generated content %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555451 %X Millions of people in developed countries routinely create and share digital content; but what about the billions of others on the wrong side of what has been called the 'global digital divide'? This paper considers three mobile platforms to illustrate their potential in enabling rural Indian villagers to make and share digital stories. We describe our experiences in creating prototypes using mobile phones, high-end media players, and paper. Interaction designs are discussed along with findings from various trials within the village and elsewhere. 
Our approach has been to develop prototypes that can work together in an integrated fashion so that content can flow freely and in interesting ways through the village. While our work has particular relevance to those users in emerging world contexts, we see it also informing needs and practices in the developed world for user-generated content. %M C.DL.09.313 %T What do exploratory searchers look at in a faceted search interface? %S Session 11 %A Kules, Bill %A Capra, Robert %A Banta, Matthew %A Sierra, Tito %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 313-322 %K OPAC, exploratory search, eye tracking, faceted search, online public access catalogs, task design, user studies %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555452 %X This study examined how searchers interacted with a web-based, faceted library catalog when conducting exploratory searches. It applied eye tracking, stimulated recall interviews, and direct observation to investigate important aspects of gaze behavior in a faceted search interface: what components of the interface searchers looked at, for how long, and in what order. It yielded empirical data that will be useful for both practitioners (e.g., for improving search interface designs), and researchers (e.g., to inform models of search behavior). Results of the study show that participants spent about 50 seconds per task looking at (fixating on) the results, about 25 seconds looking at the facets, and only about 6 seconds looking at the query itself. These findings suggest that facets played an important role in the exploratory search process. %M C.DL.09.323 %T Aligning METS with the OAI-ORE data model %S Session 12 %A McDonough, Jerome P. 
%B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 323-330 %K METS, OAI-ORE, aggregation, modeling, structural metadata %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555454 %X The Open Archives Initiative -- Object Reuse and Exchange (OAI-ORE) specifications provide a flexible set of mechanisms for transferring complex data objects between different systems. In order to serve as an exchange syntax, OAI-ORE must be able to support the import of information from localized data structures serving various communities of practice. In this paper, we examine the Metadata Encoding & Transmission Standard (METS) and the issues that arise when trying to map from a localized structural metadata schema into the OAI-ORE data model and serialization syntaxes. %M C.DL.09.331 %T EverLast: a distributed architecture for preserving the web %S Session 12 %A Anand, Avishek %A Bedathur, Srikanta %A Berberich, Klaus %A Schenkel, Ralf %A Tryfonopoulos, Christos %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 331-340 %K crawling, indexing, time-travel search, web archives %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555455 %X The World Wide Web has become a key source of knowledge pertaining to almost every walk of life. Unfortunately, much of the data on the Web is highly ephemeral in nature, with 50-80% of content estimated to change within a short time. Continuing the pioneering efforts of many national (digital) libraries, organizations such as the International Internet Preservation Consortium (IIPC), the Internet Archive (IA) and the European Archive (EA) have been tirelessly working towards preserving the ever changing Web. 
However, while these web archiving efforts have paid significant attention to long term preservation of Web data, they have paid little attention to developing a global-scale infrastructure for collecting, archiving, and performing historical analyses on the collected data. Based on insights from our recent work on building text analytics for Web Archives, we propose EverLast, a scalable distributed framework for next generation Web archival and temporal text analytics over the archive. Our system is built on a loosely-coupled distributed architecture that can be deployed over large-scale peer-to-peer networks. In this way, we allow the integration of many archival efforts taken mainly at a national level by national digital libraries. Key features of EverLast include support of time-based text search & analysis and the use of human-assisted archive gathering. In this paper, we outline the overall architecture of EverLast, and present some promising preliminary results. %M C.DL.09.341 %T A framework for describing web repositories %S Session 12 %A McCown, Frank %A Nelson, Michael L. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 341-344 %K preservation, web repositories, web resources %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555456 %X In prior work we have demonstrated that search engine caches and archiving projects like the Internet Archive's Wayback Machine can be used to "lazily preserve" websites and reconstruct them when they are lost. We use the term "web repositories" for collections of automatically refreshed and migrated content, and collectively we refer to these repositories as the "web infrastructure". In this paper we present a framework for describing web repositories and the status of web resources in them. This includes an abstract API for web repository interaction, the concepts of deep vs. 
flat and light/dark/grey repositories and terminology for describing the recoverability of a web resource. Our API may serve as a foundation for future web repository interfaces. %M C.DL.09.345 %T Preserving digital data in heterogeneous environments %S Session 12 %A Antunes, Gonçalo %A Barateiro, José %A Cabral, Manuel %A Borbinha, José %A Rodrigues, Rodrigo %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 345-348 %K data grids, dependability, digital libraries, digital preservation %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555457 %X Digital preservation aims at maintaining digital objects accessible over a long period of time, regardless of the challenges of organizational or technological changes or failures. In particular, data produced in e-Science domains could be reliably stored in today's data grids, taking advantage of the natural properties of this kind of infrastructure to support redundancy. However, to achieve reliability we must take into account failure interdependency. Taking into account the fact that correlated failures can affect multiple components and potentially cause complete loss of data, we propose a solution to evaluate redundancy strategies in the context of heterogeneous environments such as data grids. This solution is based on a simulation engine that can be used not only to support the process of designing the preservation environment and related policies, but also later on to observe and control the deployed system. %M C.DL.09.349 %T Unsupervised creation of small world networks for the preservation of digital objects %S Session 12 %A Cartledge, Charles L. %A Nelson, Michael L. 
%B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 349-352 %K digital preservation, small world %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555458 %X The prevailing model for digital preservation is that archives should be similar to a "fortress": a large, protective infrastructure built to defend a relatively small collection of data from attack by external forces. Such projects are a luxury, suitable only for limited collections of known importance and requiring significant institutional commitment for sustainability. In previous research, we have shown the web infrastructure (i.e., search engine caches, web archives) refreshes and migrates web content in bulk as side-effects of their user-services, and these results can be mined as a useful, but passive preservation service. Our current research involves a number of questions resulting from removing the implicit assumption that web-based data objects must passively await curatorial services: What if data objects were not tethered to repositories? What are the implications if the content were actively seeking out and injecting itself into the web infrastructure (i.e., search engine caches, web archives)? All of this leads to our primary research question: Can we create objects that preserve themselves more effectively than repositories or web infrastructure can? %M C.DL.09.353 %T Towards a virtual organization for data cyberinfrastructure %S Session 12 %A Borgman, Christine L. %A Bowker, Geoffrey C. %A Finholt, Thomas A. %A Wallis, Jillian C. 
%B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 353-356 %K collaborative work, cyberinfrastructure, scientific data, sensor networks %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555459 %X We report on the exploratory stages of a multi-university, multi-research-site, multi-year effort to investigate and compare data practices in multiple cyberinfrastructure projects and their emerging virtual organizations. Our long-term goal is to understand the data practices and data management requirements of virtual organizations and their implications for the design and development of data digital libraries. We have constructed our own virtual organization as a participant-observer approach to the research. Results to date suggest that collaborative technologies are emergent and that defining and scoping the data products of collaborations continues to be problematic. %M C.DL.09.357 %T Expanding the search for digital preservation solutions: adopting PREMIS in cultural heritage institutions %S Posters %A Alemneh, Daniel Gelaw %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 357-358 %K diffusion of innovation, digital preservation, metadata, premis %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555461 %X This paper will present some preliminary results on factors that affect the adoption of PREMIS (Preservation Metadata Implementation Strategies) in cultural heritage institutions. The study employed a web-based survey to collect data from 123 participants in 20 countries as well as a semi-structured, follow-up telephone interview with a smaller sample of the survey respondents. Rogers' diffusion of innovation theory was used as a theoretical framework. The main constructs considered for the study were relative advantage, compatibility, complexity, trialability, observability, and institution readiness.
The study yielded both qualitative and quantitative data, and preliminary analysis showed that all six factors influence the adoption of PREMIS in varying degrees. %M C.DL.09.359 %T Collaborative digital library: enhancing digital collections to improve learning in educational programs %S Posters %A Badashian, Ali Sajedi %A Firouzabadi, Asghar Dehghani %A Khalkhali, Iman %A Afzali, Hamidreza %A Delcheh, Morteza Ashurzad %A Shafiei, Mohammad Shoja %A Alipour, Mahdi %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 359-360 %K collection development, curriculum development, digital libraries, educational resources, exploring, information visualization, integration, knowledge sharing %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555462 %X In this article, a universal collaborative and competitive approach is introduced for deployment of digital collections in an ideal Digital Library (DL) for the educational system of the future. The collaborative and open-source aspects of the system guarantee its growth and the competitive aspects guarantee the accuracy. %M C.DL.09.361 %T Digitizing the flea market: eBay as a data source for historic collections %S Posters %A Becker, Snowden %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 361-362 %K collectibles, eBay, ephemera, home movies, online auctions %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555463 %X The online auction site eBay has overtaken face-to-face transactions as the primary means of doing business for collectors and sellers of unique and ephemeral materials. Historical societies, museums, and archives also increasingly collect ephemera as records of social and cultural history.
This presentation argues that the digitized flea market, as epitomized by eBay, replaces in-person sales while also providing a stream of rich information about a previously invisible, unquantifiable marketplace. Furthermore, identifying factors that influence collectibles buyers' behavior in online auction sales can also shed light on factors affecting user behaviors in digital libraries. Data from a survey of over 1,000 recent home movie auction listings on eBay suggest how eBay may be used as a data source by collectors, as well as the users and designers of digital libraries. %M C.DL.09.363 %T Semantic alerting for digital libraries %S Posters %A Buchanan, George %A Hinze, Annika %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 363-364 %K FRBR, aggregate documents, alerting, digital libraries, semantics %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555464 %X We previously investigated the support of alerting services across networks of heterogeneous digital libraries. We now report the first generation of semantically enhanced digital library alerting systems. Where previous alerting services have provided users with notifications of new library content using traditional metadata, we demonstrate the advantages and challenges of using semantic technologies. This uncovers key issues that are not yet fully understood in general event-based systems (including alerting systems). 
%M C.DL.09.365 %T Addressing researchers' needs through the data curation profile %S Posters %A Carlson, Jake %A Leiter, Deborah %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 365-366 %K data curation, data sharing, repositories %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555465 %X This poster describes a study currently in progress that seeks to identify and address the needs of researchers from multiple disciplines in managing, curating and preserving their data. One output of this study, which is still in its early stages, will be the "data curation profile," a methodological tool designed to enable the comparison of needs across disciplines and help librarians build digital libraries that accurately reflect and address the needs of data producers. %M C.DL.09.367 %T Implementation and evaluation of palm leaf manuscript metadata schema (PLMM) %S Posters %A Chamnongsri, Nisachol %A Manmart, Lampang %A Wuwongse, Vilas %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 367-368 %K cultural heritage, metadata schema, palm leaf manuscript %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555466 %X The evaluation of the Palm Leaf Manuscripts Metadata Schema (PLMM) aims to examine whether the PLMM satisfactorily meets user requirements in searching for PLMs and managing the PLM collection. It comprises (1) an examination of the PLMM's capability in describing the particular characteristics of Northeastern Thai palm leaf manuscripts, and its usefulness in palm leaf manuscript preservation and rights control management, and (2) an investigation of users' satisfaction when using the PLMM to search for PLMs and manage the PLM collection. The evaluation process began with the development of a prototype PLM management system to implement the PLMM.
Then, more than 200 metadata records describing all types of sample PLMs (with variations in sizes, scripts, languages, titles, and number of content subjects contained in a fascicle) were provided in Extensible Markup Language (XML) format, while system interfaces and queries were developed with Hypertext Preprocessor (PHP). This was followed by the trials with end users and staff in their workplace in order to evaluate the usefulness of PLMM in user tasks according to the FRBR tasks: find, identify, select, and obtain; and collection development tasks. The research found that 'somewhat high' efficiency of the PLMM was perceived among the participants in the two tasks. The finding also suggests that perceived efficiency of the PLMM was significantly higher with more years of users' experience with the PLMs. The status of users is another factor which positively affected the perceived efficiency of the PLMM. %M C.DL.09.369 %T A personalized learning environment %S Posters %A de la Chica, Sebastian %A Ahmad, Faisal %A Gu, Qianyi %A Okoye, Ifi %A Maull, Keith %A Sumner, Tamara %A Butcher, Kirsten R. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 369-370 %K competency models, digital library resources, knowledge models, personalization, student misconceptions %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555467 %X We report on the current research activities and results obtained through the Concept Learning service for Concept Knowledge (CLICK) and present a demonstration of the system. This poster session will focus on a demonstration of the CLICK system and the results of the learning study. %M C.DL.09.371 %T Analysis of transaction logs for insights into use of life oral histories %S Posters %A Christel, Michael G. %A Maher, Bryan S. 
%A Li, Huan %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 371-372 %K digital video library, oral histories, video retrieval %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555468 %X A digital video library of over 900 hours of video and 18000 stories from The HistoryMakers was used by 214 students, faculty, librarians, and life-long learners interacting with a system providing multiple search and viewing capabilities over a trial period of several months. User demographics and actions were logged, providing metrics on how the system was used. This poster overviews a few highlights from these transaction logs of the Informedia digital video library system for life oral histories. %M C.DL.09.373 %T Summarizing user-generated reviews in digital libraries: a visual clustering approach %S Posters %A Chung, Wingyan %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 373-374 %K aspect analysis, clustering, sentiment analysis, text classification, text summarization, user-generated review, visualization %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555469 %X In this paper, we describe a visual clustering approach to summarizing user-generated reviews of digital library items and services. The approach consists of the steps of sentence extraction, aspect identification, opinion classification, and review summarization. Our work augments existing work by considering non-standard input and by incorporating clustering and visualization in summarization. 
%M C.DL.09.375 %T An interoperability service framework for high-resolution image applications %S Posters %A Chute, Ryan %A Dresher, Stephan %A Balakireva, Luda %A Van de Sompel, Herbert %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 375-376 %K JPEG 2000, JSON, OAI-ORE, architecture, digital imaging, digital libraries, interoperability, openurl, standards %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555470 %X This poster presents a prototype architecture and potential use-cases for a standards-based service framework to simplify development of high-resolution image viewing clients. %M C.DL.09.377 %T Tailoring greenstone for seniors %S Posters %A Cunningham, Sally Jo %A Bennett, Erin K. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 377-378 %K home archiving, personal history, senior users %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555471 %X We present a re-design of Greenstone to support seniors (aged over 65) in managing documents reflecting their life history. %M C.DL.09.379 %T A mixed digital / physical snapshot of early internet / web usage in New Zealand %S Posters %A Cunningham, Sally Jo %A Bydder, Jillene %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 379-380 %K digital museum, history of the web, internet archive %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555472 %X We are in the early stages of developing a unique physical and digital record of New Zealand's early experience of the Internet. 
%M C.DL.09.381 %T Mashing up life science literature resources %S Posters %A Easty, Richard %A Nikolov, Nikolay %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 381-382 %K browser plugin, data integration, life science literature %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555473 %X In the life sciences one of the pronounced problems is the deluge of new results and data that are produced on a daily basis. This data can take many different forms (e.g., microarray probes, gene sequences, protein structures) and is added by hundreds of research centers world-wide in a largely uncoordinated fashion. Thus integration of life science data is growing in importance. Unfortunately, most research centers have no particular incentive to spend effort on integrating their data with data produced by others. This task is largely left to large publicly-sponsored institutions like the US National Library of Medicine and similar institutions in other countries. Unfortunately, despite their work in this area, the integration of web-based life science resources is still an open issue (and one ever growing in importance) as these organizations cannot cope with the information deluge that is happening on a daily basis in the life sciences. Thus it becomes essential that as many third parties as possible are engaged in the process. Here we demonstrate a simple prototype of a browser plugin that creates a platform for third parties to contribute to cross-linking related online life science data resources, thus improving the search experience and the productivity of the life science community. The plugin creates a convenient programming interface that minimizes the effort required of such third-party contributors.
We have provided reference implementations using the plugin that cross-link life science literature resources and illustrate the potential for third parties to create mashups that could be applied also in areas other than the life sciences. %M C.DL.09.383 %T Representing publication and distribution practices for scholarly materials: a cross-disciplinary comparison %S Posters %A Edwards, Phillip M. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 383-384 %K scholarly communication, scholarly publication, work practices %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555474 %X This poster presents a pluralistic approach for representing discipline-specific, cross-disciplinary, and discipline-independent work practices related to scholarly communication. This approach has been applied to qualitative analysis from an investigation of publication and distribution practices of scholars within the biological sciences and the field of communication. The resulting representations illustrate shared work practices and areas where diverse practices exist, both of which can guide the development of digital collections of scholarly materials. This poster also considers challenges related to aligning data collection methods with the application of these representational techniques. 
%M C.DL.09.385 %T Inferring intra-organizational collaboration from cosine similarity distributions in text documents %S Posters %A Esteva, Maria %A Bi, Hai %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 385-386 %K digital archives, statistical distributions, text mining %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555475 %X We present a method that uses text mining methods and statistical distributions to infer degrees of collaboration between staff members in an organization, based on the similarity of the documents that they wrote and exchanged over time. %M C.DL.09.387 %T Personal name-matching through name transformation %S Posters %A Gong, Jun %A Wang, Lidan %A Oard, Douglas W. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 387-388 %K name transformation, personal name-matching, string distance %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555476 %X A graph theory based method is proposed to exploit name transformation for personal name-matching. Experiment results on three personal name datasets show that the method is effective. %M C.DL.09.389 %T EMU: the emory user behavior data management system for automatic library search evaluation %S Posters %A Guo, Qi %A Kelly, Ryan P. %A Deemer, Selden %A Murphy, Arthur %A Smith, Joan A. %A Agichtein, Eugene %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 389-390 %K data exploration, library search evaluation, user behavior modeling %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555477 %X We describe EMU, a system for collecting, managing, and mining the behavior data collected in the Emory libraries search system.
We describe the data capture system based on the LibX browser plugin, the database management system for successfully storing, searching and exploring millions of resulting user interactions, and preliminary results of interesting queries and statistics that we are using to evaluate the effectiveness of library search tools. %M C.DL.09.391 %T Building a thailand researcher network based on a bibliographic database %S Posters %A Haruechaiyasak, Choochart %A Kongthon, Alisa %A Thaiprayoon, Santipong %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 391-392 %K R&D management, expertise retrieval, social network %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555478 %X Among many practical and domain-specific tasks, expertise retrieval (ER) has recently gained increasing attention in the information retrieval and knowledge management communities. This paper describes our ongoing project to design and implement an expert retrieval system scoped to researchers who work in Thailand. In our current system prototype, we assume that the areas of expertise among researchers can be extracted from bibliographic databases. We use the Science Citation Index (SCI) database to provide the information for representing the expert profiles. From the SCI database, we queried and retrieved publications covering the years 2001 to 2008 by specifying the affiliation equal to "Thailand". The results contain a set of approximately 23,000 publications. We downloaded and extracted four related fields including authors (denoted by AU), controlled terms (denoted by ID), keywords (denoted by DE) and subject category (denoted by SC). To build a researcher network, we consider two types of relationships: direct and indirect. The direct (or social) relationship is defined as the co-authoring degree between one researcher and others.
The co-authoring degree between two researchers, co-authoring(A,B), can be calculated based on the co-occurrence frequency between A and B found in the field AU of the 23,000 retrieved records. The indirect (or topical) relationship is defined when two researchers have publications under the same topics. The topical degree between two researchers, topical(A,B), can be calculated based on the similarity measure between two sets of extracted keywords, keyword(A) and keyword(B), representing researchers A and B, respectively. The keyword set can be extracted from the fields ID, DE and SC. An author with high frequencies on particular keywords is considered an expert in the corresponding research topics. %M C.DL.09.393 %T Building a MARC-to-OLAC crosswalk: repurposing library catalog data for the language resources community %S Posters %A Hirt, Christopher %A Simons, Gary %A Spanne, Joan %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 393-394 %K ISO 639, language identification %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555479 %X The Open Language Archives Community (OLAC) is an international partnership of institutions which are building a network of interoperating repositories and services to create a worldwide virtual library of language resources (that is, resources that document, describe, or develop the more than 7,000 known languages of the world). OLAC uses a community-specific refinement of qualified Dublin Core [http://www.language-archives.org/OLAC/metadata.htm] along with a community-specific refinement of the OAI Protocol for Metadata Harvesting [http://www.language-archives.org/OLAC/repositories.htm] to maintain an aggregated catalog of the holdings of the 35 participating archives. OLAC recognizes that the language resources of interest to the community come not only from sources within the community but also from many sources outside the community.
This poster describes one approach we have developed for addressing this issue, namely, a crosswalk that transforms the MARC21 catalog for a library or archive into an OAI static repository that holds an OLAC metadata record for each MARC record identified as describing a language resource. %M C.DL.09.395 %T Locating text in scanned books %S Posters %A Hu, Chang %A Rose, Anne %A Bederson, Benjamin B. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 395-396 %K adobe acrobat, book readers, digital libraries, readability, word location %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555480 %X In this paper, we describe a work flow to extract and verify text locations using commercial software, along with free software products and human proofing. To help mid-sized digital libraries, we are making our solution available as open source software. %M C.DL.09.397 %T Remote usability testing: a practice %S Posters %A Huang, Sheng-Cheng %A Bias, Randolph G. %A Payne, Tanya L. %A Rogers, Jay B. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 397-398 %K collaborative design, remote testing, usability testing %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555481 %X With increasingly frequent use of library resources by remote users, remote usability testing has become a valuable tool for those who would pursue an empirical, user-centered design of the interfaces to their electronic resources and services. This paper describes our implementation of remote usability tests to evaluate prototypes of a web content management application developed by Vignette Corporation, and reports sample results to illustrate the utility of such an approach, which can help design and improve the interfaces of digital library projects and their usability.
%M C.DL.09.399 %T Scientific digital libraries, interoperability, and ontologies %S Posters %A Hughes, J. Steven %A Crichton, Daniel J. %A Mattmann, Chris A. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 399-400 %K digital library, information model, interoperability, ontology, science data, science metadata %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555482 %X Scientific digital libraries serve complex and evolving research communities. Justifications for the development of scientific digital libraries include the desire to preserve science data and the promises of information interconnectedness, correlative science, and system interoperability. Research [1] suggests single shared ontologies are fundamental to fulfilling these promises. We present a tool framework, a set of principles, and a real world case study where shared ontologies are used to develop and manage science information models and subsequently guide the implementation of scientific digital libraries. The tool framework, based on an ontology modeling tool as illustrated in Figure 1, was configured to develop, manage, and keep shared ontologies relevant within changing domains and to promote the interoperability, interconnectedness, and correlation desired by scientists. %M C.DL.09.401 %T The landscape of information science: 1996-2008 %S Posters %A Ibekwe-SanJuan, Fidelia %A SanJuan, Eric %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 401-402 %K clustering, information visualization, knowledge domain mapping, text mining %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555483 %X We propose a methodology combining symbolic and numeric information to map the structure of research in Information Science between 1996 and 2008.
The visualization of the resulting maps showed that while the two-camp structure of Information Science observed in previous studies is still valid, other research poles like web and user-oriented studies are building bridges between the two hitherto isolated poles. %M C.DL.09.403 %T Forging the future: new tools for variable media art preservation %S Posters %A Ippolito, Jon %A Rinehart, Richard %A Lutz, Marilyn %A Fitzgerald, Sharon %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 403-404 %K metadata, new media, preservation strategies, variable media art %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555484 %M C.DL.09.405 %T Analyzing OPAC use with screen views and eye tracking %S Posters %A Ishita, Emi %A Mine, Shinji %A Koizumi, Masanori %A Miyata, Yosuke %A Kunimoto, Chihiro %A Shiozaki, Junko %A Kurata, Keiko %A Ueda, Shuichi %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 405-406 %K OPAC use, eye tracking, viewing patterns %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555485 %X Eye tracking was used to analyze which elements of which screens were viewed by users searching an Online Public Access Catalog (OPAC). Eye tracking data was obtained for 32 participants performing a known-item search task. The results show that more than 30% of participants did not make effective use of screens offering additional details, and that participants who did, and found the correct answer, gazed at specific screen elements more frequently than participants who gave incorrect answers. 
%M C.DL.09.407 %T A user-friendly metadata quality control tool for the internet public library %S Posters %A Khoo, Michael %A Lin, Xia %A Park, Jung-ran %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 407-408 %K HCI, LIS instruction, dublin core, evaluation, internet public library, metadata, metadata quality control, user-centered design %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555486 %X The Internet Public Library (IPL) is crosswalking its metadata to Dublin Core. The quality of the crosswalked metadata will be unknown. The IPL is therefore developing a tool for metadata quality control suitable for use by LIS students who have little previous metadata quality control experience. %M C.DL.09.409 %T Using an institutional repository for personal digital collections of retired faculty members %S Posters %A Kim, Sarah %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 409-410 %K archival collection, archiving, institutional repository %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555487 %X In this poster, I address practical issues related to using IRs for personal digital collections of retired faculty members. %M C.DL.09.411 %T Exploitation of the wikipedia category system for enhancing the value of LCSH %S Posters %A Kiyota, Yoji %A Nakagawa, Hiroshi %A Sakai, Satoshi %A Mori, Tatsuya %A Masuda, Hidetaka %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 411-412 %K LCSH, subject headings, wikipedia categories %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555488 %X This paper addresses an approach that integrates two different types of information resources: the Web and libraries. Our method begins from any keywords in Wikipedia, and induces related subject headings of LCSH through the Wikipedia category system. 
%M C.DL.09.413 %T Inter-search engine lexical signature performance %S Posters %A Klein, Martin %A Nelson, Michael L. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 413-414 %K lexical signature, performance, search engine %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555489 %X We generate lexical signatures (LSs) from web pages and acquire the mandatory document frequency values from three different search engine (SE) indexes. We cross-query the LSs against the two SEs they were not generated from and compare the retrieval performance by parsing the result set and analyzing the rank of the source URL. %M C.DL.09.415 %T Correlation of music charts and search engine rankings %S Posters %A Klein, Martin %A Hunsicker, Olena %A Nelson, Michael L. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 415-416 %K correlation, real-world objects, search engine %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555490 %X We investigate the question whether expert rankings of real-world entities correlate with search engine (SE) rankings of corresponding web resources. We compare Billboard's "Hot 100 Airplay" music charts with SE rankings of associated web resources. Out of nine comparisons we found two strong, two moderate, two weak and one negative correlation. The remaining two comparisons were inconclusive. %M C.DL.09.417 %T Toward automatic generation of image-text document surrogates to optimize cognition %S Posters %A Koh, Eunyee %A Kerne, Andruid %A Moeller, Jon %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 417-418 %K information extraction, search representation, surrogates %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555491 %X The representation of information collections needs to be optimized for human cognition.
Growing information collections play a crucial role in human experiences. While documents often include rich visual components, collections, including personal collections and those generated by search engines, are typically represented as lists of text-only surrogates. By concurrently invoking complementary components of human cognition, combined image-text surrogates help people to more effectively see, understand, think about, and remember information collections. This research develops algorithmic methods that use the structural context of images in HTML documents to associate meaningful text and thus derive combined image-text surrogates. %M C.DL.09.419 %T Designing exploratory search tasks for user studies of information seeking support systems %S Posters %A Kules, Bill %A Capra, Robert %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 419-420 %K n/a %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555492 %X This poster describes a procedure for designing exploratory tasks for use in laboratory evaluations of information seeking interfaces. This procedure is grounded in the literature on information seeking and information retrieval and has been refined by an evaluation of four tasks designed for a study of a faceted library catalog. The procedure is intended to be extensible to generate exploratory tasks for other types of interfaces and domains. 
%M C.DL.09.421 %T Developing a review rubric for learning resources in digital libraries %S Posters %A Leary, Heather %A Giersch, Sarah %A Walker, Andrew %A Recker, Mimi %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 421-422 %K education digital library, instructional architect, national science digital library, review rubric %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555493 %M C.DL.09.423 %T From harvesting to cultivating: transformation of a web collecting system into a robust curation environment %S Posters %A Lee, Christopher A. %A Marciano, Richard %A Hou, Chien-yi %A Shah, Chirag %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 423-424 %K interoperable repositories %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555494 %X Much has been written about the lifecycle of digital objects. This study is instead concerned with the lifecycle of collections and associated services. Online collection environments are built to fulfill specific collecting objectives and constraints. If a collection proves useful within its original hosting environment, it will often be necessary or desirable to move the collection to new environments, in order to support new forms of use and re-aggregation or extract resources from legacy data environments. Such a transformation can be extremely expensive, challenging and prone to error, especially if the collections include complex internal structures and services. When "services make the repository" [1], moving raw data from one location to another will often not be sufficient. Digital curators can preempt costly and problematic system migration efforts by integrating collections into environments specifically designed to support long-term preservation, scalability and interoperability [2]. 
We report on an integration of content and functionality of a feature-rich collecting environment (ContextMiner) into a robust data curation environment (iRODS). ContextMiner is a web-based service for building collections, through the execution and management of "campaigns" (i.e. sets of associated queries and parameters to harvest content over time). As a part of the VidArch project, we have been using the ContextMiner framework and services for harvesting YouTube videos and associated contextual information on a variety of topics. In July 2008, we released a public beta of ContextMiner, allowing anyone to run similar crawls. There are now more than 100 users. The current implementation -- based on a single MySQL database and associated code -- has served its intended purposes very well, but it is not a scalable or sustainable basis for offering wide-scale collecting services in support of the diverse array of potential users and use cases. iRODS (integrated Rule-Oriented Data System) is adaptive, policy-driven data grid middleware, which addresses aspects of growth, evolution, openness, and closure -- fundamental requirements for digital preservation [3]. iRODS currently scales to hundreds of millions of files, tens of thousands of users, and petabytes of data. It operates in a highly distributed environment with heterogeneous storage resources and allows for growth through federation. It supports evolution through the virtualization of the underlying technology and supports changing business requirements through customization of repository behaviors. It supports openness through a data type agnostic treatment of content. iRODS can be instrumented with policies that support the management of the lifecycle of digital assets and will serve as a unique platform to study repository integration. One key feature is the automation of policy enforcement across distributed data that have been organized into a shared collection. 
The coupling of other open repositories and iRODS can create greater efficiencies and new types of repository services. We discuss various repository integration scenarios, their potential benefits, and implications for collection life cycles. The approaches co-locate metadata and content in varied ways and rely on efficiencies found in one repository only, or on the ability to combine policies in both spaces: (1) iRODS to ContextMiner data migration, (2) Policy-based data management for ContextMiner collections, and (3) Policy interchange between ContextMiner and iRODS collections. %M C.DL.09.425 %T A semi-automatic system for managing multiple digital preservation risks of digital libraries in china %S Posters %A Li, Chao %A Xing, Chunxiao %A Dong, Li %A Huang, Michael Bailou %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 425-426 %K XML, digital preservation, integration, web service %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555495 %X While many research projects in the world have been addressing challenges posed by digital preservation, digital libraries in China have their own native problems that have never been addressed before. Similar problems may occur in other countries, and their memory institutions may be less prepared to handle them. This poster analyses the requirements and challenges of digital libraries in China and describes an integrated and flexible digital preservation system -- AOMS. %M C.DL.09.427 %T What patrons want: supporting interaction for novice information seeking scholars %S Posters %A Loizides, Fernando %A Buchanan, George R. 
%B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 427-428 %K document triage, information seeking, novice users %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555496 %X In this paper, we undertake a study of inexperienced information seeking scholars, identifying areas for improvement in their electronic information seeking and document triage process[3]. We propose a software aid, currently under development. %M C.DL.09.429 %T Selective harvesting of regional digital libraries and national metadata aggregators %S Posters %A Mazurek, Cezary %A Mielnicki, Marcin %A Werla, Marcin %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 429-430 %K CQL, OAI-PMH, interoperability, metadata access and distribution, metadata aggregation, selective metadata harvesting %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555497 %X The poster presents the concept, implementation and practical application of the OAI-PMH protocol extension which allows OAI-PMH service providers to dynamically create and harvest sets of items from OAI-PMH data providers. The implementation of the presented concept is based on the encoding of dynamic set specifications in OAI-PMH requests with the CQL language. The extension was developed and widely applied in Poland and now it is used in several projects funded by the European Commission. %M C.DL.09.431 %T User search behaviors within a library gateway %S Posters %A Mischo, William H. %A Schlembach, Mary C. %A Norman, Michael A. 
%B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 431-432 %K metasearch, transaction logs, user searching behaviors %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555498 %X This poster reports on user searching behavior within two information gateways developed at the University of Illinois at Urbana-Champaign Library. These gateways are built around a locally developed metasearch engine and are designed to assist users with search query formulation and modification. Search behavior data is being collected in custom transaction logs that gather user search arguments along with any system actions and contextual search assistance suggestions. %M C.DL.09.433 %T Users' adjustments to unsuccessful queries in biomedical search %S Posters %A Murray, G. Craig %A Lin, Jimmy %A Wilbur, John %A Lu, Zhiyong %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 433-434 %K PubMed, medical search, query reformulation, user modeling %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555499 %X Biomedical researchers depend on on-line databases and digital libraries for up-to-date information. We introduce a pilot project aimed at characterizing adjustments made to biomedical queries that improve search results. Specifically, we focus on queries submitted to PubMed®, a large sophisticated search engine that facilitates Web access to abstracts of articles in over 5,200 biomedical journals. On average 2 million users search PubMed each day. During their search, nearly 20% will experience a result page from one of their queries that has zero results. In some cases there really is no document or abstract that will satisfy a particular query. 
However, in analyzing one month of queries submitted to PubMed, we find that more often than not, queries that retrieved no results are queries that would retrieve something relevant if they were constructed differently. This paper describes a new effort to identify some of the characteristics of a query that produces zero results, and the changes that users most often apply in constructing new, "corrected" queries. Zero-result queries afford us an opportunity to examine changes made to queries that we know did not return relevant data, because they did not return any data. An investigation of the changes users make under these circumstances can yield insight into users' search processes. %M C.DL.09.435 %T Species identification: fish images with CBIR and annotations %S Posters %A Murthy, Uma %A Fox, Edward A. %A Chen, Yinlin %A Hallerman, Eric %A Torres, Ricardo %A Ramos, Evandro J. %A Falcao, Tiago R. C. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 435-436 %K CBIR, fish species identification, image annotation, image retrieval, user study %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555500 %M C.DL.09.437 %T Kindle usage among LIS students: an exploratory study %S Posters %A Rabina, Debbie L. 
%A Pattuelli, Maria Cristina %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 437-438 %K e-books, electronic publishing, social issues, user needs %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555501 %M C.DL.09.439 %T Metababble: a clash of metadata cultures %S Posters %A Rivero, Monica %A Henry, Geneva %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 439-440 %K TEI, digital library, metadata, minimal processing, social tagging %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555502 %X A tension exists between making digitized resources available to users quickly and providing detailed, item-level metadata and semantic markup that make those resources more discoverable. The Our Americas Archive Partnership (OAAP) project, funded by IMLS in the fall of 2007, is facing these challenges as the project progresses. This poster presents a summary of our approach and future thoughts about descriptive approaches for digital resources. %M C.DL.09.441 %T Evaluation of OAI-ORE via large-scale information topology visualization %S Posters %A Sanderson, Robert %A Llewellyn, Clare %A Jones, Richard %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 441-442 %K OAI-ORE, linked data, visualization, web 2.0 %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555503 %X This poster evaluates the OAI-ORE specifications through experiments providing access to the JSTOR digital archive and the Flickr website. A browser-based dynamic graph visualization tool was designed and tested to determine if making the topology of the information available would provide end-user benefits in terms of navigation and discovery. 
%M C.DL.09.443 %T Empirical analysis on chinese academic plagiarism %S Posters %A Shen, Yang %A Fu, Huijuan %A Liu, Zitao %A Liu, Pengpeng %A Fu, Qingchuan %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 443-444 %K ROST anti-plagiarism software, plagiarism law, social network %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555504 %X This poster examines academic plagiarism in China from the angles of subjects, authors' social networks, author combinations, and students' plagiarism law. We apply the self-developed ROST Anti-plagiarism Software to check 3781 papers, survey 450 students, quantitatively analyze academic plagiarism conditions in China, and draw several conclusions. %M C.DL.09.445 %T Adaptive personalized eLearning on top of existing LCMS %S Posters %A Takhirov, Naimdjon %A Sølvberg, Ingeborg T. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 445-446 %K eLearning, learning objects, personalization %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555505 %X The next generation of eLearning systems should tailor the learning experience to each individual's learning needs and preferences. PEDAL-NG is a system that supports personalization in an existing, operational eLearning environment, based on prior knowledge and the learning style of users. It is built as a front-end of an existing LMS. The prototype is tested by a group of students. The test results are favorable regarding the personalized course and give valuable feedback for future research. 
%M C.DL.09.447 %T User search characteristics on a specialized digital collection for domain- and task-specific information %S Posters %A Tang, Xiaoya %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 447-448 %K keyword search, query, terms, user study %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555506 %X Domain-specialized digital collections have been growing rapidly in recent years. A good understanding of how users interact with such collections to accomplish domain-specific information tasks would help inform the design of effective systems. This study investigates users' interaction with a Web-based botanical collection by examining search logs recorded during an experiment. The findings indicate that while users' interactions with such collections demonstrate similar characteristics to those with general purpose search systems, they also demonstrate a domain- and task-specific nature. %M C.DL.09.449 %T MetRe: supporting the metadata revision process %S Posters %A Tonkin, Emma %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 449-450 %K metadata %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555507 %X MetRe is a prototype interface and service designed to support the metadata revision process. Improving consistency of metadata records within an environment is a common repository management task, due to the potential for user error when submitting, as well as other sources of error, such as systematic error resulting from the chosen deposit process. Evidence to support the metadata correction process may be gathered by automated metadata extraction tools, evidence from within the repository, or by comparison with best practice across the repository landscape. 
MetRe (Metadata Revision) is a prototype demonstrator that is able to identify several characteristic classes of error, twinned with an interface able to highlight several types of individual and systematic error, including a notion of local (intra-repository) and general (inter-repository) best practice. %M C.DL.09.451 %T Finding centuries-old hyperlinks with a novel semi-supervised learning technique %S Posters %A Wang, Xiaoyue %A Keogh, Eamonn %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 451-452 %K historical digital libraries, historical manuscripts, hyperlinks, semi-supervised learning %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555508 %X Hyperlinks are so useful for searching and browsing modern digital collections that researchers have long wondered if it is possible to retroactively add hyperlinks to digitized historical documents. There has already been significant research into this endeavor for historical text; however, in this work we consider the problem of adding hyperlinks among graphic elements. While such a system would not have the ubiquitous utility of text-based hyperlinks, there are several domains where it can potentially significantly augment textual information. %M C.DL.09.453 %T Journal ranking based on social information %S Posters %A Wang, Jinlong %A Gao, Ke %A Ren, Yongli %A Li, Gang %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 453-454 %K journal ranking, mining, social information %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555509 %X Recently, literature analysis has become a hot issue in academic studies. 
In order to quantify the importance of journals and provide researchers with target vehicles for their work, this poster proposes a novel approach based on social information that considers the potential relationship between journal quality and authors' affiliations. Based on the formula proposed in this work, the importance of journals can be estimated and ranked. %M C.DL.09.455 %T The variety of ways in which instructors implement a modular digital library curriculum %S Posters %A Wildemuth, Barbara M. %A Pomerantz, Jeffrey P. %A Oh, Sanghee %A Yang, Seungwon %A Fox, Edward A. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 455-456 %K computer science, curriculum development, digital libraries, education, instruction, library and information science %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555510 %X With support from the National Science Foundation, researchers at Virginia Tech and the University of North Carolina developed a curriculum framework and a number of modules for instruction in the area of digital libraries. In 2008, 15 different modules were field tested by 11 instructors at 10 different institutions. As might be expected, instructors adapted these modules to fit the context of their courses, some of which are described here. %M C.DL.09.457 %T GRE: hybrid recommendations for NSDL collections %S Posters %A Will, Todd C. 
%A Srinivasan, Anand %A Bieber, Michael %A Im, Il %A Oria, Vincent %A Wu, Yi-Fang (Brook) %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 457-458 %K collaborative filtering, content based, digital libraries, knowledge based recommendation, recommendation systems, text search engine, user interface %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555511 %X Recommendation systems have been proven to reduce the time and effort required by users to find relevant items, but there are only sporadic reports on their application in digital libraries. The General Recommendation Engine (GRE) is composed of the text search system Lucene augmented by the well-understood content based and collaborative filtering techniques and the first application of knowledge based recommendation in digital libraries to recommend items from 22 National Science Digital Library collections. In this study comprised of 60 subjects, the GRE outperformed the baseline system Lucene in all areas of evaluation. %M C.DL.09.459 %T Archiving the videogame industry: collecting primary materials of new media artifacts %S Posters %A Winget, Megan A. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 459-460 %K collection development, new media, video games %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555512 %X This paper describes the initial deposits in The Videogame Archive at the Center for American History at the University of Texas at Austin. 
%M C.DL.09.461 %T Analyzing user's book-loan behaviors in Peking university library from social network perspective %S Posters %A Yan, Fei %A Zhang, Ming %A Sun, Tao %A Lu, Yang %A Zhang, Naiyue %A Xiao, Long %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 461-462 %K digital library, log mining, social network analysis %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555513 %X In a university library, students from different backgrounds are connected by co-borrowing behaviors, which form a knowledge sharing network. This poster presents a novel idea to study the users' book-loan behavior patterns (knowledge sharing patterns) from the social network perspective, which enables us to understand the patterns in both the macro-level and micro-level analysis. %M C.DL.09.463 %T An ajax-based digital music stand for greenstone %S Demos %A Bainbridge, David %A Bell, Tim C. %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 463-464 %K digital library integration, digital music stand %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555515 %X This extended abstract describes a digital music stand integrated with the Greenstone digital library software. It features text annotation and an animated fast-to-slow page wipe. Figure 1 illustrates both these features, although it is best appreciated in a live demonstration. Digital annotation provides a non-destructive alternative to a musician's habit of penciling in notes. In Figure 1, slightly over half way down the page, there is a note to watch the fingering. A user can have as many of these as they like, positioned anywhere on the page. The animated page wipe alleviates (somewhat) the issue of when to turn to the next page. 
Unlike its physical counterpart, where turning to the next page means you can no longer see the current page, with a digital music stand the next page can gradually be overlaid. The page transition occurring in Figure 1 can be seen as a marked horizontal bar not quite half-way down the page. The speed of the wipe is initially fast, but when it reaches the point where the scroll-bar marker is on the right-hand side of the page, it slows down significantly. This is to give the musician time to finish playing the last line of the current page. In the event they have already finished playing that line, they will have naturally moved on to playing the top of the next page (which is already displayed). Rather than adopt a traditional client-side "helper" application for the digital music stand, we have integrated it within Greenstone using AJAX. For instance: next and previous pages are asynchronously loaded in the background; when generating a page, the dimensions of the user's screen are sent to the DL server so it can produce a version that maximizes the available space; and interactions such as adding an annotation, or altering the position of the animation-break are immediately stored as metadata associated with that document. Initially the animated page breaks are set to be between the last two staff systems. This is accomplished as part of the DL ingest process, leveraging off the staff detection step of Optical Music Recognition software. %M C.DL.09.465 %T Accessing the densho and historymakers oral history collections via informedia technologies %S Demos %A Christel, Michael G. %A Baron, Robert V. 
%A Froh, Geoff %A Benson, Dan %A Richardson, Julieanna %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 465-466 %K digital video library, oral histories, video retrieval %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555516 %X Densho is a nonprofit organization started in 1996 with the goal of documenting oral histories from Japanese Americans who were incarcerated during World War II. The HistoryMakers is a nonprofit established in 1999 with the goal of documenting video life oral history interviews highlighting the accomplishments of individual African Americans and African-American-led groups and movements. Both collections share the goal of broader, deeper use of the oral history content through digitization and automated processing where appropriate. This demonstration showcases the application of Carnegie Mellon Informedia digital video library processing and interfaces to enhance access into the interview segments. %M C.DL.09.467 %T Text mining for indexing %S Demos %A Gelernter, Judith %A Lesk, Michael %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 467-468 %K automatic classification, content analysis and indexing, text mining %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555517 %X This paper describes techniques for automatically extracting and classifying maps found within articles. The process uses image analysis to find text in maps, document structure to find captions and titles, and then text mining to assign each map to a subject category, a geographical place, and a time period. The text analysis is based on authority lists taken from gazetteers and from library classifications. 
%M C.DL.09.469 %T Our Americas archive partnership demonstration %S Demos %A Henry, Geneva %A Rivero, Monica %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 469-470 %K OAI, TEI, american studies, dspace, harvesting, repository, semantic markup, social tagging, tag cloud %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555518 %X The Our Americas Archive Partnership (OAAP) project is in year 2 of a 3-year IMLS funded grant led by Rice University in Partnership with the University of Maryland's Maryland Institute for Technology in the Humanities (MITH). Designed to meet the needs of American studies scholars researching the Americas from a hemispheric perspective, OAAP is developing an integrated framework for the discovery of digital resources that are managed in heterogeneous distributed repositories. This demonstration will show the current state of the project's common interface to support resource discovery. %M C.DL.09.471 %T Mapping life events: temporal and geographic context for biographical information %S Demos %A Larson, Ray R. %A Shaw, Ryan %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 471-472 %K geographic information retrieval %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555519 %X Digital Libraries often fail to connect their contents to the wider context of information resources available that are about the same persons, related persons, places, or time periods and the events that happen to those persons, at those places and in a given time period. This demonstration will show prototype systems that can perform these tasks, linking the user to relevant contextual information. %M C.DL.09.473 %T Virtual DL poster sessions in second life %S Demos %A Lee, Spencer J. %A Fox, Edward A. 
%A Marchionini, Gary %A Velacso, Javier %A Antunes, Gonçalo %A Borbinha, José %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 473-474 %K 3D, digital preservation, second life, tele-presence, virtual world %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555520 %X In Second Life (SL), a popular general-purpose 3D virtual world, we are supporting the Digital Library community in a variety of ways, including through virtual poster sessions. This brings together the interests of those involved in JCDL 2009, IEEE-TCDL, and NSF-supported work in SL aimed to assist education, training, and dissemination in the digital preservation area. %M C.DL.09.475 %T ContextMiner: building context-rich digital collections %S Demos %A Shah, Chirag %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 475-476 %K contextual information, digital curation, digital preservation %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555521 %M C.DL.09.477 %T Using university collections in digital library education %S Demos %A Stewart, Quinn %A Todd, David %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 477-478 %K digital library curriculum, digitization, rich-media %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555522 %M C.DL.09.479 %T A curriculum customization service %S Demos %A Sumner, Tamara %A Devaul, Holly %A Davis, Lynne %A Weatherley, John %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 479-480 %K customizing instruction, differentiated instruction, educational digital libraries, personalization, science education %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555523 %X We demonstrate a prototype Curriculum Customization Service designed and developed with significant teacher input. 
This prototype illustrates a model for embedding digital library resources into mainstream classroom use. A 10 week pilot study suggests that this Service can increase teachers' use of digital library resources in their class, and encourage them to use resources to customize instruction. %M C.DL.09.481 %T XEB: a markup language document container format suitable for handheld devices %S Demos %A Tang, Zhi %A Gao, Liangcai %A Jia, Aixia %A Lin, Xiaofan %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 481-482 %K document parsing, handheld device, markup language document %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555524 %X We propose a new document container format (XEB, eXtensible Electronic Book) based on a block mechanism to efficiently process markup language documents in handheld devices. Random document access is also supported in the format through a pagination mechanism. The format has already been applied to a number of handheld devices' Chinese E-book readers and XEB documents can be downloaded from a Chinese E-book store. %M C.DL.09.483 %T AskDragon: a redundancy-based factoid question answering system with lightweight local context analysis %S Demos %A Zhou, Xiaohua %A Achananuparp, Palakorn %A Park, E. K. %A Hu, Xiaohua %A Zhang, Xiaodan %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 483-484 %K answer generation, answer scoring, local context analysis, question answering, redundancy-based approach %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555525 %X We introduce our QA system AskDragon, which employs a novel lightweight local context analysis technique to handle two broad classes of factoid questions, entity and numeric questions. The local context analysis module dramatically improves the efficiency of QA systems without sacrificing high accuracy performance. 
%M C.DL.09.485 %T Knowledge extraction and integration for semi-structural information in digital libraries %S Demos %A Zhu, Wenhao %A Wei, Baogang %A Wu, Jiangqin %A Shi, Shaomin %A Yang, Yan %B JCDL'09: Proceedings of the 2009 Joint International Conference on Digital Libraries %D 2009-06-15 %P 485-486 %K digital libraries, digitized textbook, information extraction %* (c) Copyright 2009 ACM %W http://doi.acm.org/10.1145/1555400.1555526