%M C.DL.10.1 %T Making web annotations persistent over time %S Annotations & markup %A Sanderson, Robert %A Van de Sompel, Herbert %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 1-10 %K annotation, digital preservation, persistence, web architecture %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816125 %X As Digital Libraries (DL) become more aligned with the web architecture, their functional components need to be fundamentally rethought in terms of URIs and HTTP. Annotation, a core scholarly activity enabled by many DL solutions, exhibits a clearly unacceptable characteristic when existing models are applied to the web: due to the representations of web resources changing over time, an annotation made about a web resource today may no longer be relevant to the representation that is served from that same resource tomorrow. We assume the existence of archived versions of resources, and combine the temporal features of the emerging Open Annotation data model with the capability offered by the Memento framework that allows seamless navigation from the URI of a resource to archived versions of that resource, and arrive at a solution that provides guarantees regarding the persistence of web annotations over time. More specifically, we provide theoretical solutions and proof-of-concept experimental evaluations for two problems: reconstructing an existing annotation so that the correct archived version is displayed for all resources involved in the annotation, and retrieving all annotations that involve a given archived version of a web resource. 
%M C.DL.10.11 %T Transferring structural markup across translations using multilingual alignment and projection %S Annotations & markup %A Bamman, David %A Babeu, Alison %A Crane, Gregory %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 11-20 %K annotation projection, knowledge transfer, multilingual alignment %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816126 %X We present here a method for automatically projecting structural information across translations, including canonical citation structure (such as chapters and sections), speaker information, quotations, markup for people and places, and any other element in TEI-compliant XML that delimits spans of text that are linguistically symmetrical in two languages. We evaluate this technique on two datasets, one containing perfectly transcribed texts and one containing errorful OCR, and achieve an accuracy rate of 88.2% projecting 13,023 XML tags from source documents to their transcribed translations, with an 83.6% accuracy rate when projecting to texts containing uncorrected OCR. This approach has the potential to allow a highly granular multilingual digital library to be bootstrapped by applying the knowledge contained in a small, heavily curated collection to a much larger but unstructured one. %M C.DL.10.21 %T ProcessTron: efficient semi-automated markup generation for scientific documents %S Annotations & markup %A Sautter, Guido %A Böhm, Klemens %A Kühne, Conny %A Mathäß, Tobias %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 21-28 %K data-driven markup process control, semantic xml markup %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816127 %X Digitizing legacy documents and marking them up with XML is important for many scientific domains. However, creating comprehensive semantic markup of high quality is challenging. 
Respective processes consist of many steps, with automated markup generation and intermediate manual correction. These corrections are extremely laborious. To reduce this effort, this paper makes two contributions: First, it proposes ProcessTron, a lightweight markup-process-control mechanism. ProcessTron assists users in two ways: It ensures that the steps are executed in the appropriate order, and it points the user to possible errors during manual correction. Second, ProcessTron has been deployed in real-world projects, and this paper reports on our experiences. A core observation is that ProcessTron more than halves the time users need to mark up a document. Results from laboratory experiments, which we have conducted as well, confirm this finding. %M C.DL.10.29 %T Scholarly paper recommendation via user's recent research interests %S Scholarly publications %A Sugiyama, Kazunari %A Kan, Min-Yen %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 29-38 %K digital library, information retrieval, recommendation, user modeling %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816129 %X We examine the effect of modeling a researcher's past works in recommending scholarly papers to the researcher. Our hypothesis is that an author's published works constitute a clean signal of the latent interests of a researcher. A key part of our model is to enhance the profile derived directly from past works with information coming from the past works' referenced papers as well as papers that cite the work. In our experiments, we differentiate between junior researchers that have only published one paper and senior researchers that have multiple publications. We show that filtering these sources of information is advantageous -- when we additionally prune noisy citations, referenced papers and publication history, we achieve statistically significant higher levels of recommendation accuracy. 
%M C.DL.10.39 %T Effective self-training author name disambiguation in scholarly digital libraries %S Scholarly publications %A Ferreira, Anderson A. %A Veloso, Adriano %A Gonçalves, Marcos André %A Laender, Alberto H. F. %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 39-48 %K bibliographic citations, name disambiguation %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816130 %X Name ambiguity in the context of bibliographic citation records is a hard problem that affects the quality of services and content in digital libraries and similar systems. Supervised methods that exploit training examples in order to distinguish ambiguous author names are among the most effective solutions for the problem, but they require skilled human annotators in a laborious and continuous process of manually labeling citations in order to provide enough training examples. Thus, addressing the issues of (i) automatic acquisition of examples and (ii) highly effective disambiguation even when only few examples are available, are the need of the hour for such systems. In this paper, we propose a novel two-step disambiguation method, SAND (Self-training Associative Name Disambiguator), that deals with these two issues. The first step eliminates the need of any manual labeling effort by automatically acquiring examples using a clustering method that groups citation records based on the similarity among coauthor names. The second step uses a supervised disambiguation method that is able to detect unseen authors not included in any of the given training examples. 
Experiments conducted with standard public collections, using the minimum set of attributes present in a citation (i.e., author names, work title and publication venue), demonstrated that our proposed method outperforms representative unsupervised disambiguation methods that exploit similarities between citation records and is as effective as, and in some cases superior to, supervised ones, without manually labeling any training example. %M C.DL.10.49 %T Citing for high impact %S Scholarly publications %A Shi, Xiaolin %A Leskovec, Jure %A McFarland, Daniel A. %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 49-58 %K citation networks, citation projection, publication impact %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816131 %X The question of citation behavior has always intrigued scientists from various disciplines. While general citation patterns have been widely studied in the literature, we develop the notion of citation projection graphs by investigating the citations among the publications that a given paper cites. We investigate how patterns of citations vary between various scientific disciplines and how such patterns reflect the scientific impact of the paper. We find that idiosyncratic citation patterns are characteristic for low impact papers; while narrow, discipline-focused citation patterns are common for medium impact papers. Our results show that crossing-community, or bridging citation patterns are high risk and high reward since such patterns are characteristic for both low and high impact papers. Last, we observe that recently citation networks are trending toward more bridging and interdisciplinary forms. %M C.DL.10.59 %T Evaluating methods to rediscover missing web pages from the web infrastructure %S Search 1 %A Klein, Martin %A Nelson, Michael L. 
%B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 59-68 %K digital preservation, search engines, web page discovery %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816133 %X Missing web pages (pages that return the 404 "Page Not Found" error) are part of the browsing experience. The manual use of search engines to rediscover missing pages can be frustrating and unsuccessful. We compare four automated methods for rediscovering web pages. We extract the page's title, generate the page's lexical signature (LS), obtain the page's tags from the bookmarking website delicious.com and generate a LS from the page's link neighborhood. We use the output of all methods to query Internet search engines and analyze their retrieval performance. Our results show that both LSs and titles perform fairly well with over 60% URIs returned top ranked from Yahoo!. However, the combination of methods improves the retrieval performance. Considering the complexity of the LS generation, querying the title first and in case of insufficient results querying the LSs second is the preferable setup. This combination accounts for more than 75% top ranked URIs. %M C.DL.10.69 %T Search behaviors in different task types %S Search 1 %A Liu, Jingjing %A Cole, Michael J. %A Liu, Chang %A Bierig, Ralf %A Gwizdka, Jacek %A Belkin, Nicholas J. %A Zhang, Jun %A Zhang, Xiangmin %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 69-78 %K eye tracking, information retrieval, personalization, task type, user behavior %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816134 %X Personalization of information retrieval tailors search towards individual users to meet their particular information needs by taking into account information about users and their contexts, often through implicit sources of evidence such as user behaviors. 
Task types have been shown to influence search behaviors including usefulness judgments. This paper reports on an investigation of user behaviors associated with different task types. Twenty-two undergraduate journalism students participated in a controlled lab experiment, each searching on four tasks which varied on four dimensions: complexity, task product, task goal and task level. Results indicate regular differences associated with different task characteristics in several search behaviors, including task completion time, decision time (the time taken to decide whether a document is useful or not), and eye fixations, etc. We suggest these behaviors can be used as implicit indicators of the user's task type. %M C.DL.10.79 %T Exploiting time-based synonyms in searching document archives %S Search 1 %A Kanhabua, Nattiya %A Nørvåg, Kjetil %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 79-88 %K query expansion, synonym detection, temporal search %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816135 %X Query expansion of named entities can be employed in order to increase the retrieval effectiveness. A peculiarity of named entities compared to other vocabulary terms is that they are very dynamic in appearance, and synonym relationships between terms change with time. In this paper, we present an approach to extracting synonyms of named entities over time from the whole history of Wikipedia. In addition, we will use their temporal patterns as a feature in ranking and classifying them into two types, i.e., time-independent or time-dependent. Time-independent synonyms are invariant to time, while time-dependent synonyms are relevant to a particular time period, i.e., the synonym relationships change over time. 
Further, we describe how to make use of both types of synonyms to increase the retrieval effectiveness, i.e., query expansion with time-independent synonyms for an ordinary search, and query expansion with time-dependent synonyms for a search with respect to temporal criteria. Finally, through an evaluation based on TREC collections, we demonstrate how retrieval performance of queries consisting of named entities can be improved using our approach. %M C.DL.10.89 %T Using word sense discrimination on historic document collections %S Historical text & documents %A Tahmasebi, Nina %A Niklas, Kai %A Theuerkauf, Thomas %A Risse, Thomas %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 89-98 %K OCR error impact, historic document collections, information extraction, word sense discrimination %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816137 %X Word sense discrimination is the first, important step towards automatic detection of language evolution within large, historic document collections. By comparing the found word senses over time, we can reveal and use important information that will improve understanding and accessibility of a digital archive. Algorithms for word sense discrimination have been developed while keeping today's language in mind and have thus been evaluated on well selected, modern datasets. The quality of the word senses found in the discrimination step has a large impact on the detection of language evolution. Therefore, as a first step, we verify that word sense discrimination can successfully be applied to digitized historic documents and that the results correctly correspond to word senses. Because accessibility of digitized historic collections is influenced also by the quality of the optical character recognition (OCR), as a second step we investigate the effects of OCR errors on word sense discrimination results. 
All evaluations in this paper are performed on The Times Archive, a collection of newspaper articles from 1785-1985. %M C.DL.10.99 %T Chinese calligraphy specific style rendering system %S Historical text & documents %A Zhang, Zhenting %A Wu, Jiangqin %A Yu, Kai %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 99-108 %K rule-base stroke deformation, special nine grid, specific style rendering %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816138 %X Manifesting the handwriting characters with the specific style of a famous artwork is fascinating. In this paper, a system is built to render the user's handwriting characters with a specific style. First, a stroke database is established. When rendering a character, the strokes are extracted and recognized, then proper radicals and strokes are filtered, and finally these strokes are deformed and the result is generated. The Special Nine Grid (SNG) is presented to help recognize radicals and strokes. The Rule-base Stroke Deformation Algorithm (RSDA) is proposed to deform the original strokes according to the handwriting strokes. The rendering result manifests the specific style with high quality. It is feasible for people to generate the tablet or other artworks with the proposed system. %M C.DL.10.109 %T Translating handwritten bushman texts %S Historical text & documents %A Williams, Kyle %A Suleman, Hussein %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 109-118 %K CBIR, cultural heritage preservation, digital libraries, handwritten manuscripts, image processing, information retrieval %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816139 %X The Bleek and Lloyd Collection is a collection of artefacts documenting the life and language of the Bushman people of southern Africa in the 19th century. 
Included in this collection is a handwritten dictionary that contains English words and their corresponding |xam Bushman language translations. This dictionary allows for the manual translation of |xam words that appear in the notebooks of the Bleek and Lloyd collection. This, however, is not practical due to the size of the dictionary, which contains over 14000 entries. To solve this problem a content-based image retrieval system was built that allows for the selection of a |xam word from a notebook and returns matching words from the dictionary. The system shows promise with some search keys returning relevant results. %M C.DL.10.119 %T Do Wikipedians follow domain experts?: a domain-specific study on Wikipedia knowledge building %S Collaborative information environments %A Zhang, Yi %A Sun, Aixin %A Datta, Anwitaman %A Chang, Kuiyu %A Lim, Ee-Peng %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 119-128 %K Wikipedia, contributing behavior, knowledge building %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816141 %X Wikipedia is one of the most successful online knowledge bases, attracting millions of visits daily. Not surprisingly, its huge success has in turn led to immense research interest for a better understanding of the collaborative knowledge building process. In this paper, we performed a (terrorism) domain-specific case study, comparing and contrasting the knowledge evolution in Wikipedia with a knowledge base created by domain experts. Specifically, we used the Terrorism Knowledge Base (TKB) developed by experts at MIPT. We identified 409 Wikipedia articles matching TKB records, and went ahead to study them from three aspects: creation, revision, and link evolution. 
We found that the knowledge building in Wikipedia had largely been independent, and did not follow TKB -- despite the open and online availability of the latter, as well as awareness of at least some of the Wikipedia contributors about the TKB source. In an attempt to identify possible reasons, we conducted a detailed analysis of contribution behavior demonstrated by Wikipedians. It was found that most Wikipedians contribute to a relatively small set of articles each. Their contribution was biased towards one or very few article(s). At the same time, each article's contributions are often championed by very few active contributors including the article's creator. We finally arrive at a conjecture that the contributions in Wikipedia are more to cover knowledge at the article level rather than at the domain level. %M C.DL.10.129 %T Spatiotemporal mapping of Wikipedia concepts %S Collaborative information environments %A Popescu, Adrian %A Grefenstette, Gregory %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 129-138 %K concept, cultural, interaction, multilinguism, spatial-temporal, wikipedia %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816142 %X Space and time are important dimensions in the representation of a large number of concepts. However there exists no available resource that provides spatiotemporal mappings of generic concepts. Here we present a link-analysis based method for extracting the main locations and periods associated to all Wikipedia concepts. Relevant locations are selected from a set of geotagged articles, while relevant periods are discovered using a list of people with associated life periods. We analyze article versions over multiple languages and consider the strength of a spatial/temporal reference to be proportional to the number of languages in which it appears. 
To illustrate the utility of the spatiotemporal mapping of Wikipedia concepts, we present an analysis of cultural interactions and a temporal analysis of two domains. The Wikipedia mapping can also be used to perform rich spatiotemporal document indexing by extracting implicit spatial and temporal references from texts. %M C.DL.10.139 %T Crowdsourcing the assembly of concept hierarchies %S Collaborative information environments %A Eckert, Kai %A Niepert, Mathias %A Niemann, Christof %A Buckner, Cameron %A Allen, Colin %A Stuckenschmidt, Heiner %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 139-148 %K crowdsourcing, similarity, thesaurus learning %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816143 %X The "wisdom of crowds" is accomplishing tasks that are cumbersome for individuals yet cannot be fully automated by means of specialized computer algorithms. One such task is the construction of thesauri and other types of concept hierarchies. Human expert feedback on the relatedness and relative generality of terms, however, can be aggregated to dynamically construct evolving concept hierarchies. The InPhO (Indiana Philosophy Ontology) project bootstraps feedback from volunteer users unskilled in ontology design into a precise representation of a specific domain. The approach combines statistical text processing methods with expert feedback and logic programming to create a dynamic semantic representation of the discipline of philosophy. In this paper, we show that results of comparable quality can be achieved by leveraging the workforce of crowdsourcing services such as the Amazon Mechanical Turk (AMT). In an extensive empirical study, we compare the feedback obtained from AMT's workers with that from the InPhO volunteer users providing an insight into qualitative differences of the two groups. 
Furthermore, we present a set of strategies for assessing the quality of different users when gold standards are missing. We finally use these methods to construct a concept hierarchy based on the feedback acquired from AMT workers. %M C.DL.10.149 %T A user-centered design of a personal digital library for music exploration %S Personal collections %A Bainbridge, David %A Novak, Brook J. %A Cunningham, Sally Jo %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 149-158 %K music composition, personal digital music library, spatial hypermedia %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816145 %X We describe the evaluation of a personal digital library environment designed to help musicians capture, enrich and store their ideas using a spatial hypermedia paradigm. The target user group is musicians who primarily use audio and text for composition and arrangement, rather than formal music notation. Using the principle of user-centered design, the software implementation was guided by a diary study involving nine musicians which suggested five requirements for the software to support: capturing, overdubbing, developing, storing, and organizing. Moreover, the underlying spatial data-model was exploited to give raw audio compositions a hierarchical structure, and -- to aid musicians in retrieving previous ideas -- a search facility is available to support both query by humming and text-based queries. A user evaluation of the completed design with eleven subjects indicated that musicians, in general, would find the hypermedia environment useful for capturing and managing their moments of musical creativity and exploration. More specifically they would make use of the query by humming facility and the hierarchical track organization, but not the overdubbing facility as implemented. 
%M C.DL.10.159 %T Improving mood classification in music digital libraries by combining lyrics and audio %S Personal collections %A Hu, Xiao %A Downie, J. Stephen %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 159-168 %K audio features, feature fusion, lyric sentiment analysis, music digital libraries, music mood classification, supervised learning %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816146 %X Mood is an emerging metadata type and access point in music digital libraries (MDL) and online music repositories. In this study, we present a comprehensive investigation of the usefulness of lyrics in music mood classification by evaluating and comparing a wide range of lyric text features including linguistic and text stylistic features. We then combine the best lyric features with features extracted from music audio using two fusion methods. The results show that combining lyrics and audio significantly outperformed systems using audio-only features. In addition, the examination of learning curves shows that the hybrid lyric + audio system needed fewer training samples to achieve the same or better classification accuracies than systems using lyrics or audio singularly. These experiments were conducted on a unique large-scale dataset of 5,296 songs (with both audio and lyrics for each) representing 18 mood categories derived from social tags. The findings push forward the state-of-the-art on lyric sentiment analysis and automatic music mood classification and will help make mood a practical access point in music digital libraries. 
%M C.DL.10.169 %T Visualizing personal digital collections %S Personal collections %A Xu, Weijia %A Esteva, Maria %A Jain, Suyog Dott %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 169-172 %K database applications, digital collections, information visualization, personal information management (PIM), treemap %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816147 %X This paper describes the use of a relational database management system (RDBMS) and treemap visualization to represent and analyze a group of personal digital collections created in the context of work and with no external metadata. We evaluated the visualization vis-à-vis the results of previous personal information management (PIM) studies. We suggest that this visualization supports analyses that allow understanding of PIM practices over time. %M C.DL.10.173 %T Interpretation of web page layouts by blind users %S Personal collections %A Francisco-Revilla, Luis %A Crow, Jeff %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 173-176 %K assistive technology, blind users, web page layouts %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816148 %X Digital libraries must support assistive technologies that allow people with disabilities such as blindness to use, navigate and understand their documents. Increasingly, many documents are Web-based and present their contents using complex layouts. However, approaches that translate two-dimensional layouts to one-dimensional speech produce a very different user experience and loss of information. To address this issue, we conducted a study of how blind people navigate and interpret layouts of news and shopping Web pages using current assistive technology. The study revealed that blind people do not parse Web pages fully during their first visit, and that they can miss important parts. 
The study also provided insights for improving assistive technologies. %M C.DL.10.177 %T Supporting document triage via annotation-based multi-application visualizations %S Visualization %A Bae, Soonil %A Kim, DoHyoung %A Meintanis, Konstantinos %A Moore, J. Michael %A Zacchi, Anna %A Shipman, Frank %A Hsieh, Haowei %A Marshall, Catherine C. %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 177-186 %K document triage, multi-application user modeling, visualization %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816150 %X For open-ended information tasks, users must sift through many potentially relevant documents, a practice we refer to as document triage. Normally, people perform triage using multiple applications in concert: a search engine interface presents lists of potentially relevant documents; a document reader displays their contents; and a third tool -- a text editor or personal information management application -- is used to record notes and assessments. To support document triage, we have developed an extensible multi-application architecture that initially includes an information workspace and a document reader. An Interest Profile Manager infers users' interests from their interactions with the triage applications, coupled with the characteristics of the documents they are interacting with. The resulting interest profile is used to generate visualizations that direct users' attention to documents or parts of documents that match their inferred interests. The novelty of our approach lies in the aggregation of activity records across applications to generate fine-grained models of user interest. 
%M C.DL.10.187 %T Flexible access to photo libraries via time, place, tags, and visual features %S Visualization %A Girgensohn, Andreas %A Shipman, Frank %A Turner, Thea %A Wilcox, Lynn %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 187-196 %K geographic data, photo libraries, photo retrieval, similarity criteria, tagged photos, visual similarity %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816151 %X Photo libraries are growing in quantity and size, requiring better support for locating desired photographs. MediaGLOW is an interactive visual workspace designed to address this concern. It uses attributes such as visual appearance, GPS locations, user-assigned tags, and dates to filter and group photos. An automatic layout algorithm positions photos with similar attributes near each other to support users in serendipitously finding multiple relevant photos. In addition, the system can explicitly select photos similar to specified photos. We conducted a user evaluation to determine the benefit provided by similarity layout and the relative advantages offered by the different layout similarity criteria and attribute filters. Study participants had to locate photos matching probe statements. In some tasks, participants were restricted to a single layout similarity criterion and filter option. Participants used multiple attributes to filter photos. Layout by similarity without additional filters turned out to be one of the most used strategies and was especially beneficial for geographical similarity. Lastly, the relative appropriateness of the single similarity criterion to the probe significantly affected retrieval performance. 
%M C.DL.10.197 %T Interactively browsing movies in terms of action, foreshadowing and resolution %S Visualization %A Greenhill, Stewart %A Adams, Brett %A Venkatesh, Svetha %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 197-200 %K compression, media aesthetics, video browsing %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816152 %X We describe a novel video player that uses Temporal Semantic Compression (TSC) to present a compressed summary of a movie. Compression is based on tempo which is derived from film rhythms. The technique identifies periods of action, drama, foreshadowing and resolution, which can be mixed in different amounts to vary the kind of summary presented. The compression algorithm is embedded in a video player, so that the summary can be interactively recomputed during playback. %M C.DL.10.201 %T Timeline interactive multimedia experience (time): on location access to aggregate event information %S Visualization %A Crow, Jeff %A Whitworth, Eryn %A Wongsa, Ame %A Francisco-Revilla, Luis %A Pendyala, Swati %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 201-204 %K complex scheduled events, events, multi-touch, planning, social media, timeline %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816153 %X Attending a complex scheduled social event, such as a multi-day music festival, requires a significant amount of planning before and during its progression. Advancements in mobile technology and social networks enable attendees to contribute content in real-time that can provide useful information to many. Currently access to and presentation of such information is challenging to use during an event. The Timeline Interactive Multimedia Experience (TIME) system aggregates information posted to multiple social networks and presents the flow of information in a multi-touch timeline interface. 
TIME was designed to be placed on location to allow real-time access to relevant information that helps attendees to make plans and navigate their crowded surroundings. %M C.DL.10.205 %T Domain-specific iterative readability computation %S Data mining %A Zhao, Jin %A Kan, Min-Yen %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 205-214 %K domain-specific information retrieval, graph-based algorithm, iterative computation, readability measure %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816155 %X We present a new algorithm to measure domain-specific readability. It iteratively computes the readability of domain-specific resources based on the difficulty of domain-specific concepts and vice versa, in a style reminiscent of other bipartite graph algorithms such as Hyperlink-Induced Topic Search (HITS) and the Stochastic Approach for Link-Structure Analysis (SALSA). While simple, our algorithm outperforms standard heuristic measures and remains competitive among supervised-learning approaches. Moreover, it is less domain-dependent and portable across domains as it does not rely on an annotated corpus or expensive expert knowledge that supervised or domain-specific methods require. %M C.DL.10.215 %T Evaluating topic models for digital libraries %S Data mining %A Newman, David %A Noh, Youn %A Talley, Edmund %A Karimi, Sarvnaz %A Baldwin, Timothy %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 215-224 %K evaluation, topic models, topic quality, user studies %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816156 %X Topic models could have a huge impact on improving the ways users find and discover content in digital libraries and search interfaces through their ability to automatically learn and apply subject tags to each and every item in a collection, and their ability to dynamically create virtual collections on the fly. 
However, much remains to be done to tap this potential, and empirically evaluate the true value of a given topic model to humans. In this work, we sketch out some sub-tasks that we suggest pave the way towards this goal, and present methods for assessing the coherence and interpretability of topics learned by topic models. Our large-scale user study includes over 70 human subjects evaluating and scoring almost 500 topics learned from collections from a wide range of genres and domains. We show how a scoring model -- based on pointwise mutual information of word pairs using Wikipedia, Google and MEDLINE as external data sources -- performs well at predicting human scores. This automated scoring of topics is an important first step to integrating topic modeling into digital libraries. %M C.DL.10.225 %T FRBRization of MARC records in multiple catalogs %S Data mining %A Manguinhas, Hugo Miguel Álvaro %A Freire, Nuno Miguel Antunes %A Borbinha, José Luis Brinquete %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 225-234 %K FRBR, FRBRization, bibliographic records, multilingual catalogs %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816157 %X This paper addresses the problem of using the FRBR model to support the presentation of results. It describes a service implementing new algorithms and techniques for transforming existing MARC records into the FRBR model for this specific purpose. This work was developed in the context of the TELPlus project and processed 100,000 bibliographic and authority records from multilingual catalogs of 12 European countries. 
%M C.DL.10.235 %T Exposing the hidden web for chemical digital libraries %S Infrastructure & systems %A Tönnies, Sascha %A Köhncke, Benjamin %A Koepler, Oliver %A Balke, Wolf-Tilo %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 235-244 %K chemical digital collections, digital libraries, hidden web, information extraction, information retrieval, web search %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816159 %X In recent years, the vast amount of digitally available content has led to the creation of many topic-centered digital libraries. Also in the domain of chemistry more and more digital collections are available, but the complex query formulation still hampers their intuitive adoption. This is because information seeking in chemical documents is focused on chemical entities, for which current standard search relies on complex structures which are hard to extract from documents. Moreover, although simple keyword searches would often be sufficient, current collections simply cannot be indexed by Web search providers due to the ambiguity of chemical substance names. In this paper we present a framework for automatically generating metadata-enriched index pages for all documents in a given chemical collection. All information is then linked to the respective documents and thus provides an easy to crawl metadata repository promising to open up digital chemical libraries. Our experiments, indexing an open access journal, show that not only the documents can be found using a simple Google search via the automatically created index pages, but also that the search is much more effective than fulltext indexing in terms of both precision/recall and performance. Finally, we compare our indexing against a classical structure search and find that keyword-based search can indeed solve at least some of the daily tasks in chemical workflows. 
Using our framework thus promises to expose a large part of the currently still hidden chemical Web, making the techniques employed interesting for chemical information providers like digital libraries and open access journals. %M C.DL.10.245 %T oreChem ChemXSeer: a semantic digital library for chemistry %S Infrastructure & systems %A Li, Na %A Zhu, Leilei %A Mitra, Prasenjit %A Mueller, Karl %A Poweleit, Eric %A Giles, C. Lee %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 245-254 %K ChemXSeer, digital library, metadata extraction, oai-ore, seersuite, semantic web, support vector machines %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816160 %X Representing the semantics of unstructured scientific publications will certainly facilitate access and search and hopefully lead to new discoveries. However, current digital libraries are usually limited to classic flat structured metadata even for scientific publications that potentially contain rich semantic metadata. In addition, how to search the scientific literature of linked semantic metadata is an open problem. We have developed a semantic digital library, oreChem Chem{sub:x}Seer, that models chemistry papers with semantic metadata. It stores and indexes extracted metadata from a chemistry paper repository, Chem{sub:x}Seer, using "compound objects". We use the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) (http://www.openarchives.org/ore/) standard to define a compound object that aggregates metadata fields related to a digital object. Aggregated metadata can be managed and retrieved easily as one unit resulting in improved ease-of-use and has the potential to improve the semantic interpretation of shared data. We show how metadata can be extracted from documents and aggregated using OAI-ORE. ORE objects are created on demand; thus, we are able to search for a set of linked metadata with one query. 
We were also able to model new types of metadata easily. For example, chemists are especially interested in finding information related to experiments in documents. We show how paragraphs containing experiment information in chemistry papers can be extracted and tagged based on a chemistry ontology with 470 classes, and then represented in ORE along with other document-related metadata. Our algorithm uses a classifier with features that are words that are typically only used to describe experiments, such as "apparatus", "prepare", etc. Using a dataset comprised of documents from the Royal Society of Chemistry digital library, we show that our proposed method performs well in extracting experiment-related paragraphs from chemistry documents. %M C.DL.10.255 %T BinarizationShop: a user-assisted software suite for converting old documents to black-and-white %S Infrastructure & systems %A Deng, Fanbo %A Wu, Zheng %A Lu, Zheng %A Brown, Michael S. %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 255-258 %K binarization, document processing, user-assisted software %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816161 %X Converting a scanned document to a binary format (black and white) is a key step in the digitization process. While many existing binarization algorithms operate robustly for well-kept documents, these algorithms often produce less than satisfactory results when applied to old documents, especially those degraded with stains and other discolorations. For these challenging documents, user assistance can be advantageous in directing the binarization procedure. Many existing algorithms, however, are poorly designed to incorporate user assistance. In this paper, we discuss a software framework, BinarizationShop, that combines a series of binarization approaches that have been tailored to exploit user assistance. 
This framework provides a practical approach for converting difficult documents to black and white. %M C.DL.10.259 %T Using an ontology and a multilingual glossary for enhancing the nautical archaeology digital library %S Infrastructure & systems %A Monroy, Carlos %A Furuta, Richard %A Castro, Filipe %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 259-262 %K information retrieval, interfaces, multilingual technical manuscripts, nautical archaeology, ship reconstruction, technical documents %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816162 %X Access to materials in digital collections has been extensively studied within digital libraries. Exploring a collection requires customized indices and novel interfaces to allow users new exploration mechanisms. Materials or objects can then be found by way of full-text, faceted, or thematic indexes. There has been a marked interest not only in finding objects in a collection, but in discovering relationships and properties. For example, multiple representations of the same object enable the use of visual aids to augment collection exploration. Depending on the domain and characteristics of the objects in a collection, relationships among components can be used to enrich the process of understanding their contents. In this context, the Nautical Archaeology Digital Library (NADL) includes multilingual textual- and visual-rich objects (shipbuilding treatises, illustrations, photographs, and drawings). In this paper we describe an approach for enhancing access to a collection of ancient technical documents, illustrations, and photographs documenting archaeological excavations. Because of the nature of our collection, we exploit a multilingual glossary along with an ontology. Preliminary tests of our prototype suggest the feasibility of our method for enhancing access to the collection. 
%M C.DL.10.263 %T In-depth utilization of Chinese ancient maps: a hybrid approach to digitizing map resources in CADAL %S Integration of physical and digital media %A Ye, Zhenchao %A Zhuang, Ling %A Wu, Jiangqin %A Du, Chenyang %A Wei, Baogang %A Zhang, Yin %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 263-272 %K atlases, digital library, image processing, kernel method %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816164 %X Digital maps are becoming increasingly popular as an intuitive and interactive platform for data presentation. Thus applications integrated with digital maps have attracted much attention. However, no off-the-shelf systems or services are available when the time span of maps is extended to historical ones. There are a large number of valuable ancient atlases in the CADAL digital library. However, they are seldom used because those in image format are not convenient for users to read or search. In this paper, we propose a novel hybrid approach to utilizing these atlases directly and constructing some applications based on ancient maps. We call it CAMAME, which stands for Chinese Ancient Maps Automatic Marking and Extraction. We create a gazetteer to store the geographic information of sites which will be projected on the map, then use a kernel method to do the regression and correct the estimated results with image processing and local regression methods. The empirical results show that CAMAME is effective and efficient, marking and identifying most of the valuable data in the map images. Some Chinese literary chronicle applications that exhibit ancient literary and related historical information over those digitized atlas resources in the CADAL digital library were developed. %M C.DL.10.273 %T The fused library: integrating digital and physical libraries with location-aware sensors %S Integration of physical and digital media %A Buchanan, George R. 
%B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 273-282 %K digital libraries, human factors, physical interaction %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816165 %X This paper reports an investigation into the connection of the workspace of physical libraries with digital library services. Using simple sensor technology, we provide focused access to digital resources on the basis of the user's physical context, including the topic of the stacks they are next to, and the content of books on their reading desks. Our research developed the technological infrastructure to support this fused interaction, investigated current patron behavior in physical libraries, and evaluated our system in a user-centred pilot study. The outcome of this research demonstrates the potential utility of the fused library, and provides a starting point for future exploitation. %M C.DL.10.283 %T What humanists want: how scholars use source materials %S Integration of physical and digital media %A Audenaert, Neal %A Furuta, Richard %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 283-292 %K digital humanities, source documents, user studies %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816166 %X Despite the growing prominence of digital libraries as tools to support humanities scholars, little is known about the work practices and needs of these scholars as they pertain to working with source documents. In this paper we present our findings from a formative user study consisting of semi-structured interviews with eight scholars. We find that the use of source materials (by which we mean the original physical documents or digital facsimiles with minimal editorial intervention) in scholarship is not a simple, straight-forward examination of a document in isolation. 
Instead, scholars study source materials as an integral part of a complex ecosystem of inquiry that seeks to understand both the text being studied and the context in which that text was created, transmitted and used. Drawing examples from our interviews, we address critical questions of why scholars use source documents and what information they hope to gain by studying them. We also briefly summarize key note-taking practices as a means for assessing the potential to design user interfaces that support scholarly work-practices. %M C.DL.10.293 %T Context identification of sentences in related work sections using a conditional random field: towards intelligent digital libraries %S Search 2 %A Angrosh, M. A. %A Cranefield, Stephen %A Stanger, Nigel %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 293-302 %K citation classification, conditional random fields, linear chain CRFs, sentence classification %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816168 %X Identification of contexts associated with sentences is becoming increasingly necessary for developing intelligent information retrieval systems. This article describes a supervised learning mechanism employing a conditional random field (CRF) for context identification and sentence classification. Specifically, we focus on sentences in related work sections in research articles. Based on a generic rhetorical pattern, a framework for modelling the sequential flow in these sections is proposed. Adopting a generalization strategy, each of these sentences is transformed into a set of features, which forms our dataset. We distinguish between two kinds of features for each of these sentences viz., citation features and sentence features. While an overall accuracy of 96.51% is achieved by using a combination of both citation and sentence features, the use of sentence features alone yields an accuracy of 93.22%. 
The results also show F-Scores ranging from 0.99 to 0.90 for various classes indicating the robustness of our application. %M C.DL.10.303 %T Can an intermediary collection help users search image databases without annotations? %S Search 2 %A Villa, Robert %A Halvey, Martin %A Joho, Hideo %A Hannah, David %A Jose, Joemon M. %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 303-312 %K content-based image retrieval, search strategies %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816169 %X Developing methods for searching image databases is a challenging and ongoing area of research. A common approach is to use manual annotations, although generating annotations can be expensive in terms of time and money, and therefore may not be justified in many situations. Content-based search techniques which extract visual features from image data can be used, but users are typically forced to express their information need using example images, or through sketching interfaces. This can be difficult if no visual example of the information need is available, or when the information need cannot be easily drawn. In this paper, we consider an alternative approach which allows a user to search for images through an intermediate database. In this approach, a user can search using text in the intermediate database as a way of finding visual examples of their information need. The visual examples can then be used to search a database that lacks annotations. Three experiments are presented which investigate this process. The first experiment automatically selects the image queries from the intermediary database; the second instead uses images which have been hand-picked by users. A third experiment, an interactive study, is then presented; this study compares the intermediary interface to text search, where we consider text as an upper bound of performance. 
For this last study, an interface which supports the intermediary search process is described. Results show that while performance does not match manual annotations, users are able to find relevant material without requiring collection annotations. %M C.DL.10.313 %T Social network document ranking %S Search 2 %A Gou, Liang %A Zhang, Xiaolong (Luke) %A Chen, Hung-Hsuan %A Kim, Jung-Hyun %A Giles, C. Lee %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 313-322 %K information retrieval, multilevel actor similarity, ranking, social networks %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816170 %X In search engines, ranking algorithms measure the importance and relevance of documents mainly based on the contents and relationships between documents. User attributes are usually not considered in ranking. This user-neutral approach, however, may not meet the diverse interests of users, who may demand different documents even with the same queries. To satisfy this need for more personalized ranking, we propose a ranking framework, Social Network Document Rank (SNDocRank), that considers both document contents and the relationship between a searcher and document owners in a social network. This method combines the traditional tf-idf ranking for document contents with our Multi-level Actor Similarity (MAS) algorithm to measure to what extent document owners and the searcher are structurally similar in a social network. We implemented our ranking method in a simulated video social network based on data extracted from YouTube and tested its effectiveness on video search. The results show that, compared with a traditional ranking method like tf-idf, the SNDocRank algorithm returns more relevant documents. More specifically, a searcher can get significantly better results by being in a larger social network, having more friends, and being associated with larger local communities in a social network. 
%M C.DL.10.323 %T A mathematical framework for modeling and analyzing migration time %S Theory & frameworks %A Luan, Feng %A Nygård, Mads %A Mestl, Thomas %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 323-332 %K long-term preservation, migration, performance, process modeling, storage %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816172 %X File format obsolescence has so far been considered the major risk in long-term storage of digital objects. There are, however, growing indications that file transfer may be a real threat as the migration time, i.e., the time required to migrate Petabytes of data, may easily take years. However, hardware support is usually limited to 3-4 years and a situation can emerge in which a new migration has to be started although the previous one is not yet finished. This paper chooses a process modeling approach to obtain estimates of upper and lower bounds for the required migration time. The advantage is that information about potential bottlenecks can be acquired. Our theoretical considerations are validated by migration tests at the National Library of Norway (NB) as well as at our department. %M C.DL.10.333 %T Digital libraries for scientific data discovery and reuse: from vision to practical reality %S Theory & frameworks %A Wallis, Jillian C. %A Mayernik, Matthew S. %A Borgman, Christine L. %A Pepe, Alberto %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 333-340 %K collaborative research, cyberinfrastructure, data deluge, distributed research, escience %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816173 %X Science and technology research is becoming not only more distributed and collaborative, but more highly instrumented. Digital libraries provide a means to capture, manage, and access the data deluge that results from these research enterprises. 
We have conducted research on data practices and participated in developing data management services for the Center for Embedded Networked Sensing since its founding in 2002 as a National Science Foundation Science and Technology Center. Over the course of eight years, our digital library strategy has shifted dramatically in response to changing technologies, practices, and policies. We report on the development of several DL systems and on the lessons learned, which include the difficulty of anticipating data requirements from nascent technologies, building systems for highly diverse work practices and data types, the need to bind together multiple single-purpose systems, the lack of incentives to manage and share data, the complementary nature of research and development in understanding practices, and sustainability. %M C.DL.10.341 %T Ensemble PDP-8: eight principles for distributed portals %S Theory & frameworks %A Fox, Edward A. %A Chen, Yinlin %A Akbar, Monika %A Shaffer, Clifford A. %A Edwards, Stephen H. %A Brusilovsky, Peter %A Garcia, Dan %A Delcambre, Lois %A Decker, Felicia %A Archer, David %A Furuta, Richard %A Shipman, Frank %A Carpenter, Stephen %A Cassel, Lillian %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 341-344 %K adaptive education system, distributed portal, ontology, superimposed information %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816174 %X Ensemble, the National Science Digital Library (NSDL) Pathways project for Computing, builds upon a diverse group of prior NSDL, DL-I, and other projects. Ensemble has shaped its activities according to principles related to design, development, implementation, and operation of distributed portals. Here we articulate 8 key principles for distributed portals (PDPs). While our focus is on education and pedagogy, we expect that our experiences will generalize to other digital library application domains. 
These principles inform, facilitate, and enhance the Ensemble R&D and production activities. They allow us to provide a broad range of services, from personalization to coordination across communities. The eight PDPs can be briefly summarized as: (1) Articulation across communities using ontologies. (2) Browsing tailored to collections. (3) Integration across interfaces and virtual environments. (4) Metadata interoperability and integration. (5) Social graph construction using logging and metrics. (6) Superimposed information and annotation integrated across distributed systems. (7) Streamlined user access with IDs. (8) Web 2.0 multiple social network system interconnection. %M C.DL.10.345 %T Discovering Australia's research data %S Theory & frameworks %A Kethers, Stefanie %A Shen, Xiaobin %A Treloar, Andrew E. %A Wilkinson, Ross G. %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 345-348 %K Australian research data commons, e-research, metadata %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816175 %X Access to data crucial to research is often slow and difficult. When research problems cross disciplinary boundaries, problems are exacerbated. This paper argues that it is important to make it easier to find and access data that might be found in an institution, in a disciplinary data store, in a government department, or held privately. We explore how to meet ad hoc needs that cannot easily be supported by a disciplinary ontology, and argue that web pages that describe data collections with rich links and rich text are valuable. We describe the approach followed by the Australian National Data Service (ANDS) in making such pages available. Finally, we discuss how we plan to evaluate this approach. %M C.DL.10.349 %T This is what i'm doing and why: reflections on a think-aloud study of dl users' information behaviour %S Social aspects %A Makri, Stephann %A Blandford, Ann %A Cox, Anna L. 
%B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 349-352 %K methodology, reflection, think-aloud, user study %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816177 %X Many user-centred studies of digital libraries (DLs) include a think-aloud element and are usually conducted with the purpose of identifying usability issues related to the DLs used or understanding aspects of users' information behaviour. However, few of these studies present detailed accounts of how their think-aloud data was collected and analysed or reflect on this process. In this paper, we discuss and reflect on the decisions made when planning and conducting a think-aloud study of lawyers' interactive information behaviour. Our discussion is framed by Blandford et al.'s PRET A Rapporter ('ready to report') framework -- a framework that can be used to plan, conduct and describe user-centred studies of DL use from an information work perspective. %M C.DL.10.353 %T Customizing science instruction with educational digital libraries %S Social aspects %A Sumner, Tamara %Q CCS Team %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 353-356 %K customizing instruction, differentiated instruction, educational digital libraries, personalization, science education, software infrastructure for teachers %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816178 %X The Curriculum Customization Service enables science educators to customize their instruction with interactive digital library resources. Preliminary results from a field trial with 124 middle and high school teachers suggest that the Service offers a promising model for embedding educational digital libraries into teaching practices and for supporting teachers to integrate customizing into their curriculum planning. 
%M C.DL.10.357 %T Impact and prospect of social bookmarks for bibliographic information retrieval %S Social aspects %A Seki, Kazuhiro %A Qin, Huawei %A Uehara, Kuniaki %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 357-360 %K controlled vocabulary, folksonomy, free keywords, subject headings %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816179 %X This paper presents our ongoing study of the current/future impact of social bookmarks (or social tags) on information retrieval (IR). Our main research question asked in the present work is "How are social tags compared with conventional, yet reliable manual indexing from the viewpoint of IR performance?". To answer the question, we look at the biomedical literature and begin with examining basic statistics of social tags from CiteULike in comparison with Medical Subject Headings (MeSH) annotated in the Medline bibliographic database. Then, using the data, we conduct various experiments in an IR setting, which reveals that social tags work complementarily with MeSH and that retrieval performance would improve as the coverage of CiteULike grows. %M C.DL.10.361 %T Merging metadata: a sociotechnical study of crosswalking and interoperability %S Social aspects %A Khoo, Michael %A Hall, Catherine %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 361-364 %K Dublin core, crosswalk, interoperability, metadata, operations, organizational knowledge, organizations, sociotechnical %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816180 %X Digital library interoperability relies on the use of a common metadata format. However, implementing a common metadata format among multiple digital libraries is not always a straightforward exercise. 
This paper reviews some of the metadata issues that arose during the merger of two digital libraries, the Internet Public Library and the Librarian's Internet Index. As part of the merger, each library's metadata was crosswalked to Dublin Core. This required considerable work. A sociotechnical analysis suggests that the metadata for each library had been shaped in complex ways over time by local factors, and that this complexity negatively impacted the efficiency of the crosswalk. Some implications of this finding for digital library interoperability are discussed. %M C.DL.10.365 %T Emulation based services in digital preservation %S Digital preservation %A Rechert, Klaus %A von Suchodoletz, Dirk %A Welte, Randolph %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 365-368 %K digital library, digital preservation, emulation, interactive software, long-term access %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816182 %X The creation of most digital objects occurs solely in interactive graphical user interfaces which were available at the particular time period. Archiving and preservation organizations are posed with large amounts of such objects of various types. At some point they will need to process these automatically to make them available to their users or convert them to a commonly used format. A substantial problem is to provide a wide range of different users with access to ancient environments and to allow using the original environment for a given object. We propose an abstract architecture for emulation services in digital preservation to provide remote user interfaces to emulation over computer networks without the need to install additional software components. Furthermore, we describe how these ideas can be integrated in a framework of web services for common preservation tasks like viewing or migrating digital objects. 
%M C.DL.10.369 %T Many-to-many information connections in a distributed digital library portal %S Posters %A Cassel, Lillian N. %A Fox, Edward A. %A Furuta, Richard %A Delcambre, Lois M. L. %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 369-370 %K NSDL, digital library, distributed portal %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816184 %X The Ensemble computing education portal is part of the US NSF's National Science Digital Library (NSDL). The underlying assumption in Ensemble's design is that people will not come just because we build something new. The information must be available from wherever potential users are. This poster describes early efforts to provide multiple community oriented entry points to multiple sources relevant to computing educators. %M C.DL.10.371 %T SPIRO-V: a collaborative approach to controlled vocabularies gathering and management %S Posters %A Huang, Lina %A Deshmukh, Rahul A. %A Mostafa, Javed %A Greenberg, Jane %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 371-372 %K clinical study, controlled vocabulary construction %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816185 %X This paper describes SPIRO-V, a collaborative controlled vocabulary development system integrating automatic and manual approaches for domain-specific vocabulary acquisition, and leveraging the knowledge of field experts. %M C.DL.10.373 %T Generating citation digests for scientific publications %S Posters %A Easty, Richard %A Nikolov, Nikolay %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 373-374 %K browser extension, citation contexts, science literature %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816186 %X Science is characterized nowadays by unprecedented growth in the number of publications. 
Thus it would be helpful if there were a way to summarize the contents of the publications or explain the argumentative relationship between them (e.g. support, further improvement, critique). Such semantic analysis might involve analyzing the citation contexts (the paragraphs where a certain publication is referred to by another publication). Here we present our work on a system that creates the pre-requisites for such analysis by harvesting publications from the web, extracting the contexts from them, and aggregating them into citation digests that are retrieved in the context of user interactions with web sites that mention these publications. %M C.DL.10.375 %T AIRFrame: integrating diverse digital collections in astrobiology %S Posters %A Gazan, Rich %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 375-376 %K astrobiology, collaboration, interdisciplinary science %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816187 %X Astrobiology is an inherently interdisciplinary field concerned with questions of life in the universe. This paper describes the design and ongoing implementation of the Astrobiology Integrative Research Framework (AIRFrame), an open source, ontology-driven information system designed to ingest and analyze heterogeneous inputs of both published and unpublished data, and to identify and illustrate latent connections between research in astrobiology's diverse constituent fields. 
%M C.DL.10.377 %T A public education tool for tsunami disasters based on walking tours in TDL %S Posters %A Imai, Sayaka %A Kanamori, Yoshinari %A Shuto, Nobuo %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 377-378 %K GPS mobile phones, public education, tsunami digital library %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816188 %X As described in this paper, we proposed a public education tool for Tsunami Disasters based on TDL. %M C.DL.10.379 %T A search engine for Japanese academic papers %S Posters %A Ishita, Emi %A Agata, Teru %A Ikeuchi, Atsushi %A Michiko, Nozue %A Yosuke, Miyata %A Ueda, Shuichi %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 379-380 %K PDF, academic papers, search engine %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816189 %X A search engine for Japanese academic papers rendered in PDF is described. Evaluation results indicate fewer zero-result queries and higher precision in the top-10 documents than was obtained for the same Japanese queries using Google Scholar or Scirus. %M C.DL.10.381 %T Analyzing viewing patterns while reading picture books %S Posters %A Ishita, Emi %A Mine, Shinji %A Kunimoto, Chihiro %A Shiozaki, Junko %A Kurata, Keiko %A Ueda, Shuichi %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 381-382 %K eye tracking, viewing patterns %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816190 %X We examine the eye movements of children who can read books on their own as they read printed picture books. Our analysis focuses on two points; 1) Is it the pictures or the text that they most frequently gaze at?, and 2) In what sequence do they read picture books? Our results indicate that children look at both text and pictures, but that there are large variations in the ratio of viewing time for each child. 
Both circular and linear patterns are found in the sequence of eye movements. %M C.DL.10.383 %T Personalizing information retrieval for people with different levels of topic knowledge %S Posters %A Liu, Jingjing %A Belkin, Nicholas J. %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 383-384 %K decision time, dwell time, personalization of IR, topic knowledge %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816191 %M C.DL.10.385 %T Rethinking preservation validation with the preserved object and repository risks ontology (PORRO) %S Posters %A McHugh, Andrew %A Lalmas, Mounia %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 385-386 %K digital preservation, ontologies, validation %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816192 %X For securing digital longevity, the processes of preservation planning and evaluation are fundamentally implicit and share similar complexity. Means are required for the identification, documentation and association of those properties of data, representation and management mechanisms that in combination lend value, facilitate interaction and influence the preservation process. These properties may be almost limitless in terms of diversity, but are integral to the establishment of classes of risk exposure, and the planning and deployment of appropriate preservation strategies. We present PORRO, an ontology based approach for documenting objects, repositories and risk information, intended to support preservation decision making and evaluation. 
%M C.DL.10.387 %T ForeCite: towards a reader-centric scholarly digital library %S Posters %A Nguyen, Thuy Dung %A Kan, Min-Yen %A Dang, Dinh-Trung %A Hänse, Markus %A Hong, Ching Hoi Andy %A Luong, Minh-Thang %A Gozali, Jesse Prabawa %A Sugiyama, Kazunari %A Tang, Yee Fan %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 387-388 %K ForeCite, argumentative zoning, document logical structure, scholarly digital library %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816193 %X We present ForeCite (FC), a prototype reader-centric digital library that supports the scholar in using scholarly documents. FC integrates three user interfaces: a bibliometric component, a document reader and annotation system, and a bibliographic management application. %M C.DL.10.389 %T An architecture for a distributed digital library from the desktop up: the fascinator %S Posters %A Sefton, Peter %A Dickinson, Duncan %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 389-390 %K information systems, repositories, research, search %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816194 %X This poster describes the architecture of a new kind of digital repository service that includes components that run on desktop computers, designed to close the gap between Institutional Repositories (IRs) and the day-to-day electronic work environment used by researchers, and to address the too-often heard cry from repository managers of "we built it but they didn't come." 
The team at the Australian Digital Futures Institute are working with researchers to provide software that can (a) index and expose the research data content on their hard disks (b) extract metadata from files (c) automatically process data according to highly configurable workflows including producing web-ready renditions of research objects including documents, domain specific data visualizations (such as chemical molecules) and converting video and images so that they may be easily previewed. The architecture is inspired by the success of consumer software in two ways; the way entertainment programs organize content via faceted browse and search interfaces using embedded metadata, and the way photographic software allows content to be grouped into collections and pushed to online services, which are essentially repositories. %M C.DL.10.391 %T A digital library architecture supporting massive small files and efficient replica maintenance %S Posters %A Shen, Chunhui %A Lu, Weiming %A Wu, Jiangqin %A Wei, Baogang %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 391-392 %K digital libraries, distributed system, replication, small file %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816195 %X In this paper, we presented a service infrastructure based on distributed file system for massive storage in digital library. In addition, we addressed the small-file problem by merging small files into big ones, and proposed a novel dynamic replica number adjustment scheme to ensure the maximal availability and reliability in a limited storage space. 
%M C.DL.10.393 %T Text clustering with important words using normalization %S Posters %A Wu, Shunyao %A Wang, Jinlong %A Vu, Huy Quan %A Li, Gang %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 393-394 %K document clustering, important words, normalization %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816196 %X Important words, which usually exist in part of Title, Subject and Keywords, can briefly reflect the main topic of a document. In recent years, it is a common practice to exploit the semantic topic of documents and utilize important words to achieve document clustering, especially for short texts such as news articles. This paper proposes a novel method to extract important words from Subject and Keywords of articles, and then partition documents only with those important words. Considering the fact that frequencies of important words are usually low and the scale matrix dataset for important words is small, a normalization method is then proposed to normalize the scale dataset so that more accurate results can be achieved by sufficiently exploiting the limited information. The experiments validate the effectiveness of our method. %M C.DL.10.395 %T Liquid journals: scientific journals in the Web 2.0 era %S Demonstrations %A Baez, Marcos %A Mussi, Alejandro %A Casati, Fabio %A Birukou, Aliaksandr %A Marchese, Maurizio %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 395-396 %K Web, academic journals, enhanced search %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816198 %X In this demo we introduce a platform and a model of journal in the age of the Web called liquid journal. The goal of the model (and of the supporting platform) is to disseminate knowledge in the best possible way while also supporting scientists in the credit attribution. 
In a nutshell, liquid journals are collections of "interesting" links to scientific contributions, such as papers, blogs, datasets, that are related to certain topics. The content gets to the journal either by querying both conventional and non conventional sources on the Web or manually by the group of editors. Liquid journals combine depth and breadth in bringing a wider spectrum of scientific contributions from different communities, while also focusing editors' and readers' attention on the things they care about. The demo illustrates the features and benefits of the proposed platform. %M C.DL.10.397 %T Multiple sources with multiple portals: a demonstration of the ensemble computing portal in second life %S Demonstrations %A Carpenter, B. Stephen, II %A Furuta, Richard %A Shipman, Frank %A Huie, Allison %A Pogue, Daniel %A Fox, Edward A. %A Lee, Spencer %A Brusilovsky, Peter %A Cassel, Lillian %A Delcambre, Lois %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 397-398 %K computing portal, ensemble, second life, virtual worlds %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816199 %X This demonstration is an overview of our Ensemble pathway project with group members on-location at the conference and in the virtual world of Second Life from remote locations providing a live walk-through tour of our project online. This approach allows the demonstration to extend beyond the allocated conference session as a means to attract people to JCDL/ICADL. 
%M C.DL.10.399 %T Capturing and curating published data %S Demonstrations %A DiLauro, Tim %A Cyzyk, Mark %A Metsger, Elliot %A Patton, Mark %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 399-400 %K OAI-ORE, data curation, data publication, data services, scholarly communication %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816200 %X Verifiability and reproducibility are core tenets of the scholarly communication process. For many scientific publications, however, it is often the case that supporting datasets are not preserved, even when the article text is. And when they are, it is usually as a collection of files without relationships amongst one another or to the articles with which they are associated. There are some existing approaches that attempt to link datasets with articles after the fact (e.g., NED), but they are relatively few and involve substantial human intervention. The Digital Research and Curation Center in the Johns Hopkins University Sheridan Libraries, in conjunction with its partners has developed a proof-of-concept system that demonstrates an approach to capturing datasets during the process of submitting the associated article. As part of this process, linkages are established between the datasets and the article. 
%M C.DL.10.401 %T OntoFrame S3: academic research information portal service using semantic web technologies and linguistic knowledge %S Demonstrations %A Lee, Seungwoo %A Lee, Mikyoung %A Kim, Pyung %A Jung, Hanmin %A Sung, Won-Kyung %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 401-402 %K academic research information service, ontology, reasoning, semantic word network %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816201 %X In this paper, we show how Semantic Web technologies can be used for information connection and fusion in academic research information service and empowered by linguistic knowledge. %M C.DL.10.403 %T Entertainment history museums in virtual worlds: video game and music preservation in second life %S Demonstrations %A Lee, Spencer %A Willis, Bradley %A Bourne, Joseph S., Jr. %A Fox, Edward A. %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 403-404 %K 3D, digital preservation, entertainment, game, history, music, second life, virtual worlds %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816202 %X This research explores and demonstrates the use of Second Life (the popular 3D virtual world) for the purpose of digitally preserving various aspects of video game and music history. Physical game interfaces like joysticks, advertisements used for games, and famous game characters and cultural icons over the history are displayed and preserved in multiple video game exhibits for different eras. Selected game characters are digitally recreated in 3D format as Second Life avatar appearances. Historical changes of musical instruments, musicians, and genres are displayed and preserved likewise. Selected musical instruments are digitally recreated as 3D models playing their real sounds. Some of them will be available for the visitors to play in basic ways. 
%M C.DL.10.405 %T Integrating Greenstone with an interactive map visualizer %S Demonstrations %A McIntosh, Sam %A Bainbridge, David %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 405-406 %K digital library integration, interactive map visualizer %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816203 %X This extended abstract describes recent work in combining interactive map functionality with the Greenstone 3 digital library software research framework. %M C.DL.10.407 %T Subject metadata support powered by Maui %S Demonstrations %A Medelyan, Olena %A Perrone, Vye %A Witten, Ian H. %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 407-408 %K keyword extraction, metadata extraction, subject heading extraction, web interface %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816204 %M C.DL.10.409 %T Recommender system for MIR research community %S Demonstrations %A Yu, Yi %A Oria, Vincent %A Downie, J. Stephen %B JCDL'10: Proceedings of the 2010 Joint International Conference on Digital Libraries %D 2010-06-21 %P 409-410 %K ISMIR, music-IR, recommender systems, social networks %* (c) Copyright 2010 ACM %W http://doi.acm.org/10.1145/1816123.1816205 %X In this demonstration, we show a recommender system for the Music Information Retrieval (MIR) research community. We extract the key topics and tags by analyzing the ten-year cumulative ISMIR proceedings, and recommend papers and research colleagues to users in an interactive way.