TPDL 2013: Proceedings of the International Conference on Theory and Practice of Digital Libraries

Fullname: TPDL 2013: Research and Advanced Technology for Digital Libraries: International Conference on Theory and Practice of Digital Libraries
Editors: Trond Aalberg; Christos Papatheodorou; Milena Dobreva; Giannis Tsakonas; Charles J. Farrugia
Location: Valletta, Malta
Dates: 2013-Sep-22 to 2013-Sep-26
Publisher: Springer Berlin Heidelberg
Series: Lecture Notes in Computer Science 8092
Standard No: DOI: 10.1007/978-3-642-40501-3; hcibib: TPDL13; ISBN: 978-3-642-40500-6 (print), 978-3-642-40501-3 (online)
  1. Conceptual Models and Formal Issues
  2. Aggregation and Archiving
  3. User Behavior
  4. Digital Curation
  5. Mining and Extraction
  6. Architectures and Interoperability
  7. Interfaces to Digital Libraries
  8. Semantic Web
  9. Information Retrieval and Browsing
  10. Preservation
  11. Posters
  12. Demos
  13. Panels
  14. Tutorials

Conceptual Models and Formal Issues

Sustainability of Digital Libraries: A Conceptual Model 1-12
  Gobinda G. Chowdhury
This paper discusses the major factors related to the economic, social and environmental sustainability of digital libraries. Research in digital information systems and services in general, and digital libraries in particular, is reviewed to illustrate different issues of sustainability. Based on these discussions the paper, for the first time, proposes a conceptual model and a theoretical research framework for sustainable digital libraries. It shows that sustainable business models for digital libraries should also ensure equitable access, backed by specific design and usability guidelines that facilitate easier, better and cheaper access, support the personal, institutional and social culture of users, and at the same time conform with the policy and regulatory frameworks of the respective regions, countries and institutions.
Keywords: digital libraries; sustainability; social sustainability; economic sustainability; environmental sustainability
Quality Assessment in Crowdsourced Indigenous Language Transcription 13-22
  Ngoni Munyaradzi; Hussein Suleman
The digital Bleek and Lloyd Collection is a rare collection that contains artwork, notebooks and dictionaries of the indigenous people of Southern Africa. The notebooks, in particular, contain stories that encode the language, culture and beliefs of these people, handwritten in now-extinct languages with a specialised notation system. Previous attempts have been made to convert the approximately 20000 pages of text to a machine-readable form using machine learning algorithms but, due to the complexity of the text, the recognition accuracy was low. In this paper, a crowdsourcing method is proposed to transcribe the manuscripts, where non-expert volunteers transcribe pages of the notebooks using an online tool. Experiments were conducted to determine the quality and consistency of transcriptions. The results show that volunteers are able to produce reliable transcriptions of high quality. The inter-transcriber agreement is 80% for |Xam text and 95% for English text. When the |Xam text transcriptions produced by the volunteers are compared with a gold standard, the volunteers achieve an average accuracy of 64.75%, which exceeded that in previous work. Finally, the degree of transcription agreement correlates with the degree of transcription accuracy. This suggests that the quality of unseen data can be assessed based on the degree of agreement among transcribers.
Keywords: crowdsourcing; transcription; cultural heritage
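Illustrative note on the abstract above: a minimal sketch of how agreement among transcribers of the same page might be scored. It assumes a character-level similarity ratio from Python's standard difflib module; the measure actually used in the paper is not stated in this listing, and the function name is hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def mean_pairwise_agreement(transcriptions):
    """Average similarity ratio (0.0 to 1.0) over all pairs of transcriptions.

    Assumption: difflib's ratio stands in for the paper's agreement measure.
    """
    pairs = list(combinations(transcriptions, 2))
    if not pairs:
        raise ValueError("need at least two transcriptions")
    ratios = [SequenceMatcher(None, a, b).ratio() for a, b in pairs]
    return sum(ratios) / len(ratios)
```

A page whose transcriptions score low on such a measure could be flagged for review, in line with the abstract's observation that agreement correlates with accuracy.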
Defining Digital Library 23-28
  Armand Brahaj; Matthias Razum; Julia Hoxha
This paper reflects on the range of definitions of digital libraries, demonstrating their extent. We analyze a number of definitions through a simplified intensional definition method, examining the nature of the definitions by analyzing their respective genera and attributes. The goal of this paper is to provide a synthesis of the works related to definitions of digital library, offering a fine-grained comparison of these definitions. We conclude that, although there are a large number of definitions, they are defined in overlapping families and attributes, and an inclusive definition is possible.
Keywords: Digital Library; Definition; Evaluation of Digital Libraries
E-Books in Swedish Public Libraries: Policy Implications 29-34
  Elena Maceviciute; Tom D. Wilson
The aims of the paper are to review the situation of e-book delivery in Swedish public libraries (as it looked at the end of 2012), to identify the barriers that public libraries encounter in providing access to e-books, and to highlight the policy-related problems of e-book provision through public libraries. A survey of all public libraries in Sweden was carried out in October 2012. Of the 291 questionnaires issued, 185 were completed, a response rate of 63.3%. The provision of an e-book service has arisen as a result of either demand or an ideological belief that the ethos of democratic values and equality of access requires libraries to offer material in all media. Librarians find the situation of e-book provision through libraries unsatisfactory: the provider of titles removes them from the catalogue without warning or explanation, there are too few titles for children and students, and access to popular titles is delayed.
Keywords: e-books; public libraries; information policy; Sweden

Aggregation and Archiving

On the Change in Archivability of Websites Over Time 35-47
  Mat Kelly; Justin F. Brunelle; Michele C. Weigle; Michael L. Nelson
As web technologies evolve, web archivists work to keep up so that our digital history is preserved. Recent advances in web technologies have introduced client-side executed scripts that load data without a referential identifier or that require user interaction (e.g., content loading when the page has scrolled). These advances have made automating methods for capturing web pages more difficult. Because of the evolving schemes of publishing web pages along with the progressive capability of web preservation tools, the archivability of pages on the web has varied over time. In this paper we show that the archivability of a web page can be deduced from the type of page being archived, which aligns with that page's accessibility with respect to dynamic content. We show concrete examples of when these technologies were introduced by referencing mementos of pages that have persisted through a long evolution of available technologies. Identifying why these web pages could not be archived in the past, with respect to accessibility, serves as a guide for ensuring that content with longevity is published using good-practice methods that make it available for preservation.
Keywords: Web Archiving; Digital Preservation
Checking Out: Download and Digital Library Exchange for Complex Objects 48-59
  Scott Britell; Lois M. L. Delcambre; Lillian N. Cassel; Richard Furuta
Digital resources are becoming increasingly complex and are being used in diverse ways. For example, educational resources may be cataloged in digital libraries, used offline by educators and students, or used in a learning management system. In this paper we present the notion of "checking out" complex resources from a digital library for offline download or exchange with another digital library or learning management system. We present a mechanism that enables the customization, download and exchange of complex resources. We show how the mechanism also supports digital library and learning management system exchange formats in a generic fashion with minimal overhead. We also show how checkouts grow linearly with respect to the complexity of the resources.
Profiling Web Archive Coverage for Top-Level Domain and Content Language 60-71
  Ahmed Alsum; Michele C. Weigle; Michael L. Nelson; Herbert Van de Sompel
The Memento aggregator currently polls every known public web archive when serving a request for an archived web page, even though some web archives focus on only specific domains and ignore the others. Similar to query routing in distributed search, we investigate the impact on aggregated Memento TimeMaps (lists of when and where a web page was archived) of only sending queries to archives likely to hold the archived page. We profile twelve public web archives using data from a variety of sources (the web, archives' access logs, and full-text queries to archives) and discover that only sending queries to the top three web archives (i.e., a 75% reduction in the number of queries) for any request produces the full TimeMaps in 84% of the cases.
Keywords: Web archive; query routing; memento aggregator
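Illustrative note on the abstract above: a minimal sketch of routing a request to only the top-ranked archives, together with the arithmetic behind the 75% figure (three of twelve archives queried). The per-archive profile scores and both function names are hypothetical; the paper's actual profiles are built from top-level domain and content language data.

```python
def route_query(archive_scores, k=3):
    """Return the k archives with the highest (hypothetical) profile score."""
    ranked = sorted(archive_scores, key=archive_scores.get, reverse=True)
    return ranked[:k]


def query_reduction(total_archives, queried_archives):
    """Fraction of queries saved compared with polling every archive."""
    return 1 - queried_archives / total_archives
```

With twelve archives and three queried, `query_reduction(12, 3)` gives 0.75, matching the reduction quoted in the abstract.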

User Behavior

Selecting Fiction in Library Catalogs: A Gaze Tracking Study 72-83
  Janna Pöntinen; Pertti Vakkari
We study how readers explore metadata in book pages when selecting fiction in a traditional and an enriched online catalog for fiction. The associations between attention devoted to metadata elements and selecting an interesting book were analyzed. Eye movements of 30 users selecting fiction for four search tasks were recorded. The results indicate that although participants paid most attention in book pages to content description and keywords, these had no bearing on selecting an interesting book. Author and title information received less attention, but were significant predictors of selection.
Social Information Behaviour in Bookshops: Implications for Digital Libraries 84-95
  Sally Jo Cunningham; Nicholas Vanderschantz; Claire Timpany; Annika Hinze; George Buchanan
We discuss here our observations of the interaction of bookshop customers with the books and with each other. Contrary to our initial expectations, customers do not necessarily engage in focused, joint information search, as observed in libraries, but rather the bookshop is treated as a social space similar to a cafe. Our results extend the known repertoire of collaborative behaviours, supporting further development of models of user tasks and goals. We compare our findings with previous work and discuss possible implications of our observations for the design of digital libraries as places of both information access and social interaction.
Keywords: participant observation; social space; collaborative information behaviour; book-based social networking
Do User (Browse and Click) Sessions Relate to Their Questions in a Domain-Specific Collection? 96-107
  Jeremy Steinhauer; Lois M. L. Delcambre; Marianne Lykke; Marit Kristine Ådland
We seek to improve information retrieval in a domain-specific collection by clustering user sessions as recorded in a click log and then classifying later user sessions in real-time. As a preliminary step, we explore the main assumption of this approach: whether user sessions in such a site relate to the question that they are answering. The contribution of this paper is the evaluation of the suitability of common machine learning measurements (measuring the distance between two sessions) to distinguish sessions of users searching for the answer to the same or different questions. We found that sessions for people answering the same question are significantly different than those answering different questions, but results are dependent on the distance measure used. We explain why some distance metrics performed better than others.
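Illustrative note on the abstract above: a minimal sketch of one possible distance between two click sessions. The abstract does not name the distance measures evaluated; Jaccard distance over the sets of pages visited is an assumption chosen for simplicity, and the function name is hypothetical.

```python
def jaccard_distance(session_a, session_b):
    """Jaccard distance between two sessions, each a list of visited page IDs.

    0.0 means the sessions visited exactly the same pages; 1.0 means no overlap.
    """
    a, b = set(session_a), set(session_b)
    union = a | b
    if not union:
        return 0.0  # two empty sessions are treated as identical
    return 1 - len(a & b) / len(union)
```

A set-based measure like this ignores click order and repeat visits, which is one way a choice of distance measure can change the results.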

Digital Curation

Digital Libraries for Experimental Data: Capturing Process through Sheer Curation 108-119
  Mark Hedges; Tobias Blanke
This paper presents an approach to the 'sheer curation' of experimental data and processes of a group of researchers in the life sciences, which involves embedding data capture and interpretation within researchers' working practices, so that it is automatic and invisible to the researcher. The environment described does not capture just individual datasets, but the entire workflow that represents the 'story' of the experiment, including intermediate files and provenance metadata, so as to support the verification and reproduction of published results. As the curation environment is decoupled from the researchers' processing environment, a provenance graph is inferred from a variety of domain-specific contextual information as the data is generated, using software that implements the knowledge and expertise of the researchers.
Metadata Management and Interoperability Support for Natural History Museums 120-131
  Konstantinos Makris; Giannis Skevakis; Varvara Kalokyri; Polyxeni Arapi; Stavros Christodoulakis
Natural History Museums (NHMs) are a rich source of knowledge about Earth's biodiversity and natural history. However, an impressive abundance of high quality scientific content available in NHMs around Europe remains largely unexploited due to a number of barriers, such as: the lack of interconnection and interoperability between the management systems used by museums, the lack of centralized access through a European point of reference like Europeana, and the inadequacy of the current metadata and content organization. The Natural Europe project offers a coordinated solution at European level that aims to overcome those barriers. This paper presents the architecture, deployment and evaluation of the Natural Europe infrastructure allowing the curators to publish, semantically describe and manage the museums' Cultural Heritage Objects, as well as disseminate them to Europeana.eu and biodiversity networks like BioCASE and GBIF.
Keywords: digital curation; preservation metadata; Europeana; BioCASE
A Curation-Oriented Thematic Aggregator 132-137
  Dimitris Gavrilis; Costis Dallas; Stavros Angelis
The emergence of the European Digital Library (Europeana) presents the need for aggregating content using a more intelligent and effective approach, taking into account the need to support potential changes in target metadata schemas and new services. This paper presents the concept, architecture and services provided by a curation-oriented, OAIS-compliant thematic metadata aggregator, developed and used in the CARARE project, that addresses these challenges.
Keywords: Digital curation; metadata aggregator; Europeana; CARARE; workflows; metadata enrichment
Can Social Reference Management Systems Predict a Ranking of Scholarly Venues? 138-143
  Hamed Alhoori; Richard Furuta
New scholarly venues (e.g., conferences and journals) are emerging as research fields expand. Ranking these new venues is imperative to assist researchers, librarians, and research institutions. However, rankings based on traditional citation-based metrics have limitations and are no longer the only or the best choice to determine the impact of scholarly venues. Here, we propose a venue-ranking approach based on scholarly references from academic social media sites, and we compare a number of citation-based rankings with social-based rankings. Our preliminary results show a statistically significant correlation between the two approaches in a number of general rankings, research areas, and subdisciplines. Furthermore, we found that social-based rankings favor open-access venues over venues that require a subscription.
Keywords: Scholarly Venues; Ranking; Digital Libraries; Bibliometrics; Altmetrics; Impact Factor; Readership; Social Reference Management; Citation Analysis; Google Scholar Metrics
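Illustrative note on the abstract above: a minimal sketch of comparing a citation-based ranking with a social-based ranking of the same venues. The abstract does not say which correlation coefficient was used; Spearman's rank correlation (without ties) is an assumption, and the function name is hypothetical.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation between two rankings of the same venues.

    rank_a and rank_b map each venue to its rank position (1 = best).
    Assumes no tied ranks.
    """
    if set(rank_a) != set(rank_b):
        raise ValueError("both rankings must cover the same venues")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two venues")
    d_squared = sum((rank_a[v] - rank_b[v]) ** 2 for v in rank_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

A value near 1 indicates that the two rankings order the venues similarly; a value near -1 indicates opposite orderings.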

Mining and Extraction

An Unsupervised Machine Learning Approach to Body Text and Table of Contents Extraction from Digital Scientific Articles 144-155
  Stefan Klampfl; Roman Kern
Scientific articles are predominantly stored in digital document formats, which are optimised for presentation, but lack structural information. This poses challenges to accessing the documents' content, for example for information retrieval. We have developed a processing pipeline that makes use of unsupervised machine learning techniques and heuristics to detect the logical structure of a PDF document. Our system uses only information available from the current document and does not require any pre-trained model. Starting from a set of contiguous text blocks extracted from the PDF file, we first determine geometrical relations between these blocks. These relations, together with geometrical and font information, are then used to categorize the blocks into different classes. Based on this logical structure we finally extract the body text and the table of contents of a scientific article. We evaluate our pipeline on a number of datasets and compare it with state-of-the-art document structure analysis approaches.
Entity Network Extraction Based on Association Finding and Relation Extraction 156-167
  Ridho Reinanda; Marta Utama; Fridus Steijlen; Maarten de Rijke
One of the core aims of semantic search is to directly present users with information instead of lists of documents. Various entity-oriented tasks have been or are being considered, including entity search and related entity finding. In the context of digital libraries for computational humanities, we consider another task, network extraction: given an input entity and a document collection, extract related entities from the collection and present them as a network. We develop a combined approach for entity network extraction that consists of a co-occurrence-based approach to association finding and a machine learning-based approach to relation extraction. We evaluate our approach by comparing the results on a ground truth obtained using a pooling method.
Word Occurrence Based Extraction of Work Contributors from Statements of Responsibility 168-179
  Nuno Freire
This paper addresses the identification of all contributors of an intellectual work, when they are recorded in bibliographic data but in unstructured form. National bibliographies are very reliable in representing the first author of a work, but frequently, secondary contributors are represented in the statements of responsibility that are transcribed by the cataloguer from the book into the bibliographic records. The identification of work contributors mentioned in statements of responsibility is a typical motivation for the application of information extraction techniques. This paper presents an approach developed for the specific application scenario of the ARROW rights infrastructure being deployed in several European countries to assist in the determination of the copyright status of works that may not be in the public domain. Our approach performed reliably in most languages and bibliographic datasets of at least one million records, achieving precision and recall above 0.97 on five of the six evaluated datasets. We conclude that the approach can be reliably applied to other national bibliographies and languages.
Keywords: named entity recognition; information extraction; national bibliographies; library catalogues; copyright
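Illustrative note on the abstract above: a minimal sketch of the standard precision and recall computation for an extraction task, comparing extracted contributor names with a gold-standard set. The function name is hypothetical, and exact string matching of names is an assumption; the paper's matching criteria are not stated in this listing.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for a set of extracted names against a gold set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

Precision is the share of extracted names that are correct; recall is the share of gold-standard names that were found.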

Architectures and Interoperability

Evaluating the Deployment of a Collection of Images in the CULTURA Environment 180-191
  Maristella Agosti; Marta Manfioletti; Nicola Orio; Chiara Ponchia
The paper reports on the effort of reconsidering the characteristics of the IPSA online collection of illuminated images created for specialised users, involving the redesigning of the interaction functions to make the online collection of interest for new and diverse user categories. The effort is part of the design and development of a new adaptive and dynamic environment that aims at increasing user engagement with cultural heritage collections and which is taking place in the context of the European CULTURA project.
Keywords: Cultural heritage systems; IPSA collection of illuminated images; CULTURA environment; archives; illuminated manuscripts; user engagement with cultural heritage collections
Formal Models for Digital Archives: NESTOR and the 5S 192-203
  Nicola Ferro; Gianmaria Silvello
Archives are a valuable part of our cultural heritage but, despite their importance, the models and technologies that have been developed over the past two decades in the DL field have not been specifically tailored to them. This is especially true when it comes to formal and foundational frameworks, such as the 5S model. Therefore, we propose an innovative formal model for archives, called NESTOR, explicitly built around the concepts of context and hierarchy, which play a central role in the archival realm. We then use NESTOR to extend the 5S model, offering the possibility of opening up the full wealth of DL methods to archives. We illustrate this by presenting two concrete applications.
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool 204-215
  Justin F. Brunelle; Michael L. Nelson; Lyudmila Balakireva; Robert Sanderson; Herbert Van de Sompel
Conventional Web archives are created by periodically crawling a Web site and archiving the responses from the Web server. Although easy to implement and commonly deployed, this form of archiving typically misses updates and may not be suitable for all preservation scenarios, for example a site that is required (perhaps for records compliance) to keep a copy of all pages it has served. In contrast, transactional archives work in conjunction with a Web server to record all content that has been served. Los Alamos National Laboratory has developed SiteStory, an open-source transactional archive written in Java that runs on Apache Web servers, provides a Memento compatible access interface, and WARC file export features. We used Apache's ApacheBench utility on a pre-release version of SiteStory to measure response time and content delivery time in different environments. The performance tests were designed to determine the feasibility of SiteStory as a production-level solution for high fidelity automatic Web archiving. We found that SiteStory does not significantly affect content server performance when it is performing transactional archiving. Content server performance slows from 0.076 seconds to 0.086 seconds per Web page access when the content server is under load, and from 0.15 seconds to 0.21 seconds when the resource has many embedded and changing resources.
Keywords: Web Archiving; Digital Preservation
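Illustrative note on the abstract above: the response times it quotes can be expressed as relative slowdowns, roughly 13% under load (0.076 s to 0.086 s) and 40% for resources with many embedded and changing resources (0.15 s to 0.21 s). The helper below is a hypothetical sketch of that arithmetic and is not taken from the paper.

```python
def relative_slowdown(before_seconds, after_seconds):
    """Fractional increase in response time, e.g. 0.4 means 40% slower."""
    return (after_seconds - before_seconds) / before_seconds
```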

Interfaces to Digital Libraries

Exploring Large Digital Library Collections Using a Map-Based Visualisation 216-227
  Mark Hall; Paul Clough
In this paper we describe a novel approach for exploring large document collections using a map-based visualisation. We use hierarchically structured semantic concepts that are attached to the documents to create a visualisation of the semantic space that resembles a Google Map. The approach is novel in that we exploit the hierarchical structure to enable the approach to scale to large document collections and to create a map where the higher levels of spatial abstraction have semantic meaning. An informal evaluation is carried out to gather subjective feedback from users. Overall results are positive with users finding the visualisation enticing and easy to use.
AugDesk. Fusing Reality with the Virtual in Document Triage. Part1: Gesture Interactions 228-234
  Fernando Loizides; Doros Polydorou; Keti Mavri; George Buchanan; Panayiotis Zaphiris
In this paper we present the first version of AugDesk, an affordable augmented reality prototype desk for sorting documents based on their relevance to an information need. The set-up is based on the findings from previous work in conjunction with a user-centred iterative design process to improve both the software and hardware configuration. In this initial version of the prototype the documents automatically appear on a table from an overhead projector and the user can control the movement and selection of these documents by using gestures, identified from a Microsoft Kinect Sensor. The first part of our work included recording users' actions to identify the most popular interactions with virtual documents on a table and integrating these into AugDesk.
The Role of Search Interface Features during Information Seeking 235-240
  Abdigani Diriye; Ann Blandford; Anastasios Tombros; Pertti Vakkari
In this paper, we examine the role search interface features play in information seeking across different categories and complexities of search tasks. We present a system called Search Buddy that provides features to enable exploration, filtering and browsing of information. Differing categories and complexities of search tasks were studied through qualitative and quantitative methods. We find specific user patterns in the frequency, points and context of search interface usage. This study highlights the potential value of contextualizing interface features to the type of task and stage of information seeking.
Users Requirements in Audiovisual Search: A Quantitative Approach 241-246
  Danish Nadeem; Roeland Ordelman; Robin Aly; Erwin Verbruggen
This paper reports on the results of a quantitative analysis of user requirements for audiovisual search that allows requirements to be categorised and compared across user groups. The categorisation provides clear directions with respect to the prioritisation of system features, both for the development of systems for specific, single user groups and for systems that have a more general target user group.

Semantic Web

Hierarchical Structuring of Cultural Heritage Objects within Large Aggregations 247-259
  Shenghui Wang; Antoine Isaac; Valentine Charles; Rob Koopman; Anthi Agoropoulou; Titia van der Werf
Huge amounts of cultural content have been digitised and are available through digital libraries and aggregators like Europeana.eu. However, it is not easy for a user to have an overall picture of what is available nor to find related objects. We propose a method for hierarchically structuring cultural objects at different similarity levels. We describe a fast, scalable clustering algorithm with an automated field selection method for finding semantic clusters. We report a qualitative evaluation on the cluster categories based on records from the UK and a quantitative one on the results from the complete Europeana dataset.
Methodology for Dynamic Extraction of Highly Relevant Information Describing Particular Object from Semantic Web Knowledge Base 260-271
  Krzysztof Sielski; Justyna Walkowska; Marcin Werla
Exploration and information discovery in a big knowledge base that uses a complex ontology is often difficult, because relevant information may be spread over a number of related objects amongst many other, loosely connected ones. This paper introduces three types of relations between classes in an ontology and defines the term RDF Unit to group relevant and closely connected information. The type of relation is chosen based on association strength in the context of a particular ontology. This approach was designed and implemented to manipulate and browse data in a cultural heritage Knowledge Base with over 500M triples, created by PSNC during the SYNAT research project.
Keywords: Semantic Web; ontology; OWL; RDF; CIDOC CRM; FRBRoo; RDF Unit; RDF Molecule; knowledge base
Personalizing Keyword Search on RDF Data 272-278
  Giorgos Giannopoulos; Evmorfia Biliri; Timos Sellis
Despite the vast amount of work on personalizing keyword search on unstructured data (i.e. web pages), not much work has been done on handling RDF data. In this paper we present our first-cut approach to personalizing keyword query results on RDF data. We adopt the well-known Ranking SVM approach, training ranking functions with RDF-specific training features. The training utilizes historical user feedback, in the form of ratings on the searched items. In order to do so, we join Netflix and DBpedia datasets, obtaining a dataset where we can simulate personalized search scenarios for a number of discrete users. Our evaluation shows that our approach outperforms the baseline and, in some cases, scores very close to the ground truth.
Providing Meaningful Information in a Large Scale Digital Library -- A Case Study 279-284
  Laura Rueda; Sünje Dallmeier-Tiessen; Patricia Herterich; Samuele Carli; Salvatore Mele; Simeon Warner
Emerging open science practices require persistent identification and citability of a diverse set of scholarly materials, from paper-based materials to research data. This paper presents a case study of the INSPIRE digital library and its approach to connecting persistent identifiers for scientific material and author identification. The workflows developed under the ODIN project, connecting DataCite DOIs and ORCIDs, can serve as a best practice example for integrating external information into such digital libraries.
Keywords: persistent identifier; digital library; interoperability model; open science

Information Retrieval and Browsing

Context-Sensitive Ranking Using Cross-Domain Knowledge for Chemical Digital Libraries 285-296
  Benjamin Köhncke; Wolf-Tilo Balke
Today, entity-centric searches are common tasks for information gathering. But, due to the huge amount of available information, the entity itself is often not sufficient for finding suitable results. Users are usually searching for entities in a specific search context which is important for their relevance assessment. Therefore, for digital library providers it is essential to also consider this search context to allow for high quality retrieval. In this paper we present an approach enabling context searches for chemical entities. Chemical entities play a major role in many specific domains, ranging from biomedicine and biology to materials science. Since most of the domain-specific documents lack suitable context annotations, we present a similarity measure using cross-domain knowledge gathered from Wikipedia. We show that structure-based similarity measures are not suitable for chemical context searches and introduce a similarity measure combining entity and context similarity. Our experiments show that our measure outperforms structure-based similarity measures for chemical entities. We compare against two baseline approaches: a Boolean retrieval model and a model using statistical query expansion for the context term. We compared the measures by computing mean average precision (MAP) using a set of queries and manual relevance assessments from domain experts. We were able to get a total increase in MAP of 30 percentage points (from 31% to 61%). Furthermore, we show a personalized retrieval system which leads to another increase of around 10%.
Keywords: Chemical Digital Libraries; Personalization; Context Search
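Illustrative note on the abstract above: a minimal sketch of the standard mean average precision (MAP) computation over a set of queries with binary relevance judgements. The function names and data layout are hypothetical; the paper's own evaluation setup is not reproduced here.

```python
def average_precision(ranked_docs, relevant_docs):
    """Average precision of one ranked result list against a set of relevant docs."""
    relevant = set(relevant_docs)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_docs, relevant_docs) pairs, one per query."""
    if not runs:
        raise ValueError("need at least one query")
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)
```

Because MAP averages over queries, a rise from 31% to 61% is a difference of 30 percentage points in the mean, not a 30% relative change.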
Topic Cropping: Leveraging Latent Topics for the Analysis of Small Corpora 297-308
  Nam Khanh Tran; Sergej Zerr; Kerstin Bischoff; Claudia Niederée; Ralf Krestel
Topic modeling has gained a lot of popularity as a means for identifying and describing the topical structure of textual documents and whole corpora. There are, however, many document collections such as qualitative studies in the digital humanities that cannot easily benefit from this technology. The limited size of those corpora leads to poor quality topic models. Higher quality topic models can be learned by incorporating additional domain-specific documents with similar topical content. This, however, requires finding or even manually composing such corpora, requiring considerable effort. For solving this problem, we propose a fully automated adaptable process of topic cropping. For learning topics, this process automatically tailors a domain-specific Cropping corpus from a general corpus such as Wikipedia. The learned topic model is then mapped to the working corpus via topic inference. Evaluation with a real world data set shows that the learned topics are of higher quality than those learned from the working corpus alone. In detail, we analyzed the learned topics with respect to coherence, diversity, and relevance.
Keywords: digital humanities; qualitative data; topic modeling
A Domain Meta-wrapper Using Seeds for Intelligent Author List Extraction in the Domain of Scholarly Articles BIBAFull-Text 309-314
  Francesco Cauteruccio; Giovambattista Ianni
In this paper we investigate the automated extraction of author lists in the domain of scientific digital libraries. Given a list of known "seed" authors, we aim to extract complete lists of co-authors from Web pages in arbitrary format. We adopt a methodology that embeds domain knowledge in a unique "meta-wrapper", which requires no training, has negligible maintenance costs and is based on the combination of several extraction techniques. Such methods are applied at the structural level, at the character level and at the annotation level. We describe the methodology, illustrate our tool, compare with known approaches and measure the accuracy of our techniques with proper experiments.
Securing Access to Complex Digital Artifacts -- Towards a Controlled Processing Environment for Digital Research Data BIBAFull-Text 315-320
  Johann Latocha; Klaus Rechert; Isao Echizen
Providing secured and restricted access to digital objects, especially access to digital research data, for a general audience poses new challenges to memory institutions. For instance, to protect individuals, only anonymized or pseudonymized data should be released to a general audience. Standard procedures have been established over time to cope with privacy issues of non-interactive digital objects like text, audio and video. Occurrences of identifiers, and potentially also quasi-identifiers, were removed by a simple overlay, e.g. in text documents such occurrences were simply blacked out. Today's digital artifacts, especially research data, have complex, non-linear and even interactive manifestations. Thus, a different approach to securing access to complex digital artifacts is required. This paper presents an architecture and technical methods to control access to digital research data.

Preservation

Restoring Semantically Incomplete Document Collections Using Lexical Signatures BIBAKFull-Text 321-332
  Luis Meneses; Himanshu Barthwal; Sanjeev Singh; Richard Furuta; Frank Shipman
Unexpected changes create a problem when managing missing resources in a digital collection. In decentralized and distributed collections such as Walden's Paths, a missing point or an incomplete resource is of grave importance as it can potentially interrupt the continuity in the narration and render the collection semantically incomplete. We can foresee two possible scenarios occurring when resources cannot be found. First, we have access to a copy of the missing document or to its lexical signatures, which allows us to find the missing resource. The second case is more interesting to us. What happens if we don't have any valid metadata associated with the missing resource? To solve this problem, we used the lexical signatures of valid documents within a collection to find suitable replacements for absent resources. We found that traditional similarity metrics do not adequately convey the relationships between the elements in the collections. Our analyses also showed that our procedures were able to restore the semantic integrity of incomplete document collections.
Keywords: Semantic replacements; Web resource management; distributed collections
Resurrecting My Revolution BIBAKFull-Text 333-345
  Hany M. Salaheldeen; Michael L. Nelson
In previous work we reported that resources linked in tweets disappeared at the rate of 11% in the first year followed by 7.3% each year afterwards. We also found that 6.7% of the resources were archived in public web archives in the first year, and 14.6% in each subsequent year. In this paper we revisit the same dataset of tweets and find that our prior model still holds: the calculated error for estimating the percentage missing was about 4%, but the rate of archiving produced a higher error of about 11.5%. We also discovered that resources have disappeared from the archives themselves (7.89%) as well as reappeared on the live web after being declared missing (6.54%). We have also tested the availability of the tweets themselves and found that 10.34% have disappeared from the live web. To mitigate the loss of resources on the live web, we propose the use of a "tweet signature". Using the Topsy API, we extract the top five most frequent terms from the union of all tweets about a resource, and use these five terms as a query to Google. We found that using tweet signatures results in discovering replacement resources with 70+% textual similarity to the missing resource 41% of the time.
Keywords: Web Archiving; Social Media; Digital Preservation; Reconstruction
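The "tweet signature" described in the abstract above (the five most frequent terms across all tweets about a resource, used as a search query) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the tweets are passed in as plain strings rather than fetched from the Topsy API, and the tokenization, URL stripping and small stopword list are assumptions that the abstract does not specify.

```python
import re
from collections import Counter
from typing import Iterable, List

# Assumption: a tiny illustrative stopword list; the abstract names none.
STOPWORDS = frozenset(
    {"a", "an", "and", "at", "for", "in", "is", "of", "on", "rt", "the", "to"}
)


def tweet_signature(tweets: Iterable[str], k: int = 5) -> List[str]:
    """Return the k most frequent terms over the union of all tweets."""
    counts: Counter = Counter()
    for tweet in tweets:
        # Assumption: shortened URLs carry no topical terms, so drop them.
        text = re.sub(r"https?://\S+", " ", tweet.lower())
        for token in re.findall(r"[a-z0-9']+", text):
            if token not in STOPWORDS:
                counts[token] += 1
    return [term for term, _ in counts.most_common(k)]


# Hypothetical tweets that all link to the same (now missing) resource.
tweets = [
    "Egypt protests fill Tahrir square http://t.co/abc",
    "Tahrir square protests continue in Cairo",
    "Live video: protests in Tahrir, Cairo",
]
signature = tweet_signature(tweets)
query = " ".join(signature)  # string that would be sent to a search engine
```

The resulting `query` string stands in for the five-term query mentioned in the abstract; how the search results are then compared with the missing resource is outside the scope of this sketch.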
Who and What Links to the Internet Archive BIBAKFull-Text 346-357
  Yasmin Alnoamany; Ahmed Alsum; Michele C. Weigle; Michael L. Nelson
The Internet Archive's (IA) Wayback Machine is the largest and oldest public web archive and has become a significant repository of our recent history and cultural heritage. Despite its importance, there has been little research about how it is discovered and used. Based on web access logs, we analyze what users are looking for, why they come to IA, where they come from, and how pages link to IA. We find that users request English pages the most, followed by the European languages. Most human users come to web archives because they do not find the requested pages on the live web. About 65% of the requested archived pages no longer exist on the live web. We find that more than 82% of human sessions connect to the Wayback Machine via referrals from other web sites, while only 15% of robots have referrers. Most of the links (86%) from websites are to individual archived pages at specific points in time, and of those 83% no longer exist on the live web.
Keywords: Web Archiving; Web Server Logs; Web Usage Mining; Language Detection

Posters

A Study of Digital Curator Competencies -- A Delphi Study BIBAKFull-Text 358-361
  Anna Maria Tammaro; Melody Madrid
The aim of this research was to define competencies for digital curators, and to validate them through a Delphi process in the context of Library, Archives, Museum curriculum development. The objective for the study was to obtain consensus regarding competence statements for Library, Archives and Museum digital curators.
Keywords: Digital Curation; Digital Curator Competencies; Delphi Method
Large Scale Citation Matching Using Apache Hadoop BIBAKFull-Text 362-365
  Mateusz Fedoryszak; Dominika Tkaczyk; Lukasz Bolikowski
During citation matching, links from bibliography entries to referenced publications are created. Such links are indicators of topical similarity between linked texts, are used in assessing the impact of the referenced document and improve navigation in the user interfaces of digital libraries. In this paper we present a citation matching method and show how to scale it up to handle large amounts of data using appropriate indexing and the MapReduce paradigm in the Hadoop environment.
Keywords: citation matching; approximate indexing; MapReduce; Hadoop; CRF; SVM
Building an Online Environment for Usefulness Evaluation BIBAKFull-Text 366-369
  Jasmin Hügi; René Schneider
In this paper we present a methodological framework for usefulness evaluation of digital libraries and information services that has been tested successfully in two case studies before developing a corresponding tool that may be used for further investigations. The tool is based on a combination of a knowledge base with exploitable and modifiable questions and an open source tool for online-questionnaires.
Keywords: Digital Libraries; Usefulness Evaluation; Quality metrics
Topic Modeling for Search and Exploration in Multivariate Research Data Repositories BIBAKFull-Text 370-373
  Maximilian Scherer; Tatiana von Landesberger; Tobias Schreck
Huge amounts of multivariate research data are produced and made publicly available in digital libraries. Little research has focused on similarity functions that take multivariate data documents as a whole into account. Such similarity functions are highly beneficial for users, enabling them to browse and query large collections of multivariate data using nearest-neighbor indexing. In this paper we tackle this challenge and propose a novel similarity function for multivariate data documents based on topic modeling. Based on a previously developed bag-of-words approach for multivariate data, we can then learn a topic model for a collection of multivariate data documents and represent each document as a mixture of topics. This representation is well suited to efficient nearest-neighbor indexing and clustering according to the topic distribution of a document. We present a use case where we apply this approach to retrieval of multivariate data in the field of climate research.
Keywords: multivariate data; content-based retrieval; bag-of-words; lda
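To illustrate how a topic-mixture representation supports nearest-neighbor search, the sketch below ranks documents by the distance between their topic distributions. It assumes the distributions have already been inferred by a topic model; the Hellinger distance, the brute-force scan and the document names are assumptions chosen for illustration, since the abstract names neither a distance function nor an index structure.

```python
import math
from typing import Dict, List, Sequence


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete probability distributions."""
    return math.sqrt(
        0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    )


def nearest_documents(
    query: Sequence[float], collection: Dict[str, Sequence[float]], k: int = 3
) -> List[str]:
    """Brute-force k-nearest-neighbor search over topic mixtures."""
    ranked = sorted(collection, key=lambda name: hellinger(query, collection[name]))
    return ranked[:k]


# Hypothetical topic mixtures over three topics for three data documents.
topic_mixtures = {
    "station_a": [0.8, 0.1, 0.1],
    "station_b": [0.1, 0.8, 0.1],
    "station_c": [0.7, 0.2, 0.1],
}
neighbors = nearest_documents([0.8, 0.1, 0.1], topic_mixtures, k=2)
```

A linear scan like this only suits small collections; a real system would replace it with a proper nearest-neighbor index.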
Time-Based Exploratory Search in Scientific Literature BIBAKFull-Text 374-377
  Silviu Homoceanu; Sascha Tönnies; Philipp Wille; Wolf-Tilo Balke
State-of-the-art faceted search graphical user interfaces for digital libraries provide a wide range of filters perfectly suitable for narrowing down results for well-defined user needs. However, they fail to deliver summarized overview information for users who need to familiarize themselves with a new scientific topic. In fact, exploratory search remains one of the major problems for scientific literature search in digital libraries. Exploiting a user study about how computer scientists actually approach new subject areas, we developed ESSENCE, a system for empowering exploratory search in scientific literature.
Keywords: Digital Libraries; User Interface; Exploratory Search; Timeline
Crowds and Content: Crowd-Sourcing Primitives for Digital Libraries BIBAKFull-Text 378-381
  Stuart Dunn; Mark Hedges
This poster reports on a nine-month scoping survey of research in the arts and humanities involving crowd-sourcing. This study proposed a twelve-facet typology of research processes currently in use, and these are reported here, along with the context of current research practice, the types of research assets which are currently being exposed to crowd-sourcing, and the sorts of outputs (including digital libraries and collections) which such projects are producing.
Keywords: crowd-sourcing; typology; humanities
Regional Effects on Query Reformulation Patterns BIBAFull-Text 382-385
  Steph Jesper; Paul Clough; Mark Hall
This paper describes an in-depth study of the effects of geographic region on search patterns, particularly query reformulations, in a large query log from the UK National Archives (TNA). A total of 1,700 sessions involving 9,447 queries from 17 countries were manually analyzed for their semantic composition, and pairs of queries for their reformulation type. Results show country-level variations in the types of queries commonly issued and in typical patterns of query reformulation. Understanding the effects of regional differences will assist with the future design of search algorithms at TNA as they seek to improve their international reach.
Persistence in Recommender Systems: Giving the Same Recommendations to the Same Users Multiple Times BIBAKFull-Text 386-390
  Joeran Beel; Stefan Langer; Marcel Genzmehr; Andreas Nürnberger
How do click-through rates vary between research paper recommendations previously shown to the same users and recommendations shown for the very first time? To answer this question we analyzed 31,942 research paper recommendations given to 1,155 students and researchers with the literature management software Docear. Results indicate that recommendations should only be given once. Click-through rates for 'fresh', i.e. previously unknown, recommendations are twice as high as for already known recommendations. Results also show that some users are 'oblivious'. It frequently happened that users clicked on recommendations they already knew. In one case the same recommendation was shown six times to the same user and the user clicked on it each time again. Overall, around 50% of clicks on reshown recommendations were such 'oblivious-clicks'.
Keywords: recommender systems; persistence; re-rating; research paper
Sponsored vs. Organic (Research Paper) Recommendations and the Impact of Labeling BIBAKFull-Text 391-395
  Joeran Beel; Stefan Langer; Marcel Genzmehr
In this paper we show that organic recommendations are preferred over commercial recommendations even when they point to the same freely downloadable research papers. Simply the fact that users perceive recommendations as commercial decreased their willingness to accept them. It is further shown that the exact labeling of recommendations matters. For instance, recommendations labeled as 'advertisement' performed worse than those labeled as 'sponsored'. Similarly, recommendations labeled as 'Free Research Papers' performed better than those labeled as 'Research Papers'. However, whatever the differences between the labels were -- the best performing recommendations were those with no label at all.
Keywords: recommender systems; organic search; sponsored search; labeling
The Impact of Demographics (Age and Gender) and Other User-Characteristics on Evaluating Recommender Systems BIBAKFull-Text 396-400
  Joeran Beel; Stefan Langer; Andreas Nürnberger; Marcel Genzmehr
In this paper we show the importance of considering demographics and other user characteristics when evaluating (research paper) recommender systems. We analyzed 37,572 recommendations delivered to 1,028 users and found that older users clicked more often on recommendations than younger ones. For instance, users aged 20-24 achieved click-through rates (CTR) of 2.73% on average, while CTR for users between 50 and 54 years was 9.26%. Gender only had a marginal impact (CTR males 6.88%; females 6.67%) but other user characteristics such as whether a user was registered (CTR: 6.95%) or not (4.97%) had a strong impact. Based on these results we argue that future research articles on recommender systems should report detailed data on their users to make results more comparable.
Keywords: recommender systems; demographics; evaluation; research paper
PoliMedia BIBAKFull-Text 401-404
  Max Kemman; Martijn Kleppe
Analysing media coverage across several types of media-outlets is a challenging task for academic researchers. The PoliMedia project aimed to showcase the potential of cross-media analysis by linking the digitised transcriptions of the debates at the Dutch Parliament (Dutch Hansard) with three media-outlets: 1) newspapers in their original layout of the historical newspaper archive at the National Library, 2) radio bulletins of the Dutch National Press Agency (ANP) and 3) newscasts and current affairs programs from the Netherlands Institute for Sound and Vision. In this paper we describe generally how these links were created and we introduce the PoliMedia search user interface developed for scholars to navigate the links. In our evaluation we found that the linking algorithm had a recall of 67% and precision of 75%. Moreover, in an eye tracking evaluation we found that the interface enabled scholars to perform known-item and exploratory searches for qualitative analysis.
Keywords: political communication; parliamentary debates; newspapers; radio bulletins; television; cross-media analysis; semantic web; information retrieval
Eye Tracking the Use of a Collapsible Facets Panel in a Search Interface BIBAKFull-Text 405-408
  Max Kemman; Martijn Kleppe; Jim Maarseveen
Facets can provide an interesting functionality in digital libraries. However, while some research shows facets are important, other research found facets are only moderately used. Therefore, in this exploratory study we compare two search interfaces: one where the facets panel is always visible and one where the facets panel is hidden by default. Our main research question is "Is folding the facets panel in a digital library search interface beneficial to academic users?" By performing an eye tracking study with N=24, we measured search efficiency, distribution of attention and user satisfaction. We found no significant differences in either the eye tracking data or the usability feedback and conclude that collapsing facets is neither beneficial nor detrimental.
Keywords: eye tracking; facets; information retrieval; usability; user studies; digital library; user behaviour; search user interface
Efficient Access to Emulation-as-a-Service -- Challenges and Requirements BIBAFull-Text 409-412
  Dirk von Suchodoletz; Klaus Rechert
The shift of the usually non-trivial task of emulation of obsolete software environments from the end user to specialized providers through Emulation-as-a-Service (EaaS) helps to simplify digital preservation and access strategies. End users interact with emulators remotely through standardized (web-)clients on their various devices. Besides offering relevant advantages, EaaS makes emulation a networked service introducing new challenges like remote rendering, stream synchronization and real time requirements. Various objectives, like fidelity, performance or authenticity can be required depending on the actual purpose and user expectations. Various original environments and complex artefacts have different needs regarding expedient and/or authentic performance.
RDivF: Diversifying Keyword Search on RDF Graphs BIBAKFull-Text 413-416
  Nikos Bikakis; Giorgos Giannopoulos; John Liagouris; Dimitrios Skoutas; Theodore Dalamagas; Timos Sellis
In this paper, we outline our ongoing work on diversifying keyword search results on RDF data. Given a keyword query over an RDF graph, we define the problem of diversifying the search results and we present diversification criteria that take into consideration both the content and the structure of the results, as well as the underlying RDF/S-OWL schema.
Keywords: Linked Data; Semantic Web; Web of Data; Structured Data
Evolution of eBooks on Demand Web-Based Service: A Perspective through Surveys BIBAKFull-Text 417-420
  Õnne Mets; Silvia Gstrein; Veronika Gründhammer
In 2007 the document delivery service eBooks on Demand (EOD) was launched by 13 libraries from 8 European countries. It enables users to request digitisation of public domain books. By 2013 the self-sustained network had grown to 35 libraries in 12 countries and generated thousands of PDF e-books. Several surveys have been carried out to design the service to be relevant and attractive for end-users and libraries. The current paper explores the EOD service through a retrospective overview of the surveys, describes the status quo including ongoing improvements and suggests further surveys. The focus of the surveys illustrates the benchmarks (such as user groups and their expectations, evaluation of the service environment and form of outcomes, business to business opportunities and professional networking) that have been achieved to run an effective library service. It aims to be a possible model for libraries to start and develop a service.
Keywords: user surveys; evaluation; library services; digital library services; digitisation on demand; online environments; ebooks
Embedding Impact into the Development of Digital Collections: Rhyfel Byd 1914-1918 a'r Profiad Cymreig / Welsh Experience of the First World War 1914-1918 BIBAFull-Text 421-424
  Lorna Hughes
This poster describes a mass digitisation project led by the National Library of Wales to digitize archives and special collections about the Welsh experience of the First World War. The digital archive that will be created by the project will be a cohesive, digitally reunified archive that has value for research, education, and public engagement in time for the hundredth anniversary of the start of the First World War. In order to maximize impact of the digital outputs of the project, it has actively sought to embed methods that will increase its value to the widest audience. This paper describes these approaches and how they sit within the digital life cycle of project development.
Creating a Repository of Community Memory in a 3D Virtual World: Experiences and Challenges BIBAKFull-Text 425-428
  Ekaterina Prasolova-Førland; Mikhail Fominykh; Leif Martin Hokstad
In this paper, we focus on creation of 3D content in learning communities, exemplified with a Virtual Gallery and Virtual Research Arena projects in the virtual campus of our university in Second Life. Based on our experiences, we discuss the possibilities and challenges of creating a repository of community memory in 3D virtual worlds.
Keywords: repository of community memory; learning communities; 3D virtual worlds
Social Navigation Support for Groups in a Community-Based Educational Portal BIBAKFull-Text 429-433
  Peter Brusilovsky; Yiling Lin; Chirayu Wongchokprasitti; Scott Britell; Lois M. L. Delcambre; Richard Furuta; Kartheek Chiluka; Lillian N. Cassel; Ed Fox
This work seeks to enhance a user's experience in a digital library using group-based social navigation. Ensemble is a portal focusing on computing education as part of the US National Science Digital Library providing access to a large amount of learning materials and resources for education in Science, Technology, Engineering and Mathematics. With so many resources and so many contributing groups, we are seeking an effective way to guide users to find the right resource(s) by using group-based social navigation. This poster demonstrates how group-based social navigation can be used to extend digital library portals and how it can be used to guide portal users to valuable resources.
Keywords: social navigation; digital library; portal; navigation support
Evaluation of Preserved Scientific Processes BIBAFull-Text 434-437
  Rudolf Mayer; Mark Guttenbrunner; Andreas Rauber
Digital preservation research has seen an increased focus on objects that are non-deterministic but depend on external events like user input or data from external sources. Among those is the preservation of scientific processes, aiming at reuse of research outputs. Ensuring that the preserved object is equivalent to the original is a key concern, and is traditionally measured by comparing significant properties of the objects. We adapt a framework for comparing emulated versions of a digital object to measure equivalence also in processes.
An Open Source System Architecture for Digital Geolinguistic Linked Open Data BIBAFull-Text 438-441
  Emanuele Di Buccio; Giorgio Maria Di Nunzio; Gianmaria Silvello
Digital geolinguistic systems encourage collaboration between linguists, historians, archaeologists and ethnographers as they explore the relationship between language and cultural adaptation and change. These systems can be used as instructional tools, presenting complex data and relationships in a way accessible to all educational levels. In this poster, we present a system architecture based on a Linked Open Data (LOD) approach that aims to increase the level of interoperability of geolinguistic applications and the reuse of the data.
Committee-Based Active Learning for Dependency Parsing BIBAKFull-Text 442-445
  Saeed Majidi; Gregory Crane
Annotations on structured corpora provide a foundational instrument for emerging linguistic research. To generate annotations automatically, data-driven dependency parsers need a large annotated corpus to learn from. But these annotations are expensive to collect, and producing them is a labor-intensive task. In order to reduce the costs of annotation, we provide a novel framework in which a committee of dependency parsers collaborate to improve their efficiency using active learning.
Keywords: active learning; corpus annotation; dependency parsing

Demos

PoliticalMashup Ngramviewer BIBAFull-Text 446-449
  Bart de Goede; Justin van Wees; Maarten Marx; Ridho Reinanda
The PoliticalMashup Ngramviewer is an application that allows a user to visualise the use of terms and phrases in the "Tweede Kamer" (the Dutch parliament). Inspired by the Google Books Ngramviewer, the PoliticalMashup Ngramviewer additionally allows for faceting on politicians and parties, providing a more detailed insight into the use of certain terms and phrases by politicians and parties with different points of view.
Monitrix -- A Monitoring and Reporting Dashboard for Web Archivists BIBAKFull-Text 450-453
  Rainer Simon; Andrew Jackson
This demonstration paper introduces Monitrix, an upcoming monitoring and reporting tool for Web archivists. Monitrix works in conjunction with the Heritrix 3 Open Source Web crawler and provides real-time analytics about an ongoing crawl, as well as summary information aggregated about crawled hosts and URLs. In addition, Monitrix monitors the crawl for the occurrence of suspicious patterns that may indicate undesirable behavior, such as crawler traps or blocking hosts. Monitrix is developed as a cooperation between the British Library's UK Web Archive and the Austrian Institute of Technology, and is licensed under the terms of the Apache 2 Open Source license.
Keywords: Web Archiving; Quality Assurance; Analytics
SpringerReference: A Scientific Online Publishing System BIBAKFull-Text 454-457
  Sebastian Lindner; Christian Simon; Daniel Wieth
This paper presents an online publishing system with focus on scientific peer reviewed content. The goal is to provide authors and editors with a platform to constantly publish and update content well in advance of their print editions across every subject. The techniques in this paper show some of the main components of the implemented document lifecycle. These include a custom document workflow to cope with HTML- and file-based content, an online editing platform including LaTeX formula generation, automatic link insertion between different documents, the generation of auto suggests to simplify search and navigation and a Solr-based search engine.
Keywords: digital library; semi structured data; dynamic scientific content; data mining; document workflow; collaboration
Data Searchery BIBAKFull-Text 458-461
  Paolo Manghi; Andrea Mannocci
The novel data-centric paradigm of e-Science has proved that interlinking publications and research data objects coming from different realms and data sources (e.g. publication repositories, data repositories) makes dissemination, re-use, and validation of research activities more effective. Scholarly Communication Infrastructures are advocated for bridging such data sources, by offering tools for identification, creation, and navigation of relationships. Since realization and maintenance of such infrastructures is expensive, in this demo we propose a lightweight approach for "preliminary analysis of data source interlinking" to help practitioners evaluate whether and to what extent realizing them can be effective. We present Data Searchery, a configurable tool enabling users to easily plug in data sources from different realms with the purpose of cross-relating their objects, be they publications or research data, by identifying relationships between their metadata descriptions.
Keywords: Interoperability; Interlinking; Research Data; Publications
PATHSenrich: A Web Service Prototype for Automatic Cultural Heritage Item Enrichment BIBAFull-Text 462-465
  Eneko Agirre; Ander Barrena; Kike Fernandez; Esther Miranda; Arantxa Otegi; Aitor Soroa
Large amounts of cultural heritage material are nowadays available through online digital library portals. Most of these cultural items have short descriptions and lack rich contextual information. The PATHS project has developed experimental enrichment services. As a proof of concept, this paper presents a web service prototype which allows independent content providers to enrich cultural heritage items with a subset of the full functionality: links to related items in the collection and links to related Wikipedia articles. In the future we plan to provide more advanced functionality, as available offline for PATHS.
Leveraging Domain Specificity to Improve Findability in OER Repositories BIBAKFull-Text 466-469
  Darina Dicheva; Christo Dichev
This paper addresses the problem of improving the findability of open educational resources (OER) in Computer Science. It presents a domain-specific OER reference repository and portal aimed at increasing the low OER use. The focus is on enhancing the search and navigation capabilities. A distinctive feature is the proposed query-by-navigation method.
Keywords: Open Educational Resources; Information Retrieval; Search
VirDO: A Virtual Workspace for Research Documents BIBAKFull-Text 470-473
  George E. Raptis; Christina P. Katsini; Stephen J. Payne
We report the design of a system which integrates a suite of tools to allow scholars to manage related documents in their personal digital stores. VirDO provides a virtual workspace in which pdfs can be placed and displayed, and which allows these documents to be manipulated in various ways that prior literature suggests to be useful. Particularly noteworthy are the various maps that support users in uncovering the inter-relations among documents in the workspace, including citation relations and flexible user-defined tags. Early evaluation of the system was positive: especially promising was the increasing use of maps by two participants who used VirDO for their own research over a period of a week, as well as the extensive use by all participants of sticky notes.
Keywords: sensemaking; document mapping; annotation; scholarship
Domain Search and Exploration with Meta-Indexes BIBAKFull-Text 474-477
  Michael Huggett; Edie Rasmussen
In order to facilitate navigation and search of large collections of digital books, we have developed a new knowledge structure, the meta-index, which aggregates the back-of-book indexes within a subject domain. Using a test collection of digital books, we demonstrate the use of the meta-index and associated metrics that characterize the books within a digital domain, and explore some of the challenges presented by the meta-index structure.
Keywords: Indexes; Meta-indexes; Bibliometrics; Visualization; Search; User interfaces

Panels

COST Actions and Digital Libraries: Between Sustaining Best Practices and Unleashing Further Potential BIBAFull-Text 478-479
  Matthew J. Driscoll; Ralph Stübner; Touradj Ebrahimi; Muriel Foulonneau; Andreas Nürnberger; Andrea Scharnhorst; Joie Springer
The panel brings together chairs or key participants from a number of COST-funded Actions from several domains (Individuals, Societies, Cultures and Health (ISCH) and Information and Communication Technologies (ICT), as well as a Trans-Domain Action) and a Science Officer from the COST Office, and will be complemented by a representative of the Memory of the World programme of UNESCO.
e-Infrastructures for Digital Libraries...the Future BIBAFull-Text 480-481
  Wim Jansen; Roberto Barbera; Michel Drescher; Antonella Fresa; Matthias Hemmje; Yannis Ioannidis; Norbert Meyer; Nick Poole; Peter Stanchev
The digital ICT revolution is profoundly changing the way knowledge is created, communicated and deployed. New research methods based on computing and "big data" enable new means and forms of scientific collaboration, also through policy measures supporting open access to data and research results. The exponential growth of digital resources and services is supported by the deployment of e-Infrastructure, which allows researchers to access remote facilities, run complex simulations or manage and exchange unprecedented amounts of digital data.

Tutorials

The Role of XSLT in Digital Libraries, Editions, and Cultural Exhibits BIBAFull-Text 482-483
  Laura Mandell; Violeta Ilik
We offer a half-day tutorial that will explore the role of XML and XSLT (eXtensible Stylesheet Language Transformations, themselves XML documents) in digital library and digital humanities projects. Digital libraries ideally aim to provide both access and interaction. Digital libraries and digital humanities projects should foster edition building and curation. Therefore, this tutorial aims to teach librarians, scholars, and those involved in cultural heritage projects a scripting language that allows for easy manipulation of metadata, pictures, and text. The modules in this tutorial will help participants in planning for their own organizations' digital efforts and scholarly communications as well as in facilitating their efforts at digitization and creating interoperability between document editions. In five instructional modules, including hands-on exercises, we will help participants gain experience and knowledge of the possibilities that XSLT offers in transforming documents from XML to HTML, from XML to text, and from one metadata schema to another.
Mapping Cross-Domain Metadata to the Europeana Data Model (EDM) BIBAKFull-Text 484-485
  Valentine Charles; Antoine Isaac; Vassilis Tzouvaras; Steffen Hennicke
With the growing amount and the diversity of aggregation services for cultural heritage, the challenge of data mapping has become crucial.
Keywords: Interoperability; EDM; mapping; MINT
State-of-the-Art Tools for Text Digitisation BIBAKFull-Text 486-487
  Bob Boelhouwer; Adam Dudczak; Sebastian Kirch
The goal of this tutorial (organised by the Succeed project) is to introduce participants to state-of-the-art tools in digitisation and text processing which have been developed in recent research projects. The tutorial will focus on hands-on demonstration and on the testing of the tools in real-life situations, even those provided by the participants.
Keywords: Digitisation; OCR; Image Enhancement; Enrichment; Lexicon; Ground Truth; NLP
ResourceSync: The NISO/OAI Resource Synchronization Framework BIBAFull-Text 488-489
  Herbert Van de Sompel; Michael L. Nelson; Martin Klein; Robert Sanderson
This tutorial provides an overview and a practical introduction to ResourceSync, a web-based synchronization framework consisting of multiple modular capabilities that a server can selectively implement to enable third-party systems to remain synchronized with the server's evolving resources. The tutorial motivates the ResourceSync approach by outlining several synchronization use cases including scholarly article repositories, OAI-PMH repositories, linked data knowledge bases, as well as content aggregators. It details the concepts of the ResourceSync capabilities, their discovery mechanisms, and their serialization based on the widely adopted Sitemap protocol. The tutorial further hints at the extensibility of the synchronization framework, for example, for scenarios such as providing references to mirror locations of synchronization resources, transferring partial content, and offering historical data.
From Preserving Data to Preserving Research: Curation of Process and Context BIBAFull-Text 490-491
  Rudolf Mayer; Stefan Pröll; Andreas Rauber; Raul Palma; Daniel Garijo
In the domain of eScience, investigations are increasingly collaborative. Most scientific and engineering domains benefit from building on top of the outputs of other research, by sharing information to reason over and data to incorporate in the modelling task at hand.
   This raises the need to provide means for preserving and sharing entire eScience workflows and processes for later reuse. This requires defining which information is to be collected, creating means to preserve it, and developing approaches to enable and validate the re-execution of a preserved process. This includes and goes beyond preserving the data used in the experiments, as the process underlying its creation and use is essential.
   This tutorial thus provides an introduction to the problem domain and discusses solutions for the curation of eScience processes.
Linked Data for Digital Libraries BIBAFull-Text 492-493
  Uldis Bojars; Nuno Lopes; Jodi Schneider
This tutorial will empower attendees with the necessary skills to take advantage of Linked Data already available on the Web, provide insights on how to incorporate this data and tools into their daily workflow, and finally touch upon how the attendees' own data can be shared as Linked Data.