| Sustainability of Digital Libraries: A Conceptual Model | | BIBAK | Full-Text | 1-12 | |
| Gobinda G. Chowdhury | |||
| Major factors related to the economic, social and environmental
sustainability of digital libraries have been discussed. Some research in
digital information systems and services in general, and digital libraries in
particular, has been discussed to illustrate different issues of
sustainability. Based on these discussions the paper, for the first time,
proposes a conceptual model and a theoretical research framework for
sustainable digital libraries. It shows that the sustainable business models to
support digital libraries should also support equitable access supported by
specific design and usability guidelines that facilitate easier, better and
cheaper access, support the personal, institutional and social culture of
users, and at the same time conform with the policy and regulatory frameworks
of the respective regions, countries and institutions. Keywords: digital libraries; sustainability; social sustainability; economic
sustainability; environmental sustainability | |||
| Quality Assessment in Crowdsourced Indigenous Language Transcription | | BIBAK | Full-Text | 13-22 | |
| Ngoni Munyaradzi; Hussein Suleman | |||
| The digital Bleek and Lloyd Collection is a rare collection that contains
artwork, notebooks and dictionaries of the indigenous people of Southern
Africa. The notebooks, in particular, contain stories that encode the language,
culture and beliefs of these people, handwritten in now-extinct languages with
a specialised notation system. Previous attempts have been made to convert the
approximately 20000 pages of text to a machine-readable form using machine
learning algorithms but, due to the complexity of the text, the recognition
accuracy was low. In this paper, a crowdsourcing method is proposed to
transcribe the manuscripts, where non-expert volunteers transcribe pages of the
notebooks using an online tool. Experiments were conducted to determine the
quality and consistency of transcriptions. The results show that volunteers are
able to produce reliable transcriptions of high quality. The inter-transcriber
agreement is 80% for |Xam text and 95% for English text. When the |Xam text
transcriptions produced by the volunteers are compared with a gold standard,
the volunteers achieve an average accuracy of 64.75%, which exceeded that in
previous work. Finally, the degree of transcription agreement correlates with
the degree of transcription accuracy. This suggests that the quality of unseen
data can be assessed based on the degree of agreement among transcribers. Keywords: crowdsourcing; transcription; cultural heritage | |||
| Defining Digital Library | | BIBAK | Full-Text | 23-28 | |
| Armand Brahaj; Matthias Razum; Julia Hoxha | |||
| This paper reflects on the range of the definitions of digital libraries
demonstrating their extent. We analyze a number of definitions through a
simplified intensional definition method, through which we exploit the nature
of the definitions by analyzing their respective genera and attributes. The
goal of this paper is to provide a synthesis of the works related to
definitions of digital library, giving a fine-grained comparative approach on
these definitions. We conclude that, although there are a large number of
definitions, they are defined in overlapping families and attributes, and an
inclusive definition is possible. Keywords: Digital Library; Definition; Evaluation of Digital Libraries | |||
| E-Books in Swedish Public Libraries: Policy Implications | | BIBAK | Full-Text | 29-34 | |
| Elena Maceviciute; Tom D. Wilson | |||
| The aims of the paper are to review the situation of e-book delivery in
Swedish public libraries (as it looked at the end of 2012); to identify the
barriers that public libraries encounter in providing access to e-books; and
to highlight the policy-related problems of e-book provision through public
libraries. A survey of all public libraries in Sweden was carried out in
October 2012. Of the 291 questionnaires issued, 185 were completed, a response
rate of 63.3%. The provision of an e-book service has arisen as a result of either
demand or an ideological belief that the ethos of democratic values and
equality of access requires libraries to offer material in all media.
Librarians find the situation of e-book provision through libraries
unsatisfactory: the provider of titles removes them from the catalogue without
warning or explanation, there are too few titles for children and students, and
access to popular titles is delayed. Keywords: e-books; public libraries; information policy; Sweden | |||
| On the Change in Archivability of Websites Over Time | | BIBAK | Full-Text | 35-47 | |
| Mat Kelly; Justin F. Brunelle; Michele C. Weigle; Michael L. Nelson | |||
| As web technologies evolve, web archivists work to keep up so that our
digital history is preserved. Recent advances in web technologies have
introduced client-side executed scripts that load data without a referential
identifier or that require user interaction (e.g., content loading when the
page has scrolled). These advances have made automating methods for capturing
web pages more difficult. Because of the evolving schemes of publishing web
pages along with the progressive capability of web preservation tools, the
archivability of pages on the web has varied over time. In this paper we show
that the archivability of a web page can be deduced from the type of page being
archived, which aligns with that page's accessibility with respect to dynamic
content. We show concrete examples of when these technologies were introduced
by referencing mementos of pages that have persisted through a long evolution
of available technologies. Identifying these reasons for the inability of these
web pages to be archived in the past with respect to accessibility serves as a
guide for ensuring that content that has longevity is published using good
practice methods that make it available for preservation. Keywords: Web Archiving; Digital Preservation | |||
| Checking Out: Download and Digital Library Exchange for Complex Objects | | BIBA | Full-Text | 48-59 | |
| Scott Britell; Lois M. L. Delcambre; Lillian N. Cassel; Richard Furuta | |||
| Digital resources are becoming increasingly complex and are being used in diverse ways. For example, educational resources may be cataloged in digital libraries, used offline by educators and students, or used in a learning management system. In this paper we present the notion of "checking out" complex resources from a digital library for offline download or exchange with another digital library or learning management system. We present a mechanism that enables the customization, download and exchange of complex resources. We show how the mechanism also supports digital library and learning management system exchange formats in a generic fashion with minimal overhead. We also show how checkouts grow linearly with respect to the complexity of the resources. | |||
| Profiling Web Archive Coverage for Top-Level Domain and Content Language | | BIBAK | Full-Text | 60-71 | |
| Ahmed Alsum; Michele C. Weigle; Michael L. Nelson; Herbert Van de Sompel | |||
| The Memento aggregator currently polls every known public web archive when
serving a request for an archived web page, even though some web archives focus
on only specific domains and ignore the others. Similar to query routing in
distributed search, we investigate the impact on aggregated Memento TimeMaps
(lists of when and where a web page was archived) by only sending queries to
archives likely to hold the archived page. We profile twelve public web
archives using data from a variety of sources (the web, archives' access logs,
and full-text queries to archives) and discover that only sending queries to
the top three web archives (i.e., a 75% reduction in the number of queries) for
any request produces the full TimeMaps in 84% of the cases. Keywords: Web archive; query routing; memento aggregator | |||
| Selecting Fiction in Library Catalogs: A Gaze Tracking Study | | BIBA | Full-Text | 72-83 | |
| Janna Pöntinen; Pertti Vakkari | |||
| We study how readers explore metadata in book pages when selecting fiction in a traditional and an enriched online catalog for fiction. The associations between attention devoted to metadata elements and selecting an interesting book were analyzed. Eye movements of 30 users selecting fiction for four search tasks were recorded. The results indicate that although participants paid most attention in book pages to content description and keywords, these had no bearing on selecting an interesting book. Author and title information received less attention, but were significant predictors of selection. | |||
| Social Information Behaviour in Bookshops: Implications for Digital Libraries | | BIBAK | Full-Text | 84-95 | |
| Sally Jo Cunningham; Nicholas Vanderschantz; Claire Timpany; Annika Hinze; George Buchanan | |||
| We discuss here our observations of the interaction of bookshop customers
with the books and with each other. Contrary to our initial expectations,
customers do not necessarily engage in focused, joint information search, as
observed in libraries, but rather the bookshop is treated as a social space
similar to a cafe. Our results extend the known repertoire of collaborative
behaviours, supporting further development of models of user tasks and goals.
We compare our findings with previous work and discuss possible implications of
our observations for the design of digital libraries as places of both
information access and social interaction. Keywords: participant observation; social space; collaborative information behaviour;
book-based social networking | |||
| Do User (Browse and Click) Sessions Relate to Their Questions in a Domain-Specific Collection? | | BIBA | Full-Text | 96-107 | |
| Jeremy Steinhauer; Lois M. L. Delcambre; Marianne Lykke; Marit Kristine Ådland | |||
| We seek to improve information retrieval in a domain-specific collection by clustering user sessions as recorded in a click log and then classifying later user sessions in real-time. As a preliminary step, we explore the main assumption of this approach: whether user sessions in such a site relate to the question that they are answering. The contribution of this paper is the evaluation of the suitability of common machine learning measurements (measuring the distance between two sessions) to distinguish sessions of users searching for the answer to the same or different questions. We found that sessions of people answering the same question are significantly different from those of people answering different questions, but results are dependent on the distance measure used. We explain why some distance metrics performed better than others. | |||
| Digital Libraries for Experimental Data: Capturing Process through Sheer Curation | | BIBA | Full-Text | 108-119 | |
| Mark Hedges; Tobias Blanke | |||
| This paper presents an approach to the 'sheer curation' of experimental data and processes of a group of researchers in the life sciences, which involves embedding data capture and interpretation within researchers' working practices, so that it is automatic and invisible to the researcher. The environment described does not capture just individual datasets, but the entire workflow that represents the 'story' of the experiment, including intermediate files and provenance metadata, so as to support the verification and reproduction of published results. As the curation environment is decoupled from the researchers' processing environment, a provenance graph is inferred from a variety of domain-specific contextual information as the data is generated, using software that implements the knowledge and expertise of the researchers. | |||
| Metadata Management and Interoperability Support for Natural History Museums | | BIBAK | Full-Text | 120-131 | |
| Konstantinos Makris; Giannis Skevakis; Varvara Kalokyri; Polyxeni Arapi; Stavros Christodoulakis | |||
| Natural History Museums (NHMs) are a rich source of knowledge about Earth's
biodiversity and natural history. However, an impressive abundance of high
quality scientific content available in NHMs around Europe remains largely
unexploited due to a number of barriers, such as: the lack of interconnection
and interoperability between the management systems used by museums, the lack
of centralized access through a European point of reference like Europeana, and
the inadequacy of the current metadata and content organization. The Natural
Europe project offers a coordinated solution at European level that aims to
overcome those barriers. This paper presents the architecture, deployment and
evaluation of the Natural Europe infrastructure allowing the curators to
publish, semantically describe and manage the museums' Cultural Heritage
Objects, as well as disseminate them to Europeana.eu and biodiversity networks
like BioCASE and GBIF. Keywords: digital curation; preservation metadata; Europeana; BioCASE | |||
| A Curation-Oriented Thematic Aggregator | | BIBAK | Full-Text | 132-137 | |
| Dimitris Gavrilis; Costis Dallas; Stavros Angelis | |||
| The emergence of the European Digital Library (Europeana) presents the need
for aggregating content using a more intelligent and effective approach, taking
into account the need to support potential changes in target metadata schemas
and new services. This paper presents the concept, architecture and services
provided by a curation-oriented, OAIS-compliant thematic metadata aggregator,
developed and used in the CARARE project, that addresses these challenges. Keywords: Digital curation; metadata aggregator; Europeana; CARARE; workflows;
metadata enrichment | |||
| Can Social Reference Management Systems Predict a Ranking of Scholarly Venues? | | BIBAK | Full-Text | 138-143 | |
| Hamed Alhoori; Richard Furuta | |||
| New scholarly venues (e.g., conferences and journals) are emerging as
research fields expand. Ranking these new venues is imperative to assist
researchers, librarians, and research institutions. However, rankings based on
traditional citation-based metrics have limitations and are no longer the only
or the best choice to determine the impact of scholarly venues. Here, we
propose a venue-ranking approach based on scholarly references from academic
social media sites, and we compare a number of citation-based rankings with
social-based rankings. Our preliminary results show a statistically significant
correlation between the two approaches in a number of general rankings,
research areas, and subdisciplines. Furthermore, we found that social-based
rankings favor open-access venues over venues that require a subscription. Keywords: Scholarly Venues; Ranking; Digital Libraries; Bibliometrics; Altmetrics;
Impact Factor; Readership; Social Reference Management; Citation Analysis;
Google Scholar Metrics | |||
| An Unsupervised Machine Learning Approach to Body Text and Table of Contents Extraction from Digital Scientific Articles | | BIBA | Full-Text | 144-155 | |
| Stefan Klampfl; Roman Kern | |||
| Scientific articles are predominantly stored in digital document formats, which are optimised for presentation, but lack structural information. This poses challenges for accessing the documents' content, for example for information retrieval. We have developed a processing pipeline that makes use of unsupervised machine learning techniques and heuristics to detect the logical structure of a PDF document. Our system uses only information available from the current document and does not require any pre-trained model. Starting from a set of contiguous text blocks extracted from the PDF file, we first determine geometrical relations between these blocks. These relations, together with geometrical and font information, are then used to categorize the blocks into different classes. Based on this logical structure we finally extract the body text and the table of contents of a scientific article. We evaluate our pipeline on a number of datasets and compare it with state-of-the-art document structure analysis approaches. | |||
| Entity Network Extraction Based on Association Finding and Relation Extraction | | BIBA | Full-Text | 156-167 | |
| Ridho Reinanda; Marta Utama; Fridus Steijlen; Maarten de Rijke | |||
| One of the core aims of semantic search is to directly present users with information instead of lists of documents. Various entity-oriented tasks have been or are being considered, including entity search and related entity finding. In the context of digital libraries for computational humanities, we consider another task, network extraction: given an input entity and a document collection, extract related entities from the collection and present them as a network. We develop a combined approach for entity network extraction that consists of a co-occurrence-based approach to association finding and a machine learning-based approach to relation extraction. We evaluate our approach by comparing the results on a ground truth obtained using a pooling method. | |||
| Word Occurrence Based Extraction of Work Contributors from Statements of Responsibility | | BIBAK | Full-Text | 168-179 | |
| Nuno Freire | |||
| This paper addresses the identification of all contributors of an
intellectual work, when they are recorded in bibliographic data but in
unstructured form. National bibliographies are very reliable in representing
the first author of a work, but frequently, secondary contributors are
represented in the statements of responsibility that are transcribed by the
cataloguer from the book into the bibliographic records. The identification of
work contributors mentioned in statements of responsibility is a typical
motivation for the application of information extraction techniques. This paper
presents an approach developed for the specific application scenario of the
ARROW rights infrastructure being deployed in several European countries to
assist in the determination of the copyright status of works that may not be
in the public domain. Our approach performed reliably in most languages and
bibliographic datasets of at least one million records, achieving precision and
recall above 0.97 on five of the six evaluated datasets. We conclude that the
approach can be reliably applied to other national bibliographies and
languages. Keywords: named entity recognition; information extraction; national bibliographies;
library catalogues; copyright | |||
| Evaluating the Deployment of a Collection of Images in the CULTURA Environment | | BIBAK | Full-Text | 180-191 | |
| Maristella Agosti; Marta Manfioletti; Nicola Orio; Chiara Ponchia | |||
| The paper reports on the effort of reconsidering the characteristics of the
IPSA online collection of illuminated images created for specialised users,
involving the redesigning of the interaction functions to make the online
collection of interest for new and diverse user categories. The effort is part
of the design and development of a new adaptive and dynamic environment that
aims at increasing user engagement with cultural heritage collections and which
is taking place in the context of the European CULTURA project. Keywords: Cultural heritage systems; IPSA collection of illuminated images; CULTURA
environment; archives; illuminated manuscripts; user engagement with cultural
heritage collections | |||
| Formal Models for Digital Archives: NESTOR and the 5S | | BIBA | Full-Text | 192-203 | |
| Nicola Ferro; Gianmaria Silvello | |||
| Archives are a valuable part of our cultural heritage but despite their
importance, the models and technologies that have been developed over the past
two decades in the DL field have not been specifically tailored to them. This
is especially true when it comes to formal and foundational frameworks, such as the
5S model.
Therefore, we propose an innovative formal model, called NESTOR, for archives, explicitly built around the concepts of context and hierarchy which play a central role in the archival realm. We then use NESTOR to extend the 5S model offering the possibility of opening up the full wealth of DL methods to archives. We demonstrate this by presenting two concrete applications. | |||
| Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool | | BIBAK | Full-Text | 204-215 | |
| Justin F. Brunelle; Michael L. Nelson; Lyudmila Balakireva; Robert Sanderson; Herbert Van de Sompel | |||
| Conventional Web archives are created by periodically crawling a Web site
and archiving the responses from the Web server. Although easy to implement and
commonly deployed, this form of archiving typically misses updates and may not
be suitable for all preservation scenarios, for example a site that is required
(perhaps for records compliance) to keep a copy of all pages it has served. In
contrast, transactional archives work in conjunction with a Web server to
record all content that has been served. Los Alamos National Laboratory has
developed SiteStory, an open-source transactional archive written in Java that
runs on Apache Web servers and provides a Memento-compatible access interface and
WARC file export features. We used Apache's ApacheBench utility on a
pre-release version of SiteStory to measure response time and content delivery
time in different environments. The performance tests were designed to
determine the feasibility of SiteStory as a production-level solution for high
fidelity automatic Web archiving. We found that SiteStory does not
significantly affect content server performance when it is performing
transactional archiving. Content server performance slows from 0.076 seconds to
0.086 seconds per Web page access when the content server is under load, and
from 0.15 seconds to 0.21 seconds when the resource has many embedded and
changing resources. Keywords: Web Archiving; Digital Preservation | |||
| Exploring Large Digital Library Collections Using a Map-Based Visualisation | | BIBA | Full-Text | 216-227 | |
| Mark Hall; Paul Clough | |||
| In this paper we describe a novel approach for exploring large document collections using a map-based visualisation. We use hierarchically structured semantic concepts that are attached to the documents to create a visualisation of the semantic space that resembles a Google Map. The approach is novel in that we exploit the hierarchical structure to enable the approach to scale to large document collections and to create a map where the higher levels of spatial abstraction have semantic meaning. An informal evaluation is carried out to gather subjective feedback from users. Overall results are positive with users finding the visualisation enticing and easy to use. | |||
| AugDesk. Fusing Reality with the Virtual in Document Triage. Part1: Gesture Interactions | | BIBA | Full-Text | 228-234 | |
| Fernando Loizides; Doros Polydorou; Keti Mavri; George Buchanan; Panayiotis Zaphiris | |||
| In this paper we present the first version of AugDesk, an affordable augmented reality prototype desk for sorting documents based on their relevance to an information need. The set-up is based on the findings from previous work in conjunction with a user-centred iterative design process to improve both the software and hardware configuration. In this initial version of the prototype the documents automatically appear on a table from an overhead projector and the user can control the movement and selection of these documents by using gestures, identified from a Microsoft Kinect Sensor. The first part of our work included recording users' actions to identify the most popular interactions with virtual documents on a table and integrating these into AugDesk. | |||
| The Role of Search Interface Features during Information Seeking | | BIBA | Full-Text | 235-240 | |
| Abdigani Diriye; Ann Blandford; Anastasios Tombros; Pertti Vakkari | |||
| In this paper, we examine the role search interface features play in information seeking across different categories and complexities of search tasks. We present a system called Search Buddy that provides features to enable exploration, filtering and browsing of information. Differing categories and complexities of search tasks were studied through qualitative and quantitative methods. We find specific user patterns in the frequency, points and context of search interface usage. This study highlights the potential value of contextualizing interface features to the type of task and stage of information seeking. | |||
| Users Requirements in Audiovisual Search: A Quantitative Approach | | BIBA | Full-Text | 241-246 | |
| Danish Nadeem; Roeland Ordelman; Robin Aly; Erwin Verbruggen | |||
| This paper reports on the results of a quantitative analysis of user requirements for audiovisual search that allows the categorisation of requirements and the comparison of requirements across user groups. The categorisation provides clear directions with respect to the prioritisation of system features from the perspective of the development of systems for specific, single user groups and systems that have a more general target user group. | |||
| Hierarchical Structuring of Cultural Heritage Objects within Large Aggregations | | BIBA | Full-Text | 247-259 | |
| Shenghui Wang; Antoine Isaac; Valentine Charles; Rob Koopman; Anthi Agoropoulou; Titia van der Werf | |||
| Huge amounts of cultural content have been digitised and are available through digital libraries and aggregators like Europeana.eu. However, it is not easy for a user to have an overall picture of what is available or to find related objects. We propose a method for hierarchically structuring cultural objects at different similarity levels. We describe a fast, scalable clustering algorithm with an automated field selection method for finding semantic clusters. We report a qualitative evaluation on the cluster categories based on records from the UK and a quantitative one on the results from the complete Europeana dataset. | |||
| Methodology for Dynamic Extraction of Highly Relevant Information Describing Particular Object from Semantic Web Knowledge Base | | BIBAK | Full-Text | 260-271 | |
| Krzysztof Sielski; Justyna Walkowska; Marcin Werla | |||
| Exploration and information discovery in a big knowledge base that uses a
complex ontology is often difficult, because relevant information may be spread
over a number of related objects amongst many other, loosely connected ones.
This paper introduces 3 types of relations between classes in an ontology and
defines the term RDF Unit to group relevant and closely connected
information. The type of relation is chosen based on association strength in
the context of a particular ontology. This approach was designed and implemented
to manipulate and browse data in a cultural heritage Knowledge Base with over
500M triples, created by PSNC during the SYNAT research project. Keywords: Semantic Web; ontology; OWL; RDF; CIDOC CRM; FRBRoo; RDF Unit; RDF Molecule;
knowledge base | |||
| Personalizing Keyword Search on RDF Data | | BIBA | Full-Text | 272-278 | |
| Giorgos Giannopoulos; Evmorfia Biliri; Timos Sellis | |||
| Despite the vast amount of work on personalizing keyword search on unstructured data (i.e. web pages), not much work has been done on handling RDF data. In this paper we present our first-cut approach to personalizing keyword query results on RDF data. We adopt the well-known Ranking SVM approach, training ranking functions with RDF-specific training features. The training utilizes historical user feedback, in the form of ratings on the searched items. In order to do so, we join Netflix and DBpedia datasets, obtaining a dataset where we can simulate personalized search scenarios for a number of discrete users. Our evaluation shows that our approach outperforms the baseline and, in some cases, scores very close to the ground truth. | |||
| Providing Meaningful Information in a Large Scale Digital Library -- A Case Study | | BIBAK | Full-Text | 279-284 | |
| Laura Rueda; Sünje Dallmeier-Tiessen; Patricia Herterich; Samuele Carli; Salvatore Mele; Simeon Warner | |||
| Emerging open science practices require persistent identification and
citability of a diverse set of scholarly materials, from paper based materials
to research data. This paper presents a case study of the
INSPIRE digital library and its approach to connecting persistent identifiers
for scientific material and author identification. The workflows developed
under the ODIN project, connecting DataCite DOIs and ORCIDs, can serve as a
best practice example for integrating external information into such digital
libraries. Keywords: persistent identifier; digital library; interoperability model; open science | |||
| Context-Sensitive Ranking Using Cross-Domain Knowledge for Chemical Digital Libraries | | BIBAK | Full-Text | 285-296 | |
| Benjamin Köhncke; Wolf-Tilo Balke | |||
| Today, entity-centric searches are common tasks for information gathering.
But, due to the huge amount of available information the entity itself is often
not sufficient for finding suitable results. Users are usually searching for
entities in a specific search context which is important for their relevance
assessment. Therefore, for digital library providers it is essential to also
consider this search context to allow for high quality retrieval. In this paper
we present an approach enabling context searches for chemical entities.
Chemical entities play a major role in many specific domains, ranging from
biomedicine through biology to materials science. Since most of the domain-specific
documents lack suitable context annotations, we present a similarity measure
using cross-domain knowledge gathered from Wikipedia. We show that
structure-based similarity measures are not suitable for chemical context
searches and introduce a similarity measure combining entity- and context
similarity. Our experiments show that our measure outperforms structure-based
similarity measures for chemical entities. We compare against two baseline
approaches: a Boolean retrieval model and a model using statistical query
expansion for the context term. We compared the measures computing mean average
precision (MAP) using a set of queries and manual relevance assessments from
domain experts. We achieved an absolute increase in MAP of 30 percentage points (from
31% to 61%). Furthermore, we show a personalized retrieval system which leads
to another increase of around 10%. Keywords: Chemical Digital Libraries; Personalization; Context Search | |||
| Topic Cropping: Leveraging Latent Topics for the Analysis of Small Corpora | | BIBAK | Full-Text | 297-308 | |
| Nam Khanh Tran; Sergej Zerr; Kerstin Bischoff; Claudia Niederée; Ralf Krestel | |||
| Topic modeling has gained a lot of popularity as a means for identifying and
describing the topical structure of textual documents and whole corpora. There
are, however, many document collections such as qualitative studies in the
digital humanities that cannot easily benefit from this technology. The limited
size of those corpora leads to poor quality topic models. Higher quality topic
models can be learned by incorporating additional domain-specific documents
with similar topical content. This, however, requires finding or even manually
composing such corpora, requiring considerable effort. For solving this
problem, we propose a fully automated adaptable process of topic cropping. For
learning topics, this process automatically tailors a domain-specific Cropping
corpus from a general corpus such as Wikipedia. The learned topic model is then
mapped to the working corpus via topic inference. Evaluation with a real world
data set shows that the learned topics are of higher quality than those learned
from the working corpus alone. In detail, we analyzed the learned topics with
respect to coherence, diversity, and relevance. Keywords: digital humanities; qualitative data; topic modeling | |||
| A Domain Meta-wrapper Using Seeds for Intelligent Author List Extraction in the Domain of Scholarly Articles | | BIBA | Full-Text | 309-314 | |
| Francesco Cauteruccio; Giovambattista Ianni | |||
| In this paper we investigate automated extraction of author lists in the domain of scientific digital libraries. Given a list of known "seed" authors, we aim to extract complete lists of co-authors from Web pages in arbitrary format. We adopt a methodology embedding domain knowledge in a unique "meta-wrapper", not requiring training, with negligible maintenance costs and based on the combination of several extraction techniques. Such methods are applied at the structural level, at the character level and at the annotation level. We describe the methodology, illustrate our tool, compare with known approaches and measure the accuracy of our techniques with proper experiments. | |||
| Securing Access to Complex Digital Artifacts -- Towards a Controlled Processing Environment for Digital Research Data | | BIBA | Full-Text | 315-320 | |
| Johann Latocha; Klaus Rechert; Isao Echizen | |||
| Providing secured and restricted access to digital objects, especially access to digital research data, for a general audience poses new challenges to memory institutions. For instance, to protect individuals, only anonymized or pseudonymized data should be released to a general audience. Standard procedures have been established over time to cope with privacy issues of non-interactive digital objects like text, audio and video. Appearances of identifiers and potentially also quasi-identifiers were removed by a simple overlay, e.g. in text documents such appearances were simply blackened out. Today's digital artifacts, especially research data, have complex, non-linear and even interactive manifestations. Thus, a different approach to securing access to complex digital artifacts is required. This paper presents an architecture and technical methods to control access to digital research data. | |||
| Restoring Semantically Incomplete Document Collections Using Lexical Signatures | | BIBAK | Full-Text | 321-332 | |
| Luis Meneses; Himanshu Barthwal; Sanjeev Singh; Richard Furuta; Frank Shipman | |||
| Unexpected changes create a problem when managing missing resources in a
digital collection. In decentralized and distributed collections such as
Walden's Paths, a missing point or an incomplete resource is of grave
importance as it can potentially interrupt the continuity in the narration and
render the collection semantically incomplete. We can foresee two possible
scenarios occurring when resources cannot be found. First, we have access to a
copy of the missing document or to its lexical signatures, which allows us to
find the missing resource. The second case is more interesting to us. What
happens if we don't have any valid metadata associated with the missing resource?
To solve this problem, we used the lexical signatures of valid documents within
a collection to find suitable replacements for absent resources. As a result, we
found that traditional similarity metrics do not adequately convey the
relationships between the elements in the collections. Our analyses also showed
that our procedures were able to restore the semantic integrity of incomplete
document collections. Keywords: Semantic replacements; Web resource management; distributed collections | |||
| Resurrecting My Revolution | | BIBAK | Full-Text | 333-345 | |
| Hany M. Salaheldeen; Michael L. Nelson | |||
| In previous work we reported that resources linked in tweets disappeared at
the rate of 11% in the first year followed by 7.3% each year afterwards. We
also found that in the first year 6.7%, and 14.6% in each subsequent year, of
the resources were archived in public web archives. In this paper we revisit
the same dataset of tweets and find that our prior model still holds and the
calculated error for estimating percentages missing was about 4%, but we found
the rate of archiving produced a higher error of about 11.5%. We also
discovered that resources have disappeared from the archives themselves (7.89%)
as well as reappeared on the live web after being declared missing (6.54%). We
have also tested the availability of the tweets themselves and found that
10.34% have disappeared from the live web. To mitigate the loss of resources on
the live web, we propose the use of a "tweet signature". Using the Topsy API,
we extract the top five most frequent terms from the union of all tweets about
a resource, and use these five terms as a query to Google. We found that using
tweet signatures results in discovering replacement resources with 70+% textual
similarity to the missing resource 41% of the time. Keywords: Web Archiving; Social Media; Digital Preservation; Reconstruction | |||
| Who and What Links to the Internet Archive | | BIBAK | Full-Text | 346-357 | |
| Yasmin Alnoamany; Ahmed Alsum; Michele C. Weigle; Michael L. Nelson | |||
| The Internet Archive's (IA) Wayback Machine is the largest and oldest public
web archive and has become a significant repository of our recent history and
cultural heritage. Despite its importance, there has been little research about
how it is discovered and used. Based on web access logs, we analyze what users
are looking for, why they come to IA, where they come from, and how pages link
to IA. We find that users request English pages the most, followed by the
European languages. Most human users come to web archives because they do not
find the requested pages on the live web. About 65% of the requested archived
pages no longer exist on the live web. We find that more than 82% of human
sessions connect to the Wayback Machine via referrals from other web sites,
while only 15% of robots have referrers. Most of the links (86%) from websites
are to individual archived pages at specific points in time, and of those 83%
no longer exist on the live web. Keywords: Web Archiving; Web Server Logs; Web Usage Mining; Language Detection | |||
| A Study of Digital Curator Competencies -- A Delphi Study | | BIBAK | Full-Text | 358-361 | |
| Anna Maria Tammaro; Melody Madrid | |||
| The aim of this research was to define competencies for digital curators,
and to validate them through a Delphi process in the context of Library,
Archives, Museum curriculum development. The objective for the study was to
obtain consensus regarding competence statements for Library, Archives and
Museum digital curators. Keywords: Digital Curation; Digital Curator Competencies; Delphi Method | |||
| Large Scale Citation Matching Using Apache Hadoop | | BIBAK | Full-Text | 362-365 | |
| Mateusz Fedoryszak; Dominika Tkaczyk; Lukasz Bolikowski | |||
| During the process of citation matching, links from bibliography entries to
referenced publications are created. Such links are indicators of topical
similarity between linked texts, are used in assessing the impact of the
referenced document and improve navigation in the user interfaces of digital
libraries. In this paper we present a citation matching method and show how to
scale it up to handle great amounts of data using appropriate indexing and a
MapReduce paradigm in the Hadoop environment. Keywords: citation matching; approximate indexing; MapReduce; Hadoop; CRF; SVM | |||
| Building an Online Environment for Usefulness Evaluation | | BIBAK | Full-Text | 366-369 | |
| Jasmin Hügi; René Schneider | |||
| In this paper we present a methodological framework for usefulness
evaluation of digital libraries and information services that has been tested
successfully in two case studies before developing a corresponding tool that
may be used for further investigations. The tool is based on a combination of a
knowledge base with exploitable and modifiable questions and an open source
tool for online-questionnaires. Keywords: Digital Libraries; Usefulness Evaluation; Quality metrics | |||
| Topic Modeling for Search and Exploration in Multivariate Research Data Repositories | | BIBAK | Full-Text | 370-373 | |
| Maximilian Scherer; Tatiana von Landesberger; Tobias Schreck | |||
| Huge amounts of multivariate research data are produced and made publicly
available in digital libraries. Little research has focused on similarity functions
that take multivariate data documents as a whole into account. Such similarity
functions are highly beneficial for users, by enabling them to browse and query
large collections of multivariate data using nearest-neighbor indexing. In this
paper we tackle this challenge and propose a novel similarity function for
multivariate data documents based on topic-modeling. Based on a previously
developed bag-of-words approach for multivariate data, we can then learn a
topic model for a collection of multivariate data documents and represent each
document as a mixture of topics. This representation is very suitable for
efficient nearest-neighbor indexing and clustering according to the topic
distribution of a document. We present a use-case where we apply this approach
to retrieval of multivariate data in the field of climate research. Keywords: multivariate data; content-based retrieval; bag-of-words; lda | |||
| Time-Based Exploratory Search in Scientific Literature | | BIBAK | Full-Text | 374-377 | |
| Silviu Homoceanu; Sascha Tönnies; Philipp Wille; Wolf-Tilo Balke | |||
| State-of-the-art faceted search graphical user interfaces for digital
libraries provide a wide range of filters perfectly suitable for narrowing down
results for well-defined user needs. However, they fail to deliver summarized
overview information for users that need to familiarize themselves with a new
scientific topic. In fact, exploratory search remains one of the major problems
for scientific literature search in digital libraries. Exploiting a user study
about how computer scientists actually approach new subject areas we developed
ESSENCE, a system for empowering exploratory search in scientific literature. Keywords: Digital Libraries; User Interface; Exploratory Search; Timeline | |||
| Crowds and Content: Crowd-Sourcing Primitives for Digital Libraries | | BIBAK | Full-Text | 378-381 | |
| Stuart Dunn; Mark Hedges | |||
| This poster reports on a nine month scoping survey of research in the arts
and humanities involving crowd-sourcing. This study proposed a twelve-facet
typology of research processes currently in use, and these are reported here,
along with the context of current research practice, the types of research
assets which are currently being exposed to crowd-sourcing, and the sorts of
outputs (including digital libraries and collections) which such projects are
producing. Keywords: crowd-sourcing; typology; humanities | |||
| Regional Effects on Query Reformulation Patterns | | BIBA | Full-Text | 382-385 | |
| Steph Jesper; Paul Clough; Mark Hall | |||
| This paper describes an in-depth study of the effects of geographic region on search patterns, particularly query reformulations, in a large query log from the UK National Archives (TNA). A total of 1,700 sessions involving 9,447 queries from 17 countries were manually analyzed for their semantic composition and pairs of queries for their reformulation type. Results show country-level variations for the types of queries commonly issued and typical patterns of query reformulation. Understanding the effects of regional differences will assist with the future design of search algorithms at TNA as they seek to improve their international reach. | |||
| Persistence in Recommender Systems: Giving the Same Recommendations to the Same Users Multiple Times | | BIBAK | Full-Text | 386-390 | |
| Joeran Beel; Stefan Langer; Marcel Genzmehr; Andreas Nürnberger | |||
| How do click-through rates vary between research paper recommendations
previously shown to the same users and recommendations shown for the very first
time? To answer this question we analyzed 31,942 research paper recommendations
given to 1,155 students and researchers with the literature management software
Docear. Results indicate that recommendations should only be given once.
Click-through rates for 'fresh', i.e. previously unknown, recommendations are
twice as high as for already known recommendations. Results also show that some
users are 'oblivious'. It frequently happened that users clicked on
recommendations they already knew. In one case the same recommendation was
shown six times to the same user and the user clicked on it each time again.
Overall, around 50% of clicks on reshown recommendations were such
'oblivious-clicks'. Keywords: recommender systems; persistence; re-rating; research paper | |||
| Sponsored vs. Organic (Research Paper) Recommendations and the Impact of Labeling | | BIBAK | Full-Text | 391-395 | |
| Joeran Beel; Stefan Langer; Marcel Genzmehr | |||
| In this paper we show that organic recommendations are preferred over
commercial recommendations even when they point to the same freely downloadable
research papers. Simply the fact that users perceive recommendations as
commercial decreased their willingness to accept them. It is further shown that
the exact labeling of recommendations matters. For instance, recommendations
labeled as 'advertisement' performed worse than those labeled as 'sponsored'.
Similarly, recommendations labeled as 'Free Research Papers' performed better
than those labeled as 'Research Papers'. However, whatever the differences
between the labels were -- the best performing recommendations were those with
no label at all. Keywords: recommender systems; organic search; sponsored search; labeling | |||
| The Impact of Demographics (Age and Gender) and Other User-Characteristics on Evaluating Recommender Systems | | BIBAK | Full-Text | 396-400 | |
| Joeran Beel; Stefan Langer; Andreas Nürnberger; Marcel Genzmehr | |||
| In this paper we show the importance of considering demographics and other
user characteristics when evaluating (research paper) recommender systems. We
analyzed 37,572 recommendations delivered to 1,028 users and found that elderly
users clicked more often on recommendations than younger ones. For instance,
users aged 20-24 achieved click-through rates (CTR) of 2.73% on average
while CTR for users between 50 and 54 years was 9.26%. Gender only had a
marginal impact (CTR males 6.88%; females 6.67%) but other user characteristics
such as whether a user was registered (CTR: 6.95%) or not (4.97%) had a strong
impact. Given these results, we argue that future research articles on
recommender systems should report detailed data on their users to make results
more comparable. Keywords: recommender systems; demographics; evaluation; research paper | |||
| PoliMedia | | BIBAK | Full-Text | 401-404 | |
| Max Kemman; Martijn Kleppe | |||
| Analysing media coverage across several types of media-outlets is a
challenging task for academic researchers. The PoliMedia project aimed to
showcase the potential of cross-media analysis by linking the digitised
transcriptions of the debates at the Dutch Parliament (Dutch Hansard) with
three media-outlets: 1) newspapers in their original layout of the historical
newspaper archive at the National Library, 2) radio bulletins of the Dutch
National Press Agency (ANP) and 3) newscasts and current affairs programs from
the Netherlands Institute for Sound and Vision. In this paper we describe
generally how these links were created and we introduce the PoliMedia search
user interface developed for scholars to navigate the links. In our evaluation
we found that the linking algorithm had a recall of 67% and precision of 75%.
Moreover, in an eye tracking evaluation we found that the interface enabled
scholars to perform known-item and exploratory searches for qualitative
analysis. Keywords: political communication; parliamentary debates; newspapers; radio bulletins;
television; cross-media analysis; semantic web; information retrieval | |||
| Eye Tracking the Use of a Collapsible Facets Panel in a Search Interface | | BIBAK | Full-Text | 405-408 | |
| Max Kemman; Martijn Kleppe; Jim Maarseveen | |||
| Facets can provide an interesting functionality in digital libraries.
However, while some research shows facets are important, other research found
facets are only moderately used. Therefore, in this exploratory study we
compare two search interfaces; one where the facets panel is always visible and
one where the facets panel is hidden by default. Our main research question is
"Is folding the facets panel in a digital library search interface beneficial
to academic users?" By performing an eye tracking study with N=24, we measured
search efficiency, distribution of attention and user satisfaction. We found no
significant differences in the eye tracking data nor in usability feedback and
conclude that collapsing facets is neither beneficial nor detrimental. Keywords: eye tracking; facets; information retrieval; usability; user studies;
digital library; user behaviour; search user interface | |||
| Efficient Access to Emulation-as-a-Service -- Challenges and Requirements | | BIBA | Full-Text | 409-412 | |
| Dirk von Suchodoletz; Klaus Rechert | |||
| The shift of the usually non-trivial task of emulation of obsolete software environments from the end user to specialized providers through Emulation-as-a-Service (EaaS) helps to simplify digital preservation and access strategies. End users interact with emulators remotely through standardized (web-)clients on their various devices. Besides offering relevant advantages, EaaS makes emulation a networked service introducing new challenges like remote rendering, stream synchronization and real time requirements. Various objectives, like fidelity, performance or authenticity can be required depending on the actual purpose and user expectations. Various original environments and complex artefacts have different needs regarding expedient and/or authentic performance. | |||
| RDivF: Diversifying Keyword Search on RDF Graphs | | BIBAK | Full-Text | 413-416 | |
| Nikos Bikakis; Giorgos Giannopoulos; John Liagouris; Dimitrios Skoutas; Theodore Dalamagas; Timos Sellis | |||
| In this paper, we outline our ongoing work on diversifying keyword search
results on RDF data. Given a keyword query over an RDF graph, we define the
problem of diversifying the search results and we present diversification
criteria that take into consideration both the content and the structure of the
results, as well as the underlying RDF/S-OWL schema. Keywords: Linked Data; Semantic Web; Web of Data; Structured Data | |||
| Evolution of eBooks on Demand Web-Based Service: A Perspective through Surveys | | BIBAK | Full-Text | 417-420 | |
| Õnne Mets; Silvia Gstrein; Veronika Gründhammer | |||
| In 2007 a document delivery service eBooks on Demand (EOD) was launched by
13 libraries from 8 European countries. It enables users to request
digitisation of public domain books. By 2013 the self-sustained network has
enlarged to 35 libraries in 12 countries and generated thousands of PDF
e-books. Several surveys have been carried out to design the service to be
relevant and attractive for end-users and libraries. The current paper explores
the EOD service through a retrospective overview of the surveys, describes the
status quo including ongoing improvements and suggests further surveys. The
focus of the surveys illustrates the benchmarks (such as user groups and their
expectations, evaluation of the service environment and form of outcomes,
business to business opportunities and professional networking) that have been
achieved to run an effective library service. It aims to be a possible model
for libraries to start and develop a service. Keywords: user surveys; evaluation; library services; digital library services;
digitisation on demand; online environments; ebooks | |||
| Embedding Impact into the Development of Digital Collections: Rhyfel Byd 1914-1918 a'r Profiad Cymreig / Welsh Experience of the First World War 1914-1918 | | BIBA | Full-Text | 421-424 | |
| Lorna Hughes | |||
| This poster describes a mass digitisation project led by the National Library of Wales to digitize archives and special collections about the Welsh experience of the First World War. The digital archive that will be created by the project will be a cohesive, digitally reunified archive that has value for research, education, and public engagement in time for the hundredth anniversary of the start of the First World War. In order to maximize impact of the digital outputs of the project, it has actively sought to embed methods that will increase its value to the widest audience. This paper describes these approaches and how they sit within the digital life cycle of project development. | |||
| Creating a Repository of Community Memory in a 3D Virtual World: Experiences and Challenges | | BIBAK | Full-Text | 425-428 | |
| Ekaterina Prasolova-Førland; Mikhail Fominykh; Leif Martin Hokstad | |||
| In this paper, we focus on creation of 3D content in learning communities,
exemplified with a Virtual Gallery and Virtual Research Arena projects in the
virtual campus of our university in Second Life. Based on our experiences, we
discuss the possibilities and challenges of creating a repository of community
memory in 3D virtual worlds. Keywords: repository of community memory; learning communities; 3D virtual worlds | |||
| Social Navigation Support for Groups in a Community-Based Educational Portal | | BIBAK | Full-Text | 429-433 | |
| Peter Brusilovsky; Yiling Lin; Chirayu Wongchokprasitti; Scott Britell; Lois M. L. Delcambre; Richard Furuta; Kartheek Chiluka; Lillian N. Cassel; Ed Fox | |||
| This work seeks to enhance a user's experience in a digital library using
group-based social navigation. Ensemble is a portal focusing on computing
education as part of the US National Science Digital Library providing access
to a large amount of learning materials and resources for education in Science,
Technology, Engineering and Mathematics. With so many resources and so many
contributing groups, we are seeking an effective way to guide users to find the
right resource(s) by using group-based social navigation. This poster
demonstrates how group-based social navigation can be used to extend digital
library portals and how it can be used to guide portal users to valuable
resources. Keywords: social navigation; digital library; portal; navigation support | |||
| Evaluation of Preserved Scientific Processes | | BIBA | Full-Text | 434-437 | |
| Rudolf Mayer; Mark Guttenbrunner; Andreas Rauber | |||
| Digital preservation research has seen an increased focus on objects that are non-deterministic but depend on external events like user input or data from external sources. Among those is the preservation of scientific processes, aiming at reuse of research outputs. Ensuring that the preserved object is equivalent to the original is a key concern, and is traditionally measured by comparing significant properties of the objects. We adapt a framework for comparing emulated versions of a digital object to measure equivalence also in processes. | |||
| An Open Source System Architecture for Digital Geolinguistic Linked Open Data | | BIBA | Full-Text | 438-441 | |
| Emanuele Di Buccio; Giorgio Maria Di Nunzio; Gianmaria Silvello | |||
| Digital Geolinguistic systems encourage collaboration between linguists, historians, archaeologists, ethnographers, as they explore the relationship between language and cultural adaptation and change. These systems can be used as instructional tools, presenting complex data and relationships in a way accessible to all educational levels. In this poster, we present a system architecture based on a LOD approach the aim of which is to increase the level of interoperability of geolinguistic applications and the reuse of the data. | |||
| Committee-Based Active Learning for Dependency Parsing | | BIBAK | Full-Text | 442-445 | |
| Saeed Majidi; Gregory Crane | |||
| Annotations on structured corpora provide a foundational instrument for
emerging linguistic research. To generate annotations automatically,
data-driven dependency parsers need a large annotated corpus to learn from. But
these annotations are expensive and labor-intensive to collect.
In order to reduce the costs of annotation, we provide a novel framework in
which a committee of dependency parsers collaborate to improve their efficiency
using active learning. Keywords: active learning; corpus annotation; dependency parsing | |||
| PoliticalMashup Ngramviewer | | BIBA | Full-Text | 446-449 | |
| Bart de Goede; Justin van Wees; Maarten Marx; Ridho Reinanda | |||
| The PoliticalMashup Ngramviewer is an application that allows a user to visualise the use of terms and phrases in the "Tweede Kamer" (the Dutch parliament). Inspired by the Google Books Ngramviewer, the PoliticalMashup Ngramviewer additionally allows for faceting on politicians and parties, providing a more detailed insight into the use of certain terms and phrases by politicians and parties with different points of view. | |||
| Monitrix -- A Monitoring and Reporting Dashboard for Web Archivists | | BIBAK | Full-Text | 450-453 | |
| Rainer Simon; Andrew Jackson | |||
| This demonstration paper introduces Monitrix, an upcoming monitoring and
reporting tool for Web archivists. Monitrix works in conjunction with the
Heritrix 3 Open Source Web crawler and provides real-time analytics about an
ongoing crawl, as well as summary information aggregated about crawled hosts
and URLs. In addition, Monitrix monitors the crawl for the occurrence of
suspicious patterns that may indicate undesirable behavior, such as crawler
traps or blocking hosts. Monitrix is developed as a cooperation between the
British Library's UK Web Archive and the Austrian Institute of Technology, and
is licensed under the terms of the Apache 2 Open Source license. Keywords: Web Archiving; Quality Assurance; Analytics | |||
| SpringerReference: A Scientific Online Publishing System | | BIBAK | Full-Text | 454-457 | |
| Sebastian Lindner; Christian Simon; Daniel Wieth | |||
| This paper presents an online publishing system with focus on scientific
peer reviewed content. The goal is to provide authors and editors with a
platform to constantly publish and update content well in advance of their
print editions across every subject. The techniques in this paper show some of
the main components of the implemented document lifecycle. These include a
custom document workflow to cope with HTML- and file-based content, an online
editing platform including LaTeX formula generation, automatic link insertion
between different documents, the generation of auto suggests to simplify search
and navigation and a Solr-based search engine. Keywords: digital library; semi structured data; dynamic scientific content; data
mining; document workflow; collaboration | |||
| Data Searchery | | BIBAK | Full-Text | 458-461 | |
| Paolo Manghi; Andrea Mannocci | |||
| The novel e-Science's data-centric paradigm has proved that interlinking
publications and research data objects coming from different realms and data
sources (e.g. publication repositories, data repositories) makes dissemination,
re-use, and validation of research activities more effective. Scholarly
Communication Infrastructures are advocated for bridging such data sources, by
offering tools for identification, creation, and navigation of relationships.
Since realization and maintenance of such infrastructures is expensive, in this
demo we propose a lightweight approach for "preliminary analysis of data source
interlinking" to help practitioners evaluate whether and to what extent
realizing them can be effective. We present Data Searchery, a configurable tool
enabling users to easily plug in data sources from different realms with the
purpose of cross-relating their objects, be they publications or research data,
by identifying relationships between their metadata descriptions. Keywords: Interoperability; Interlinking; Research Data; Publications | |||
| PATHSenrich: A Web Service Prototype for Automatic Cultural Heritage Item Enrichment | | BIBA | Full-Text | 462-465 | |
| Eneko Agirre; Ander Barrena; Kike Fernandez; Esther Miranda; Arantxa Otegi; Aitor Soroa | |||
| Large amounts of cultural heritage material are nowadays available through online digital library portals. Most of these cultural items have short descriptions and lack rich contextual information. The PATHS project has developed experimental enrichment services. As a proof of concept, this paper presents a web service prototype which allows independent content providers to enrich cultural heritage items with a subset of the full functionality: links to related items in the collection and links to related Wikipedia articles. In the future we plan to provide more advanced functionality, as available offline for PATHS. | |||
| Leveraging Domain Specificity to Improve Findability in OER Repositories | | BIBAK | Full-Text | 466-469 | |
| Darina Dicheva; Christo Dichev | |||
| This paper addresses the problem of improving the findability of open
educational resources (OER) in Computer Science. It presents a domain-specific
OER reference repository and portal aimed at increasing the low OER use. The
focus is on enhancing the search and navigation capabilities. A distinctive
feature is the proposed query-by-navigation method. Keywords: Open Educational Resources; Information Retrieval; Search | |||
| VirDO: A Virtual Workspace for Research Documents | | BIBAK | Full-Text | 470-473 | |
| George E. Raptis; Christina P. Katsini; Stephen J. Payne | |||
| We report the design of a system which integrates a suite of tools to allow
scholars to manage related documents in their personal digital stores. VirDO
provides a virtual workspace in which pdfs can be placed and displayed, and
which allows these documents to be manipulated in various ways that prior
literature suggests to be useful. Particularly noteworthy are the various maps
that support users in uncovering the inter-relations among documents in the
workspace, including citation relations and flexible user-defined tags. Early
evaluation of the system was positive: especially promising was the increasing
use of maps by two participants who used VirDO for their own research over a
period of a week, as well as the extensive use by all participants of sticky
notes. Keywords: sensemaking; document mapping; annotation; scholarship | |||
| Domain Search and Exploration with Meta-Indexes | | BIBAK | Full-Text | 474-477 | |
| Michael Huggett; Edie Rasmussen | |||
| In order to facilitate navigation and search of large collections of digital
books, we have developed a new knowledge structure, the meta-index, which
aggregates the back-of-book indexes within a subject domain. Using a test
collection of digital books, we demonstrate the use of the meta-index and
associated metrics that characterize the books within a digital domain, and
explore some of the challenges presented by the meta-index structure. Keywords: Indexes; Meta-indexes; Bibliometrics; Visualization; Search; User interfaces | |||
| COST Actions and Digital Libraries: Between Sustaining Best Practices and Unleashing Further Potential | | BIBA | Full-Text | 478-479 | |
| Matthew J. Driscoll; Ralph Stübner; Touradj Ebrahimi; Muriel Foulonneau; Andreas Nürnberger; Andrea Scharnhorst; Joie Springer | |||
| The panel brings together chairs or key participants from a number of COST-funded Actions from several domains -- Individuals, Societies, Cultures and Health (ISCH), Information and Communication Technologies (ICT) -- as well as a Trans-Domain Action and a Science Officer from the COST Office, and will be complemented by a representative of the Memory of the World programme of UNESCO. | |||
| e-Infrastructures for Digital Libraries...the Future | | BIBA | Full-Text | 480-481 | |
| Wim Jansen; Roberto Barbera; Michel Drescher; Antonella Fresa; Matthias Hemmje; Yannis Ioannidis; Norbert Meyer; Nick Poole; Peter Stanchev | |||
| The digital ICT revolution is profoundly changing the way knowledge is created, communicated and deployed. New research methods based on computing and "big data" enable new means and forms of scientific collaboration, including through policy measures supporting open access to data and research results. The exponential growth of digital resources and services is supported by the deployment of e-Infrastructure, which allows researchers to access remote facilities, run complex simulations, and manage and exchange unprecedented amounts of digital data. | |||
| The Role of XSLT in Digital Libraries, Editions, and Cultural Exhibits | | BIBA | Full-Text | 482-483 | |
| Laura Mandell; Violeta Ilik | |||
| We offer a half-day tutorial that will explore the role of XML and XSLT (eXtensible Stylesheet Language Transformations, themselves XML documents) in digital library and digital humanities projects. Digital libraries ideally aim to provide both access and interaction. Digital libraries and digital humanities projects should foster edition building and curation. Therefore, this tutorial aims to teach librarians, scholars, and those involved in cultural heritage projects a scripting language that allows for easy manipulation of metadata, pictures, and text. The modules in this tutorial will help participants in planning for their own organizations' digital efforts and scholarly communications as well as in facilitating their efforts at digitization and creating interoperability between document editions. In five instructional modules, including hands-on exercises, we will help participants gain experience and knowledge of the possibilities that XSLT offers in transforming documents from XML to HTML, from XML to text, and from one metadata schema to another. | |||
| Mapping Cross-Domain Metadata to the Europeana Data Model (EDM) | | BIBAK | Full-Text | 484-485 | |
| Valentine Charles; Antoine Isaac; Vassilis Tzouvaras; Steffen Hennicke | |||
| With the growing number and diversity of aggregation services for
cultural heritage, the challenge of data mapping has become crucial. Keywords: Interoperability; EDM; mapping; MINT | |||
| State-of-the-Art Tools for Text Digitisation | | BIBAK | Full-Text | 486-487 | |
| Bob Boelhouwer; Adam Dudczak; Sebastian Kirch | |||
| The goal of this tutorial (organised by the Succeed project) is to introduce
participants to state-of-the-art tools in digitisation and text processing
which have been developed in recent research projects. The tutorial will focus
on hands-on demonstration and on the testing of the tools in real-life
situations, including those provided by the participants. Keywords: Digitisation; OCR; Image Enhancement; Enrichment; Lexicon; Ground Truth; NLP | |||
| ResourceSync: The NISO/OAI Resource Synchronization Framework | | BIBA | Full-Text | 488-489 | |
| Herbert Van de Sompel; Michael L. Nelson; Martin Klein; Robert Sanderson | |||
| This tutorial provides an overview and a practical introduction to ResourceSync, a web-based synchronization framework consisting of multiple modular capabilities that a server can selectively implement to enable third-party systems to remain synchronized with the server's evolving resources. The tutorial motivates the ResourceSync approach by outlining several synchronization use cases including scholarly article repositories, OAI-PMH repositories, linked data knowledge bases, as well as content aggregators. It details the concepts of the ResourceSync capabilities, their discovery mechanisms, and their serialization based on the widely adopted Sitemap protocol. The tutorial further hints at the extensibility of the synchronization framework, for example, for scenarios that provide references to mirror locations of synchronization resources, transfer partial content, or offer historical data. | |||
| From Preserving Data to Preserving Research: Curation of Process and Context | | BIBA | Full-Text | 490-491 | |
| Rudolf Mayer; Stefan Pröll; Andreas Rauber; Raul Palma; Daniel Garijo | |||
| In the domain of eScience, investigations are increasingly collaborative.
Most scientific and engineering domains benefit from building on top of the
outputs of other research, by sharing information to reason over and data to
incorporate in the modelling task at hand.
This raises the need to provide means for preserving and sharing entire eScience workflows and processes for later reuse. Doing so requires defining which information is to be collected, and creating means to preserve it and approaches to enable and validate the re-execution of a preserved process. The task includes, and goes beyond, preserving the data used in the experiments, as the process underlying its creation and use is essential. This tutorial thus provides an introduction to the problem domain and discusses solutions for the curation of eScience processes. | |||
| Linked Data for Digital Libraries | | BIBA | Full-Text | 492-493 | |
| Uldis Bojārs; Nuno Lopes; Jodi Schneider | |||
| This tutorial will empower attendees with the necessary skills to take advantage of Linked Data already available on the Web, provide insights on how to incorporate this data and tools into their daily workflow, and finally touch upon how the attendees' own data can be shared as Linked Data. | |||