| Improving Placeholders in Digital Documents | | BIBAK | Full-Text | 1-12 | |
| George Buchanan; Jennifer Pearson | |||
| Placeholders in physical documents provide critical support for the human
reader in relocating material and their place in the text. However, the
equivalent tools in digital documents have long been identified as suffering
from unintuitive interactions and low rates of use. This paper evaluates the
current bookmarking technologies found in digital document readers, and
identifies a number of specific and significant shortcomings in their support
for user activity. We introduce some simple interactions that close the gap
between user requirements and the placeholder support in a simple document
reader program. Through this, we demonstrate that improved interactions can be
created that reduce the barriers that inhibit placeholder use in digital
documents.
Keywords: Digital Libraries; Interaction Design; Document Triage | |||
| Towards Ontology-Based Chinese E-Government Digital Archives Knowledge Management | | BIBAK | Full-Text | 13-24 | |
| Ying Jiang; Hui Dong | |||
| This paper focuses on the problem of e-Government digital archives
management in China. It first describes the background of e-Government progress
in China, and then points out the knowledge utilization challenge of
e-Government digital archives. It then introduces a project that aims at making
digital archives in a provincial archives bureau easy to use for civil servants.
The main approach of this project is ontology-related technology, including the
building of a knowledge base and the realization of a knowledge retrieval system.
In effect, it is a knowledge management solution for digital archives.
Keywords: Digital Archives; Chinese E-Government; Ontology; Ontology Molecule;
Knowledge Management | |||
| Distributed Preservation Services: Integrating Planning and Actions | | BIBA | Full-Text | 25-36 | |
| Christoph Becker; Miguel Ferreira; Michael Kraxner; Andreas Rauber; Ana Alice Baptista; José Carlos Ramalho | |||
| Digital preservation has turned into an active field of research. The most
prominent approaches today are migration and emulation; especially considering
migration, a range of working tools is available, each with specific strengths
and weaknesses. The decision process on which actions to take to preserve a
given set of digital objects for future access, i.e., preservation planning, is
usually an ad-hoc procedure with little tool support and even less support for
automation.
This paper presents the integration of tools and services for object migration and characterization through a service oriented architecture into a planning tool called Plato, thus creating a distributed and highly automated preservation planning environment. | |||
| Archive Design Based on Planets Inspired Logical Object Model | | BIBA | Full-Text | 37-40 | |
| Eld Zierau; Anders Sewerin Johansen | |||
| This paper describes a proposal for a logical data model based on preliminary work within the Planets project. In OAIS terms the main areas discussed are related to the introduction of a logical data model for representing the past, present and future versions of the digital object associated with the Archival Storage Package for the publications deposited by our client repositories. | |||
| Significant Characteristics to Abstract Content: Long Term Preservation of Information | | BIBAK | Full-Text | 41-49 | |
| Manfred Thaller; Volker Heydegger; Jan Schnasse; Sebastian Beyl; Elona Chudobkaite | |||
| The (automatic) extraction of significant characteristics of files is an
important feature of all long term preservation activities. We propose,
however, that the necessary automatic evaluation of the outcomes of certain
preservation actions -- notably migration -- requires an approach that
follows other traditions in the abstraction of format descriptions. To
implement a strategy for the automatic evaluation of various actions within a
preservation environment, we define two formal, XML-based languages: one
for defining the content of a specific file, the other for describing a file
format in such a way that it can be handled by multi-purpose software.
Keywords: File characteristics; format definition languages; data abstraction; long
term preservation | |||
| Can Social Tags Help You Find What You Want? | | BIBAK | Full-Text | 50-61 | |
| Khasfariyati Razikin; Dion Hoe-Lian Goh; Alton Yeow-Kuan Chua; Chei Sian Lee | |||
| One of the uses of social tagging is to associate freely selected terms
(tags) to resources for sharing resources among tag consumers. This enables tag
consumers to locate new resources through the collective intelligence of other
tag creators, and offers a new avenue for resource discovery. This paper
investigates the effectiveness of tags as resource descriptors determined
through the use of text categorisation using Support Vector Machines. Two text
categorisation experiments were done for this research, and tags and web pages
from del.icio.us were used. The first study concentrated on the use of terms as
its features. The second study used both terms and tags as part of its
feature set. The results indicate that the tags were not always reliable
indicators of the resource contents. At the same time, the results from the
terms-only experiment were better than those from the experiment with terms and
tags. A deeper analysis of a sample of tags and documents was also conducted
and implications of this research are discussed.
Keywords: Social tagging; Resource Descriptors; Resource Discovery; Support Vector
Machines | |||
| TagNSearch: Searching and Navigating Geo-referenced Collections of Photographs | | BIBAK | Full-Text | 62-73 | |
| Quang Minh Nguyen; Thi Nhu Quynh Kim; Dion Hoe-Lian Goh; Yin Leng Theng; Ee-Peng Lim; Aixin Sun; Chew-Hung Chang; Kalyani Chatterjea | |||
| TagNSearch is a map-based tool for searching and browsing geo-tagged
photographs based on their associated tags. Using Flickr as the dataset,
TagNSearch returns, for a given query, photographs clustered by locations, and
summarizes each cluster of photographs by cluster-specific tags. A map-based
interface is also provided to help users better search, navigate and browse
photographs and their clusters. A qualitative evaluation comparing TagNSearch
and an existing tag search support in Flickr was also conducted. The task
involved finding locations associated with a set of photographs. Participants
were found to perform this task better using TagNSearch than Flickr.
Keywords: Social tagging; TagNSearch; clustering; Flickr; geo-tagged photographs | |||
| Evaluation of Semantic and Social Technologies for Digital Libraries | | BIBA | Full-Text | 74-77 | |
| Sebastian Ryszard Kruk; Ewelina Kruk; Katarzyna Stankiewicz | |||
| Libraries are the tools we use to learn and to answer our questions. The quality of our work depends, among other things, on the quality of the tools we use. Semantic web and social networking technologies have recently been introduced to the digital libraries domain. In this article we present the results of an evaluation of social and semantic end-user information discovery services for digital libraries. | |||
| Identifying Quotations in Reference Works and Primary Materials | | BIBAK | Full-Text | 78-87 | |
| Andrea Ernst-Gerlach; Gregory Crane | |||
| Identifying quotations from reference works in primary materials is a very
important feature for digital libraries. By adding corresponding citation links
to the original text, we can help contextualize the source material. In this
paper we introduce an algorithm for identifying citations automatically based
on an analysis of the structure of quotations from three different reference
works of Latin texts. An evaluation shows that this approach is capable of
finding a large number of quotations with which no machine actionable citations
are associated. Additionally this approach can be applied for quotations that
have been altered in a range of ways from their source.
Keywords: citations; reference works | |||
| Superimposed Information Architecture for Digital Libraries | | BIBA | Full-Text | 88-99 | |
| David W. Archer; Lois M. L. Delcambre; Fabio Corubolo; Lillian N. Cassel; Susan Price; Uma Murthy; David Maier; Edward A. Fox; Sudarshan Murthy; John McCall; Kiran Kuchibhotla; Rahul Suryavanshi | |||
| A variety of software tools commonly used in research and industry allow a user to select (usually contiguous) segments of content to be annotated, referenced, or otherwise distinguished from a containing document. However, digital libraries (DLs) often curate only full documents, not these selected sub-documents. Thus, sub-documents in a DL may not have the full complement of metadata, and they may not be visible using DL browse and search facilities. We are interested in explicit representation of sub-documents in a DL environment. In this paper, we show how sub-documents may be represented and curated. We focus on the explicit representation of what we call a mark -- an encapsulated address of a sub-document along with associated context. Our contributions are: a software architecture for representing marks as first-class objects together with regular documents in a DL; and an implementation of our architecture using existing software packages with modest enhancements. This approach provides new capabilities for the DL with minimal modification to tools and interfaces familiar to the DL user. | |||
| Impact-ED -- A New Model of Digital Library Impact Evaluation | | BIBAK | Full-Text | 100-105 | |
| Gemma Madle; Patty Kostkova; Abdul V. Roudsari | |||
| This paper presents Impact-ED, a new model for digital library impact
evaluation. The model draws on assumptions from the Theory of Planned Behaviour
and the Sense-Making Model. The paper discusses the current shortfalls of
digital library impact evaluation and presents an alternative. Knowledge and
attitude are put forward as potential measures of impact and different methods
are triangulated and data linked to provide a comprehensive picture of the
impact of the library at the time of use. The model shows how the digital
library is being used to benefit users in their work, how it is changing their
knowledge and attitudes and how the information found is used in real-time in
the real world. It is being tested in the healthcare domain on the National
Resource for Infection Control (www.nric.org.uk) but is expected to be
transferable to other domains as further work will prove.
Keywords: Digital Library Evaluation; Sense-making; Knowledge and Attitudes; Impact
Evaluation | |||
| Prioritisation, Resources and Search Terms: A Study of Decision-Making at the Virtual Reference Desk | | BIBAK | Full-Text | 106-116 | |
| Simon Attfield; Stephann Makri; James Kalbach; Ann Blandford; Stephen De Gabrielle; Mark Edwards | |||
| The reinterpretation of the traditional reference service in an online
context is the virtual reference desk. Placing reference services into an
online setting, however, presents many challenges. We report a study and
analytic framework which addresses support for decision-making during virtual
enquiry work. Focusing on specialist law-libraries, the study shows that
enquirers do not volunteer important information to the service and that
asynchronous communication media and some social obstacles present barriers to
prompting. Also, previous enquiries are frequently used to inform current
enquiry strategies but barriers exist in accessing this information. We
conclude that email is an inadequate medium for supporting virtual reference
services, and that systems should support automatic, speculative matching
between new enquiry content and integrated enquiry knowledge bases. The
contribution of the framework is to offer a structured approach to evaluation
in multiple virtual reference contexts and enable rapid convergence on barriers
to efficient and effective service.
Keywords: virtual reference service; evaluation; collaborative information access | |||
| Searchling: User-Centered Evaluation of a Visual Thesaurus-Enhanced Interface for Bilingual Digital Libraries | | BIBAK | Full-Text | 117-121 | |
| Amy Stafford; Ali Shiri; Stan Ruecker; Matthew Bouchard; Paras Mehta; Karl Anvik; Ximena Rossello | |||
| In this paper, we describe a qualitative user study of Searchling -- an
experimental visual interface that allows users to leverage a bilingual
thesaurus for query formulation and enhancement. The design of Searchling is
based on theories of thesaurus-based interface design from Shiri et al. [1],
combined with the principles of rich-prospect browsing [2]. The Searchling
interface provides the user with three working spaces on one screen: the
Thesaurus space, Query space, and Document space. We interviewed 15 graduate
and faculty researchers at the University of Alberta, who carried out three
structured tasks in a think-aloud protocol, with simultaneous audio recording
and screen capture. These participants identified a number of significant
advantages to the researcher provided by Searchling, including the value of
having an interface that could help with identifying search terms, suggesting
preferred terms, and giving bilingual search support. They also suggested areas
for future improvement, primarily related to our assumption that common
knowledge of thesauri would be sufficient to make the various features clear if
they were described using standard vocabulary from the thesaurus field.
Keywords: Visual Interfaces; Thesauri; Multilingual Digital Libraries; Information
Retrieval; User Evaluation | |||
| An Extensible Virtual Digital Libraries Generator | | BIBA | Full-Text | 122-134 | |
| Massimiliano Assante; Leonardo Candela; Donatella Castelli; Luca Frosini; Lucio Lelii; Paolo Manghi; Andrea Manzi; Pasquale Pagano; Manuele Simi | |||
| In this paper we describe the design and implementation of the VDL Generator, a tool to simplify and automate the Digital Library development process. In particular, we discuss how our approach to the realisation of this tool simplifies the task of implementing, extending and modifying such a fundamental component. This tool models its task as a generic search problem that can easily be adapted to different application scenarios. In particular, to guarantee its extensibility we carefully identify, isolate and organise the VDL Generator constituents, i.e. (i) the set of logical components that can be used when designing a Digital Library, (ii) the set of physical components that by implementing the logical components contribute to implementing the Digital Library and (iii) the search strategy exploited to accomplish the generation task. Furthermore, we report on the experience gained in implementing and exploiting such an innovative service in the context of the EU-funded Diligent project and discuss future plans for its consolidation. | |||
| A Participative Digital Archiving Approach to University History and Memory | | BIBAK | Full-Text | 135-147 | |
| Jyishane Liu | |||
| As digital archiving is heading into the next level of development and
influence, we must consider the need to connect digital archiving with more
people and more resources to enhance the continuing effort. In this paper, we
address the issue of engaging users in the digital archiving task and forming a
community of collective content creation. We propose a conceptual architecture
for participative digital archiving and report a pilot project to redesign and
reconstruct the archiving process of a university history. It also serves the
purpose of showcasing archived content and providing reminiscence of university
life for all university members.
Keywords: Digital Archiving; Web 2.0; User Participation; Collective Memory | |||
| Enhancing Library Services with Web 2.0 Functionalities | | BIBAK | Full-Text | 148-159 | |
| Dimitris Gavrilis; Constantia Kakali; Christos Papatheodorou | |||
| In this paper, a prototype of an Online Public Access Catalog (OPAC) is
presented. This new OPAC features new functionalities and utilizes web 2.0
technologies in order to deliver improved search and retrieval services. Some
of these new services include social tag annotations, user opinions and ranks
and tag-based similarity searches. The prototype is evaluated by a user group
through questionnaires, interviews and with the system's integrated logging
mechanism. The results are encouraging and suggest that Library 2.0
technologies are acceptable to the majority of the users.
Keywords: Web 2.0; social tagging; subject representation; OPAC; evaluation | |||
| A Service-Oriented Infrastructure for Early Citation Management | | BIBAK | Full-Text | 160-171 | |
| José Hilario Canós; Manuel Llavador; Eduardo Mena; Marcos R. S. Borges | |||
| Citation analysis needs an in-depth transformation. Current systems have
long been criticized for defects such as lack of coverage and low accuracy
of the citation data. Surprisingly, incorrect or incomplete data are used to
make important decisions about researchers' careers. We argue that a new
approach based on the collection of citation data when they are actually
generated (that is, during the editing of papers) can overcome current
limitations, and propose a new framework in which the research community as a
whole is the owner as well as beneficiary of a Global Citation Registry
characterized by high quality citation data. The registry will be accessible
to all interested parties and will be the source to which the different
impact models can be applied.
Keywords: Citation management; Service-Oriented Architecture | |||
| Releasing the Power of Digital Metadata: Examining Large Networks of Co-related Publications | | BIBA | Full-Text | 172-184 | |
| David Tarrant; Les Carr; Terry R. Payne | |||
| Bibliographic metadata plays a key role in scientific literature, not only to summarise and establish the facts of the publication record, but also to track citations between publications and hence to establish the impact of individual articles within the literature. Commercial secondary publishers have typically taken on the role of rekeying, mining and analysing this huge corpus of linked data, but as the primary literature has moved to the world of the digital repository, this task is now undertaken by new services such as Citeseer, Citebase or Google Scholar. As institutional and subject-based repositories proliferate and Open Access mandates increase, more of the literature will become openly available in well managed data islands containing a much greater amount of detailed bibliometric metadata in formats such as RDF. Through the use of efficient extraction and inference techniques, complex relations between data items can be established. In this paper we explain the importance of the co-relation in enabling new techniques to rate the impact of a paper or author within a large corpus of publications. | |||
| Author Name Disambiguation for Citations Using Topic and Web Correlation | | BIBAK | Full-Text | 185-196 | |
| Kai-Hsiang Yang; Hsin-Tsung Peng; Jian-Yi Jiang; Hahn-Ming Lee; Jan-Ming Ho | |||
| Today, bibliographic digital libraries play an important role in helping
members of the academic community search for novel research. In particular, author
disambiguation for citations is a major problem during the data integration and
cleaning process, since author names are usually very ambiguous. To solve
this problem, we propose two kinds of correlations between citations, namely,
Topic Correlation and Web Correlation, to exploit relationships between
citations, in order to identify whether two citations with the same author name
refer to the same individual. The topic correlation measures the similarity
between research topics of two citations, while the Web correlation measures
the number of co-occurrences in web pages. We employ a pair-wise grouping
algorithm to group citations into clusters. The results of experiments show
that the disambiguation accuracy improves greatly when using topic
correlation and Web correlation, and that Web correlation provides stronger
evidence about the authors of citations.
Keywords: Citation clustering; Citation analysis; Author disambiguation | |||
| Development of a National Syllabus Repository for Higher Education in Ireland | | BIBA | Full-Text | 197-208 | |
| Arash Joorabchi; Abdulhussain E. Mahdi | |||
| With the significant growth in electronic education materials such as syllabus documents and lecture notes available on the Internet and intranets, there is a need for developing structured central repositories of such materials to allow both educators and learners to easily share, search and access them. This paper reports on our on-going work to develop a national repository for course syllabi in Ireland. Specifically, it describes a prototype syllabus repository system for higher education in Ireland that has been developed by utilising a number of information extraction and document classification techniques, including a new fully unsupervised document classification method that uses a web search engine for automatic collection of a training set for the classification algorithm. Preliminary experimental results for evaluating the system's performance are presented and discussed. | |||
| Matching Hierarchies Using Shared Objects | | BIBAK | Full-Text | 209-220 | |
| Robert Ikeda; Kai Zhao; Hector Garcia-Molina | |||
| One of the main challenges in integrating two hierarchies (e.g., of books or
web pages) is determining the correspondence between the edges of each
hierarchy. Traditionally, this process, which we call hierarchy matching, is
done by comparing the text associated with each edge. In this paper we instead
use the placement of objects present in both hierarchies to infer how the
hierarchies relate. We present two algorithms that, given a hierarchy with
known facets (attribute-value pairs that define what objects are placed under
an edge), determine feasible facets for a second hierarchy, based on shared
objects. One algorithm is rule-based and the other is statistics-based. In the
experimental section, we compare the results of the two algorithms, and see how
their performances vary based on the amount of noise in the hierarchies.
Keywords: data integration; mapping | |||
| Virtual Unification of the Earliest Christian Bible: Digitisation, Transcription, Translation and Physical Description of the Codex Sinaiticus | | BIBA | Full-Text | 221-226 | |
| Zeki Mustafa Dogan; Alfred Scharsky | |||
| This paper describes the deployment of innovative digitisation methods and new web technologies to reunify the oldest Bible -- the Codex Sinaiticus -- and to make it available to a wider public. The conception of the website development began in late 2006, and the first stage of the development will allow free access to the website of this eminent part of the cultural heritage in 2008, which has only been possible through the close collaboration between international partners. | |||
| Sustainable Digital Library Systems over the DRIVER Repository Infrastructure | | BIBA | Full-Text | 227-231 | |
| Michele Artini; Leonardo Candela; Donatella Castelli; Paolo Manghi; Marko Mikulicic; Pasquale Pagano | |||
| The DRIVER Infrastructure is an e-infrastructure providing an environment where organizations find the tools to aggregate heterogeneous content sources into uniform shared Information Spaces and then build and customize their Digital Library Systems to operate over them. In this paper, we shall show the benefits for organizations embracing the infrastructural approach by presenting the DRIVER infrastructure, its current status of maintenance, its participating organizations, and the first two systems built on top of its Information Space. | |||
| Interactive Paper as a Reading Medium in Digital Libraries | | BIBA | Full-Text | 232-243 | |
| Moira C. Norrie; Beat Signer; Nadir Weibel | |||
| In digital libraries, much of the reading activity is still done on printed copies of documents. We show how digital pen and paper technologies can be used to support readers by automatically creating interactive paper versions of digital documents during the printing process that enable users to activate embedded hyperlinks to other documents and services from printed versions. The approach uses a special printer driver that allows information about hyperlinks to be extracted and stored at print time. Users can then activate hyperlinks in the printed document with a digital pen. | |||
| Personalizing the Selection of Digital Library Resources to Support Intentional Learning | | BIBAK | Full-Text | 244-255 | |
| Qianyi Gu; Sebastian de la Chica; Faisal Ahmad; Huda J. Khan; Tamara Sumner; James H. Martin; Kirsten R. Butcher | |||
| This paper describes a personalization approach for using online resources
in digital libraries to support intentional learning. Personalized resource
recommendations are made based on what learners currently know and what they
should know within a targeted domain to support their learning process. We use
natural language processing and graph based algorithms to automatically select
online resources to address students' specific conceptual learning needs. An
evaluation of the graph based algorithm indicates that the majority of
recommended resources are highly relevant or relevant for addressing students'
individual knowledge gaps and prior conceptions.
Keywords: Personalization; Information Retrieval; Intentional Learning; Knowledge Map | |||
| Enrichment of European Digital Resources by Federating Regional Digital Libraries in Poland | | BIBAK | Full-Text | 256-259 | |
| Agnieszka Lewandowska; Cezary Mazurek; Marcin Werla | |||
| In this paper we present the PIONIER Network Digital Libraries Federation,
which was founded in June 2007 in Poland. This federation is a single point
of access to the majority of Polish digital resources gathered in regional and
institutional digital libraries. Besides resource aggregation and
promotion, this service also allows for automated coordination of digitization
and PURL resolution of OAI identifiers for objects from Polish digital
libraries. It is also a part of the networked digital library user profile system
enabled recently in the Polish network of distributed digital libraries. During
the development of the PIONIER Network Digital Libraries Federation, extensions
for the OAI-PMH protocol and Shibboleth middleware were made and deployed in order
to achieve the required federation functionality. The PIONIER DLF service is based
on a set of distributed atomic services that together provide its functionality.
Keywords: digital libraries federation; coordination of digitization; metadata
harvesting; atomic services; digital object identifiers resolution; networked
user profile | |||
| Access Modalities to an Imagistic Library for Medical e-Learning | | BIBAK | Full-Text | 260-263 | |
| Liana Stanescu; Dumitru Dan Burdescu; Gabriel Mihai; Cosmin Stoica Spahiu; Anca Ion | |||
| The paper presents the organization of, and the access facilities to, a
multimedia digital library with medical information for electronic learning.
The digital library contains course materials and medical images collected in
the patient diagnosis process. The originality of the paper lies in the
presentation of two access modalities to multimedia information from the
digital library: content-based visual query and semantic query. The
content-based visual query can be performed at the image or region level
using colour and texture characteristics automatically extracted from medical
images when they are loaded into the database. Also, semantic queries against the
multimedia database can be automatically launched with the help of the topic
map based on a part of the MeSH thesaurus, the part that includes the medical
diagnosis names. The student can navigate through the topic map according to
their subject of interest, which is a significant advantage. These access paths
can be combined for retrieving the information of interest. The multimedia
digital library represents a very useful tool for improving medical knowledge,
aimed at students, resident doctors, young specialists or family
doctors.
Keywords: imagistic library; content-based visual query; color feature; texture
feature; topic map; semantic query | |||
| What a Difference a Default Setting Makes | | BIBAK | Full-Text | 264-267 | |
| Te Taka Keegan; Sally Jo Cunningham | |||
| This paper examines the effect of the default interface language on the
usage of a bilingual digital library. In 2005 the default interface language of
a bilingual digital library was alternated on a monthly basis between Maori and
English. A comprehensive transaction log analysis over this period reveals that
not only did usage in a particular language increase when the default interface
language was set to that language but that the way the interface was used, in
both languages, was quite different depending on the default language.
Keywords: Log Analysis; Multi-Language Access | |||
| A Methodology for Sharing Archival Descriptive Metadata in a Distributed Environment | | BIBA | Full-Text | 268-279 | |
| Nicola Ferro; Gianmaria Silvello | |||
| This paper discusses how to exploit widely accepted solutions for interoperation, such as the pairing of OAI-PMH and the DC metadata format, in order to deal with the peculiar features of archival description metadata and allow their sharing. We present a methodology for mapping EAD metadata into DC metadata records without losing information. The methodology exploits DLS technologies, enhancing archival metadata sharing possibilities, and at the same time considers archival needs; furthermore, it opens valuable information resources held by archives to the wider context of the cross-domain interoperation among different cultural heritage institutions. | |||
| Semantic Interoperability in Archaeological Datasets: Data Mapping and Extraction Via the CIDOC CRM | | BIBAK | Full-Text | 280-290 | |
| Ceri Binding; Keith May; Douglas Tudhope | |||
| Findings from a data mapping and extraction exercise undertaken as part of
the STAR project are described and related to recent work in the area. The
exercise was undertaken in conjunction with English Heritage and encompassed
five differently structured relational databases containing various results of
archaeological excavations. The aim of the exercise was to demonstrate the
potential benefits in cross searching data expressed as RDF and conforming to a
common overarching conceptual data structure schema -- the English Heritage
Centre for Archaeology ontological model (CRM-EH), an extension of the CIDOC
Conceptual Reference Model (CRM). A semi-automatic mapping/extraction tool
proved an essential component. The viability of the approach is demonstrated by
web services and a client application on an integrated data and concept
network.
Keywords: knowledge organization systems; mapping; CIDOC CRM; core ontology; semantic
interoperability; semi-automatic mapping tool; thesaurus; terminology services | |||
| Annotations: A Way to Interoperability in DL | | BIBA | Full-Text | 291-295 | |
| Maristella Agosti; Nicola Ferro | |||
| This paper discusses how annotations and interoperability relate to and affect each other in digital library settings. We analyse interoperability and annotations in the light of the evolution of the field of digital libraries and provide recommendations for successful interoperable annotations towards the European Digital Library. | |||
| Semantic Based Substitution of Unsupported Access Points in the Library Meta-search Environments | | BIBA | Full-Text | 296-307 | |
| Michalis Sfakakis; Sarantos Kapidakis | |||
| Meta-searching library communities involve access to sources where metadata are invisible behind query interfaces. Many of the query interfaces utilize predefined abstract Access Points for the implementation of the search services, without any further access to the underlying metadata and query methods. The existence of unsupported Access Points and their consequences, which are either query failures or inconsistent query answers, create a major issue when meta-searching systems of this kind. An example of the abstract Access Point based search model is the Z39.50 information retrieval protocol, which is widely used by the library community. In this paper we present zSAPN (Z39.50 Semantic Access Point Network), a system which improves the search consistency and eliminates the query failures by exploiting the semantic information of the Access Points from an RDFS description. The current implementation of zSAPN is in the context of the Z39.50 protocol, using the official specification of the Access Point semantics, and can benefit the huge number of available sources worldwide. zSAPN substitutes each unsupported Access Point with a set of other supported ones, whose appropriate combination would either broaden or narrow the initial semantics, according to the user's choice. Finally, we estimate the impact that modifying the initial semantics during the substitution process has on the precision or the recall of the original query with the unsupported Access Point. | |||
| Proximity Scoring Using Sentence-Based Inverted Index for Practical Full-Text Search | | BIBA | Full-Text | 308-319 | |
| Yukio Uematsu; Takafumi Inoue; Kengo Fujioka; Ryoji Kataoka; Hayato Ohwada | |||
| We propose a search method that uses sentence-based inverted indexes to achieve high accuracy at practical speeds. The proposed method supports the vast majority of queries entered on the web; these queries contain single words, multiple words for proximity searches, and semantically direct phrases. The existing approach, an inverted index that holds word-level position data, is not efficient because the index becomes extremely large. Our solution is to drop the word position data and index only the existence of each word in each sentence. We incorporate the sentence-based inverted index into a commercial search engine and evaluate it using both Japanese and English standard IR corpora. The experiment shows that our method offers high accuracy, while index size and search processing time are greatly reduced. | |||
| Information Retrieval and Filtering over Self-organising Digital Libraries | | BIBA | Full-Text | 320-333 | |
| Paraskevi Raftopoulou; Euripides G. M. Petrakis; Christos Tryfonopoulos; Gerhard Weikum | |||
| We present iClusterDL, a self-organising overlay network that supports information retrieval and filtering functionality in a digital library environment. iClusterDL is able to handle huge amounts of data provided by digital libraries in a distributed and self-organising way. The two-tier architecture and the use of semantic overlay networks provide an infrastructure for creating large networks of digital libraries that require minimum administration, yet offer a rich set of tools to the end-user. We present the main components of our architecture, the protocols that regulate peer interactions, and an experimental evaluation that shows the efficiency, and the retrieval and filtering effectiveness of our approach. | |||
| A Framework for Managing Multimodal Digitized Music Collections | | BIBA | Full-Text | 334-345 | |
| Frank Kurth; David Damm; Christian Fremerey; Meinard Müller; Michael Clausen | |||
| In this paper, we present a framework for managing heterogeneous, multimodal digitized music collections containing visual music representations (scanned sheet music) as well as acoustic music material (audio recordings). As a first contribution, we propose a preprocessing workflow comprising feature extraction, audio indexing, and music synchronization (linking the visual with the acoustic data). Then, as a second contribution, we introduce novel user interfaces for multimodal music presentation, navigation, and content-based retrieval. In particular, our system offers high quality audio playback with time-synchronous display of the digitized sheet music. Furthermore, our system allows a user to select regions within the scanned pages of a musical score in order to search for musically similar sections within the audio documents. Our novel user interfaces and search functionalities will be integrated into the library service system of the Bavarian State Library as part of the Probado project. | |||
| A Quantitative Evaluation of Dissemination-Time Preservation Metadata | | BIBA | Full-Text | 346-357 | |
| Joan A. Smith; Michael L. Nelson | |||
| One of many challenges facing web preservation efforts is the lack of metadata available for web resources. In prior work, we proposed a model that takes advantage of a site's own web server to prepare its resources for preservation. When responding to a request from an archiving repository, the server applies a series of metadata utilities, such as Jhove and Exif, to the requested resource. The output from each utility is included in the HTTP response along with the resource itself. This paper addresses the question of feasibility: Is it in fact practical to use the site's web server as a just-in-time metadata generator, or does the extra processing create an unacceptable deterioration in server responsiveness to quotidian events? Our tests indicate that (a) this approach can work effectively for both the crawler and the server; and that (b) utility selection is an important factor in overall performance. | |||
| Improving Temporal Language Models for Determining Time of Non-timestamped Documents | | BIBA | Full-Text | 358-370 | |
| Nattiya Kanhabua; Kjetil Nørvåg | |||
| Taking the temporal dimension into account in searching, i.e., using time of content creation as part of the search condition, is gaining increasing interest. However, in the case of web search and web warehousing, the timestamps (time of creation or creation of contents) of web pages and documents found on the web are in general not known or cannot be trusted, and must be determined otherwise. In this paper, we describe approaches that enhance and increase the quality of existing techniques for determining timestamps based on a temporal language model. Through a number of experiments on temporal document collections we show how our new methods improve the accuracy of timestamping compared to the previous models. | |||
| Revisiting Lexical Signatures to (Re-)Discover Web Pages | | BIBA | Full-Text | 371-382 | |
| Martin Klein; Michael L. Nelson | |||
| A lexical signature (LS) is a small set of terms derived from a document that capture the "aboutness" of that document. An LS generated from a web page can be used to discover that page at a different URL as well as to find relevant pages in the Internet. From a set of randomly selected URLs we took all their copies from the Internet Archive between 1996 and 2007 and generated their LSs. We conducted an overlap analysis of terms in all LSs and found only small overlaps in the early years (1996-2000) but increasing numbers in the more recent past (from 2003 on). We measured the performance of all LSs as a function of the number of terms they contain. We found that LSs created more recently perform better than early LSs created between 1996 and 2000. All LSs created from year 2000 on show a similar pattern in their performance curve. Our results show that 5-, 6- and 7-term LSs perform best at returning the URLs of interest in the top ten of the result set. In about 50% of all cases these URLs are returned as the number one result and in 30% of all cases we considered the URLs as not discovered. | |||
| The Web Versus Digital Libraries: Time to Revisit This Once Hot Topic | | BIBA | Full-Text | 383-384 | |
| Vittore Casarosa; Jill Cousins; Anna Maria Tammaro; Yannis E. Ioannidis | |||
| At the end of last century (Internet time elapses much quicker than normal time, and it already looks like a long time ago), the "information explosion" on the Web on one side, and the flourishing of research activities on digital library technologies on the other, spurred heated discussions about the future of traditional libraries. The view of one camp was that since "all" the information was available on-line, the use of smart search engines and clever software tools would allow Digital Libraries to provide all the information (and the services) needed by an information seeker. The view of the other camp was that the value of information was not just in its sheer quantity, but was rather in the organization and the quality of the information made available, and that could never be done by "programs". | |||
| The MultiMatch Prototype: Multilingual/Multimedia Search for Cultural Heritage Objects | | BIBA | Full-Text | 385-387 | |
| Giuseppe Amato; Franca Debole; Carol Peters; Pasquale Savino | |||
| MultiMatch is a 30-month targeted research project under the Sixth Framework Programme, supported by the unit for Content, Learning and Cultural Heritage (Digicult) of the Information Society DG. MultiMatch is developing a multimedia/multilingual search engine designed specifically for the access, organization and personalized presentation of cultural heritage information. The demonstration will present the MultiMatch system prototype. | |||
| Digital Preservation of Scientific Data | | BIBAK | Full-Text | 388-391 | |
| José Barateiro; Gonçalo Antunes; Manuel Cabral; José Luis Borbinha; Rodrigo Rodrigues | |||
| Digital preservation aims at maintaining digital objects and data accessible
over long periods of time. We propose the use of dedicated or surplus storage
resources of data grids to build frameworks of digital preservation. In this
paper we focus on the problem of digital preservation in two scenarios: a
national digital library and a repository of scientific information for dam
safety. We detail the scenario of dam safety data and provide an analysis of an
existing data grid solution that can be used for this purpose. Keywords: Digital Libraries; Digital Preservation; Data Grids | |||
| Using Terminology Web Services for the Archaeological Domain | | BIBA | Full-Text | 392-393 | |
| Ceri Binding; Douglas Tudhope | |||
| The AHRC funded STAR project (Semantic Technologies for Archaeological Resources) has developed web services for knowledge organisation systems (KOS) represented in SKOS RDF format, building on previous work by the University of Glamorgan Hypermedia Research Unit on terminology web services. The current service operates on a repository of multiple (English Heritage) thesauri converted to SKOS format, containing terms and concepts that would be familiar to those working within the archaeological domain. It provides facilities for search, concept browsing and semantic expansion across these specialist terminologies. | |||
| Building a Digital Research Community in the Humanities | | BIBA | Full-Text | 394-397 | |
| Toby Burrows; Ela Majocha | |||
| The ARC Network for Early European Research (NEER), funded under the Australian Research Council's Research Networks programme, aims to enhance the scale of Australian research in medieval and early modern studies, and to build collaborative and innovative approaches to planning and managing research. An integral part of NEER's vision is the development of a digital environment which provides a setting for the work of this national research community. This environment has three major components: the Confluence collaborative Web workspace, the PioNEER digital repository for research outputs and data, and the Europa Inventa gateway to cultural heritage objects. | |||
| Agile DL: Building a DELOS-Conformed Digital Library Using Agile Software Development | | BIBA | Full-Text | 398-399 | |
| Javier D. Fernández; Miguel A. Martínez-Prieto; Pablo de la Fuente; Jesús Vegas; Joaquín Adiego | |||
| This paper describes a concrete partial implementation of the DELOS Reference Model to the particular field of manuscripts and incunabula, and how an agile software methodology, SCRUM, suits the evolutive nature of Digital Libraries, solving misunderstandings and lightening the underlying model. | |||
| Design of a Digital Library System for Large-Scale Evaluation Campaigns | | BIBA | Full-Text | 400-401 | |
| Marco Dussin; Nicola Ferro | |||
| This work describes the effort of designing and developing a Digital Library System (DLS) able to manage the different types of information resources produced during a large-scale evaluation campaign and to support the different stages of it. We discuss, in particular, the design of DIRECT, a DLS developed to assist the work of the actors of international evaluation campaigns. | |||
| An XML-Centric Storage for Better Preservation and Maintenance of Data: Union Catalog of NDAP, Taiwan | | BIBAK | Full-Text | 402-405 | |
| Tzu-Yen Hsu; Ting-Hua Chen; Chung-Hsi Hung; Sea-Hom Chou | |||
| The Union Catalog (UC) of Taiwan was established to provide an integrated
search service for millions of digital objects distributed in the databases of
different institutions. The main challenge is how to continuously and
consistently manage large quantities of data. XML technologies have already
been recommended for greater data preservation rather than database systems. In
addition, we assume that a database design in our case would be complex and
that consistent maintenance would be difficult. For this reason, databases are
not used as the primary storage mechanism of the UC. Although the UC adopts an
XML-centric architecture, it has difficulty handling data queries, data
modification, and category listing efficiently. In this paper, we discuss how
we use XML technologies to implement the UC system, and how we solve the issues
arising from XML's limitations. Keywords: NDAP; architecture; digital library | |||
| Summa: This Is Not a Demo | | BIBAK | Full-Text | 406-409 | |
| Gitte Behrens; Mikkel Kamstrup Erlandsen; Toke Eskildsen; Bolette Ammitzbøll Jurik; Dorete Bøving Larsen; Hans Lauridsen; Michael Poltorak Nielsen; Jørn Thøgersen; Mads Villadsen | |||
| The Summa search system is a fast, scalable, modular, open source search
system, which can integrate all types of library metadata and full text. The
Summa search system is based on user studies and on librarian expertise in
formats and metadata. Summa has an open and modular design. Summa offers modules
for faceted browsing, automated cluster extraction and a flexible user
interface among others. The in-house Summa production system at The State and
University Library in Denmark searches a corpus of 8 million records. The Summa
search system version 1.0, to be released in autumn 2008, is designed to
scale to hundreds of millions. Keywords: Search; open source; modularity; scalability; performance | |||
| New Tasks on Collections of Digitized Books | | BIBA | Full-Text | 410-412 | |
| Gabriella Kazai; Antoine Doucet; Monica Landoni | |||
| Motivated by the plethora of book digitization projects around the world, the Initiative for the Evaluation of XML Retrieval (INEX) launched a Book Search track in 2007. In its first year, the track focused on Information Retrieval (IR) tasks, exploring the utility of traditional and structured document retrieval techniques to books. In this paper, we propose three new tasks to be investigated at the Book Search track in 2008. The tasks aim to promote research in a wider context, across IR, Human Computer Interaction, Digital Libraries, and eBooks. We identify three novel problem areas, define tasks around these and propose possible evaluation methods. | |||
| Plato: A Preservation Planning Tool Integrating Preservation Action Services | | BIBA | Full-Text | 413-414 | |
| Hannes Kulovits; Christoph Becker; Michael Kraxner; Florian Motlik; Kevin Stadler; Andreas Rauber | |||
| The creation of a concrete plan for preserving a collection of digital objects of a specific institution necessitates the evaluation of available solutions against clearly defined and measurable criteria. This process is called preservation planning and aids in the decision making process to find the most suitable preservation strategy considering the institution's requirements, the planning context and available actions applicable to the objects contained in the repository. Performed manually, this evaluation promises to be hard and tedious work, inasmuch as there exist numerous potential preservation action tools of different quality. In this demonstration, we present Plato [4], an interactive software tool aimed at creating preservation plans. | |||
| Event Representation in Temporal and Geographic Context | | BIBA | Full-Text | 415-418 | |
| Ryan Shaw; Ray R. Larson | |||
| Linking digital resources that refer to the same people or places is becoming common. Events are another kind of entity that might be used to link resources in this way. We examine a number of standards for encoding of archival, historical, genealogical, and news information to compare the tools they offer for representing events. | |||
| A Mechanism for Solving the Unencoded Chinese Character Problem on the Web | | BIBAK | Full-Text | 419-422 | |
| Te-Jun Lin; Jyun-Wei Huang; Christine Lin; Hung-Yi Li; Hsiang-An Wang; Chih-Yi Chiu | |||
| The unencoded Chinese character problem that occurs when digitizing
historical Chinese documents makes digital archiving difficult. Expanding the
character coding space, such as by using the Unicode Standard, does not solve
the problem completely due to the extensibility of Chinese characters. In this
paper, we propose a mechanism based on a Chinese glyph structure database,
which contains glyph expressions that represent the composition of Chinese
characters. Users can search for Chinese characters through our web interface
and browse the search results. Each Chinese character can be embedded in a web
document using a specific JavaScript code. When the web document is opened,
the JavaScript code will load the image of the Chinese character in an
appropriate font size for display. Even if the Chinese characters are not
available in the database, their images can be generated through the dynamic
character composition function. As the proposed mechanism is cross-platform,
users can easily access unencoded Chinese characters without installing any
additional font files in their personal computers. A demonstration system is
available at http://char.ndap.org.tw. Keywords: Chinese glyph structure database; digital archive; unencoded Chinese
characters | |||
| Gaze Interaction and Access to Library Collection | | BIBA | Full-Text | 423-424 | |
| Haakon Lund; John Paulin Hansen | |||
| A new module in the GazeTalk eye-typing communication software for people with severe disabilities has been developed. The web-service based module enables the user to gain access to a collection of digitized full text. This demonstration shows the functionalities in the library access module. | |||
| Covering Heterogeneous Educative Environments with Integrated Editions in the Electronic Work | | BIBA | Full-Text | 425-426 | |
| Miguel A. Martínez-Prieto; Pablo de la Fuente; Jesús Vegas; Joaquín Adiego | |||
| Although e-book usage has a positive impact in educational environments, content representation is a complex issue given their audience. In this paper, we show a flexible and functional appearance that allows a synchronized consultation of the literary editions integrated in an electronic work. | |||
| Exploring Query Formulation and Reformulation: A Preliminary Study to Map Users' Search Behaviour | | BIBAK | Full-Text | 427-430 | |
| Anna Mastora; Maria Monopoli; Sarantos Kapidakis | |||
| This study aims to investigate the query formulation and reformulation
patterns such as generalisations, specifications, parallel movements and
replacements with synonyms within the search procedure. Results showed that
users reformulated their queries by using terms contained in the retrieved
results while in the query reformulation process they mainly used terms with
parallel meanings. Participants used equally either more specific or more
general terms for follow-up queries. Finally, the study revealed that a high
proportion of same terms were used instead of unique ones; half of them were
included in the Eurovoc thesaurus. Keywords: Query formulation; Query reformulation; Search behaviour; Search patterns;
Query length | |||
| Identification of Bibliographic Information Written in Both Japanese and English | | BIBA | Full-Text | 431-433 | |
| Yuko Taniguchi; Hidetsugu Nanba | |||
| We have studied the automatic construction of a multilingual citation index by collecting Postscript and PDF files from the Internet [2], and in this paper, we propose a method that can identify duplicate bibliographic information written in both Japanese and English, which will be an indispensable module for the construction of a multilingual citation index. | |||
| DIGMAP: A Digital Library Reusing Metadata of Old Maps and Enriching It with Geographic Information | | BIBAK | Full-Text | 434-435 | |
| Gilberto Pedrosa; João Luzio; Hugo Manguinhas; Bruno Martins; José Luis Borbinha | |||
| The DIGMAP service reuses metadata from European national libraries and
other relevant third party metadata sources. The gathered metadata is enhanced
locally with geographical indexing, leveraging geographic gazetteers and
authority files. When available, the images of the maps are also processed to
extract potentially relevant features. This made it possible to develop a rich
integrated environment for searching and browsing services based mainly on
enriched metadata. Keywords: Geographic information; Old maps; Systems architectures; Interoperability | |||
| Visual Analysis of Classification Systems and Library Collections | | BIBA | Full-Text | 436-439 | |
| Magnus Pfeffer; Kai Eckert; Heiner Stuckenschmidt | |||
| In this demonstration we present a visual analysis approach that addresses both developers and users of hierarchical classification systems. The approach supports an intuitive understanding of the structure and current use in relation to a specific collection. We will also demonstrate its application for the development and management of library collections. | |||
| A Framework for Music Content Description and Retrieval | | BIBA | Full-Text | 440-443 | |
| Alberto Pinto; Goffredo Haus | |||
| The recently approved format for music content description IEEE PAR1599 (MX) defines a standard for retrieval models representation within music and audio/video formats that makes use of XML documents as content descriptors. We show how music/audio semantics can be represented within the Structural layer of MX through the introduction of novel Music Information Retrieval (MIR) objects in order to embed metadata relative to specific retrieval models. | |||
| XCL: The Extensible Characterisation Language -- One Step towards an Automatic Evaluation of Format Conversions | | BIBA | Full-Text | 444-446 | |
| Jan Schnasse; Sebastian Beyl; Elona Chudobkaite; Volker Heydegger; Manfred Thaller | |||
| Today file format specifications are formulated in natural languages. A programmer who wants to decode, encode or render the information contained in a file has to read through the specification before translating it into the terms of a programming language. The maintainer of the format usually eases that process by the deployment of libraries for the format. While this is a well-proven process, the translation from one format into another is nevertheless often an error-prone undertaking. For content holders format conversion is one strategy to assure long-term access to their digital resources. However, there is currently still no standardised automatic procedure available for the evaluation of format conversions. This is a serious gap, mainly in the case where format conversion is used as a strategy for long-term preservation of digital content. With the Extensible Characterisation Languages (XCL) we want to address the problem of automatic evaluation of format conversions. | |||
| A User Field Study: Communication in Academic Communities and Government Agencies | | BIBA | Full-Text | 447-449 | |
| Filip Kruse; Annette Balle Sørensen; Bart Ballaux; Birte Christensen-Dalsgaard; Hans Hofman; Michael Poltorak Nielsen; John W. Pattenden-Fail; Seamus Ross; Kellie Snow; Jørn Thøgersen | |||
| The preliminary findings of a study focusing on communication in academic communities and government agencies are outlined. The study was conducted within the academic community at British and Danish universities and government agencies in The Netherlands, using the 'Contextual Design' approach and 'Cultural Probes'. Qualitative data on researchers' and government agents' communicative and interactive behaviour were collected and an affinity analysis carried out. The analysis produced two types of results; 1) a conceptual model of flow from idea to dissemination, and 2) a catalogue of central elements of the communicative and collaborative behaviour of researchers and government agents. These results will be further explored and validated by means of a questionnaire based survey of academic communities and government agencies. | |||
| Digital Preservation Needs of Scientific Communities: The Example of Göttingen University | | BIBAK | Full-Text | 450-452 | |
| Heike Neuroth; Stefan Strathmann; Sven Vlaeminck | |||
| Digital information has become an integral part of our cultural and
scientific heritage. We are increasingly confronted with scientific findings,
historical events and cultural achievements presented in electronic form. The
rapid pace of technical change is causing data carriers and data formats to age
quickly. The result is an acute threat to the long-term usability of digital
objects which serve as sources for science and research. The necessity for
long-term preservation has to be anchored in the social context of the national
information, research and cultural policy, and the global integrations of
science and research. To examine the preservation needs in the context of
large-scale research facilities, the awareness and practices at the University
of Göttingen and at the ETH Zürich were explored. As a first step, an
online questionnaire was developed and conducted in summer 2007. The poster
explains first findings of the online survey. Keywords: digital preservation; university; metadata; survey | |||
| Dynamic Catalogue Enrichment with SeeAlso Link Servers | | BIBA | Full-Text | 453-454 | |
| Jakob Voß | |||
| The poster presents architecture and usage of SeeAlso, a simple protocol for link servers that is used to dynamically enrich catalogues of libraries in the German Common library network GBV. | |||
| Access to Archival Finding Aids: Context Matters | | BIBA | Full-Text | 455-457 | |
| Junte Zhang; Khairun Nisa Fachry; Jaap Kamps | |||
| We detail the design of a search engine for archival finding aids based on an XML database system. The resulting system shows results -- which can vary in granularity from individual archival items to the whole fonds -- within the context of the archive. The presentation preserves the archival structure by providing important contextual information, and all individual results can be "clicked", warping the user to the full finding aid with the selected part in focus. | |||