| Keynote: The Web Changes Everything: Understanding and Supporting People in Dynamic Information Environments | | BIBA | Full-Text | 1 | |
| Susan T. Dumais | |||
| Most digital library resources and the Web more generally are dynamic and ever-changing collections of information. However, most of the tools that have been developed for interacting with Web and DL content, such as browsers and search engines, focus on a single static snapshot of the information. In this talk, I will present analyses of how web content changes over time, how people re-visit web pages over time, and how re-visitation patterns are influenced by user intent and changes in content. These results have implications for many aspects of search including crawling, ranking algorithms, result presentation and evaluation. I will describe a prototype that supports people in understanding how information they interact with changes over time, by highlighting what content has changed since their last visit. Finally, I will describe a new retrieval model that represents features about the temporal evolution of content to inform crawl policy and improve ranking. | |||
| Modelling Digital Libraries Based on Logic | | BIBA | Full-Text | 2-13 | |
| Carlo Meghini; Nicolas Spyratos; Tsuyoshi Sugibuchi | |||
| We present a data model for digital libraries supporting identification, description and discovery of digital objects. The model is formalized as a first-order theory, certain models of which correspond to the intuitive notion of digital library. Our main objective is to lay the foundations for the design of an API offering the above functionality. Additionally, we use our formal framework to discuss the adequacy of the Resource Description Framework with respect to the requirements of digital libraries. | |||
| General-Purpose Digital Library Content Laboratory Systems | | BIBA | Full-Text | 14-21 | |
| Paolo Manghi; Marko Mikulicic; Leonardo Candela; Michele Artini; Alessia Bardi | |||
| In this work, we use the term Digital Library Content Laboratories (DLCLs) for software systems specially devised for aggregating and processing information objects -- e.g., publications, experimental data, multimedia and compound objects -- collected from possibly heterogeneous and autonomous data sources. We present a general-purpose and cost-efficient system for the construction of customized DLCLs, based on the D-NET Software Toolkit. D-NET offers a service-oriented framework, where developers can choose the set of services they need, customize them to match domain requirements, and combine them in a "LEGO fashion" to obtain a personalized DLCL. D-NET is currently the enabling software of several DLCLs, operated by European Commission projects and national initiatives. | |||
| Component-Based Authoring of Complex, Petri net-based Digital Library Infrastructure | | BIBAK | Full-Text | 22-29 | |
| Yungah Park; Unmil Karadkar; Richard Furuta | |||
| caT, a Petri net-based hypertext system, serves as a platform for unified
modeling of digital library infrastructure and its governing policies, user
characteristics, and their contextual information. Traditionally, users have
created caT networks from scratch, thus limiting their use to small
collections. In this paper we introduce TcAT, a component-based authoring tool,
which enables the creation of large caT nets that can represent
interaction-rich, real-life spaces such as libraries and museums. TcAT
implements composition operations from Petri net theory, allowing authors to
select and modify existing net fragments as templated building blocks for
larger networks. Authors may switch between visual and textual modes at will,
thus combining the strengths of expressing large nets textually and selecting
net fragments via point-and-click interaction. A user evaluation of the new
authoring mechanisms suggests that this is a promising tool for improving the
efficiency of experienced users as well as that of novice users, who are
unfamiliar with the Petri net formalism. Keywords: caT; Petri net-based hypertext; digital library infrastructure | |||
| Uncovering Hidden Qualities -- Benefits of Quality Measures for Automatically Generated Metadata | | BIBAK | Full-Text | 30-37 | |
| Sascha Tönnies; Wolf-Tilo Balke | |||
| Today, digital libraries increasingly have to rely on semantic techniques
during the workflows of metadata generation, search and navigational access.
However, due to the statistical and/or collaborative nature of such techniques,
the underlying quality of automatically generated metadata is questionable.
Since data quality is essential in digital libraries, we present a user study
that evaluates metrics for quality assessment on the one hand and their benefit
for the individual user during interaction on the other. To observe the
interaction of domain experts in the sample field of chemistry, we mapped the
outcome of the abstract metrics for a sample semantic technique onto three
different kinds of visualizations and asked the experts to evaluate these
visualizations first without and later with the quality information. We
show that the generated quality information is indeed not only essential for
data quality assurance in the curation step of digital libraries, but will also
be helpful for designing intuitive interaction interfaces for end-users. Keywords: Digital Libraries; Information Quality; Semantic Technologies | |||
| Query Transformation in a CIDOC CRM Based Cultural Metadata Integration Environment | | BIBAK | Full-Text | 38-45 | |
| Manolis Gergatsoulis; Lina Bountouri; Panorea Gaitanou; Christos Papatheodorou | |||
| The wide use of a number of cultural heritage metadata schemas necessitates the
development of new interoperability techniques that facilitate unified access
to cultural resources. In this paper, we focus on ontology-based semantic
integration by proposing an expressive mapping language for the specification
of the mappings between the XML-based metadata schemas and the CIDOC CRM
ontology. We also present an algorithm for the transformation of XPath queries
posed on XML-based metadata into equivalent queries on the CIDOC CRM ontology. Keywords: Metadata interoperability; semantic integration; query transformation;
mapping languages; metadata schemas | |||
| User-Contributed Descriptive Metadata for Libraries and Cultural Institutions | | BIBAK | Full-Text | 46-54 | |
| Michael A. Zarro; Robert B. Allen | |||
| The Library of Congress and other cultural institutions are collecting
highly informative user-contributed metadata as comments and notes expressing
historical and factual information not previously identified with a resource.
In this observational study we find a number of valuable annotations added to
sets of images posted by the Library of Congress on the Flickr Commons. We
propose a classification scheme to manage contributions and mitigate
information overload issues. Implications for information retrieval and search
are discussed. Additionally, the limits of a "collection" are becoming blurred
as connections are being built via hyperlinks to related resources outside of
the library collection, such as Wikipedia and locally relevant websites. Ideas
are suggested for future projects, including interface design and institutional
use of user-contributed information. Keywords: Annotation; Descriptors; Metadata; Social Media | |||
| An Approach to Content-Based Image Retrieval Based on the Lucene Search Engine Library | | BIBAK | Full-Text | 55-66 | |
| Claudio Gennaro; Giuseppe Amato; Paolo Bolettieri; Pasquale Savino | |||
| Content-based image retrieval is becoming a popular way of searching
digital libraries as the amount of available multimedia data increases.
However, the cost of developing from scratch a robust and reliable system with
content-based image retrieval facilities for large databases is quite
prohibitive.
In this paper, we propose to exploit an approach to perform approximate similarity search in metric spaces developed by [3,6]. The idea at the basis of these techniques is that when two objects are very close to each other they 'see' the world around them in the same way. Accordingly, we can use a measure of dissimilarity between the views of the world at different objects, in place of the distance function of the underlying metric space. To employ this idea, the low-level image features (such as colors and textures) are converted into a textual form and are indexed in an inverted index by means of the Lucene search engine library. The conversion of the features into textual form allows us to employ Lucene's off-the-shelf indexing and searching abilities with little implementation effort. In this way, we are able to set up a robust information retrieval system that combines full-text search with content-based image retrieval capabilities. Keywords: Approximate Similarity Search; Access Methods; Lucene | |||
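A minimal sketch of the "how an object sees the world" idea, assuming Euclidean feature vectors and a small fixed set of reference objects: each object is represented by the ordering of reference objects from nearest to farthest, two orderings are compared with the Spearman footrule, and the nearest references are turned into words that a text engine could index. The token format `R<id>` and the repeat-by-rank encoding are hypothetical choices for illustration, not necessarily the encoding used with Lucene in the paper.

```python
import math

def euclidean(a, b):
    """Distance in the underlying metric space (assumed Euclidean here)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def permutation(obj, refs):
    """How `obj` 'sees' the world: reference-object ids, nearest first."""
    return sorted(range(len(refs)), key=lambda i: euclidean(obj, refs[i]))

def footrule(p, q):
    """Spearman footrule: dissimilarity between two objects' views."""
    pos_q = {rid: rank for rank, rid in enumerate(q)}
    return sum(abs(rank - pos_q[rid]) for rank, rid in enumerate(p))

def to_surrogate_text(perm, k):
    """Encode the k nearest reference ids as words for a text indexer.

    Nearer references are repeated more often, so a text engine's term
    frequencies reflect rank (hypothetical encoding scheme).
    """
    words = []
    for rank, rid in enumerate(perm[:k]):
        words.extend([f"R{rid}"] * (k - rank))
    return " ".join(words)

refs = [(0, 0), (10, 0), (0, 10), (10, 10)]
near_a, near_b, far = (1, 1), (2, 1), (9, 9)
pa, pb, pf = (permutation(o, refs) for o in (near_a, near_b, far))
print(footrule(pa, pb), footrule(pa, pf))  # prints: 0 6
print(to_surrogate_text(pa, 2))            # prints: R0 R0 R1
```

Close objects produce similar orderings and hence a small footrule value, which is what lets the permutation dissimilarity stand in for the original distance function.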
| Evaluation Constructs for Visual Video Summaries | | BIBAK | Full-Text | 67-79 | |
| Stina Westman | |||
| This paper reports on a user-centered evaluation of visual video summaries.
We evaluated four types of summaries (fast-forward, user-controlled
fast-forward, scene clips and storyboard) with a set of existing performance
and satisfaction measures. We further conducted a repertory grid elicitation
with our participants gathering evaluation constructs related to both video
summary content and controls. Results showed a lack of correlation between
performance and satisfaction measures. User-supplied evaluation constructs were
shown to span both the performance and satisfaction dimensions of the video
summary evaluation space. Most constructs achieved moderate to good inter-rater
agreement in a subsequent survey. Keywords: video summarization; evaluation measures; repertory grid | |||
| Visual Expression for Organizing and Accessing Music Collections in MusicWiz | | BIBAK | Full-Text | 80-91 | |
| Konstantinos A. Meintanis; Frank M. Shipman III | |||
| Music services, media players and managers provide support for content
classification and access based on filtering metadata values, statistics of
access, and user ratings. This approach fails to capture characteristics of
mood and personal history that are often the deciding factor when creating
personal playlists and collections in music. This paper presents MusicWiz, a
music management environment that combines traditional metadata with spatial
hypertext-based expression and automatically extracted characteristics of music
to generate personalized associations between songs. MusicWiz's similarity
inference engine combines the personal expression in the workspace with
assessments of similarity based on the artists, other metadata, lyrics, and the
audio signal to make suggestions and to generate playlists. An evaluation of
MusicWiz with and without the workspace and suggestion capabilities showed
significant differences for organizing and playlist creation tasks. The
workspace features were more valuable for organizing tasks while the suggestion
features had more value for playlist creation activities. Keywords: Spatial hypertext; media managers; music recommendation | |||
| An Architecture for Supporting RFID-Enhanced Interactions in Digital Libraries | | BIBA | Full-Text | 92-103 | |
| George Buchanan; Jennifer Pearson | |||
| In this paper, we report the design of an RFID sensing infrastructure for digital libraries. In addition to the architecture of the system, we report its deployment in three different applications to illustrate its use and integration with not only the core DL software, but also web browsers and software for reading documents (e.g. in PDF format). Through this, we demonstrate the utility of RFID support across the entire information seeking cycle. | |||
| New Evidence on the Interoperability of Information Systems within UK Universities | | BIBAK | Full-Text | 104-115 | |
| Kathleen Menzies; Duncan Birrell; Gordon Dunsire | |||
| This paper reports on the key findings and implications of the
JISC-funded Online Catalogue and Repository Interoperability Study (OCRIS), a
three-month project which investigated the interoperability of Online Public Access
Catalogues (OPACs) and Institutional Repositories (IRs) within UK Higher
Education Institutions (HEIs). The aims and objectives of the project included:
surveying the extent to which repository content is in scope for OPACs and the
extent to which it is already recorded there; listing the various services to
managers, researchers, teachers and learners offered by these systems;
identifying the potential for improvements in the links from repositories
and/or OPACs to other institutional services such as finance or research
administration.
The project combined quantitative and qualitative methods: primarily an online questionnaire distributed to staff within 85 UK HEIs, purposive sampling, and two in-depth case studies conducted at the Universities of Cambridge and Glasgow. Keywords: Interoperability; digital libraries; repositories; catalogues; standards;
resource discovery platforms | |||
| Enhancing Digital Libraries with Social Navigation: The Case of Ensemble | | BIBAK | Full-Text | 116-123 | |
| Peter Brusilovsky; Lillian N. Cassel; Lois M. L. Delcambre; Edward A. Fox; Richard Furuta; Daniel D. Garcia; Frank M. Shipman III; Paul Logasa Bogen II; Michael Yudelson | |||
| A traditional library is a social place; however, the social nature of the
library is typically lost when the library goes digital. This paper argues that
social navigation, an important group of social information access techniques,
could be used to replicate some social features of traditional libraries and to
enhance the user experience. Using the case of Ensemble, a major educational
digital library, the paper describes how social navigation could be used to
extend digital library portals, how social wisdom can be collected, and how it
can be used to guide portal users to valuable resources. Keywords: social navigation; digital library; portal; navigation support | |||
| Automating Logical Preservation for Small Institutions with Hoppla | | BIBA | Full-Text | 124-135 | |
| Stephan Strodl; Petar Petrov; Michael Greifeneder; Andreas Rauber | |||
| Preserving digital information over the long term is becoming increasingly important for a large number of institutions. The required expertise and limited tool support discourage small institutions in particular from operating archives with digital preservation capabilities. Hoppla is an archiving solution that combines back-up and fully automated migration services for data collections in environments with limited expertise and resources for digital preservation. The system allows user-friendly handling of services and outsources digital preservation expertise. This paper presents the automated logical preservation process of the Hoppla archiving system in detail. It describes the recommendation process for appropriate preservation strategies via a web update service. Two real-world case studies were conducted based on an initial rule set focused on common office documents. The promising results support the novel approach of automating logical preservation by outsourcing expertise. | |||
| Estimating Digitization Costs in Digital Libraries Using DiCoMo | | BIBAK | Full-Text | 136-147 | |
| Alejandro Bia; Rafael Muñoz; Jaime Gómez | |||
| Estimating digitization costs is a very difficult task. It is difficult
to make exact predictions due to the great quantity of unknown factors.
However, digitization projects need to have a precise idea of the economic
costs and the times involved in the development of their contents. The common
practice when we start digitizing a new collection is to set a schedule, and a
firm commitment to fulfill it (both in terms of cost and deadlines), even
before the actual digitization work starts. As with software
development projects, incorrect estimates produce delays and cause cost
overruns.
Based on methods used in Software Engineering for software development cost prediction, like COCOMO and Function Points, and using historical data gathered over five years at the Miguel de Cervantes Digital Library during the digitization of more than 12,000 books, we have developed a method for time and cost estimates named DiCoMo (Digitization Costs Model) for digital content production in general. This method can be adapted to different production processes, like the production of digital XML or HTML texts using scanning and OCR followed by human proofreading and error correction, or the production of digital facsimiles (scanning without OCR). The accuracy of the estimates improves with time, since the algorithms can be optimized by making adjustments based on historical data gathered from previous tasks. Keywords: Cost and time estimates; Digitization; Contents Production; DL Project
management | |||
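The abstract does not give DiCoMo's formulas. As a hedged illustration of the COCOMO-style idea it builds on (effort as a power law of size, scaled by cost-driver multipliers, and recalibrated from historical data), the sketch below fits `hours = a * pages**b` to past jobs by least squares on log-transformed data. The function names, the page-based size measure, and the multiplier value are assumptions for illustration, not DiCoMo itself.

```python
import math

def fit_power_law(history):
    """Fit hours = a * pages**b by least squares on log-transformed data.

    history: list of (pages, observed_hours) pairs from completed jobs.
    """
    xs = [math.log(pages) for pages, _ in history]
    ys = [math.log(hours) for _, hours in history]
    n = len(history)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

def estimate_hours(pages, a, b, multipliers=()):
    """COCOMO-style estimate: nominal effort scaled by cost-driver multipliers."""
    hours = a * pages ** b
    for m in multipliers:
        hours *= m
    return hours

# Synthetic history generated from a known law, only to show recalibration.
synthetic = [(p, 0.5 * p ** 1.1) for p in (100, 200, 400, 800)]
a, b = fit_power_law(synthetic)
print(round(a, 3), round(b, 3))  # prints: 0.5 1.1
# Hypothetical multiplier, e.g. for a process with human proofreading.
print(round(estimate_hours(300, a, b, multipliers=(1.3,)), 1))
```

Refitting `a` and `b` as new jobs complete is one simple way to make estimates improve over time; separate fits per production process (OCR with proofreading versus facsimile-only scanning) would be a natural extension.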
| In Pursuit of an Expressive Vocabulary for Preserved New Media Art | | BIBA | Full-Text | 148-155 | |
| Andrew McHugh; Leonidas Konstantelos | |||
| The status of the new media, interactive and performance art context appears to complicate our ability to follow conventional preservation approaches. Documentation of digital art materials has been determined to be an appropriate means of resolving associated difficulties, but this demands high levels of expressiveness to support the encapsulation of the myriad elements and qualities of content and context that may influence value and reproducibility. We discuss a proposed Vocabulary for Preserved New Media Works, a means of encapsulating the various information and material dimensions implicit within a work and required to ensure its ongoing availability. | |||
| Privacy-Aware Folksonomies | | BIBA | Full-Text | 156-167 | |
| Clemens Heidinger; Erik Buchmann; Matthias Huber; Klemens Böhm; Jörn Müller-Quade | |||
| Many popular web sites use folksonomies to let people label objects like images (Flickr), music (Last.fm), or URLs (Delicious) with schema-free tags. Folksonomies may reveal personal information. For example, tags can contain sensitive information, the set of tagged objects might disclose interests, etc. While many users call for sophisticated privacy mechanisms, current folksonomy systems provide coarse mechanisms at most, and the system provider has access to all information. This paper proposes a privacy-aware folksonomy system. Our approach consists of a partitioning scheme that distributes the folksonomy data among four providers and makes use of encryption. A key sharing mechanism allows a user to control which party is able to access which data item she has generated. We prove that our approach generates folksonomy databases that are indistinguishable from databases consisting of random tuples. | |||
| Seamless Web Editing for Curated Content | | BIBA | Full-Text | 168-175 | |
| David Bainbridge; Brook J. Novak | |||
| In this paper we present a new framework for editing that we have called Seaweed (short for seamless web editing) which enables authors to directly edit content on web pages within any common web browser -- much like a word-processor -- without the need to switch between modes. There are numerous ways to utilise the technique. This article reports on work integrating it with blogging software to support the direct creation and editing of curated content, and its subsequent evaluation through two field trials. | |||
| Automatic Classification of Social Tags | | BIBA | Full-Text | 176-183 | |
| Christian Wartena | |||
| Collaborative tagging has become popular in recent years. As several studies have noted, completely different types of tags are found. Tags can either refer to the personal usage context of a tagger or describe the tagged object. We investigate different types of tags found in LibraryThing, an online service in which books are tagged, and define a number of features that are typical for some of these classes. Finally, we show how these features can be used to classify tags automatically. | |||
| Exploring the Impact of Search Interface Features on Search Tasks | | BIBA | Full-Text | 184-195 | |
| Abdigani Diriye; Ann Blandford; Anastasios Tombros | |||
| There is growing recognition that exploratory search is less well supported by existing search interfaces than known-item search. In this paper, we report on a study in which three interfaces providing different levels of search support were developed and tested, for both known item and exploratory search tasks. A rich qualitative analysis of participants' search behaviours and perceptions was conducted. As expected, the simplest interface provided better support for known item than for exploratory search tasks. Conversely, richer search interface features were found to provide better support for exploratory search, but would distract people from the objective of more clearly defined search tasks. This study provides preliminary evidence that searching is most effective when supported by an interface that is tailored towards the search activities of the task. | |||
| Relevance in Technicolor | | BIBA | Full-Text | 196-207 | |
| Ulises Cerviño Beresi; Yunhyong Kim; Dawei Song; Ian Ruthven; Mark Baillie | |||
| In this article we propose the concept of relevance criteria profiles, which provide a global view of user behaviour in judging the relevance of retrieved information. We further propose a plotting technique which provides a session-based overview of the relevance judgement processes interlaced with interactions, which allows the researcher to visualise and quickly detect emerging patterns in both interactions and relevance criteria usage. We discuss by example, using data from a user study conducted between January and August 2008, how these tools support a better understanding of task-based user valuation of documents that is likely to lead to recommendations for improving end-user services in digital libraries. | |||
| Application of Session Analysis to Search Interface Design | | BIBA | Full-Text | 208-215 | |
| Cathal Hoare; Humphrey Sorensen | |||
| Evaluations of search features used in digital library environments are generally results centric, focussing on the outcome of an evaluation -- for example, the number of relevant documents retrieved -- rather than garnering an understanding of why that result was achieved. This paper explores how search feature development benefits from user-centered evaluation. By examining the application of an established web analytics technique, session analysis, to the development of search features and interfaces, it will be shown that designers can better understand how users conduct evaluation tasks. The feedback provided by this technique allows for clearer evaluation of an interface and admits iteratively evolving designs that are based on empirical data. | |||
| An Analysis of the Evolving Coverage of Computer Science Sub-fields in the DBLP Digital Library | | BIBA | Full-Text | 216-227 | |
| Florian Reitz; Oliver Hoffmann | |||
| Many scientists and research groups make use of the DBLP bibliographic project collection in various ways. Most of them are unaware of its internal structure, although it can have significant influence on their results. Prior work has shown that the collection does not cover all sub-fields of computer science in the same quality but has not provided an explanation for these differences. We introduce an extension of the DBLP data set which gives us a detailed picture of how DBLP has evolved since 1995. We show that the project started with a narrow focus on two sub-fields and discuss how additional themes have been added in recent years. We analyze the relations between sub-fields at different times and provide a model which explains the differences in coverage. | |||
| Analysis of Computer Science Communities Based on DBLP | | BIBAK | Full-Text | 228-235 | |
| Maria Biryukov; Cailing Dong | |||
| It is popular nowadays to bring techniques from bibliometrics and
scientometrics into the world of digital libraries to explore mechanisms which
underlie community development. In this paper we use the DBLP data to
investigate authors' scientific careers, and analyze some of the computer
science communities. We compare them in terms of productivity and population
stability, and use these features to compare the sets of top-ranked conferences
with their lower ranked counterparts. Keywords: bibliographic databases; author profiling; scientific communities;
bibliometrics | |||
| Citation Graph Based Ranking in Invenio | | BIBAK | Full-Text | 236-247 | |
| Ludmila Marian; Jean-Yves LeMeur; Martin Rajman; Martin Vesely | |||
| Invenio is the web-based integrated digital library system developed at
CERN. Within this framework, we present four types of ranking models based on
the citation graph that complement the simple approach based on citation
counts: time-dependent citation counts, a relevancy ranking which extends the
PageRank model, a time-dependent ranking which combines the freshness of
citations with PageRank and a ranking that takes into consideration the
external citations. We present our analysis and results obtained on two main
data sets: Inspire and CERN Document Server. Our main contributions are: (i) a
study of the currently available ranking methods based on the citation graph;
(ii) the development of new ranking methods that correct some of the identified
limitations of the current methods, such as treating all citations as equally
important, not taking time into account or considering the citation graph
complete; (iii) a detailed study of the key parameters for these ranking
methods. Keywords: CDS; Invenio; Inspire; citation graph; PageRank; external citations; time
decay | |||
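To make the idea of combining citation freshness with PageRank concrete, here is a minimal sketch of one such variant, assuming a citation graph in which every paper has a known publication year: the random-jump vector is weighted by an exponential decay of paper age, so recent papers act as more likely starting points. The exponential form, the decay constant `tau`, and the damping factor of 0.85 are assumptions; this is a CiteRank-style formulation, not necessarily the exact time-dependent model implemented in Invenio.

```python
import math

def time_aware_pagerank(citations, pub_year, now, tau=5.0,
                        damping=0.85, iterations=100):
    """PageRank whose random jump favours recently published papers.

    citations: dict citing_id -> list of cited_ids (all ids must be in pub_year)
    pub_year:  dict paper_id -> publication year
    tau:       hypothetical decay constant, in years
    """
    nodes = sorted(pub_year)
    # Random-jump distribution decays exponentially with paper age.
    fresh = {p: math.exp(-(now - pub_year[p]) / tau) for p in nodes}
    z = sum(fresh.values())
    jump = {p: fresh[p] / z for p in nodes}

    rank = dict(jump)
    for _ in range(iterations):
        new = {p: (1.0 - damping) * jump[p] for p in nodes}
        dangling = 0.0
        for p in nodes:
            cited = citations.get(p, [])
            if not cited:
                dangling += rank[p]  # paper cites nothing in the collection
                continue
            share = damping * rank[p] / len(cited)
            for q in cited:
                new[q] += share
        # Redistribute the dangling mass along the same time-aware jump vector.
        for p in nodes:
            new[p] += damping * dangling * jump[p]
        rank = new
    return rank

years = {"A": 2000, "B": 2005, "C": 2008}
refs = {"B": ["A"], "C": ["A", "B"]}
for paper, score in sorted(time_aware_pagerank(refs, years, now=2009).items()):
    print(paper, round(score, 3))
```

Because the jump vector and the dangling redistribution both sum to one, the scores remain a probability distribution after every iteration; setting a very large `tau` recovers ordinary PageRank with a uniform jump.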
| A Search Log-Based Approach to Evaluation | | BIBA | Full-Text | 248-260 | |
| Junte Zhang; Jaap Kamps | |||
| Anyone offering content in a digital library is naturally interested in assessing its performance: how well does my system meet the users' information needs? Standard evaluation benchmarks have been developed in information retrieval that can be used to test retrieval effectiveness. However, these generic benchmarks focus on a single document genre, language, media-type, and searcher stereotype that differ radically from the unique content and user community of a particular digital library. This paper proposes to derive a domain-specific test collection from readily available interaction data in search log files that captures the domain-specificity of digital libraries. As a case study, we use an archival institution's complete search log, which spans multiple years, and derive a large-scale test collection. We manually derive a set of topics judged by human experts -- based on a set of e-mail reference questions and responses from archivists -- and use this for validation. Our main finding is that we can derive a reliable and domain-specific test collection from search log files. | |||
| Determining Time of Queries for Re-ranking Search Results | | BIBA | Full-Text | 261-272 | |
| Nattiya Kanhabua; Kjetil Nørvåg | |||
| Recent work on analyzing query logs shows that a significant fraction of queries are temporal, i.e., relevancy is dependent on time, and temporal queries play an important role in many domains, e.g., digital libraries and document archives. Temporal queries can be divided into two types: 1) those with temporal criteria explicitly provided by users, and 2) those with no temporal criteria provided. In this paper, we deal with the latter type of queries, i.e., queries that comprise only keywords, and whose relevant documents are associated with particular time periods not given by the queries. We propose a number of methods to determine the time of queries using temporal language models. After that, we show how to increase the retrieval effectiveness by using the determined time of queries to re-rank the search results. Through extensive experiments we show that our proposed approaches improve retrieval effectiveness. | |||
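One common way to "date" a keyword-only query with temporal language models is to build one unigram model per time partition of the collection and score the query against each partition with a normalized log-likelihood ratio (NLLR) against the whole-collection model. The sketch below assumes pre-tokenized documents already grouped by period and Jelinek-Mercer smoothing with a hypothetical λ = 0.1; it illustrates the general technique, not necessarily the exact scoring used by the authors.

```python
import math
from collections import Counter

def build_models(docs_by_period):
    """docs_by_period: dict period -> list of token lists (non-empty)."""
    period_counts = {p: Counter(tok for doc in docs for tok in doc)
                     for p, docs in docs_by_period.items()}
    collection = Counter()
    for counts in period_counts.values():
        collection.update(counts)
    return period_counts, collection

def nllr(query, counts, collection, lam=0.1):
    """Normalized log-likelihood ratio of a query against one time period."""
    q = Counter(query)
    q_len = sum(q.values())
    p_len = sum(counts.values())
    c_len = sum(collection.values())
    score = 0.0
    for term, n in q.items():
        p_c = collection[term] / c_len
        if p_c == 0:
            continue  # term unseen in the whole collection: no temporal evidence
        # Jelinek-Mercer smoothing of the period model with the collection model.
        p_t = (1 - lam) * counts[term] / p_len + lam * p_c
        score += (n / q_len) * math.log(p_t / p_c)
    return score

def date_query(query, docs_by_period, lam=0.1):
    """Return the period whose language model best explains the query."""
    counts, collection = build_models(docs_by_period)
    return max(counts, key=lambda p: nllr(query, counts[p], collection, lam))

corpus = {  # toy corpus for illustration only
    "2004": [["tsunami", "indian", "ocean"], ["election", "bush"]],
    "2008": [["election", "obama"], ["financial", "crisis"]],
}
print(date_query(["obama", "election"], corpus))  # prints: 2008
```

For re-ranking, a retrieved document's score could then be boosted when its own time stamp falls in the period returned by `date_query`.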
| Ranking Entities Using Web Search Query Logs | | BIBA | Full-Text | 273-281 | |
| Bodo Billerbeck; Gianluca Demartini; Claudiu S. Firan; Tereza Iofciu; Ralf Krestel | |||
| Searching for entities is an emerging task in Information Retrieval for which the goal is finding well defined entities instead of documents matching the query terms. In this paper we propose a novel approach to Entity Retrieval by using Web search engine query logs. We use Markov random walks on (1) Click Graphs -- built from clickthrough data -- and on (2) Session Graphs -- built from user session information. We thus provide semantic bridges between different query terms, and therefore indicate meaningful connections between Entity Retrieval queries and related entities. | |||
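A minimal sketch of a Markov random walk on a click graph, assuming a click log of `(query, url, count)` triples: queries and clicked URLs form a bipartite graph whose edge weights are click counts, and a lazy walk started at a query spreads probability to URLs (which could stand for entity pages) and to related queries. The walk length, the self-transition probability, and the toy log below are hypothetical; the same walk could run on a query-to-query session graph instead.

```python
from collections import defaultdict

def build_click_graph(clicks):
    """clicks: iterable of (query, url, count) triples from a click log."""
    graph = defaultdict(dict)
    for query, url, count in clicks:
        q, u = ("q", query), ("u", url)
        graph[q][u] = graph[q].get(u, 0) + count
        graph[u][q] = graph[u].get(q, 0) + count
    return graph

def random_walk(graph, start, steps=3, stay=0.2):
    """Node distribution after `steps` of a lazy walk over click edges."""
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in dist.items():
            edges = graph.get(node, {})
            total = sum(edges.values())
            if total == 0:
                nxt[node] += p  # isolated node: keep all mass in place
                continue
            nxt[node] += stay * p  # self-transition probability
            for neighbour, weight in edges.items():
                nxt[neighbour] += (1 - stay) * p * weight / total
        dist = dict(nxt)
    return dist

def rank_urls(graph, query, steps=3, stay=0.2):
    """Rank clicked URLs (candidate entity pages) reached from a query."""
    dist = random_walk(graph, ("q", query), steps, stay)
    urls = [(node[1], p) for node, p in dist.items() if node[0] == "u"]
    return sorted(urls, key=lambda item: -item[1])

log = [("jaguar", "wiki/Jaguar_Cars", 5), ("jaguar", "wiki/Jaguar", 3),
       ("xk8", "wiki/Jaguar_Cars", 4)]
print(rank_urls(build_click_graph(log), "jaguar"))
```

Multi-step walks are what create the "semantic bridges": mass reaching `wiki/Jaguar_Cars` is reinforced through the related query `xk8`, which was never issued together with `jaguar`.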
| Examining Group Work: Implications for the Digital Library as Sharium | | BIBAK | Full-Text | 282-293 | |
| Sandra Toze; Elaine G. Toms | |||
| Digital libraries have the potential to be rich interactive environments or
"shariums" that support students who work in groups to complete course work. To
understand how DLs might realize this potential, the processes of a single
group working on a complex project over a semester were analyzed. Findings
suggest that groups perform a range of tasks including administrative,
communication and information seeking and retrieval, and use multiple tools and
artifacts to accomplish their work. Over the course of the work, activities
shift from the individual to the group, illustrating the need for a complex system
that intertwines public and private work space. Currently DLs provide only one
tool -- search -- that a group might use, but do not fully support group work. Keywords: collaboration; group work; design; digital library; methodology | |||
| Architecture for a Collaborative Research Environment Based on Reading List Sharing | | BIBAK | Full-Text | 294-306 | |
| Gabriella Kazai; Paolo Manghi; Katerina Iatropoulou; Tim Haughton; Marko Mikulicic; Antonis Lempesis; Natasa Milic-Frayling; Natalia Manola | |||
| Scholarly research involves a systematic study of information sources in
order to establish facts and reach new conclusions. It encompasses survey,
analysis, evaluation, and creation as distinct phases that are performed
iteratively and often in parallel by accessing a range of local and remote
resources. Throughout these activities scholars create collections of relevant
work, ranging from publication references to new information acquired through
experiments or correspondence with other scholars. We use the term reading list
to refer to such collections. Existing software packages or web services for
managing publication lists, like CiteULike, lack integration with researchers'
workflows, which may require access to both desktop and online resources. In this
paper we describe the architecture and system design of ScholarLynk, a desktop
tagging tool that enables researchers to build and maintain reading lists
across distributed data stores, in collaboration with other researchers. Keywords: Desktop tagging tool; scholarly research; reading lists | |||
| CritSpace: A Workspace for Critical Engagement within Cultural Heritage Digital Libraries | | BIBA | Full-Text | 307-314 | |
| Neal Audenaert; George Lucchese; Richard Furuta | |||
| Cultural heritage digital libraries hold promise both as a new tool for
representing the complex information structures frequently found in the
humanities and social sciences and as interactive environments that enable
scholars to work with this information in new ways throughout the research
project. Much attention has been paid to digitization, textual encoding,
metadata and dissemination of digital cultural heritage data. Scholars now
routinely turn toward electronic sources as a first step in their information
finding process. Considerably less attention, however, has been devoted to
understanding how to support the formative stages of scholarly research.
In this paper, we highlight the findings of a formative user study of scholarly analysis of source documents in several different fields. We discuss the implications of these results for our current research into designing a web-based creativity support environment for cultural heritage digital libraries. | |||
| German Encyclopedia Alignment Based on Information Retrieval Techniques | | BIBA | Full-Text | 315-326 | |
| Roman Kern; Michael Granitzer | |||
| Collaboratively created online encyclopedias have become increasingly popular. Especially in terms of completeness, they have begun to surpass their printed counterparts. Two German publishers of traditional encyclopedias have reacted to this challenge and decided to merge their corpora to create a single, more complete encyclopedia. The crucial step in this merge process is the alignment of articles. We have developed a system to identify corresponding entries in different encyclopedic corpora. The core of our system is an alignment algorithm that incorporates various techniques developed in the field of information retrieval. We have evaluated the system on four real-world encyclopedias with a ground truth provided by domain experts. A combination of weighting and ranking techniques was found to deliver satisfactory performance. | |||
| Lightweight Parsing of Classifications into Lightweight Ontologies | | BIBA | Full-Text | 327-339 | |
| Aliaksandr Autayeu; Fausto Giunchiglia; Pierre Andrews | |||
| Understanding metadata written in natural language is a prerequisite for the successful automated integration of large-scale, language-rich classifications such as those used in digital libraries. We analyze the natural language labels within classifications by exploring their syntactic structure. We then show how this structure can be used to detect language patterns that can be processed by a lightweight parser with an average accuracy of 96.82%. This allows for a deeper understanding of natural language metadata semantics, which we show can improve by almost 18% the accuracy of the automatic translation of classifications into the lightweight ontologies required by semantic matching, search and classification algorithms. | |||
| Measuring Effectiveness of Geographic IR Systems in Digital Libraries -- Evaluation Framework and Case Study | | BIBA | Full-Text | 340-351 | |
| Damien Palacio; Guillaume Cabanac; Christian Sallaberry; Gilles Hubert | |||
| Common search engines process users' queries (i.e., information needs) by retrieving documents from pre-built term-based indexes. For digital libraries, such approaches are limited in particular contexts, such as specialized collections (e.g., cultural heritage collections) or specific retrieval criteria (e.g., multidimensional criteria). In this paper, we consider Information Retrieval systems exploiting geographic dimensions: spatial, temporal, and topical. Our contribution is twofold. We propose a Geographic Information Retrieval system evaluation framework, and we test the hypothesis that combining the spatial and temporal dimensions with the topical dimension improves the effectiveness of Information Retrieval systems. | |||
| A Visual Digital Library Approach for Time-Oriented Scientific Primary Data | | BIBAK | Full-Text | 352-363 | |
| Jürgen Bernard; Jan Brase; Dieter W. Fellner; Oliver Koepler; Jörn Kohlhammer; Tobias Ruppert; Tobias Schreck; Irina Sens | |||
| Digital Library support for textual and certain types of non-textual
documents has advanced significantly in recent years. While Digital Library
support involves many aspects along the whole library workflow model,
interactive and visual retrieval allowing effective query formulation and
result presentation are important functions. Recently, new kinds of non-textual
documents that merit Digital Library support, but cannot yet be accommodated
by existing Digital Library technology, have come into focus. Scientific
primary data, as produced, for example, by scientific experimentation, earth
observation, or simulation, is such a data type. We report on a concept and
first implementation of Digital Library functionality supporting visual
retrieval and exploration for an important class of scientific primary
data, namely, time-oriented data. The approach is developed in an
interdisciplinary effort by experts from the library, natural sciences, and
visual analytics communities. In addition to presenting the concept and
discussing relevant challenges, we present results from a first implementation
of our approach as applied to a real-world scientific primary data set. Keywords: Visual Analysis; Visual Search; Content-Based Search; Scientific Primary
Data; Visual Cluster Analysis | |||
| DINAH, A Philological Platform for the Construction of Multi-structured Documents | | BIBA | Full-Text | 364-375 | |
| Pierre-Edouard Portier; Sylvie Calabretto | |||
| We consider how the construction of multi-structured documents implies the definition of structuring vocabularies. In a multi-user context, the growth of these vocabularies has to be controlled. We therefore propose using traces of user activity to limit this growth and to document the vocabularies. For example, a user will be able to follow and annotate the track of a vocabulary concept, from its creation to the last time it was used. More broadly, this work is grounded in our Web-based philological platform, DINAH, and is mainly motivated by our collaboration with a group of philosophers studying the handwritten manuscripts of Jean-Toussaint Desanti. | |||
| The PROBADO Project -- Approach and Lessons Learned in Building a Digital Library System for Heterogeneous Non-textual Documents | | BIBA | Full-Text | 376-383 | |
| René Berndt; Ina Blümel; Michael Clausen; David Damm; Jürgen Diet; Dieter W. Fellner; Christian Fremerey; Reinhard Klein; Frank Krahl; Maximilian Scherer | |||
| The PROBADO project is a research effort to develop and operate advanced Digital Library support for non-textual documents. The main goal is to contribute to all parts of the Digital Library workflow, from content acquisition through indexing to search and presentation. While not limited in terms of supported document types, reference support is developed for classical digital music and 3D architectural models. In this paper, we review the overall goals, approaches taken, and lessons learned so far in a highly integrated effort of university researchers and library experts. We address the problem of technology transfer, aspects of repository compilation, and the problem of inter-domain retrieval. The experiences are relevant for other project efforts in the non-textual Digital Library domain. | |||
| Capacity-Constrained Query Formulation | | BIBA | Full-Text | 384-388 | |
| Matthias Hagen; Benno Stein | |||
| Given a set of keyphrases, we analyze how to form Web queries containing these phrases that, taken together, return a specified number of hits. The motivating use case is a plagiarism detection system that searches the Web for potentially plagiarized passages in a given suspicious document. For this query formulation problem, we develop a heuristic search strategy based on co-occurrence probabilities. Compared to the maximal termset strategy [3], which can be considered the most sensible non-heuristic baseline, our expected savings average 50% when queries for 9 or 10 phrases are to be constructed. | |||
| AAT-Taiwan: Toward a Multilingual Access to Cultural Objects | | BIBAK | Full-Text | 389-392 | |
| Shu-Jiun Chen; Diane Wu; Pei-Wen Peng; Yung-Ting Chang | |||
| This paper reports on current collaborative work between the Taiwan e-Learning
and Digital Archives Program (TELDAP) and the Getty Research Institute (GRI) in
developing the Chinese-language Art & Architecture Thesaurus (AAT-Taiwan),
which supports the unification of terminology used by various archiving
institutions for describing identical concepts. This work aims to establish a
conceptual framework for the digital library by providing controlled
vocabularies to index and catalogue the collection. With its multilingual
nature, AAT-Taiwan is able to bridge Western and Eastern culture in an
integrated framework, and make our resources accessible worldwide. With its
hierarchical structure, it also enhances the effectiveness and
comprehensiveness of information retrieval in digital libraries. Keywords: digital library; multilingual thesaurus; knowledge organization system | |||
| Using Pattern Language as a Framework for Future Metadata Structure | | BIBA | Full-Text | 393-396 | |
| Esben Agerbæk Black | |||
| In the 1970s, Christopher Alexander envisioned the "pattern language". It
contains an underlying philosophy [1] of what to accomplish by using pattern
language; it is this philosophy we tap into and apply to metadata planning.
Different collections need different metadata to be of future use. This metadata has a structure, and we aim to reuse knowledge of such structures and to standardize their creation. We further believe pattern language will ease the transition of existing digital collections. | |||
| i-TEL-u: A Query Suggestion Tool for Integrating Heterogeneous Contexts in a Digital Library | | BIBA | Full-Text | 397-400 | |
| Maristella Agosti; Davide Cisco; Giorgio Maria Di Nunzio; Ivano Masiero; Massimo Melucci | |||
| This paper presents the design, implementation and evaluation of a query suggestion tool (named i-TEL-u) that allows for the management and the exploitation of different contexts in an integrated way within the same search interface for accessing the contents of The European Library portal. i-TEL-u allows users to seamlessly move from one context to another according to their information needs and to the way these needs evolve during the search session. The aim of this tool is to improve the search functionalities of the portal, attract many users and give them easy and effective access. | |||
| The Planets Testbed -- A Collaborative Research Environment for Digital Preservation | | BIBA | Full-Text | 401-404 | |
| Brian Aitken; Seamus Ross; Andrew Lindley; Edith Michaeler; Andrew N. Jackson; Maurice van den Dobbelsteen | |||
| The digital objects that are so fundamental to 21st century life may have a precarious future due to the rapid pace of technological change. Digital preservation specialists have proposed an almost overwhelming variety of preservation actions and tools that may help to mitigate this risk, but there is a lack of empirical evidence to help librarians, archivists and non-specialists to make an informed decision about the most applicable and effective preservation tools. The Planets project has developed a digital preservation Testbed that aims to provide such an evidence-base. | |||
| A Functionality Perspective on Digital Library Interoperability | | BIBA | Full-Text | 405-408 | |
| George Athanasopoulos; Edward A. Fox; Yannis E. Ioannidis; George Kakaletris; Natalia Manola; Carlo Meghini; Andreas Rauber; Dagobert Soergel | |||
| Digital Library (DL) interoperability requires addressing a variety of issues associated with functionality. We report on the analysis and solutions identified by the Functionality Working Group of the DL.org project during its deliberations on DL interoperability. Ultimately, we hope that work based on our perspective will lead to improved architectures and software, as well as to greater interoperability, for next-generation DL systems. | |||
| Overview and Results of the INEX 2009 Interactive Track | | BIBA | Full-Text | 409-412 | |
| Thomas Beckers; Norbert Fuhr; Nils Pharo; Ragnar Nordlie; Khairun Nisa Fachry | |||
| We present results of the INEX 2009 Interactive Track, which focussed on how users behave in interactive search systems. Three types of work tasks based on a collection of book metadata were considered. The results show differences with respect to the task types and point out improvements and new research questions for the next track in 2010. | |||
| SciPlore Xtract: Extracting Titles from Scientific PDF Documents by Analyzing Style Information (Font Size) | | BIBAK | Full-Text | 413-416 | |
| Jöran Beel; Bela Gipp; Ammar Shaker; Nick Friedrich | |||
| Extracting titles from a PDF's full text is an important information retrieval
task for identifying PDFs. Existing approaches apply complicated and expensive
(in terms of computing power) machine learning algorithms such as Support
Vector Machines and Conditional Random Fields. In this paper we present a
simple rule-based heuristic, which considers style information (font size) to
identify a PDF's title. In a first experiment we show that this heuristic
delivers better results (77.9% accuracy) than a support vector machine by
CiteSeer (69.4% accuracy) in an 'academic search engine' scenario, as well as
faster run times (8:19 minutes vs. 57:26 minutes). Keywords: header extraction; title extraction; style information; document analysis | |||
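The abstract does not spell out SciPlore Xtract's rules, so the following is only a minimal sketch of the general idea of a font-size heuristic, assuming the text of the first page has already been extracted into runs with known font sizes: the title is taken to be the text set in the largest font. `TextSpan` and `extract_title` are hypothetical names, and reading the spans out of the PDF is left out.

```python
from dataclasses import dataclass


@dataclass
class TextSpan:
    """A run of text on a PDF page with its font size (hypothetical input format)."""
    text: str
    font_size: float


def extract_title(first_page_spans: list[TextSpan]) -> str:
    """Guess a title as the text set in the largest font on the first page.

    This is a deliberately minimal font-size heuristic, not SciPlore Xtract's
    actual rule set.
    """
    spans = [s for s in first_page_spans if s.text.strip()]
    if not spans:
        return ""
    largest = max(s.font_size for s in spans)
    # Titles often wrap onto several lines, so join every span at the largest
    # size, keeping reading order.
    return " ".join(s.text.strip() for s in spans if s.font_size == largest)
```

A real implementation would need extra rules, for example to skip oversized journal logos or running headers; the appeal of the approach is that it needs no training data and runs in linear time over the page.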
| Academic Publication Management with PUMA -- Collect, Organize and Share Publications | | BIBAK | Full-Text | 417-420 | |
| Dominik Benz; Andreas Hotho; Robert Jäschke; Gerd Stumme; Axel Halle; Angela Gerlach Sanches Lima; Helge Steenweg; Sven Stefani | |||
| The PUMA project fosters the Open Access movement and aims to better
support researchers' publication work. PUMA stands for an integrated
solution, where the upload of a publication automatically results in an update
of both the personal and institutional homepages, the creation of an entry in a
social bookmarking system like BibSonomy, an entry in the academic reporting
system of the university, and its publication in the institutional repository.
In this poster, we present the main features of our solution. Keywords: Publication Management; Puma; BibSonomy; Open Access; Institutional
Repository; Tagging; Bookmarking; Metadata Sharing | |||
| Using Mind Maps to Model Semistructured Documents | | BIBAK | Full-Text | 421-424 | |
| Alejandro Bia; Rafael Muñoz; Jaime Gómez | |||
| We often use UML diagrams for our software development projects, and also
for modeling XML DTDs and Schemas [1]. We found that although UML diagrams can
effectively be made to represent DTDs and Schemas (using either Class or
Component diagrams), in practice complex DTDs and Schemas produce
unreadable, unmanageable UML diagrams. Recently we started exploring
other types of diagrams and unconventional methods that can be useful both for
designing and modeling semistructured data and as teaching aids or thinking
tools. This experience also served to open our minds to tools and methods other
than the recognized mainstream practices.
In this paper, we describe how we used Mind Maps and a modified Freemind tool to model, design, modify, import and export XML DTDs, XML Schemas (XSD and RNG) and also XML document instances, obtaining manageable, easily comprehensible, folding diagrams. In this way, we converted a general-purpose mind-mapping tool into a powerful tool for XML vocabulary design and simplification (and also for teaching XML markup, or for presentation purposes). Keywords: Visual Modeling; Mind Maps; XML; DTD; Schema | |||
| Towards a Public Library Digital Service Taxonomy | | BIBAK | Full-Text | 425-428 | |
| Steven Buchanan; David McMenemy | |||
| Recent research has identified the inconsistency of public library digital
services, and the associated problems of disparity and duplication, as a key
usability issue. The hypothesis of this research is that the root cause is
inconsistent definition and specification of digital services, and that a
service taxonomy would facilitate resolution of this issue by providing a
classification scheme and controlled vocabulary. We report on initial research
to validate this hypothesis, which examined the options available from 8 of 32
Scottish public library homepages. Evidence of inconsistent terminology and
organisation schemes was found, with navigation not always straightforward due
to the high number of loosely structured options available on the
majority of sites sampled. Initial findings are discussed, along with planned
second-stage research. Keywords: digital services; usability; service taxonomy; public libraries | |||
| Multimodal Image Collection Visualization Using Non-negative Matrix Factorization | | BIBA | Full-Text | 429-432 | |
| Jorge E. Camargo; Juan C. Caicedo; Fabio A. González | |||
| In this paper we address the problem of generating an image collection visualization in which images and text can be projected together. Given a collection of images with attached text annotations, we aim to find a common representation for both information sources to model latent correlations within the collection. Using the proposed latent representation, an image collection visualization is built, in which images and text can be projected simultaneously. The resulting visualization allows users to identify relationships between images and text terms and to understand the semantic structure of the collection. | |||
| A New Perspective on Collection Selection | | BIBAK | Full-Text | 433-436 | |
| Helen Dodd; George Buchanan; Matt Jones | |||
| Collection selection is traditionally a sub-problem of meta-search, and
identifies collections most likely to contain relevant documents. However, we
propose to treat collection selection as an independent search task with the
goal of identifying collections that are relevant as a whole; so the user may
return to them to serve future (related) information needs. Using a new
methodology and framework we evaluate the suitability of existing collection
selection algorithms for this search task, compared with a new algorithm
designed specifically for the task. Keywords: Collection selection; database selection; collection ranking | |||
| Creating a Flexible Preservation Infrastructure for Electronic Records | | BIBAK | Full-Text | 437-440 | |
| Karen Estlund; Heather Briston | |||
| As universities begin to address their first significant collections of
electronic records, the needs of the collections often outstrip the resources
and support available. This poster will illustrate the steps taken to
transition a presidential electronic records collection into a university
archive and preserve it with limited systems support, while preparing for future
preservation needs. The infrastructure created was designed to quickly ingest
at-risk records and allow for file migration and system evolution as future
technologies are implemented. Keywords: Digital Preservation; Digital Libraries; Preservation Planning;
Institutional Archives; Migration | |||
| Matching Intellectual Works for Rights Management in the European Library | | BIBAK | Full-Text | 441-444 | |
| Nuno Freire | |||
| This poster presents the work matching system implemented in The European
Library for identifying different publications with the same underlying
intellectual work. This work is contextualized in the rights management
framework of project ARROW, where The European Library is the main source of
bibliographic metadata as an aggregator of Europe's national library
catalogues. Keywords: copyright; entity matching; intellectual work; bibliographic metadata | |||
| Mopseus -- A Digital Library Management System Focused on Preservation | | BIBAK | Full-Text | 445-448 | |
| Dimitris Gavrilis; Christos Papatheodorou; Panos Constantopoulos; Stavros Angelis | |||
| This paper presents Mopseus, a Fedora Commons-based digital repository that
focuses on preservation. An overview of the general architecture of the system
is presented, along with more in-depth details of its semantic structures.
Mopseus features dynamic RDF-based relations, a service for defining metadata
schemas, a built-in RDBMS synchronization and indexing mechanism, a mechanism
for migration from existing repositories and a built-in workflow engine. Keywords: Digital libraries; repository; digital preservation | |||
| Link Proximity Analysis -- Clustering Websites by Examining Link Proximity | | BIBAK | Full-Text | 449-452 | |
| Bela Gipp; Adriana Taylor; Jöran Beel | |||
| This research-in-progress paper presents a new approach called Link
Proximity Analysis (LPA) for identifying related web pages based on link
analysis. In contrast to current techniques, which ignore intra-page link
analysis, the one put forth here examines the relative positioning of links to
each other within websites. The approach uses the fact that a clear correlation
between the proximity of links to each other and the subject-relatedness of the
linked websites can be observed on nearly every web page. By statistically
analyzing this relationship and measuring the number of sentences, paragraphs,
etc. between two links, related websites can be identified automatically, as a
first study has shown. Keywords: Web page; Website; clustering; Network Analysis; Link Analysis; Citation
Proximity Analysis | |||
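The abstract says LPA measures how many sentences, paragraphs, etc. separate two links, but gives no concrete scoring. As an illustration only, the sketch below assumes a page has already been segmented into paragraphs and sentences containing link targets, and scores each pair of targets with hypothetical weights that grow as the links appear closer together; scores are summed over all pages. The input structure, weights and function name are assumptions, not the authors' method.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights: links placed closer together are assumed to point to
# more closely related websites.
SAME_SENTENCE = 1.0
SAME_PARAGRAPH = 0.5
SAME_PAGE = 0.25


def link_proximity_scores(pages):
    """Accumulate pairwise relatedness scores for link targets.

    pages: list of pages; a page is a list of paragraphs; a paragraph is a list
    of sentences; a sentence is a list of link target URLs.
    Returns {frozenset({url1, url2}): score}.
    """
    scores = defaultdict(float)
    for page in pages:
        # Record each link together with its (paragraph, sentence) position.
        positions = [
            (url, p_idx, s_idx)
            for p_idx, paragraph in enumerate(page)
            for s_idx, sentence in enumerate(paragraph)
            for url in sentence
        ]
        for (u1, p1, s1), (u2, p2, s2) in combinations(positions, 2):
            if u1 == u2:
                continue  # a page linking twice to the same site says nothing new
            if p1 == p2 and s1 == s2:
                weight = SAME_SENTENCE
            elif p1 == p2:
                weight = SAME_PARAGRAPH
            else:
                weight = SAME_PAGE
            scores[frozenset((u1, u2))] += weight
    return dict(scores)
```

The pairwise scores could then feed any standard clustering method; the design mirrors co-citation counting, except that each co-occurrence is weighted by proximity instead of counting as 1.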
| SliDL: A Slide Digital Library Supporting Content Reuse in Presentations | | BIBAK | Full-Text | 453-456 | |
| José Hilario Canós; María Isabel Marante; Manuel Llavador | |||
| Presentation building applications lack good support for slide reuse. In this
paper, we introduce SliDL, a digital library that facilitates slide reuse by
flattening the presentation-based structure of current systems and providing
slide retrieval facilities. The service-oriented architecture of SliDL enables
slide sharing between different applications. We have developed clients for
Microsoft PowerPoint 2007 and OpenOffice.org Impress. Keywords: Slide reuse; presentation management; Service-Oriented Architecture | |||
| Metadata Impact on Research Paper Similarity | | BIBA | Full-Text | 457-460 | |
| Germán Hurtado Martín; Steven Schockaert; Chris Cornelis; Helga Naessens | |||
| While collaborative filtering and citation analysis have been well studied for research paper recommender systems, content-based approaches typically restrict themselves to straightforward application of the vector space model. However, various types of metadata containing potentially useful information are usually available as well. Our work explores several methods to exploit this information in combination with different similarity measures. | |||
| Exploring the Influence of Tagging Motivation on Tagging Behavior | | BIBA | Full-Text | 461-465 | |
| Roman Kern; Christian Körner; Markus Strohmaier | |||
| The reasons why users tag have remained mostly elusive to quantitative investigations. In this paper, we distinguish between two types of motivation for tagging: While categorizers use tags mainly for categorizing resources for later browsing, describers use tags mainly for describing resources for later retrieval. To characterize users with regard to these different motivations, we introduce statistical measures and apply them to 7 different real-world tagging datasets. We show that while most taggers use tags for both categorizing and describing resources, different tagging systems lend themselves to different motivations for tagging. Additionally we show that the distinction between describers and categorizers can improve the performance of tag recommendation. | |||
| A Teaching Tool for Parasitology: Enhancing Learning with Annotation and Image Retrieval | | BIBA | Full-Text | 466-469 | |
| Nádia P. Kozievitch; Ricardo da Silva Torres; Felipe S. P. Andrade; Uma Murthy; Edward A. Fox; Eric Hallerman | |||
| Parasitology is a basic course in life sciences curricula, but few computer-assisted teaching tools are available for it. We present SuperIDR, a tool that supports annotation and search (based on textual and visual descriptions) in the biodiversity domain. In addition, it provides a feature to aid comparison of morphological characteristics among different species. Preliminary results from two experiments show that students found the tool very useful, contributing to an alternative learning approach. | |||
| Framework for Logging and Exploiting the Information Retrieval Dialog | | BIBA | Full-Text | 470-473 | |
| Paul Landwich; Claus-Peter Klas; Matthias Hemmje | |||
| In this paper we present first results of a new approach to an innovative user interface for digital library and information retrieval systems. The guiding idea is that only the dialog between user and system can establish the information context necessary to satisfy an information need. We introduce a framework for information retrieval systems to handle activities and sets elaborated during a search process, and a prototype tool integrated into an existing interface framework. Finally, we describe a user study and expert interviews conducted on the basis of the tool, together with their evaluation results. | |||
| Defining the Dynamicity and Diversity of Text Collections | | BIBA | Full-Text | 474-477 | |
| Ilya Markov; Fabio Crestani | |||
| In Information Retrieval, collections are often considered to be relatively dynamic or diverse, but no general definition has been given for these notions and no actual measure has been proposed to quantify them. We give intuitive definitions of the dynamicity and diversity properties of text collections and present measures for calculating them based on the notion of novelty. Experimental results show that the proposed measures are consistent with the definitions and can distinguish collections effectively according to their dynamicity and diversity properties. | |||
| Manuzio: A Model for Digital Annotated Text and Its Query/Programming Language | | BIBA | Full-Text | 478-481 | |
| Marek Maurizio; Renzo Orsini | |||
| More and more large repositories of texts that must be processed automatically represent their content through descriptive markup languages. This method has been popularized by the availability of widely adopted standards like SGML and, later, XML, which made possible the definition of specific formats for many kinds of text, from literary texts (TEI) to web pages (XHTML). The markup approach, however, has several noteworthy shortcomings. First, only texts with a strict hierarchical structure can be encoded easily, whereas text often has concurrent hierarchies. Second, extra-textual information, like metadata or annotations, can be tied only to the same structure as the text and must be expressed as strings of the markup language. Third, queries and programs for the retrieval and processing of text must be expressed in languages like XQuery [4], in which every document is represented as a tree of nodes; for this reason, in documents where parallel, overlapping structures exist, XQuery programs become significantly more complex. | |||
| Effective Term Weighting for Sentence Retrieval | | BIBA | Full-Text | 482-485 | |
| Saeedeh Momtazi; Matthew Lease; Dietrich Klakow | |||
| A well-known challenge of information retrieval is how to infer a user's underlying information need when the input query consists of only a few keywords. Question Answering (QA) systems face an equally important but opposite challenge: given a verbose question, how can the system infer the relative importance of terms in order to differentiate the core information need from supporting context? We investigate three simple term-weighting schemes for such estimation within the language modeling retrieval paradigm [6]. While the three schemes described are ad hoc, they address a principled estimation problem underlying the standard word unigram model. We also show these schemes enable better estimation of a state-of-the-art class model based on term clustering [5]. Using a TREC QA dataset, we evaluate the three weighting schemes for both word and class models on the QA subtask of sentence retrieval. Our inverse sentence frequency weighting scheme achieves over 5% absolute improvement in mean-average precision for the standard word model and nearly 2% absolute improvement for the class model. | |||
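The abstract names an inverse sentence frequency (ISF) weighting but not its exact formula or how it enters the query model. The sketch below assumes the usual IDF-style form, isf(t) = log(N / sf(t)), where N is the number of sentences and sf(t) the number of sentences containing t, and normalises the weights of a verbose question's terms into a distribution that could serve as a weighted query model. The function name and the smoothing for unseen terms are hypothetical.

```python
import math
from collections import Counter


def isf_query_weights(sentences: list[list[str]], question: list[str]) -> dict[str, float]:
    """Weight question terms by inverse sentence frequency (ISF), normalised to sum to 1.

    Assumed ISF form, by analogy with IDF: isf(t) = log(N / sf(t)), where N is the
    number of sentences and sf(t) the number of sentences containing t.
    """
    terms = list(dict.fromkeys(question))  # unique terms, order preserved
    if not terms or not sentences:
        return {}
    n = len(sentences)
    sf = Counter()
    for sentence in sentences:
        sf.update(set(sentence))  # count each term at most once per sentence
    # Hypothetical smoothing: a term absent from the collection is treated as if sf = 1.
    raw = {t: math.log(n / max(sf[t], 1)) for t in terms}
    total = sum(raw.values())
    if total == 0:
        # Every term occurs in every sentence: fall back to uniform weights.
        return {t: 1 / len(terms) for t in terms}
    return {t: w / total for t, w in raw.items()}
```

The effect is that terms appearing in almost every sentence (supporting context such as function words) receive near-zero weight, while rarer terms that are more likely to carry the core information need dominate the query model.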
| User-Oriented Evaluation of Color Descriptors for Web Image Retrieval | | BIBAK | Full-Text | 486-489 | |
| Otávio Augusto Bizetto Penatti; Ricardo da Silva Torres | |||
| This paper proposes a methodology for effectiveness evaluation in
content-based image retrieval systems. The methodology is based on the opinion
of real users. This paper also presents the results of using this methodology
to evaluate color descriptors for Web image retrieval. The experiments were
performed using a database containing more than 230 thousand heterogeneous
images that represents the existing content on the Web. Keywords: user evaluation; color descriptors; content-based image retrieval; web | |||
| A Topic-Specific Web Search System Focusing on Quality Pages | | BIBAK | Full-Text | 490-493 | |
| Ari Pirkola; Tuomas Talvensaari | |||
| We describe a topic-specific Web search system focused on quality pages and
argue that there is a need for such quality-based topic-specific search tools.
The first implementation of the search system is available on the Web and it
deals with climate change. The key idea is to crawl (using a focused crawling
technique) in known trusted sites and in sites that are connected to them. We
also discuss the further development of the system and our future research. Our
project plan involves building a larger quality-based Web search system dealing
with many globally significant topics (in addition to climate change). Keywords: Digital libraries; Focused crawling; Vertical search engines; Web
information retrieval | |||
| Reliable Preservation of Interactive Environments and Workflows | | BIBA | Full-Text | 494-497 | |
| Klaus Rechert; Dirk von Suchodoletz; Randolph Welte; Felix Ruzzoli; Isgandar Valizada | |||
| Most digital objects are created solely in interactive graphical user interfaces that were available during a particular time period. Archiving and preservation organizations are faced with large amounts of such objects of various types. At some point they will need to process these automatically to make them available to their users or to convert them to a commonly used format. We present methods and a system architecture for emulation services that enable the reliable preservation of interactive environments and their workflows. This system includes a framework for describing interactions with an interactive environment in an abstract manner, for supporting reliable automated playback, and finally for ensuring the preservation of specific operational knowledge by documenting and storing all components in a dedicated software archive. | |||
| Automated Country Name Disambiguation for Code Set Alignment | | BIBA | Full-Text | 498-501 | |
| Gramm Richardson | |||
| Multiple standards and encodings for names of countries, as well as multiple renderings of the country names themselves cause problems for interoperability. This impacts both human and automated processing. This paper describes an automated method for aligning pairs of country code sets by examining the string similarity between the names of the countries in each set. | |||
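The abstract does not name the string similarity measure used. As a stand-in, the sketch below uses Python's `difflib.SequenceMatcher.ratio()` after folding accents, case and punctuation, and maps each code in one set to the code in the other set with the most similar country name. The normalisation steps, the 0.6 cut-off and the function names are assumptions chosen for illustration.

```python
import difflib
import re
import unicodedata


def normalise(name: str) -> str:
    """Fold accents, lower-case, and drop punctuation before comparing names."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^a-z0-9 ]", "", ascii_name.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def align_code_sets(set_a: dict[str, str], set_b: dict[str, str],
                    threshold: float = 0.6) -> dict[str, str]:
    """Map each code in set_a to the code in set_b whose country name is most similar.

    Codes whose best similarity falls below `threshold` (an assumed cut-off)
    are left unaligned for manual review.
    """
    alignment: dict[str, str] = {}
    for code_a, name_a in set_a.items():
        best_code, best_score = None, 0.0
        for code_b, name_b in set_b.items():
            score = difflib.SequenceMatcher(None, normalise(name_a), normalise(name_b)).ratio()
            if score > best_score:
                best_code, best_score = code_b, score
        if best_code is not None and best_score >= threshold:
            alignment[code_a] = best_code
    return alignment
```

For example, "United States" and "United States of America" score about 0.70 and are aligned, while accent variants such as "Côte d'Ivoire" and "Cote d'Ivoire" normalise to the same string. The exhaustive pairwise comparison is fine for code sets of a few hundred countries.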
| LIFE-SHARE Project: Developing a Digitisation Strategy Toolkit | | BIBAK | Full-Text | 502-505 | |
| Beccy Shipman; Matthew Herring; Ned Potter; Bo Middleton | |||
| This poster will outline the Digitisation Strategy Toolkit created as part of the LIFE-SHARE project. The toolkit is based on the lifecycle model created by the LIFE project and explores the creation, acquisition, ingest, preservation (bit-stream and content) and access requirements for a digitisation strategy. This covers the policies and infrastructure required in libraries to establish successful practices. The toolkit also provides both internal and external resources to support the service. This poster will illustrate how the toolkit works effectively to support digitisation with examples from three case studies at the Universities of Leeds, Sheffield and York. Keywords: digitisation; digital lifecycle; toolkit; strategies; libraries | |||
| Ensemble: A Distributed Portal for the Distributed Community of Computing Education | | BIBAK | Full-Text | 506-509 | |
| Frank M. Shipman III; Lillian N. Cassel; Edward A. Fox; Richard Furuta; Lois M. L. Delcambre; Peter Brusilovsky; B. Stephen Carpenter II; Gregory W. Hislop; Stephen H. Edwards; Daniel D. Garcia | |||
| NSF's NSDL is composed of domain-oriented pathways. Ensemble is the pathway for computing and supports the full range of computing education communities, providing a base for the development of programs that blend computing with other STEM areas (e.g., X-informatics and Computing + X), and producing digital library innovations that can be propagated to other NSDL pathways. Computing is a distributed community, including computer science, computer engineering, software engineering, information science, information systems, and information technology. Ensemble aims to provide much-needed support for the many distinct yet overlapping educational programs in computing and their associated communities. To do this, Ensemble takes the form of a distributed portal providing access to the broad range of existing educational resources while preserving the collections and their associated curatorial processes. Ensemble encourages contribution, use, reuse, review, and evaluation of educational materials at multiple levels of granularity. Keywords: Ensemble; distributed portal; distributed community | |||
| A New Focus on End Users: Eye-Tracking Analysis for Digital Libraries | | BIBAK | Full-Text | 510-513 | |
| Jonathan Sykes; Milena Dobreva; Duncan Birrell; Emma McCulloch; Ian Ruthven; Yurdagül Ünal; Pierluigi Feliciati | |||
| Eye-tracking data was gathered as part of a user and functional evaluation of the Europeana v1.0 prototype, to determine which areas of the interface screen are most heavily used and which areas attract users' attention but are not effectively used in search. Outputs from eye-tracking data can offer insight into how advanced search functions can be made more intuitive for end users with differing interests and abilities, and can be used to inform continued interface development as digital libraries look to the future. Results led to recommendations for the future development of the Europeana digital library. Keywords: digital libraries; eye-tracking; gaze plots; heat maps; user studies | |||
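One common way to quantify which screen areas attract attention is to sum fixation durations per rectangular area of interest (AOI); heat maps typically visualise the same fixation data spatially. The sketch below illustrates that aggregation only; the AOI names, coordinates and fixation values are invented and are not taken from the Europeana study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixation:
    x: float            # screen coordinates in pixels
    y: float
    duration_ms: float


@dataclass(frozen=True)
class AOI:
    """Rectangular area of interest on the interface screen."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, f: Fixation) -> bool:
        return self.left <= f.x < self.right and self.top <= f.y < self.bottom


def dwell_by_aoi(fixations: list[Fixation], aois: list[AOI]) -> dict[str, dict[str, float]]:
    """Total dwell time and fixation count per AOI; unmatched fixations go to 'other'."""
    stats = {a.name: {"dwell_ms": 0.0, "fixations": 0} for a in aois}
    stats["other"] = {"dwell_ms": 0.0, "fixations": 0}
    for f in fixations:
        # The first matching AOI wins, so list more specific regions first.
        name = next((a.name for a in aois if a.contains(f)), "other")
        stats[name]["dwell_ms"] += f.duration_ms
        stats[name]["fixations"] += 1
    return stats


# Invented layout and fixations for illustration.
aois = [AOI("search_box", 0, 0, 800, 80),
        AOI("facets", 0, 80, 200, 600),
        AOI("results", 200, 80, 800, 600)]
fixations = [Fixation(100, 40, 250), Fixation(400, 300, 180),
             Fixation(450, 320, 220), Fixation(900, 50, 120)]
stats = dwell_by_aoi(fixations, aois)
print(stats)
```

Comparing such attention figures with click or interaction logs for the same areas is one way to spot regions that draw the eye but are not effectively used.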
| Digital Library Educational Module Development Strategies and Sustainable Enhancement by the Community | | BIBAK | Full-Text | 514-517 | |
| Seungwon Yang; Tarek Kanan; Edward A. Fox | |||
| The Digital Library Curriculum Development Project (http://curric.dlib.vt.edu) team has been developing educational modules and conducting field tests internationally since January 2006. Three approaches to module development have been used. In the first, project team members created nine draft modules, which were then reviewed by experts in the field as well as by other team members. In the second, teams of four graduate students each developed a single module under the supervision of an instructor and the project team; four modules were produced this way. In the third, five graduate students developed a total of five modules, each reviewed by two students. The completed modules were posted on Wikiversity.org for wider distribution and collaborative improvement by the community; the entire list of modules in the Digital Library Educational Framework can also be found there. Keywords: digital libraries; curriculum; education; module development; development strategy; wiki | |||
| Approach to Cross-Language Retrieval for Japanese Traditional Fine Art: Ukiyo-e Database | | BIBAK | Full-Text | 518-521 | |
| Biligsaikhan Batjargal; Fuminori Kimura; Akira Maeda | |||
| In this paper we introduce a system that searches Ukiyo-e databases with English queries, built by customizing and utilizing freely available open source software. In our system, the Ukiyo-e metadata elements were mapped to Dublin Core. We adopted a dictionary-based query translation approach and used the Greenstone Digital Library Software to make our Ukiyo-e digital collections available online. The preliminary result is an easy-to-use and useful system that allows users who do not understand Japanese to search and view Japanese Ukiyo-e databases in English. Keywords: Ukiyo-e; Image database; Digital library; Cross-Language information retrieval | |||
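The abstract gives neither the metadata mapping nor the dictionary. The sketch below assumes hypothetical local field names mapped to Dublin Core elements, and a small hypothetical English-Japanese dictionary applied with greedy longest-phrase matching, so that multi-word entries such as "mount fuji" are translated as a unit. It does not model how Greenstone executes the translated query.

```python
# Hypothetical crosswalk from local Ukiyo-e metadata fields to Dublin Core elements.
CROSSWALK = {
    "work_title": "dc:title",
    "artist": "dc:creator",
    "publisher": "dc:publisher",
    "period": "dc:date",
    "genre": "dc:subject",
}

# Hypothetical English-to-Japanese dictionary; an entry may list several
# translations, which are kept together as alternatives for one query term.
BILINGUAL_DICT = {
    "wave": ["波"],
    "mount fuji": ["富士山", "富士"],
    "actor": ["役者"],
    "beauty": ["美人"],
}


def to_dublin_core(record: dict[str, str]) -> dict[str, list[str]]:
    """Map a local record to Dublin Core; fields without a mapping are dropped."""
    dc: dict[str, list[str]] = {}
    for field, value in record.items():
        element = CROSSWALK.get(field)
        if element is not None:
            dc.setdefault(element, []).append(value)
    return dc


def translate_query(query: str, dictionary: dict[str, list[str]] = BILINGUAL_DICT,
                    max_phrase: int = 3) -> list[list[str]]:
    """Dictionary-based query translation with greedy longest-phrase matching.

    Returns one group of alternative Japanese terms per matched phrase;
    out-of-vocabulary words (e.g. romanised names) pass through unchanged.
    """
    tokens = query.lower().split()
    groups: list[list[str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at position i first.
        for n in range(min(max_phrase, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                groups.append(list(dictionary[phrase]))
                i += n
                break
        else:  # no dictionary phrase starts here
            groups.append([tokens[i]])
            i += 1
    return groups


print(translate_query("Mount Fuji wave Hokusai"))
print(to_dublin_core({"artist": "葛飾北斎", "work_title": "神奈川沖浪裏"}))
```

Untranslated words such as "hokusai" would not match Japanese-script metadata, so a fuller system would also need name transliteration or a name dictionary.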
| Open Source Historical OCR: The OCRopodium Project | | BIBA | Full-Text | 522-525 | |
| Michael Bryant; Tobias Blanke; Mark Hedges; Richard Palmer | |||
| In this paper we present some initial results of the OCRopodium project to build a scalable workflow for OCR of historical collections. Large-scale digitisation projects dealing with text-based historical material face challenges that are not well catered for by commercial software. Open source tools allow better customisation to match these requirements, particularly with regard to character model training and per-project language modelling. | |||
| A Voice-Oriented Image Cataloguing Environment | | BIBAK | Full-Text | 526-529 | |
| José Hilario Canós; Carlos J. Castillo; Pablo Muñoz; Héctor Valero; Manuel Llavador | |||
| VOICE is a tool for cataloguing digital images using a voice-based user interface. The goal of VOICE is to ease the entry of descriptive metadata for single images or collections of pictures, so that the data entered can later be used for keyword-based image retrieval. We have developed two versions of the tool, standalone VOICE and VOICE4Picasa. The latter is an add-in to Picasa that calls the former, so users do not need to switch from one application to the other. In our demonstration, we will show the features of both systems, adding metadata to pictures and using Picasa's retrieval features to find images in our collections. Keywords: Image Cataloguing; Voice-based Interfaces; Speech Recognition | |||
| DMP Online: A Demonstration of the Digital Curation Centre's Web-Based Tool for Creating, Maintaining and Exporting Data Management Plans | | BIBA | Full-Text | 530-533 | |
| Martin Donnelly; Sarah Jones; John W. Pattenden-Fail | |||
| Funding bodies increasingly require researchers to produce Data Management Plans (DMPs). The Digital Curation Centre (DCC) has created DMP Online, a web-based tool which draws upon an analysis of funders' requirements to enable researchers to create and export customisable DMPs, both at the grant application stage and during the project's lifetime. | |||
| DiLiA -- The Digital Library Assistant | | BIBA | Full-Text | 534-537 | |
| Kathrin Eichler; Holmer Hemsen; Günter Neumann; Norbert Reithinger; Sven Schmeier; Kinga Schumacher; Inessa Seifert | |||
| In this paper we present the digital library assistant (DiLiA). The system aims to augment search in digital libraries in several dimensions. The project develops advanced information visualisation methods for user-controlled interactive search. The interaction model has been designed to be transparent to the user and easy to use. In addition, information extraction (IE) methods have been developed in DiLiA to make the content more easily accessible; these include the identification and extraction of technical terms (TTs), both single- and multi-word terms, as well as the extraction of binary relations based on the extracted terms. DiLiA follows a hybrid information extraction approach that combines metadata and document processing. | |||
| Xeproc©: A Model-Based Approach towards Document Process Preservation | | BIBA | Full-Text | 538-541 | |
| Thierry Jacquin; Hervé Déjean; Jean-Pierre Chanod | |||
| Developed in the context of the EU Integrated Project SHAMAN, Xeproc© technology lets one define and design document processes while producing an abstract representation that is independent of the implementation. These representations capture the intent behind the workflow and can be preserved for reuse in future, as yet unknown, infrastructures. Xeproc© is available under the Eclipse Public License. | |||
| A Prototype Personalization System for the European Library Portal | | BIBAK | Full-Text | 542-545 | |
| Marialena Kyriakidi; Lefteris Stamatogiannakis; Mei Li Triantafyllidi; Maria Vayanou; Yannis E. Ioannidis | |||
| In this demonstration, we present a flexible system that enables the provision of personalized functionalities to digital libraries. The system has been developed based on the needs of The European Library portal and will be demonstrated in that particular context, but could be applied more generally. It implements a broad set of data processing, analysis, and mining techniques over the portal's log files, using an environment called madIS. It enables on-line result contextualization and adaptation through the development of REST web services, which are responsible for retrieving and appropriately integrating the extracted information. The demonstration also features a web-based visualization tool for showing the output of the log analysis performed. Keywords: Log mining; pattern extraction; profiling; personalization | |||
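The log analysis runs in madIS, whose interface is not described here, so the following is a plain-Python sketch of two typical log-mining steps under our own assumptions: splitting each user's requests into sessions at an inactivity gap (30 minutes is a common convention, not a value from the paper) and deriving a naive interest profile from frequent query terms. `LogEntry`, the function names and the example entries are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LogEntry:
    user: str          # anonymised user or session-cookie identifier
    time: datetime
    query: str


def sessionize(entries: list[LogEntry],
               gap: timedelta = timedelta(minutes=30)) -> list[list[LogEntry]]:
    """Split each user's time-ordered entries into sessions at inactivity gaps > `gap`."""
    by_user: dict[str, list[LogEntry]] = defaultdict(list)
    for e in sorted(entries, key=lambda e: (e.user, e.time)):
        by_user[e.user].append(e)
    sessions: list[list[LogEntry]] = []
    for user_entries in by_user.values():
        current = [user_entries[0]]
        for prev, e in zip(user_entries, user_entries[1:]):
            if e.time - prev.time > gap:
                sessions.append(current)  # close the session before the gap
                current = []
            current.append(e)
        sessions.append(current)
    return sessions


def profile(entries: list[LogEntry], top_n: int = 3) -> dict[str, list[str]]:
    """Naive interest profile: each user's most frequent lower-cased query terms
    (ties keep the order in which terms were first seen)."""
    terms: dict[str, Counter] = defaultdict(Counter)
    for e in entries:
        terms[e.user].update(e.query.lower().split())
    return {user: [t for t, _ in c.most_common(top_n)] for user, c in terms.items()}


# Made-up log entries for illustration.
t0 = datetime(2010, 9, 6, 10, 0)
entries = [
    LogEntry("u1", t0, "Darwin letters"),
    LogEntry("u1", t0 + timedelta(minutes=5), "darwin origin"),
    LogEntry("u1", t0 + timedelta(hours=2), "maps"),
    LogEntry("u2", t0 + timedelta(minutes=1), "maps Europe"),
]
sessions = sessionize(entries)
print([len(s) for s in sessions], profile(entries))
```

A web service could then use such a profile to contextualize or re-rank results for a returning user, which is the kind of on-line adaptation the abstract describes.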
| Meta-Composer: Synthesizing Online FRBR Works from Library Resources | | BIBA | Full-Text | 546-549 | |
| Michalis Sfakakis; Panagiotis Staikos; Sarantos Kapidakis | |||
| Next-generation display and indexing of cataloging records are mainly influenced by the development of the FRBR conceptual model. While the process of collecting all relevant bibliographic records in a catalogue into an FRBR work entity has been extensively developed and tested in non-interactive (offline) environments, the corresponding process has not been explored for meta-searching. This work presents the implementation and use of alternative clustering algorithms and similarity metrics for composing FRBR work entities in the configurable meta-search engine meta-Composer. Moreover, it introduces a tool for evaluating the composition methods, which can be used either as a complement to the configuration process, to select the best clustering methods for the searched catalogues, or as a general testbed for evaluating the FRBR work entity composition process. | |||
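The abstract does not say which clustering algorithms or similarity metrics meta-Composer offers. The sketch below shows one plausible combination, chosen for illustration: single-link clustering (via union-find over all pairs above a threshold) applied to difflib's similarity ratio on normalised author/title keys. The `Record` fields, the threshold and the function names are assumptions.

```python
import difflib
import re
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: str
    author: str
    title: str


def work_key(r: Record) -> str:
    """Normalised author/title key: accents, case and punctuation are ignored."""
    text = unicodedata.normalize("NFKD", f"{r.author} / {r.title}")
    text = "".join(c for c in text if not unicodedata.combining(c)).lower()
    return " ".join(re.findall(r"\w+", text))


def cluster_works(records: list[Record], threshold: float = 0.85) -> list[list[str]]:
    """Single-link clustering: two records share a work if their keys' similarity
    reaches `threshold`, directly or through a chain of other records."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    keys = [work_key(r) for r in records]
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if difflib.SequenceMatcher(None, keys[i], keys[j]).ratio() >= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for i, r in enumerate(records):
        groups.setdefault(find(i), []).append(r.record_id)
    return list(groups.values())


# Made-up records, as if returned by meta-searching several catalogues.
records = [
    Record("a1", "Tolstoy, Leo", "War and Peace"),
    Record("b7", "Tolstoy, Leo", "War and peace."),
    Record("c3", "Tolstoi, Lev", "War and Peace"),
    Record("a2", "Austen, Jane", "Emma"),
]
print(cluster_works(records))
```

Single-link chaining tolerates transliteration variants like "Tolstoi, Lev", but it can also merge distinct works through intermediate records, which is one reason to compare alternative clustering methods per catalogue, as the abstract proposes.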
| Digital Library in a 3D Virtual World: The Digital Bleek and Lloyd Collection in Second Life | | BIBAK | Full-Text | 550-553 | |
| Rizmari Versfeld; Spencer J. Lee; Edward A. Fox; Hussein Suleman; Kyle Williams | |||
| This research explores and demonstrates the process of setting up a 3D representation of a typical web-based digital library, 'The Digital Bleek and Lloyd collection (lloydbleekcollection.cs.uct.ac.za)', in the popular 3D virtual world Second Life (SL). The processes of building, scripting, and evaluating the 3D exhibit are discussed. The report concludes that SL is a good platform for this kind of cultural representation. At university level it could be used to showcase and share researchers' work. Keywords: Second Life; virtual worlds; 3D; Digital Libraries; Bleek and Lloyd; Bushman heritage | |||