| Making web annotations persistent over time | | BIBAK | Full-Text | 1-10 | |
| Robert Sanderson; Herbert Van de Sompel | |||
| As Digital Libraries (DL) become more aligned with the web architecture,
their functional components need to be fundamentally rethought in terms of URIs
and HTTP. Annotation, a core scholarly activity enabled by many DL solutions,
exhibits a clearly unacceptable characteristic when existing models are applied
to the web: due to the representations of web resources changing over time, an
annotation made about a web resource today may no longer be relevant to the
representation that is served from that same resource tomorrow.
We assume the existence of archived versions of resources, and combine the temporal features of the emerging Open Annotation data model with the capability offered by the Memento framework that allows seamless navigation from the URI of a resource to archived versions of that resource, and arrive at a solution that provides guarantees regarding the persistence of web annotations over time. More specifically, we provide theoretical solutions and proof-of-concept experimental evaluations for two problems: reconstructing an existing annotation so that the correct archived version is displayed for all resources involved in the annotation, and retrieving all annotations that involve a given archived version of a web resource. Keywords: annotation, digital preservation, persistence, web architecture | |||
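As a rough illustration of the Memento-style navigation this abstract relies on: in the Memento protocol (later specified in RFC 7089), a client asks a TimeGate for an archived version of a resource by sending an `Accept-Datetime` request header. The sketch below only builds that header; the function name and the choice of the annotation's creation time as the target datetime are assumptions made for illustration, not the authors' implementation.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_headers(annotated_at: datetime) -> dict:
    """Build the request header used for Memento datetime negotiation.

    `annotated_at` is assumed (for this sketch) to be the time at which
    the annotation was made; it must be a timezone-aware datetime.
    """
    # Accept-Datetime carries an HTTP-date, which is always expressed in GMT.
    as_utc = annotated_at.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}


headers = memento_headers(datetime(2010, 4, 1, 12, 0, tzinfo=timezone.utc))
print(headers["Accept-Datetime"])
```

A client would send these headers to the TimeGate of the annotated resource and follow the response to the archived version closest to that time.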
| Transferring structural markup across translations using multilingual alignment and projection | | BIBAK | Full-Text | 11-20 | |
| David Bamman; Alison Babeu; Gregory Crane | |||
| We present here a method for automatically projecting structural information
across translations, including canonical citation structure (such as chapters
and sections), speaker information, quotations, markup for people and places,
and any other element in TEI-compliant XML that delimits spans of text that are
linguistically symmetrical in two languages. We evaluate this technique on two
datasets, one containing perfectly transcribed texts and one containing
errorful OCR, and achieve an accuracy rate of 88.2% projecting 13,023 XML tags
from source documents to their transcribed translations, with an 83.6% accuracy
rate when projecting to texts containing uncorrected OCR. This approach has the
potential to allow a highly granular multilingual digital library to be
bootstrapped by applying the knowledge contained in a small, heavily curated
collection to a much larger but unstructured one. Keywords: annotation projection, knowledge transfer, multilingual alignment | |||
| ProcessTron: efficient semi-automated markup generation for scientific documents | | BIBAK | Full-Text | 21-28 | |
| Guido Sautter; Klemens Böhm; Conny Kühne; Tobias Mathäß | |||
| Digitizing legacy documents and marking them up with XML is important for
many scientific domains. However, creating comprehensive semantic markup of
high quality is challenging. Respective processes consist of many steps, with
automated markup generation and intermediate manual correction. These
corrections are extremely laborious. To reduce this effort, this paper makes
two contributions: First, it proposes ProcessTron, a lightweight
markup-process-control mechanism. ProcessTron assists users in two ways: It
ensures that the steps are executed in the appropriate order, and it points the
user to possible errors during manual correction. Second, ProcessTron has been
deployed in real-world projects, and this paper reports on our experiences. A
core observation is that ProcessTron more than halves the time users need to
mark up a document. Results from laboratory experiments, which we have
conducted as well, confirm this finding. Keywords: data-driven markup process control, semantic xml markup | |||
| Scholarly paper recommendation via user's recent research interests | | BIBAK | Full-Text | 29-38 | |
| Kazunari Sugiyama; Min-Yen Kan | |||
| We examine the effect of modeling a researcher's past works in recommending
scholarly papers to the researcher. Our hypothesis is that an author's
published works constitute a clean signal of the latent interests of a
researcher. A key part of our model is to enhance the profile derived directly
from past works with information coming from the past works' referenced papers
as well as papers that cite the work. In our experiments, we differentiate
between junior researchers who have published only one paper and senior
researchers who have multiple publications. We show that filtering these
sources of information is advantageous -- when we additionally prune noisy
citations, referenced papers and publication history, we achieve statistically
significantly higher levels of recommendation accuracy. Keywords: digital library, information retrieval, recommendation, user modeling | |||
| Effective self-training author name disambiguation in scholarly digital libraries | | BIBAK | Full-Text | 39-48 | |
| Anderson A. Ferreira; Adriano Veloso; Marcos André Gonçalves; Alberto H. F. Laender | |||
| Name ambiguity in the context of bibliographic citation records is a hard
problem that affects the quality of services and content in digital libraries
and similar systems. Supervised methods that exploit training examples in order
to distinguish ambiguous author names are among the most effective solutions
for the problem, but they require skilled human annotators in a laborious and
continuous process of manually labeling citations in order to provide enough
training examples. Thus, such systems urgently need (i) automatic acquisition of
examples and (ii) highly effective disambiguation even when only a few examples
are available. In this paper, we
propose a novel two-step disambiguation method, SAND (Self-training Associative
Name Disambiguator), that deals with these two issues. The first step
eliminates the need for any manual labeling effort by automatically acquiring
examples using a clustering method that groups citation records based on the
similarity among coauthor names. The second step uses a supervised
disambiguation method that is able to detect unseen authors not included in any
of the given training examples. Experiments conducted with standard public
collections, using the minimum set of attributes present in a citation (i.e.,
author names, work title and publication venue), demonstrated that our proposed
method outperforms representative unsupervised disambiguation methods that
exploit similarities between citation records and is as effective as, and in
some cases superior to, supervised ones, without manually labeling any training
example. Keywords: bibliographic citations, name disambiguation | |||
| Citing for high impact | | BIBAK | Full-Text | 49-58 | |
| Xiaolin Shi; Jure Leskovec; Daniel A. McFarland | |||
| The question of citation behavior has always intrigued scientists from
various disciplines. While general citation patterns have been widely studied
in the literature, we develop the notion of citation projection graphs by
investigating the citations among the publications that a given paper cites. We
investigate how patterns of citations vary between various scientific
disciplines and how such patterns reflect the scientific impact of the paper.
We find that idiosyncratic citation patterns are characteristic of low-impact
papers, while narrow, discipline-focused citation patterns are common for
medium-impact papers. Our results show that crossing-community, or bridging,
citation patterns are high risk and high reward, since such patterns are
characteristic of both low- and high-impact papers. Last, we observe that
citation networks have recently been trending toward more bridging and
interdisciplinary forms. Keywords: citation networks, citation projection, publication impact | |||
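One way to read "the citations among the publications that a given paper cites" is as the subgraph of the citation graph induced by a paper's reference list. The sketch below follows that reading; the dictionary representation, the function name, and the decision to leave the citing paper itself out of the graph are assumptions, since the abstract does not spell them out.

```python
def citation_projection(paper, cites):
    """Return the citation graph restricted to the references of `paper`.

    `cites` maps each paper id to the set of paper ids it cites.
    The result maps each reference of `paper` to the subset of the
    other references that it cites.
    """
    refs = cites.get(paper, set())
    # Keep only edges whose source and target are both references of `paper`.
    return {r: cites.get(r, set()) & refs for r in refs}


cites = {
    "P": {"A", "B", "C"},
    "A": {"B", "X"},  # X is not cited by P, so the edge A -> X is dropped
    "B": {"C"},
    "C": set(),
}
projection = citation_projection("P", cites)
```

A reference list whose members rarely cite one another yields a sparse projection graph, while a tightly knit, discipline-focused reference list yields a dense one.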
| Evaluating methods to rediscover missing web pages from the web infrastructure | | BIBAK | Full-Text | 59-68 | |
| Martin Klein; Michael L. Nelson | |||
| Missing web pages (pages that return the 404 "Page Not Found" error) are part
of the browsing experience. The manual use of search engines to rediscover
missing pages can be frustrating and unsuccessful. We compare four automated
methods for rediscovering web pages. We extract the page's title, generate the
page's lexical signature (LS), obtain the page's tags from the bookmarking
website delicious.com and generate an LS from the page's link neighborhood. We
use the output of all methods to query Internet search engines and analyze
their retrieval performance. Our results show that both LSs and titles perform
fairly well, with over 60% of URIs returned top-ranked by Yahoo!. However, the
combination of methods improves the retrieval performance. Considering the
complexity of LS generation, the preferable setup is to query with the title
first and, if the results are insufficient, with the LSs second. This
combination accounts for more than 75% of URIs being returned top-ranked. Keywords: digital preservation, search engines, web page discovery | |||
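A lexical signature is commonly taken to be the handful of terms on a page that score highest under TF-IDF. The sketch below follows that common definition. The tokenizer, the smoothed IDF formula, the signature length `k`, and the use of a small in-memory background collection for document frequencies are all assumptions, because the abstract does not say how the LSs were generated.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_signature(page_text, background_docs, k=5):
    """Return the k terms of the page with the highest TF-IDF score."""
    tf = Counter(tokenize(page_text))
    doc_sets = [set(tokenize(d)) for d in background_docs]
    n = len(doc_sets)

    def idf(term):
        # Smoothed IDF so that terms unseen in the background stay finite.
        df = sum(term in ds for ds in doc_sets)
        return math.log((n + 1) / (df + 1)) + 1.0

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(tf, key=lambda t: (-tf[t] * idf(t), t))
    return ranked[:k]


page = "memento memento archive the the the web of the archive"
background = ["the web of data", "the cat sat on the mat", "the history of the web"]
signature = lexical_signature(page, background, k=2)
```

The resulting terms would then be joined into a search engine query, in the same way the page title is used as a query in the title-based method.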
| Search behaviors in different task types | | BIBAK | Full-Text | 69-78 | |
| Jingjing Liu; Michael J. Cole; Chang Liu; Ralf Bierig; Jacek Gwizdka; Nicholas J. Belkin; Jun Zhang; Xiangmin Zhang | |||
| Personalization of information retrieval tailors search towards individual
users to meet their particular information needs by taking into account
information about users and their contexts, often through implicit sources of
evidence such as user behaviors. Task types have been shown to influence search
behaviors including usefulness judgments. This paper reports on an
investigation of user behaviors associated with different task types.
Twenty-two undergraduate journalism students participated in a controlled lab
experiment, each searching on four tasks which varied on four dimensions:
complexity, task product, task goal and task level. Results indicate regular
differences associated with different task characteristics in several search
behaviors, including task completion time, decision time (the time taken to
decide whether a document is useful or not), and eye fixations. We suggest
these behaviors can be used as implicit indicators of the user's task type. Keywords: eye tracking, information retrieval, personalization, task type, user
behavior | |||
| Exploiting time-based synonyms in searching document archives | | BIBAK | Full-Text | 79-88 | |
| Nattiya Kanhabua; Kjetil Nørvåg | |||
| Query expansion of named entities can be employed in order to increase the
retrieval effectiveness. A peculiarity of named entities compared to other
vocabulary terms is that they are very dynamic in appearance, and synonym
relationships between terms change with time. In this paper, we present an
approach to extracting synonyms of named entities over time from the whole
history of Wikipedia. In addition, we use their temporal patterns as a
feature in ranking and classifying them into two types, i.e., time-independent
or time-dependent. Time-independent synonyms are invariant to time, while
time-dependent synonyms are relevant to a particular time period, i.e., the
synonym relationships change over time. Further, we describe how to make use of
both types of synonyms to increase the retrieval effectiveness, i.e., query
expansion with time-independent synonyms for an ordinary search, and query
expansion with time-dependent synonyms for a search with respect to temporal criteria.
Finally, through an evaluation based on TREC collections, we demonstrate how
retrieval performance of queries consisting of named entities can be improved
using our approach. Keywords: query expansion, synonym detection, temporal search | |||
| Using word sense discrimination on historic document collections | | BIBAK | Full-Text | 89-98 | |
| Nina Tahmasebi; Kai Niklas; Thomas Theuerkauf; Thomas Risse | |||
| Word sense discrimination is the first, important step towards automatic
detection of language evolution within large, historic document collections. By
comparing the found word senses over time, we can reveal and use important
information that will improve understanding and accessibility of a digital
archive. Algorithms for word sense discrimination have been developed while
keeping today's language in mind and have thus been evaluated on well selected,
modern datasets. The quality of the word senses found in the discrimination
step has a large impact on the detection of language evolution. Therefore, as a
first step, we verify that word sense discrimination can successfully be
applied to digitized historic documents and that the results correctly
correspond to word senses. Because accessibility of digitized historic
collections is influenced also by the quality of the optical character
recognition (OCR), as a second step we investigate the effects of OCR errors on
word sense discrimination results. All evaluations in this paper are performed
on The Times Archive, a collection of newspaper articles from 1785-1985. Keywords: OCR error impact, historic document collections, information extraction,
word sense discrimination | |||
| Chinese calligraphy specific style rendering system | | BIBAK | Full-Text | 99-108 | |
| Zhenting Zhang; Jiangqin Wu; Kai Yu | |||
| Rendering handwritten characters in the specific style of a famous
artwork is fascinating. In this paper, a system is built to render the user's
handwritten characters in a specific style. A stroke database is established
first. To render a character, the strokes are extracted and recognized,
then suitable radicals and strokes are filtered, and finally these strokes are
deformed to generate the result. The Special Nine Grid (SNG) is presented
to help recognize radicals and strokes. The Rule-base Stroke Deformation
Algorithm (RSDA) is proposed to deform the original strokes according to the
handwriting strokes. The rendering result manifests the specific style with
high quality. It is feasible for people to generate the tablet or other
artworks with the proposed system. Keywords: rule-base stroke deformation, special nine grid, specific style rendering | |||
| Translating handwritten bushman texts | | BIBAK | Full-Text | 109-118 | |
| Kyle Williams; Hussein Suleman | |||
| The Bleek and Lloyd Collection is a collection of artefacts documenting the
life and language of the Bushman people of southern Africa in the 19th century.
Included in this collection is a handwritten dictionary that contains English
words and their corresponding |xam Bushman language translations. This
dictionary allows for the manual translation of |xam words that appear in the
notebooks of the Bleek and Lloyd collection. This, however, is not practical
due to the size of the dictionary, which contains over 14,000 entries. To solve
this problem, a content-based image retrieval system was built that allows for
the selection of a |xam word from a notebook and returns matching words from
the dictionary. The system shows promise with some search keys returning
relevant results. Keywords: CBIR, cultural heritage preservation, digital libraries, handwritten
manuscripts, image processing, information retrieval | |||
| Do Wikipedians follow domain experts?: a domain-specific study on Wikipedia knowledge building | | BIBAK | Full-Text | 119-128 | |
| Yi Zhang; Aixin Sun; Anwitaman Datta; Kuiyu Chang; Ee-Peng Lim | |||
| Wikipedia is one of the most successful online knowledge bases, attracting
millions of visits daily. Not surprisingly, its huge success has in turn led to
immense research interest for a better understanding of the collaborative
knowledge building process. In this paper, we performed a (terrorism)
domain-specific case study, comparing and contrasting the knowledge evolution
in Wikipedia with a knowledge base created by domain experts. Specifically, we
used the Terrorism Knowledge Base (TKB) developed by experts at MIPT. We
identified 409 Wikipedia articles matching TKB records and studied
them from three aspects: creation, revision, and link evolution. We found that
the knowledge building in Wikipedia had largely been independent, and did not
follow TKB -- despite the open and online availability of the latter, as well
as awareness of the TKB source among at least some of the Wikipedia
contributors. In an attempt to identify possible reasons, we conducted a detailed
analysis of contribution behavior demonstrated by Wikipedians. It was found
that most Wikipedians contribute to a relatively small set of articles each.
Their contribution was biased towards one or very few article(s). At the same
time, each article's contributions are often championed by very few active
contributors including the article's creator. We finally arrive at the conjecture
that contributions in Wikipedia aim more at covering knowledge at the article
level than at the domain level. Keywords: Wikipedia, contributing behavior, knowledge building | |||
| Spatiotemporal mapping of Wikipedia concepts | | BIBAK | Full-Text | 129-138 | |
| Adrian Popescu; Gregory Grefenstette | |||
| Space and time are important dimensions in the representation of a large
number of concepts. However, no available resource provides
spatiotemporal mappings of generic concepts. Here we present a link-analysis
based method for extracting the main locations and periods associated with all
Wikipedia concepts. Relevant locations are selected from a set of geotagged
articles, while relevant periods are discovered using a list of people with
associated life periods. We analyze article versions over multiple languages
and consider the strength of a spatial/temporal reference to be proportional to
the number of languages in which it appears. To illustrate the utility of the
spatiotemporal mapping of Wikipedia concepts, we present an analysis of
cultural interactions and a temporal analysis of two domains. The Wikipedia
mapping can also be used to perform rich spatiotemporal document indexing by
extracting implicit spatial and temporal references from texts. Keywords: concept, cultural, interaction, multilinguism, spatial-temporal, wikipedia | |||
| Crowdsourcing the assembly of concept hierarchies | | BIBAK | Full-Text | 139-148 | |
| Kai Eckert; Mathias Niepert; Christof Niemann; Cameron Buckner; Colin Allen; Heiner Stuckenschmidt | |||
| The "wisdom of crowds" is accomplishing tasks that are cumbersome for
individuals yet cannot be fully automated by means of specialized computer
algorithms. One such task is the construction of thesauri and other types of
concept hierarchies. Human expert feedback on the relatedness and relative
generality of terms, however, can be aggregated to dynamically construct
evolving concept hierarchies. The InPhO (Indiana Philosophy Ontology) project
bootstraps feedback from volunteer users unskilled in ontology design into a
precise representation of a specific domain. The approach combines statistical
text processing methods with expert feedback and logic programming to create a
dynamic semantic representation of the discipline of philosophy.
In this paper, we show that results of comparable quality can be achieved by leveraging the workforce of crowdsourcing services such as the Amazon Mechanical Turk (AMT). In an extensive empirical study, we compare the feedback obtained from AMT's workers with that from the InPhO volunteer users providing an insight into qualitative differences of the two groups. Furthermore, we present a set of strategies for assessing the quality of different users when gold standards are missing. We finally use these methods to construct a concept hierarchy based on the feedback acquired from AMT workers. Keywords: crowdsourcing, similarity, thesaurus learning | |||
| A user-centered design of a personal digital library for music exploration | | BIBAK | Full-Text | 149-158 | |
| David Bainbridge; Brook J. Novak; Sally Jo Cunningham | |||
| We describe the evaluation of a personal digital library environment
designed to help musicians capture, enrich and store their ideas using a
spatial hypermedia paradigm. The target user group is musicians who primarily
use audio and text for composition and arrangement, rather than formal
music notation. Using the principle of user-centered design, the software
implementation was guided by a diary study involving nine musicians which
suggested five requirements for the software to support: capturing,
overdubbing, developing, storing, and organizing. Moreover, the underlying
spatial data-model was exploited to give raw audio compositions a hierarchical
structure, and -- to aid musicians in retrieving previous ideas -- a search
facility is available to support both query by humming and text-based queries.
A user evaluation of the completed design with eleven subjects indicated that
musicians, in general, would find the hypermedia environment useful for
capturing and managing their moments of musical creativity and exploration.
More specifically they would make use of the query by humming facility and the
hierarchical track organization, but not the overdubbing facility as
implemented. Keywords: music composition, personal digital music library, spatial hypermedia | |||
| Improving mood classification in music digital libraries by combining lyrics and audio | | BIBAK | Full-Text | 159-168 | |
| Xiao Hu; J. Stephen Downie | |||
| Mood is an emerging metadata type and access point in music digital
libraries (MDL) and online music repositories. In this study, we present a
comprehensive investigation of the usefulness of lyrics in music mood
classification by evaluating and comparing a wide range of lyric text features
including linguistic and text stylistic features. We then combine the best
lyric features with features extracted from music audio using two fusion
methods. The results show that combining lyrics and audio significantly
outperformed systems using audio-only features. In addition, the examination of
learning curves shows that the hybrid lyric + audio system needed fewer
training samples to achieve the same or better classification accuracies than
systems using lyrics or audio singularly. These experiments were conducted on a
unique large-scale dataset of 5,296 songs (with both audio and lyrics for each)
representing 18 mood categories derived from social tags. The findings push
forward the state-of-the-art on lyric sentiment analysis and automatic music
mood classification and will help make mood a practical access point in music
digital libraries. Keywords: audio features, feature fusion, lyric sentiment analysis, music digital
libraries, music mood classification, supervised learning | |||
| Visualizing personal digital collections | | BIBAK | Full-Text | 169-172 | |
| Weijia Xu; Maria Esteva; Suyog Dott Jain | |||
| This paper describes the use of a relational database management system
(RDBMS) and treemap visualization to represent and analyze a group of personal
digital collections created in the context of work and with no external
metadata. We evaluated the visualization vis-à-vis the results of previous
personal information management (PIM) studies. We suggest that this
visualization supports analyses that allow understanding of PIM practices
over time. Keywords: database applications, digital collections, information visualization,
personal information management (PIM), treemap | |||
| Interpretation of web page layouts by blind users | | BIBAK | Full-Text | 173-176 | |
| Luis Francisco-Revilla; Jeff Crow | |||
| Digital libraries must support assistive technologies that allow people with
disabilities such as blindness to use, navigate and understand their documents.
Increasingly, many documents are Web-based and present their contents using
complex layouts. However, approaches that translate two-dimensional layouts to
one-dimensional speech produce a very different user experience and loss of
information. To address this issue, we conducted a study of how blind people
navigate and interpret layouts of news and shopping Web pages using current
assistive technology. The study revealed that blind people do not parse Web
pages fully during their first visit, and that they can miss important parts.
The study also provided insights for improving assistive technologies. Keywords: assistive technology, blind users, web page layouts | |||
| Supporting document triage via annotation-based multi-application visualizations | | BIBAK | Full-Text | 177-186 | |
| Soonil Bae; DoHyoung Kim; Konstantinos Meintanis; J. Michael Moore; Anna Zacchi; Frank Shipman; Haowei Hsieh; Catherine C. Marshall | |||
| For open-ended information tasks, users must sift through many potentially
relevant documents, a practice we refer to as document triage. Normally, people
perform triage using multiple applications in concert: a search engine
interface presents lists of potentially relevant documents; a document reader
displays their contents; and a third tool -- a text editor or personal
information management application -- is used to record notes and assessments.
To support document triage, we have developed an extensible multi-application
architecture that initially includes an information workspace and a document
reader. An Interest Profile Manager infers users' interests from their
interactions with the triage applications, coupled with the characteristics of
the documents they are interacting with. The resulting interest profile is used
to generate visualizations that direct users' attention to documents or parts
of documents that match their inferred interests. The novelty of our approach
lies in the aggregation of activity records across applications to generate
fine-grained models of user interest. Keywords: document triage, multi-application user modeling, visualization | |||
| Flexible access to photo libraries via time, place, tags, and visual features | | BIBAK | Full-Text | 187-196 | |
| Andreas Girgensohn; Frank Shipman; Thea Turner; Lynn Wilcox | |||
| Photo libraries are growing in quantity and size, requiring better support
for locating desired photographs. MediaGLOW is an interactive visual workspace
designed to address this concern. It uses attributes such as visual appearance,
GPS locations, user-assigned tags, and dates to filter and group photos. An
automatic layout algorithm positions photos with similar attributes near each
other to support users in serendipitously finding multiple relevant photos. In
addition, the system can explicitly select photos similar to specified photos.
We conducted a user evaluation to determine the benefit provided by similarity
layout and the relative advantages offered by the different layout similarity
criteria and attribute filters. Study participants had to locate photos
matching probe statements. In some tasks, participants were restricted to a
single layout similarity criterion and filter option. Participants used
multiple attributes to filter photos. Layout by similarity without additional
filters turned out to be one of the most used strategies and was especially
beneficial for geographical similarity. Lastly, the relative appropriateness of
the single similarity criterion to the probe significantly affected retrieval
performance. Keywords: geographic data, photo libraries, photo retrieval, similarity criteria,
tagged photos, visual similarity | |||
| Interactively browsing movies in terms of action, foreshadowing and resolution | | BIBAK | Full-Text | 197-200 | |
| Stewart Greenhill; Brett Adams; Svetha Venkatesh | |||
| We describe a novel video player that uses Temporal Semantic Compression
(TSC) to present a compressed summary of a movie. Compression is based on tempo
which is derived from film rhythms. The technique identifies periods of action,
drama, foreshadowing and resolution, which can be mixed in different amounts to
vary the kind of summary presented. The compression algorithm is embedded in a
video player, so that the summary can be interactively recomputed during
playback. Keywords: compression, media aesthetics, video browsing | |||
| Timeline interactive multimedia experience (time): on location access to aggregate event information | | BIBAK | Full-Text | 201-204 | |
| Jeff Crow; Eryn Whitworth; Ame Wongsa; Luis Francisco-Revilla; Swati Pendyala | |||
| Attending a complex scheduled social event, such as a multi-day music
festival, requires a significant amount of planning before and during its
progression. Advancements in mobile technology and social networks enable
attendees to contribute content in real-time that can provide useful
information to many. Currently, such information is difficult to access and its
presentation is hard to use during an event. The Timeline Interactive Multimedia
Experience (TIME) system aggregates information posted to multiple social
networks and presents the flow of information in a multi-touch timeline
interface. TIME was designed to be placed on location to allow real-time access
to relevant information that helps attendees to make plans and navigate their
crowded surroundings. Keywords: complex scheduled events, events, multi-touch, planning, social media,
timeline | |||
| Domain-specific iterative readability computation | | BIBAK | Full-Text | 205-214 | |
| Jin Zhao; Min-Yen Kan | |||
| We present a new algorithm to measure domain-specific readability. It
iteratively computes the readability of domain-specific resources based on the
difficulty of domain-specific concepts and vice versa, in a style reminiscent
of other bipartite graph algorithms such as Hyperlink-Induced Topic Search
(HITS) and the Stochastic Approach for Link-Structure Analysis (SALSA). While
simple, our algorithm outperforms standard heuristic measures and remains
competitive among supervised-learning approaches. Moreover, it is less
domain-dependent and is portable across domains, as it does not rely on an
annotated corpus or expensive expert knowledge that supervised or
domain-specific methods require. Keywords: domain-specific information retrieval, graph-based algorithm, iterative
computation, readability measure | |||
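The mutual reinforcement described in this abstract (resource scores computed from concept difficulty and vice versa) can be sketched as alternating updates over a bipartite resource-concept graph. The version below is a minimal sketch under stated assumptions: it uses plain averaging as the update rule, heuristic seed scores for the resources, and a small fixed number of iterations. None of these choices comes from the abstract, and plain averaging smooths scores toward a common value if run for many iterations, which is why the iteration count is kept small here.

```python
def iterate_readability(resource_concepts, seed_scores, iterations=2):
    """Alternate between concept difficulty and resource difficulty.

    resource_concepts: dict mapping a resource id to the set of concepts it uses.
    seed_scores: dict mapping a resource id to an initial difficulty estimate
                 (hypothetical; e.g. from a heuristic readability formula).
    """
    # Invert the mapping: which resources mention each concept?
    concept_resources = {}
    for resource, concepts in resource_concepts.items():
        for concept in concepts:
            concept_resources.setdefault(concept, set()).add(resource)

    resource_score = dict(seed_scores)
    concept_score = {}
    for _ in range(iterations):
        # A concept is as difficult as the resources that mention it.
        concept_score = {
            c: sum(resource_score[r] for r in rs) / len(rs)
            for c, rs in concept_resources.items()
        }
        # A resource is as difficult as the concepts it contains.
        resource_score = {
            r: sum(concept_score[c] for c in cs) / len(cs) if cs else resource_score[r]
            for r, cs in resource_concepts.items()
        }
    return resource_score, concept_score


resources = {
    "intro": {"addition", "fraction"},
    "mid": {"fraction", "derivative"},
    "adv": {"derivative", "manifold"},
}
res, con = iterate_readability(resources, {"intro": 1.0, "mid": 2.0, "adv": 3.0})
```

In this toy run the concepts end up ordered from "addition" (easiest) to "manifold" (hardest), even though no concept was given a difficulty score up front.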
| Evaluating topic models for digital libraries | | BIBAK | Full-Text | 215-224 | |
| David Newman; Youn Noh; Edmund Talley; Sarvnaz Karimi; Timothy Baldwin | |||
| Topic models could have a huge impact on improving the ways users find and
discover content in digital libraries and search interfaces through their
ability to automatically learn and apply subject tags to each and every item in
a collection, and their ability to dynamically create virtual collections on
the fly. However, much remains to be done to tap this potential, and
empirically evaluate the true value of a given topic model to humans. In this
work, we sketch out some sub-tasks that we suggest pave the way towards this
goal, and present methods for assessing the coherence and interpretability of
topics learned by topic models. Our large-scale user study includes over 70
human subjects evaluating and scoring almost 500 topics learned from
collections from a wide range of genres and domains. We show how a scoring model
-- based on pointwise mutual information of word pairs using Wikipedia, Google
and MEDLINE as external data sources -- performs well at predicting human
scores. This automated scoring of topics is an important first step to
integrating topic modeling into digital libraries. Keywords: evaluation, topic models, topic quality, user studies | |||
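A scoring model of the kind described here can be sketched as the average pointwise mutual information, PMI(a, b) = log(p(a, b) / (p(a) p(b))), over all pairs of a topic's top words. In the sketch below the probabilities are estimated from document-level co-occurrence in a small reference collection, and pairs that never co-occur (or contain an unseen word) contribute zero. Both of these are simplifying assumptions, as is the function name; the abstract does not describe how the counts were obtained from Wikipedia, Google or MEDLINE.

```python
import math
from itertools import combinations


def pmi_coherence(topic_words, documents):
    """Mean pairwise PMI of the topic words over a reference collection."""
    doc_sets = [set(d.lower().split()) for d in documents]
    n = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all the given words.
        return sum(all(w in ds for w in words) for ds in doc_sets) / n

    scores = []
    for a, b in combinations(topic_words, 2):
        p_a, p_b, p_ab = prob(a), prob(b), prob(a, b)
        if p_ab == 0:
            # Simplification: no evidence of association counts as zero.
            scores.append(0.0)
        else:
            scores.append(math.log(p_ab / (p_a * p_b)))
    return sum(scores) / len(scores) if scores else 0.0


docs = [
    "stock market shares trading",
    "market shares investors trading",
    "stock investors market",
    "football match goal player",
    "player goal football season",
    "match season goal",
]
coherent = pmi_coherence(["stock", "market", "shares"], docs)
mixed = pmi_coherence(["stock", "goal", "season"], docs)
```

On this toy collection the finance topic scores higher than the topic that mixes finance and football words, which is the behavior such a score is meant to capture.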
| FRBRization of MARC records in multiple catalogs | | BIBAK | Full-Text | 225-234 | |
| Hugo Miguel Álvaro Manguinhas; Nuno Miguel Antunes Freire; José Luis Brinquete Borbinha | |||
| This paper addresses the problem of using the FRBR model to support the
presentation of results. It describes a service implementing new algorithms and
techniques for transforming existing MARC records into the FRBR model for this
specific purpose. This work was developed in the context of the TELPlus project
and processed 100,000 bibliographic and authority records from multilingual
catalogs of 12 European countries. Keywords: FRBR, FRBRization, bibliographic records, multilingual catalogs | |||
| Exposing the hidden web for chemical digital libraries | | BIBAK | Full-Text | 235-244 | |
| Sascha Tönnies; Benjamin Köhncke; Oliver Koepler; Wolf-Tilo Balke | |||
| In recent years, the vast amount of digitally available content has led to
the creation of many topic-centered digital libraries. Also in the domain of
chemistry more and more digital collections are available, but the complex
query formulation still hampers their intuitive adoption. This is because
information seeking in chemical documents is focused on chemical entities, for
which current standard search relies on complex structures which are hard to
extract from documents. Moreover, although simple keyword searches would often
be sufficient, current collections simply cannot be indexed by Web search
providers due to the ambiguity of chemical substance names. In this paper we
present a framework for automatically generating metadata-enriched index pages
for all documents in a given chemical collection. All information is then
linked to the respective documents and thus provides an easy to crawl metadata
repository promising to open up digital chemical libraries. Our experiments,
indexing an open access journal, show not only that the documents can be found
using a simple Google search via the automatically created index pages, but
also that the search performs much better than fulltext
indexing in terms of both precision/recall and performance. Finally, we compare
our indexing against a classical structure search and find that
keyword-based search can indeed solve at least some of the daily tasks in
chemical workflows. Using our framework thus promises to expose a large part
of the currently still hidden chemical Web, making the techniques employed
interesting for chemical information providers like digital libraries and open
access journals. Keywords: chemical digital collections, digital libraries, hidden web, information
extraction, information retrieval, web search | |||
| oreChem ChemXSeer: a semantic digital library for chemistry | | BIBAK | Full-Text | 245-254 | |
| Na Li; Leilei Zhu; Prasenjit Mitra; Karl Mueller; Eric Poweleit; C. Lee Giles | |||
| Representing the semantics of unstructured scientific publications will
certainly facilitate access and search and hopefully lead to new discoveries.
However, current digital libraries are usually limited to classic flat
structured metadata even for scientific publications that potentially contain
rich semantic metadata. In addition, how to search the scientific literature of
linked semantic metadata is an open problem. We have developed a semantic
digital library, oreChem ChemXSeer, that models chemistry papers with
semantic metadata. It stores and indexes extracted metadata from a chemistry
paper repository, ChemXSeer, using "compound objects".
We use the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) (http://www.openarchives.org/ore/) standard to define a compound object that aggregates metadata fields related to a digital object. Aggregated metadata can be managed and retrieved easily as one unit, resulting in improved ease-of-use, and has the potential to improve the semantic interpretation of shared data. We show how metadata can be extracted from documents and aggregated using OAI-ORE. ORE objects are created on demand; thus, we are able to search for a set of linked metadata with one query. We were also able to model new types of metadata easily. For example, chemists are especially interested in finding information related to experiments in documents. We show how paragraphs containing experiment information in chemistry papers can be extracted and tagged based on a chemistry ontology with 470 classes, and then represented in ORE along with other document-related metadata. Our algorithm uses a classifier with features that are words that are typically only used to describe experiments, such as "apparatus", "prepare", etc. Using a dataset comprised of documents from the Royal Society of Chemistry digital library, we show that our proposed method performs well in extracting experiment-related paragraphs from chemistry documents. Keywords: ChemXSeer, digital library, metadata extraction, oai-ore, seersuite,
semantic web, support vector machines | |||
| BinarizationShop: a user-assisted software suite for converting old documents to black-and-white | | BIBAK | Full-Text | 255-258 | |
| Fanbo Deng; Zheng Wu; Zheng Lu; Michael S. Brown | |||
| Converting a scanned document to a binary format (black and white) is a key
step in the digitization process. While many existing binarization algorithms
operate robustly for well-kept documents, these algorithms often produce less
than satisfactory results when applied to old documents, especially those
degraded with stains and other discolorations. For these challenging documents,
user assistance can be advantageous in directing the binarization procedure.
Many existing algorithms, however, are poorly designed to incorporate user
assistance. In this paper, we discuss a software framework, BinarizationShop,
that combines a series of binarization approaches that have been tailored to
exploit user assistance. This framework provides a practical approach for
converting difficult documents to black and white. Keywords: binarization, document processing, user-assisted software | |||
| Using an ontology and a multilingual glossary for enhancing the nautical archaeology digital library | | BIBAK | Full-Text | 259-262 | |
| Carlos Monroy; Richard Furuta; Filipe Castro | |||
| Access to materials in digital collections has been extensively studied
within digital libraries. Exploring a collection requires customized indices
and novel interfaces to allow users new exploration mechanisms. Materials or
objects can then be found by way of full-text, faceted, or thematic indexes.
There has been a marked interest not only in finding objects in a collection,
but in discovering relationships and properties. For example, multiple
representations of the same object enable the use of visual aids to augment
collection exploration. Depending on the domain and characteristics of the
objects in a collection, relationships among components can be used to enrich
the process of understanding their contents. In this context, the Nautical
Archaeology Digital Library (NADL) includes multilingual textual- and
visual-rich objects (shipbuilding treatises, illustrations, photographs, and
drawings). In this paper we describe an approach for enhancing access to a
collection of ancient technical documents, illustrations, and photographs
documenting archaeological excavations. Because of the nature of our
collection, we exploit a multilingual glossary along with an ontology.
Preliminary tests of our prototype suggest the feasibility of our method for
enhancing access to the collection. Keywords: information retrieval, interfaces, multilingual technical manuscripts,
nautical archaeology, ship reconstruction, technical documents | |||
| In-depth utilization of Chinese ancient maps: a hybrid approach to digitizing map resources in CADAL | | BIBAK | Full-Text | 263-272 | |
| Zhenchao Ye; Ling Zhuang; Jiangqin Wu; Chenyang Du; Baogang Wei; Yin Zhang | |||
| Digital maps have recently become increasingly popular as an intuitive and
interactive platform for data presentation, and applications integrated with
digital maps have attracted much attention. However, no off-the-shelf systems or
services are available once the time span of the maps is extended to historical ones.
There are a large number of valuable ancient atlases in the CADAL digital library.
However, they are seldom used because those in image format
are not convenient for users to read or search. In this paper, we propose a
novel hybrid approach to utilizing these atlases directly and constructing
applications based on ancient maps. We call it CAMAME, which stands for Chinese
Ancient Maps Automatic Marking and Extraction. We create a gazetteer to store
the geographic information of sites which will be projected onto the map, then use
a kernel method to perform the regression and correct the estimated results with image
processing and local regression methods. The empirical results show that CAMAME
is effective and efficient, marking and identifying most of the valuable data in
the map images. Some Chinese literary chronicle applications that
exhibit ancient literary and related historical information over those
digitized atlas resources in the CADAL digital library have been developed. Keywords: atlases, digital library, image processing, kernel method | |||
| The fused library: integrating digital and physical libraries with location-aware sensors | | BIBAK | Full-Text | 273-282 | |
| George R. Buchanan | |||
| This paper reports an investigation into the connection of the workspace of
physical libraries with digital library services. Using simple sensor
technology, we provide focused access to digital resources on the basis of the
user's physical context, including the topic of the stacks they are next to,
and the content of books on their reading desks. Our research developed the
technological infrastructure to support this fused interaction, investigated
current patron behavior in physical libraries, and evaluated our system in a
user-centred pilot study. The outcome of this research demonstrates the
potential utility of the fused library, and provides a starting point for
future exploitation. Keywords: digital libraries, human factors, physical interaction | |||
| What humanists want: how scholars use source materials | | BIBAK | Full-Text | 283-292 | |
| Neal Audenaert; Richard Furuta | |||
| Despite the growing prominence of digital libraries as tools to support
humanities scholars, little is known about the work practices and needs of
these scholars as they pertain to working with source documents. In this paper
we present our findings from a formative user study consisting of
semi-structured interviews with eight scholars.
We find that the use of source materials (by which we mean the original physical documents or digital facsimiles with minimal editorial intervention) in scholarship is not a simple, straight-forward examination of a document in isolation. Instead, scholars study source materials as an integral part of a complex ecosystem of inquiry that seeks to understand both the text being studied and the context in which that text was created, transmitted and used. Drawing examples from our interviews, we address critical questions of why scholars use source documents and what information they hope to gain by studying them. We also briefly summarize key note-taking practices as a means for assessing the potential to design user interfaces that support scholarly work-practices. Keywords: digital humanities, source documents, user studies | |||
| Context identification of sentences in related work sections using a conditional random field: towards intelligent digital libraries | | BIBAK | Full-Text | 293-302 | |
| M. A. Angrosh; Stephen Cranefield; Nigel Stanger | |||
| Identification of contexts associated with sentences is becoming
increasingly necessary for developing intelligent information retrieval
systems. This article describes a supervised learning mechanism employing a
conditional random field (CRF) for context identification and sentence
classification. Specifically, we focus on sentences in related work sections in
research articles. Based on a generic rhetorical pattern, a framework for
modelling the sequential flow in these sections is proposed. Adopting a
generalization strategy, each of these sentences is transformed into a set of
features, which forms our dataset. We distinguish between two kinds of features
for each of these sentences, viz. citation features and sentence features.
While an overall accuracy of 96.51% is achieved by using a combination of both
citation and sentence features, the use of sentence features alone yields an
accuracy of 93.22%. The results also show F-Scores ranging from 0.99 to 0.90
for various classes indicating the robustness of our application. Keywords: citation classification, conditional random fields, linear chain CRFs,
sentence classification | |||
| Can an intermediary collection help users search image databases without annotations? | | BIBAK | Full-Text | 303-312 | |
| Robert Villa; Martin Halvey; Hideo Joho; David Hannah; Joemon M. Jose | |||
| Developing methods for searching image databases is a challenging and
ongoing area of research. A common approach is to use manual annotations,
although generating annotations can be expensive in terms of time and money,
and therefore may not be justified in many situations. Content-based search
techniques which extract visual features from image data can be used, but users
are typically forced to express their information need using example images, or
through sketching interfaces. This can be difficult if no visual example of the
information need is available, or when the information need cannot be easily
drawn.
In this paper, we consider an alternative approach which allows a user to search for images through an intermediate database. In this approach, a user can search using text in the intermediate database as a way of finding visual examples of their information need. The visual examples can then be used to search a database that lacks annotations. Three experiments are presented which investigate this process. The first experiment automatically selects the image queries from the intermediary database; the second instead uses images which have been hand-picked by users. A third experiment, an interactive study, is then presented; this study compares the intermediary interface to text search, where we consider text as an upper bound of performance. For this last study, an interface which supports the intermediary search process is described. Results show that while performance does not match manual annotations, users are able to find relevant material without requiring collection annotations. Keywords: content-based image retrieval, search strategies | |||
| Social network document ranking | | BIBAK | Full-Text | 313-322 | |
| Liang Gou; Xiaolong (Luke) Zhang; Hung-Hsuan Chen; Jung-Hyun Kim; C. Lee Giles | |||
| In search engines, ranking algorithms measure the importance and relevance
of documents mainly based on the contents and relationships between documents.
User attributes are usually not considered in ranking. This user-neutral
approach, however, may not meet the diverse interests of users, who may demand
different documents even with the same queries. To satisfy this need for more
personalized ranking, we propose a ranking framework, Social Network Document
Rank (SNDocRank), that considers both document contents and the relationship
between a searcher and document owners in a social network. This method
combines the traditional tf-idf ranking for document contents with our
Multi-level Actor Similarity (MAS) algorithm to measure to what extent document
owners and the searcher are structurally similar in a social network. We
implemented our ranking method in a simulated video social network based on data
extracted from YouTube and tested its effectiveness on video search. The
results show that, compared with a traditional ranking method such as tf-idf, the
SNDocRank algorithm returns more relevant documents. More specifically, a
searcher can get significantly better results by being in a larger social
network, having more friends, and being associated with larger local
communities in a social network. Keywords: information retrieval, multilevel actor similarity, ranking, social networks | |||
| A mathematical framework for modeling and analyzing migration time | | BIBAK | Full-Text | 323-332 | |
| Feng Luan; Mads Nygård; Thomas Mestl | |||
| File format obsolescence has so far been considered the major risk in
long-term storage of digital objects. There are, however, growing indications
that file transfer may be a real threat, as the migration time, i.e., the time
required to migrate petabytes of data, may easily span years. Because
hardware support is usually limited to 3-4 years, a situation can emerge
in which a new migration has to be started although the previous one is not yet
finished. This paper chooses a process modeling approach to obtain
estimates of upper and lower bounds for the required migration time. The
advantage is that information about potential bottlenecks can be acquired. Our
theoretical considerations are validated by migration tests at the National
Library of Norway (NB) as well as at our department. Keywords: long-term preservation, migration, performance, process modeling, storage | |||
| Digital libraries for scientific data discovery and reuse: from vision to practical reality | | BIBAK | Full-Text | 333-340 | |
| Jillian C. Wallis; Matthew S. Mayernik; Christine L. Borgman; Alberto Pepe | |||
| Science and technology research is becoming not only more distributed and
collaborative, but more highly instrumented. Digital libraries provide a means
to capture, manage, and access the data deluge that results from these research
enterprises. We have conducted research on data practices and participated in
developing data management services for the Center for Embedded Networked
Sensing since its founding in 2002 as a National Science Foundation Science and
Technology Center. Over the course of eight years, our digital library strategy
has shifted dramatically in response to changing technologies, practices, and
policies. We report on the development of several DL systems and on the lessons
learned, which include the difficulty of anticipating data requirements from
nascent technologies, building systems for highly diverse work practices and
data types, the need to bind together multiple single-purpose systems, the lack
of incentives to manage and share data, the complementary nature of research
and development in understanding practices, and sustainability. Keywords: collaborative research, cyberinfrastructure, data deluge, distributed
research, escience | |||
| Ensemble PDP-8: eight principles for distributed portals | | BIBAK | Full-Text | 341-344 | |
| Edward A. Fox; Yinlin Chen; Monika Akbar; Clifford A. Shaffer; Stephen H. Edwards; Peter Brusilovsky; Dan Garcia; Lois Delcambre; Felicia Decker; David Archer; Richard Furuta; Frank Shipman; Stephen Carpenter; Lillian Cassel | |||
| Ensemble, the National Science Digital Library (NSDL) Pathways project for
Computing, builds upon a diverse group of prior NSDL, DL-I, and other projects.
Ensemble has shaped its activities according to principles related to design,
development, implementation, and operation of distributed portals. Here we
articulate 8 key principles for distributed portals (PDPs). While our focus is
on education and pedagogy, we expect that our experiences will generalize to
other digital library application domains. These principles inform, facilitate,
and enhance the Ensemble R&D and production activities. They allow us to
provide a broad range of services, from personalization to coordination across
communities. The eight PDPs can be briefly summarized as: (1) Articulation
across communities using ontologies. (2) Browsing tailored to collections. (3)
Integration across interfaces and virtual environments. (4) Metadata
interoperability and integration. (5) Social graph construction using logging
and metrics. (6) Superimposed information and annotation integrated across
distributed systems. (7) Streamlined user access with IDs. (8) Web 2.0 multiple
social network system interconnection. Keywords: adaptive education system, distributed portal, ontology, superimposed
information | |||
| Discovering Australia's research data | | BIBAK | Full-Text | 345-348 | |
| Stefanie Kethers; Xiaobin Shen; Andrew E. Treloar; Ross G. Wilkinson | |||
| Access to data crucial to research is often slow and difficult. When
research problems cross disciplinary boundaries, problems are exacerbated. This
paper argues that it is important to make it easier to find and access data
that might be found in an institution, in a disciplinary data store, in a
government department, or held privately. We explore how to meet ad hoc needs
that cannot easily be supported by a disciplinary ontology, and argue that web
pages that describe data collections with rich links and rich text are
valuable. We describe the approach followed by the Australian National Data
Service (ANDS) in making such pages available. Finally, we discuss how we plan
to evaluate this approach. Keywords: Australian research data commons, e-research, metadata | |||
| This is what I'm doing and why: reflections on a think-aloud study of DL users' information behaviour | | BIBAK | Full-Text | 349-352 | |
| Stephann Makri; Ann Blandford; Anna L. Cox | |||
| Many user-centred studies of digital libraries (DLs) include a think-aloud
element and are usually conducted with the purpose of identifying usability
issues related to the DLs used or understanding aspects of users' information
behaviour. However, few of these studies present detailed accounts of how their
think-aloud data was collected and analysed or reflect on this process. In this
paper, we discuss and reflect on the decisions made when planning and
conducting a think-aloud study of lawyers' interactive information behaviour.
Our discussion is framed by Blandford et al.'s PRET A Rapporter ('ready to
report') framework -- a framework that can be used to plan, conduct and
describe user-centred studies of DL use from an information work perspective. Keywords: methodology, reflection, think-aloud, user study | |||
| Customizing science instruction with educational digital libraries | | BIBAK | Full-Text | 353-356 | |
| Tamara Sumner | |||
| The Curriculum Customization Service enables science educators to customize
their instruction with interactive digital library resources. Preliminary
results from a field trial with 124 middle and high school teachers suggest
that the Service offers a promising model for embedding educational digital
libraries into teaching practices and for supporting teachers to integrate
customizing into their curriculum planning. Keywords: customizing instruction, differentiated instruction, educational digital
libraries, personalization, science education, software infrastructure for
teachers | |||
| Impact and prospect of social bookmarks for bibliographic information retrieval | | BIBAK | Full-Text | 357-360 | |
| Kazuhiro Seki; Huawei Qin; Kuniaki Uehara | |||
| This paper presents our ongoing study of the current/future impact of social
bookmarks (or social tags) on information retrieval (IR). Our main research
question in the present work is "How do social tags compare with
conventional yet reliable manual indexing from the viewpoint of IR
performance?" To answer the question, we look at the biomedical literature and
begin by examining basic statistics of social tags from CiteULike in
comparison with Medical Subject Headings (MeSH) annotated in the Medline
bibliographic database. Then, using the data, we conduct various experiments in
an IR setting, which reveal that social tags work complementarily with MeSH
and that retrieval performance would improve as the coverage of CiteULike
grows. Keywords: controlled vocabulary, folksonomy, free keywords, subject headings | |||
| Merging metadata: a sociotechnical study of crosswalking and interoperability | | BIBAK | Full-Text | 361-364 | |
| Michael Khoo; Catherine Hall | |||
| Digital library interoperability relies on the use of a common metadata
format. However, implementing a common metadata format among multiple digital
libraries is not always a straightforward exercise. This paper reviews some of
the metadata issues that arose during the merger of two digital libraries, the
Internet Public Library and the Librarian's Internet Index. As part of the
merger, each library's metadata was crosswalked to Dublin Core. This required
considerable work. A sociotechnical analysis suggests that the metadata for
each library had been shaped in complex ways over time by local factors, and
that this complexity negatively impacted the efficiency of the crosswalk. Some
implications of this finding for digital library interoperability are
discussed. Keywords: Dublin core, crosswalk, interoperability, metadata, operations,
organizational knowledge, organizations, sociotechnical | |||
| Emulation based services in digital preservation | | BIBAK | Full-Text | 365-368 | |
| Klaus Rechert; Dirk von Suchodoletz; Randolph Welte | |||
| The creation of most digital objects occurs solely in the interactive graphical
user interfaces that were available at the time of their creation. Archiving
and preservation organizations are faced with large amounts of such objects of
various types. At some point they will need to process these automatically to
make them available to their users or convert them to a commonly used format. A
substantial problem is to provide a wide range of different users with access
to ancient environments and to allow using the original environment for a given
object. We propose an abstract architecture for emulation services in digital
preservation to provide remote user interfaces to emulation over computer
networks without the need to install additional software components.
Furthermore, we describe how these ideas can be integrated in a framework of
web services for common preservation tasks like viewing or migrating digital
objects. Keywords: digital library, digital preservation, emulation, interactive software,
long-term access | |||
| Many-to-many information connections in a distributed digital library portal | | BIBAK | Full-Text | 369-370 | |
| Lillian N. Cassel; Edward A. Fox; Richard Furuta; Lois M. L. Delcambre | |||
| The Ensemble computing education portal is part of the US NSF's National
Science Digital Library (NSDL). The underlying assumption in Ensemble's design
is that people will not come just because we build something new. The
information must be available from wherever potential users are. This poster
describes early efforts to provide multiple community oriented entry points to
multiple sources relevant to computing educators. Keywords: NSDL, digital library, distributed portal | |||
| SPIRO-V: a collaborative approach to controlled vocabularies gathering and management | | BIBAK | Full-Text | 371-372 | |
| Lina Huang; Rahul A. Deshmukh; Javed Mostafa; Jane Greenberg | |||
| This paper describes SPIRO-V, a collaborative controlled vocabulary
development system integrating automatic and manual approaches for
domain-specific vocabulary acquisition, and leveraging the knowledge of field
experts. Keywords: clinical study, controlled vocabulary construction | |||
| Generating citation digests for scientific publications | | BIBAK | Full-Text | 373-374 | |
| Richard Easty; Nikolay Nikolov | |||
| Science is characterized nowadays by unprecedented growth in the number of
publications. Thus it would be helpful if there were a way to summarize the
contents of the publications or explain the argumentative relationship between
them (e.g. support, further improvement, critique). Such semantic analysis
might involve analyzing the citation contexts (the paragraphs where a certain
publication is referred to by another publication). Here we present our work on
a system that creates the pre-requisites for such analysis by harvesting
publications from the web, extracting the contexts from them, and aggregating
them into citation digests that are retrieved in the context of user
interactions with web sites that mention these publications. Keywords: browser extension, citation contexts, science literature | |||
| AIRFrame: integrating diverse digital collections in astrobiology | | BIBAK | Full-Text | 375-376 | |
| Rich Gazan | |||
| Astrobiology is an inherently interdisciplinary field concerned with
questions of life in the universe. This paper describes the design and ongoing
implementation of the Astrobiology Integrative Research Framework (AIRFrame),
an open source, ontology-driven information system designed to ingest and
analyze heterogeneous inputs of both published and unpublished data, and to
identify and illustrate latent connections between research in astrobiology's
diverse constituent fields. Keywords: astrobiology, collaboration, interdisciplinary science | |||
| A public education tool for tsunami disasters based on walking tours in TDL | | BIBAK | Full-Text | 377-378 | |
| Sayaka Imai; Yoshinari Kanamori; Nobuo Shuto | |||
| In this paper, we propose a public education tool for tsunami
disasters based on TDL. Keywords: GPS mobile phones, public education, tsunami digital library | |||
| A search engine for Japanese academic papers | | BIBAK | Full-Text | 379-380 | |
| Emi Ishita; Teru Agata; Atsushi Ikeuchi; Nozue Michiko; Miyata Yosuke; Shuichi Ueda | |||
| A search engine for Japanese academic papers rendered in PDF is described.
Evaluation results indicate fewer zero-result queries and higher precision in
the top-10 documents than was obtained for the same Japanese queries using
Google Scholar or Scirus. Keywords: PDF, academic papers, search engine | |||
| Analyzing viewing patterns while reading picture books | | BIBAK | Full-Text | 381-382 | |
| Emi Ishita; Shinji Mine; Chihiro Kunimoto; Junko Shiozaki; Keiko Kurata; Shuichi Ueda | |||
| We examine the eye movements of children who can read books on their own as
they read printed picture books. Our analysis focuses on two questions: 1) Is it
the pictures or the text that they most frequently gaze at? and 2) In what
sequence do they read picture books? Our results indicate that children look at
both text and pictures, but that there are large variations in the ratio of
viewing time for each child. Both circular and linear patterns are found in the
sequence of eye movements. Keywords: eye tracking, viewing patterns | |||
| Personalizing information retrieval for people with different levels of topic knowledge | | BIBK | Full-Text | 383-384 | |
| Jingjing Liu; Nicholas J. Belkin | |||
| Keywords: decision time, dwell time, personalization of IR, topic knowledge | |||
| Rethinking preservation validation with the preserved object and repository risks ontology (PORRO) | | BIBAK | Full-Text | 385-386 | |
| Andrew McHugh; Mounia Lalmas | |||
| For securing digital longevity, the processes of preservation planning and
evaluation are fundamentally implicit and share similar complexity. Means are
required for the identification, documentation and association of those
properties of data, representation and management mechanisms that in
combination lend value, facilitate interaction and influence the preservation
process. These properties may be almost limitless in terms of diversity, but
are integral to the establishment of classes of risk exposure, and the planning
and deployment of appropriate preservation strategies. We present PORRO, an
ontology based approach for documenting objects, repositories and risk
information, intended to support preservation decision making and evaluation. Keywords: digital preservation, ontologies, validation | |||
| ForeCite: towards a reader-centric scholarly digital library | | BIBAK | Full-Text | 387-388 | |
| Thuy Dung Nguyen; Min-Yen Kan; Dinh-Trung Dang; Markus Hänse; Ching Hoi Andy Hong; Minh-Thang Luong; Jesse Prabawa Gozali; Kazunari Sugiyama; Yee Fan Tang | |||
| We present ForeCite (FC), a prototype reader-centric digital library that
supports the scholar in using scholarly documents. FC integrates three user
interfaces: a bibliometric component, a document reader and annotation system,
and a bibliographic management application. Keywords: ForeCite, argumentative zoning, document logical structure, scholarly
digital library | |||
| An architecture for a distributed digital library from the desktop up: the fascinator | | BIBAK | Full-Text | 389-390 | |
| Peter Sefton; Duncan Dickinson | |||
| This poster describes the architecture of a new kind of digital repository
service that includes components that run on desktop computers, designed to
close the gap between Institutional Repositories (IRs) and the day-to-day
electronic work environment used by researchers, and to address the too-often
heard cry from repository managers of "we built it but they didn't come."
The team at the Australian Digital Futures Institute are working with researchers to provide software that can (a) index and expose the research data content on their hard disks, (b) extract metadata from files, and (c) automatically process data according to highly configurable workflows, including producing web-ready renditions of research objects including documents and domain-specific data visualizations (such as chemical molecules), and converting video and images so that they may be easily previewed. The architecture is inspired by the success of consumer software in two ways: the way entertainment programs organize content via faceted browse and search interfaces using embedded metadata, and the way photographic software allows content to be grouped into collections and pushed to online services, which are essentially repositories. Keywords: information systems, repositories, research, search | |||
| A digital library architecture supporting massive small files and efficient replica maintenance | | BIBAK | Full-Text | 391-392 | |
| Chunhui Shen; Weiming Lu; Jiangqin Wu; Baogang Wei | |||
| In this paper, we present a service infrastructure based on a distributed
file system for massive storage in digital libraries. In addition, we address
the small-file problem by merging small files into big ones, and propose a
novel dynamic replica number adjustment scheme to ensure maximal
availability and reliability in a limited storage space. Keywords: digital libraries, distributed system, replication, small file | |||
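The small-file merging mentioned in this abstract can be illustrated with a minimal sketch. It assumes one simple packing scheme: small files are concatenated into a single blob, and an index records each file's offset and length. The function names `pack_small_files` and `read_packed_file` are hypothetical, and the paper's actual design, including its replica adjustment scheme, is not reproduced here.

```python
import io


def pack_small_files(files):
    """Concatenate small files into one blob.

    `files` maps a file name to its bytes. Returns (blob, index), where
    index maps each name to its (offset, length) inside the blob.
    This layout is an illustrative assumption, not the paper's format.
    """
    buffer = io.BytesIO()
    index = {}
    for name, data in files.items():
        # Record where this file starts before appending its bytes.
        index[name] = (buffer.tell(), len(data))
        buffer.write(data)
    return buffer.getvalue(), index


def read_packed_file(blob, index, name):
    """Recover one original file from the merged blob using the index."""
    offset, length = index[name]
    return blob[offset:offset + length]
```

With this layout, a storage system keeps one large object plus a small index instead of many tiny objects, which is the general motivation for merging small files.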
| Text clustering with important words using normalization | | BIBAK | Full-Text | 393-394 | |
| Shunyao Wu; Jinlong Wang; Huy Quan Vu; Gang Li | |||
| Important words, which usually appear in parts of a document such as the
Title, Subject and Keywords, can briefly reflect the main topic of a document.
In recent years, it has become common practice to exploit the semantic topic of
documents and utilize important words to achieve document clustering,
especially for short texts such as news articles. This paper proposes a novel
method to extract important words from the Subject and Keywords of articles,
and then partition documents using only those important words. Considering that
the frequencies of important words are usually low and the scale matrix dataset
for important words is small, a normalization method is then proposed to
normalize the scale dataset so that more accurate results can be achieved by
sufficiently exploiting the limited information. The experiments validate the
effectiveness of our method. Keywords: document clustering, important words, normalization | |||
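The abstract does not specify which normalization is used, so the following is only a sketch under a stated assumption: documents are represented as counts over a small vocabulary of important words, and each row is scaled to unit L2 length before clustering. The helper names `important_word_vectors` and `l2_normalize` are hypothetical.

```python
import math


def important_word_vectors(docs, vocabulary):
    """Count how often each important word occurs in each document.

    `docs` is a list of token lists (for example, keyword fields);
    `vocabulary` is the list of important words. Both are assumptions
    made for illustration.
    """
    return [[doc.count(word) for word in vocabulary] for doc in docs]


def l2_normalize(rows):
    """Scale each row to unit Euclidean length; all-zero rows stay zero."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(value * value for value in row))
        if norm > 0:
            normalized.append([value / norm for value in row])
        else:
            normalized.append(list(row))
    return normalized
```

Row normalization of this kind keeps low, sparse counts comparable across documents, so a distance-based clustering step is not dominated by the few documents with more keyword occurrences.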
| Liquid journals: scientific journals in the Web 2.0 era | | BIBAK | Full-Text | 395-396 | |
| Marcos Baez; Alejandro Mussi; Fabio Casati; Aliaksandr Birukou; Maurizio Marchese | |||
| In this demo we introduce a platform and a journal model for the age of
the Web, called the liquid journal. The goal of the model (and of the supporting
platform) is to disseminate knowledge in the best possible way while also
supporting scientists in credit attribution. In a nutshell, liquid journals
are collections of "interesting" links to scientific contributions, such as
papers, blogs, datasets, that are related to certain topics. The content gets
to the journal either by querying both conventional and non-conventional
sources on the Web or manually by the group of editors. Liquid journals
combine depth and breadth in bringing a wider spectrum of scientific
contributions from different communities, while also focusing editors' and
readers' attention on the things they care about. The demo illustrates the
features and benefits of the proposed platform. Keywords: Web, academic journals, enhanced search | |||
| Multiple sources with multiple portals: a demonstration of the ensemble computing portal in second life | | BIBAK | Full-Text | 397-398 | |
| B. Stephen Carpenter II; Richard Furuta; Frank Shipman; Allison Huie; Daniel Pogue; Edward A. Fox; Spencer Lee; Peter Brusilovsky; Lillian Cassel; Lois Delcambre | |||
| This demonstration is an overview of our Ensemble pathway project with group
members on-location at the conference and in the virtual world of Second Life
from remote locations providing a live walk-through tour of our project online.
This approach allows the demonstration to extend beyond the allocated
conference session as a means to attract people to JCDL/ICADL. Keywords: computing portal, ensemble, second life, virtual worlds | |||
| Capturing and curating published data | | BIBAK | Full-Text | 399-400 | |
| Tim DiLauro; Mark Cyzyk; Elliot Metsger; Mark Patton | |||
| Verifiability and reproducibility are core tenets of the scholarly
communication process. For many scientific publications, however, it is often
the case that supporting datasets are not preserved, even when the article text
is. And when they are, it is usually as a collection of files without
relationships amongst one another or to the articles with which they are
associated. There are some existing approaches that attempt to link datasets
with articles after the fact (e.g., NED), but they are relatively few and
involve substantial human intervention.
The Digital Research and Curation Center in the Johns Hopkins University Sheridan Libraries, in conjunction with its partners, has developed a proof-of-concept system that demonstrates an approach to capturing datasets during the process of submitting the associated article. As part of this process, linkages are established between the datasets and the article. Keywords: OAI-ORE, data curation, data publication, data services, scholarly
communication | |||
| OntoFrame S3: academic research information portal service using semantic web technologies and linguistic knowledge | | BIBAK | Full-Text | 401-402 | |
| Seungwoo Lee; Mikyoung Lee; Pyung Kim; Hanmin Jung; Won-Kyung Sung | |||
| In this paper, we show how Semantic Web technologies can be used for
information connection and fusion in an academic research information service,
and how they can be empowered by linguistic knowledge. Keywords: academic research information service, ontology, reasoning, semantic word
network | |||
| Entertainment history museums in virtual worlds: video game and music preservation in second life | | BIBAK | Full-Text | 403-404 | |
| Spencer Lee; Bradley Willis; Joseph S. Bourne Jr.; Edward A. Fox | |||
| This research explores and demonstrates the use of Second Life (the popular
3D virtual world) for the purpose of digitally preserving various aspects of
video game and music history. Physical game interfaces like joysticks,
advertisements used for games, and famous game characters and cultural icons
from across that history are displayed and preserved in multiple video game
exhibits for different eras. Selected game characters are digitally recreated
in 3D format as Second Life avatar appearances. Historical changes in musical
instruments, musicians, and genres are displayed and preserved likewise.
Selected musical instruments are digitally recreated as 3D models playing their
real sounds. Some of them will be available for visitors to play in basic
ways. Keywords: 3D, digital preservation, entertainment, game, history, music, second life,
virtual worlds | |||
| Integrating Greenstone with an interactive map visualizer | | BIBAK | Full-Text | 405-406 | |
| Sam McIntosh; David Bainbridge | |||
| This extended abstract describes recent work in combining interactive map
functionality with the Greenstone 3 digital library software research
framework. Keywords: digital library integration, interactive map visualizer | |||
| Subject metadata support powered by Maui | | BIBK | Full-Text | 407-408 | |
| Olena Medelyan; Vye Perrone; Ian H. Witten | |||
Keywords: keyword extraction, metadata extraction, subject heading extraction, web
interface | |||
| Recommender system for MIR research community | | BIBAK | Full-Text | 409-410 | |
| Yi Yu; Vincent Oria; J. Stephen Downie | |||
| In this demonstration, we show a recommender system for the Music
Information Retrieval (MIR) research community. We extract the key topics and
tags by analyzing the ten-year cumulative ISMIR proceedings, and recommend
papers and research colleagues to users in an interactive way. Keywords: ISMIR, music-IR, recommender systems, social networks | |||