| Science teachers' use of online resources and the digital library for Earth system education | | BIBAK | Full-Text | 1-10 | |
| Lecia J. Barker | |||
| A three-part study of teachers' use of online resources and of the Digital
Library for Earth System Education (DLESE) was conducted from 2004 through
summer 2006. The first two phases were qualitative and informed a survey
administered to 622 science teachers across the U.S., one-fifth of whom had
used DLESE. The findings profile these teachers and their access to
Internet-connected computers and other hardware and electronic media devices in
their classrooms, as well as teachers' preferences for resource formats (e.g.,
customizability) and educational web site features (e.g., tagged reading
level). Analysis of variance showed that teachers with more than one working
computer and teachers with more of these other devices valued the Internet more
highly for teaching than did their less equipped peers. DLESE users valued the
Internet more highly for their teaching, had more years of teaching experience,
and valued customizable resources more than their non-DLESE-using peers. Most
DLESE users believed that resources catalogued in DLESE were scientifically accurate.
Teachers used DLESE most often for finding hands-on activities, still images
and other visual aids, and hand-outs; they were least likely to seek people,
games, or assessment tools. The findings provide guidance for developers of K12
educational resources. Keywords: K12, educational digital libraries, empirical, evaluation, mixed-method,
teaching | |||
| Dimensional standard alignment in K-12 digital libraries: assessment of self-found vs. recommended curriculum | | BIBAK | Full-Text | 11-14 | |
| Byron Marshall; René Reitsma; Malinda Zarske | |||
| Enhancing the experience of digital library users depends, in part, on
recognizing and understanding user tasks. In the context of K-12 educational
libraries, this means that we must understand how K-12 teachers interact with
such libraries and how they assess the relevance of documents found or
encountered. This paper presents the results of an experiment in which K-12
teachers scored the relevance of curriculum they found themselves and the
relevance of documents their colleagues found and recommended. We found that
teachers apply a significantly more detailed notion of relevance, both
qualitatively and quantitatively, when searching for curricula than when
evaluating recommended curricula. Differences were observed in both relevance judgments
and system interaction logs. These variations may be useful in identifying user
intent and in dynamically adapting the behavior of digital libraries of
educational material. Keywords: context-specific measurement, curriculum-standard alignment, digital
library, inter-rater reliability, relevance | |||
| Helping students with information fragmentation, assimilation and notetaking | | BIBAK | Full-Text | 15-18 | |
| Yolanda Jacobs Reimer; Melissa Bubnash; Matthew Hagedal; Peter Wolf | |||
| The problem of information fragmentation is especially acute for today's
college students, who manage and assimilate information in various forms while
completing many of their academic tasks, and who must do so within the confines
of standard software applications. The goal of this research is to provide
students with a novel information assimilation and notetaking tool that helps
them more efficiently manage their electronic information and overcome some of
the fragmentation challenges they routinely experience. Our Global Information
Gatherer prototype allows students to view, edit and store files of different
types from within a single interface, and provides integrated web browsing
and notetaking functionality. Keywords: PIM, information assimilation, information fragmentation, notetaking,
students in higher education, user interface design | |||
| Topic model methods for automatically identifying out-of-scope resources | | BIBAK | Full-Text | 19-28 | |
| Steven Bethard; Soumya Ghosh; James H. Martin; Tamara Sumner | |||
| Recent years have seen the rise of subject-themed digital libraries, such as
the NSDL pathways and the Digital Library for Earth System Education (DLESE).
These libraries often need to manually verify that contributed resources cover
topics that fit within the theme of the library. We show that such scope
judgments can be automated using a combination of text classification
techniques and topic modeling. Our models address two significant challenges in
making scope judgments: only a small number of out-of-scope resources are
typically available, and the topic distinctions required for digital libraries
are much more subtle than those in classic text classification problems. To meet these
challenges, our models combine support vector machine learners optimized to
different performance metrics and semantic topics induced by unsupervised
statistical topic models. Our best model is able to distinguish resources that
belong in DLESE from resources that do not, with an accuracy of around 70%. We
see these models as the first steps towards increasing the scalability of
digital libraries and dramatically reducing the workload required to maintain
them. Keywords: digital libraries, machine learning, relevance, scope, topics | |||
| Automatically generating high quality metadata by analyzing the document code of common file types | | BIBAK | Full-Text | 29-38 | |
| Lars Fredrik Høimyr Edvardsen; Ingeborg Torvik Sølvberg; Trond Aalberg; Hallvard Trætteberg | |||
| A major challenge for content management in intranets and other large-scale
document storage and retrieval services is the generation of high-quality
metadata. Manual generation of metadata is resource-demanding and is often
viewed by collection managers and document authors as an inefficient use of their
time, and there is a desire for other ways to create the needed metadata.
Automatic Metadata Generation (AMG) comprises methods for generating metadata without
manual interaction, using computer programs to interpret the document and
possibly its context. Current AMG research has been limited to
collections of similarly formatted documents. The research presented in this
paper expands the field of AMG by presenting an approach that does not depend on
a common visualization scheme: AMG based on document code analysis. We
demonstrate AMG possibilities for LaTeX, Word and PowerPoint documents and
show how this approach can significantly increase the quality of the generated
metadata. It does so by avoiding common quality-reducing factors such as
incompleteness, low accuracy, poor logical consistency and coherence, and lack of
timeliness, giving AMG algorithms direct access to the user-specified intellectual content
and the file formatting. This research shows how this AMG approach can be
combined with other AMG approaches, drawing on their strengths in order to
achieve the desired high-quality metadata entities. Keywords: PDF, automatic metadata generation, document code, extraction, harvesting,
latex, metadata quality, openXML, powerpoint, word | |||
| Disambiguating authors in academic publications using random forests | | BIBAK | Full-Text | 39-48 | |
| Pucktada Treeratpituk; C. Lee Giles | |||
| Users of digital libraries usually want to know the exact author or authors
of an article. However, different authors may share the same name, either as full
names or as initials and last names (cases of complete name change are not
considered here). In such a case, the user would like the digital library to
differentiate among these authors. Name disambiguation helps in many cases,
for example when a user searches for all articles written by a particular author.
Disambiguation also enables better bibliometric analysis by allowing a more
accurate counting and grouping of publications and citations. In this paper, we
describe an algorithm for pair-wise disambiguation of author names based on a
machine learning classification algorithm, random forests. We define a set of
similarity profile features to assist in author disambiguation. Our experiments
on the Medline database show that the random forest model outperforms other
previously proposed techniques such as those using support-vector machines
(SVM). In addition, we demonstrate that the variable importance produced by the
random forest model can be used in feature selection with little degradation in
the disambiguation accuracy. In particular, the inverse document frequency of
the author's last name and the similarity of middle names alone achieve an accuracy of
almost 90%. Keywords: author disambiguation, medline, random forests | |||
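The abstract singles out two strong pairwise features: the inverse document frequency (IDF) of the author's last name and middle-name similarity. A minimal sketch of how such features might be computed for one candidate pair follows; the helper names, the per-mention IDF, and the three-level middle-name score are hypothetical choices for illustration, not the paper's definitions.

```python
import math
from collections import Counter


def last_name_idf(all_last_names):
    """Map each lower-cased last name to log(N / df).

    Hypothetical helper: every author mention counts as one 'document', so
    rare surnames (more informative for disambiguation) get a higher IDF."""
    counts = Counter(name.lower() for name in all_last_names)
    total = len(all_last_names)
    return {name: math.log(total / df) for name, df in counts.items()}


def middle_name_similarity(a, b):
    """Hypothetical three-level score: 1.0 for identical middle names,
    0.5 when one is an initial matching the other, 0.0 otherwise
    (including when either middle name is missing)."""
    a = (a or "").lower().rstrip(".")
    b = (b or "").lower().rstrip(".")
    if not a or not b:
        return 0.0
    if a == b:
        return 1.0
    if a[0] == b[0] and (len(a) == 1 or len(b) == 1):
        return 0.5
    return 0.0


def pair_features(author1, author2, idf):
    """Feature vector for a candidate pair of (last_name, middle_name) records.

    Assumes pairs were already blocked on a shared last name, so the IDF of
    the first record's surname stands for both."""
    last1, middle1 = author1
    _, middle2 = author2
    return [
        idf.get(last1.lower(), 0.0),
        middle_name_similarity(middle1, middle2),
    ]
```

Vectors like these could then be passed to any random forest implementation (for example scikit-learn's `RandomForestClassifier`, whose `feature_importances_` attribute offers one notion of variable importance) to classify each pair as same-author or different-author.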
| Using web information for author name disambiguation | | BIBAK | Full-Text | 49-58 | |
| Denilson Alves Pereira; Berthier Ribeiro-Neto; Nivio Ziviani; Alberto H. F. Laender; Marcos André Gonçalves; Anderson A. Ferreira | |||
| In digital libraries, ambiguous author names may occur due to the existence
of multiple authors with the same name (polysemes) or different name variations
for the same author (synonyms). We propose here a new method that uses
information available on the Web to deal with both problems at the same time.
Our idea consists of gathering information from input citations and submitting
queries to a Web search engine, aiming at finding curricula vitae and Web pages
containing publications of the ambiguous authors. From the content of documents
in the answer sets returned by the Web search engine, useful information that
can help in the disambiguation process is extracted. Using this information,
author names are disambiguated by leveraging a hierarchical clustering method
that groups citations in the same document together in a bottom-up fashion.
Experimental results show that our method yields results that outperform
those of two state-of-the-art unsupervised methods and are statistically
comparable with those of a supervised one, while requiring no training. We
observe gains of up to 65.2% in the pairwise F1 metric when compared with our
best unsupervised baseline method. Keywords: author name disambiguation, bibliographic citation, search engine | |||
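Pairwise F1, the metric behind the reported gain, scores a clustering by treating every pair of citations placed in the same author cluster as a prediction and comparing those pairs with the gold-standard pairs. The sketch below assumes clusters are given as one label per citation; it illustrates the standard metric and is not taken from the paper's code.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Set of unordered index pairs (i, j) whose citations share a cluster label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_f1(predicted, gold):
    """Harmonic mean of pairwise precision and recall.

    `predicted` and `gold` hold one cluster label per citation, in the same order."""
    pred_pairs = same_cluster_pairs(predicted)
    gold_pairs = same_cluster_pairs(gold)
    true_positives = len(pred_pairs & gold_pairs)
    if true_positives == 0:
        return 0.0  # also covers the case where either pair set is empty
    precision = true_positives / len(pred_pairs)
    recall = true_positives / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)
```

For example, merging three citations into one cluster when the gold standard splits them into two authors yields precision 1/3 and recall 1/2, so pairwise F1 is 0.4.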
| Whetting the appetite of scientists: producing summaries tailored to the citation context | | BIBAK | Full-Text | 59-68 | |
| Stephen Wan; Cécile Paris; Robert Dale | |||
| The amount of scientific material available electronically is forever
increasing. This makes reading the published literature, whether to stay
up-to-date on a topic or to get up to speed on a new topic, a difficult task.
Yet, this is an activity in which all researchers must be engaged on a regular
basis. Based on a user requirements analysis, we developed a new research tool,
called the Citation-Sensitive In-Browser Summariser (CSIBS), which supports
researchers in this browsing task. CSIBS enables readers to obtain information
about a citation at the point at which they encounter it. This information is
aimed at enabling the reader to determine whether or not to invest the time in
exploring the cited article further, thus alleviating information overload.
CSIBS builds a summary of the cited document, bringing together meta-data about
the document and a citation-sensitive preview that exploits the citation
context to retrieve the sentences from the cited document that are relevant at
this point. This paper briefly presents our user requirements analysis, then
describes the system and, finally, discusses the observations from an initial
pilot study. We found that CSIBS facilitates the relevance judgement task by
increasing the users' self-reported confidence in making such judgements. Keywords: biomedical researchers, information browsing, information needs, scientific
literature, summarization, user modeling and interactive ir | |||
| Finding topic trends in digital libraries | | BIBAK | Full-Text | 69-72 | |
| Levent Bolelli; Seyda Ertekin; Ding Zhou; C. Lee Giles | |||
| We propose a generative model based on latent Dirichlet allocation for
mining distinct topics in document collections by integrating the temporal
ordering of documents into the generative process. The document collection is
divided into time segments, and the topics discovered in each segment are
propagated to influence topic discovery in subsequent time segments. We
conduct experiments on the collection of academic papers from the CiteSeer
repository. We augment the text corpus with user queries and
tags, and integrate the citation graph to boost the weight of topical terms.
The experimental results show that the segmented topic model can effectively detect
distinct topics and their evolution over time. Keywords: latent dirichlet allocation, topic detection, trend analysis | |||
| CEBBIP: a parser of bibliographic information in chinese electronic books | | BIBAK | Full-Text | 73-76 | |
| Liangcai Gao; Zhi Tang; Xiaofan Lin | |||
| Bibliographic information is essential for many digital library
applications, such as citation analysis, academic searching and topic
discovery. Bibliographic data extraction has attracted a great deal of
attention in recent years. In this paper, we address the problem of automatic
extraction of bibliographic data from Chinese electronic books and propose a tool
called CEBBIP for the task, which includes three main systems: data
preprocessing, data parsing and data postprocessing. In the data preprocessing
system, the tool adopts a rule-based method to locate citation data in a book
and to segment citation data into the citation strings of individual referenced
works. A learning-based approach, Conditional Random Fields (CRF), is
employed to parse citation strings in the data parsing system. Finally, the
tool takes advantage of the document's intrinsic local format consistency to enhance
citation data segmentation and parsing through clustering techniques. CEBBIP
has been used in a commercial e-book production system. Experimental results
show that CEBBIP achieves very high precision. More specifically, exploiting the
document's intrinsic local format consistency clearly improves citation
data segmentation and parsing accuracy. Keywords: bibliography, chinese electronic book, digital library, machine learning,
metadata extraction | |||
| Query parameters for harvesting digital video and associated contextual information | | BIBAK | Full-Text | 77-86 | |
| Gary Marchionini; Chirag Shah; Christopher A. Lee; Robert Capra | |||
| Video is increasingly important to digital libraries and archives, both as
primary content and as context for the primary objects in collections. Services
like YouTube not only offer large numbers of videos but also usage data such as
comments and ratings that may help curators today make selections and aid
future generations to interpret those selections. A query-based harvesting
strategy is presented and results from daily harvests for six topics defined by
145 queries over a 20-month period are discussed with respect to query
specification parameters, topic, and contribution patterns. The limitations of
the strategy and these data are considered and suggestions are offered for
curators who wish to use query-based harvesting. Keywords: digital curation, harvesting, video mining | |||
| ViGOR: a grouping oriented interface for search and retrieval in video libraries | | BIBAK | Full-Text | 87-96 | |
| Martin Halvey; David Vallet; David Hannah; Joemon M. Jose | |||
| In this paper, we present ViGOR (Video Grouping, Organisation and Retrieval),
a video retrieval system that allows users to group videos in order to
facilitate video retrieval tasks. In this way users are able to visualise and
conceptualise many aspects of their search tasks and carry out a localised
search in order to solve a more global search problem. The main objective of
this work is to aid users while carrying out explorative video retrieval tasks;
these tasks can often be ambiguous and multi-faceted. Two user evaluations were
carried out to evaluate the usefulness of this grouping paradigm for
assisting users. The first evaluation involved users carrying out broad tasks
on YouTube, and gave insights into the application of our interface to a vast
online video collection. The second evaluation involved users carrying out
focused tasks on the TRECVID 2007 video collection, allowing a comparison over
a local collection, from which we could extract a number of content-based
features. The results of our evaluations show that the use of the ViGOR system
results in an increase in user performance and user satisfaction, showing the
potential of a grouping paradigm for video search for various tasks in a
variety of diverse video collections. Keywords: search, user studies, video, visualisation | |||
| Developing a flexible content model for media repositories: a case study | | BIBAK | Full-Text | 97-100 | |
| Christopher A. Beer; Peter D. Pinch; Karen Cariani | |||
| This article describes the process and challenges of developing a content
model that can support the content and metadata present in a complex media
archive. Media archives face some of the most diverse requirements in their effort
to catalog, preserve, and make accessible a wide range of content with
multifaceted relationships between works. We focus particularly on the design
and implementation of the WGBH Media Library and Archives' Fedora digital
access repository for scholars, educational users and the public. It is our
hope that the process and findings from this work can support the architecture
and development of other media archives. Keywords: content model, digital libraries, fedora, media | |||
| An alignment based system for chord sequence retrieval | | BIBAK | Full-Text | 101-104 | |
| Pierre Hanna; Matthias Robine; Thomas Rocher | |||
| Music retrieval systems for Western tonal music digital libraries have to
consider rhythmic, timbral, melodic and harmonic information. Most existing
retrieval systems only take melodies into account. Melody comparison may induce
errors, since two musical pieces can be very similar even though their melodies
differ significantly. In this paper, we investigate and
evaluate a retrieval system based on the comparison of chord progressions.
The definition of chords may be ambiguous but their properties can be precisely
described and represented. We detail the adaptations of alignment algorithms,
successfully applied for the estimation of symbolic melodic similarity, for
chord progression retrieval. Several experiments, performed on symbolic
databases, show that the system described is robust to variations and
outperforms a recent chord retrieval system. Keywords: music information retrieval | |||
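The abstract describes adapting alignment algorithms from symbolic melodic similarity to chord progressions. A minimal sketch under stated assumptions: Smith-Waterman local alignment over chords encoded as (root pitch class, quality) tuples, with hypothetical substitution scores (+2 for an identical chord, +1 for the same root, -1 otherwise) and a gap penalty of -1. The paper's actual chord representation and scoring scheme are not reproduced here.

```python
def chord_score(a, b):
    """Hypothetical substitution score between two (root, quality) chords."""
    if a == b:
        return 2
    if a[0] == b[0]:  # same root, different quality (e.g. C major vs C minor)
        return 1
    return -1


def local_alignment(seq1, seq2, gap=-1):
    """Smith-Waterman local alignment score between two chord sequences."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    h = [[0] * cols for _ in range(rows)]  # h[i][j]: best local score ending at (i, j)
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(
                0,  # start a fresh local alignment
                h[i - 1][j - 1] + chord_score(seq1[i - 1], seq2[j - 1]),
                h[i - 1][j] + gap,  # skip a chord in seq1
                h[i][j - 1] + gap,  # skip a chord in seq2
            )
            best = max(best, h[i][j])
    return best
```

Local rather than global alignment lets a short query match the best-fitting excerpt of a longer piece: the query C, Am, F, G scores 8 against any piece that contains that exact progression. This toy version is not transposition-invariant; encoding roots relative to each piece's key would be one way to add that.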
| Query-page intention matching using clicked titles and snippets to boost search rankings | | BIBAK | Full-Text | 105-114 | |
| Masaya Murata; Hiroyuki Toda; Yumiko Matsuura; Ryoji Kataoka | |||
| Users of text retrieval systems enter only a few keywords, or sometimes just
one, even when they have complex information needs. Because queries contain so
few keywords, it becomes hard to return relevant search results
that satisfy the demands of each user. Because digital documents, in contrast
to queries, are generally composed of many kinds of keywords, it is also
difficult to estimate the main topic or grasp the inherent intentions of the
documents. In this paper, we present techniques to represent users' search
intentions and the intentions that digital documents can satisfy by making use
of clicked titles and snippets acquired from a click log analysis. We then
present a method to match these intentions to boost search result rankings.
Through experiments that use click logs and indexes of a commercial search
engine, we verified our method's capability of significantly improving search
precision. Keywords: click logs analysis, implicit relevance feedback, representation of
intention, search result rankings | |||
| Supporting analysis of future-related information in news archives and the web | | BIBAK | Full-Text | 115-124 | |
| Adam Jatowt; Kensuke Kanazawa; Satoshi Oyama; Katsumi Tanaka | |||
| A large amount of future-related information is available in news articles and Web
pages. This information can, however, differ to a large extent and may fluctuate
over time. It is therefore difficult for users to manually compare and
aggregate it, and to reconstruct the most probable course of future events. In
this paper, we approach the problem of automatically generating summaries of
future events related to queries using data obtained from news archive
collections or from the Web. We propose two methods, explicit and implicit
future-related information detection. The former is based on analyzing the
context of future temporal expressions in documents, while the latter relies on
detecting periodical patterns in historical document collections. We present a
graph-based visualization of future-related information and demonstrate its
usefulness through several examples. Keywords: event prediction, future-related information retrieval, temporal information
analysis | |||
| Generalized formal models for faceted user interfaces | | BIBAK | Full-Text | 125-134 | |
| Edward C. Clarkson; Shamkant B. Navathe; James D. Foley | |||
| Faceted metadata and navigation have become major topics in library science,
information retrieval and Human-Computer Interaction (HCI). This work surveys a
range of extant approaches in this design space, classifying systems along
several relevant dimensions. We use that survey to analyze the organization of
data and its querying within faceted browsing systems. We contribute formal
entity-relationship (ER) and relational data models that explain that
organization and relational query models that explain systems' browsing
functionality. We use these types of models since they are widely used to
conceptualize data and to model back-end data stores. Their structured nature
also suggests ways in which both the models and faceted systems might be
extended. Keywords: ER model, design survey, entity-relationship model, faceted metadata,
faceted navigation, relational model, tuple relational calculus | |||
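To make the relational view of faceted browsing concrete, the sketch below stores facet assignments in a single hypothetical `facet_value` table and expresses two common browsing operations as SQL: conjunctive filtering by the selected facet values (via `INTERSECT`) and per-value counts for a facet. The schema and function names are assumptions for illustration only, not the ER or relational models contributed by the paper.

```python
import sqlite3


def make_db():
    """In-memory database with a hypothetical item/facet_value schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE facet_value (
            item_id INTEGER REFERENCES item(id),
            facet TEXT,
            value TEXT
        );
    """)
    return conn


def matching_items(conn, selections):
    """IDs of items carrying every selected (facet, value) pair.

    Each selection adds one INTERSECT, so selections combine conjunctively."""
    sql = "SELECT id FROM item"
    params = []
    for facet, value in selections:
        sql += (" INTERSECT SELECT item_id FROM facet_value"
                " WHERE facet = ? AND value = ?")
        params += [facet, value]
    return sorted(row[0] for row in conn.execute(sql, params))


def facet_counts(conn, selections, facet):
    """Number of matching items per value of `facet` (the counts an
    interface typically shows next to each refinement option)."""
    ids = matching_items(conn, selections)
    if not ids:
        return {}
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        "SELECT value, COUNT(DISTINCT item_id) FROM facet_value"
        f" WHERE facet = ? AND item_id IN ({placeholders}) GROUP BY value",
        [facet, *ids],
    )
    return dict(rows)
```

Keeping facet assignments in one narrow table means new facets need no schema change, at the cost of one join or intersection per selected value.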
| Large-scale ETD repositories: a case study of a digital library application | | BIBAK | Full-Text | 135-144 | |
| Adam Mikeal; James Creel; Alexey Maslov; Scott Phillips; John Leggett; Mark McFarland | |||
| We describe the implementation of a statewide system for managing and
preserving electronic theses and dissertations (ETDs) from Texas universities.
We further explain the theoretical, technical and political issues that arose
during the implementation of this system. These issues range from technical
components developed by the Texas Digital Library (TDL), such as a customized workflow management
application and the addition of OAI-ORE capabilities to DSpace, to human-centered issues
such as stakeholder engagement and participation. Our experiences reflect the
challenges, expected and unexpected, that others will face when attempting to
build digital library applications at scale. Keywords: digital library infrastructure, electronic document workflow, electronic
theses and dissertations, scalable systems | |||
| Style-consistency calligraphy synthesis system in digital library | | BIBAK | Full-Text | 145-152 | |
| Kai Yu; Jiangqin Wu; Yueting Zhuang | |||
| The CADAL (China-America Digital Academic Library) digital library holds many
digitized calligraphy works written by famous ancient calligraphists. To make
use of these resources, users may want to generate a tablet or a piece of
calligraphic work in the style of a particular famous ancient calligraphist. However,
some characters needed for the tablet or calligraphic work were never written by
that calligraphist, or were written but are now hard to read after long-term
weathering. In this paper, a novel approach is proposed to synthesize
Chinese calligraphic characters in the same style as a given
calligraphist, and a corresponding system is developed for generating calligraphy
works and designing tablets.
Each calligraphic character is represented by a three-level hierarchical model. A
novel approach for determining the character structure is proposed, which takes
advantage of both the structure of the same character in different styles and
the structure of similar characters in the same style. A style evaluation model
(SEM) is presented to evaluate whether a generated calligraphic character is in
the style of the specified calligraphist and to adjust the generated character
accordingly. Our experiments show that this system is effective. Keywords: structure determination, style evaluation model (SEM), style-consistency
calligraphy synthesis | |||
| Generative model-based metasearch for data fusion in information retrieval | | BIBAK | Full-Text | 153-162 | |
| Miles Efron | |||
| "Data fusion" refers to the problem in information retrieval (IR) where
several lists of documents ranked against a query are to be merged into a
single ranked list for presentation to a user. Data fusion is also known as
"metasearch." In a digital library setting, data fusion may support operations
such as federated search based on multiple repository representations. This
paper presents a novel approach to the fusion problem: generative model-based
Metasearch (GeM). We suggest viewing the appearance of documents in a return
set as the outcome of a probabilistic process; some documents are likely to
occur in the model, while others are unlikely. Using Bayesian parameter
estimation to fit a multinomial distribution based on the return sets to be
merged, GeM achieves a final ranking by listing documents in decreasing
probability of generation under the induced model. We also introduce what we
call "the impatient reader" approach to normalizing document ranks in service
of the fusion operation. We report results from several experiments on TREC
data suggesting that GeM, informed with impatient reader document scores,
operates at state-of-the-art levels of effectiveness. Keywords: data fusion, digital libraries, generative models, information retrieval,
metasearch, probabilistic models | |||
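The core idea, ranking documents by their probability under a multinomial fitted to the return sets, can be sketched as below. This is a simplified stand-in, not GeM itself: it assumes each list contributes a geometrically decaying pseudo-count per rank (a patience model in the spirit of rank-biased precision, substituted here for the paper's impatient-reader scores) and uses the posterior mean under a symmetric Dirichlet prior. The `persistence` and `alpha` defaults are arbitrary.

```python
from collections import defaultdict


def fuse(ranked_lists, persistence=0.8, alpha=0.1):
    """Merge ranked lists into one ranking of (doc_id, probability) pairs.

    Illustrative assumptions, not GeM's exact model:
    * a document at 1-based rank r in any list adds a pseudo-count of
      persistence ** (r - 1), a geometric patience model;
    * a symmetric Dirichlet(alpha) prior over all documents seen in any list
      smooths the multinomial, and its posterior mean is used.
    """
    counts = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            counts[doc] += persistence ** (rank - 1)
    normalizer = sum(counts.values()) + alpha * len(counts)
    probs = {doc: (c + alpha) / normalizer for doc, c in counts.items()}
    # Decreasing probability; ties broken by document id for determinism.
    return sorted(probs.items(), key=lambda item: (-item[1], item[0]))
```

Raising `persistence` flattens the rank weights (a more patient reader), while a larger `alpha` pulls the fused probabilities toward uniform.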
| EnTag: enhancing social tagging for discovery | | BIBAK | Full-Text | 163-172 | |
| Koraljka Golub; Jim Moon; Douglas Tudhope; Catherine Jones; Brian Matthews; Bartłomiej Puzoń; Marianne Lykke Nielsen | |||
| The EnTag (Enhanced Tagging for Discovery) project investigated the effect
on indexing and retrieval when using only social tagging versus when using
social tagging in combination with suggestions from a controlled vocabulary.
Two different contexts were explored: tagging by readers of a digital
collection and tagging by authors in an institutional repository. Two
different controlled vocabularies were also examined: the Dewey Decimal Classification
and the ACM Computing Classification Scheme. For each context a separate
demonstrator was developed and a user study conducted. The results showed the
importance of controlled vocabulary suggestions for both indexing and
retrieval: to help produce ideas of tags to use, to make it easier to find
focus for the tagging, as well as to ensure consistency and increase the number
of access points in retrieval. The value and usefulness of the suggestions
proved to be dependent on the quality of the suggestions, both in terms of
conceptual relevance to the user and in appropriateness of the terminology. The
participants themselves could also see the advantages of controlled vocabulary
terms for retrieval if the terms used were from an authoritative source. Keywords: ACM computing classification scheme, controlled vocabularies, dewey decimal
classification, digital collection, folksonomies, institutional repository,
intute, social tagging, subject indexing | |||
| Review-oriented metadata enrichment: a case study | | BIBAK | Full-Text | 173-182 | |
| Liang Zhang; Jiangqin Wu; Yueting Zhuang; Yin Zhang; Chenxing Yang | |||
| Book reviews contributed by readers on social sites contain valuable
information about books' content, style and merit, and many of their informative
words can be used to enrich the metadata of books in the China-US Million Book Digital
Library. In this paper, we present a system for review-oriented metadata
enrichment and propose a Book-Centric Diverse Random Walk algorithm on a
four-partite graph containing three kinds of relations among authors, books,
reviews and words, in order to produce highly relevant as well as diverse
keywords for a book. Experimental results of a user study show that our
approach significantly outperforms other methods in terms of relevance and
diversity. The metadata generated by our approach also has a large overlap with
popular social tags and brief introductions from DouBan for books in the
coverage experiments. Keywords: book review, digital libraries, diversity, graph-based scoring, keyword
extraction, metadata, metadata enrichment | |||
| Using timed-release cryptography to mitigate the preservation risk of embargo periods | | BIBAK | Full-Text | 183-192 | |
| Rabia Haq; Michael L. Nelson | |||
| Due to temporary access restrictions, embargoed data cannot be refreshed to
unlimited parties during the embargo time interval. A solution to mitigate the
risk of data loss has been developed that uses a data dissemination framework,
the Time-Locked Embargo Framework (TLEF), that allows data refreshing of
encrypted instances of embargoed content in an open, unrestricted scholarly
community. TLEF exploits implementations of existing technologies to
"time-lock" data using timed-release cryptography so that TLEF can be deployed as
digital resources encoded in a complex object format suitable for metadata
harvesting. The framework successfully demonstrates dynamic record
identification, time-lock puzzle encryption, encapsulation and dissemination as
XML documents. We implement TLEF and provide a quantitative analysis showing
that time-locked embargoed data can be harvested successfully with minimal time
overhead and without compromising data security and integrity. Keywords: cryptography, repositories, time lock, timed release | |||
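A common basis for time-lock puzzles is the Rivest-Shamir-Wagner construction: the puzzle creator, who knows the factorization of n = pq, computes 2^(2^t) mod n cheaply, while anyone else must perform t sequential squarings. The sketch below masks a numeric key this way using toy primes. The abstract does not describe TLEF's actual encoding, parameters or key handling, so the function names and values here are illustrative assumptions.

```python
def create_puzzle(p, q, t, key):
    """Puzzle creator: knows the primes p and q, so the work is cheap.

    Returns (n, t, masked_key) with masked_key = key + 2^(2^t) mod n.
    Euler's theorem lets the exponent 2^t be reduced mod phi(n), which is
    valid because 2 is coprime to the odd modulus n = p * q."""
    n = p * q
    if not 0 <= key < n:
        raise ValueError("key must lie in [0, n)")
    phi = (p - 1) * (q - 1)
    exponent = pow(2, t, phi)  # 2^t mod phi(n): one fast modular exponentiation
    b = pow(2, exponent, n)    # b = 2^(2^t) mod n
    return n, t, (key + b) % n


def solve_puzzle(n, t, masked_key):
    """Solver without the factorization: t sequential modular squarings,
    a computation believed to be inherently sequential, which enforces the delay."""
    b = 2
    for _ in range(t):
        b = b * b % n
    return (masked_key - b) % n
```

With toy primes such as 1009 and 1013, the solver's 1,000 squarings finish instantly. A real deployment would use primes hundreds of digits long and calibrate t to the desired release delay and the expected squaring speed.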
| Learning to assess the quality of scientific conferences: a case study in computer science | | BIBAK | Full-Text | 193-202 | |
| Waister Silva Martins; Marcos André Gonçalves; Alberto H. F. Laender; Gisele L. Pappa | |||
| Assessing the quality of scientific conferences is an important and useful
service that can be provided by digital libraries and similar systems. This is
especially true for fields such as Computer Science and Electrical Engineering,
where conference publications are crucial. However, most of the
existing approaches for assessing the quality of publication venues have been
proposed for journals. In this paper, we characterize a large number of
features that can be used as criteria to assess the quality of scientific
conferences and study how these features can be automatically combined
by means of machine learning techniques to effectively perform this task.
Among the features studied are citations, submission and acceptance rates,
tradition of the conference, and reputation of the program committee members.
Our findings include the following: (1) separating high quality
conferences from medium and low quality ones can be performed quite
effectively, but separating the last two types is a much harder task; and (2)
citation features followed by those associated with the tradition of the
conference are the most important ones for the task. Keywords: classification, conference assessment, digital library, machine learning | |||
| CARES: a ranking-oriented CADAL recommender system | | BIBAK | Full-Text | 203-212 | |
| Chenxing Yang; Baogang Wei; Jiangqin Wu; Yin Zhang; Liang Zhang | |||
| A recommender system is useful for a digital library to suggest the books
that a user is likely to prefer. Most recommender systems using
collaborative filtering approaches leverage explicit user ratings to make
personalized recommendations. However, many users are reluctant to provide
explicit ratings, so rating-oriented recommender systems do not work well. In
this paper, we present a recommender system for CADAL digital library, namely
CARES, which makes recommendations using a ranking-oriented collaborative
filtering approach based on users' access logs, avoiding the problem of the
lack of user ratings. Our approach employs mean AP correlation coefficients for
computing similarities among users' implicit preference models and a random
walk based algorithm for generating a book ranking personalized for the
individual. Experimental results on real access logs from the CADAL web site
show the effectiveness of our system and the impact of different values of
parameters on the recommendation performance. Keywords: collaborative filtering, digital library, recommendation system | |||
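The similarity measure named here is presumably the AP correlation coefficient (tau_AP) of Yilmaz, Aslam and Robertson, a rank correlation that penalizes disagreements near the top of a ranking more heavily than Kendall's tau. The sketch below computes tau_AP for two rankings of the same items. The symmetric `mean_ap_correlation` (averaging both directions) is an assumption about what "mean AP correlation" denotes, and the random-walk ranking step is not shown.

```python
from typing import Hashable, Sequence


def ap_correlation(ranking: Sequence[Hashable], reference: Sequence[Hashable]) -> float:
    """AP correlation (tau_AP) of `ranking` measured against `reference`.

    Both sequences must order the same distinct items, best first.
    Returns 1.0 for identical orderings and -1.0 for reversed ones.
    """
    if (len(ranking) != len(reference) or len(set(ranking)) != len(ranking)
            or set(ranking) != set(reference)):
        raise ValueError("rankings must contain the same distinct items")
    n = len(ranking)
    if n < 2:
        raise ValueError("need at least two items")
    ref_pos = {item: pos for pos, item in enumerate(reference)}
    total = 0.0
    for i in range(1, n):
        item = ranking[i]
        # C(i): items placed above position i that the reference also places above `item`
        correct = sum(1 for above in ranking[:i] if ref_pos[above] < ref_pos[item])
        total += correct / i
    return 2.0 * total / (n - 1) - 1.0


def mean_ap_correlation(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Assumed symmetric variant: average of tau_AP taken in both directions."""
    return (ap_correlation(a, b) + ap_correlation(b, a)) / 2.0
```

For example, swapping the top two of three items (`ap_correlation(["b", "a", "c"], ["a", "b", "c"])`) yields 0.0, whereas Kendall's tau for the same single swap is 1/3; this top-weighting is the usual reason for preferring tau_AP when comparing preference rankings.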
| Recommendation as link prediction: a graph kernel-based machine learning approach | | BIBAK | Full-Text | 213-216 | |
| Xin Li; Hsinchun Chen | |||
| Recommender systems have demonstrated commercial success in multiple
industries. In digital libraries they have the potential to be used as a
support tool for traditional information retrieval functions. Among the major
recommendation algorithms, successful collaborative filtering (CF) methods
explore the use of user-item interactions to infer user interests. Based on the
finding that transitive user-item associations can alleviate the data sparsity
problem in CF, multiple heuristic algorithms were designed to take advantage of
the user-item interaction networks with both direct and indirect interactions.
However, such graph representations have seen only limited use in
learning-based algorithms. In this paper, we propose a graph kernel-based
recommendation framework. For each user-item pair, we inspect its associative
interaction graph (AIG) that contains the users, items, and interactions n
steps away from the pair. We design a novel graph kernel to capture the AIG
structures and use them to predict possible user-item interactions. The
framework demonstrates improved performance on an online bookstore dataset,
especially when a large number of suggestions are needed. Keywords: collaborative filtering, kernel methods, recommender system | |||
| A polyrepresentational approach to interactive query expansion | | BIBAK | Full-Text | 217-220 | |
| Abdigani Diriye; Ann Blandford; Anastasios Tombros | |||
| Interactive Query Expansion (IQE) presents suggested terms to the user
during their search to enable better Information Retrieval (IR). However, IQE
terms are poorly used, and tend to lack information meaningful to the user. The
lack of cognitive and functional support during query refinement is a
well-documented problem, and despite the work carried out, it is still an
under-researched area. This stagnation in progress has been partly due to the
long-held belief that users are able to make good IQE term selections, and that the
de facto way IQE terms are presented is effective. In this paper, we introduce
a novel method to improve the presentation of IQE terms by providing
supplementary information alongside them. We describe a user study that
compared our novel polyrepresentational approach to IQE against a conventional
IQE system and a baseline system. Our findings show that a
polyrepresentational approach to IQE can address the ambiguity and uncertainty
surrounding IQE, and improve the perceived usefulness of the terms. Keywords: query formulation, interactive query expansion | |||
| Automatically characterizing resource quality for educational digital libraries | | BIBAK | Full-Text | 221-230 | |
| Steven Bethard; Philipp Wetzer; Kirsten Butcher; James H. Martin; Tamara Sumner | |||
| With the rise of community-generated web content, the need for automatic
characterization of resource quality has grown, particularly in the realm of
educational digital libraries. We demonstrate how identifying concrete factors
of quality for web-based educational resources can make machine learning
approaches to automating quality characterization tractable. Using data from
several previous studies of quality, we gathered a set of key dimensions and
indicators of quality that were commonly identified by educators. We then
performed a mixed-method study of digital library curation experts, showing
that our characterization of quality captured the subjective processes used by
the experts when assessing resource quality for classroom use. Using key
indicators of quality selected from a statistical analysis of our expert study
data, we developed a set of annotation guidelines and annotated a corpus of
1000 digital resources for the presence or absence of these key quality
indicators. Agreement among annotators was high, and initial machine learning
models trained from this corpus were able to identify some indicators of
quality with as much as an 18% improvement over the baseline. Keywords: educational digital library, learning resource, machine learning, quality | |||
| Improving optical character recognition through efficient multiple system alignment | | BIBAK | Full-Text | 231-240 | |
| William B. Lund; Eric K. Ringger | |||
| Individual optical character recognition (OCR) engines vary in the types of
errors they commit in recognizing text, particularly poor quality text. By
aligning the output of multiple OCR engines and taking advantage of the
differences between them, the error rate based on the aligned lattice of
recognized words is significantly lower than the individual OCR word error
rates. This lattice error rate constitutes a lower bound among aligned
alternatives from the OCR output. Results from a collection of poor quality
mid-twentieth century typewritten documents demonstrate an average reduction of
55.0% in the error rate of the lattice of alternatives and a realized word
error rate (WER) reduction of 35.8% in a dictionary-based selection process. As
an important precursor, an innovative admissible heuristic for the A* algorithm
is developed, which results in a significant reduction in state space
exploration to identify all optimal alignments of the OCR text output, a
necessary step toward the construction of the word hypothesis lattice. On
average 0.0079% of the state space is explored to identify all optimal
alignments of the documents. Keywords: A* algorithm, OCR error rate reduction, text alignment | |||
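Word error rate (WER) is the word-level edit distance between the OCR output and a reference transcription, divided by the number of reference words. The sketch below computes it with the standard dynamic-programming recurrence. The `lattice_error_rate` helper is a deliberately simplified, hypothetical illustration of why a lattice gives a lower bound: it assumes the alignment has already produced exactly one column of alternatives per reference word, and counts a word as an error only when no engine proposed it. It does not implement the paper's A* alignment.

```python
from typing import Sequence, Set


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion of r
                          curr[j - 1] + 1,         # insertion of h
                          prev[j - 1] + (r != h))  # substitution, or match at no cost
        prev = curr
    return prev[-1] / len(ref)


def lattice_error_rate(reference_words: Sequence[str],
                       aligned_columns: Sequence[Set[str]]) -> float:
    """Simplified oracle error over a pre-aligned lattice (hypothetical helper).

    Assumes one column of OCR alternatives per reference word; a word counts
    as an error only if no engine proposed it, so under this model no choice
    of one word per column can do better.
    """
    if not reference_words or len(reference_words) != len(aligned_columns):
        raise ValueError("need exactly one column per reference word")
    misses = sum(1 for word, column in zip(reference_words, aligned_columns)
                 if word not in column)
    return misses / len(reference_words)
```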
| No bull, no spin: a comparison of tags with other forms of user metadata | | BIBAK | Full-Text | 241-250 | |
| Catherine C. Marshall | |||
| User-contributed tags have shown promise as a means of indexing multimedia
collections by harnessing the combined efforts and enthusiasm of online
communities. But tags are only one way of describing multimedia items. In this
study, I compare the characteristics of public tags with other forms of
descriptive metadata (titles and narrative captions) that users have assigned to
a collection of very similar images gathered from the photo-sharing service
Flickr. The study shows that tags converge on different descriptions than the
other forms of metadata do, and that narrative metadata may be more effective
than tags for capturing certain aspects of images that may influence their
subsequent retrieval and use. The study also examines how photographers use
people's names to personalize the different types of metadata and how they tell
stories across short sequences of images. The study results are then brought to
bear on design recommendations for user tagging tools and automated tagging
algorithms and on using photo sharing sites as de facto art and architecture
resources. Keywords: collaborative information management, image collection, metadata, study,
tags | |||
| What happens when facebook is gone? | | BIBAK | Full-Text | 251-254 | |
| Frank McCown; Michael L. Nelson | |||
| Web users are spending more of their time and creative energies within
online social networking systems. While many of these networks allow users to
export their personal data or expose themselves to third-party web archiving,
some do not. Facebook, one of the most popular social networking websites, is
one example of a "walled garden" where users' activities are trapped. We
examine a variety of techniques for extracting users' activities from Facebook
(and by extension, other social networking systems) for the personal archive
and for the third-party archiver. Our framework could be applied to any walled
garden where personal user data is locked in. Keywords: digital preservation, personal archiving, social networks | |||
| Improving historical research by linking digital library information to a global genealogical database | | BIBAK | Full-Text | 255-258 | |
| Douglas J. Kennard; William B. Lund; Bryan S. Morse | |||
| Journals, letters, and other writings are of great value to historians and
those who research their own family history; however, it can be difficult to
find writings by specific people, and even harder to find what others wrote
about them. We present a prototype web-based system that enables users to
discover information about historical people (including their own ancestors) by
linking digital library content to unique PersonIDs from a genealogical
database. Users can contribute content such as scanned journals or information
about where items can be found. They can also transcribe content and tag it
with PersonIDs to identify who it is about. Additional features provide tools
for users to explore historical contexts and relationships. These include the
ability to tag places and to create a historical social network by specifying
non-family relationships or by using a mechanism we call rosters to imply
participation in some group or event. Keywords: authority control, diaries, family history, genealogy, historical social
networks, journals, tagging | |||
| Collecting fragmentary authors in a digital library | | BIBAK | Full-Text | 259-262 | |
| Monica Berti; Matteo Romanello; Alison Babeu; Gregory Crane | |||
| This paper discusses new work to represent, in a digital library of
classical sources, authors whose works themselves are lost and who survive only
where surviving authors quote, paraphrase or allude to them. It describes
initial work on a digital collection of such fragmentary authors designed
not only to capture but to extend the ontologies that traditional scholarship
has developed over generations: the aim is to represent every nuance of print
conventions while using the capabilities of digital libraries to extend our
ability to identify fragments, to represent what we have identified, and to
render the results of that work intellectually and physically more accessible
than was possible in print culture. Keywords: digital libraries, fragmentary authors, greek fragmentary historians, tei p5
guidelines, xml | |||
| Robust registration of manuscript images | | BIBAK | Full-Text | 263-266 | |
| Ryan Baumann; W. Brent Seales | |||
| In this paper we present an application of image registration techniques to
the specific domain of manuscript images. We show the application of this
technique to images of the Venetus A, a 10th-century manuscript of Homer's
Iliad. The same algorithm is used to register images of the manuscript across time
(including photographs separated by over a century), as well as across imaging
modalities. Keywords: image registration, image warping, manuscript restoration, multispectral
imaging | |||
| Cost and benefit analysis of mediated enterprise search | | BIBAK | Full-Text | 267-276 | |
| Mingfang Wu; James A. Thom; Andrew Turpin; Ross Wilkinson | |||
| The utility of an enterprise search system is determined by three key
players: the information retrieval (IR) system (the search engine), the
enterprise users, and the service provider who delivers the tailored IR service
to its designated enterprise users. To date, evaluations of enterprise search
have focused largely on the effectiveness and efficiency of the IR system, with
relatively little attention to users' involvement and hardly any to the
service provider's role. This paper investigates the role of
the service provider. We propose a method that evaluates the cost and benefit
for a service provider of using a mediated search engine -- in particular,
where domain experts intervene in the ranking of the search results from a
search engine. We test our cost and benefit evaluation method in a case study
and conduct user experiments to demonstrate it. Our study shows that: 1) by
making use of domain experts' relevance assessments in search result ranking,
the precision and the discounted cumulated gain of ranked lists
improved significantly (by 144% and 40% respectively); 2) the service provider
gains substantial return on investment and higher search success rate by
investing in domain experts' relevance assessments; and 3) the cost and benefit
evaluation also indicates the type of queries to be selected from a query log
for evaluating an enterprise search engine. Keywords: cost and benefit analysis, enterprise search, evaluation, information
retrieval, mediated search, relevance feedback | |||
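Precision and discounted cumulated gain (DCG) are the two ranking metrics behind the reported gains. A minimal sketch follows. It uses the widely used log2(rank + 1) discount, which is one of several DCG variants and not necessarily the one used in the study, and the `relative_improvement` helper only shows how a percentage such as "144%" is derived from before and after scores.

```python
import math
from typing import Sequence


def precision_at_k(relevances: Sequence[float], k: int) -> float:
    """Fraction of the top-k results with a positive relevance grade.

    Missing results (fewer than k returned) count as non-relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for rel in relevances[:k] if rel > 0) / k


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG with a log2(rank + 1) discount; the result at rank 1 is undiscounted."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def relative_improvement(before: float, after: float) -> float:
    """Relative gain of `after` over `before`; 1.44 corresponds to a 144% improvement."""
    if before == 0:
        raise ValueError("baseline score must be non-zero")
    return (after - before) / before
```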
| Document relevance assessment via term distribution analysis using fourier series expansion | | BIBAK | Full-Text | 277-284 | |
| Patricio Galeas; Ralph Kretschmer; Bernd Freisleben | |||
| In addition to the frequency of terms in a document collection, the
distribution of terms plays an important role in determining the relevance of
documents for a given search query. In this paper, we introduce term distribution
analysis using Fourier series expansion as a novel approach for computing an
abstract representation of term positions in a document corpus. Based on
this approach, two methods for improving the evaluation of document relevance
are proposed: (a) a function-based ranking optimization representing a
user-defined document region, and (b) a query expansion technique based on
overlapping the term distributions in the top-ranked documents. Experimental
results demonstrate the effectiveness of the proposed approach in providing new
possibilities for optimizing the retrieval process. Keywords: fourier series, query expansion, ranked retrieval, term distribution | |||
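The abstract does not spell out the construction, but one plausible way to obtain a Fourier representation of term positions is to split a document into equal-width segments, count a term's occurrences per segment, and keep the low-order coefficients of the real Fourier series of that count signal. The sketch below does exactly that; the segment scheme and function names are assumptions, and the ranking and query-expansion methods built on top are not shown.

```python
import math
from typing import List, Sequence, Tuple


def segment_counts(positions: Sequence[int], doc_length: int, segments: int) -> List[int]:
    """Count a term's occurrences in each of `segments` equal-width document regions."""
    counts = [0] * segments
    for p in positions:
        if not 0 <= p < doc_length:
            raise ValueError("position outside the document")
        counts[p * segments // doc_length] += 1
    return counts


def fourier_coefficients(signal: Sequence[float], num_harmonics: int
                         ) -> Tuple[float, List[Tuple[float, float]]]:
    """Real Fourier series coefficients a0 and (a_k, b_k) for k = 1..num_harmonics."""
    n = len(signal)
    if not 0 <= num_harmonics < n / 2:
        raise ValueError("num_harmonics must be below n / 2 for these formulas")
    a0 = sum(signal) / n
    coeffs = []
    for k in range(1, num_harmonics + 1):
        a_k = 2 / n * sum(v * math.cos(2 * math.pi * k * s / n) for s, v in enumerate(signal))
        b_k = 2 / n * sum(v * math.sin(2 * math.pi * k * s / n) for s, v in enumerate(signal))
        coeffs.append((a_k, b_k))
    return a0, coeffs


def reconstruct(a0: float, coeffs: Sequence[Tuple[float, float]], n: int) -> List[float]:
    """Evaluate the truncated series at the n segment positions (a smoothed profile)."""
    return [a0 + sum(a * math.cos(2 * math.pi * k * s / n) + b * math.sin(2 * math.pi * k * s / n)
                     for k, (a, b) in enumerate(coeffs, start=1))
            for s in range(n)]
```

Keeping only a few harmonics smooths the profile, which is what makes a handful of coefficients a compact abstract representation; with an odd number of segments n and num_harmonics = (n - 1) // 2, the series reproduces the counts exactly.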
| How do you feel about "dancing queen"?: deriving mood & theme annotations from user tags | | BIBAK | Full-Text | 285-294 | |
| Kerstin Bischoff; Claudiu S. Firan; Wolfgang Nejdl; Raluca Paiu | |||
| Web 2.0 enables information sharing and collaboration among users and, most
notably, supports active participation and creativity of the users. As a result,
a huge amount of manually created metadata describing all kinds of resources is
now available. Such semantically rich user-generated annotations are especially
valuable for digital libraries covering multimedia resources such as music,
where these metadata enable retrieval relying not only on content-based
(low-level) features, but also on the textual descriptions represented by tags.
However, if we analyze the annotations users generate for music tracks, we find
them heavily biased towards genre. Previous work investigating the types of
user-provided annotations for music tracks showed that the types of tags which
would be really beneficial for supporting retrieval -- usage (theme) and
opinion (mood) tags -- are often neglected by users in the annotation process.
In this paper we address exactly this problem: in order to support users in
tagging and to fill these gaps in the tag space, we develop algorithms for
recommending mood and theme annotations. Our methods exploit the available user
annotations, the lyrics of music tracks, as well as combinations of both. We
also compare the results for our recommended mood / theme annotations against
genre and style recommendations -- a much easier and already studied task.
Besides evaluating against an expert (AllMusic.com) ground truth, we evaluate
the quality of our recommended tags through a Facebook-based user study. Our
results are very promising both in comparison to experts as well as users and
provide interesting insights into possible extensions for music tagging systems
to support music search. Keywords: collaborative tagging, high-level music descriptors, metadata enrichment,
mood and theme tag recommendation | |||
| Automatic quality assessment of content created collaboratively by web communities: a case study of wikipedia | | BIBAK | Full-Text | 295-304 | |
| Daniel Hasan Dalip; Marcos André Gonçalves; Marco Cristo; Pável Calado | |||
| The old dream of a universal repository containing all human knowledge
and culture is becoming possible through the Internet and the Web. Moreover,
this is happening with the direct, collaborative participation of people.
Wikipedia is a great example. It is an enormous repository of information with
free access and editing, created by the community in a collaborative manner.
However, this large amount of information, made available democratically and
virtually without any control, raises questions about its relative quality. In
this work we explore a significant number of quality indicators, some of them
proposed by us and used here for the first time, and study their capability to
assess the quality of Wikipedia articles. Furthermore, we explore machine
learning techniques to combine these quality indicators into one single
assessment judgment. Through experiments, we show that the most important
quality indicators are the easiest ones to extract, namely, textual features
related to length, structure and style. We were also able to determine which
indicators did not contribute significantly to the quality assessment. These
were, coincidentally, the most complex features, such as those based on link
analysis. Finally, we compare our combination method with a state-of-the-art
solution and show significant improvements in terms of effective quality
prediction. Keywords: SVM, machine learning, quality assessment, wikipedia | |||
| Designing the reading experience for scanned multi-lingual picture books on mobile phones | | BIBAK | Full-Text | 305-308 | |
| Benjamin B. Bederson; Alex Quinn; Allison Druin | |||
| This paper reports on an adaptation of the existing PopoutText and ClearText
display techniques to mobile phones. It explains the design rationale for a
freely available iPhone application to read books from the International
Children's Digital Library. Through a combination of applied image processing,
a zoomable user interface, and a process of working with children to develop
the detailed design, we present an interface that supports clear reading of
scanned picture books in multiple languages on a mobile phone. Keywords: books, children, digital libraries, iPhone, interaction design, interface
design, mobile phones, readability | |||
| Mobility, digital libraries and a rural indian village | | BIBAK | Full-Text | 309-312 | |
| Matt Jones; Emma Thom; David Bainbridge; David Frohlich | |||
| Millions of people in developed countries routinely create and share digital
content; but what about the billions of others on the wrong side of what has
been called the 'global digital divide'? This paper considers three mobile
platforms to illustrate their potential in enabling rural Indian villagers to
make and share digital stories. We describe our experiences in creating
prototypes using mobile phones, high-end media players, and paper. Interaction
designs are discussed along with findings from various trials within the
village and elsewhere. Our approach has been to develop prototypes that can
work together in an integrated fashion so that content can flow freely and in
interesting ways through the village. While our work has particular relevance
to those users in emerging world contexts, we see it also informing needs and
practices in the developed world for user-generated content. Keywords: digital libraries, digital-divide, information ecologies, mobility,
user-generated content | |||
| What do exploratory searchers look at in a faceted search interface? | | BIBAK | Full-Text | 313-322 | |
| Bill Kules; Robert Capra; Matthew Banta; Tito Sierra | |||
| This study examined how searchers interacted with a web-based, faceted
library catalog when conducting exploratory searches. It applied eye tracking,
stimulated recall interviews, and direct observation to investigate important
aspects of gaze behavior in a faceted search interface: what components of the
interface searchers looked at, for how long, and in what order. It yielded
empirical data that will be useful for both practitioners (e.g., for improving
search interface designs), and researchers (e.g., to inform models of search
behavior). Results of the study show that participants spent about 50 seconds
per task looking at (fixating on) the results, about 25 seconds looking at the
facets, and only about 6 seconds looking at the query itself. These findings
suggest that facets played an important role in the exploratory search process. Keywords: OPAC, exploratory search, eye tracking, faceted search, online public access
catalogs, task design, user studies | |||
| Aligning METS with the OAI-ORE data model | | BIBAK | Full-Text | 323-330 | |
| Jerome P. McDonough | |||
| The Open Archives Initiative -- Object Reuse and Exchange (OAI-ORE)
specifications provide a flexible set of mechanisms for transferring complex
data objects between different systems. In order to serve as an exchange
syntax, OAI-ORE must be able to support the import of information from
localized data structures serving various communities of practice. In this
paper, we examine the Metadata Encoding & Transmission Standard (METS) and the
issues that arise when trying to map from a localized structural metadata
schema into the OAI-ORE data model and serialization syntaxes. Keywords: METS, OAI-ORE, aggregation, modeling, structural metadata | |||
| EverLast: a distributed architecture for preserving the web | | BIBAK | Full-Text | 331-340 | |
| Avishek Anand; Srikanta Bedathur; Klaus Berberich; Ralf Schenkel; Christos Tryfonopoulos | |||
| The World Wide Web has become a key source of knowledge pertaining to almost
every walk of life. Unfortunately, much of the data on the Web is highly ephemeral
in nature, with an estimated 50-80% of content changing within a
short time. Continuing the pioneering efforts of many national (digital)
libraries, organizations such as the International Internet Preservation
Consortium (IIPC), the Internet Archive (IA) and the European Archive (EA) have
been tirelessly working towards preserving the ever-changing Web.
However, while these web archiving efforts have paid significant attention to
long-term preservation of Web data, they have paid little attention to
developing a global-scale infrastructure for collecting, archiving, and
performing historical analyses on the collected data. Based on insights from
our recent work on building text analytics for Web Archives, we propose
EverLast, a scalable distributed framework for next-generation Web archival and
temporal text analytics over the archive. Our system is built on a loosely
coupled distributed architecture that can be deployed over large-scale
peer-to-peer networks. In this way, we allow the integration of many archival
efforts undertaken mainly at a national level by national digital libraries.
Key features of EverLast include support for time-based text search & analysis
and the use of human-assisted archive gathering. In this paper, we outline the
overall architecture of EverLast, and present some promising preliminary
results. Keywords: crawling, indexing, time-travel search, web archives | |||
| A framework for describing web repositories | | BIBAK | Full-Text | 341-344 | |
| Frank McCown; Michael L. Nelson | |||
| In prior work we have demonstrated that search engine caches and archiving
projects like the Internet Archive's Wayback Machine can be used to "lazily
preserve" websites and reconstruct them when they are lost. We use the term "web
repositories" for collections of automatically refreshed and migrated content,
and collectively we refer to these repositories as the "web infrastructure". In
this paper we present a framework for describing web repositories and the
status of web resources in them. This includes an abstract API for web
repository interaction, the concepts of deep vs. flat and light/dark/grey
repositories, and terminology for describing the recoverability of a web
resource. Our API may serve as a foundation for future web repository
interfaces. Keywords: preservation, web repositories, web resources | |||
| Preserving digital data in heterogeneous environments | | BIBAK | Full-Text | 345-348 | |
| Gonçalo Antunes; José Barateiro; Manuel Cabral; José Borbinha; Rodrigo Rodrigues | |||
| Digital preservation aims to keep digital objects accessible over a
long period of time, regardless of the challenges of organizational or
technological changes or failures. In particular, data produced in e-Science
domains could be reliably stored in today's data grids, taking advantage of the
natural properties of this kind of infrastructure to support redundancy.
However, to achieve reliability we must take into account failure
interdependency. Taking into account the fact that correlated failures can
affect multiple components and potentially cause complete loss of data, we
propose a solution to evaluate redundancy strategies in the context of
heterogeneous environments such as data grids. This solution is based on a
simulation engine that can be used not only to support the process of designing
the preservation environment and related policies, but also later on to observe
and control the deployed system. Keywords: data grids, dependability, digital libraries, digital preservation | |||
| Unsupervised creation of small world networks for the preservation of digital objects | | BIBAK | Full-Text | 349-352 | |
| Charles L. Cartledge; Michael L. Nelson | |||
| The prevailing model for digital preservation is that archives should be
similar to a "fortress": a large, protective infrastructure built to defend a
relatively small collection of data from attack by external forces. Such
projects are a luxury, suitable only for limited collections of known
importance and requiring significant institutional commitment for
sustainability. In previous research, we have shown the web infrastructure
(i.e., search engine caches, web archives) refreshes and migrates web content
in bulk as side-effects of their user-services, and these results can be mined
as a useful, but passive preservation service. Our current research involves a
number of questions resulting from removing the implicit assumption that
web-based data objects must passively await curatorial services: What if data
objects were not tethered to repositories? What are the implications if the
content were actively seeking out and injecting itself into the web
infrastructure (i.e., search engine caches, web archives)? All of this leads to
our primary research question: Can we create objects that preserve themselves
more effectively than repositories or web infrastructure can? Keywords: digital preservation, small world | |||
| Towards a virtual organization for data cyberinfrastructure | | BIBAK | Full-Text | 353-356 | |
| Christine L. Borgman; Geoffrey C. Bowker; Thomas A. Finholt; Jillian C. Wallis | |||
| We report on the exploratory stages of a multi-university,
multi-research-site, multi-year effort to investigate and compare data
practices in multiple cyberinfrastructure projects and their emerging virtual
organizations. Our long-term goal is to understand the data practices and data
management requirements of virtual organizations and their implications for the
design and development of data digital libraries. We have constructed our own
virtual organization as a participant-observer approach to the research.
Results to date suggest that collaborative technologies are emergent and that
defining and scoping the data products of collaborations continues to be
problematic. Keywords: collaborative work, cyberinfrastructure, scientific data, sensor networks | |||
| Expanding the search for digital preservation solutions: adopting PREMIS in cultural heritage institutions | | BIBAK | Full-Text | 357-358 | |
| Daniel Gelaw Alemneh | |||
| This paper presents some preliminary results on factors that affect the
adoption of PREMIS (Preservation Metadata Implementation Strategies) in
cultural heritage institutions. The study employed a web-based survey to
collect data from 123 participants in 20 countries as well as a
semi-structured, follow-up telephone interview with a smaller sample of the
survey respondents. Rogers' diffusion of innovations theory was used as a
theoretical framework. The main constructs considered for the study were
relative advantage, compatibility, complexity, trialability, observability, and
institution readiness. The study yielded both qualitative and quantitative
data, and preliminary analysis showed that all six factors influence the
adoption of PREMIS in varying degrees. Keywords: diffusion of innovation, digital preservation, metadata, premis | |||
| Collaborative digital library: enhancing digital collections to improve learning in educational programs | | BIBAK | Full-Text | 359-360 | |
| Ali Sajedi Badashian; Asghar Dehghani Firouzabadi; Iman Khalkhali; Hamidreza Afzali; Morteza Ashurzad Delcheh; Mohammad Shoja Shafiei; Mahdi Alipour | |||
| In this article, a universal collaborative and competitive approach is
introduced for deploying digital collections in an ideal Digital Library
(DL) for future educational systems. The collaborative and open-source aspects
of the system guarantee its growth, and the competitive aspects guarantee its
accuracy. Keywords: collection development, curriculum development, digital libraries,
educational resources, exploring, information visualization, integration,
knowledge sharing | |||
| Digitizing the flea market: eBay as a data source for historic collections | | BIBAK | Full-Text | 361-362 | |
| Snowden Becker | |||
| The online auction site eBay has overtaken face-to-face transactions as the
primary means of doing business for collectors and sellers of unique and
ephemeral materials. Historical societies, museums, and archives also
increasingly collect ephemera as records of social and cultural history. This
presentation argues that the digitized flea market, as epitomized by eBay,
replaces in-person sales while also providing a stream of rich information
about a previously invisible, unquantifiable marketplace. Furthermore,
identifying factors that influence collectibles buyers' behavior in online
auction sales can also shed light on factors affecting user behaviors in
digital libraries. Data from a survey of over 1,000 recent home movie auction
listings on eBay suggest how eBay may be used as a data source by collectors,
as well as the users and designers of digital libraries. Keywords: collectibles, eBay, ephemera, home movies, online auctions | |||
| Semantic alerting for digital libraries | | BIBAK | Full-Text | 363-364 | |
| George Buchanan; Annika Hinze | |||
| We previously investigated the support of alerting services across networks
of heterogeneous digital libraries. We now report the first generation of
semantically enhanced digital library alerting systems. Where previous alerting
services have provided users with notifications of new library content using
traditional metadata, we demonstrate the advantages and challenges of using
semantic technologies. This uncovers key issues that are not yet fully
understood in general event-based systems (including alerting systems). Keywords: FRBR, aggregate documents, alerting, digital libraries, semantics | |||
| Addressing researchers' needs through the data curation profile | | BIBAK | Full-Text | 365-366 | |
| Jake Carlson; Deborah Leiter | |||
| This poster describes a study currently in progress that seeks to identify
and address the needs of researchers from multiple disciplines in managing,
curating and preserving their data. One output of this study, which is still in
its early stages, will be the "data curation profile," a methodological tool
designed to enable the comparison of needs across disciplines and help
librarians build digital libraries that accurately reflect and address the
needs of data producers. Keywords: data curation, data sharing, repositories | |||
| Implementation and evaluation of palm leaf manuscript metadata schema (PLMM) | | BIBAK | Full-Text | 367-368 | |
| Nisachol Chamnongsri; Lampang Manmart; Vilas Wuwongse | |||
| The evaluation of the Palm Leaf Manuscripts Metadata Schema (PLMM) aims to
examine whether the PLMM satisfactorily meets user requirements in searching
for PLMs and managing PLM collections. It comprises (1) an examination of the
PLMM's capability to describe the particular characteristics of Northeastern
Thai palm leaf manuscripts, and of its usefulness in palm leaf manuscript
preservation and rights control management; and (2) an investigation of
users' satisfaction when using the PLMM to search for PLMs and manage the
PLM collection. The evaluation process began with the development of a
prototype PLM management system implementing the PLMM. Then, more than 200
metadata records describing all types of sample PLMs (with variations in sizes,
scripts, languages, titles, and number of content subjects contained in a
fascicle) were provided in Extensible Markup Language (XML) format, while
system interfaces and queries were developed with Hypertext Preprocessor (PHP).
This was followed by trials with end users and staff in their workplace to
evaluate the usefulness of the PLMM for user tasks, namely the FRBR tasks
(find, identify, select, and obtain), and for collection development tasks.
Participants perceived the efficiency of the PLMM as 'somewhat high' for both
kinds of task. The findings also suggest that perceived efficiency of the PLMM
was significantly higher among users with more years of experience with PLMs.
User status was another factor that positively affected the perceived
efficiency of the PLMM. Keywords: cultural heritage, metadata schema, palm leaf manuscript | |||
| A personalized learning environment | | BIBAK | Full-Text | 369-370 | |
| Sebastian de la Chica; Faisal Ahmad; Qianyi Gu; Ifi Okoye; Keith Maull; Tamara Sumner; Kirsten R. Butcher | |||
| We report on the current research activities and results obtained through
the Concept Learning service for Concept Knowledge (CLICK) and present a
demonstration of the system. This poster session will focus on a demonstration
of the CLICK system and the results of the learning study. Keywords: competency models, digital library resources, knowledge models,
personalization, student misconceptions | |||
| Analysis of transaction logs for insights into use of life oral histories | | BIBAK | Full-Text | 371-372 | |
| Michael G. Christel; Bryan S. Maher; Huan Li | |||
| A digital video library of over 900 hours of video and 18000 stories from
The HistoryMakers was used by 214 students, faculty, librarians, and life-long
learners interacting with a system providing multiple search and viewing
capabilities over a trial period of several months. User demographics and
actions were logged, providing metrics on how the system was used. This poster
overviews a few highlights from these transaction logs of the Informedia
digital video library system for life oral histories. Keywords: digital video library, oral histories, video retrieval | |||
| Summarizing user-generated reviews in digital libraries: a visual clustering approach | | BIBAK | Full-Text | 373-374 | |
| Wingyan Chung | |||
| In this paper, we describe a visual clustering approach to summarizing
user-generated reviews of digital library items and services. The approach
consists of the steps of sentence extraction, aspect identification, opinion
classification, and review summarization. Our work augments existing work by
considering non-standard input and by incorporating clustering and
visualization in summarization. Keywords: aspect analysis, clustering, sentiment analysis, text classification, text
summarization, user-generated review, visualization | |||
| An interoperability service framework for high-resolution image applications | | BIBAK | Full-Text | 375-376 | |
| Ryan Chute; Stephan Dresher; Luda Balakireva; Herbert Van de Sompel | |||
| This poster presents a prototype architecture and potential use-cases for a
standards-based service framework to simplify development of high-resolution
image viewing clients. Keywords: JPEG 2000, JSON, OAI-ORE, architecture, digital imaging, digital libraries,
interoperability, openurl, standards | |||
| Tailoring greenstone for seniors | | BIBAK | Full-Text | 377-378 | |
| Sally Jo Cunningham; Erin K. Bennett | |||
| We present a re-design of Greenstone to support seniors (aged over 65) in
managing documents reflecting their life history. Keywords: home archiving, personal history, senior users | |||
| A mixed digital / physical snapshot of early internet / web usage in New Zealand | | BIBAK | Full-Text | 379-380 | |
| Sally Jo Cunningham; Jillene Bydder | |||
| We are in the early stages of developing a unique physical and digital
record of New Zealand's early experience of the Internet. Keywords: digital museum, history of the web, internet archive | |||
| Mashing up life science literature resources | | BIBAK | Full-Text | 381-382 | |
| Richard Easty; Nikolay Nikolov | |||
| In the life sciences, one of the most pronounced problems is the deluge of new
results and data produced on a daily basis. This data can take many
different forms (e.g. microarray probes, gene sequences, protein structures) and
is added by hundreds of research centers world-wide in a largely uncoordinated
fashion. Integration of life science data is therefore growing in importance.
Unfortunately, most research centers have little incentive to spend effort
integrating their data with data produced by others. This task is
largely left to large publicly sponsored institutions like the US National
Library of Medicine and similar institutions in other countries. Despite their
work in this area, however, the integration of web-based life science
resources is still an open issue (and one ever growing in importance), as these
organizations cannot cope with the information deluge that is happening on a
daily basis in the life sciences. It therefore becomes essential to engage as
many third parties as possible in the process. Here we demonstrate a simple
prototype of a browser plugin that creates a platform for third parties to
contribute to cross-linking related online life science data resources, thereby
improving the search experience and the productivity of the life science
community. The plugin provides a convenient programming interface that minimizes
the effort required of third-party contributors. We have provided
reference implementations using the plugin that cross-link life science
literature resources and illustrate the potential for third parties to create
mashups that could also be applied in areas other than the life sciences. Keywords: browser plugin, data integration, life science literature | |||
| Representing publication and distribution practices for scholarly materials: a cross-disciplinary comparison | | BIBAK | Full-Text | 383-384 | |
| Phillip M. Edwards | |||
| This poster presents a pluralistic approach for representing
discipline-specific, cross-disciplinary, and discipline-independent work
practices related to scholarly communication. This approach has been applied to
qualitative analysis from an investigation of publication and distribution
practices of scholars within the biological sciences and the field of
communication. The resulting representations illustrate shared work practices
and areas where diverse practices exist, both of which can guide the
development of digital collections of scholarly materials. This poster also
considers challenges related to aligning data collection methods with the
application of these representational techniques. Keywords: scholarly communication, scholarly publication, work practices | |||
| Inferring intra-organizational collaboration from cosine similarity distributions in text documents | | BIBAK | Full-Text | 385-386 | |
| Maria Esteva; Hai Bi | |||
| We present a method that uses text mining and statistical distributions to
infer degrees of collaboration between staff members in an organization, based
on the similarity of the documents that they wrote and exchanged over time. Keywords: digital archives, statistical distributions, text mining | |||
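The abstract does not say how documents are represented or which distributions are fitted. The sketch below covers only the similarity step. It assumes plain whitespace-tokenized term-frequency vectors, with no stemming, stop-word removal, or TF-IDF weighting. It shows how pairwise cosine similarities between two staff members' documents can be collected into a distribution. All function names are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Bag-of-words term-frequency vector (lowercased whitespace tokens; an assumption)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def similarity_distribution(docs_a, docs_b):
    """All pairwise cosine similarities between one author's and another's documents."""
    vecs_b = [tf_vector(d) for d in docs_b]
    return [cosine(tf_vector(da), vb) for da in docs_a for vb in vecs_b]
```

Summary statistics of such a distribution, such as its mean or upper quantiles, could then be compared across pairs of staff members. How the authors map distributions to degrees of collaboration is not described in the abstract.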
| Personal name-matching through name transformation | | BIBAK | Full-Text | 387-388 | |
| Jun Gong; Lidan Wang; Douglas W. Oard | |||
| A graph-theory-based method is proposed to exploit name transformation for
personal name-matching. Experimental results on three personal name datasets
show that the method is effective. Keywords: name transformation, personal name-matching, string distance | |||
| EMU: the emory user behavior data management system for automatic library search evaluation | | BIBAK | Full-Text | 389-390 | |
| Qi Guo; Ryan P. Kelly; Selden Deemer; Arthur Murphy; Joan A. Smith; Eugene Agichtein | |||
| We describe EMU, a system for collecting, managing, and mining the behavior
data collected in the Emory libraries search system. We describe the data
capture system based on the LibX browser plugin, the database management system
for successfully storing, searching and exploring millions of resulting user
interactions, and preliminary results of interesting queries and statistics
that we are using to evaluate the effectiveness of library search tools. Keywords: data exploration, library search evaluation, user behavior modeling | |||
| Building a thailand researcher network based on a bibliographic database | | BIBAK | Full-Text | 391-392 | |
| Choochart Haruechaiyasak; Alisa Kongthon; Santipong Thaiprayoon | |||
| Among many practical and domain-specific tasks, expertise retrieval (ER) has
recently gained increasing attention in the information retrieval and knowledge
management communities. This paper describes our ongoing project to design and
implement an expert retrieval system focused on researchers who work in
Thailand. In our current system prototype, we assume that the areas of
expertise among researchers can be extracted from bibliographic databases. We
use the Science Citation Index (SCI) database to provide the information for
representing the expert profiles. From the SCI database, we queried and
retrieved publications from 2001 to 2008 by specifying the affiliation as
"Thailand". The results contain a set of approximately 23,000 publications. We
downloaded and extracted four related fields: authors (denoted by AU),
controlled terms (denoted by ID), keywords (denoted by DE) and subject category
(denoted by SC). To build a researcher network, we consider two types of
relationships: direct and indirect. The direct (or social) relationship is
defined as the co-authoring degree between one researcher and others. The
co-authoring degree between two researchers, co-authoring(A,B), can be
calculated from the co-occurrence frequency of A and B in the AU field of the
23,000 retrieved records. The indirect (or topical) relationship is defined
when two researchers have publications under the same topics. The topical
degree between two researchers, topical(A,B), can be calculated from a
similarity measure between two sets of extracted keywords, keyword(A) and
keyword(B), representing researchers A and B, respectively. The keyword sets
can be extracted from the fields ID, DE and SC. An author with high frequencies
on particular keywords is considered an expert in the corresponding research
topics. Keywords: R&D management, expertise retrieval, social network | |||
| Building a MARC-to-OLAC crosswalk: repurposing library catalog data for the language resources community | | BIBAK | Full-Text | 393-394 | |
| Christopher Hirt; Gary Simons; Joan Spanne | |||
| The Open Language Archives Community (OLAC) is an international partnership
of institutions which are building a network of interoperating repositories and
services to create a worldwide virtual library of language resources (that is,
resources that document, describe, or develop the more than 7,000 known
languages of the world). OLAC uses a community-specific refinement of qualified
Dublin Core [http://www.language-archives.org/OLAC/metadata.htm] along with a
community-specific refinement of the OAI Protocol for Metadata Harvesting
[http://www.language-archives.org/OLAC/repositories.htm] to maintain an
aggregated catalog of the holdings of the 35 participating archives. OLAC
recognizes that the language resources of interest to the community come not
only from sources within the community but also from many sources outside the
community. This poster describes one approach we have developed for addressing
this issue, namely, a crosswalk that transforms the MARC21 catalog for a
library or archive into an OAI static repository that holds an OLAC metadata
record for each MARC record identified as describing a language resource. Keywords: ISO 639, language identification | |||
| Locating text in scanned books | | BIBAK | Full-Text | 395-396 | |
| Chang Hu; Anne Rose; Benjamin B. Bederson | |||
| In this paper, we describe a work flow to extract and verify text locations
using commercial software, along with free software products and human
proofing. To help mid-sized digital libraries, we are making our solution
available as open source software. Keywords: adobe acrobat, book readers, digital libraries, readability, word location | |||
| Remote usability testing: a practice | | BIBAK | Full-Text | 397-398 | |
| Sheng-Cheng Huang; Randolph G. Bias; Tanya L. Payne; Jay B. Rogers | |||
| As remote users make increasingly frequent use of library resources, remote
usability testing has become a valuable tool for those pursuing an
empirical, user-centered design of the interfaces to their electronic resources
and services. This paper describes our implementation of remote usability tests
to evaluate prototypes of a web content management application developed by
Vignette Corporation, and reports sample results to illustrate how such an
approach can help in designing and improving the interfaces of digital
library projects and their usability. Keywords: collaborative design, remote testing, usability testing | |||
| Scientific digital libraries, interoperability, and ontologies | | BIBAK | Full-Text | 399-400 | |
| J. Steven Hughes; Daniel J. Crichton; Chris A. Mattmann | |||
| Scientific digital libraries serve complex and evolving research
communities. Justifications for the development of scientific digital libraries
include the desire to preserve science data and the promises of information
interconnectedness, correlative science, and system interoperability. Research
[1] suggests single shared ontologies are fundamental to fulfilling these
promises. We present a tool framework, a set of principles, and a real world
case study where shared ontologies are used to develop and manage science
information models and subsequently guide the implementation of scientific
digital libraries. The tool framework, based on an ontology modeling tool as
illustrated in Figure 1, was configured to develop, manage, and keep shared
ontologies relevant within changing domains and to promote the
interoperability, interconnectedness, and correlation desired by scientists. Keywords: digital library, information model, interoperability, ontology, science
data, science metadata | |||
| The landscape of information science: 1996-2008 | | BIBAK | Full-Text | 401-402 | |
| Fidelia Ibekwe-SanJuan; Eric SanJuan | |||
| We propose a methodology combining symbolic and numeric information to map
the structure of research in Information Science from 1996 to 2008. The
visualization of the resulting maps showed that while the two-camp structure of
Information Science observed in previous studies is still valid, other research
poles like web and user-oriented studies are building bridges between the two
hitherto isolated poles. Keywords: clustering, information visualization, knowledge domain mapping, text mining | |||
| Forging the future: new tools for variable media art preservation | | BIBK | Full-Text | 403-404 | |
| Jon Ippolito; Richard Rinehart; Marilyn Lutz; Sharon Fitzgerald | |||
Keywords: metadata, new media, preservation strategies, variable media art | |||
| Analyzing OPAC use with screen views and eye tracking | | BIBAK | Full-Text | 405-406 | |
| Emi Ishita; Shinji Mine; Masanori Koizumi; Yosuke Miyata; Chihiro Kunimoto; Junko Shiozaki; Keiko Kurata; Shuichi Ueda | |||
| Eye tracking was used to analyze which elements of which screens were viewed
by users searching an Online Public Access Catalog (OPAC). Eye tracking data
was obtained for 32 participants performing a known-item search task. The
results show that more than 30% of participants did not make effective use of
screens offering additional details, and that participants who did, and found
the correct answer, gazed at specific screen elements more frequently than
participants who gave incorrect answers. Keywords: OPAC use, eye tracking, viewing patterns | |||
| A user-friendly metadata quality control tool for the internet public library | | BIBAK | Full-Text | 407-408 | |
| Michael Khoo; Xia Lin; Jung-ran Park | |||
| The Internet Public Library (IPL) is crosswalking its metadata to Dublin
Core. The quality of the crosswalked metadata is not yet known, so the IPL is
developing a metadata quality control tool suitable for use by
LIS students who have little previous metadata quality control experience. Keywords: HCI, LIS instruction, dublin core, evaluation, internet public library,
metadata, metadata quality control, user-centered design | |||
| Using an institutional repository for personal digital collections of retired faculty members | | BIBAK | Full-Text | 409-410 | |
| Sarah Kim | |||
| In this poster, I address practical issues related to using IRs for personal
digital collections of retired faculty members. Keywords: archival collection, archiving, institutional repository | |||
| Exploitation of the wikipedia category system for enhancing the value of LCSH | | BIBAK | Full-Text | 411-412 | |
| Yoji Kiyota; Hiroshi Nakagawa; Satoshi Sakai; Tatsuya Mori; Hidetaka Masuda | |||
| This paper presents an approach that integrates two different types of
information resources: the Web and libraries. Starting from any keyword in
Wikipedia, our method induces related LCSH subject headings through the
Wikipedia category system. Keywords: LCSH, subject headings, wikipedia categories | |||
| Inter-search engine lexical signature performance | | BIBAK | Full-Text | 413-414 | |
| Martin Klein; Michael L. Nelson | |||
| We generate lexical signatures (LSs) from web pages and acquire the
mandatory document frequency values from three different search engine (SE)
indexes. We cross-query the LSs against the two SEs they were not generated
from and compare the retrieval performance by parsing the result set and
analyzing the rank of the source URL. Keywords: lexical signature, performance, search engine | |||
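A common way to build a lexical signature is to rank a page's terms by TF-IDF and keep the top few. The abstract does not state the weighting or the signature length, so the sketch below makes three assumptions. First, scoring is TF-IDF. Second, document frequencies come from a search engine index, represented here by a plain dictionary standing in for hit counts. Third, the index size is assumed, and signatures default to five terms. `lexical_signature` and `rank_of_source` are hypothetical helpers, not a search engine API.

```python
import math
from collections import Counter


def lexical_signature(text, doc_freq, index_size, k=5):
    """Top-k terms of a page by TF-IDF, with DF values taken from an (assumed) SE index.

    doc_freq maps term -> document frequency reported by the search engine;
    index_size is an assumed estimate of the number of indexed documents.
    """
    tf = Counter(text.lower().split())
    scores = {}
    for term, count in tf.items():
        df = doc_freq.get(term)
        if not df:  # skip terms the index gives no DF estimate for
            continue
        scores[term] = count * math.log(index_size / df)
    # Highest score first; ties broken alphabetically for a deterministic signature.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def rank_of_source(result_urls, source_url):
    """1-based rank of the source URL in a result list, or None if it was not retrieved."""
    try:
        return result_urls.index(source_url) + 1
    except ValueError:
        return None
```

Cross-querying would then mean submitting a signature built with one engine's DF values to the other two engines. The returned result lists would be passed to `rank_of_source`.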
| Correlation of music charts and search engine rankings | | BIBAK | Full-Text | 415-416 | |
| Martin Klein; Olena Hunsicker; Michael L. Nelson | |||
| We investigate whether expert rankings of real-world entities
correlate with search engine (SE) rankings of corresponding web resources. We
compare Billboard's "Hot 100 Airplay" music charts with SE rankings of
associated web resources. Out of nine comparisons we found two strong, two
moderate, two weak and one negative correlation. The remaining two comparisons
were inconclusive. Keywords: correlation, real-world objects, search engine | |||
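The abstract does not say which correlation coefficient was used. As one illustration, Spearman's rank correlation for two tie-free rankings of the same n entities is rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)), where d is the rank difference for each entity. The sketch assumes each ranking is given as a dictionary from entity to 1-based rank. Handling tied ranks would need the general Pearson-on-ranks formula instead.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation for two tie-free rankings of the same items.

    rank_a and rank_b map each item (e.g. a song) to its 1-based rank,
    for instance its chart position and the position of its web resource in SE results.
    """
    if set(rank_a) != set(rank_b):
        raise ValueError("both rankings must cover the same items")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two ranked items")
    d_squared = sum((rank_a[item] - rank_b[item]) ** 2 for item in rank_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

Values near 1 indicate that the chart order and the SE order agree. Values near -1 indicate that they are reversed.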
| Toward automatic generation of image-text document surrogates to optimize cognition | | BIBAK | Full-Text | 417-418 | |
| Eunyee Koh; Andruid Kerne; Jon Moeller | |||
| The representation of information collections needs to be optimized for
human cognition. Growing information collections play a crucial role in human
experiences. While documents often include rich visual components, collections,
including personal collections and those generated by search engines, are
typically represented as lists of text-only surrogates. By concurrently invoking
complementary components of human cognition, combined image-text surrogates
help people more effectively see, understand, think about, and remember
information collections. This research develops algorithmic methods that use the
structural context of images in HTML documents to associate meaningful text and
thus derive combined image-text surrogates. Keywords: information extraction, search representation, surrogates | |||
| Designing exploratory search tasks for user studies of information seeking support systems | | BIBAK | Full-Text | 419-420 | |
| Bill Kules; Robert Capra | |||
| This poster describes a procedure for designing exploratory tasks for use in
laboratory evaluations of information seeking interfaces. This procedure is
grounded in the literature on information seeking and information retrieval and
has been refined by an evaluation of four tasks designed for a study of a
faceted library catalog. The procedure is intended to be extensible to generate
exploratory tasks for other types of interfaces and domains. Keywords: n/a | |||
| Developing a review rubric for learning resources in digital libraries | | BIBK | Full-Text | 421-422 | |
| Heather Leary; Sarah Giersch; Andrew Walker; Mimi Recker | |||
Keywords: education digital library, instructional architect, national science digital
library, review rubric | |||
| From harvesting to cultivating: transformation of a web collecting system into a robust curation environment | | BIBAK | Full-Text | 423-424 | |
| Christopher A. Lee; Richard Marciano; Chien-yi Hou; Chirag Shah | |||
| Much has been written about the lifecycle of digital objects. This study is
instead concerned with the lifecycle of collections and associated services.
Online collection environments are built to fulfill specific collecting
objectives and constraints. If a collection proves useful within its original
hosting environment, it will often be necessary or desirable to move the
collection to new environments, in order to support new forms of use and
re-aggregation, or to extract resources from legacy data environments. Such a
transformation can be extremely expensive, challenging and prone to error,
especially if the collections include complex internal structures and services.
When "services make the repository" [1], moving raw data from one location to
another will often not be sufficient. Digital curators can preempt costly and
problematic system migration efforts by integrating collections into
environments specifically designed to support long-term preservation,
scalability and interoperability [2]. We report on an integration of content
and functionality of a feature-rich collecting environment (ContextMiner) into
a robust data curation environment (iRODS).
ContextMiner is a web-based service for building collections through the execution and management of "campaigns" (i.e. sets of associated queries and parameters to harvest content over time). As a part of the VidArch project, we have been using the ContextMiner framework and services for harvesting YouTube videos and associated contextual information on a variety of topics. In July 2008, we released a public beta of ContextMiner, allowing anyone to run similar crawls. There are now more than 100 users. The current implementation, based on a single MySQL database and associated code, has served its intended purposes very well, but it is not a scalable or sustainable basis for offering wide-scale collecting services in support of the diverse array of potential users and use cases. iRODS (integrated Rule-Oriented Data System) is adaptive policy-driven data grid middleware, which addresses aspects of growth, evolution, openness, and closure, all fundamental requirements for digital preservation [3]. iRODS currently scales to hundreds of millions of files, tens of thousands of users, and petabytes of data. It operates in a highly distributed environment with heterogeneous storage resources and allows for growth through federation. It supports evolution through the virtualization of the underlying technology and supports changing business requirements through customization of repository behaviors. It supports openness through data-type-agnostic treatment of content. iRODS can be instrumented with policies that support the management of the lifecycle of digital assets and will serve as a unique platform to study repository integration. One key feature is the automation of policy enforcement across distributed data that have been organized into a shared collection. The coupling of other open repositories and iRODS can create greater efficiencies and new types of repository services.
We discuss various repository integration scenarios, their potential benefits, and implications for collection life cycles. The approaches co-locate metadata and content in varied ways and rely on efficiencies found in one repository only, or on the ability to combine policies in both spaces: (1) iRODS to ContextMiner data migration, (2) policy-based data management for ContextMiner collections, and (3) policy interchange between ContextMiner and iRODS collections. Keywords: interoperable repositories | |||
| A semi-automatic system for managing multiple digital preservation risks of digital libraries in china | | BIBAK | Full-Text | 425-426 | |
| Chao Li; Chunxiao Xing; Li Dong; Michael Bailou Huang | |||
| While many research projects in the world have been addressing challenges
posed by digital preservation, digital libraries in China have their own native
problems that have never been addressed before. Similar problems may occur in
other countries, and their memory institutions may be less prepared to handle
them. This poster analyses the requirements and challenges of digital libraries
in China and describes an integrated and flexible digital preservation system
-- AOMS. Keywords: XML, digital preservation, integration, web service | |||
| What patrons want: supporting interaction for novice information seeking scholars | | BIBAK | Full-Text | 427-428 | |
| Fernando Loizides; George R. Buchanan | |||
| In this paper, we undertake a study of inexperienced information seeking
scholars, identifying areas for improvement in their electronic information
seeking and document triage process [3]. We propose a software aid, currently
under development. Keywords: document triage, information seeking, novice users | |||
| Selective harvesting of regional digital libraries and national metadata aggregators | | BIBAK | Full-Text | 429-430 | |
| Cezary Mazurek; Marcin Mielnicki; Marcin Werla | |||
| The poster presents the concept, implementation and practical application of
the OAI-PMH protocol extension which allows OAI-PMH service providers to
dynamically create and harvest sets of items from OAI-PMH data providers. The
implementation of the presented concept is based on the encoding of dynamic set
specifications in OAI-PMH requests with the CQL language. The extension was
developed and widely applied in Poland and is now used in several projects
funded by the European Commission. Keywords: CQL, OAI-PMH, interoperability, metadata access and distribution, metadata
aggregation, selective metadata harvesting | |||
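The poster encodes dynamic set specifications as CQL inside OAI-PMH requests, but it does not give the exact syntax. The sketch below assumes the CQL expression is passed verbatim as the standard `set` argument of a `ListRecords` request. It also assumes a hypothetical base URL. The real extension may use a prefix or a different parameter, so treat this as an illustration of the idea rather than its wire format.

```python
from urllib.parse import urlencode


def build_dynamic_set_request(base_url, cql_query, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords URL whose `set` argument carries a CQL query.

    ASSUMPTION: the CQL expression is sent verbatim as the set spec; the
    extension described in the poster may encode it differently.
    """
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": cql_query,  # dynamic set defined by a CQL expression
    }
    return f"{base_url}?{urlencode(params)}"
```

A service provider would then issue this request and page through the response using `resumptionToken`, as in ordinary OAI-PMH harvesting. The data provider would be responsible for evaluating the CQL expression to build the set on the fly.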
| User search behaviors within a library gateway | | BIBAK | Full-Text | 431-432 | |
| William H. Mischo; Mary C. Schlembach; Michael A. Norman | |||
| This poster reports on user searching behavior within two information
gateways developed at the University of Illinois at Urbana-Champaign Library.
These gateways are built around a locally developed metasearch engine and are
designed to assist users with search query formulation and modification. Search
behavior data is being collected in custom transaction logs that gather user
search arguments along with any system actions and contextual search assistance
suggestions. Keywords: metasearch, transaction logs, user searching behaviors | |||
| Users' adjustments to unsuccessful queries in biomedical search | | BIBAK | Full-Text | 433-434 | |
| G. Craig Murray; Jimmy Lin; John Wilbur; Zhiyong Lu | |||
| Biomedical researchers depend on online databases and digital libraries for
up-to-date information. We introduce a pilot project aimed at characterizing
adjustments made to biomedical queries that improve search results.
Specifically, we focus on queries submitted to PubMed, a large, sophisticated
search engine that facilitates Web access to abstracts of
articles in over 5,200 biomedical journals. On average, 2 million users search
PubMed each day. During their searches, nearly 20% encounter a result page
with zero results for at least one of their queries. In some cases there really is
no document or abstract that will satisfy a particular query. However, in
analyzing one month of queries submitted to PubMed, we find that more often
than not, queries that retrieved no results are queries that would retrieve
something relevant if they were constructed differently. This paper describes a
new effort to identify some of the characteristics of a query that produces
zero results, and the changes that users most often apply in constructing new,
"corrected" queries. Zero-result queries afford us an opportunity to examine
changes made to queries that we know did not return relevant data, because they
did not return any data. An investigation of the changes users make under these
circumstances can yield insight into users' search processes. Keywords: PubMed, medical search, query reformulation, user modeling | |||
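The abstract does not describe the log schema or how query changes are classified. The sketch below assumes a hypothetical `LogEntry` record for each query, with the log already sorted by session and time. It pairs each zero-result query with the user's next query in the same session and reports which terms were added and removed. This is a simple first cut at characterizing "corrected" queries.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    """One logged query (hypothetical schema)."""
    session_id: str
    query: str
    hits: int


def zero_result_reformulations(log):
    """Pair each zero-result query with the next query in the same session.

    Assumes `log` is sorted by session and time. Returns tuples of
    (failed_query, next_query, added_terms, removed_terms).
    """
    pairs = []
    for prev, nxt in zip(log, log[1:]):
        if prev.hits == 0 and prev.session_id == nxt.session_id:
            old = set(prev.query.lower().split())
            new = set(nxt.query.lower().split())
            pairs.append((prev.query, nxt.query, sorted(new - old), sorted(old - new)))
    return pairs
```

Term-level differences like these would need further grouping into categories such as spelling corrections, term removal, or added field qualifiers. That grouping step is not shown here.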
| Species identification: fish images with CBIR and annotations | | BIBK | Full-Text | 435-436 | |
| Uma Murthy; Edward A. Fox; Yinlin Chen; Eric Hallerman; Ricardo Torres; Evandro J. Ramos; Tiago R. C. Falcao | |||
Keywords: CBIR, fish species identification, image annotation, image retrieval, user
study | |||
| Kindle usage among LIS students: an exploratory study | | BIBK | Full-Text | 437-438 | |
| Debbie L. Rabina; Maria Cristina Pattuelli | |||
Keywords: e-books, electronic publishing, social issues, user needs | |||
| Metababble: a clash of metadata cultures | | BIBAK | Full-Text | 439-440 | |
| Monica Rivero; Geneva Henry | |||
| A tension exists between making digitized resources available to users
quickly and providing detailed, item-level metadata and semantic markup that
make those resources more discoverable. The Our Americas Archive Partnership
(OAAP) project, funded by IMLS in the fall of 2007, is facing these challenges
as the project progresses. This poster presents a summary of our approach and
future thoughts about descriptive approaches for digital resources. Keywords: TEI, digital library, metadata, minimal processing, social tagging | |||
| Evaluation of OAI-ORE via large-scale information topology visualization | | BIBAK | Full-Text | 441-442 | |
| Robert Sanderson; Clare Llewellyn; Richard Jones | |||
| This poster evaluates the OAI-ORE specifications through experiments
providing access to the JSTOR digital archive and the Flickr website. A
browser-based dynamic graph visualization tool was designed and tested to
determine if making the topology of the information available would provide
end-user benefits in terms of navigation and discovery. Keywords: OAI-ORE, linked data, visualization, web 2.0 | |||
| Empirical analysis on Chinese academic plagiarism | | BIBAK | Full-Text | 443-444 | |
| Yang Shen; Huijuan Fu; Zitao Liu; Pengpeng Liu; Qingchuan Fu | |||
| This poster applies the self-developed ROST Anti-plagiarism Software to check
3,781 papers and surveys 450 students in order to analyze academic plagiarism in
China quantitatively from the angles of subject, authors' social networks,
author combinations, and students' plagiarism law, and draws several
conclusions. Keywords: ROST anti-plagiarism software, plagiarism law, social network | |||
| Adaptive personalized eLearning on top of existing LCMS | | BIBAK | Full-Text | 445-446 | |
| Naimdjon Takhirov; Ingeborg T. Sølvberg | |||
| The next generation of eLearning systems should tailor the learning
experience to each individual's learning needs and preferences. PEDAL-NG is a
system that supports personalization in an existing, operational eLearning
environment, based on prior knowledge and the learning style of users. It is
built as a front end to an existing LMS. The prototype was tested by a group of
students. The test results are favorable regarding the personalized course and
provide valuable feedback for future research. Keywords: eLearning, learning objects, personalization | |||
| User search characteristics on a specialized digital collection for domain- and task-specific information | | BIBAK | Full-Text | 447-448 | |
| Xiaoya Tang | |||
| Domain-specialized digital collections have been growing rapidly in recent
years. A good understanding of how users interact with such collections to
accomplish domain-specific information tasks would help inform the design of
effective systems. This study investigates users' interaction with a Web-based
botanical collection by examining search logs recorded during an experiment.
The findings indicate that while users' interactions with such collections
demonstrate similar characteristics to those with general purpose search
systems, they also demonstrate a domain- and task-specific nature. Keywords: keyword search, query, terms, user study | |||
| MetRe: supporting the metadata revision process | | BIBAK | Full-Text | 449-450 | |
| Emma Tonkin | |||
| MetRe is a prototype interface and service designed to support the metadata
revision process. Improving consistency of metadata records within an
environment is a common repository management task, due to the potential for user
error at submission as well as other sources of error, such as systematic
error resulting from the chosen deposit process. Evidence to support the
metadata correction process may be gathered by automated metadata extraction
tools, evidence from within the repository, or by comparison with best practice
across the repository landscape. MetRe (Metadata Revision) is a prototype
demonstrator that is able to identify several characteristic classes of error,
twinned with an interface able to highlight several types of individual and
systematic error, including a notion of local (intra-repository) and general
(inter-repository) best practice. Keywords: metadata | |||
| Finding centuries-old hyperlinks with a novel semi-supervised learning technique | | BIBAK | Full-Text | 451-452 | |
| Xiaoyue Wang; Eamonn Keogh | |||
| Hyperlinks are so useful for searching and browsing modern digital
collections that researchers have long wondered if it is possible to
retroactively add hyperlinks to digitized historical documents. There has
already been significant research into this endeavor for historical text;
however, in this work we consider the problem of adding hyperlinks among
graphic elements. While such a system would not have the ubiquitous utility of
text-based hyperlinks, there are several domains where it can potentially
significantly augment textual information. Keywords: historical digital libraries, historical manuscripts, hyperlinks,
semi-supervised learning | |||
| Journal ranking based on social information | | BIBAK | Full-Text | 453-454 | |
| Jinlong Wang; Ke Gao; Yongli Ren; Gang Li | |||
| Literature analysis has recently become an active topic in academic research. In
order to quantify the importance of journals and provide researchers with
target venues for their work, this poster proposes a novel approach based on
social information that considers the potential relationship between
journal quality and authors' affiliations. Using the formula proposed in
this work, the importance of journals can be estimated and ranked. Keywords: journal ranking, mining, social information | |||
| The variety of ways in which instructors implement a modular digital library curriculum | | BIBAK | Full-Text | 455-456 | |
| Barbara M. Wildemuth; Jeffrey P. Pomerantz; Sanghee Oh; Seungwon Yang; Edward A. Fox | |||
| With support from the National Science Foundation, researchers at Virginia
Tech and the University of North Carolina developed a curriculum framework and
a number of modules for instruction in the area of digital libraries. In 2008,
15 different modules were field tested by 11 instructors at 10 different
institutions. As might be expected, instructors adapted these modules to fit
the context of their courses, some of which are described here. Keywords: computer science, curriculum development, digital libraries, education,
instruction, library and information science | |||
| GRE: hybrid recommendations for NSDL collections | | BIBAK | Full-Text | 457-458 | |
| Todd C. Will; Anand Srinivasan; Michael Bieber; Il Im; Vincent Oria; Yi-Fang (Brook) Wu | |||
| Recommendation systems have been proven to reduce the time and effort
required by users to find relevant items, but there are only sporadic reports
on their application in digital libraries. The General Recommendation Engine
(GRE) is composed of the text search system Lucene augmented by
well-understood content-based and collaborative filtering techniques and by the
first application of knowledge-based recommendation in digital libraries, and
recommends items from 22 National Science Digital Library collections. In this
study of 60 subjects, the GRE outperformed the baseline Lucene system
in all areas of evaluation. Keywords: collaborative filtering, content based, digital libraries, knowledge based
recommendation, recommendation systems, text search engine, user interface | |||
| Archiving the videogame industry: collecting primary materials of new media artifacts | | BIBAK | Full-Text | 459-460 | |
| Megan A. Winget | |||
| This paper describes the initial deposits in The Videogame Archive at the
Center for American History at the University of Texas at Austin. Keywords: collection development, new media, video games | |||
| Analyzing users' book-loan behaviors in Peking University Library from a social network perspective | | BIBAK | Full-Text | 461-462 | |
| Fei Yan; Ming Zhang; Tao Sun; Yang Lu; Naiyue Zhang; Long Xiao | |||
| In a university library, students from different backgrounds are connected by
co-borrowing behaviors, which form a knowledge-sharing network. This poster
presents a novel idea to study users' book-loan behavior patterns
(knowledge-sharing patterns) from a social network perspective, which enables
us to understand these patterns at both the macro and micro levels. Keywords: digital library, log mining, social network analysis | |||
| An AJAX-based digital music stand for Greenstone | | BIBAK | Full-Text | 463-464 | |
| David Bainbridge; Tim C. Bell | |||
| This extended abstract describes a digital music stand integrated with the
Greenstone digital library software. It features text annotation and an
animated fast-to-slow page wipe. Figure 1 illustrates both these features,
although it is best appreciated in a live demonstration. Digital annotation
provides a non-destructive alternative to a musician's habit of penciling in
notes. In Figure 1, slightly over half way down the page, there is a note to
watch the fingering. A user can have as many of these as they like, positioned
anywhere on the page.
The animated page wipe alleviates (somewhat) the issue of when to turn to the next page. Unlike its physical counterpart, where turning to the next page means you can no longer see the current page, with a digital music stand the next page can gradually be overlaid. The page transition occurring in Figure 1 can be seen as a marked horizontal bar not quite half-way down the page. The wipe is initially fast, but when it reaches the point where the scroll-bar marker is on the right-hand side of the page, it slows down significantly. This gives the musician time to finish playing the last line of the current page. If they have already finished playing that line, they will have naturally moved on to playing the top of the next page (which is already displayed). Rather than adopt a traditional client-side "helper" application for the digital music stand, we have integrated it within Greenstone using AJAX. For instance: next and previous pages are asynchronously loaded in the background; when generating a page, the dimensions of the user's screen are sent to the DL server so it can produce a version that maximizes the available space; and interactions such as adding an annotation or altering the position of the animation break are immediately stored as metadata associated with that document. Initially the animated page breaks are set between the last two staff systems. This is accomplished as part of the DL ingest process, leveraging the staff detection step of Optical Music Recognition software. Keywords: digital library integration, digital music stand | |||
| Accessing the Densho and HistoryMakers oral history collections via Informedia technologies | | BIBAK | Full-Text | 465-466 | |
| Michael G. Christel; Robert V. Baron; Geoff Froh; Dan Benson; Julieanna Richardson | |||
| Densho is a nonprofit organization started in 1996 with the goal of
documenting oral histories from Japanese Americans who were incarcerated during
World War II. The HistoryMakers is a nonprofit established in 1999 with the
goal of documenting video life oral history interviews highlighting the
accomplishments of individual African Americans and African-American-led groups
and movements. Both collections share the goal of broader, deeper use of the
oral history content through digitization and automated processing where
appropriate. This demonstration showcases the application of Carnegie Mellon
Informedia digital video library processing and interfaces to enhance access
into the interview segments. Keywords: digital video library, oral histories, video retrieval | |||
| Text mining for indexing | | BIBAK | Full-Text | 467-468 | |
| Judith Gelernter; Michael Lesk | |||
| This paper describes techniques for automatically extracting and classifying
maps found within articles. The process uses image analysis to find text in
maps, document structure to find captions and titles, and then text mining to
assign each map to a subject category, a geographical place, and a time period.
The text analysis is based on authority lists taken from gazetteers and from
library classifications. Keywords: automatic classification, content analysis and indexing, text mining | |||
| Our Americas Archive Partnership demonstration | | BIBAK | Full-Text | 469-470 | |
| Geneva Henry; Monica Rivero | |||
| The Our Americas Archive Partnership (OAAP) project is in year 2 of a 3-year
IMLS-funded grant led by Rice University in partnership with the University of
Maryland's Maryland Institute for Technology in the Humanities (MITH). Designed
to meet the needs of American studies scholars researching the Americas from a
hemispheric perspective, OAAP is developing an integrated framework for the
discovery of digital resources that are managed in heterogeneous distributed
repositories. This demonstration will show the current state of the project's
common interface to support resource discovery. Keywords: OAI, TEI, American studies, DSpace, harvesting, repository, semantic markup,
social tagging, tag cloud | |||
| Mapping life events: temporal and geographic context for biographical information | | BIBAK | Full-Text | 471-472 | |
| Ray R. Larson; Ryan Shaw | |||
| Digital libraries often fail to connect their contents to the wider context
of available information resources about the same persons, related
persons, places, or time periods, and the events that happen to those persons
at those places in a given time period. This demonstration will show
prototype systems that can perform these tasks, linking the user to relevant
contextual information. Keywords: geographic information retrieval | |||
| Virtual DL poster sessions in Second Life | | BIBAK | Full-Text | 473-474 | |
| Spencer J. Lee; Edward A. Fox; Gary Marchionini; Javier Velacso; Gonçalo Antunes; José Borbinha | |||
| In Second Life (SL), a popular general-purpose 3D virtual world, we are
supporting the Digital Library community in a variety of ways, including
through virtual poster sessions. This brings together the interests of those
involved in JCDL 2009, IEEE-TCDL, and NSF-supported work in SL aimed at assisting
education, training, and dissemination in the digital preservation area. Keywords: 3D, digital preservation, second life, tele-presence, virtual world | |||
| ContextMiner: building context-rich digital collections | | BIBK | Full-Text | 475-476 | |
| Chirag Shah | |||
Keywords: contextual information, digital curation, digital preservation | |||
| Using university collections in digital library education | | BIBK | Full-Text | 477-478 | |
| Quinn Stewart; David Todd | |||
Keywords: digital library curriculum, digitization, rich-media | |||
| A curriculum customization service | | BIBAK | Full-Text | 479-480 | |
| Tamara Sumner; Holly Devaul; Lynne Davis; John Weatherley | |||
| We demonstrate a prototype Curriculum Customization Service designed and
developed with significant teacher input. This prototype illustrates a model
for embedding digital library resources into mainstream classroom use. A 10-week
pilot study suggests that this Service can increase teachers' use of
digital library resources in their class, and encourage them to use resources
to customize instruction. Keywords: customizing instruction, differentiated instruction, educational digital
libraries, personalization, science education | |||
| XEB: a markup language document container format suitable for handheld devices | | BIBAK | Full-Text | 481-482 | |
| Zhi Tang; Liangcai Gao; Aixia Jia; Xiaofan Lin | |||
| We propose a new document container format (XEB, eXtensible Electronic Book)
based on a block mechanism to efficiently process markup language documents on
handheld devices. Random document access is also supported in the format
through a pagination mechanism. The format has already been applied to the
Chinese e-book readers of a number of handheld devices, and XEB documents can be
downloaded from a Chinese e-book store. Keywords: document parsing, handheld device, markup language document | |||
| AskDragon: a redundancy-based factoid question answering system with lightweight local context analysis | | BIBAK | Full-Text | 483-484 | |
| Xiaohua Zhou; Palakorn Achananuparp; E. K. Park; Xiaohua Hu; Xiaodan Zhang | |||
| We introduce our QA system AskDragon, which employs a novel lightweight local
context analysis technique to handle two broad classes of factoid questions:
entity and numeric questions. The local context analysis module dramatically
improves the efficiency of QA systems without sacrificing high accuracy.
Keywords: answer generation, answer scoring, local context analysis, question
answering, redundancy-based approach | |||
| Knowledge extraction and integration for semi-structural information in digital libraries | | BIBK | Full-Text | 485-486 | |
| Wenhao Zhu; Baogang Wei; Jiangqin Wu; Shaomin Shi; Yan Yang | |||
Keywords: digital libraries, digitized textbook, information extraction | |||