
JCDL'08: Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries

Fullname:ACM/IEEE Joint Conference on Digital Libraries
Note:Bridging - Culture Bridging - Technology
Editors:Ronald Larsen; Andreas Paepcke; José Borbinha; Mor Naaman
Location:Pittsburgh, Pennsylvania
Dates:2008-Jun-16 to 2008-Jun-20
Publisher:ACM
Standard No:ISBN 1-59593-998-9, 978-1-59593-998-2; ACM Order Number: 606082
Papers:112
Pages:478
  1. Automatic tools for digital libraries
  2. Users lost in information
  3. Education
  4. Geography and trust on the web
  5. Preservation and archiving
  6. Keynote
  7. It's the metadata, stupid
  8. Expanding search
  9. Video by the people, for the people
  10. Best paper session
  11. Keynote
  12. Content from documents
  13. Data from the web
  14. Platforms and users in digital libraries
  15. Beyond text
  16. Archiving and web tools for digital libraries
  17. Interfaces and navigation
  18. Demonstrations
  19. Workshops
  20. Posters

Automatic tools for digital libraries

Enhancing digital libraries using missing content analysis 1-10
  David Carmel; Elad Yom-Tov; Haggai Roitman
This work shows how the content of a digital library can be enhanced to better satisfy its users' needs. Missing content is identified by finding missing content topics in the system's query log or in a pre-defined taxonomy of required knowledge. The collection is then enhanced with new relevant knowledge, which is extracted from external sources that satisfy those missing content topics. Experiments we conducted measure the precision of the system before and after content enhancement. The results demonstrate a significant improvement in the system effectiveness as a result of content enhancement and the superiority of the missing content enhancement policy over several other possible policies.
Building a dynamic lexicon from a digital library 11-20
  David Bamman; Gregory Crane
We describe here in detail our work toward creating a dynamic lexicon from the texts in a large digital library. By leveraging a small structured knowledge source (a 30,457 word treebank), we are able to extract selectional preferences for words from a 3.5 million word Latin corpus. This is promising news for low-resource languages and digital collections seeking to leverage a small human investment into much larger gain. The library architecture in which this work is developed allows us to query customized subcorpora to report on lexical usage by author, genre or era and allows us to continually update the lexicon as new texts are added to the collection.
On content-driven search-keyword suggesters for literature digital libraries 21-24
  Sulieman A. Bani-Ahmad; Gultekin Ozsoyoglu
We propose and evaluate a "content-driven search keyword suggester" for keyword-based search in literature digital libraries. Suggesting search keywords at an early stage, i.e., while the user is entering search terms, is helpful for constructing more accurate, less ambiguous, and focused search keywords for queries. Our search keyword suggestion approach is based on an a priori analysis of the publication collection in the digital library at hand, and consists of the following steps. We (i) parse the document collection using the Link Grammar parser, a syntactic parser of English, (ii) group publications based on their "most-specific" research topics, (iii) use the parser output to build a hierarchical structure of simple and compound tokens to be used to suggest search terms, (iv) use TextRank, a text summarization tool, to assign topic-sensitive scores to keywords, and (v) use the identified research topics to help the user aggregate search keywords prior to the actual search query execution.
   We experimentally show that the proposed framework, which is optimized to work on literature digital libraries, promises a more scalable, high quality, and user-friendly search-keyword suggester when compared to its competitors. We validate our proposal experimentally using a subset of the ACM SIGMOD Anthology digital library as a testbed, and by employing the research-pyramid model to identify the "most-specific" research topics.
Unsupervised semantic markup of literature for biodiversity digital libraries 25-28
  Hong Cui
This paper reports the further development of machine learning techniques for semantic markup of biodiversity literature, especially morphological descriptions of living organisms such as those hosted at efloras.org and algaebase.org. Syntactic parsing and supervised machine learning techniques have been explored by earlier research. Limitations of these techniques prompted our investigation of an unsupervised learning approach that combines the strengths of earlier techniques and avoids their limitations. Semantic markup at the organ and character levels is discussed. Research on semantic markup of natural heritage literature has direct impact on the development of semantic-based access in biodiversity digital libraries.

Users lost in information

Seeking information in realistic books: a user study 29-38
  Veronica Liesaputra; Ian H. Witten
There are opposing views on whether readers gain any advantage from using a computer model of a 3D physical book. There is enough evidence, both anecdotal and from formal user studies, to suggest that the usual HTML or PDF presentation of documents is not always the most convenient, or the most comfortable, for the reader. On the other hand it is quite clear that while 3D book models have been prototyped and demonstrated, none are in routine use in today's digital libraries. And how do 3D book models compare with actual books?
   This paper reports on a user study designed to compare the performance of a practical Realistic Book implementation with conventional formats (HTML and PDF) and with physical books. It also evaluates the annotation features that the implementation provides.
Understanding cultural heritage experts' information seeking needs 39-47
  Alia Amin; Jacco van Ossenbruggen; Lynda Hardman; Annelies van Nispen
We report on our user study on the information seeking behavior of cultural heritage experts and the sources they use to carry out search tasks. Seventeen experts from nine cultural heritage institutes in the Netherlands were interviewed and asked to answer questionnaires about their daily search activities. The interviews helped us to better understand their search motivations, types, sources and tools. A key finding of our study is that the majority of search tasks involve relatively complex information gathering. This is in contrast to the relatively simple fact-finding oriented support provided by current tools. We describe a number of strategies that experts have developed to overcome the inadequacies of their tools. Finally, based on the analysis, we derive general trends of cultural heritage experts' information seeking needs and discuss our preliminary experiences with potential solutions.
The myth of find: user behaviour and attitudes towards the basic search feature 48-51
  Fernando Loizides; George R. Buchanan
The ubiquitous within-document text search feature (Ctrl-F) is considered by users to be a key advantage in electronic information seeking [1]. However, what people say they do and what they actually do are not always consistent. It is necessary to understand, acknowledge and identify the cause of this inconsistency. We must identify the physical and cognitive factors to develop better methods and tools, assisting with the search process. This paper discusses the limitations and myths of Ctrl-F in information seeking. A prototype system for within-document search is introduced. Three user studies portray shared behaviour and attitudes, common between participants regarding within-document searching.
A longitudinal study of exploratory and keyword search 52-56
  Max L. Wilson; m.c. schraefel
Digital libraries are concerned with improving the access to collections to make their service more effective and valuable to users. In this paper, we present the results of a four-week longitudinal study investigating the use of both exploratory and keyword forms of search within an online video archive, where both forms of search were available concurrently in a single user interface. While we expected early use to be more exploratory and subsequent use to be directed, over the whole period there was a balance of exploratory and keyword searches and they were often used together. Further, to support the notion that facets support exploration, there were more than five times as many facet clicks as more complex forms of keyword search (boolean and advanced). From these results, we can conclude that there is real value in investing in exploratory search support, which was shown to be both popular and useful for extended use of the system.

Education

Exploring educational standard alignment: in search of 'relevance' 57-65
  René Reitsma; Byron Marshall; Michael Dalton; Martha Cyr
The growing availability of online K-12 curriculum is increasing the need for meaningful alignment of this curriculum with state-specific standards. Promising automated and semi-automated alignment tools have recently become available. Unfortunately, recent alignment evaluation studies report low inter-rater reliability, e.g., 32% with two raters and 35 documents. While these results are in line with studies in other domains, low reliability makes it difficult to accurately train automatic systems and complicates comparison of different services. We propose that inter-rater reliability of broadly defined, abstract concepts such as 'alignment' or 'relevance' must be expected to be low due to the real-world complexity of teaching and the multidimensional nature of the curricular documents. Hence, we suggest decomposing these concepts into less abstract, more precise measures anchored in the daily practice of teaching.
   This article reports on the integration of automatic alignment results into the interface of the Teach Engineering collection and on an evaluation methodology intended to produce more consistent document relevance ratings. Our results (based on 14 raters × 6 documents) show high inter-rater reliability (61-95%) on less abstract relevance dimensions while scores on the overall 'relevance' concept are (as expected) lower (64%). Despite a relatively small sample size, regression analysis of our data resulted in an explanatory (R² = .75) and statistically stable (p-values < .05) model for overall relevance as indicated by matching concepts, related background material, adaptability to grade level, and anticipated usefulness of exercises. Our results suggest that more detailed relevance evaluation which includes several dimensions of relevance would produce better data for comparing and training alignment tools.
From NSDL 1.0 to NSDL 2.0: towards a comprehensive cyberinfrastructure for teaching and learning 66-69
  David J. McArthur; Lee L. Zia
NSDL is a premier provider of digital educational collections and services, which has been supported by NSF for eight years. As a mature program, NSDL has reached a point where it could either change direction or wind down. In this paper we argue there are reasons to continue the program and we outline several possible new program directions. These build on NSDL's learning platform, and they also look towards NSF's emerging interest in supporting work at the intersection of cyberinfrastructure and education. We consider NSDL's potential roles in several grand challenges that confront education, including: tailoring educational resources to students' needs, providing educators with a cyber-teaching environment, developing a cyber-workbench for researchers, and integrating education research and practice.
Cross-disciplinary molecular science education in introductory science courses: an NSDL MATDL collection 70-73
  David J. Yaron; Jodi L. Davenport; Michael Karabinos; Gaea L. Leinhardt; Laura M. Bartolo; John J. Portman; Cathy S. Lowe; Donald R. Sadoway; W. Craig Carter; Colin Ashe
This paper discusses a digital library designed to help undergraduate students draw connections across disciplines, beginning with introductory discipline-specific science courses (including chemistry, materials science, and biophysics). The collection serves as the basis for a design experiment for interdisciplinary educational libraries and is discussed in terms of the three models proposed by Sumner and Marlino. As a cognitive tool, the library is organized around recurring patterns in molecular science, with one such pattern being developed for this initial design experiment. As a component repository, the library resources support learning of these patterns and how they appear in different disciplines. As a knowledge network, the library integrates design with use and assessment.
Curriculum overlay model for embedding digital resources 74-84
  Huda Khan; Keith Maull; Tamara Sumner
This paper describes the design and implementation of a curriculum overlay model for the representation of adaptable curriculum using educational digital library resources. We focus on representing curriculum to enable the incorporation of digital resources into curriculum and curriculum sharing and customization by educators. We defined this model as a result of longitudinal studies on educators' development and customization of curriculum and user interface design studies of prototypes representing curriculum. Like overlay journals or the information network overlay model, our curriculum overlay model defines curriculum as a compound object with internal semantic relationships and relationships to digital library metadata describing resources. We validated this model by instantiating the model using science curriculum which uses digital library resources and using this instantiation within an application that, built on FEDORA, supports curriculum customization. Findings from this work can support the design of digital library services for customizing curriculum which embeds digital resources.

Geography and trust on the web

Gazetiki: automatic creation of a geographical gazetteer 85-93
  Adrian Popescu; Gregory Grefenstette; Pierre Alain Moëllic
Geolocalized databases are becoming necessary in a wide variety of application domains. Thus far, the creation of such databases has been a costly, manual process. This drawback has stimulated interest in automating their construction, for example, by mining geographical information from the Web. Here we present and evaluate a new automated technique for creating and enriching a geographical gazetteer, called Gazetiki. Our technique merges disparate information from Wikipedia, Panoramio, and web search engines in order to identify geographical names, categorize these names, find their geographical coordinates and rank them. The information produced in Gazetiki enhances and complements the Geonames database, using a similar domain model. We show that our method provides a richer structure and an improved coverage compared to another known attempt at automatically building a geographic database and, where possible, we compare our Gazetiki to Geonames.
Discovering GIS sources on the web using summaries 94-103
  Ramaswamy Hariharan; Bijit Hore; Sharad Mehrotra
In this paper, we consider the problem of discovering GIS data sources on the web. Source discovery queries for GIS data are specified using keywords and a region of interest. A source is considered relevant if it contains data that matches the keywords in the specified region. Existing techniques simply rely on textual metadata accompanying such datasets to compute relevance to user-queries. Such approaches result in poor search results, often missing the most relevant sources on the web. We address this problem by developing more meaningful summaries of GIS datasets that preserve the spatial distribution of keywords. We conduct experiments showing the effectiveness of proposed summarization techniques by significantly improving the quality of query results over baseline approaches, while guaranteeing scalability and high performance.
SocialTrust: tamper-resilient trust establishment in online communities 104-114
  James Caverlee; Ling Liu; Steve Webb
Web 2.0 promises rich opportunities for information sharing, electronic commerce, and new modes of social interaction, all centered around the "social Web" of user-contributed content, social annotations, and person-to-person social connections. But the increasing reliance on this "social Web" also places individuals and their computer systems at risk, creating opportunities for malicious participants to exploit the tight social fabric of these networks. With these problems in mind, we propose the SocialTrust framework for tamper-resilient trust establishment in online communities. SocialTrust provides community users with dynamic trust values by (i) distinguishing relationship quality from trust; (ii) incorporating a personalized feedback mechanism for adapting as the community evolves; and (iii) tracking user behavior. We experimentally evaluate the SocialTrust framework using real online social networking data consisting of millions of MySpace profiles and relationships. We find that SocialTrust supports robust trust establishment even in the presence of large-scale collusion by malicious participants.

Preservation and archiving

Personal & SOHO archiving 115-123
  Stephan Strodl; Florian Motlik; Kevin Stadler; Andreas Rauber
Digital objects require appropriate measures for digital preservation to ensure that they can be accessed and used in the near and far future. While heritage institutions have been addressing the challenges posed by digital preservation needs for some time, private users and SOHOs (Small Office/Home Office) are less prepared to handle these challenges. Yet, both have increasing amounts of data that represent considerable value, be it office documents or family photographs. Backup, common practice of home users, avoids the physical loss of data, but it does not prevent the loss of the ability to render and use the data in the long term. Research and development in the area of digital preservation is driven by memory institutions and large businesses. The available tools, services and models are developed to meet the demands of these professional settings.
   This paper analyses the requirements and challenges of preservation solutions for private users and SOHOs. Based on the requirements and supported by available tools and services, we are designing and implementing a home archiving system to provide digital preservation solutions specifically for digital holdings in the small office and home environment. It hides the technical complexity of digital preservation challenges and provides simple and automated services based on established best practice examples. The system combines bit preservation and logical preservation strategies to avoid loss of data and the ability to access and use them. A first software prototype, called Hoppla, is presented in this paper.
Recovering a website's server components from the web infrastructure 124-133
  Frank McCown; Michael L. Nelson
Our previous research has shown that the collective behavior of search engine caches (e.g., Google, Yahoo, Live Search) and web archives (e.g., Internet Archive) results in the uncoordinated but large-scale refreshing and migrating of web resources. Interacting with these caches and archives, which we call the Web Infrastructure (WI), allows entire websites to be reconstructed in an approach we call lazy preservation. Unfortunately, the WI only captures the client-side view of a web resource. While this may be useful for recovering much of the content of a website, it is not helpful for restoring the scripts, web server configuration, databases, and other server-side components responsible for the construction of the website's resources.
   This paper proposes a novel technique for storing and recovering the server-side components of a website from the WI. Using erasure codes to embed the server-side components as HTML comments throughout the website, we can effectively reconstruct all the server components of a website when only a portion of the client-side resources have been extracted from the WI. We present the results of a preliminary study that baselines the lazy preservation of ten EPrints repositories and then examines the preservation of an EPrints repository that uses the erasure code technique to store the server-side EPrints software throughout the website. We found nearly 100% of the EPrints components were recoverable from the WI just two weeks after the repository came online, and it remained recoverable four months after it was "lost".
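   The idea of spreading server-side files across pages as HTML comments can be illustrated with a minimal sketch. It assumes the simplest possible erasure code (k data blocks plus one XOR parity block, tolerating the loss of any single block); the abstract does not say which code the authors use, and the comment marker and all function names below are hypothetical.

```python
import base64
import re
from functools import reduce

MARKER = "server-chunk"  # hypothetical comment tag, not taken from the paper


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-size blocks and append one XOR parity block."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    blocks.append(reduce(xor_bytes, blocks))
    return blocks


def embed(html: str, index: int, block: bytes, length: int) -> str:
    """Append one block to a page as an HTML comment."""
    payload = base64.b64encode(block).decode("ascii")
    return f"{html}\n<!-- {MARKER} {index} {length} {payload} -->"


def recover(pages: list[str], k: int) -> bytes:
    """Rebuild the original data from whichever pages were recovered."""
    pattern = re.compile(rf"<!-- {MARKER} (\d+) (\d+) (\S+) -->")
    found, length = {}, 0
    for page in pages:
        for idx, n, payload in pattern.findall(page):
            found[int(idx)] = base64.b64decode(payload)
            length = int(n)
    missing = [i for i in range(k + 1) if i not in found]
    if len(missing) > 1:
        raise ValueError("too many blocks lost for a single-parity code")
    if missing and missing[0] < k:
        # A lost data block equals the XOR of all surviving blocks.
        found[missing[0]] = reduce(xor_bytes, found.values())
    return b"".join(found[i] for i in range(k))[:length]
```

   A production scheme would use a stronger code (for example Reed-Solomon) so that many pages can be lost; the single-parity code here only shows the embed-and-recover cycle.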
A data model and architecture for long-term preservation 134-144
  Greg Janée; Justin Mathena; James Frew
The National Geospatial Digital Archive, one of eight initial projects funded under the Library of Congress's NDIIPP program, has been researching how geospatial data can be preserved on a national scale and be made available to future generations. In this paper we describe an archive architecture that provides a minimal approach to the long-term preservation of digital objects based on co-archiving of object semantics, uniform representation of objects and semantics, explicit storage of all objects and semantics as files, and abstraction of the underlying storage system. This architecture ensures that digital objects can be easily migrated from archive to archive over time and that the objects can, in principle, be made usable again at any point in the future; its primary benefit is that it serves as a fallback strategy against, and as a foundation for, more sophisticated (and costly) preservation strategies. We describe an implementation of this architecture in a prototype archive running at UCSB that also incorporates a suite of ingest and access components.

Keynote

Shakespeare, God, and lonely hearts: transforming data access with Many Eyes 145-146
  Fernanda Viégas; Martin Wattenberg
Data visualization has historically been accessible only to the technological elite. It is, after all, "serious" technology done by experts for experts. But recent web-based visualizations -- ranging from political art projects to news stories -- have reached millions. Unfortunately, while lay users can view sophisticated visualizations, they have few ways to create them. In order to "democratize" visualization, we have built Many Eyes, a web site where people may upload their own data, create interactive visualizations, and carry on conversations. By making these tools available to anyone on the web, the site fosters a social style of data analysis that empowers users to engage with public data through discussion and collaboration. Political discussions, citizen activism, religious conversations, game playing, and educational exchanges are all happening on Many Eyes. The public nature of these visualizations provides users with a transformative path to information literacy.

It's the metadata, stupid

HarvANA: harvesting community tags to enrich collection metadata 147-156
  Jane Hunter; Imran Khan; Anna Gerber
Collaborative, social tagging and annotation systems have exploded on the Internet as part of the Web 2.0 phenomenon. Systems such as Flickr, Del.icio.us, Technorati, Connotea and LibraryThing, provide a community-driven approach to classifying information and resources on the Web, so that they can be browsed, discovered and re-used. Although social tagging sites provide simple, user-relevant tags, there are issues associated with the quality of the metadata and the scalability compared with conventional indexing systems. In this paper we propose a hybrid approach that enables authoritative metadata generated by traditional cataloguing methods to be merged with community annotations and tags. The HarvANA (Harvesting and Aggregating Networked Annotations) system uses a standardized but extensible RDF model for representing the annotations/tags and OAI-PMH to harvest the annotations/tags from distributed community servers. The harvested annotations are aggregated with the authoritative metadata in a centralized metadata store. This streamlined, interoperable, scalable approach enables libraries, archives and repositories to leverage community enthusiasm for tagging and annotation, augment their metadata and enhance their discovery services. This paper describes the HarvANA system and its evaluation through a collaborative testbed with the National Library of Australia using architectural images from PictureAustralia.
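   The harvest-then-aggregate flow described above can be sketched roughly as follows. The `verb`, `metadataPrefix` and `from` request parameters are part of OAI-PMH itself; the server URL, the record layout, the `community_tags` field and both function names are assumptions made for illustration and are not taken from HarvANA.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL for one community server."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


def merge_tags(authoritative, harvested):
    """Attach harvested community tags to a copy of each authoritative record."""
    merged = {}
    for record_id, metadata in authoritative.items():
        tags = sorted(set(harvested.get(record_id, [])))
        # The authoritative fields are copied unchanged; tags live in their own field.
        merged[record_id] = {**metadata, "community_tags": tags}
    return merged
```

   Keeping the community tags in a separate field, rather than mixing them into the catalogued fields, is one way to preserve the distinction between authoritative and user-contributed metadata.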
Semi automated metadata extraction for preprints archives 157-166
  Emma Tonkin; Henk L. Muller
In this paper we present a system called paperBase that aids users in entering metadata for preprints. PaperBase extracts metadata from the preprint. Using a Dublin-Core based REST API, third-party repository software populates a web form that the user can then proofread and complete. PaperBase also predicts likely key words for the preprints, based on a controlled vocabulary of keywords that the archive uses and a Bayesian classifier.
   We have tested the system on 12 individuals, and measured the time that it took them to enter data, and the accuracy of the entered metadata. We find that our system appears to be faster than manual entry, but a larger sample needs to be tested before it can be deemed statistically significant. All but two participants perceived it to be faster. Some metadata, in particular the title of preprints, contains significantly fewer mistakes when entered automatically; even though the automatic system is not perfect, people tend to correct mistakes that paperBase makes, but would leave their own mistakes in place.
A metadata generation system for scanned scientific volumes 167-176
  Xiaonan Lu; Brewster Kahle; James Z. Wang; C. Lee Giles
Large scale digitization projects have been conducted at digital libraries to preserve cultural artifacts and to provide permanent access. The increasing amount of digitized resources, including scanned books and scientific publications, requires development of tools and methods that will efficiently analyze and manage large collections of digitized resources. In this work, we tackle the problem of extracting metadata from scanned volumes of journals. Our goal is to extract information describing internal structures and content of scanned volumes, which is necessary for providing effective content access functionalities to digital library users. We propose methods for automatically generating volume level, issue level, and article level metadata based on format and text features extracted from OCRed text. We show the performance of our system on scanned bound historical documents nearly two centuries old. We have developed the system and integrated it into an operational digital library, the Internet Archive, for real-world usage.

Expanding search

Exploring a digital library through key ideas 177-186
  Bill N. Schilit; Okan Kolak
Key Ideas is a technique for exploring digital libraries by navigating passages that repeat across multiple books. From these popular passages emerge quotations that authors have copied from book to book because they capture an idea particularly well: Jefferson on liberty; Stanton on women's rights; and Gibson on cyberpunk. We augment Popular Passages by extracting key terms from the surrounding context and computing sets of related key terms. We then create an interaction model where readers fluidly explore the library by viewing popular quotations on a particular key term, and follow links to quotations on related key terms. In this paper we describe our vision and motivation for Key Ideas, present an implementation running over a massive, real-world digital library consisting of over a million scanned books, and describe some of the technical and design challenges. The principal contribution of this paper is the interaction model and prototype system for browsing digital libraries of books using key terms extracted from the aggregate context of popularly quoted passages.
Math information retrieval: user requirements and prototype implementation 187-196
  Jin Zhao; Min-Yen Kan; Yin Leng Theng
We report on the user requirements study and preliminary implementation phases in creating a digital library that indexes and retrieves educational materials on math. We first review the current approaches and resources for math retrieval, then report on the interviews of a small group of potential users to properly ascertain their needs. While preliminary, the results suggest that meta-search and resource categorization are two basic requirements for a math search engine. In addition, we implement a prototype categorization system and show that the generic features work well in identifying the math contents from the webpage but perform less well at categorizing them. We discuss our long term goals, where we plan to investigate how math expressions and text search may be best integrated.
A competitive environment for exploratory query expansion 197-200
  David Milne; David M. Nichols; Ian H. Witten
Most information workers query digital libraries many times a day. Yet people have little opportunity to hone their skills in a controlled environment, or compare their performance with others in an objective way. Conversely, although search engine logs record how users evolve queries, they lack crucial information about the user's intent. This paper describes an environment for exploratory query expansion that pits users against each other and lets them compete, and practice, in their own time and on their own workstation. The system captures query evolution behavior on predetermined information-seeking tasks. It is publicly available, and the code is open source so that others can set up their own competitive environments.

Video by the people, for the people

How people find videos 201-210
  Sally Jo Cunningham; David M. Nichols
At present very little is known about how people locate and view videos 'in the wild'. This study draws a rich picture of everyday video seeking strategies and video information needs, based on an ethnographic study of New Zealand university students. These insights into the participants' activities and motivations suggest potentially useful facilities for a video digital library.
Selection and context scoping for digital video collections: an investigation of YouTube and blogs 211-220
  Robert G. Capra; Christopher A. Lee; Gary Marchionini; Terrell Russell; Chirag Shah; Fred Stutzman
Digital curators are faced with decisions about what part of the ever-growing, ever-evolving space of digital information to collect and preserve. The recent explosion of web video on sites such as YouTube presents curators with an even greater challenge -- how to sort through and filter a large amount of information to find, assess and ultimately preserve important, relevant, and interesting video. In this paper, we describe research conducted to help inform digital curation of on-line video. Since May 2007, we have been monitoring the results of 57 queries on YouTube related to the 2008 U.S. presidential election. We report results comparing these data to blogs that point to candidate videos on YouTube and discuss the effects of query-based harvesting as a collection development strategy.
A study of awareness in multimedia search BIBAFull-Text 221-230
  Robert Villa; Nicholas Gildea; Joemon M. Jose
Awareness of another's activity is an important aspect of facilitating collaboration between users, enabling an "understanding of the activities of others"[1]. Techniques such as collaborative filtering enable a form of asynchronous awareness, providing recommendations generated from the past activity of a community of users. In this paper we investigate the role of awareness and its effect on search behavior in collaborative multimedia retrieval. We focus on the scenario where two users are searching at the same time on the same task, and via the interface, can see the activity of the other user. The main research question asks: does awareness of another searcher aid a user when carrying out a multimedia search session?
   To encourage awareness, an experimental study was designed where two users were asked to find as many relevant video shots as possible under different awareness conditions. These were individual search (no awareness of each other), mutual awareness (where both users could see each other's search screen), and unbalanced awareness (where one user is able to see the other's screen, but not vice-versa). Twelve pairs of users were recruited, and the four worst performing TRECVID 2006 search topics were used as search tasks, under four different awareness conditions. We present the results of this study, followed by a discussion of the implications for multimedia digital library systems.

Best paper session

Towards usage-based impact metrics: first results from the MESUR project. BIBAFull-Text 231-240
  Johan Bollen; Herbert Van de Sompel; Marko A. Rodriguez
Scholarly usage data holds the potential to be used as a tool to study the dynamics of scholarship in real time, and to form the basis for the definition of novel metrics of scholarly impact. However, the formal groundwork to reliably and validly exploit usage data is lacking, and the exact nature, meaning and applicability of usage-based metrics is poorly understood. The MESUR project funded by the Andrew W. Mellon Foundation constitutes a systematic effort to define, validate and cross-validate a range of usage-based metrics of scholarly impact. MESUR has collected nearly 1 billion usage events as well as all associated bibliographic and citation data from significant publishers, aggregators and institutional consortia to construct a large-scale usage data reference set. This paper describes some major challenges related to aggregating and processing usage data, and discusses preliminary results obtained from analyzing the MESUR reference data set. The results confirm the intrinsic value of scholarly usage data, and support the feasibility of reliable and valid usage-based metrics of scholarly impact.
Evaluating the contributions of video representation for a life oral history collection BIBAFull-Text 241-250
  Michael G. Christel; Michael H. Frisch
A digital video library of over 900 hours of video and 18000 stories from The HistoryMakers is used to investigate the role of motion video for users of recorded life oral histories. Stories in the library are presented in one of two ways in two within-subjects experiments: either as audio accompanied by a single still photographic image per story, or as the same audio within a motion video of the interviewee speaking. Twenty-four participants given a treasure-hunt fact-finding task, i.e., very directed search, showed no significant preference for either the still or video treatment, and no difference in task performance. Fourteen participants in a second study worked on an exploratory task in the same within-subjects experimental framework, and showed a significant preference for video. For exploratory work, video has a positive effect on user satisfaction. Implications for use of video in collecting and accessing recorded life oral histories, in student assignments and more generally, are discussed, along with reflections on long term user studies to complement the ones presented here.
From writing and analysis to the repository: taking the scholars' perspective on scholarly archiving BIBAFull-Text 251-260
  Catherine C. Marshall
This paper reports the results of a qualitative field study of the scholarly writing, collaboration, information management, and long-term archiving practices of researchers in five related subdisciplines. The study focuses on the kinds of artifacts the researchers create in the process of writing a paper, how they exchange and store materials over the short term, how they handle references and bibliographic resources, and the strategies they use to guarantee the long term safety of their scholarly materials. The findings reveal: (1) the adoption of a new CIM infrastructure relies crucially on whether it compares favorably to email along six critical dimensions; (2) personal scholarly archives should be maintained as a side-effect of collaboration and the role of ancillary material such as datasets remains to be worked out; and (3) it is vital to consider agency when we talk about depositing new types of scholarly materials into disciplinary repositories.

Keynote

Scientific publishing in the era of Petabyte data BIBAFull-Text 261-262
  Alexander S. Szalay
Today's scientific datasets are growing into Petabytes. A similar transition is happening in industry and society. Web search companies have to deal routinely with tens of Petabytes; a substantial fraction of the world's computers go into data warehouses of the Big 5. Scientists, librarians and publishers are just beginning to grasp the magnitude and multi-faceted nature of the problems facing us. Every step of the usual scientific process will need to change, and change soon. Science in the 21st century will require a different set of skills than previously; more computational and algorithmic thinking and more interdisciplinary interaction will be hallmarks of a successful scientist. The talk will present the challenges and trends in this 'brave new world'.

Content from documents

User-assisted ink-bleed correction for handwritten documents BIBAFull-Text 263-271
  Yi Huang; Michael S. Brown
We describe a user-assisted framework for correcting ink-bleed in old handwritten documents housed at the National Archives of Singapore (NAS). Our approach departs from traditional correction techniques that strive for full automation. Fully-automated approaches make assumptions about ink-bleed characteristics that are not valid for all inputs. Furthermore, fully-automated approaches often have to set algorithmic parameters that have no meaning for the end-user. In our system, the user needs only to provide simple examples of ink-bleed, foreground ink, and background. These training examples are used to classify the remaining pixels in the document to produce a computer-generated result that is equal to or better than existing fully-automated approaches.
   To offer a complete system, we also provide tools that allow any errors in the computer-generated results to be quickly "cleaned up" by the user. The initial training markup, together with the computer-generated results, and manual edits are all recorded with the final output, allowing subsequent viewers to see how a corrected document was created and to make changes or updates. While an ongoing project, our feedback from the NAS staff has been overwhelmingly positive that this user-assisted framework is a practical way to address the ink-bleed problem.
CRF-based authors' name tagging for scanned documents BIBAFull-Text 272-275
  Manabu Ohta; Atsuhiro Takasu
Authors' names are a critical bibliographic element when searching or browsing academic articles stored in digital libraries. Therefore, those creating metadata for digital libraries would appreciate an automatic method to extract such bibliographic data from printed documents. In this paper, we describe an automatic author name tagger for academic articles scanned with optical character recognition (OCR) mark-up. The method uses conditional random fields (CRF) for labeling the unsegmented character strings in authors' blocks as those of either an author or a delimiter. We applied the tagger to Japanese academic articles. The results of the experiments showed that it correctly labeled more than 99% of the author name strings, which compares favorably with the under 96% correct rate of our previous tagger based on a hidden Markov model (HMM).
Segregating and extracting overlapping data points in two-dimensional plots BIBAFull-Text 276-279
  William Brouwer; Saurabh Kataria; Sujatha Das; Prasenjit Mitra; C. Lee Giles
Most search engines index the textual content of documents in digital libraries. However, scholarly articles frequently report important findings in figures for visual impact and the contents of these figures are not indexed. These contents are often invaluable to the researcher in various fields, for the purposes of direct comparison with their own work. Therefore, searching for figures and extracting figure data are important problems. To the best of our knowledge, there exists no tool to automatically extract data from figures in digital documents. If we can extract data from these images automatically and store them in a database, an end-user can query and combine data from multiple digital documents simultaneously and efficiently. We propose a framework based on image analysis and machine learning to extract information from 2-D plot images and store them in a database. The proposed algorithm identifies a 2-D plot and extracts the axis labels, legend and the data points from the 2-D plot. We also segregate overlapping shapes that correspond to different data points. We demonstrate performance of individual algorithms, using a combination of generated and real-life images.
A simple method for citation metadata extraction using hidden Markov models BIBAFull-Text 280-284
  Erik Hetzner
This paper describes a simple method for extracting metadata fields from citations using hidden Markov models. The method is easy to implement and can achieve levels of precision and recall for heterogeneous citations comparable to or greater than other HMM-based methods. The method consists largely of string manipulation and otherwise depends only on an implementation of the Viterbi algorithm, which is widely available, and so can be implemented by diverse digital library systems.

Data from the web

Identification of time-varying objects on the web BIBAFull-Text 285-294
  Satoshi Oyama; Kenichi Shirasuna; Katsumi Tanaka
We have developed a method for determining whether data found on the Web are for the same or different objects that takes into account the possibility of changes in their attribute values over time. Specifically, we estimate the probability that observed data were generated for the same object that has undergone changes in its attribute values over time and the probability that the data are for different objects, and we define similarities between observed data using these probabilities. By giving a specific form to the distributions of time-varying attributes, we can calculate the similarity between given data and identify objects by using agglomerative clustering on the basis of the similarity. Experiments in which we compared identification accuracies between our proposed method and a method that regards all attribute values as constant showed that the proposed method improves the precision and recall of object identification.
Using web information for creating publication venue authority files BIBAFull-Text 295-304
  Denilson Alves Pereira; Berthier Ribeiro-Neto; Nivio Ziviani; Alberto H. F. Laender
Citations to publication venues in the form of journal, conference and workshop contain spelling variants, acronyms, abbreviated forms and misspellings, all of which make it more difficult to retrieve the item of interest. The task of discovering and reconciling these variant forms of bibliographic references is known as authority work. The key goal is to create the so-called authority files, which maintain, for any given bibliographic item, a list of variant labels (i.e., variant strings) used as a reference to it. In this paper we propose to use information available on the Web to create high quality publication venue authority files. Our idea is to recognize (and extract) references to publication venues in the text snippets of the answers returned by a search engine. References to the same publication venue are then reconciled in an authority file. Each entry in this file is composed of a canonical name for the venue, an acronym, the venue type (i.e., journal, conference, or workshop), and a mapping to various forms of writing its name in bibliographic citations. Experimental results show that our Web-based method for creating authority files is superior to previous work based on straight string matching techniques. Considering the average precision in finding correct venue canonical names, we observe gains up to 41.7%.
Application of Kalman filters to identify unexpected change in blogs BIBAFull-Text 305-312
  Paul Logasa Bogen II; Joshua Johnston; Unmil P. Karadkar; Richard Furuta; Frank Shipman
Information on the Internet, especially blog content, changes rapidly. Users of information collections, such as the blogs hosted by technorati.com, have little, if any, control over the content or frequency of these changes. However, it is important for users to be able to monitor content for deviations in the expected pattern of change. If a user is interested in political blogs and a blog switches subjects to a literary review blog, the user would want to know of this change in behavior. Since pages may change too frequently for manual inspection for "unwanted" changes, an automated approach is wanted. In this paper, we explore methods for identifying unexpected change by using Kalman filters to model blog behavior over time. Using this model, we examine the history of several blogs and determine methods for flagging the significance of a blog's change from one time step to the next. We are able to predict large deviations in blog content, and allow user-defined sensitivity parameters to tune a statistical threshold of significance for deviation from expectation.

Platforms and users in digital libraries

NCore: architecture and implementation of a flexible, collaborative digital library BIBAFull-Text 313-322
  Dean B. Krafft; Aaron Birkland; Ellen J. Cramer
NCore is an open source architecture and software platform for creating flexible, collaborative digital libraries. NCore was developed by the National Science Digital Library (NSDL) project, and it serves as the central technical infrastructure for NSDL. NCore consists of a central Fedora-based digital repository, a specific data model, an API, and a set of backend services and frontend tools that create a new model for collaborative, contributory digital libraries. This paper describes NCore, presents and analyzes its architecture, tools and services; and reports on the experience of NSDL in building and operating a major digital library on it over the past year and the experience of the Digital Library for Earth Systems Education in porting their existing digital library and tools to the NCore platform.
Acceptance and use of electronic library services in Ugandan universities BIBAFull-Text 323-332
  Prisca K. G. Tibenderana; Patrick J. Ogao
University libraries in Developing Countries (DCs), hampered by developmental problems, find it hard to provide electronic services. Donor communities have come in to bridge this technology gap by providing funds to university libraries for information technology infrastructure, enabling these university libraries to provide electronic library services to patrons. However, for these services to be utilized effectively, library end-users must accept and use them. To investigate this process in Uganda, this study modifies "The Unified Theory of Acceptance and Use of Technology" (UTAUT) by replacing "effort expectancy" and "voluntariness" with "relevancy", "awareness" and "benefits" factors. In so doing, we developed the Service Oriented UTAUT (SOUTAUT) model whose dependent constructs predict 133% of the variances in user acceptance and use of e-library services. The study revealed that relevancy moderated by awareness plays a major factor in acceptance and use of e-library services in DCs.
Portable digital libraries on an iPod BIBAFull-Text 333-336
  David Bainbridge; Steve Jones; Sam McIntosh; Matt Jones; Ian H. Witten
This paper describes the facilities we built to run a self-contained digital library on an iPod. The digital library software used was the open source package Greenstone, and the paper highlights the technical problems that were encountered and solved. It attempts to convey a feeling for the kind of issues that must be faced when adapting standard DL software for non-standard, leading-edge devices.
Annotated program examples as first class objects in an educational digital library BIBAFull-Text 337-340
  Peter Brusilovsky; I-Han Hsiao; Michael V. Yudelson
This paper analyzes problems encountered by our team while creating an educational digital library of program examples. We present approaches to resolving these problems, and evaluations of the suggested approaches.

Beyond text

Annotating historical archives of images BIBAFull-Text 341-350
  Xiaoyue Wang; Lexiang Ye; Eamonn Keogh; Christian Shelton
Recent initiatives like the Million Book Project and Google Print Library Project have already archived several million books in digital format, and within a few years a significant fraction of the world's books will be online. While the majority of the data will naturally be text, there will also be tens of millions of pages of images. Many of these images will defy automatic annotation for the foreseeable future, but a considerable fraction of the images may be amenable to automatic annotation by algorithms that can link the historical image with a modern contemporary, with its attendant metatags. In order to perform this linking we must have a suitable distance measure which appropriately combines the relevant features of shape, color, texture and text. However, the best combination of these features will vary from application to application and even from one manuscript to another. In this work we propose a simple technique to learn the distance measure by perturbing the training set in a principled way. We show the utility of our ideas on archives of manuscripts containing images from natural history and cultural artifacts.
sLab: smart labeling of family photos through an interactive interface BIBAFull-Text 351-354
  Ehsan Fazl-Ersi; I. Scott MacKenzie; John K. Tsotsos
A novel technique for semi-automatic photo annotation is proposed and evaluated. The technique, sLab, uses face processing algorithms and a simplified user interface for labeling family photos. A user study compared our system with two others. One was Adobe Photoshop Elements. The other was an in-house implementation of a face clustering interface recently proposed in the research community. Nine participants performed an annotation task with each system on faces extracted from a set of 150 images from their own family photo albums. As the faces were all well known to participants, accuracy was near perfect with all three systems. On annotation time, sLab was 25% faster than Photoshop Elements and 16% faster than the face clustering interface.
Autotagging to improve text search for 3d models BIBAFull-Text 355-358
  Corey Goldfeder; Peter Allen
Text search on libraries of 3D models has traditionally worked poorly, as text annotations on 3D models are often unreliable or incomplete. We attempt to improve the recall of text search by automatically assigning appropriate tags to models. Our algorithm finds relevant tags by appealing to a large corpus of partially labeled example models, which does not have to be preclassified or otherwise prepared. For this purpose we use a copy of Google 3DWarehouse, a library of user contributed models which is publicly available on the Internet. Given a model to tag, we find geometrically similar models in the corpus, based on distances in a reduced dimensional space derived from Zernike descriptors. The labels of these neighbors are used as tag candidates for the model with probabilities proportional to the degree of geometric similarity. We show experimentally that text based search for 3D models using our computed tags can approach the quality of geometry based search. Finally, we describe our 3D model search engine that uses this algorithm.
Slide image retrieval: a preliminary study BIBAFull-Text 359-362
  Guo Min Liew; Min-Yen Kan
We consider the task of automatic slide image retrieval, in which slide images are ranked for relevance against a textual query. Our implemented system, SLIDIR, caters specifically for this task using features designed for synthetic images embedded within slide presentations. We show promising results in both the ranking and binary relevance tasks and analyze the contribution of different features to task performance.
Perception-oriented online news extraction BIBAFull-Text 363-366
  Jinlin Chen; Keli Xiao
A novel online news extraction approach based on human perception is presented in this paper. The approach simulates how a human perceives and identifies online news content. It first detects news areas based on content function, space continuity, and formatting continuity of news information. It further identifies detailed news content based on the position, format, and semantics of detected news areas. Experimental results show that our approach achieves much better performance (on average more than 99% in terms of F1 value) compared to previous approaches such as Tree Edit Distance and Visual Wrapper based approaches. Furthermore, our approach does not assume the existence of Web templates in the tested Web pages, as required by the Tree Edit Distance based approach, nor does it need training sets, as required by the Visual Wrapper based approach. The success of our approach demonstrates the strength of the perception-oriented Web information extraction methodology and represents a promising approach for automatic information extraction from sources with presentation design for humans.

Archiving and web tools for digital libraries

Plato: a service oriented decision support system for preservation planning BIBAFull-Text 367-370
  Christoph Becker; Hannes Kulovits; Andreas Rauber; Hans Hofman
The fast changes of technologies in today's information landscape have considerably shortened the lifespan of digital objects. Digital preservation has become a pressing challenge. Different strategies such as migration and emulation have been proposed; however, the decision for a specific tool e.g. for format migration or an emulator is very complex. The process of evaluating potential solutions against specific requirements and building a plan for preserving a given set of objects is called preservation planning. So far, it is a mainly manual, sometimes ad-hoc process with little or no tool support. This paper presents a service-oriented architecture and decision support tool that implements a solid preservation planning process and integrates services for content characterisation, preservation action and automatic object comparison to provide maximum support for preservation planning endeavours.
Usage analysis of a public website reconstruction tool BIBAFull-Text 371-374
  Frank McCown; Michael L. Nelson
The Web is increasingly the medium by which information is published today, but due to its ephemeral nature, web pages and sometimes entire websites are often "lost" due to server crashes, viruses, hackers, run-ins with the law, bankruptcy and loss of interest. When a website is lost and backups are unavailable, an individual or third party can use Warrick to recover the website from several search engine caches and web archives (the Web Infrastructure). In this short paper, we present Warrick usage data obtained from Brass, a queueing system for Warrick hosted at Old Dominion University and made available to the public for free. Over the last six months, 520 individuals have reconstructed more than 700 websites with 800K resources from the Web Infrastructure. Sixty-two percent of the static web pages were recovered, and 41% of all website resources were recovered. The Internet Archive was the largest contributor of recovered resources (78%).
Using web metrics to analyze digital libraries BIBAFull-Text 375-384
  Michael Khoo; Joe Pagano; Anne L. Washington; Mimi Recker; Bart Palmer; Robert A. Donahue
This paper discusses the use of web metrics tools at four digital libraries, the Instructional Architect, the Library of Congress, the National Science Digital Library, and WGBH Teachers' Domain. We describe some of the issues involved in using web metrics to report on ongoing web site performance. We also describe how web metrics can be used for focused data mining, using session length metrics as our example; and here, we recommend that session length metrics, which were developed to track e-commerce, need to be carefully considered when they are applied in non-e-commerce settings, such as digital libraries. We conclude by discussing some of the current limitations and possibilities of using web metrics to analyze and evaluate digital library use and impact.
A lightweight metadata quality tool BIBAFull-Text 385-388
  David M. Nichols; Chu-Hsiang Chan; David Bainbridge; Dana McKay; Michael B. Twidale
We describe a Web-based metadata quality tool that provides statistical descriptions and visualisations of Dublin Core metadata harvested via the OAI protocol. The lightweight nature of development allows it to be used to gather contextualized requirements, and some initial user feedback is discussed.

Interfaces and navigation

Improving navigation interaction in digital documents BIBAFull-Text 389-392
  George Buchanan; Tom Owen
This paper investigates novel interactions for supporting within-document navigation. We focus on one specific interaction: the following of figure references. Through this interaction we illuminate factors also found in other forms of navigation. Three alternative interactions for supporting figure navigation are described and evaluated through a user study. Experimentation proves the advantages of our interaction design, and the degree to which the interaction of existing reader software can be improved.
Keeping narratives of a desktop to enhance continuity of on-going tasks BIBAFull-Text 393-396
  Youngjoo Park; Richard Furuta
We describe a novel interface by which a user can browse, bookmark and retrieve previously used working environments, i.e., desktop status, enabling the retention of the history of use of various sets of information. Significant tasks often require reuse of (sets of) information that was used earlier. Particularly, if a task involves extended interaction, then the task's environment has been through a lot of changes and can get complex. Under the current prevailing desktop-based computing environment, after an interruption to the task users can gain little assistance to get back to the context that they previously worked on. A user thus encounters increased discontinuity in continuing extended tasks.
Note-taking, selecting, and choice: designing interfaces that encourage smaller selections BIBAFull-Text 397-406
  Aaron Bauer; Kenneth R. Koedinger
Our research develops note-taking applications for educational environments. Previous studies found that while copy-pasting notes can be more efficient than typing, for some users it reduces attention and learning. This paper presents two studies aimed at designing and evaluating interfaces that encourage focusing. While we were able to produce interfaces that increased desirable behaviors and improved satisfaction, the new interfaces did not improve learning. We suggest design recommendations derived from these studies, and describe a "selecting-to-read" behavior we encountered, which has implications for the design of reading and note-taking applications.
A Fedora librarian interface BIBAFull-Text 407-416
  David Bainbridge; Ian H. Witten
The Fedora content management system embodies a powerful and flexible digital object model. This paper describes a new open-source software front-end that enables end-user librarians to transfer documents and metadata in a variety of formats into a Fedora repository. The main graphical facility that Fedora itself provides for this task operates on one document at a time and is not librarian-friendly. A batch driven alternative is possible, but requires documents to be converted beforehand into the XML format used by the repository, which calls for programming skills. In contrast, our new scheme allows arbitrary collections of documents residing on the user's computer (or the web at large) to be ingested into a Fedora repository in one operation, without a need for programming expertise. Provision is also made for editing existing documents and metadata, and adding new ones. The documents can be in a wide variety of different formats, and the user interface is suitable for practicing librarians. The design capitalizes on our experience in building the Greenstone librarian interface and participating in dozens of workshops with librarians worldwide.

Demonstrations

Broadening participation in computing with the k-gray engineering pathway digital library BIBAFull-Text 417
  Alice Agogino; Michael Smith
This demonstration presents a digital library for educators at all levels to easily identify, select, and use educational resources that have been shown through research to be effective for increasing the participation of women and under-represented minorities in information technology. The library consists of practices from the Broadening Participation in Computing (BPC) program in NSF CISE and elsewhere that have been researched or evaluated for their promise or effectiveness to recruit, retain, or advance under-represented groups in IT fields of study or research careers. We do not develop the practices, but instead describe them and make them easy for users to find and evaluate in a central location.
Increasing the visibility of web-based information systems via client-side mash-ups BIBKFull-Text 418
  Godmar Back; Annette Bailey
Keywords: digital library, hypertext, ILS, mashups, OPAC
Running Greenstone on an iPod BIBAFull-Text 419
  David Bainbridge; Steve Jones; Sam McIntosh; Matt Jones; Ian H. Witten
The open source digital library software Greenstone is demonstrated running on an iPod. The standalone configuration supports browsing, searching and displaying documents in a range of media formats. Plugged in to a host computer (Mac, Linux, or Windows), the exact same facilities are made available to the world through a built-in web server.
The relation browser tool for faceted exploratory search BIBAFull-Text 420
  Robert G. Capra; Gary Marchionini
The Relation Browser (RB) is a tool developed by the Interaction Design Lab at the University of North Carolina at Chapel Hill for understanding relationships between items in a collection and for exploring an information space (e.g., a set of documents or webpages). The RB has been through a number of major design revisions. At JCDL 2007, we reported on two studies of information seeking that we conducted using the RB++ version of the Relation Browser software. Based on the results of those studies, we developed a set of design changes and implemented these in a new version called RB07. We will demonstrate the new RB07 interface and describe the rationale for our design changes.
An application for semantic markup of biodiversity documents BIBAFull-Text 421
  Hong Cui
We would like to demonstrate a machine-learning based semantic markup system that may be used to reformat free-text biodiversity documents in XML format for digital libraries. We named the system MARTT II. It is built on the MARTT engine described in [1], but with new components, for example a parallel markup engine using the unsupervised learning algorithm described in [2]. The double Is in the name stand for Intuitive Interaction, reflecting our goal of making the system truly easy to use. They also mean the system supports two different automated markup engines, allowing the user to choose either one and to compare the two.
Dynamic classification explorer for music digital libraries BIBAFull-Text 422
  J. Stephen Downie; Kris West; Xiao Hu
This paper outlines a dynamic classification explorer for music digital library users and researchers. The system provides multiple simultaneous classification visualizations and synchronized audio.
Novel interface services for bioacoustic digital libraries BIBAFull-Text 423
  J. Stephen Downie; David K. Tcheng; Xin Xiang
This paper introduces the CARDINAL (Computer Assisted Recognition and Discovery in Natural Acoustic Landscapes) interface system for use in Bioacoustic Digital Libraries (BADL).
Direct: applying the DIKW hierarchy to large-scale evaluation campaigns BIBAFull-Text 424
  Marco Dussin; Nicola Ferro
We describe the effort of designing and developing a digital library system able to manage the different types of information resources produced during a large-scale evaluation campaign and to support its different stages. In this context, we present DIRECT, the system which has been adopted to manage the CLEF evaluation campaigns since 2005.
Semtinel: interactive supervision of automatic indexing BIBKFull-Text 425
  Kai Eckert; Heiner Stuckenschmidt; Magnus Pfeffer
Keywords: thesaurus-based retrieval, visualization
Dilight: providing flexible and knowledge rich access to support digital library learning BIBKFull-Text 426
  Daqing He; Ming Mao; Yefei Peng; Jongdo Park
Keywords: digital libraries, dilight, e-learning, information access
Computational linguistics for metadata building BIBAFull-Text 427
  Judith L. Klavans; Carolyn Sheffield; Jimmy Lin; Tandeep Sidhu
In this paper, we describe a downloadable text-mining tool for enhancing subject access to image collections in digital libraries.
Plato: a preservation planning tool BIBAFull-Text 428
  Hannes Kulovits; Christoph Becker; Michael Kraxner; Florian Motlik; Kevin Stadler; Andreas Rauber
Creating a concrete plan for preserving an institution's collection of digital objects requires the evaluation of available solutions against clearly defined and measurable criteria. Preservation planning aids in this decision making process to find the best preservation strategy considering the institution's requirements, the planning context and possible actions applicable to the objects contained in the repository. Performed manually, this evaluation of possible solutions against requirements takes a good deal of time and effort. In this demonstration, we present Plato, an interactive software tool aimed at creating preservation plans.
InPhO: a system for collaboratively populating and extending a dynamic ontology BIBAFull-Text 429
  Mathias Niepert; Cameron Buckner; Jaimie Murdock; Colin Allen
InPhO is a system that combines statistical text processing, information extraction, human expert feedback, and logic programming to populate and extend a dynamic ontology for the field of philosophy. Integrated in the editorial workflow of the Stanford Encyclopedia of Philosophy (SEP), it will provide important metadata features such as automated generation of cross-references, semantic search, and ontology driven conceptual navigation.
The VocalSearch music search engine BIBAFull-Text 430
  Bryan Pardo; David Little; Rui Jiang; Hagai Livni; Jinyu Han
The VocalSearch system is a music search engine developed at Northwestern University and available on the internet (vocalsearch.org). This system lets the user query for the desired song in a number of ways: sung queries, queries entered as music notation, and text-based lyrics search. Users are also able to contribute songs to the system, making them searchable for future users. The result is a flexible system that lets the user find the song using their preferred modality (singing, text, music notation). This demonstration lets users try out the VocalSearch system.
DIGMAP: a service for searching and browsing old maps BIBAFull-Text 431
  Gilberto Pedrosa; João Luzio; Hugo Manguinhas; Bruno Martins
DIGMAP aims to become the main international resource discovery service for digitized old maps existing in libraries. The service reuses metadata from European national libraries and other relevant third party metadata sources. The gathered metadata is enhanced locally with geographical indexing and with record linking/clustering, leveraging geographic gazetteers and authority files. When available, the images of the maps are also processed to extract potentially relevant features. This made it possible to develop a rich integrated environment for searching and browsing services with four perspectives: image features, textual, geographic and temporal information.
See the world with ThemExplorer BIBAFull-Text 432
  Adrian Popescu; Sofiane Souidi; Pierre-Alain Moëllic
We demonstrate ThemExplorer, a mash-up for geographic image retrieval. The application combines Geonames, a geographic thesaurus; TagMaps, a tool for visualizing tags on a map; PIRIA, a visual search engine; and pictures collected from Flickr and Google Images. The user can query ThemExplorer using both keywords and example images.
TubeKit: a query-based YouTube crawling toolkit BIBKFull-Text 433
  Chirag Shah
Keywords: contextual information, web crawler, YouTube
Bringing lives to light: lives and event representation in temporal and geographic context BIBAFull-Text 434
  Ryan Shaw; Ray R. Larson
Our demonstration system consists of a set of tools for identifying life events in biographical texts and linking them to relevant contextual resources.
Data visualization applications in virtual globe software BIBAFull-Text 435-436
  Situ Studio; William Cotton; Nate Hill; Alicia Gibb
Focusing on the intersection of visual data mapping and virtual globe software, this application is part digital library and part analytical tool. It combines data sets into a collaborative database and visualizes the information through Google Earth overlays. This user-centered interface makes previously hard-to-use public information (e.g. census data) accessible and easily interpretable.
   We are presenting an interactive application named GeoDatum that allows users to upload their databases and display this information through a number of visualization tools, either individually or comparatively. The software is an open source web application with multiple goals. Primarily, it is a central repository for both geographic boundaries and the data related to those boundaries. In addition, it gives users the ability to create dynamic visualizations viewable in Google Earth's extensible KML environment, complete with full 3D renderings and animations. The trade-off is that anyone who wants to use the application to generate visualizations will leave their data for public use.
   The software's core functionality is to allow users to import their own Shapefiles as well as CSVs containing data about the geographic areas. Shapefiles are an industry standard GIS format supported by numerous software applications including ArcGIS. This software will convert this information into KML files and Google Earth overlays. While it can display publicly available data sets, it also allows a user to include their own information, thus making it a useful internal analytic tool for private interests as well.
   We will present a case study done with the Brooklyn Public Library that utilizes this tool in the service of a project on urban planning and analysis.
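   The conversion step described above (tabular data about geographic areas turned into KML for Google Earth) can be illustrated with a minimal sketch. It is not GeoDatum's code: it assumes a hypothetical CSV layout with name, lat, lon and value columns, uses made-up sample rows, and emits point placemarks rather than the Shapefile boundaries the abstract mentions, using only the Python standard library.

```python
import csv
import io
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
# Serialize KML elements in the default namespace (no prefix).
ET.register_namespace("", KML_NS)


def csv_to_kml(csv_text: str) -> str:
    """Turn CSV rows (name, lat, lon, value: an assumed layout) into KML placemarks."""
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for row in csv.DictReader(io.StringIO(csv_text)):
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = row["name"]
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = f"value: {row['value']}"
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML writes coordinates as longitude,latitude (in that order).
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{row['lon']},{row['lat']}"
    return ET.tostring(kml, encoding="unicode")


# Hypothetical sample data, not taken from the case study.
SAMPLE_CSV = "name,lat,lon,value\nArea A,40.68,-73.94,120\nArea B,40.65,-73.95,85\n"
kml_text = csv_to_kml(SAMPLE_CSV)
```

   Saving kml_text to a .kml file should give something Google Earth can open; reading real Shapefile polygons would need a third-party reader, which this sketch deliberately leaves out.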

Workshops

MetaArchive/LOCKSS distributed preservation networks BIBAFull-Text 438
  Martin Halbert; Katherine Skinner; Tyler Walters
The Workshop will provide information and training for institutions seeking to build or join LOCKSS-based distributed digital preservation networks. Such Private LOCKSS Networks (PLNs) enable groups of institutions to establish collaborative partnerships to securely preserve collections. Instructors will address the technical implementation as well as important organizational and legal elements of distributed digital preservation. Attendees will gain an understanding of how to produce and manage a private LOCKSS network.
Collaborative information retrieval BIBAFull-Text 440
  Jeremy Pickens; Gene Golovchinsky; Meredith Ringel Morris
The goal of the workshop is to bring together researchers interested in various aspects of small-team collaborative search to share ideas, to stimulate research in the area, and to increase the visibility of this emerging area. We expect to identify promising directions for further exploration and to establish collaborative links among research groups.
Education for digital stewardship: librarians, archivists or curators? BIBAFull-Text 441-442
  Joyce Ray
The large-scale digital repositories that are emerging today and expected to increase exponentially during this century will require information managers with the skills to archive, preserve, and organize massive amounts of data for use and re-use by a variety of interdisciplinary scholarly communities over time. Where will these managers come from, and what skills will they need? This workshop, organized by the US Institute of Museum and Library Services (IMLS) will address these questions through presentations and discussion among educators and other interested participants. IMLS has invested more than $100 million since 2003 in the education of librarians, archivists and data curators through both formal and continuing education programs. This timely funding has enabled graduate schools of library and information science to reshape their curricula to address the emerging need for digital data managers. Are we prepared to meet the challenge?

Posters

Interface effects on digital library credibility judgments BIBAFull-Text 443
  Paul R. Aumer-Ryan
In digital library search engines, "no results found" is a misleading phrase because it masquerades as a definitive answer; in reality, the collection being searched may in fact contain content that matches a user's query. This research examines the effect of null result sets on search behavior and on the perception of contents in digital libraries. In particular, this research supports the hypothesis that interface and design flaws have an effect on the perceived authority and credibility (here defined in terms of being authentic, factual, trustworthy, scholarly, and accurate) of the information being communicated by the interface in question. In short, interface design and the "form" of information (or, alternatively, the messenger) can negatively impact the perception of the quality of the "content" of information (the message).
A ranking and exploration service based on large-scale usage data BIBAFull-Text 444
  Johan Bollen; Herbert Van de Sompel; Lyudmilla Balakireva; Ryan Chute
This poster presents the architecture and user interface of a prototype service that was designed to allow end-users to explore the structure of science and perform assessments of scholarly impact on the basis of large-scale usage data. The underlying usage data set was constructed by the MESUR project which collected 1 billion usage events from a wide range of publishers, aggregators, and institutional consortia.
Self-arranging preservation networks BIBAFull-Text 445
  Charles L. Cartledge; Michael L. Nelson
We pose the question: what if digital library objects could self-arrange without intervention from repositories and minimal intervention from administrators? We present background information about networks, techniques on how networks can be created based on locally discovered information, and how a small set of controlling policies can profoundly affect network configuration. This poster reflects a work in progress, providing information about the project's genesis, current status and future efforts.
Tagging semantics: investigations with WordNet BIBAFull-Text 446
  Michael J. Cole; Jacek Gwizdka
The content of a tag sequence references both a user's concepts and the user's conceptualization of an information object. The tagging history of 823 users of the Delicious social tagging service is analyzed using WordNet. Three semantic measures of the tagging content are developed: the level of category references, the changes in category level for each noun as the tagging sequence unfolds, and the scope of concept coverage as the compactness of the WordNet subgraph for the noun senses. Observed patterns of concept reference as a function of sequence position hint at dynamic properties of the tag production process by marking a trace of cognitive activity. If tagging is object categorization, these measures provide a view of the personal categorization behavior of non-professionals and illuminate biases in the production of 'folksonomies' due to tag production processes.
Isovera digital library BIBAFull-Text 447
  Cal Collins; Sergey Demidenko; Shakib Mostafa
IsoveraDL is a digital library and peer review system. IsoveraDL allows you to upload and serve your learning resources along with associated record metadata in a very organized and dynamic way.
   IsoveraDL uses a resource publication workflow that models the existing off-line record management workflow in use by different AAAS BEN partners. In addition, IsoveraDL provides a Peer Review module which allows users to create and use dynamic Peer Review workflows. Using IsoveraDL, permitted users can upload records and metadata to be Peer Reviewed by others. Metadata records can be validated by a different set of users before they are published.
   The record submission forms used in IsoveraDL for adding resources are highly customizable and administrative users have the option of setting up as many of these forms as required. The Controlled Vocabularies and metadata fields used in IsoveraDL conform to AAAS BEN Learning Object Metadata specification. Administrative users have the ability to edit or add to these vocabularies and fields if needed.
   Optionally, the IsoveraDL Peer Review module may be used in conjunction with IsoveraDL's record submission forms. If a record submission form is set up for Peer Review then all records submitted through the form are Peer Reviewed before they can be validated. Administrative users can create workflows, associated forms and reviewer groups for the Peer Review module. The Peer Review module also has a reporting interface through which administrative users can easily monitor the progress and workload of all records and users associated with the module.
   Once a record is validated and published, users are able to discover resources through IsoveraDL's search and browse functionality. Records published through IsoveraDL can be easily harvested to the AAAS BEN portal through the built-in Harvester module using OAI-PMH.
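   As a rough illustration of the harvesting protocol named above, the sketch below builds an OAI-PMH ListRecords request and pulls record identifiers out of a response. It is not IsoveraDL's Harvester module: the base URL, the sample response and the helper names are hypothetical, and oai_dc is used only because every OAI-PMH repository is required to support it.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL; from_date enables incremental harvesting."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def record_identifiers(response_xml):
    """Return the identifier of every record header in an OAI-PMH response."""
    root = ET.fromstring(response_xml)
    path = ".//oai:record/oai:header/oai:identifier"
    return [element.text for element in root.findall(path, OAI_NS)]


# Hypothetical endpoint and hand-written response, for illustration only.
url = list_records_url("https://repository.example.org/oai", from_date="2008-01-01")
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""
identifiers = record_identifiers(SAMPLE_RESPONSE)
```

   A real harvester would also follow resumptionToken elements to page through large result sets; that is omitted here to keep the sketch short.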
Museum materials in a digital library context and beyond BIBAFull-Text 448
  Stephen Davison; Elizabeth McAulay; Murtha Baca
This poster will present an overview of metadata mappings needed to support access, collaboration, preservation and aggregation of museum content within and beyond a digital library context.
OER recommender: linking NSDL pathways and OpenCourseWare repositories BIBAFull-Text 449
  Joel Duffin; Brandon Muramatsu
The OER Recommender (www.oerrecommender.org) is a web service that helps people find relevant open educational resources. It links the digital learning resources in the National Science Digital Library (NSDL) disciplinary pathways with courses in OpenCourseWare repositories thereby providing critical contextual information. When a person browses a web page in a participating NSDL Pathway or OpenCourseWare repository, the recommender annotates the page with a "Recommended resources" link. The poster will describe the motivations for the project, provide detail on the recommendation engine, display recommendations for participating collections, and describe how other collections can participate in the project.
The role of the DIKW hierarchy in the design of a digital library system for the scientific data of large-scale evaluation campaigns BIBAFull-Text 450
  Marco Dussin; Nicola Ferro
This paper exploits the DIKW hierarchy as a framework for modelling the scientific data produced during large-scale evaluation campaigns for information retrieval systems in order to design a digital library system able to manage and support the course of such evaluation campaigns.
Translation of on-screen text into visual expressions BIBAFull-Text 451
  Kumiko Fujisawa; Kenro Aihara
To support the user's on-screen reading, we propose a methodology to translate text into visual expressions. Our prototype system analyses the text, translates it into an image, and adds movement to the image.
Creating a searchable map library via data mining BIBAFull-Text 452
  Judith Gelernter; Michael Lesk
Maps in journal articles are difficult to access since they are rarely indexed apart from the articles themselves. Our prototype of a searchable map library was built by extracting maps and harvesting metadata from scanned articles to classify each map.
The DCC curation lifecycle model BIBAFull-Text 453
  Sarah Higgins
The scientific record and the documentary heritage are increasingly created in digital form. The UK based Digital Curation Centre supports institutions who store, manage and preserve such data to help ensure its enhancement and continuing long-term use.
   The DCC (Digital Curation Centre) Curation Lifecycle Model provides a generic graphical high-level overview of the stages required for successful curation and preservation of digital material from initial conceptualisation. The model can be used to plan curation and preservation activities, to ensure sustainability of repository content or other digital material, within an organisation or consortium. It will help to ensure that all necessary stages are undertaken, each in the correct sequence. The model enables granular functionality to be mapped against it to define roles and responsibilities, and build a framework of standards and technologies to implement. It can help with the process of identifying additional steps which may be required, or actions which are not required by certain situations or disciplines, and of ensuring that processes and policies are adequately documented.
   Digital Curation Centre staff developed the model before undertaking a period of public consultation, which was recently completed. The newly ratified model will shortly be used by the DCC to ensure that information, services and advisory material cover all areas of the lifecycle. Domain-specific variations of the model will be developed, with greater levels of granularity, to help ensure that advice and information are easily accessible from the website. One planned utilisation is the development of domain specific standards frameworks within the DCC DIFFUSE Standards Frameworks, to help practitioners identify which standards they should be using and where they would be appropriately implemented.
   This poster will present the DCC Curation Lifecycle Model, incorporating the results of the public consultation period held during December 2007 to February 2008.
Exploiting log files in video retrieval BIBAFull-Text 454
  Frank Hopfgartner; Thierry Urruty; Robert Villa; Nicholas Gildea; Joemon M. Jose
While research into user-centered text retrieval is based on mature evaluation methodologies, user evaluation in multimedia retrieval is still in its infancy. User evaluations can be expensive and are also often non-repeatable. An alternative way of evaluating such systems is the use of simulations. In this poster, we present an evaluation methodology which is based on exploiting log files recorded from a user-study we conducted.
Building a story tracer out of a web archive BIBAFull-Text 455
  Lian'en Huang; Jonathan J. H. Zhu; Xiaoming Li
There are quite a few web archives around the world, such as the Internet Archive and Web InfoMall (http://www.infomall.cn). Nevertheless, we have not seen substantial mechanisms built on top of the archives to render the value of the data beyond what the Wayback Machine offers. One of the reasons for this situation is the lack of a system vision and design which encompasses the oceanic data in a meaningful and cost-effective way. This paper describes an effort in this direction.
CASPAR: Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval BIBAFull-Text 456
  David Lamb; Daniel Lucchesi
CASPAR (Cultural, Artistic, and Scientific knowledge for Preservation, Access, and Retrieval) aims at providing secure, reliable and cost-effective access to digitally encoded information for an indefinite time frame.
Developing a review process for online resources BIBAFull-Text 457
  Sarah Giersch; Heather Leary; Bart Palmer; Mimi Recker
The democratization of content creation via ubiquitous Internet tools and infrastructure [1] has fueled an explosion of user-generated content in the commercial and educational markets. Indeed, funding agencies such as the National Science Foundation (NSF) are actively seeking ways to integrate teachers and learners into the education cyber-infrastructure, whereby they become co-creators of educational content [2].
   The ease with which this content, often in the form of online learning resources of varying levels of granularity, can be created and disseminated places it outside the usual peer review processes employed by publishers and professional societies. To date, digital library (DL) developers, teachers and school administrators, concerned whether teachers are using peer-reviewed online learning resources, have depended on one or a combination of the following proxies to establish an imprimatur of quality: the reputation and oversight of a funding organization (e.g., NSF's peer review process), the credentials of the content creator (e.g., National Science Teachers Association) or the collection development policies of specific DLs (e.g., DLESE).
   Now more than ever, though, sites such as YouTube, Flickr and ccMixter, together with the evolving education cyber-infrastructure, have created an environment where user-generated content is beyond the reach of even these proxy review processes. However, in the omnipresent climate of accountability within K12 education at U.S. federal, state and local levels, education DLs are being challenged to identify the value of the resources they hold and the services they provide to users, and of what their users create with those resources. For all of these reasons, it is useful, and necessary, to develop a standardized rubric and process to review online education resources. In particular, this work should leverage social and technical networks to enrich, facilitate, and automate the review process.
   The Digital Libraries go to School project was funded by NSF in 2006 to develop a professional development workshop curriculum that enables teachers to use the Instructional Architect (IA; http://ia.usu.edu) to design their own learning activities for classrooms using online STEM resources from the National Science Digital Library (NSDL.org) and the wider Web. One component of the project is to examine the criteria and approaches for reviewing the quality of teacher-created online learning resources in order to develop a rubric and workflow process.
   Work to date includes conducting focus groups and surveys with teachers and a 5-person Expert Review Committee, complemented by a literature review to identify elements for a review rubric incorporating the work of other education DLs (e.g., DLESE, MERLOT, NEEDS, among others). Findings are being synthesized, and based on analysis, a draft list of elements has been identified for further testing in Spring 2008. At the same time, a workflow process for conducting reviews with teacher-created resources will be piloted. It will combine human-generated reviews with machine-generated information about online resources (e.g., image and word count; educational standards alignment; currency of updates, provenance) [3]. Further work will identify areas for improving the review rubric and scaling and standardizing the workflow process for Fall 2008. We will also evaluate the usefulness of the reviews to teachers, and to stakeholders such as the IA, NSDL, NSF and other DLs, in providing access to high-quality online content.
Building federation of digital libraries basing on concept of atomic services BIBAFull-Text 458
  Cezary Mazurek; Tomasz Parkola; Marcin Werla
Our poster presents recent results from an ongoing research project entitled "Mechanisms of atomic services for distributed digital libraries".
Early returns on an institutional repository: an exploration of the validity and functionality of PocketKnowledge BIBAFull-Text 459
  Marcelle Mentor; Eric Strome; Stephen Asunka; Giovannina Agnitti; Gary Natriello
This poster presentation will reflect qualitatively on the challenges and opportunities of PocketKnowledge (http://pocketknowledge.tc.columbia.edu/home.php) as an institutional repository.
   The main question explored is whether such repositories are best understood as new competitors in the academic publishing world or as on-going documentations of the larger intellectual life of the institution. The early experience of one such repository, PocketKnowledge -- a social archive developed and implemented by EdLab, a research unit of the Gottesman Libraries, Teachers College Columbia University -- reveals that users are motivated not only to participate in an institutional repository, but also to document their intellectual life understood more broadly than publication. This suggests that the latter understanding of an institutional repository is a more reasonable, incremental expectation in what is surely an uphill battle against the long-established prestige of publishing in printed academic journals, and that institutional repositories such as PocketKnowledge should consider the strategic addition of functionalities that can highlight the intellectual life of the institution, not simply the intellectual production of its members.
Harvesting needed to maintain scientific literature online BIBAFull-Text 460
  Nikolay Nikolov; Peter Stoehr
Millions of scientific articles are freely accessible on the web. While some of them are stored in institutional repositories, many are made available on personal pages which are exposed to the net's transience. We found that nearly 11% of URLs of PDF documents containing references to life science publications were not accessible within 5 months after being harvested using a search engine's (SE) API. For most of them (8.4%) no SE cache backup could be found. Although we have yet to estimate the exact rate at which the scientific literature disappears and the duration of its disappearance, the results so far are a clear indicator that web harvesting is needed to preserve the online scientific literature.
Modeling Korean clinical records as a simple temporal constraint satisfaction problem BIBAFull-Text 461
  Heekyong Park; Jinwook Choi
Temporal information is especially crucial in medical text processing. The simple temporal constraint satisfaction problem (STP) has been evaluated as sufficient to represent most English clinical temporal assertions. We aimed to test the expressive power of STP in representing Korean clinical documents and to identify any encoding issues specific to the Korean language. This paper shows that STP is sufficient. Some distinctive characteristics were found but they did not affect the encoding work.
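   For readers unfamiliar with the formalism, a simple temporal problem bounds the difference between pairs of time points, and its consistency can be checked with an all-pairs shortest-path pass over the corresponding distance graph. The sketch below is a generic textbook illustration, not the authors' encoding: the event names, the day-based bounds and the function name are assumptions made for the example.

```python
import math


def stp_consistent(num_points, constraints):
    """Check an STP; each constraint (i, j, lo, hi) means lo <= t_j - t_i <= hi.

    Returns (consistent, d), where d[i][j] is the tightest upper bound on t_j - t_i.
    """
    n = num_points
    d = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j, lo, hi in constraints:
        d[i][j] = min(d[i][j], hi)    # t_j - t_i <= hi
        d[j][i] = min(d[j][i], -lo)   # t_i - t_j <= -lo
    # Floyd-Warshall: propagate the bounds through every intermediate point.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # A negative cycle (d[i][i] < 0) means the constraints contradict each other.
    return all(d[i][i] >= 0 for i in range(n)), d


# Hypothetical clinical timeline in days: 0 = admission, 1 = surgery, 2 = discharge.
ADMISSION, SURGERY, DISCHARGE = 0, 1, 2
timeline = [
    (ADMISSION, SURGERY, 1, 2),     # surgery 1 to 2 days after admission
    (SURGERY, DISCHARGE, 3, 5),     # discharge 3 to 5 days after surgery
    (ADMISSION, DISCHARGE, 0, 10),  # loose bound, tightened by propagation
]
ok, bounds = stp_consistent(3, timeline)
stay = (-bounds[DISCHARGE][ADMISSION], bounds[ADMISSION][DISCHARGE])

# Replacing the loose bound with "discharge within 3 days" contradicts the rest.
bad_ok, _ = stp_consistent(3, timeline[:2] + [(ADMISSION, DISCHARGE, 0, 3)])
```

   In this example the propagated bounds give a stay of 4 to 7 days, while the second constraint set is reported as inconsistent because surgery plus recovery cannot fit inside 3 days.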
Evaluation of a curriculum for digital libraries BIBKFull-Text 462
  Jeffrey Pomerantz; Barbara M. Wildemuth; Sanghee Oh; Seungwon Yang; Edward A. Fox
Keywords: computer science, curriculum development, digital libraries, education, evaluation, library and information science
Considering users and their opinions in knowledge management systems BIBAFull-Text 463
  Katharina Probst; Kelly Dempski
We describe a Knowledge Management System that shifts the focus from the traditional document-centric to a user-centric view. It takes into account users' query and download behavior, opinions, reputations, and social connections.
The return of the trivial: problems formalizing collection/item metadata relationships BIBAFull-Text 464
  Allen H. Renear; Karen M. Wickett; Richard J. Urban; David Dubin
Formalizing collection/item metadata relationships encounters the problem of trivial satisfaction. We offer a solution related to current work in IR and ontology evaluation.
The DSpace repository: can multiple institutions live in one space? a different approach BIBAFull-Text 465
  Christina Richison
In late 2006, NITLE launched a pilot program for managed institutional repository services designed for smaller colleges and universities and the non-profit organizations that serve them. Twenty-six participating institutions helped pioneer this pilot effort, which laid the foundation for the development of production-level DSpace Services. These DSpace Services allowed campuses to start and grow their digital repositories with a minimal level of investment, with no hardware to purchase and very little application support expertise to develop. Campuses were able to focus on the work of building digital repositories within the context of a community of campuses sharing ideas and best practices. Lessons learned and next steps will be discussed.
A unique insight into Department of Energy research accomplishments: a special collection BIBAFull-Text 466
  Mary V. Schorn
This poster describes online access to a unique collection of Department of Energy (DOE) Research and Development (R&D) accomplishments. The collection features research of DOE and its predecessor agencies, the Energy Research and Development Administration (ERDA) and the Atomic Energy Commission (AEC).
   This special collection contains historically significant government documents, including items from the Manhattan Project era, that have been specially selected and digitized to make them accessible via the Web. Landmark documents in the collection include The Eightfold Way: A Theory of Strong Interaction Symmetry (authored by Nobel prize winner Murray Gell-Mann) and The First Weighing of Plutonium (authored by Nobel prize winner Glenn Seaborg).
   In addition to a database of approximately 250 specially-selected documents, all related aspects of the collection (documents, research areas, and/or Nobel Laureate information) are combined in Feature Topic pages for the added value of a single point of access to each compilation. Over sixty (60) Feature Topic pages include diverse topics such as "Video Games -- Did They Begin at Brookhaven?" and "Human Genome Research: Decoding DNA".
   This collection features a large number of DOE-associated Nobel Laureates, including Enrico Fermi, winner of the 1938 Nobel prize in physics, and George Smoot, winner of the 2006 Nobel prize in physics, and showcases a diversity in DOE research areas, including solar energy (with related educational materials) and Radioisotope Thermoelectric Generators (RTGs) that are used to power spacecraft.
   Easy access to this unique collection is provided via DOE R&D Accomplishments at http://www.osti.gov/accomplishments. This special collection is continually growing, with additional Nobel Laureate and/or research topic documents and features being added on a regular basis.
The NCSU catalog research testbed: a tool for evaluating faceted library catalog interfaces BIBAFull-Text 467
  Tito Sierra; Joseph Ryan; Jason Casden
Many libraries have recently been devoting significant efforts to modernizing their library catalog web interfaces. One popular approach is to create a faceted search interface to the library catalog. Faceted search interfaces can be complex, requiring designers to make many decisions about the placement and display of faceted search design elements on the page. Unfortunately, little empirical research exists on how to optimize faceted search interfaces for library catalogs. To facilitate research in this area, the NCSU Libraries has developed the NCSU Catalog Research Testbed, a tool for developing and evaluating faceted library catalog interfaces.
Computer classification system usage in CiteSeer BIBAFull-Text 468
  Mirco Speretta; Susan Gauch; Praveen Lakkaraju
The ACM, a society for computing professionals, provides a digital library whose Computer Classification System (CCS) is based on a taxonomy that has been continuously updated over the years. The CiteSeer digital library contains a large collection of computer science research papers, many of which are tagged with categories from the CCS taxonomy. By analyzing CiteSeer's tagged documents and by considering different time frames, we extracted statistics that show how the CCS taxonomy covers the publications in computer and information science research sub-fields. We also studied the size and growth of categories over the last four available years. We believe that identifying such trends within taxonomies would greatly help to improve the structure of classification systems and would support the construction of more efficient browsing and searching systems.
A workbench for information quality evaluation BIBAFull-Text 469
  Besiki Stvilia
This paper describes the architecture of an Information Quality Evaluation Workbench for rapid design and operationalization of information quality assessment models.
Automatic extraction of morphological information from botanical collections BIBAFull-Text 470
  Xiaoya Tang
Users often rely on specific morphological information to search botanical collections. However, traditional systems based on statistical models are often not effective for such searches. This study automatically extracts morphological information from botanical collections using an adapted and enhanced information extraction system. Experimental results indicate this approach is promising. This study also indicates that this approach is generalizable to similar collections in the same domain and even to different domains with adaptation of the pattern recognition and knowledge base in the new domain.
Releasing the power of digital metadata: examining large networks of co-related publications BIBAFull-Text 471
  David Tarrant; Les Carr; Terry Payne
Bibliographic metadata plays a key role in scientific literature, not only to summarise and establish the facts of the publication record, but also to track citations between publications and hence to establish the impact of individual articles within the literature. Commercial secondary publishers have typically taken on the role of rekeying, mining and analysing this huge corpus of linked data, but as the primary literature has moved to the world of the digital repository, this task is now undertaken by new services such as CiteSeer, Citebase or Google Scholar. As institutional and subject-based repositories proliferate and Open Access mandates increase, more of the literature will become openly available in well managed data islands containing a much greater amount of detailed bibliometric metadata in formats such as RDF. Through the use of efficient extraction and inference techniques, complex relations between data items can be established. In this paper we explain the importance of the co-relation in enabling new techniques to rate the impact of a paper or author within a large corpus of publications.
A cluster-based simulation of facet-based search BIBAFull-Text 472
  Thierry Urruty; Frank Hopfgartner; Robert Villa; Nicholas Gildea; Joemon M. Jose
The recent increase in online video has challenged research in the field of video information retrieval. Video search engines are becoming more and more interactive, helping the user to easily find what he or she is looking for. In this poster, we present a new approach that uses an iterative clustering algorithm on text and visual features to simulate users creating new facets in a facet-based interface. Our experimental results demonstrate the usefulness of such an approach.
Integrating DDI metadata into the NARA transcontinental persistent archive prototype via the OAI-PMH BIBAFull-Text 473
  Jewel H. Ward; Jonathan Crabtree
The H.W. Odum Institute for Research in Social Science (Odum), the Renaissance Computing Institute (RENCI), and the School of Information and Library Science (SILS), all part of the University of North Carolina at Chapel Hill (UNC-CH), are collaborating with the San Diego Supercomputer Center (SDSC) on an extension of the National Archives and Records Administration's (NARA) transcontinental persistent archive prototype (TPAP) data grid with the new Integrated Rule-Oriented Data System (iRODS). The goal of the project is to enable collection interoperability between UNC-CH and SDSC using an iRODS environment. This poster presents the results of one part of that project, which is the development of a crosswalk between the Odum Institute Data Archive (OIDA) Data Documentation Initiative (DDI) metadata and the NARA TPAP iRODS metadata catalogue (iCAT) via the OAI-PMH.
The working scientist and the realities of data curation: a qualitative study addressing attitudes and needs BIBAFull-Text 474
  Megan Winget
This poster describes a nascent ethnographic study to examine working scientists' attitudes and perform a needs assessment regarding data collection, representation, and dissemination in terms of cyberinfrastructure initiatives.
Knowledge representation from information extraction BIBKFull-Text 475
  Tan Xu; Douglas W. Oard; Tamer Elsayed; Asad Sayeed
Keywords: information extraction, knowledge representation
A formal ontology for temporal entities and its application in knowledge extraction BIBAFull-Text 476
  Chunxia Zhang; Guiping Wang; Zhendong Niu
This poster presents a formal ontology of temporal entities for knowledge sharing and interoperability. This ontology captures the semantic intensions, attributes and properties of temporal entities and their relationships. It has also been applied to temporal knowledge acquisition from unannotated Chinese texts.