| Interoperability Issues in Digital Libraries | | BIB | PDF | 1 | |
| Barry M. Leiner | |||
| Building a Digital Library: The Perseus Project as a Case Study in the Humanities | | BIBA | PDF | 3-10 | |
| Gregory Crane | |||
| This paper outlines some of our preliminary findings in the Perseus Project, an on-going digital library on ancient Greek culture that has been under development since 1987. | |||
| Towards the Digital Music Library: Tune Retrieval from Acoustic Input | | BIBAK | PDF | 11-18 | |
| Rodger J. McNab; Lloyd A. Smith; Ian H. Witten; Clare L. Henderson; Sally Jo Cunningham | |||
| Music is traditionally retrieved by title, composer or subject
classification. It is possible, with current technology, to retrieve music
from a database on the basis of a few notes sung or hummed into a microphone.
This paper describes the implementation of such a system, and discusses several
issues pertaining to music retrieval. We first describe an interface that
transcribes acoustic input into standard music notation. We then analyze
string matching requirements for ranked retrieval of music and present the
results of an experiment which tests how accurately people sing well-known
melodies. The performance of several string matching criteria is analyzed
using two folk song databases. Finally, we describe a prototype system which
has been developed for retrieval of tunes from acoustic input. Keywords: Music retrieval, Melody recall, Acoustic interfaces, Relevance ranking | |||
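The abstract does not name the string matching criteria it evaluates, so the following is only a minimal sketch of one plausible form of ranked tune retrieval. It assumes melodies are stored as MIDI note numbers, compares them by their successive semitone intervals so that a tune sung in a different key still matches, and ranks by edit distance. The function names and the sample data are hypothetical and are not taken from the paper.

```python
def intervals(pitches):
    """Turn MIDI note numbers into successive semitone steps (key-independent)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def edit_distance(a, b):
    """Classic Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # drop x from a
                            curr[j - 1] + 1,          # insert y into a
                            prev[j - 1] + (x != y)))  # match or substitute
        prev = curr
    return prev[-1]


def rank_tunes(query_pitches, database):
    """Return tune titles ordered from closest to farthest match."""
    q = intervals(query_pitches)
    scored = [(edit_distance(q, intervals(p)), title)
              for title, p in database.items()]
    return [title for _, title in sorted(scored)]


# Hypothetical two-tune database; the query is tune "A" sung a whole tone higher.
tunes = {"A": [60, 62, 64, 65], "B": [60, 60, 67, 67]}
ranking = rank_tunes([62, 64, 66, 67], tunes)
```

Working on intervals rather than absolute pitches is a design choice of this sketch: it tolerates transposition, at the cost of letting one wrong note disturb two adjacent intervals.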
| VISION: A Digital Video Library | | BIBAK | PDF | 19-27 | |
| Wei Li; Susan Gauch; John Gauch; Kok Meng Pua | |||
| The goal of the VISION (Video Indexing for Searching Over Networks) project
is to establish a comprehensive, online digital video library. We are
developing automatic mechanisms to populate the library and provide
content-based search and retrieval over computer networks. The salient feature
of our approach is the integrated application of mature image or video
processing, information retrieval, speech feature extraction and word-spotting
technologies for efficient creation and exploration of the library materials.
First, full-motion video is captured in real-time with flexible qualities to
meet the requirements of library patrons connected via a wide range of network
bandwidths. Then, the videos are automatically segmented into a number of
logically meaningful video clips by our novel two-step algorithm based on video
and audio contents. A closed caption decoder and/or word-spotter is being
incorporated into the system to extract textual information to index the video
clips by their contents. Finally, all information is stored in a full-text
information retrieval system for content-based exploration of the library over
networks of varying bandwidths. Keywords: Digital libraries, Content-based indexing and retrieving, Video and audio
processing | |||
| The Role of Intermediary Services in Emerging Digital Libraries | | BIBA | PDF | 29-35 | |
| Allen Brewer; Wei Ding; Karla Hahn; Anita Komlodi | |||
| The conception of a library has evolved over the past 200 years from a place
that houses a collection of information resources to a process of facilitating
knowledge transfer from source to user. The facilitator role of the library
encompasses the concept of a change agent, where the library acts as a
proactive participant in the diffusion of appropriate knowledge to users.
Today's libraries not only aim to provide access to and delivery of
information, but have increasingly incorporated proactive services aimed at
assisting in the interpretation and application of information to fulfill user
information requirements.
One definition characterizes the digital library (DL) as a set of applications based on the hypermedia paradigm [1]. The conception of a DL was initially confined to collections of digital information [16] but others [8, 11] have argued for broader conceptions of DLs. In defining the role of a DL it is essential to incorporate the concept of proactive intermediation and value added services so that the DL is not limited to passive warehousing of navigable information. Value added services may be sourced from any number of suppliers, creating potential complexities in interoperability between and among suppliers and services. We propose that intermediation is an essential functional role in a DL whose purposes include: (1) interaction with potential beneficiaries, (2) interaction with information resources, and (3) mediation between information resources and users to add value during the information transfer process. Information beneficiaries include users, organizations, repositories, software products, software agents, or any entity acting as an information seeking agent which can benefit from the acquisition of information, including another DL. The DL includes information assets used in the delivery of services but is not limited to the construction and access of information resources in the form of collections, corpora, databases, web resources, and repositories of reusable program and information objects. Value can be added during the mediation process via searching, categorization, filtering, translation, publishing, or some combination of these activities. By eliminating unnecessary constraints upon the types of entities which can benefit from a DL, this definition includes, in addition to human users, automated beneficiaries such as CASE workbenches, instrumentation platforms, and robots.
Incorporating autonomous and semi-autonomous knowledge agents as both suppliers and users in the DL definition provides an opportunity to integrate computer, communication, information, and knowledge assets into a more unified system for information resource management to support the evolution of an information based economy. Where the characteristics of information drive DL collection decisions, the DL could be seen as product oriented. Services provided by product-oriented DLs include traditional information retrieval (IR) services for retrospective queries. Where customer information requirements form the basis for collection development decisions the DL may be seen as customer oriented. Services provided by customer oriented DLs may include proactive services such as selective dissemination of information (SDI) [27] or may provide real-time routing of information to customers. In a customer oriented DL the collection profile may be constructed by combining the user profiles for all users which the DL is intended to serve. To further explicate the sense of the DL as intermediary, the five value added functions (search, classification, filtering, translation, and publishing) are further explored in terms of their roles in a DL. Real world implementations, including many discussed below, often include multiple intermediary functions within the scope of services offered. | |||
| Toward the Bibliographic Control of Works: Derivative Bibliographic Relationships in an Online Union Catalog | | BIBA | PDF | 36-43 | |
| Gregory H. Leazer; Richard P. Smiraglia | |||
| The digital library will require a bibliographic retrieval tool that controls recorded knowledge regardless of its material form. A conceptual model for such a catalog is described. Foremost, this catalog will include information on derivative bibliographic relationships -- those relationships that exist among the individual members of a bibliographic family. In order to understand the problem of derivative bibliographic relationships, we conducted a study intended to build on our understanding of the nature of bibliographic works and the breadth of bibliographic families. The specific objectives of this research are to test the model for the control of bibliographic families, and to measure the frequency and extent of the derivative relationship in OCLC's online union catalog. It appears from this cursory examination of the data that, although there were fewer large bibliographic families than expected, the characteristics of bibliographic families were as Smiraglia had predicted. Furthermore, Leazer's conceptual design appears to be an accurate model for the control of bibliographic families. | |||
| Graphical Table of Contents | | BIBA | PDF | 45-53 | |
| Xia Lin | |||
| This paper proposes a graphical table of contents (GTOC) that is functionally analogous to the table of contents. The proposed GTOC can be generated automatically from the text of documents. It visualizes document contents and relationships to allow easy access to the underlying documents. It also provides various interactive tools to let the user explore the documents. Issues in generating such a GTOC include how documents are indexed and organized, how the organized documents are visualized, and what interactive means are needed to provide the necessary functionality of the GTOC. These issues are discussed in this paper with a GTOC prototype based on Kohonen's self-organizing feature map algorithm. | |||
| Visual Relevance Analysis | | BIBA | PDF | 54-62 | |
| Mountaz Hascoet-Zizi; Nikos Pediotakis | |||
| In order to access relevant information in digital libraries, most traditional systems feature topic search. In this paper we present visual relevance analysis to extend the notion of topic search by relying on visualization and interaction techniques to help users rapidly browse through potentially relevant documents. Visual relevance analysis offers a better division of control between the user and the system for topic search. The interaction paradigm uses a library metaphor, implemented through a classification system. In this paper we first present how a classification system is built to serve the visualization purposes. We further discuss presentation and interaction strategies for visual relevance analysis, followed by implementation issues and a system overview. Finally we briefly review related work and compare it with our approach. | |||
| A Browsing Tool of Multi-Lingual Documents for Users without Multi-Lingual Fonts | | BIBA | PDF | 63-71 | |
| Tetsuo Sakaguchi; Akira Maeda; Takehisa Fujita; Shigeo Sugimoto; Koichi Tabata | |||
| Since a library is inherently multi-lingual, a multi-lingual document environment is crucial for a digital library. In the near future, worldwide information sharing through digital libraries will be common. Currently, multi-lingual documents are poorly supported on computers and the Internet. It is impractical to consider installing fonts for all character sets in every user's terminal. This paper presents a multi-lingual document browsing tool for a user with no multi-lingual fonts on his or her terminal. It discusses several methods for browsing multi-lingual documents and proposes a browser which sends a text string with the font glyphs required to display the text. It also reports the results of an evaluation of the browser. | |||
| How Will We Know When It Is a Library? | | BIB | PDF | 72 | |
| Ann S. Okerson | |||
| User Controlled Overviews of an Image Library: A Case Study of the Visible Human | | BIBAK | PDF | 74-82 | |
| Chris North; Ben Shneiderman; Catherine Plaisant | |||
| This paper proposes a user interface for remote access to the National
Library of Medicine's Visible Human digital image library. Users can visualize
the library, browse contents, locate data of interest, and retrieve desired
images. The interface presents a pair of tightly coupled views into the
library data. The overview image provides a global view of the overall search
space, and the preview image provides details about high resolution images
available for retrieval. To explore, the user sweeps the views through the
search space and receives smooth, rapid, visual feedback of contents. Desired
images are automatically downloaded over the internet from the library.
Library contents are indexed by meta-data consisting of automatically generated
miniature visuals. The interface software is completely functional and freely
available for public use, at: http://www.nlm.nih.gov/ Keywords: Browsing, Digital library, Image database, Information exploration,
Information retrieval, Internet, Medical image, Remote access, User interface,
Visualization, World-Wide Web | |||
| A Spatial Approach to Organizing and Locating Digital Libraries and Their Content | | BIBAK | PDF | 83-89 | |
| Jason Orendorf; Charles Kacmar | |||
| Explosive growth of world-wide web (WWW) sites combined with the lack of an
overall and consistent organizational structure is making it increasingly
difficult for researchers and users to locate relevant materials. This paper
proposes a spatial method of structuring digital libraries and their content in
which users navigate geographically to locate and access information. A
prototype based on a spatial methodology was implemented to further study this
organizational structure. The system, SDLS, is a hypermedia-based digital
library browser, authoring system, and document viewer in which users navigate
using geographical (map) displays to locate and retrieve information. This
method of access provides a natural means of information retrieval for
geographically-based repositories and reference materials. Keywords: Spatial, Geographic, Image, Map, Graphical, Digital library | |||
| Index Structures for Structured Documents | | BIBA | PDF | 91-99 | |
| Yong Kyu Lee; Seong-Joon Yoo; Kyoungro Yoon; P. Bruce Berra | |||
| Much research has been carried out in order to manage structured documents such as SGML documents and to provide powerful query facilities which exploit document structures as well as document contents. In order to perform structure queries efficiently in a structured document management system, an index structure which supports fast document element access must be provided. However, there has been little research on the index structures for structured documents. In this paper, we propose various kinds of new inverted indexing schemes and signature file schemes for efficient structure query processing. We evaluate the storage requirements and disk access times of our schemes and present the analytical and experimental results. | |||
| Toward Active, Extensible, Networked Documents: Multivalent Architecture and Applications | | BIBA | PDF | 100-108 | |
| Thomas A. Phelps; Robert Wilensky | |||
| Rich varieties of online digital documents are possible, documents which do
not merely imitate the capabilities of other media. A true digital document
provides an interface to potentially complex content. Since this content is
infinitely varied and specialized, we must provide means to interact with it in
arbitrarily specialized ways. Furthermore, since relevant content may be found
in distinct documents, we must draw from multiple sources, yet provide a
coherent presentation to the user. Finally, it is essential to be able to
conveniently author new content, define new means of manipulation, and
seamlessly mesh both with existing materials.
We present a new general paradigm that regards documents with complex content as "multivalent documents", comprising multiple "layers" of distinct but intimately related content. Small, dynamically-loaded program objects, or "behaviors", activate the content and work in concert with each other and layers of content to support arbitrarily specialized document types. Behaviors bind together the disparate pieces of a multivalent document to present the user with a single unified conceptual document. As implemented in Java in the context of the World Wide Web, multivalent documents in effect create a customizable virtual Web, drawing together diverse content and functionality into coherent document-based interfaces to content. Examples of the diverse functionality in multivalent documents include: "OCR select and paste", where the user describes a geometric region on the scanned image of a printed page and the corresponding text characters are copied out; video subtitling, which aligns a video clip with the script and language translations so that, e.g., the playing video can be presented simultaneously in multiple languages, and the video can be searched with text-based techniques; geographic information system (GIS) visualizations that compose several types of data from multiple datasets; and distributed user annotations that augment and may transform the content of other documents. In general, a document management infrastructure built around a multivalent perspective can provide an extensible, networked system that supports incremental addition of content, incremental addition of interaction with the user and with other components, reuse of content across behaviors, reuse of behaviors across types of documents, and efficient use of network bandwidth. Multivalent documents exploit digital technology to enable new, more sophisticated document interaction. | |||
| Physical Objects in the Digital Library | | BIBA | PDF | 109-115 | |
| Richard Furuta; Catherine C. Marshall; Frank M. Shipman III; John J. Leggett | |||
| Physical objects are the foundation for many of today's areas of scholarship, research, and education. Because physical objects are tangible, any digital representation of one is an approximation of the object. Knowing how to approximate requires an understanding of the work practices and needs of the library's constituencies. We consider issues arising from the creation of digital libraries based on physical objects, focusing particularly on the characteristics of botanical herbaria and their users. | |||
| Natural Language Information Retrieval in Digital Libraries | | BIBA | PDF | 117-125 | |
| Tomek Strzalkowski; Jose Perez-Carballo; Mihnea Marinescu | |||
| In this paper we report on some recent developments in the joint NYU and GE natural language information retrieval system. The main characteristic of this system is the use of advanced natural language processing to enhance the effectiveness of term-based document retrieval. The system is designed around a traditional statistical backbone consisting of the indexer module, which builds inverted index files from pre-processed documents, and a retrieval engine which searches and ranks the documents in response to user queries. Natural language processing is used to (1) preprocess the documents in order to extract content-carrying terms, (2) discover inter-term dependencies and build a conceptual hierarchy specific to the database domain, and (3) process users' natural language requests into effective search queries. This system has been used in NIST-sponsored Text Retrieval Conferences (TREC), where we worked with approximately 3.3 GBytes of text articles including material from the Wall Street Journal, the Associated Press newswire, the Federal Register, Ziff Communications's Computer Library, Department of Energy abstracts, U.S. Patents and the San Jose Mercury News, totaling more than 500 million words of English. The system has been designed to scale to ever-increasing amounts of data. In particular, a randomized index-splitting mechanism has been installed which allows the system to create a number of smaller indexes that can be independently and efficiently searched. | |||
| Interactive Term Suggestion for Users of Digital Libraries: Using Subject Thesauri and Co-Occurrence Lists for Information Retrieval | | BIBA | PDF | 126-133 | |
| Bruce R. Schatz; Eric H. Johnson; Pauline A. Cochrane; Hsinchun Chen | |||
| The basic problem in information retrieval is that large-scale searches can
only match terms specified by the user to terms appearing in documents in the
digital library collection. Intermediate sources that support term suggestion
can thus enhance retrieval by providing alternative search terms for the user.
Term suggestion increases recall, while interaction lets the user
try to avoid decreasing precision.
We are building a prototype user interface that will become the Web interface for the University of Illinois Digital Library Initiative (DLI) testbed. It supports the principle of multiple views, where different kinds of term suggestors can be used to complement search and each other. This paper discusses its operation with two complementary term suggestors, subject thesauri and co-occurrence lists, and compares their utility. Thesauri are generated by human indexers and place selected terms in a subject hierarchy. Co-occurrence lists are generated by computer and place all terms in frequency order of occurrence together. This paper concludes with a discussion of how multiple views can help provide good quality Search for the Net. This is a paper about the design of a retrieval system prototype that allows users to simultaneously combine terms offered by different suggestion techniques, not about comparing the merits of each in a systematic and controlled way. It offers no experimental results. | |||
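As a rough illustration of the co-occurrence lists mentioned in this abstract, the sketch below counts how often pairs of terms appear in the same document and lists each term's partners in descending order of that count. It is a simplified assumption about how such a list could be built, not the DLI testbed's implementation; the function names, the document-level counting window, and the sample documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_lists(documents):
    """Map each term to its co-occurring terms, most frequent partner first."""
    pair_counts = Counter()
    for terms in documents:
        # Count each unordered pair at most once per document.
        for a, b in combinations(sorted(set(terms)), 2):
            pair_counts[(a, b)] += 1

    lists = {}
    for (a, b), n in pair_counts.items():
        lists.setdefault(a, []).append((b, n))
        lists.setdefault(b, []).append((a, n))
    for partners in lists.values():
        # Highest count first; ties broken alphabetically for stable output.
        partners.sort(key=lambda item: (-item[1], item[0]))
    return lists


def suggest(term, lists, k=3):
    """Return up to k alternative search terms for the given term."""
    return [t for t, _ in lists.get(term, [])[:k]]


# Hypothetical toy collection, each document already reduced to index terms.
docs = [["library", "digital", "search"],
        ["digital", "library"],
        ["search", "index"]]
co_lists = cooccurrence_lists(docs)
suggestions = suggest("digital", co_lists)
```

A thesaurus-based suggestor could expose the same `suggest` interface, which is one way to picture the multiple-views principle the abstract describes.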
| Information Product Evaluation as Asynchronous Communication in Context: A Model for Organizational Research | | BIBA | PDF | 134-142 | |
| Lisa D. Murphy | |||
| Knowledge workers are routinely engaged in information search and retrieval (ISR) tasks where they make evaluations of complex information products such as electronic documents or multi-media items. Information Systems (IS) organizations in business support the creation of these complex information products as well as providing tools and support for their acquisition and use. Some ISR assumptions, such as that an information need exists independently of the ability of the repository to satisfy it, or that an information need can be specified in objective terms, can be problematic for knowledge workers. An alternative approach considers information products as elements of an asynchronous communication; it explicitly considers evaluation after retrieval and the types of support provided by IS groups. General propositions about the task and context of information product evaluation are proposed and used to develop a new model (Information Product Evaluation Model) incorporating aspects of the user's context, meta-information availability, and accessibility. | |||
| Text to Hypertext: Can Clustering Solve the Problem in Digital Libraries? | | BIBA | PDF | 144-150 | |
| Robert B. Kellogg; Madhan Subhas | |||
| Automatic hypertext generation remains an extremely challenging endeavor in
the digital library world. In this paper we present a solution for
automatically connecting relevant information in dynamic textual digital
libraries. This textual information is generally unconnected and often
unexplored due to the large flow of information entering from remote and local
sources. Often, full-text indexes exist for this information but embedded
links to related information are conspicuously absent. Links that do exist are
usually generated in an arduous and time-consuming manual process. That is why
the ability to automatically generate links has a potentially high payoff.
Our solution for the automatic generation of hypertext links relies on the techniques of document segmentation and document clustering. Hypertext links are automatically generated during the document clustering process using the incremental cover-coefficient-based clustering algorithm. The issues of link completeness and link quality are also addressed in this paper. Link completeness is studied by comparing the cluster-based approach of link generation to the exhaustive link generation approach. Results indicate that links are more complete in the higher similarity range than in the lower similarity range. Initial link quality user studies indicate that the cluster-based hypertext link generation approach is promising. In the future, we plan to conduct further studies on link quality and investigate ways to increase the effectiveness of our approach. | |||
| Indexing Handwriting Using Word Matching | | BIBA | PDF | 151-159 | |
| R. Manmatha; Chengfeng Han; E. M. Riseman; W. B. Croft | |||
| There are many historical manuscripts written in a single hand which it
would be useful to index. Examples include the W. B. DuBois collection at the
University of Massachusetts and the early Presidential libraries at the Library
of Congress. The standard technique for indexing documents is to scan them in,
convert them to machine readable form (ASCII) using Optical Character
Recognition (OCR) and then index them using a text retrieval engine. However,
OCR does not work well on handwriting. Here an alternative scheme is proposed
for indexing such texts. Each page of the document is segmented into words.
The images of the words are then matched against each other to create
equivalence classes (each equivalence class contains multiple instances of
the same word). The user then provides ASCII equivalents for, say, the top 2000
equivalence classes.
The current paper deals with the matching aspects of this process. Due to variations in even a single person's handwriting, it is expected that the matching will be the most difficult step in the whole process. A matching technique based on Euclidean distance mapping is discussed. Experiments are shown demonstrating the feasibility of the approach. | |||
| Building a Scalable and Accurate Copy Detection Mechanism | | BIBA | PDF | 160-168 | |
| Narayanan Shivakumar; Hector Garcia-Molina | |||
| Often, publishers are reluctant to offer valuable digital documents on the Internet for fear that they will be re-transmitted or copied widely. A Copy Detection Mechanism can help identify such copying. For example, publishers may register their documents with a copy detection server, and the server can then automatically check public sources such as UseNet articles and Web sites for potential illegal copies. The server can search for exact copies, and also for cases where significant portions of documents have been copied. In this paper we study, for the first time, the performance of various copy detection mechanisms, including the disk storage requirements, main memory requirements, response times for registration, and response time for querying. We also contrast performance to the accuracy of the mechanisms (how well they detect partial copies). The results are obtained using SCAM, an experimental server we have implemented, and a collection of 50,000 netnews articles. | |||
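The registration and query steps described in this abstract can be pictured with a small sketch. The abstract does not say how SCAM represents documents, so the version below assumes a generic technique: each registered document is broken into overlapping three-word chunks, and a query is scored by the fraction of its chunks that also occur in a registered document. The class name, the chunk size, and the sample texts are hypothetical.

```python
def chunks(text, size=3):
    """Return the set of overlapping word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + size])
            for i in range(max(len(words) - size + 1, 0))}


class CopyDetector:
    """Toy copy detection server: register documents, then query for overlap."""

    def __init__(self, size=3):
        self.size = size
        self.index = {}  # chunk -> set of ids of documents containing it

    def register(self, doc_id, text):
        for chunk in chunks(text, self.size):
            self.index.setdefault(chunk, set()).add(doc_id)

    def query(self, text):
        """Map each registered document to the share of query chunks it contains."""
        query_chunks = chunks(text, self.size)
        if not query_chunks:
            return {}
        hits = {}
        for chunk in query_chunks:
            for doc_id in self.index.get(chunk, ()):
                hits[doc_id] = hits.get(doc_id, 0) + 1
        return {doc_id: n / len(query_chunks) for doc_id, n in hits.items()}


detector = CopyDetector()
detector.register("doc1", "the quick brown fox jumps over the lazy dog")
exact = detector.query("the quick brown fox jumps over the lazy dog")
partial = detector.query("a quick brown fox sat")
```

The chunk size trades storage and accuracy against each other in this sketch: smaller chunks catch more partial copies but enlarge the index and raise the chance of coincidental matches.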
| Metadata to Describe Information in Digital Libraries | | BIBA | PDF | 170 | |
| Terence R. Smith | |||
| The session will include presentations and discussion of the issues in using
metadata to describe information in digital libraries. One example will be
the role of metadata within the Alexandria Digital Library project at UC Santa
Barbara.
If sense is to be made of the flood of information that will be available through digital libraries, it must be described effectively, so that it can be found, its value assessed, and its acquisition handled efficiently. Metadata is the term most often used to refer to the description of information objects to support these three functions of digital libraries. Digital library technology is capable of both supporting major augmentations to traditional metadata activities and providing a basis for catalog interoperability. Important issues relate to the choice of languages for representing concepts and conceptual structures. Metadata for spatially-referenced information, for example, may follow a "bottom-up" approach that is an extension of traditional library practices or a "top-down" approach involving more general knowledge representation languages. Background materials for this session are at: http://www.dlib.org/metadata.html | |||
| User Needs Assessment and Evaluation | | BIBA | PDF | 170 | |
| Nancy A. Van House; David Levy | |||
| A critical issue in digital library (DL) design is incorporating user needs
early in the design process and continuing throughout. The user needs and
assessment groups of the DLI projects are working to improve DL design by
incorporating user needs and preferences. They are working to develop data
collection and analysis methods for DLs, understand DL user behavior, assess
user needs, evaluate the emerging DLs against user needs, compare findings
across projects, understand how this information can be efficiently and
effectively incorporated in design, and build a research agenda.
This working session will consist of a panel representing both the user needs assessment and evaluation group and designers from several of the DLI projects. The emphasis will be on the interaction between the design process and the needs assessment and evaluation effort. It will address such issues as the interconnected and sometimes-conflicting needs of designers, evaluators, and users; coordinating evaluation and design approaches; and impediments to and supports for this interaction. Background materials for this session are at: http://www.dlib.org/user-needs.html | |||
| Social Aspects of Digital Libraries | | BIBA | PDF | 170-171 | |
| Christine L. Borgman | |||
| In February 1996, UCLA and the National Science Foundation are holding a
workshop on social aspects of digital libraries. This working session will
present an outline of the issues raised at the workshop, and invite audience
reaction and discussion. The research workshop plans to focus on the following
topics:
* Information needs: (a) Social context and culture -- to what extent can digital library components be generalized and to what extent must they be tailored to each environment? (b) Information needs and information seeking -- what is the relationship between information seeking and learning in digital libraries? (c) Linking user-learner needs and behavior to digital library design -- what design techniques are appropriate in applying user needs research to digital library design?
* End user searching and filtering: (a) Organization, description and representation of information -- which methods of organization can be generalized for digital libraries? What new methods are needed? (b) Search capabilities for users -- how, if at all, should problem domain areas be divided? (c) Interface design for information retrieval -- what human-computer interaction principles can be applied to the information retrieval environment?
Background materials for this session are at: http://www.dlib.org/social.html | |||
| Repository Interactions | | BIBA | PDF | 171 | |
| William L. Scherlis | |||
| This working session will be based around the report of a D-Lib workshop in
March 1996 on interfaces between digital library repositories. The focus is on
technical issues, but they are closely linked to legal, social,
economic and political questions.
The working group focuses on technical issues associated with repository interoperation. As digital libraries proliferate, many approaches to managing digital assets and associated meta-data are emerging. There are important differences among these approaches, and these differences have technical, legal, social, economic, and political dimensions. How can multiple repositories coexist and interact effectively? The working group is motivated by several important trends: The complexity and semantic richness of objects and meta-data managed by repositories are increasing. Information objects of greater value are now being managed more routinely, raising issues of security, access control, and support for commerce. Performance demands are increasing, as are the quantity and size of information objects, particularly in multimedia applications. Digital libraries are interacting more often with personal, group, and wide area information services. Finally, the distinction is blurring between digital libraries and other institutional information resources such as databases and corporate webs. The starting points for the working group are technologies that support management of information objects, their names, and associated meta-data -- databases, distributed file systems, object bases, and the Web. Several digital library research groups have started to develop concepts that could provide a basis for repository interoperation, including the CR-TR architectural work of Kahn and Wilensky, the Stanford Infobus project of Garcia-Molina and Winograd, and the agent architecture of the Michigan DLI project. In addition to the need to reconcile these various approaches, there is a broader need to put these in the context of standards efforts in the wider community, including CORBA, OLE, Web-associated standards, Z39.50, and SQL and its successors. All of these deal with resolving names to objects, and all of these deal in some measure with meta-data.
The initial effort of the working group is (1) to identify the dimensions of the space of repository interaction and interoperability, and the issues associated with achieving some transparency for users of the digital libraries, and (2) to assess current research and development efforts to understand the differences among them. Background materials for this session are at: http://www.dlib.org/repositories.html | |||
| Digitization and Conversion | | BIBA | PDF | 171-172 | |
| M. Stuart Lynn | |||
| The objective of the working group on digitization and conversion is to
share experience, evolve a code of practice, and encourage sharing of resources
amongst those who manage large projects to convert library materials to digital
forms. This working session will include brief presentations and discussion of
the major issues. The session will cover technical, library and operational
topics.
This topic is relevant to almost every aspect of digital library research. For example, sharing converted library objects requires technical work on formats and repository interactions. It also provokes the need for comprehensive metadata to describe converted items, suitable naming methods for identification and access, and a method for describing and protecting intellectual property. Research on the digital archiving of converted digital objects also applies to ensuring the longevity of digital objects in general. Background materials for this session are at: http://www.dlib.org/conversion.html | |||
| Naming Objects in the Digital Library | | BIBA | PDF | 172 | |
| William Y. Arms | |||
| This working session will include presentations and discussions on naming
issues in digital libraries. It will include reports on recent progress in the
development of Uniform Resource Names (URNs), but the emphasis will be on three
topics that concern how names are used.
* User issues. Groups that assign names to library objects wish to provide long term flexibility while integrating existing naming schemes. They need ways to relate names of library objects to semantic concepts such as uniqueness, mutability, etc.
* Name management. If names are to be globally unique and persist for long periods of time, the allocation of top-level names and the registration of naming schemes must be managed with care.
* Aggregation and granularity. A crucial design decision in a digital library is how to assign names to parts of a work, variants, and other complex and compound items.
Background materials for this session are at: http://www.dlib.org/naming.html | |||
| Agricultural Network Information Center (AgNIC) A Model for Access to Distributed Resources | | BIB | PDF | 174 | |
| Richard E. Thompson | |||
| Knowledge-Based Biomedical Information Retrieval | | BIB | PDF | 175 | |
| Alexa T. McCray | |||
| SEPTEMBER -- Secure Electronic Publishing Trial | | BIBA | PDF | 177 | |
| Jack Brassil | |||
| The SEPTEMBER system uses World Wide Web technology to distribute IEEE
Communications Society technical publications on the Global Internet. The trial
recently began with the publication of a single issue of the IEEE Journal on
Selected Areas in Communications.
The SEPTEMBER system became available on October 1, 1995 at the URL: http://www.research.att.com/jsac/ Full text articles are provided in multiple formats, including HTML and PDF. Additional services include access protection, a prototype billing system, and a novel copyright protection technology. More than 1200 users have registered with and used the system; user feedback has been overwhelmingly positive. The trial has provided an opportunity to explore the complexity of electronically disseminating existing paper journals. We have also gained insight into how subscribers wish to read online technical journals. I will discuss system implementation, reader demographics and behavior, and the future of IEEE Communications Society online publications. A complete article discussing the SEPTEMBER project is available at http://www.research.att.com/~jtb/psdocs/september.ps.Z | |||
| MITRE Information Discovery System | | BIBA | PDF | 177 | |
| Raymond J. D'Amore; Daniel J. Helm; Puck-Fai Yan; Stephen A. Glanowski | |||
| The MITRE Information Discovery System (MIDS) is a baseline system for
integrating advanced processing tools for information discovery and retrieval
in large-scale distributed environments. The system is built on a modular,
extendible architecture that allows for system-level decoupling and allocation
of component processing tools across network nodes to provide for efficient
processing in distributed environments. At one level, the system provides for
multi-platform user access to HTTP, Gopher, FTP, and news servers using an HTML
based client interface. However, more significantly, the system provides
advanced tools for metadata generation from disparate network objects, and a
content routing mediation layer for classification of metadata into appropriate
information brokers. This bottom-up layered information organization supports
a wide range of information retrieval and browsing strategies.
MIDS is being used in an enterprise intranet application to provide access to corporate information bases. Preliminary assessment indicates the need to balance available information retrieval and classification capabilities with a new generation of highly efficient post-retrieval analysis tools for extracting, organizing, and visualizing information within extensive result sets. These back-end processing tools will be user accessible "on demand" through an object-oriented interface to provide users with methods for maintaining personal views of large, heterogeneous information spaces. | |||
| Creating a Networked Computer Science Technical Report Library | | BIBA | PDF | 177-178 | |
| James R. Davis | |||
| Computer scientists have long been using the Internet as a medium for
transporting reports and documentation of many kinds, including, but not
limited to, technical reports about computer science. But this material has
typically been difficult to locate, search, and use, and has lacked the
organization and structure we expect from a true library. This poster
describes the Networked Computer Science Technical Report Library (NCSTRL), an
attempt to create a useful online library of computer science technical
reports.
NCSTRL provides scholarly and financial advantages to all its users. Researchers can easily search a body of material that is now slow, diffused, and difficult to access. Authors gain a wider audience than they now enjoy. In particular, since NCSTRL searches all sites, authors at less well-known institutions have an equal chance of at least having their reports noticed. Both these advantages grow as more sites participate. Departments gain a clean, effective management system for their technical reports and eliminate much of their current copying and mailing charges. The savings at Cornell alone are estimated to be in the thousands of dollars. The technology underlying NCSTRL is a network of interoperating digital library servers. The digital library servers provide three services: repository services that store and provide access to documents, index services that allow searches over bibliographic records, and user interface services that provide the human front-end for the other services. The services interoperate using an open protocol, so that other software systems can use the servers also. NCSTRL is powerful, yet also easy to install and maintain. The server software comes in two levels, Lite and Standard. The Lite version is intended for sites with few resources, and has a lower startup investment, while the Standard version offers greater functionality. Sites participating in NCSTRL will be able to install either. No matter which they install, the complete technical report collection will be available to all parties. NCSTRL has a uniform user interface, hiding almost all the underlying diversity. Users do not need to know which level of software a site is running, and departments will have a smooth upgrade path from the basic to the advanced should they desire additional capability. Technology alone is not enough to create a useful library. The poster will present our experiences in setting reasonable policies for fair use of scholarly material. | |||
| The Common Ground Surrounding Access: Theoretical and Practical Perspectives | | BIBA | PDF | 178 | |
| Geri Gay; June P. Mead | |||
| Research indicates that technologically-rich environments demand equally
rich data collection and analysis tools -- ones capable of examining
human-computer interactions as well as the social and cognitive dynamics that
develop during computer mediated collaboration. Further, our research has
demonstrated the need to address the social, psychological, and pedagogical
aspects of online collaboration. We have found that by studying the multiple
ways in which users interact with these new systems, we can develop tools that
add value to digitized images, that allow scholars to annotate, manipulate, and
organize the data they collect in creative multimedia compositions. What we
have found convinces us that parallel development and evaluation combine
synergistically to enhance the overall design process.
Our poster explores the common ground surrounding access to digital libraries. It addresses such questions as: What promise does access to digital libraries hold? How does access to digital libraries change patterns of communication? What does access to primary sources mean to teachers, librarians, researchers and students? What tools do they need for multimedia composition? What strategies do people employ as they annotate, manipulate and organize the data they search for and collect? What new forms of message construction need to be understood within these collaborative environments? And finally, what happens after access -- how do people use digital information after they find it? | |||
| Digital Libraries and Impacts on Scientific Careers | | BIBA | PDF | 178-179 | |
| Richard Giordano | |||
| The recruitment of subject specialist PhDs into information work is not a
simple act of recruitment because it amounts to a large cultural change for
those recruited, and introduces alien work practices and expectations in the
existing organizational culture. Young scientists have gone through a period
of acculturation and socialization as scientists, and many have career
aspirations that they compromise because of the lack of suitable employment.
Thus, although the development and maintenance of collaborative systems and
digital information environments represents a bright prospect for young
scientists, it is not without potential problems that might affect the quality
of work. In preliminary research conducted by the author, many such PhDs felt
undervalued both by senior research scientists and by members of the library
and information science community; they believed that their scientific careers
were over, and they longed to publish an experimental paper.
Our current research questions include: (1) to determine if, and to what extent, young scientists are satisfied with their career choice; (2) to differentiate among different types of scientists; (3) to analyze the social forces at work in their environments that enhance or inhibit their professional and social standing within the scientific community; (4) to ascertain alternatives to 'traditional' scientific education; (5) to ascertain proper rewards, management, and recognition from the library and information science community for scientists in their employment. | |||
| An Object-Oriented Hypermedia System for Structured Documents | | BIBA | PDF | 179 | |
| Hyunki Kim; Hakgene Shin; Jaewoo Chang | |||
| In this paper, we design a new hypermedia markup language using SGML and implement an object-oriented hypermedia system on top of Postgres, a next-generation database management system. Compared with conventional systems, our hypermedia system has some advantages. First, since our hypermedia markup language is designed using SGML, the language can interchange documents in a system-independent manner and can support content-based and structure-based retrieval. Second, since we apply an object-oriented paradigm for modeling hypermedia data and links, we can inherit the properties and methods of the object-oriented model. Finally, our hypermedia system can provide database management system (DBMS) facilities such as transaction management, storage management, security, crash recovery, and version control. | |||
| The Cultural Heritage Information Online Project: Demonstrating Access to Distributed Cultural Heritage Museum Information | | BIBA | PDF | 179-180 | |
| William E. Moen; John Perkins | |||
| Project CHIO is a demonstration project that provides access to cultural
heritage information online (CHIO). The project is sponsored by the Consortium
for the Computer Interchange of Museum Information (CIMI). The poster session:
* Provides background on this collaborative project
* Describes how ANSI/NISO Z39.50, the information retrieval protocol, is used in this project
* Details the information modeling, query semantics, and search and retrieval behavior upon which the use of Z39.50 is based.
Project CHIO demonstrates how Z39.50 offers solutions to the difficulties in achieving meaningful online search and retrieval of information of different types and structure (e.g., structured records, full-text documents, images) regardless of the hardware and software used to store information or search for it. The initial implementations will consist of Z39.50 clients and servers supporting access to a demonstration CHIO Information Resource. The CHIO Information Resource can be modeled as a digital library comprised of hierarchical, distributed collections of information. A user may search the CHIO Information Resource to retrieve museum objects including: images, object records, exhibition catalogs, and wall labels. CHIO demonstrates the utility of national and international standards both to build digital libraries and to provide meaningful online search and retrieval of information in digital libraries. | |||
| Inverse Mapping in the Handle Management System | | BIBA | PDF | 180 | |
| Varna Puvvada; Roy H. Campbell | |||
| A handle identifies objects stored within a distributed system like a
digital library or the WWW. Examples of handles are provided by the "Handle
Management System" (HMS) built by CNRI. HMS retrieves location dependent data
associated with a "handle", a printable string which unambiguously and
unforgeably identifies data. Handles have use as URNs (Uniform Resource Names)
that identify one or more URLs (Uniform Resource Locators) and their
corresponding documents on the Web. In this case, the handle is a location
independent name for the location dependent URLs and their referenced data.
Handles also have use to identify objects stored within a digital library. In
this instance the handle is a location independent name for one or more
location dependent "object pointers" that identify storage facility and/or
rights management system locations. Other applications of handles are E-mail
addresses, Internet host names and IP addresses.
The inverse mapping problem is to find an economical solution to retrieving the handle for a specific data item or location dependent name. The inverse mapping problem for the HMS is to find the handle for a given data item. The problem is complicated because the mapping of handles to data items is many to many, the data items may not all be unique, and the system is distributed. Inverse mapping is needed to manage the data items and name space. For example, it is easier to maintain a Web document if it is constructed with location independent URNs. However, many documents are built with local copies of the data and URLs. Converting these documents is desirable. Inverse mapping is also needed to check whether data items are aliased to multiple handles. The inverse mapping problem for data items can be solved using a combination of search techniques according to the nature of the data. Inverse mapping for short data items can follow a scheme similar to that used in the Domain Name System for a pointer query. An inverse handle is associated with every data item in this category and this handle names a list of handles that refer to the data item. This inverse handle would belong to an inverse-mapping authority, for example hdl_inverse. The locally_unique_part of this handle should be constructed from the data item and could be built using a suitable hashing function, for example an MD5 checksum. This inverse service could either be a dedicated service or could be delegated to all the servers. For long textual data items a hierarchy of indices can be constructed based on the keywords similar to the keyword systems. These indices are maintained by servers spread throughout the network. But this mapping is not immediate. The degree of consistency of the system depends on the frequency of update of these indices. | |||
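The short-data-item scheme in the abstract above can be illustrated with a minimal sketch. It is not the HMS implementation: the `InverseIndex` class, the in-memory dictionaries, the `/` separator between authority and local part, and the example handles and URL are all assumptions made for illustration. Only the `hdl_inverse` authority name and the use of an MD5 checksum for the locally unique part come from the abstract.

```python
import hashlib
from collections import defaultdict

# Inverse-mapping authority named in the abstract.
INVERSE_AUTHORITY = "hdl_inverse"


def inverse_handle(data_item: str) -> str:
    """Build the inverse handle for a short data item.

    The locally unique part is the MD5 checksum of the data item.
    The "/" separator is an assumption of this sketch.
    """
    digest = hashlib.md5(data_item.encode("utf-8")).hexdigest()
    return f"{INVERSE_AUTHORITY}/{digest}"


class InverseIndex:
    """Hypothetical in-memory stand-in for a handle server (many-to-many)."""

    def __init__(self) -> None:
        self._forward = defaultdict(set)  # handle -> data items
        self._inverse = defaultdict(set)  # inverse handle -> handles

    def register(self, handle: str, data_item: str) -> None:
        """Record the forward mapping and keep the inverse list in step."""
        self._forward[handle].add(data_item)
        self._inverse[inverse_handle(data_item)].add(handle)

    def resolve(self, handle: str) -> list[str]:
        """Forward lookup: data items named by a handle."""
        return sorted(self._forward.get(handle, ()))

    def handles_for(self, data_item: str) -> list[str]:
        """Inverse lookup: handles that refer to a data item."""
        return sorted(self._inverse.get(inverse_handle(data_item), ()))

    def is_aliased(self, data_item: str) -> bool:
        """True when more than one handle refers to the data item."""
        return len(self.handles_for(data_item)) > 1
```

In this sketch, registering the same URL under two handles makes `handles_for` return both and `is_aliased` return true, which corresponds to the alias check the abstract mentions. A distributed deployment would additionally need the consistency considerations the abstract raises.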
| The Electronic Reserve System at Penn State U. | | BIBA | PDF | 180-181 | |
| Joan A. Reyes | |||
| The goal of our project was to move away from a static, paper-based reserve
reading room collection which could only be used within the confines of the
library building to an electronic reserve system which could be accessed by the
students and faculty of the University from anywhere on or off campus. This
system employs computing technologies to convert documents into digital
formats, to store the digital copies and to provide access to them. The pilot
project has allowed me to research areas that would encompass the design,
development and use of this type of system. Some of the topics I have
researched are: security, copyright, the types of material that students will
use, access, locations, terminal types, equipment, the capabilities of the
system, and staff training issues.
The pilot project is in its first phase and has used material from twenty-four courses which cover the humanities and the sciences with the potential of allowing the use of 270 courses at the Pattee Library location alone. The project will allow for the expansion of the system to the seven campus libraries at University Park and to the twenty-two Commonwealth campus libraries throughout Pennsylvania in the future. | |||
| Providing Multiple Levels of Difficulty in EarthLab's Digital Library | | BIBA | PDF | 181 | |
| Ruth A. Ross; Lois F. Kelso; Gary R. Broughton; Edward J. Hopkins | |||
| The EarthLab Learning Environment, designed to encourage student research
projects in earth science education, included a digital library of topics and
case studies for students to explore as they expanded their abilities to solve
problems and to find new problems to pose. To make EarthLab's library more
accessible to all students, a varying level of difficulty was developed by
creating three versions of each library document and implementing access
mechanisms to facilitate selecting a level or changing it. Levels defined: (1)
Easy [4th grade]; (2) Intermediate [7th]; (3) Advanced [10th]. "Difficulty"
was measured by readability, concept density, and prerequisite knowledge. Data
display and other visualization and interactive simulation were redesigned for
each level.
The educational objective was to accommodate and then challenge learners. The software was designed to help students find a comfortable level, then encourage moving up a level, as in a video game, ready for new challenges with increasing skills and knowledge. Effective with students from various backgrounds and achievement levels, including adult learners, this approach to adding flexibility to educational software has shown exceptional impact on students with special learning and reading problems and on gifted students. It may be useful in many training and learning systems. | |||
| A Framework for Pricing Services in Digital Libraries | | BIBA | PDF | 181 | |
| J. Sairamesh; Y. Yemini; D. F. Ferguson; C. Nikolaou | |||
| Digital Libraries will have a major influence on the design of future
information systems. They will set the stage for future complex information
technologies to evolve and provide "transparent" services to a variety of
users. We consider Digital Libraries (commercial) as information economies
consisting of several players (or economic agents): authors and publishers who
create and sell their collections, suppliers (e.g. computer systems) who
provide information storage, indexing and access services, information-agents
who provide searching and presentation services, and users who request
services.
In such an economic framework, one can envision suppliers and information-agents competing to provide services for information storage, searching, access and presentation. In providing such services, several issues arise, among them pricing and Quality of Service (QoS) for accessing and viewing information objects. These issues play an important role in allocating resources such as processing time, network bandwidth and buffers, memory, cache and I/O in order to provide the various services. Using this framework, we present the interactions among the players, the dynamics of the economic agents, service models, pricing and billing mechanisms (QoS based), and corresponding implementation issues in large digital libraries. | |||
| Establishing Computer-Based Information Services in the School Library | | BIBA | PDF | 182 | |
| Mag. Werner Schoggl | |||
| With school budgets down we are challenged to both improve our educational system and lower the costs of it. What can be expected from the integration of digital resources (online-services, Email, CD-ROM) in the school-library for the quality of education and for the role of the school libraries on the one hand and for the school budget on the other hand? We have been offering access to the Internet and to a local computer net for both teachers and students for several months. In addition we have produced some prototype samples of teaching material consisting of Web-pages with links to both offline and online sites. Some time ago we also established access to the services and resources of the national press agency. Last but not least we have organized some courses for the teachers of our school to acquire the basic skills for retrieving and editing documents from online and offline resources. We have also organized a small group of librarians (not restricted to Austria) that work together to improve the methods of integrating online services in school libraries. So far we can say that the integration of computerized information retrieval improves the qualities of a school library dramatically. This may eventually result in a decisive change of the way both students and teachers acquire new skills and knowledge. For the future we plan to develop methods for storing and searching material that has already been downloaded and to initiate a national (eventually international) online service for school libraries. | |||
| MESL Project Description | | BIBA | PDF | 182 | |
| J. Trant | |||
| The Museum Educational Site Licensing Project (MESL) brings representative museums, colleges, and universities together to define the terms and conditions for educational use of museum images and information on campus-wide networks. During this two-year collaboration, launched in 1995, fourteen selected educational and collecting institutions are collaborating to agree on the terms of the capture, distribution, and use of digital images and their associated texts. MESL participants are exploring and evaluating the educational benefits of digital access to museum collections through campus networks. Administrative, technical, and legal mechanisms are being developed and tested to enable the future use of large quantities of high-quality museum images by all educational institutions. | |||
| The Text Encoding Initiative Guidelines and Their Applications to Building Digital Libraries | | BIBA | PDF | 184-185 | |
| Nancy Ide; Judith Klavans | |||
| The Text Encoding Initiative's Guidelines for Electronic Text Encoding and Interchange of Machine-Readable Texts were published in May 1994, after six years of development within the academic and research communities. The SGML-based Guidelines provide standardized encoding conventions for a large range of text types and features relevant for a broad range of applications, including natural language processing, information retrieval, hypertext, electronic publishing, various forms of literary and historical analysis, lexicography, etc. The Guidelines are intended to apply to texts, written or spoken, in any natural language, of any date, in any genre or text type, without restriction on form or content. They treat both continuous materials (running text) and discontinuous materials such as dictionaries and linguistic corpora. As such, the TEI Guidelines offer the best encoding solution currently available for the development of digital libraries, where varied and complex texts must be stored and manipulated in ways that answer a wide variety of user needs, and where the linkage of multi-media is essential. | |||
| User Needs Assessment and Evaluation: Issues and Methods | | BIBA | PDF | 186 | |
| Ann Bishop; Barbara Buttenfield; David Levy; Nancy Van House | |||
| Evaluation of digital libraries (DLs) begins, ideally, before design:
effective design is based on the needs of users.
DL needs assessment and evaluation requires a panoply of complementary methods. DLs are similar to, but distinct from, traditional libraries and other kinds of computer-based systems. Their evaluation requires the adaptation of existing methods and development of new ones. Choice of method is driven by research goals and the conceptual and practical context of DLs. The goals of this workshop are for participants to acquire a better understanding of existing DL methods and needed developments, and of the underlying conceptual bases for needs assessment and evaluation; and to engage in a discussion about methods and a research agenda. The workshop will begin and end with a discussion of DL research goals and their relationship to methods. Several major data collection methods will be addressed in depth, including:
* Ethnographic methods
* Interviews, focus groups, and surveys
* System monitoring and user feedback
Each session will consist of a presentation by a researcher with expertise in the method, followed by substantial discussion of the information derived using the method; practicalities of using it; its strengths and weaknesses; and the conditions under which it is appropriate. | |||