DL'96: Proceedings of the 1st ACM International Conference on Digital Libraries

Fullname:1st ACM International Conference on Digital Libraries
Editors:Edward A. Fox; Gary Marchionini
Location:Bethesda, Maryland
Dates:1996-Mar-20 to 1996-Mar-23
Standard No:ACM ISBN 0-89791-830-4; ACM Order Number 606961; ACM DL: Table of Contents hcibib: DL96
  1. Keynote Address
  2. Multimedia Digital Libraries
  3. Library and Information Science Perspectives
  4. Human-Computer Interaction: Browsing and Visualization
  5. Keynote Address
  6. Human-Computer Interaction: Images and Spatial Organization
  7. Documents
  8. Information Retrieval
  9. Document Indexing and Analysis
  10. D-Lib Working Session 1A
  11. D-Lib Working Session 1B
  12. D-Lib Working Session 2A
  13. D-Lib Working Session 2B
  14. D-Lib Working Session 3A
  15. D-Lib Working Session 3B
  16. Panels -- Abstracts
  17. Posters
  18. Workshops -- Abstracts

Keynote Address

Interoperability Issues in Digital Libraries 1
  Barry M. Leiner

Multimedia Digital Libraries

Building a Digital Library: The Perseus Project as a Case Study in the Humanities 3-10
  Gregory Crane
This paper outlines some of our preliminary findings in the Perseus Project, an on-going digital library on ancient Greek culture that has been under development since 1987.
Towards the Digital Music Library: Tune Retrieval from Acoustic Input 11-18
  Rodger J. McNab; Lloyd A. Smith; Ian H. Witten; Clare L. Henderson; Sally Jo Cunningham
Music is traditionally retrieved by title, composer or subject classification. It is possible, with current technology, to retrieve music from a database on the basis of a few notes sung or hummed into a microphone. This paper describes the implementation of such a system, and discusses several issues pertaining to music retrieval. We first describe an interface that transcribes acoustic input into standard music notation. We then analyze string matching requirements for ranked retrieval of music and present the results of an experiment which tests how accurately people sing well-known melodies. The performance of several string matching criteria is analyzed using two folk song databases. Finally, we describe a prototype system which has been developed for retrieval of tunes from acoustic input.
Keywords: Music retrieval, Melody recall, Acoustic interfaces, Relevance ranking
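   The string matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which matching criteria were tested, so the sketch assumes a melodic contour encoding (up, down, repeat) compared by edit distance. The function names and the toy `TUNES` database are hypothetical.

```python
def contour(pitches):
    """Reduce a note sequence (MIDI numbers) to an Up/Down/Repeat string."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        steps.append("U" if cur > prev else "D" if cur < prev else "R")
    return "".join(steps)


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def rank_tunes(query_pitches, database):
    """Return (distance, title) pairs sorted so the closest tune comes first."""
    q = contour(query_pitches)
    return sorted((edit_distance(q, contour(p)), title)
                  for title, p in database.items())


# Hypothetical toy database: title -> MIDI pitches (assumption, not from the paper)
TUNES = {
    "Twinkle Twinkle": [60, 60, 67, 67, 69, 69, 67],
    "Frere Jacques": [60, 62, 64, 60, 60, 62, 64, 60],
}
```

   Because only the contour is compared, a query sung in a different key still matches; the trade-off is that tunes with similar shapes become harder to tell apart.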
VISION: A Digital Video Library 19-27
  Wei Li; Susan Gauch; John Gauch; Kok Meng Pua
The goal of the VISION (Video Indexing for Searching Over Networks) project is to establish a comprehensive, online digital video library. We are developing automatic mechanisms to populate the library and provide content-based search and retrieval over computer networks. The salient feature of our approach is the integrated application of mature image or video processing, information retrieval, speech feature extraction and word-spotting technologies for efficient creation and exploration of the library materials. First, full-motion video is captured in real-time with flexible qualities to meet the requirements of library patrons connected via a wide range of network bandwidths. Then, the videos are automatically segmented into a number of logically meaningful video clips by our novel two-step algorithm based on video and audio contents. A closed caption decoder and/or word-spotter is being incorporated into the system to extract textual information to index the video clips by their contents. Finally, all information is stored in a full-text information retrieval system for content-based exploration of the library over networks of varying bandwidths.
Keywords: Digital libraries, Content-based indexing and retrieving, Video and audio processing

Library and Information Science Perspectives

The Role of Intermediary Services in Emerging Digital Libraries 29-35
  Allen Brewer; Wei Ding; Karla Hahn; Anita Komlodi
The conception of a library has evolved over the past 200 years from a place that houses a collection of information resources to a process of facilitating knowledge transfer from source to user. The facilitator role of the library encompasses the concept of a change agent, where the library acts as a proactive participant in the diffusion of appropriate knowledge to users. Today's libraries not only aim to provide access to and delivery of information, but have also increasingly incorporated proactive services aimed at assisting in the interpretation and application of information to fulfill user information requirements.
   One definition characterizes the digital library (DL) as a set of applications based on the hypermedia paradigm [1]. The conception of a DL was initially confined to collections of digital information [16] but others [8, 11] have argued for broader conceptions of DLs. In defining the role of a DL it is essential to incorporate the concept of proactive intermediation and value added services so that the DL is not limited to passive warehousing of navigable information. Value added services may be sourced from any number of suppliers, creating potential complexities in interoperability between and among suppliers and services.
   We propose that intermediation is an essential functional role in a DL whose purposes include:
  • (1) interaction with potential beneficiaries,
  • (2) interaction with information resources, and
  • (3) mediation between information resources and users to add value during the information transfer process.
   Information beneficiaries include users, organizations, repositories, software products, software agents, or any entity acting as an information seeking agent which can benefit from the acquisition of information, including another DL. The DL includes information assets used in the delivery of services but is not limited to the construction and access of information resources in the form of collections, corpora, databases, web resources, and repositories of reusable program and information objects. Value can be added during the mediation process via searching, categorization, filtering, translation, publishing, or some combination of these activities. By eliminating unnecessary constraints upon the types of entities which can benefit from a DL, this definition includes, in addition to human users, automated beneficiaries such as CASE workbenches, instrumentation platforms, and robots. Incorporating autonomous and semi-autonomous knowledge agents as both suppliers and users in the DL definition provides an opportunity to integrate computer, communication, information, and knowledge assets into a more unified system for information resource management to support the evolution of an information based economy.
   Where the characteristics of information drive DL collection decisions, the DL could be seen as product-oriented. Services provided by product-oriented DLs include traditional information retrieval (IR) services for retrospective queries. Where customer information requirements form the basis for collection development decisions, the DL may be seen as customer-oriented. Services provided by customer-oriented DLs may include proactive services such as selective dissemination of information (SDI) [27] or may provide real-time routing of information to customers. In a customer-oriented DL the collection profile may be constructed by combining the user profiles for all users which the DL is intended to serve.
   To explicate the sense of the DL as intermediary, the five value added functions (search, classification, filtering, translation, and publishing) are further explored in terms of their roles in a DL. Real world implementations, including many discussed below, often include multiple intermediary functions within the scope of services offered.
    Toward the Bibliographic Control of Works: Derivative Bibliographic Relationships in an Online Union Catalog 36-43
      Gregory H. Leazer; Richard P. Smiraglia
    The digital library will require a bibliographic retrieval tool that controls recorded knowledge regardless of its material form. A conceptual model for such a catalog is described. Foremost, this catalog will include information on derivative bibliographic relationships -- those relationships that exist among the individual members of a bibliographic family. In order to understand the problem of derivative bibliographic relationships, we conducted a study intended to build on our understanding of the nature of bibliographic works and the breadth of bibliographic families. The specific objectives of this research are to test the model for the control of bibliographic families, and measure the frequency and extent of the derivative relationship in OCLC's online union catalog. It appears from this cursory examination of the data that, although there were fewer large bibliographic families than expected, the characteristics of bibliographic families were as Smiraglia had predicted. Furthermore, Leazer's conceptual design appears to be an accurate model for the control of bibliographic families.

    Human-Computer Interaction: Browsing and Visualization

    Graphical Table of Contents 45-53
      Xia Lin
    This paper proposes a graphical table of contents (GTOC) that is functionally analogous to the table of contents. The proposed GTOC can be generated automatically from the text of documents. It visualizes document contents and relationships to allow easy access to the underlying documents. It also provides various interactive tools to let the user explore the documents. Issues of how to generate such a GTOC include how documents are indexed and organized, how the organized documents are visualized, and what interactive means are needed to provide the necessary functionality of the GTOC. These issues are discussed in this paper with a GTOC prototype based on Kohonen's self-organizing feature map algorithm.
    Visual Relevance Analysis 54-62
      Mountaz Hascoet-Zizi; Nikos Pediotakis
    In order to access relevant information in digital libraries, most traditional systems feature topic search. In this paper we present visual relevance analysis to extend the notion of topic search by relying on visualization and interaction techniques to help users rapidly browse through potentially relevant documents. Visual relevance analysis offers a better division of control between the user and the system for topic search. The interaction paradigm uses a library metaphor, implemented through a classification system. In this paper we first present how a classification system is built to serve the visualization purposes. We further discuss presentation and interaction strategies for visual relevance analysis followed by implementation issues and system overview. Finally we briefly review related work and compare it with our approach.
    A Browsing Tool of Multi-Lingual Documents for Users without Multi-Lingual Fonts 63-71
      Tetsuo Sakaguchi; Akira Maeda; Takehisa Fujita; Shigeo Sugimoto; Koichi Tabata
    Since a library is inherently multi-lingual, a multi-lingual document environment is crucial for a digital library. In the near future, worldwide information sharing through digital libraries will be common. Currently, multi-lingual documents are poorly supported on computers and the Internet. It is impractical to consider installing fonts for all character sets in every user's terminal. This paper presents a multi-lingual document browsing tool for a user with no multi-lingual fonts on his or her terminal. It discusses several methods for browsing multi-lingual documents and proposes a browser which sends a text string with the font glyphs required to display the text. It also reports the results of an evaluation of the browser.

    Keynote Address

    How Will We Know When It Is a Library? 72
      Ann S. Okerson

    Human-Computer Interaction: Images and Spatial Organization

    User Controlled Overviews of an Image Library: A Case Study of the Visible Human 74-82
      Chris North; Ben Shneiderman; Catherine Plaisant
    This paper proposes a user interface for remote access of the National Library of Medicine's Visible Human digital image library. Users can visualize the library, browse contents, locate data of interest, and retrieve desired images. The interface presents a pair of tightly coupled views into the library data. The overview image provides a global view of the overall search space, and the preview image provides details about high resolution images available for retrieval. To explore, the user sweeps the views through the search space and receives smooth, rapid, visual feedback of contents. Desired images are automatically downloaded over the internet from the library. Library contents are indexed by meta-data consisting of automatically generated miniature visuals. The interface software is completely functional and freely available for public use, at: http://www.nlm.nih.gov/
    Keywords: Browsing, Digital library, Image database, Information exploration, Information retrieval, Internet, Medical image, Remote access, User interface, Visualization, World-Wide Web
    A Spatial Approach to Organizing and Locating Digital Libraries and Their Content 83-89
      Jason Orendorf; Charles Kacmar
    Explosive growth of world-wide web (WWW) sites combined with the lack of an overall and consistent organizational structure is making it increasingly difficult for researchers and users to locate relevant materials. This paper proposes a spatial method of structuring digital libraries and their content in which users navigate geographically to locate and access information. A prototype based on a spatial methodology was implemented to further study this organizational structure. The system, SDLS, is a hypermedia-based digital library browser, authoring system, and document viewer in which users navigate using geographical (map) displays to locate and retrieve information. This method of access provides a natural means of information retrieval for geographically-based repositories and reference materials.
    Keywords: Spatial, Geographic, Image, Map, Graphical, Digital library

    Documents

    Index Structures for Structured Documents 91-99
      Yong Kyu Lee; Seong-Joon Yoo; Kyoungro Yoon; P. Bruce Berra
    Much research has been carried out in order to manage structured documents such as SGML documents and to provide powerful query facilities which exploit document structures as well as document contents. In order to perform structure queries efficiently in a structured document management system, an index structure which supports fast document element access must be provided. However, there has been little research on the index structures for structured documents. In this paper, we propose various kinds of new inverted indexing schemes and signature file schemes for efficient structure query processing. We evaluate the storage requirements and disk access times of our schemes and present the analytical and experimental results.
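       A structure query of the kind the abstract mentions can be sketched with a small inverted index whose postings record the element in which each term occurs. This is an illustrative assumption, not one of the paper's schemes: the abstract does not describe them, and the path-string encoding, the function names, and the toy `STRUCTURED_DOCS` data are hypothetical.

```python
from collections import defaultdict


def build_index(documents):
    """Map each term to the set of (doc_id, element_path) pairs containing it."""
    index = defaultdict(set)
    for doc_id, elements in documents.items():
        for path, text in elements.items():
            for term in text.lower().split():
                index[term].add((doc_id, path))
    return index


def structure_query(index, term, path_prefix):
    """Return sorted postings for `term` limited to elements at or under `path_prefix`."""
    return sorted(
        (doc_id, path)
        for doc_id, path in index.get(term.lower(), ())
        if path == path_prefix or path.startswith(path_prefix + "/")
    )


# Hypothetical documents: doc id -> {element path: text}
STRUCTURED_DOCS = {
    "d1": {"article/title": "Index structures",
           "article/sec/para": "signature file schemes"},
    "d2": {"article/title": "Signature files",
           "article/sec/para": "inverted index schemes"},
}
STRUCTURE_INDEX = build_index(STRUCTURED_DOCS)
```

       For example, `structure_query(STRUCTURE_INDEX, "signature", "article/title")` restricts the match to titles, so only the element of `d2` is returned even though `d1` also contains the term elsewhere.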
    Toward Active, Extensible, Networked Documents: Multivalent Architecture and Applications 100-108
      Thomas A. Phelps; Robert Wilensky
    Rich varieties of online digital documents are possible, documents which do not merely imitate the capabilities of other media. A true digital document provides an interface to potentially complex content. Since this content is infinitely varied and specialized, we must provide means to interact with it in arbitrarily specialized ways. Furthermore, since relevant content may be found in distinct documents, we must draw from multiple sources, yet provide a coherent presentation to the user. Finally, it is essential to be able to conveniently author new content, define new means of manipulation, and seamlessly mesh both with existing materials.
       We present a new general paradigm that regards documents with complex content as "multivalent documents", comprising multiple "layers" of distinct but intimately related content. Small, dynamically-loaded program objects, or "behaviors", activate the content and work in concert with each other and layers of content to support arbitrarily specialized document types. Behaviors bind together the disparate pieces of a multivalent document to present the user with a single unified conceptual document. As implemented in Java in the context of the World Wide Web, multivalent documents in effect create a customizable virtual Web, drawing together diverse content and functionality into coherent document-based interfaces to content.
       Examples of the diverse functionality in multivalent documents include: "OCR select and paste", where the user describes a geometric region on the scanned image of a printed page and the corresponding text characters are copied out; video subtitling, which aligns a video clip with the script and language translations so that, e.g., the playing video can be presented simultaneously in multiple languages, and the video can be searched with text-based techniques; geographic information system (GIS) visualizations that compose several types of data from multiple datasets; and distributed user annotations that augment and may transform the content of other documents.
       In general, a document management infrastructure built around a multivalent perspective can provide an extensible, networked system that supports incremental addition of content, incremental addition of interaction with the user and with other components, reuse of content across behaviors, reuse of behaviors across types of documents, and efficient use of network bandwidth. Multivalent documents exploit digital technology to enable new, more sophisticated document interaction.
    Physical Objects in the Digital Library 109-115
      Richard Furuta; Catherine C. Marshall; Frank M. Shipman III; John J. Leggett
    Physical objects are the foundation for many of today's areas of scholarship, research, and education. Because physical objects are tangible, any digital representation of one is an approximation of the object. Knowing how to approximate requires an understanding of the work practices and needs of the library's constituencies. We consider issues arising from the creation of digital libraries based on physical objects, focusing particularly on the characteristics of botanical herbaria and their users.

    Information Retrieval

    Natural Language Information Retrieval in Digital Libraries 117-125
      Tomek Strzalkowski; Jose Perez-Carballo; Mihnea Marinescu
    In this paper we report on some recent developments in the joint NYU and GE natural language information retrieval system. The main characteristic of this system is the use of advanced natural language processing to enhance the effectiveness of term-based document retrieval. The system is designed around a traditional statistical backbone consisting of the indexer module, which builds inverted index files from pre-processed documents, and a retrieval engine which searches and ranks the documents in response to user queries. Natural language processing is used to (1) preprocess the documents in order to extract content-carrying terms, (2) discover inter-term dependencies and build a conceptual hierarchy specific to the database domain, and (3) process users' natural language requests into effective search queries. This system has been used in NIST-sponsored Text Retrieval Conferences (TREC), where we worked with approximately 3.3 GBytes of text articles including material from the Wall Street Journal, the Associated Press newswire, the Federal Register, Ziff Communications's Computer Library, Department of Energy abstracts, U.S. Patents and the San Jose Mercury News, totaling more than 500 million words of English. The system has been designed to scale to ever-increasing amounts of data. In particular, a randomized index-splitting mechanism has been installed which allows the system to create a number of smaller indexes that can be independently and efficiently searched.
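       The idea of splitting one large index into smaller, independently searchable ones can be sketched as follows. The abstract gives no details of the mechanism, so the random assignment by shuffling, the function names, and the unranked merge of results are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_documents(doc_ids, parts, seed=0):
    """Randomly assign document ids to `parts` groups of near-equal size."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::parts] for i in range(parts)]


def build_partial_index(docs, ids):
    """Build an inverted index (term -> doc ids) over one group of documents."""
    index = defaultdict(set)
    for doc_id in ids:
        for term in docs[doc_id].lower().split():
            index[term].add(doc_id)
    return index


def search_all(indexes, term):
    """Search every partial index independently and merge the hits."""
    hits = set()
    for index in indexes:
        hits |= index.get(term.lower(), set())
    return sorted(hits)
```

       Each partial index covers a disjoint subset of the collection, so the per-index searches do not depend on one another and could in principle run in parallel before the merge.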
    Interactive Term Suggestion for Users of Digital Libraries: Using Subject Thesauri and Co-Occurrence Lists for Information Retrieval 126-133
      Bruce R. Schatz; Eric H. Johnson; Pauline A. Cochrane; Hsinchun Chen
    The basic problem in information retrieval is that large-scale searches can only match terms specified by the user to terms appearing in documents in the digital library collection. Intermediate sources that support term suggestion can thus enhance retrieval by providing alternative search terms for the user. Term suggestion increases recall, while interaction lets the user try to avoid decreasing precision.
       We are building a prototype user interface that will become the Web interface for the University of Illinois Digital Library Initiative (DLI) testbed. It supports the principle of multiple views, where different kinds of term suggestors can be used to complement search and each other. This paper discusses its operation with two complementary term suggestors, subject thesauri and co-occurrence lists, and compares their utility. Thesauri are generated by human indexers and place selected terms in a subject hierarchy. Co-occurrence lists are generated by computer and place all terms in frequency order of occurrence together. This paper concludes with a discussion of how multiple views can help provide good quality Search for the Net.
       This is a paper about the design of a retrieval system prototype that allows users to simultaneously combine terms offered by different suggestion techniques, not about comparing the merits of each in a systematic and controlled way. It offers no experimental results.
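       A co-occurrence list of the kind described above can be sketched in a few lines. This is a simplified assumption, not the DLI testbed's method: it counts term pairs per document and ranks suggestions by raw count, and the function names and the toy `SAMPLE_DOCS` collection are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered term pair, the documents in which both occur."""
    pairs = Counter()
    for text in documents:
        terms = sorted(set(text.lower().split()))
        pairs.update(combinations(terms, 2))
    return pairs


def suggest(term, pairs, limit=5):
    """Return terms that co-occur with `term`, most frequent first."""
    related = Counter()
    for (a, b), n in pairs.items():
        if a == term:
            related[b] += n
        elif b == term:
            related[a] += n
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:limit]]


# Hypothetical toy collection (assumption, not from the paper)
SAMPLE_DOCS = [
    "digital library retrieval",
    "digital library interface",
    "retrieval interface design",
]
PAIRS = cooccurrence_counts(SAMPLE_DOCS)
```

       Unlike a thesaurus, the list needs no human indexer, but the suggestions reflect only what happens to appear together in the collection.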
    Information Product Evaluation as Asynchronous Communication in Context: A Model for Organizational Research 134-142
      Lisa D. Murphy
    Knowledge workers are routinely engaged in information search and retrieval (ISR) tasks where they make evaluations of complex information products such as electronic documents or multi-media items. Information Systems (IS) organizations in business support the creation of these complex information products and provide tools and support for their acquisition and use. Some ISR assumptions, such as that an information need exists independently of the ability of the repository to satisfy it, or that an information need can be specified in objective terms, can be problematic for knowledge workers. An alternative approach considers information products as elements of an asynchronous communication; it explicitly considers evaluation after retrieval and the types of support provided by IS groups. General propositions about the task and context of information product evaluation are proposed and used to develop a new model (Information Product Evaluation Model) incorporating aspects of the user's context, meta-information availability, and accessibility.

    Document Indexing and Analysis

    Text to Hypertext: Can Clustering Solve the Problem in Digital Libraries? 144-150
      Robert B. Kellogg; Madhan Subhas
    Automatic hypertext generation remains an extremely challenging endeavor in the digital library world. In this paper we present a solution for automatically connecting relevant information in dynamic textual digital libraries. This textual information is generally unconnected and often unexplored due to the large flow of information entering from remote and local sources. Often, full-text indexes exist for this information but embedded links to related information are conspicuously absent. Links that do exist are usually generated in an arduous and time-consuming manual process. That is why the ability to automatically generate links has a potentially high payoff.
       Our solution for the automatic generation of hypertext links relies on the techniques of document segmentation and document clustering. Hypertext links are automatically generated during the document clustering process using the incremental cover-coefficient-based clustering algorithm. The issues of link completeness and link quality are also addressed in this paper. Link completeness is studied by comparing the cluster-based approach of link generation to the exhaustive link generation approach. Results indicate that links are more complete in the higher similarity range than in the lower similarity range. Initial link quality user studies indicate that the cluster-based hypertext link generation approach is promising. In the future, we plan to conduct further studies on link quality and investigate ways to increase the effectiveness of our approach.
    Indexing Handwriting Using Word Matching 151-159
      R. Manmatha; Chengfeng Han; E. M. Riseman; W. B. Croft
    There are many historical manuscripts written in a single hand which it would be useful to index. Examples include the W. E. B. Du Bois collection at the University of Massachusetts and the early Presidential libraries at the Library of Congress. The standard technique for indexing documents is to scan them in, convert them to machine readable form (ASCII) using Optical Character Recognition (OCR) and then index them using a text retrieval engine. However, OCR does not work well on handwriting. Here an alternative scheme is proposed for indexing such texts. Each page of the document is segmented into words. The images of the words are then matched against each other to create equivalence classes (each equivalence class contains multiple instances of the same word). The user then provides ASCII equivalents for, say, the top 2000 equivalence classes.
       The current paper deals with the matching aspects of this process. Due to variations in even a single person's handwriting, it is expected that the matching will be the most difficult step in the whole process. A matching technique based on Euclidean distance mapping is discussed. Experiments are shown demonstrating the feasibility of the approach.
    Building a Scalable and Accurate Copy Detection Mechanism 160-168
      Narayanan Shivakumar; Hector Garcia-Molina
    Often, publishers are reluctant to offer valuable digital documents on the Internet for fear that they will be re-transmitted or copied widely. A Copy Detection Mechanism can help identify such copying. For example, publishers may register their documents with a copy detection server, and the server can then automatically check public sources such as UseNet articles and Web sites for potential illegal copies. The server can search for exact copies, and also for cases where significant portions of documents have been copied. In this paper we study, for the first time, the performance of various copy detection mechanisms, including the disk storage requirements, main memory requirements, response times for registration, and response times for querying. We also contrast performance with the accuracy of the mechanisms (how well they detect partial copies). The results are obtained using SCAM, an experimental server we have implemented, and a collection of 50,000 netnews articles.
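       Detection of partial copies can be illustrated with a minimal sketch based on overlapping word chunks. This is not SCAM's algorithm, which the abstract does not describe: the three-word chunk size, the containment measure, the 0.5 threshold, and all function names are assumptions chosen for illustration.

```python
def shingles(text, k=3):
    """Return the set of k-word chunks (shingles) in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(candidate, registered, k=3):
    """Fraction of the registered document's chunks that appear in the candidate."""
    reg = shingles(registered, k)
    if not reg:
        return 0.0
    return len(reg & shingles(candidate, k)) / len(reg)


def find_copies(candidate, registry, threshold=0.5, k=3):
    """Return names of registered documents that the candidate overlaps heavily."""
    return sorted(name for name, text in registry.items()
                  if containment(candidate, text, k) >= threshold)
```

       An exact copy scores 1.0, and a document that reuses a large portion of a registered text scores above the threshold even when other material surrounds it. A smaller chunk size catches more partial copies at the cost of more false matches.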

    D-Lib Working Session 1A

    Metadata to Describe Information in Digital Libraries 170
      Terence R. Smith
    The session will include presentations and discussion of the issues in using metadata to describe information in digital libraries. One example will be the role of metadata within the Alexandria Digital Library project at UC Santa Barbara.
       If sense is to be made of the flood of information that will be available through digital libraries, it must be described effectively, so that it can be found, its value assessed, and its acquisition handled efficiently. Metadata is the term most often used to refer to the description of information objects to support these three functions of digital libraries. Digital library technology is capable of both supporting major augmentations to traditional metadata activities and providing a basis for catalog interoperability.
       Important issues relate to the choice of languages for representing concepts and conceptual structures. Metadata for spatially-referenced information, for example, may follow a "bottom-up" approach that is an extension of traditional library practices or a "top-down" approach involving more general knowledge representation languages.
       Background materials for this session are at: http://www.dlib.org/metadata.html

    D-Lib Working Session 1B

    User Needs Assessment and Evaluation 170
      Nancy A. Van House; David Levy
    A critical issue in digital library (DL) design is incorporating user needs early in the design process and continuing throughout. The user needs and assessment groups of the DLI projects are working to improve DL design by incorporating user needs and preferences. They are working to develop data collection and analysis methods for DLs, understand DL user behavior, assess user needs, evaluate the emerging DLs against user needs, compare findings across projects, understand how this information can be efficiently and effectively incorporated in design, and build a research agenda.
       This working session will consist of a panel representing both the user needs assessment and evaluation group and designers from several of the DLI projects. The emphasis will be on the interaction between the design process and the needs assessment and evaluation effort. It will address such issues as the interconnected and sometimes-conflicting needs of designers, evaluators, and users; coordinating evaluation and design approaches; and impediments to and supports for this interaction.
       Background materials for this session are at: http://www.dlib.org/user-needs.html

    D-Lib Working Session 2A

    Social Aspects of Digital Libraries 170-171
      Christine L. Borgman
    In February 1996, UCLA and the National Science Foundation are holding a workshop on social aspects of digital libraries. This working session will present an outline of the issues raised at the workshop, and invite audience reaction and discussion. The research workshop plans to focus on the following topics:
  • Information needs: (a) Social context and culture -- to what extent can
       digital library components be generalized and to what extent must they be
       tailored to each environment? (b) Information needs and information seeking
       -- what is the relationship between information seeking and learning in
       digital libraries? (c) Linking user-learner needs and behavior to digital
       library design -- what design techniques are appropriate in applying user
       needs research to digital library design?
  • End user searching and filtering: (a) Organization, description and
       representation of information -- which methods of organization can be
       generalized for digital libraries? What new methods are needed? (b) Search
       capabilities for users -- how, if at all, should problem domain areas be
       divided? (c) Interface design for information retrieval -- what
       human-computer interaction principles can be applied to the information
       retrieval environment?
    Background materials for this session are at: http://www.dlib.org/social.html

    D-Lib Working Session 2B

    Repository Interactions 171
      William L. Scherlis
    This working session will be based around the report of a D-Lib workshop in March 1996 on interfaces between digital library repositories. The focus is on technical issues, but they are closely linked to legal, social, economic and political questions.
       The working group focuses on technical issues associated with repository interoperation. As digital libraries proliferate, many approaches to managing digital assets and associated meta-data are emerging. There are important differences among these approaches, and these differences have technical, legal, social, economic, and political dimensions. How can multiple repositories coexist and interact effectively?
       The working group is motivated by several important trends: The complexity and semantic richness of objects and meta-data managed by repositories is increasing. Information objects of greater value are now being managed more routinely, raising issues of security, access control, and support for commerce. Performance demands are increasing, as is the quantity and size of information objects, particularly in multimedia applications. Digital libraries are interacting more often with personal, group, and wide area information services. Finally, the distinction is blurring between digital libraries and other institutional information resources such as databases and corporate webs.
       The starting points for the working group are technologies that support management of information objects, their names, and associated meta-data -- databases, distributed file systems, object bases, and the Web. Several digital library research groups have started to develop concepts that could provide a basis for repository interoperation, including the CR-TR architectural work of Kahn and Wilensky, the Stanford Infobus project of Garcia-Molina and Winograd, and the agent architecture of the Michigan DLI project. In addition to the need to reconcile these various approaches, there is a broader need to put them in the context of standards efforts in the wider community, including CORBA, OLE, Web-associated standards, Z39.50, and SQL and its successors. All of these deal with resolving names to objects, and all of them deal in some measure with meta-data.
       The initial effort of the working group is (1) to identify the dimensions of the space of repository interaction and interoperability, and the issues associated with achieving some transparency for users of the digital libraries, and (2) to assess current research and development efforts to understand the differences among them.
       Background materials for this session are at: http://www.dlib.org/repositories.html

    D-Lib Working Session 3A

    Digitization and Conversion BIBAPDF 171-172
      M. Stuart Lynn
    The objective of the working group on digitization and conversion is to share experience, evolve a code of practice, and encourage sharing of resources amongst those who manage large projects to convert library materials to digital forms. This working session will include brief presentations and discussion of the major issues. The session will cover technical, library and operational topics.
       This topic is relevant to almost every aspect of digital library research. For example, sharing converted library objects requires technical work on formats and repository interactions. It also provokes the need for comprehensive metadata to describe converted items, suitable naming methods for identification and access, and a method for describing and protecting intellectual property. Research on the digital archiving of converted digital objects also applies to ensuring the longevity of digital objects in general.
       Background materials for this session are at: http://www.dlib.org/conversion.html

    D-Lib Working Session 3B

    Naming Objects in the Digital Library BIBAPDF 172
      William Y. Arms
    This working session will include presentations and discussions on naming issues in digital libraries. It will include reports on recent progress in the development of Uniform Resource Names (URNs), but the emphasis will be on three topics that concern how names are used.
  • User issues. Groups that assign names to library objects wish to provide
       long term flexibility while integrating existing naming schemes. They need
       ways to relate names of library objects to semantic concepts such as
       uniqueness, mutability, etc.
  • Name management. If names are to be globally unique and persist for long
       periods of time, the allocation of top-level names and the registration of
       naming schemes must be managed with care.
  • Aggregation and granularity. A crucial design decision in a digital library
       is how to assign names to parts of a work, variants, and other complex and
       compound items.
       Background materials for this session are at: http://www.dlib.org/naming.html

    Panels -- Abstracts

    Agricultural Network Information Center (AgNIC) A Model for Access to Distributed Resources BIBPDF 174
      Richard E. Thompson
    Knowledge-Based Biomedical Information Retrieval BIBPDF 175
      Alexa T. McCray

    Posters

    SEPTEMBER -- Secure Electronic Publishing Trial BIBAPDF 177
      Jack Brassil
    The SEPTEMBER system uses World Wide Web technology to distribute IEEE Communications Society technical publications on the Global Internet. The trial recently began with the publication of a single issue of the IEEE Journal on Selected Areas in Communications.
       The SEPTEMBER system became available on October 1, 1995 at the URL:
       Full text articles are provided in multiple formats, including HTML and PDF. Additional services include access protection, a prototype billing system, and a novel copyright protection technology. More than 1200 users have registered with and used the system; user feedback has been overwhelmingly positive.
       The trial has provided an opportunity to explore the complexity of electronically disseminating existing paper journals. We have also gained insight into how subscribers wish to read online technical journals. I will discuss system implementation, reader demographics and behavior, and the future of IEEE Communications Society online publications.
       A complete article discussing the SEPTEMBER project is available at http://www.research.att.com/~jtb/psdocs/september.ps.Z
    MITRE Information Discovery System BIBAPDF 177
      Raymond J. D'Amore; Daniel J. Helm; Puck-Fai Yan; Stephen A. Glanowski
    The MITRE Information Discovery System (MIDS) is a baseline system for integrating advanced processing tools for information discovery and retrieval in large-scale distributed environments. The system is built on a modular, extendible architecture that allows for system-level decoupling and allocation of component processing tools across network nodes to provide for efficient processing in distributed environments. At one level, the system provides for multi-platform user access to HTTP, Gopher, FTP, and news servers using an HTML based client interface. However, more significantly, the system provides advanced tools for metadata generation from disparate network objects, and a content routing mediation layer for classification of metadata into appropriate information brokers. This bottom-up layered information organization supports a wide range of information retrieval and browsing strategies.
       MIDS is being used in an enterprise intranet application to provide access to corporate information bases. Preliminary assessment indicates the need to balance available information retrieval and classification capabilities with a new generation of highly efficient post retrieval analysis tools for extracting, organizing, and visualizing information within extensive results sets. These back-end processing tools will be user accessible "on demand" through an object oriented interface to provide users with methods for maintaining personal views of large, heterogeneous information spaces.
    Creating a Networked Computer Science Technical Report Library BIBAPDF 177-178
      James R. Davis
    Computer scientists have long been using the Internet as a medium for transporting reports and documentation of many kinds, including, but not limited to, technical reports about computer science. But this material has typically been difficult to locate, search, and use, and has lacked the organization and structure we expect from a true library. This poster describes the Networked Computer Science Technical Report Library (NCSTRL), an attempt to create a useful online library of computer science technical reports.
       NCSTRL provides scholarly and financial advantages to all its users. Researchers can easily search a body of material that is now scattered, slow, and difficult to access. Authors gain a wider audience than they now enjoy. In particular, since NCSTRL searches all sites, authors at less well-known institutions have an equal chance of at least having their reports noticed. Both these advantages grow as more sites participate. Departments gain a clean, effective management system for their technical reports and eliminate much of their current copying and mailing charges. The savings at Cornell alone are estimated to be in the thousands of dollars.
       The technology underlying NCSTRL is a network of interoperating digital library servers. The digital library servers provide three services: repository services that store and provide access to documents, index services that allow searches over bibliographic records, and user interface services that provide the human front-end for the other services. The services interoperate using an open protocol, so that other software systems can use the servers also.
       NCSTRL is powerful, yet also easy to install and maintain. The server software comes in two levels, Lite and Standard. The Lite version is intended for sites with few resources, and has a lower startup investment, while the Standard version offers greater functionality. Sites participating in NCSTRL will be able to install either. No matter which they install, the complete technical report collection will be available to all parties. NCSTRL has a uniform user interface, hiding almost all the underlying diversity. Users do not need to know which level of software a site is running, and departments will have a smooth upgrade path from the basic to the advanced should they desire additional capability.
       Technology alone is not enough to create a useful library. The poster will present our experiences in setting reasonable policies for fair use of scholarly material.
    The Common Ground Surrounding Access: Theoretical and Practical Perspectives BIBAPDF 178
      Geri Gay; June P. Mead
    Research indicates that technologically-rich environments demand equally rich data collection and analysis tools -- ones capable of examining human-computer interactions as well as the social and cognitive dynamics that develop during computer mediated collaboration. Further, our research has demonstrated the need to address the social, psychological, and pedagogical aspects of online collaboration. We have found that by studying the multiple ways in which users interact with these new systems, we can develop tools that add value to digitized images, that allow scholars to annotate, manipulate, and organize the data they collect in creative multimedia compositions. What we have found convinces us that parallel development and evaluation combine synergistically to enhance the overall design process.
       Our poster explores the common ground surrounding access to digital libraries. It addresses such questions as: What promise does access to digital libraries hold? How does access to digital libraries change patterns of communication? What does access to primary sources mean to teachers, librarians, researchers and students? What tools do they need for multimedia composition? What strategies do people employ as they annotate, manipulate and organize the data they search for and collect? What new forms of message construction need to be understood within these collaborative environments? And finally, what happens after access -- how do people use digital information after they find it?
    Digital Libraries and Impacts on Scientific Careers BIBAPDF 178-179
      Richard Giordano
    The recruitment of subject specialist PhDs into information work is not a simple act of recruitment because it amounts to a large cultural change for those recruited, and introduces alien work practices and expectations in the existing organizational culture. Young scientists have gone through a period of acculturation and socialization as scientists, and many have career aspirations that they compromise because of the lack of suitable employment. Thus, although the development and maintenance of collaborative systems and digital information environments represents a bright prospect for young scientists, it is not without potential problems that might affect the quality of work. In preliminary research conducted by the author, many such PhDs felt undervalued both by senior research scientists and by members of the library and information science community; they believed that their scientific careers were over, and they longed to publish an experimental paper.
       Our current research questions include:
  • (1) to determine if, and to what extent, young scientists are satisfied with
        their career choice;
  • (2) to differentiate among different types of scientists
  • (3) to analyze the social forces at work in their environments that enhance or
        inhibit their professional and social standing within the scientific
  • (4) to ascertain alternatives to 'traditional' scientific education
  • (5) to ascertain proper rewards, management, and recognition from the library
        and information science community for scientists in their employment.
    An Object-Oriented Hypermedia System for Structured Documents BIBAPDF 179
      Hyunki Kim; Hakgene Shin; Jaewoo Chang
    In this paper, we design a new hypermedia markup language using SGML and implement an object-oriented hypermedia system on top of Postgres, a next-generation database management system. Compared with conventional systems, our hypermedia system has several advantages. First, since our hypermedia markup language is designed using SGML, the language can interchange documents in a system-independent manner and can support content-based and structure-based retrieval. Second, since we apply an object-oriented paradigm for modeling hypermedia data and links, we can inherit the properties and methods of the object-oriented model. Finally, our hypermedia system can provide database management system (DBMS) facilities such as transaction management, storage management, security, crash recovery, and version control.
    The Cultural Heritage Information Online Project: Demonstrating Access to Distributed Cultural Heritage Museum Information BIBAPDF 179-180
      William E. Moen; John Perkins
    Project CHIO is a demonstration project that provides access to cultural heritage information online (CHIO). The project is sponsored by the Consortium for the Computer Interchange of Museum Information (CIMI). The poster session:
  • Provides background on this collaborative project
  • Describes how ANSI/NISO Z39.50, the information retrieval protocol, is used
       in this project
  • Details the information modeling, query semantics, and search and retrieval
       behavior upon which the use of Z39.50 is based.
       Project CHIO demonstrates how Z39.50 offers solutions to the difficulties in achieving meaningful online search and retrieval of information of different types and structure (e.g., structured records, full-text documents, images) regardless of the hardware and software used to store information or search for it.
       The initial implementations will consist of Z39.50 clients and servers supporting access to a demonstration CHIO Information Resource. The CHIO Information Resource can be modeled as a digital library comprised of hierarchical, distributed collections of information. A user may search the CHIO Information Resource to retrieve museum objects including: images, object records, exhibition catalogs, and wall labels.
       CHIO demonstrates the utility of national and international standards both to build digital libraries and to provide meaningful online search and retrieval of information in digital libraries.
    Inverse Mapping in the Handle Management System BIBAPDF 180
      Varna Puvvada; Roy H. Campbell
    A handle identifies objects stored within a distributed system like a digital library or the WWW. Examples of handles are provided by the "Handle Management System" (HMS) built by CNRI. HMS retrieves location dependent data associated with a "handle", a printable string which unambiguously and unforgeably identifies data. Handles have use as URNs (Uniform Resource Names) that identify one or more URLs (Uniform Resource Locators) and their corresponding documents on the Web. In this case, the handle is a location independent name for the location dependent URLs and their referenced data. Handles also have use to identify objects stored within a digital library. In this instance the handle is a location independent name for one or more location dependent "object pointers" that identify storage facility and/or rights management system locations. Other applications of handles are E-mail addresses, Internet host names and IP addresses.
       The inverse mapping problem is to find an economic solution to retrieving the handle for a specific data item or location dependent name. The inverse mapping problem for the HMS is to find the handle for a given data item. The problem is complicated because the mapping of handles to data items is many to many, the data items may not be all unique, and the system is distributed.
       Inverse mapping is needed to manage the data items and name space. For example, it is easier to maintain a Web document if it is constructed with location independent URNs. However, many documents are built with local copies of the data and URLs. Converting these documents is desirable. Inverse mapping is also needed to check whether data items are aliased to multiple handles.
       The inverse mapping problem for data items can be solved using a combination of search techniques according to the nature of the data. Inverse mapping for short data items can follow a scheme similar to that used in the Domain Name Server for a pointer query. An inverse handle is associated with every data item in this category and this handle names a list of handles that refer to the data item. This inverse handle would belong to an inverse-mapping authority, for example hdl_inverse. The locally_unique_part of this handle should be constructed from the data item and could be built using a suitable hashing function, for example an MD5 checksum. This inverse service could either be a dedicated service or could be delegated to all the servers. For long textual data items a hierarchy of indices can be constructed based on keywords, similar to keyword systems. These indices are maintained by servers spread throughout the network. But this mapping is not immediate. The degree of consistency of the system depends on the frequency of update of these indices.
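       The short-data-item scheme described above might be sketched as follows. This is only an illustration under stated assumptions, not the HMS implementation: the "/" separator between authority and local part, the UTF-8 encoding of the data item, the in-memory `InverseIndex` class, and the example handle names are all hypothetical. Only the `hdl_inverse` authority name and the use of an MD5 checksum come from the abstract.

```python
import hashlib
from collections import defaultdict

# Inverse-mapping authority named in the abstract.
INVERSE_AUTHORITY = "hdl_inverse"


def inverse_handle(data_item: str) -> str:
    """Build an inverse handle whose local part is the MD5 of the data item.

    Assumptions: the data item is encoded as UTF-8 and the authority is
    joined to the local part with "/".
    """
    digest = hashlib.md5(data_item.encode("utf-8")).hexdigest()
    return f"{INVERSE_AUTHORITY}/{digest}"


class InverseIndex:
    """Hypothetical in-memory stand-in for an inverse-mapping service."""

    def __init__(self):
        # inverse handle -> set of handles that refer to the data item
        self._handles_by_inverse = defaultdict(set)

    def register(self, handle: str, data_item: str) -> None:
        """Record that `handle` resolves to `data_item`."""
        self._handles_by_inverse[inverse_handle(data_item)].add(handle)

    def lookup(self, data_item: str) -> list:
        """Return, in sorted order, every handle registered for `data_item`."""
        return sorted(self._handles_by_inverse.get(inverse_handle(data_item), set()))


index = InverseIndex()
index.register("example.auth/report-1", "http://example.org/report.ps")
index.register("example.auth/report-1-alias", "http://example.org/report.ps")
aliases = index.lookup("http://example.org/report.ps")
```

       A lookup that returns more than one handle, as `aliases` does here, corresponds to the aliasing check mentioned in the abstract. A distributed deployment would replace the dictionary with the dedicated or delegated inverse service.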
    The Electronic Reserve System at Penn State U. BIBAPDF 180-181
      Joan A. Reyes
    The goal of our project was to move away from a static, paper-based reserve reading room collection which could only be used within the confines of the library building to an electronic reserve system which could be accessed by the students and faculty of the University from anywhere on or off campus. This system employs computing technologies to convert documents into digital formats, to store the digital copies and to provide access to them. The pilot project has allowed me to research areas that would encompass the design, development and use of this type of system. Some of the topics I have researched are: security, copyright, the types of material that students will use, access, locations, terminal types, equipment, the capabilities of the system, and staff training issues.
       The pilot project is in its first phase and has used material from twenty-four courses which cover the humanities and the sciences with the potential of allowing the use of 270 courses at the Pattee Library location alone. The project will allow for the expansion of the system to the seven campus libraries at University Park and to the twenty-two Commonwealth campus libraries throughout Pennsylvania in the future.
    Providing Multiple Levels of Difficulty in EarthLab's Digital Library BIBAPDF 181
      Ruth A. Ross; Lois F. Kelso; Gary R. Broughton; Edward J. Hopkins
    The EarthLab Learning Environment, designed to encourage student research projects in earth science education, included a digital library of topics and case studies for students to explore as they expanded their abilities to solve problems and to find new problems to pose. To make EarthLab's library more accessible to all students, a varying level of difficulty was developed by creating three versions of each library document and implementing access mechanisms to facilitate selecting a level or changing it. Levels defined: (1) Easy [4th grade]; (2) Intermediate [7th]; (3) Advanced [10th]. "Difficulty" was measured by readability, concept density, and prerequisite knowledge. Data displays and other visualizations and interactive simulations were redesigned for each level.
       The educational objective was to accommodate and then challenge learners. The software was designed to help students find a comfortable level, then encourage moving up a level, as in a video game, ready for new challenges with increasing skills and knowledge.
       Effective with students from various backgrounds and achievement levels, including adult learners, this approach to adding flexibility to educational software has shown exceptional impact on students with special learning and reading problems and on gifted students. It may be useful in many training and learning systems.
    A Framework for Pricing Services in Digital Libraries BIBAPDF 181
      J. Sairamesh; Y. Yemini; D. F. Ferguson; C. Nikolaou
    Digital Libraries will have a major influence on the design of future information systems. They will set the stage for future complex information technologies to evolve and provide "transparent" services to a variety of users. We consider commercial Digital Libraries as information economies consisting of several players (or economic agents): authors and publishers who create and sell their collections, suppliers (e.g. computer systems) who provide information storage, indexing and access services, information-agents who provide searching and presentation services, and users who request services.
       In such an economic framework, one can envision suppliers and information-agents competing to provide services for information storage, searching, access and presentation. In providing such services, several issues arise, among them pricing and Quality of Service (QoS) to access and view information objects. These issues play an important role in allocating resources such as processing time, network bandwidth and buffers, memory, cache and I/O in order to provide the various services. Using this framework, we present the interactions among the players, the dynamics of the economic agents, service models, pricing and billing mechanisms (QoS based), and corresponding implementation issues in large digital libraries.
    Establishing Computer-Based Information Services in the School Library BIBAPDF 182
      Mag. Werner Schoggl
    With school budgets down we are challenged to both improve our educational system and lower the costs of it. What can be expected from the integration of digital resources (online-services, Email, CD-ROM) in the school-library for the quality of education and for the role of the school libraries on the one hand and for the school budget on the other hand? We have been offering access to the Internet and to a local computer net for both teachers and students for several months. In addition we have produced some prototype samples of teaching material consisting of Web-pages with links to both offline and online sites. Some time ago we also established access to the services and resources of the national press agency. Last but not least we have organized some courses for the teachers of our school to acquire the basic skills for retrieving and editing documents from online and offline resources. We have also organized a small group of librarians (not restricted to Austria) that work together to improve the methods of integrating online services in school libraries. So far we can say that the integration of computerized information retrieval improves the qualities of a school library dramatically. This may eventually result in a decisive change of the way both students and teachers acquire new skills and knowledge. For the future we plan to develop methods for storing and searching material that has already been downloaded and to initiate a national (eventually international) online service for school libraries.
    MESL Project Description BIBAPDF 182
      J. Trant
    The Museum Educational Site Licensing Project (MESL) brings representative museums, colleges, and universities together to define the terms and conditions for educational use of museum images and information on campus-wide networks. During this two-year collaboration, launched in 1995, fourteen selected educational and collecting institutions are collaborating to agree on the terms of the capture, distribution, and use of digital images and their associated texts. MESL participants are exploring and evaluating the educational benefits of digital access to museum collections through campus networks. Administrative, technical, and legal mechanisms are being developed and tested to enable the future use of large quantities of high-quality museum images by all educational institutions.

    Workshops -- Abstracts

    The Text Encoding Initiative Guidelines and Their Applications to Building Digital Libraries BIBAPDF 184-185
      Nancy Ide; Judith Klavans
    The Text Encoding Initiative's Guidelines for Electronic Text Encoding and Interchange of Machine-Readable Texts were published in May 1994, after six years of development within the academic and research communities. The SGML-based Guidelines provide standardized encoding conventions for a large range of text types and features relevant for a broad range of applications, including natural language processing, information retrieval, hypertext, electronic publishing, various forms of literary and historical analysis, lexicography, etc. The Guidelines are intended to apply to texts, written or spoken, in any natural language, of any date, in any genre or text type, without restriction on form or content. They treat both continuous materials (running text) and discontinuous materials such as dictionaries and linguistic corpora. As such, the TEI Guidelines offer the best encoding solution currently available for the development of digital libraries, where varied and complex texts must be stored and manipulated in ways that answer a wide variety of user needs, and where the linkage of multi-media is essential.
    User Needs Assessment and Evaluation: Issues and Methods BIBAPDF 186
      Ann Bishop; Barbara Buttenfield; David Levy; Nancy Van House
    Evaluation of digital libraries (DLs) begins, ideally, before design: effective design is based on the needs of users.
       DL needs assessment and evaluation requires a panoply of complementary methods. DLs are similar to, but distinct from, traditional libraries and other kinds of computer-based systems. Their evaluation requires the adaptation of existing methods and development of new ones.
       Choice of method is driven by research goals and the conceptual and practical context of DLs.
       The goals of this workshop are for participants to acquire a better understanding of existing DL methods and needed developments, and of the underlying conceptual bases for needs assessment and evaluation; and to engage in a discussion about methods and a research agenda. The workshop will begin and end with a discussion of DL research goals and their relationship to methods. Several major data collection methods will be addressed in depth, including:
  • Ethnographic methods
  • Interviews, focus groups, and surveys
  • System monitoring and user feedback
       Each session will consist of a presentation by a researcher with expertise in the method, followed by substantial discussion of the information derived using the method; practicalities of using it; its strengths and weaknesses; and the conditions under which it is appropriate.