%M C.IR.95.358 %T Funding for IR Research %S Panels %A Efthimiadis, Efthimis %A Zemankova, Maria %A Corn, Milton %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 358 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p358-efthimiadis/p358-efthimiadis.pdf %M C.IR.95.358 %T Education for IR %S Panels %A Taghva, Kazem %A Fox, Edward %A Robertson, Stephen %A Belkin, Nicholas %A Lewis, David %A Harman, Donna %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 358 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p358-efthimiadis/p358-efthimiadis.pdf %X The SIGIR Education Committee has recently been formed based on the model of the SIGCHI committee which completed its final report in 1992. The committee is charged with developing curriculum recommendations for IR-related education to serve the computer science, library science, and information science communities. It will be soliciting input from information retrieval educators and the consumers of IR education (students and employers), with a view to determining the current status of IR education, the marketplace, and future direction. The committee is also interested in: * clearinghouses for IR courseware and training materials * electronic as well as traditional courses * demonstrations for online access to state of the art systems * other innovative efforts The purpose of this panel is to report briefly on the activities of the Education Committee and to stimulate discussion on the state of information retrieval education. 
The panel will consist of three IR educators from different communities (computer science, library science, information science) who will give brief (about 10 minute) presentations on their view of the purpose and content of IR curricula; a representative from the government sector will report on the role of IR in government agencies; and a participant from the industrial sector will consider the role of IR education in industry. %M C.IR.95.359 %T VUSE for INSPEC; and EPOQUE for Windows %S Systems Demonstrations: Abstracts %A Pollitt, Steve %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 359 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.hud.ac.uk/schools/cedar/cedar.html %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X CeDAR -- The Centre for Database Access Research, School of Computing and Mathematics, University of Huddersfield, UK, has pioneered the use of view-based techniques to improve the effectiveness of user-interfaces to both bibliographic and corporate databases. Two systems are presented: VUSE for INSPEC: This front-ending software searches the 5 million record INSPEC database and is a by-product of a research project launched on 1st Sept. 1991. The project has been funded by the University of Huddersfield in collaboration with the Institution of Electrical Engineers, Marconi Research Laboratories and STN-International (FIZ-Karlsruhe). The VUSE (View-based User Search Engine) system removes the need for the user to appreciate explicit Boolean statements by introducing a search strategy of successive refinement through the use of filtering views. These techniques are described in "Peek-a-Boo revived -- End-user searching of bibliographic databases using filtering views." 
by A Steven Pollitt, Martin P Smith and Geoffrey P Ellis, Online 94, 18th International Online Information Meeting, London, December 1994 pp 63-72. This PC-resident software has been used in the investigation of ranking and relevance feedback extensions to VUSE, the subject of research to PhD being undertaken by Martin P Smith. EPOQUE for Windows: Presented in collaboration with the Directorate of Informatics and Telecommunications at the European Parliament in Luxembourg. CeDAR is responsible for specifying the thesaurus interface that provides the new guided search mode to EPOQUE (European Parliament Online QUEry system). EPOQUE for Windows, made available in April 1995, is designed to facilitate querying of the European Parliament's main documentary database through the incorporation of VUSE techniques. EPOQUE documents are indexed by the multilingual EUROVOC thesaurus which provides a significant demonstration of the suitability of view-based techniques for multilingual retrieval. An example of how this approach has been demonstrated on the Apple Macintosh can be found in: "Using the thesaurus to view and filter environmental databases: An example using EUROVOC to search EPOQUE -- the European Parliament Online Query System." by A Steven Pollitt, Geoffrey P Ellis and Martin P Smith, The First European ISKO Conference on Environmental Knowledge Organisation and Information Management, 14-16 Sept. 1994, Bratislava, Slovakia, in Stancikova P and Dahlberg I (Eds) Knowledge Organization in Subject Areas, Vol. 1 (1994) Supplement pp 21-32 Pub: INDEKS VERLAG Frankfurt/Main. %M C.IR.95.359 %T DOTPLOT %S Systems Demonstrations: Abstracts %A Church, Kenneth W. %A Helfman, Jonathan I. 
%B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 359-360 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X An interactive program, "dotplot," has been developed for browsing millions of lines of text and source code, using an approach borrowed from biology for studying homology (self-similarity) in DNA sequences. With conventional browsing tools such as a screen editor, it is difficult to identify structures that are too big to fit on the screen. In contrast, with dotplots we find that many of these structures show up as diagonals, squares, textures and other visually salient features, as will be illustrated in examples selected from biology and two new application domains: text (AP news, Canadian Hansards) and source code (5ESS®). %M C.IR.95.360 %T BRUIN: Browsing and Retrieval of text and mUltimedia resources for Information retrieval educatioN %S Systems Demonstrations: Abstracts %A Efthimiadis, Efthimis N. %A Parodi, Ricardo %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 360 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X The BRUIN prototype presents ideas for the implementation of a digital library as a resource for supporting information retrieval (IR) education. BRUIN utilizes concept maps, based on the IR literature, such as the Belkin and Croft (ARIST, 1987) classification of retrieval techniques, to provide an overarching structure for the system as well as a visualization mechanism. 
BRUIN uses different technologies, such as Web browsers (Mosaic, Netscape, etc.), graphics viewers, and retrieval engines, and integrates them under a common user interface. The resources are accessed either by searching the database, or by using the concept map to enter the system and browse through the resources. Documents have hypertext links to other documents in the database and are linked through document clustering and citation linking. Documents also contain links to HTML documents (in-house and off-site), PowerPoint slides, graphics, other multimedia elements, screen captures of system displays, and URLs to systems that are mentioned in the database and are accessible on the Internet. %M C.IR.95.360 %T CLARIT %S Systems Demonstrations: Abstracts %A Evans, David A. %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 360 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X The CLARIT system consists of a set of flexible tools for application in a wide range of information management problems. These tools integrate natural-language processing (NLP), automatic knowledge discovery, and traditional information retrieval techniques. An advanced functionality application for free-text database management is demonstrated, incorporating full NLP, a broad range of querying mechanisms, automatic or user controlled query expansion, document collection profiling, document summarization, automatic document classification, and integrated handling of scanned images. The application provides rapid analysis of potentially large queries over large-scale databases in monolithic or client/server processing modes. 
%M C.IR.95.360 %T Head-Coupled Stereo Display for Visualization in a Document Retrieval System using Associative Networks %S Systems Demonstrations: Abstracts %A Fowler, Richard H. %A Fowler, Wendy A. L. %A Kumar, Aruna %A Williams, Jorge L. %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 360 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X The system to be demonstrated provides head-coupled stereoscopic viewing of 3-d visual representations for document collections, associative term thesauri, and individual documents. This style of interface has been called "fish tank VR" and shown to be relatively effective for 3-d viewing and interaction tasks. The system also provides mechanisms to integrate query formulation across the visual representations. Supplying interaction techniques to access the system's several visual representations is one of the system's goals. The demonstration allows users to experiment with 3-d interaction in the highly interactive, iterative process of information retrieval. %M C.IR.95.360 %T MARIAN: Ranked Retrieval in a Full-Scale Library Catalog System %S Systems Demonstrations: Abstracts %A France, Robert %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 360-361 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X Ranked retrieval techniques offer library users the ability to find works based on incomplete or partially incorrect descriptions. In addition, they offer a robust and unusual approach to exploratory or subject-based searches. 
Library data, however, is of a type not usually encountered in ranked retrieval systems: it is highly structured, involves very short text fields, and includes references to other objects such as people and subject categories. In MARIAN we have adapted techniques from vector search systems, from information theory, and from semantic network processing systems to provide effective approximate matching in this domain. Both canned and hands-on demonstrations will be provided on a complete research library collection of c. 1,000,000 records. %M C.IR.95.361 %T WILLOW %S Systems Demonstrations: Abstracts %A Freedman, Matthew %A Heyano, Scott %A Jensen, Ellen %A Jordan, William %A Ketchell, Debra %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 361 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X Willow is a general-purpose, extensible information retrieval tool. It uses database drivers to translate user queries and actions into the idiom expected by the remote search system. Through its Z39.50 driver, Willow can communicate with any search system that understands version 2 of the ISO Z39.50-1994 search and retrieval protocol. This demonstration illustrates how Willow isolates users from the idiosyncratic query and command syntaxes of diverse information retrieval systems. It also demonstrates the list browser mechanism that helps a user choose search terms as she moves through data ordered along an arbitrary axis. Finally, it demonstrates the multimedia extensions that permit Willow to deal with complex data such as sound, images, SGML tagged text, or non-Roman character sets. %M C.IR.95.361 %T WATERS: The Wide Area TEchnical Report Service, Dienst, and NCSTRL: A National Computer Science Technical Report Library %S Systems Demonstrations: Abstracts %A French, James C. 
%A Viles, Charles L. %A Davis, James R. %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 361 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X The Wide Area TEchnical Report Service (WATERS) and Dienst are distributed databases of computer science technical reports. Contributors are departments of computer science that make their reports available through the World-Wide Web. The reports are stored locally at the contributing sites so that users with a client such as Mosaic can browse, search, obtain abstract and bibliographic information, and retrieve technical reports online. NCSTRL, the National Computer Science Technical Report Library, is a joint effort of teams from the NSF sponsored WATERS project and the ARPA CSTR project. This demonstration will be the first public showing of the results of their collaboration. %M C.IR.95.361 %T ECLAIR %S Systems Demonstrations: Abstracts %A Harper, David %A Hendry, David %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 361 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X The Eclair class library is a set of extensible C++ classes for implementing best-match IR applications. Developers use Eclair to either add IR functionality to an existing application or to develop new applications from scratch. Using a loosely-coupled user interface (written in Tcl/Tk), we demonstrate a variety of IR application features and discuss how the code abstractions offered by Eclair were employed. A traditional best-match model as well as a recent probabilistic inference approach (IJdens, Bruza & Harper, submitted) are used in the demonstration. 
Also discussed is a MultiMedia application where pictures are represented by complex indexing features, designed for effective retrieval. We explain how the implementation of IR applications is supported by Eclair. %M C.IR.95.361 %T LyberWorld: A 3D Graphical User Interface for Fulltext Retrieval %S Systems Demonstrations: Abstracts %A Hemmje, Matthias %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 361-362 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X The LyberWorld system introduces a prototypical application of information visualization components for IR user interfaces. The prototype implements visualizations of an abstract information space -- fulltext. It demonstrates a visual user interface for the probabilistic fulltext retrieval system INQUERY. Visualizations are used to communicate information search and browsing activities in a natural way by applying metaphors of spatial navigation and attraction in abstract information spaces. Visualization tools for exploring textual information spaces and judging relevance of information items are introduced and example sessions are provided. The presence of a spatial model in the user's mind and interaction with a system's corresponding display methods is regarded as an essential contribution towards natural interaction and reduction of cognitive costs during e.g. query construction, orientation within the database content, relevance judgement and orientation within the retrieval context. %M C.IR.95.362 %T ITMS %S Systems Demonstrations: Abstracts %A Holsclaw, Russell P. 
%B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 362 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X The technology on which the ITMS is developed is referred to as the Judgment Space (J-SPACE). A Judgment Space is an N-dimensional Euclidean space with a coordinate system in which the reference axes are interpreted as subject matter dimensions. Textual units are assigned point locations in the space and the projections of each point on the reference axes are interpreted as the degree of relevance of that textual unit to that subject matter dimension. The procedure involves: 1) selecting a number of technical expressions and 2) obtaining scaled judgments as to the degree of relevance of each of the technical expressions, terms, etc. to each of the subdomains of the subject matter. The result is a two dimensional matrix reflecting the relevance of each term to each sub-domain which can be interpreted as an N-dimensional Euclidean Space in which is embedded a configuration of K vectors extending from the origin of the space. %M C.IR.95.362 %T FUN: An NF2 Relational Interface with Aggregation Capability for Document Retrieval, Restructuring and Analysis %S Systems Demonstrations: Abstracts %A Jarvelin, Kalervo %A Niemi, Timo %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 362 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X Complex documents are used in many environments, e.g., information retrieval (IR). Such documents contain subdocuments, which may contain further subdocuments, etc. 
In practice, document database users often want to view selected complex documents in different structures and to obtain aggregation information on their subdocuments. Therefore powerful tools are needed for complex document retrieval, restructuring, and analysis. The FUN system provides powerful filter conditions, full restructuring capability and multi-attribute multi-level data aggregation of structured complex documents represented in the non-first-normal-form (NF2) relational model. In particular, the FUN system provides these capabilities in a truly declarative and powerful interface. %M C.IR.95.362 %T Automatic Building of Hypertext Links in Digital Libraries %S Systems Demonstrations: Abstracts %A Kellogg, Robert B. %A Subhas, Madhan %A Fox, Edward A. %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 362 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X Our demonstration, Automatic Building of Hypertext Links in Digital Libraries, seeks to reduce the cost of authoring quality hypertext documents by taking advantage of promising information retrieval techniques. A set of tools will be presented that assist document authors in dynamically creating hypertext documents. The ability of the hypertext engine to semi-automatically and automatically create and remove bi-directional links will be demonstrated. The links will be generated based on similarity between documents and document components that reside in the collection. A World-Wide Web browser will be used to demonstrate the results of the hypertext linking tools. 
%M C.IR.95.363 %T BIRD: Browsing Interface for the Retrieval of Documents %S Systems Demonstrations: Abstracts %A Kim, Hanhwe %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 363 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X BIRD (Browsing Interface for the Retrieval of Documents) provides a visual interface for browsing and sifting through document collections. Documents behave like metal filings, and terms like magnets that attract the documents they index. Lists corresponding to any Boolean query can be built by iterative operations which involve separating a collection of documents into subsets according to one or two terms, merging selected subsets, and manually adding/deleting documents from the sets. Users can examine the documents in the lists at any time, and thus keep track of browsing sessions, while sifting through large collections of documents. %M C.IR.95.363 %T An Interface for Remotely Searching a Newswire Multi-Data-Base System, with Functions for the Automatic Identification of Duplicate Information/Documents by the Use of Text Clustering Techniques %S Systems Demonstrations: Abstracts %A Kirriemuir, John %A Willett, Peter %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 363 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X The system being demonstrated illustrates various relationships between newswire articles/documents; these documents are retrieved, in real time, from a multi-database belonging to a telecommunications company. 
The software is able to identify many near-duplicate records in database search outputs, such as may arise from several sources submitting various rewrites of the same original article to the database, or the abstract and full-text versions of the same article being submitted. The software is activated by additional functions on the end-user database search interface that allow the user, when searching and retrieving documents, some control over the clustering process. %M C.IR.95.363 %T VIBE: Visual Information Browsing Environment %S Systems Demonstrations: Abstracts %A Korfhage, Robert R. %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 363 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X VIBE (Visual Information Browsing Environment) is a visual interface for information systems, focusing on the clustering and organization of documents in a collection, with respect to multiple reference points (e.g., query, user profile, known documents). The highly dynamic, interactive interface can be used in two vector modes, normal (iconic display of hundreds of documents) and ASTRO (display of thousands of documents), and in one Boolean mode, which shows documents for all Boolean combinations of the reference points. Multiple tools are available to help the user organize and view the documents. User-defined thresholds can limit the documents shown to only the more relevant ones. %M C.IR.95.363 %T PIRCS: An Effective Text Retrieval System %S Systems Demonstrations: Abstracts %A Kwok, K. L. 
%B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 363 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X PIRCS (Probabilistic Indexing and Retrieval -- Components -- System) is a highly effective, probability and network-based IR system designed for large scale heterogeneous collections. Factors contributing to its effectiveness include representation enhancements, sophisticated term weighting, learning network capability, and combining multiple retrieval strategies. PIRCS does not need a full inverted file but maintains a direct file that allows a network for retrieval and learning to be created dynamically. Documents are organized into subcollections served by a common master lexicon with cumulative statistics. Documents are then mixed and ranked for retrieval as if from a single collection. PIRCS currently runs on a SparcStation with 128 MB of memory and 7 GB of disk space. %M C.IR.95.363 %T Creation and Navigation of Virtual Semantic Space %S Systems Demonstrations: Abstracts %A Lai, Kok F. %A Wang, Wei-Jun %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 363-364 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X In most text retrieval systems, similarities between words and documents are captured in large similarity matrices which derive their coefficients from various similarity measures. To a typical human observer, the similarity matrix represents abstract mathematical relations where extraction of meaningful relationships is virtually impossible. 
We will demonstrate an interactive system which maps these relations into virtual semantic spaces whereby Euclidean distances between objects are inversely proportional to their similarity. Furthermore, we provide navigational tools that enable one to travel inside this virtual semantic space as one might explore a physical space. The interactivity allows one to exploit human experience rather than technological prowess to comprehend the semantic relations. %M C.IR.95.364 %T Cheshire II: Demonstration of a Next-Generation Online Catalog System %S Systems Demonstrations: Abstracts %A Larson, Ray R. %A Moon, Ralph %A McDonough, Jerome %A Kuntz, Lucy %A O'Leary, Paul %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 364 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X Cheshire II is a next-generation online catalog and full-text information retrieval system using advanced IR techniques. It is a client/server system that uses SGML as the underlying database format in the server search engine, supports probabilistic and Boolean searching, "nearest neighbor" searching and relevance feedback via the Z39.50 IR protocol. A graphical client interface provides access to the system, and to any other Z39.50 compliant servers. The system is being deployed in a working library environment and its use and performance are being evaluated. 
%M C.IR.95.364 %T A Multilingual IR Engine %S Systems Demonstrations: Abstracts %A Leong, Mun-Kew %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 364 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X This is a demonstration of a multilingual information retrieval engine which operates independently of the language of a source document. Individual documents may contain any mix of alphabetic and character based languages. The main focus, however, is on Asian (character-based) languages, and we will show the results of applying various linguistic methods to enhance document retrieval in such languages. These will include code-set based stop-word lists, compound nominal identification, extraction, and indexing, and phrase segmentation, indexing, and retrieval. %M C.IR.95.364 %T Visual Displays of SIGIR Documents %S Systems Demonstrations: Abstracts %A Lin, Xia %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 364 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X In an earlier SIGIR paper (Lin, et al. 1991), a method was proposed to construct a semantic map for information retrieval by a self-organizing algorithm, Kohonen's feature map. An important feature of the semantic map is to help the user visualize contents of underlying documents. This demonstration shows a prototype that implements such a semantic map as a graphical interface for retrieval systems. Using documents from SIGIR 86-93 as a test base, the prototype shows several IR literature maps based on different indexing methods such as title indexing, title-abstract indexing, and fulltext indexing. 
Comparing these maps has led to the exploration of relationships between document visualization and document indexing. Some preliminary results of the exploration will be illustrated during interactive demonstrations of the prototype. %M C.IR.95.364 %T Promenade: An Integrated OODB/IR System for WWW Image Retrieval %S Systems Demonstrations: Abstracts %A McLean, Stuart A. %A Rasmussen, Edie %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 364-365 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X Promenade is an image-document retrieval system which integrates free-text and attribute-value queries in an object-oriented database query language (OSQL from Ontos). The query language provides a protocol upon which we were able to build an HTML interface to make the stored collections available to the World Wide Web through standard Web browsers (Mosaic, Netscape, MacWeb). Promenade currently hosts two image databases for the National Agricultural Library on the World Wide Web: a collection of botanical prints from Curtis Botanical Magazine (1797-1827), and a collection of plant pest and disease photographs from the Michigan State University Cooperative Extension Service. 
%M C.IR.95.365 %T The MG Retrieval System %S Systems Demonstrations: Abstracts %A Moffat, Alistair %A Zobel, Justin %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 365 %* (c) Copyright 1995 Association for Computing Machinery %W ftp://munnari.oz.au/pub/mg %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X The MG system provides facilities for compressing, indexing, and searching large collections of documents and images, and has been the primary tool used by the Melbourne-based CITRI group during (to date) three years of TREC experiments. One of the key features of the MG system is the extensive use of compression, reducing storage space, index construction time, and retrieval time. For example, the 2 GB TREC collection is stored -- including compressed text, compressed inverted index, and other auxiliary files -- in about 750 MB, and is fully built in under 8 hours. Multi-term Boolean and ranked queries are evaluated in seconds, and decompression of answers is also very fast. The MG software is available free of charge by anonymous ftp from munnari.oz.au, directory pub/mg. A tutorial guide appears as an appendix in Managing Gigabytes: Compressing and Indexing Documents and Images, Ian H. Witten, Alistair Moffat, Timothy C. Bell, Van Nostrand Reinhold, New York, 1994. 
%M C.IR.95.365 %T The MIRACLE System: Using Abductive Inference and Dynamic Indexing to Retrieve Multimedia SGML Documents %S Systems Demonstrations: Abstracts %A Muller, Adrian %A Thiel, Ulrich %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 365 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X The retrieval of complex data such as multimedia items and SGML-structured texts can be facilitated by means of a formal representation of syntactic and semantic knowledge about these data. These information sources must be aggregated dynamically at the time of query processing. MIRACLE (MultImedia concept Retrieval based on logiCaL query Expansion) is an interactive, probabilistic retrieval system, which comprises an extended Bayesian network, a multimedia indexing component and an abductive retrieval engine. The inference process exploits and controls the multiple index structures of the network. The prototype is demonstrated on a collection of SGML structured dictionary articles. %M C.IR.95.365 %T Envision: Information Visualization in a Digital Library %S Systems Demonstrations: Abstracts %A Nowell, Lucy T. %A Fox, Edward A. %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 365 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X Envision is a multimedia digital library of computer science literature, with full-text searching and full-content retrieval capabilities, serving computer science researchers, teachers, and students at all levels of expertise. 
The most unusual feature of Envision is its Graphic View window, which provides powerful information visualization facilities that enable users to explore patterns in the literature. Envision's Graphic View window displays search results as a matrix of icons that represent documents. Users have control over the semantics of six graphical devices: icon position along the x-axis and y-axis, the alphanumeric icon label, icon size, icon color, and icon shape. These graphical devices may represent a number of document attributes: probable relevance to query, publication year, document type (e.g., text, video, hypermedia), document size, number of sources, author names, and index terms. Working in tandem with the Graphic View, the Item Summary window presents bibliographic information for icons selected by the user. Document content is presented on demand using Mosaic and a suite of related viewers. Recent studies show strong user interest and satisfaction, and minor changes suggested by users are being incorporated into newer versions of the interface software. Implementation efforts have led to an X Motif version of the Envision interface, which will be shown with a sample of the overall digital library collection. %M C.IR.95.366 %T GUIDO: Graphical User Interface for Document Organization %S Systems Demonstrations: Abstracts %A Nuchprayoon, Assadaporn %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 366 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X GUIDO (Graphical Interface for Document Organization) provides a visual interface for browsing and retrieving documents from document collections. The visual display allows the user to view the document collection according to the chosen reference points. 
Changing reference points provides an opportunity for users to view the document collection from different angles. Reference points can also be dynamically created from a document or a cluster of documents. These reference points also act as queries in the vector space model, where users can draw a boundary around them to include documents with high relevance in the retrieval set. Users can examine any document on the display at any time. %M C.IR.95.366 %T Interactive Filtering with a Gaussian User Model %S Systems Demonstrations: Abstracts %A Oard, Douglas %A DeClaris, Nicholas %A Dorr, Bonnie %A Faloutsos, Christos %A Marchionini, Gary %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 366 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X Text filtering systems are designed to sift through large quantities of dynamically generated texts and display only those which may be relevant to a user's interests. We are particularly interested in interactive filtering environments in which relevance judgments become available in real time. We will demonstrate a prototype interactive system for filtering USENET news which is based on a multidimensional Gaussian user interest model. That model allows us to include differing levels of specificity for different concepts in the interest representation, potentially improving performance when compared to the cosine measure. 
%M C.IR.95.366 %T EFRS: Empirical Fact Retrieval System %S Systems Demonstrations: Abstracts %A Oh, Sam %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 366 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X The Empirical Fact Retrieval System (EFRS) provides access to statistical research findings. Using EFRS, variable name(s) can be searched to find all the associated variables investigated by other scholars. The system displays all the associated variables, their associated statistical information, and the associated document. Searches can be further restricted by indicating significance level and strength of relationship between variables. Different alpha levels, direction of relationships (positive or negative) and strength of relationships can also be specified. To develop this system, an ER diagram of statistical research findings is drawn. This ER schema is then converted into a relational schema. The system is built using the Microsoft Access relational database system. %M C.IR.95.366 %T EUROSPIDER %S Systems Demonstrations: Abstracts %A Knaus, Daniel %A Schauble, Peter %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 366 %* (c) Copyright 1995 Association for Computing Machinery %W http://www-ir.inf.ethz.ch %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X The EUROSPIDER system is a full-fledged Information Retrieval (IR) system to search large and complex data collections for relevant objects. 
Depending on the configuration, the EUROSPIDER system can be used as a standalone IR system, it can be added to a World-Wide Web server to make a data collection accessible through a network, or it can be added to a commercial database (DB) system to provide access to a possibly very dynamic and structured data collection. In the last case, the integration of the EUROSPIDER system and a DB system provides both IR functionality (relevance ranking, feedback searches, document analysis) and DB functionality (data model, query language, transaction processing, access control). An advanced integration of the EUROSPIDER system with a DB system is achieved by using a probabilistic retrieval model which takes into account the DB scheme. The EUROSPIDER system is the commercial version of the IR system SPIDER developed at the Swiss Federal Institute of Technology (ETH) Zurich, Switzerland. Demonstrations are available at [http://www-ir.inf.ethz.ch]. %M C.IR.95.367 %T FISsearch %S Systems Demonstrations: Abstracts %A Scholten, Willem %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 367 %* (c) Copyright 1995 Association for Computing Machinery %W http://futureinfo.com %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X FISsearch is an experimental retrieval system, consisting of an IR module, a bi-directional HTTP-to-Z39.50 gateway, a Document Object Abstraction facility, and a location-independent content delivery system implementing URNs and URLs. This system is specifically being built to provide full-text search capabilities to large archival collections, by indexing OCR output, and at retrieval time delivering the bitmapped image of the original document. 
We believe this particular IR system is unique insofar as its weighting algorithms incorporate a term-noise corruption factor. This additional weight is obtained during the OCR process, as output from the recognition neural net, and normalized to the range 0-1000. Initial research has shown that the overall effectiveness of the system is good, and that a high degree of corruption can be dealt with in the indexes. %M C.IR.95.367 %T InfoCrystal: A Visual Information Retrieval Interface %S Systems Demonstrations: Abstracts %A Spoerri, Anselm %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 367 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X The InfoCrystal uses a simple visual metaphor to enable users to deal with some of the complexities inherent in information retrieval. The InfoCrystal can be used both as a visualization tool and a visual query language to help users search for information. In particular, it allows users to specify Boolean as well as vector-space queries graphically. In this demonstration we provide an overview of the key features of the InfoCrystal and its implementation. We also present the results of two experiments that demonstrate that the InfoCrystal can be successfully used by novice users to specify an information need after receiving only a short training tutorial. 
%M C.IR.95.367 %T Querying, Navigating and Visualizing an Online Library Catalog %S Systems Demonstrations: Abstracts %A Veerasamy, Aravindan %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 367 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X We demonstrate a graphical interface to a library catalog information retrieval system. This interface to a ranked-output IR system combines a novel set of features to help the end-user in a wide range of information gathering situations. The system supports the following: * Navigational features such as browsing tables of contents and browsing lists of articles written by a specific author. * An integrated online thesaurus from which end-users can pick words and phrases to expand their original query. * A visualization scheme that helps the user in understanding how the query result ranking was computed. * Simple drag-and-drop operations of objects into positive and negative areas on the screen for providing relevance feedback information. The above features support the interactive and iterative nature of the user's information seeking process. %M C.IR.95.367 %T New Information Retrieval Capabilities for Russian Texts Based on the Language Processor Russicon %S Systems Demonstrations: Abstracts %A Yablonsky, Serge A. 
%B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 367-368 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X New retrieval capabilities for Russian texts based on the language processor Russicon are introduced: - "word-changing" search, which finds all word-changing forms of a given word; - "paradigm" search, which finds all words that are members of a given word paradigm (or of part of the paradigm); - word search with given grammatical characteristics (part of speech, changeability, animacy, case, number, gender, person, aspect, tense, transitivity, mood, form, reflexive (verb), length of word-building and word-changing stem, etc.); - word search with given word-building (root) and word-changing stem, prefixes, suffixes and endings -- "linguistic wild cards", etc.; - word search enriched by synonyms from a thesaurus; - formation of a natural-language query from a Russian sentence. These capabilities are achieved by using the Russian language processor RUSSICON inside retrieval systems. Technical specifications of the system: the Russian language processor is realised as a C library of the following functions: morphological analyzer, normalizer, syntactic analyzer and semantic analyzer. The processor is designed as a C++ library of these functions, which allows quick generation of different C-based retrieval systems on multiple platforms (DOS/WINDOWS). %M C.IR.95.368 %T LIBRETTO: An Intelligent Information Provider %S Systems Demonstrations: Abstracts %A Yannakoudakis, E. J. 
%B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 368 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p359-efthimiadis/p359-efthimiadis.pdf %X Advances in software engineering are making it possible to design systems that are totally open and can also be tailored to the specific needs of an information provision centre. This demonstration will show how we have utilised the concepts: USER-DEFINED ENTITIES, FREE-TEXT RETRIEVAL, THESAURUS, SDI, OPEN SYSTEM and USBC, in order to build the integrated package called LIBRETTO for the total control of a modern library or information centre. %M C.IR.95.369 %T On Lexical Cohesion Patterns, Thesaural Information, and Text Abridgement %S Posters: Abstracts %A Benbrahim, Mohamed %A Ahmad, Khurshid %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 369 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p369-liddy/p369-liddy.pdf %X The advent of the information superhighway brings with it a deluge of multi-modal information, particularly textual information, stored on computer systems world-wide. If information retrieval of abstracts was an intractable problem, then think how hard it would be to look for items of information in distributed corpora of texts. It is important, therefore, to think about text abridgement schemes that can (semi-)automatically summarise texts based not merely on the frequency of keywords in context but on how authors use linguistic devices to convey messages and to fashion literally hundreds of thousands of words into a coherent whole. The literature on text linguistics and the pragmatics of written communication can be of considerable help here. 
Consider, for example, the work of Michael Hoey, who focuses on 'passages of authentic text' and demonstrates that patterns of lexis operate across sentence boundaries and over considerable distances within and between texts. He has argued that lexis and text are an important level of organisation and that they interact constructively to form a regular contiguous unit. We intend to critically analyse his work from a computational standpoint, and to explore whether or not we can simulate 'lexis and text levels of language organisation'. We show how we can analyse lexical repetition and paraphrasing, by making use of an encyclopaedic thesaurus, to abridge texts of specialist domains. We report on a computer system that can extract key sentences in a non-narrative English text, including sentences used to introduce and to close topics, and sentences that elaborate on the principal themes of the text. %M C.IR.95.369 %T What You Get Is What You Want: Combining Evidence for Effective Information Filtering %S Posters: Abstracts %A Dumais, Susan T. %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 369 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p369-liddy/p369-liddy.pdf %X Information filtering refers to the task of selecting objects of interest from an incoming stream of information. As part of NIST/ARPA's TREC Workshop, we used Latent Semantic Indexing (LSI) for filtering 336k documents from diverse sources (newswires, patents, newspapers, technical abstracts) for 50 topics of interest. We developed representations of user interests using two sources of information. A Word Filter used just the words in the topic statements. A RelDocs Filter used just relevant training documents and ignored the topic statement (a variant of relevance feedback). 
The RelDocs filter vector was 30% more effective than the detailed natural language description of interests. Combining these two vectors provided small additional improvements in filtering. On average, 7 of the top 10 documents are relevant using the combined vector method. Performance can further be improved by continually incorporating relevant test documents into the filter vector. Data combination of the Word and RelDocs retrieved sets was not generally successful in improving performance compared to the best individual method, although we believe it might be if additional sources are used. Both query and data combination methods are quite general and applicable to a variety of filtering applications. %M C.IR.95.369 %T Language Processing Techniques for the Implementation of a Document Retrieval System for Turkish Text Databases %S Posters: Abstracts %A Ekmekcioglu, F. Cuna %A Lynch, Michael F. %A Willett, Peter %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 369-370 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p369-liddy/p369-liddy.pdf %X Over the last decade, a certain degree of progress has been achieved in the morphological analysis of Turkish. However, this work has not been used, thus far, to improve the effectiveness of information retrieval systems. This poster considers the development and evaluation of conflation techniques necessary for the implementation of a document retrieval system for Turkish text databases. We have evaluated stemming and n-gram matching for searching six dictionaries of Turkish words. Our results indicate that stemming can bring about substantial reductions in the number of word variants that must be processed in a Turkish free-text retrieval system. 
Thus, the six Turkish corpora result in a mean compression figure of 78.6%, as against a compression figure of just 36.4% when Porter's algorithm is applied to an English text. The n-gram experiments suggest that trigrams perform slightly better than digrams, and the best results, in terms of minimising the retrieval of inappropriate word variants, are obtained by combining both stemming and n-gram analysis. %M C.IR.95.370 %T Evaluation of Probabilistic Retrieval Methods %S Posters: Abstracts %A Gey, Fredric C. %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 370 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p369-liddy/p369-liddy.pdf %X A probabilistic information retrieval method returns documents to the user in descending order of estimated probability of relevance. The advantage of a probabilistic method is that, if the estimate of relevance truly reflects the probability of relevance, the user has an additional piece of information upon which to decide when to halt the search. If, for example, the probability of relevance for the 75th ranked document is 0.01, the user knows that, on average, she will have to examine 100 more documents before finding the next relevant document. Recall and Precision graphs, as well as average precision over all levels of recall, are the usual methods for evaluating information retrieval performance. However, with probabilistic retrieval models, another measure of performance can be introduced -- the accuracy of the probability estimate itself. This presentation shows how the accuracy of the probability estimate can be calibrated and tested with a Chi Square test. We test the probability accuracy on four different probabilistic methods which performed well (in terms of average precision) on the TREC3 collection of documents and queries. 
All of the methods fail a significance test on accuracy of the probability estimates. We explore the importance of prior probability-of-relevance as a reason for inaccuracy. %M C.IR.95.370 %T A Learning Method for Text Categorization: The Category Discrimination Method %S Posters: Abstracts %A Goldberg, Jeffrey Lee %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 370-371 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p369-liddy/p369-liddy.pdf %X The Category Discrimination Method (CDM) is a new learning algorithm designed for text categorization. The motivation is that there are statistical problems associated with natural language text when it is applied as input to existing machine learning algorithms (too much noise, too many features, skewed distribution). The bases of the CDM are research results about the way that humans learn categories and concepts vis-a-vis contrasting concepts. The essential formula is cue validity, borrowed from cognitive psychology and used to select from all possible single word-based features the 'best' predictors of a given category. Using a precategorized test collection of text documents, for each category: * Determine the 'best' predictors (i.e. features) for the category by computing the cue validity of all single word-based features from all training documents and selecting those exceeding a threshold. * Conduct a multi-stage search over a limited search space to learn how these features might best be organized into a logical structure suitable for use as a text categorizer. * Evaluate the performance of the best categorizer by running it on the test documents. The hypothesis that CDM's performance will exceed that of two non-domain-specific algorithms, Bayesian classification and decision tree learners, is empirically tested. 
%M C.IR.95.371 %T VIIP: An Iconic-Indexing Approach for Video %S Posters: Abstracts %A Haddad, Hassen %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 371 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p369-liddy/p369-liddy.pdf %X The aim of this work is to build up an automatic strategy and model for content-based video indexing. We propose a two-step strategy based on image analysis and on an analogy with vector-based models of textual documents. Starting with raw video, the approach first identifies the shots by detecting shot cuts, and builds a set of indexing frames representing the document shots. In a second step, a clustering process is applied to the detected shots, leading to another set of final indexing frames taken as representatives of each cluster. A prototype implementing this approach has been built. It achieves shot-cut detection and clustering by applying similarity measures between the document frames. It also includes a set of other parameters such as an indexing frame selection criterion. This approach has led to a two-level indexing model: a physical and a semantic level, where frames represent structural document elements linked together with one of the "composed-of" or "indexed-with" relationships. According to the first evaluation tests of the prototype, the approach and the model make it possible to express some of the semantics held in the document, like a specific camera motion, an object orientation or presence, etc. 
%M C.IR.95.371 %T The Impact of Information Use in the Context of Pharmaceutical Research and Development %S Posters: Abstracts %A Harrison, Lauren %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 371 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p369-liddy/p369-liddy.pdf %X This study was designed to identify the degree of fit (appropriateness) between the knowledge provided via end-user searching of bibliographic information systems and the information seeking practices of scientists involved in pharmaceutical research and development. The critical incident technique was used to extract information regarding the impact of information retrieved from online bibliographic information resources on its users. Scientists (n=10) actively involved in some aspect of pharmaceutical research and development were interviewed. The intent was to determine how online bibliographic database systems are used, how well they serve the needs of their users, and the nature of their impact on these users. Content analysis of interview transcripts shows that users in this context are engaged in publication/report writing or research design when motivated to use online bibliographic databases. The impact of having information retrieved includes increased publication rates, credibility in report/information generation, credibility in provision of information to the Medical Community, and time savings. Respondents stated that having the retrieved information positively affected their research activities. Evidence suggests that end-user searching has an impact on publication productivity as well as other aspects of productivity. %M C.IR.95.371 %T Children's Browsing and Keyword Searching on the Science Library Catalog: The Effect of Domain Knowledge on Search Behavior %S Posters: Abstracts %A Hirsh, Sandra G. 
%A Borgman, Christine L. %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 371-372 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p369-liddy/p369-liddy.pdf %X Research has shown that adults' subject domain knowledge influences the way they use information retrieval systems. However, the effect of domain knowledge on children's search behavior has not been investigated. This study examines children's search behavior on the Science Library Catalog, a hypertext-based automated library catalog for elementary school children. The Science Library Catalog provides two ways to search for information: a browsing-oriented search method which allows children to navigate through science knowledge hierarchies and a keyword search method which allows children to type in their search queries. We focus on the effect of science domain knowledge on children's search performance, search behavior, and learning as they look for science books on this system. Data were collected through one-on-one interviews, direct observation, and online monitoring of search sessions. We are using a pattern matching program to evaluate sequences of search moves in the monitoring logs and to help us understand how and when children use browsing and keyword search methods. This dissertation will contribute to our understanding of children's search behavior and the factors which influence their behavior. This research also has implications for information retrieval system evaluation and interface design. 
%M C.IR.95.372 %T Image Attributes: An Investigation %S Posters: Abstracts %A Jorgensen, Corinne %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 372 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p369-liddy/p369-liddy.pdf %X With the rapid expansion in imaging technologies, access to collections of digital images is a subject of major interest. Indexing systems and computerized retrieval for images both need data concerning typically described image attributes. To date, there is little research upon which to base choices as to which attributes should be included in these systems. This research is investigating attributes typically described in several types of tasks using pictorial images. Participants performed descriptive, categorizing, and searching tasks, and word and phrase data were subjected to content analysis. Forty-two image attributes and nine higher level attribute classes were described. The data suggest that indexing of literal object is of prime significance, as is indexing of the human form and other human characteristics. "Content/Story" and other abstract attributes are also typically described, suggesting that image indexing may benefit by application of concepts associated with indexing of fiction. Term variability is less than might have been expected, suggesting some constraints may exist on the process of communicating about visually perceived data. 
%M C.IR.95.372 %T Relevance Feedback: Usage, Usability, Utility %S Posters: Abstracts %A Koenemann, Jurgen %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 372 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p369-liddy/p369-liddy.pdf %X I present two experiments that investigate the interactive searching behavior of two groups of people using a best-match, ranked-output retrieval engine (INQUERY) to search a large, full-text document collection. The group for the first experiment consisted of ten users experienced in the use of traditional, boolean online retrieval systems who were novices in the use of best-match, ranked output systems. I describe their behavior and retrieval performance for five searches each in the context of the TREC-3 routing task with a special focus on their use of relevance feedback. The second experiment has been designed to analyze the contribution of relevance feedback more closely: a baseline system without relevance feedback is contrasted with three versions of relevance feedback systems that systematically vary user knowledge and user control with regard to relevance feedback but otherwise maintain the same interface, the same retrieval engine, the same full-text document collection (75000 Wall Street Journal articles from the TIPSTER collection), and the same search topics. I present an initial analysis of behavioral data and retrieval performance data gathered from 60 end-users with no training in information retrieval who each performed searches on the baseline system and one of the relevance feedback systems. 
%M C.IR.95.372 %T An Automatic Method for Document Structuring %S Posters: Abstracts %A Masson, Nicolas %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 372-373 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p369-liddy/p369-liddy.pdf %X This article outlines a method for the structuring of expository texts which are not explicitly structured -- no sections and subsections are present. We first perform a quantitative segmentation, which consists of finding the topic boundaries. This stage uses the tf.idf coherence measure, a standard measure in Information Retrieval. Quantitative segmentation allows one to isolate sets of paragraphs which are "topically coherent"; that is, this process divides the text into several distinct developments or parts. We then perform a qualitative segmentation by establishing the nature of the relations, such as cause, illustration, conclusion, explanation, which are present between thematic blocks and between sentences inside each block. To achieve this, we developed a linguistic analysis based upon clue-word detection and making use of the thematic boundaries previously obtained. These lexical clues can be connectors (e.g. thus), variable expressions (e.g. anteposed prepositional phrases) or invariable ones (e.g. in conclusion), punctuation (e.g. interrogation), verbal tenses and mood (e.g. conditional), or verbs (e.g. to introduce). This text structuring method is the first component of a system for the automatic generation of abstracts. %M C.IR.95.373 %T Ambiguity of Negation in Natural Language Queries %S Posters: Abstracts %A McQuire, April %A Eastman, Caroline M. 
%B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 373 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p369-liddy/p369-liddy.pdf %X We address the problem posed by the handling of negation in natural language queries to information retrieval systems. Negated constructs tend to be ambiguous and difficult to handle in both vector space and Boolean systems. A major problem is identifying the intended scope of the negation. A survey was conducted using sample requests typical of those that might be posed to an information retrieval system. The responses indicate that subjects generally agreed on appropriate scope for some negated constructs but did not agree on others. In general, constructs more complicated than a conjunction of noun phrases were found to be ambiguous; most of these involved prepositional phrases. These results indicate that it is not possible for a natural language interface to automatically translate all instances of negation and that perhaps a clarification dialog should be used. Future work planned includes the design of a natural language system using such a clarification dialog to handle negation and the examination of potentially ambiguous constructs involving negation in a collection of real queries. %M C.IR.95.373 %T Navigation-Based Passage Retrieval %S Posters: Abstracts %A Melucci, Massimo %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 373 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p369-liddy/p369-liddy.pdf %X This work focuses on the navigation of hypertexts for Passage Retrieval (PR). 
In particular, hypertexts that are automatically constructed from a large and heterogeneous collection of full-text documents have been considered for extracting node-passages relevant to the user's information requirements. Full-text documents cover several subjects and therefore contain ambiguous words. In retrieving passages we have to select those excerpts that match narrow queries. To do that, a PR technique has to disambiguate the sense of words occurring in full-text documents. Most approaches to PR do not consider the user's query because passages are often defined before, and independently of, the query. We are studying a navigation-based technique for PR from collections of large documents. The proposed technique is based on a methodology and a prototype for the automatic construction of hypertexts for IR. Users navigate the automatically constructed hypertext to retrieve the passages that are as close to their requirements as possible. In retrieving passages the navigation provides some useful information to disambiguate the sense of the passage terms because each passage term belongs to a semantically meaningful context, namely a passage, and it is related to the terms previously visited. %M C.IR.95.373 %T A Probabilistic Approach to Document Classification %S Posters: Abstracts %A Merialdo, Bernard %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 373-374 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p369-liddy/p369-liddy.pdf %X We propose a probabilistic approach to document classification, and test it on an application where a new article is automatically assigned to a Usenet newsgroup. Each newsgroup is represented by a probabilistic language model (based on unigrams). 
A Maximum A Posteriori rule is used to decide which newsgroup generated the article with the highest probability. We test this approach on newsgroups dealing with various facets of Artificial Intelligence, and we try to guess from the body of an article the precise group it was posted to. First, a set of efficient keywords is automatically extracted from training data using a maximum precision criterion. A keyword-based approach is compared with the probabilistic approach and evaluated on the same test data, both for recall and precision rates, using various sizes for the vocabulary. Experiments indicate that the probabilistic approach is more efficient than the keyword-based approach. In the keyword case, increasing the number of keywords always increases the number of documents selected, and thus the recall rate. This is not so in the probabilistic case, because considering more words provides more information to the decision rule, so that the size of the vocabulary has to be chosen carefully for maximum efficiency. %M C.IR.95.374 %T Does Probabilistic Datalog Meet the Requirements of Imaging? %S Posters: Abstracts %A Rolleke, Thomas %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 374 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p369-liddy/p369-liddy.pdf %X Information retrieval may be described as the process of selecting those documents that logically imply the query. The desired ranking of the documents according to their relevance to the query is obtained by computing a related probability. This computation is based on a mechanism called imaging. Probabilistic Datalog enables the modelling of information retrieval as uncertain inference. 
The expressiveness of probabilistic Datalog is especially suitable for hypermedia retrieval, since it allows for the mapping of the complex structure of hypermedia objects. Classical information retrieval models can be implemented using probabilistic Datalog. This paper discusses whether probabilistic Datalog meets the requirements of imaging. What are the impacts of the different views on possible worlds? Should we implement imaging on top of probabilistic Datalog or should we incorporate imaging into the inference mechanism? The illustration of an implementation of the retrieval example given in (Crestani and Rijsbergen, 1995) on top of probabilistic Datalog shows the modelling of imaging regarding possible worlds as objects. %M C.IR.95.374 %T A New Approach for Textual Information Retrieval %S Posters: Abstracts %A Rungsawang, Arnon %A Rajman, Martin %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 374 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p369-liddy/p369-liddy.pdf %X In textual information retrieval, the vector space retrieval model has proven its robustness in manipulating large collections of unrestricted natural language text. In our approach, we try to improve the retrieval effectiveness of this model by introducing the notion of distributional semantics. The content of retrievable units or text excerpts, as well as user queries, are represented in a unified way as projections in a vector space of pertinent terms. These projections are derived from co-occurrence matrices computed on large reference text corpora collecting the distributional semantic information. Different measures of similarity may be used to characterize the proximity between user queries and related texts. In our first experiments, we use the cosine similarity measure. 
%M C.IR.95.374 %T CONVECTIS Context Vector-Based Indexing System %S Posters: Abstracts %A Sasseen, Robert V. %A Carleton, Joel L. %A Caid, William R. %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 374-375 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p369-liddy/p369-liddy.pdf %X HNC Software Inc. has developed a system called CONVECTIS for automatically indexing free text documents. CONVECTIS uses HNC's context vector representation of text, which encodes similarity of meaning at the word level, and is learned automatically from free text examples. The key new feature of CONVECTIS is its use of supervised learning based on relevance feedback to tune the system's indexing behavior. This approach to indexing is also directly applicable to the document routing task. CONVECTIS has been demonstrated on datasets of gigabyte size and is currently being used by a large newswire company. The learning procedure typically achieves close to 100% precision and recall on training documents. For test documents not trained on, preliminary results indicate that performance in the 80-90% range for both recall and precision is commonly obtained. CONVECTIS is implemented in a client-server configuration and indexes over 10,000 documents per hour on a dual-CPU Sun Sparc20 (based on 3KB/doc with markup, 5000 index term context vectors). 
%M C.IR.95.375 %T First Experiences with a Speech Retrieval System %S Posters: Abstracts %A Schauble, Peter %A Wechsler, Martin %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 375 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p369-liddy/p369-liddy.pdf %X We present a speech retrieval system aimed at retrieving information from audio recordings containing speech. The current system contains 4.5 hours of radio news and accepts textual queries. The fully automatic indexing was done using speech recognition techniques. Indexing speech documents is challenging because word boundaries are difficult to detect and recognition errors influence the retrieval effectiveness. The indexing process is done in the following steps. First, a speaker dependent phone recognizer produces phonetic transcriptions of the audio recordings. Using those transcriptions, phone sequences of various lengths are selected as indexing features. We have developed an efficient algorithm for selecting indexing features. Its output is a set of 5000 phone sequences of medium collection frequency, covering most parts of the audio recordings. The final indexing is done by simply locating phone sequences in the phonetic transcriptions. Queries are entered as text and are indexed similarly using a phonetic dictionary. We show that useful information can be found with the system. Some of the selected features are similar to reduced words and have a positive influence on the indexing. This method can be further improved by taking into account recognition errors either in the selection or in the indexing process. 
%M C.IR.95.375 %T Dynamic Allocation of Signature File for Multimedia Document Using Parallel Devices %S Posters: Abstracts %A Shan, Man-Kwan %A Lee, Suh-Yin %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 375 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p369-liddy/p369-liddy.pdf %X The signature file access method is efficient for content-based retrieval from multimedia databases. In a large multimedia database server, parallel devices are used to achieve concurrent access. Efficient allocation of the signature file on parallel devices minimizes the query response time and is important in the design of large multimedia databases. In this paper, we propose a new dynamic allocation technique to distribute the signature file on parallel devices. It is an improvement on a previous approach, Fragmented Signature File. While Fragmented Signature File distributes the partitioned frame signature file by using Quick Filter, the proposed Parallel Signature File distributes it by using a disk allocation technique. The proposed Parallel Signature File has several advantages. First, the qualified blocks are distributed more uniformly than with Fragmented Signature File. Second, it can be used in a dynamic environment. Third, the blocks allocated in each processing unit can be clustered to reduce the disk random access time. Performance analysis shows that the proposed approach outperforms Fragmented Signature File and is not far from the theoretical optimal response time. Moreover, Parallel Signature File improves significantly on Fragmented Signature File, especially in the multimedia application domain. 
%M C.IR.95.375 %T Document Expansion Applied to Classification: Weighting of Additional Terms %S Posters: Abstracts %A Sta, Jean-David %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 375-376 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p369-liddy/p369-liddy.pdf %X In information retrieval, one can describe documents by the terms they contain. To improve the effectiveness of retrieval, the expansion method extends terms with related terms. It includes two steps: the selection and the weighting of terms to be added. This experiment involves comparing different systems for weighting the new terms of the expansion for automatic document classification, a problem somewhat similar to information retrieval. Document expansion occurs automatically and uses a thesaurus. The majority of weighting methods assign to new terms a weight equal to the weight of the expanded term multiplied by a constant. This model has the disadvantage of modifying the similarity between documents when the information provided by the expansion should not interfere. In this specific instance, the model proposed here meets the invariance of similarity constraint and substantially improves document classification. When the similarity is the cosine, the space of solutions of this constraint, applied to the expansion of a term t into q terms, is the hypersphere whose radius is the weight w of t. The expansion function is then linear. One solution tested here is to assign to each of the q terms the weight w divided by the square root of q. %M C.IR.95.377 %T VIRI: Visual Information Retrieval Interfaces %S Post-Conference Research Workshops %A Korfhage, Robert H. %A Lin, Xia %A Dubin, David S. 
%B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 377 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p377-korfhage/p377-korfhage.pdf %X A visual information retrieval interface (VIRI) is defined as one that uses graphic elements in addition to text to aid the solution of a problem related to information storage and retrieval. More than twenty such interfaces already exist, with different retrieval models, graphical metaphors, and user interactions. Furthermore, the interfaces have different strengths, for example, retrieval, browsing, and document classification. The focus of the workshop is to exchange information, and to begin development of a method for comparing these interfaces. Researchers and practitioners who are actively working on VIRI projects are particularly invited to participate. Some effort will be put into developing a classification scheme for VIRIs and identifying major research issues related to visual interfaces. Following this the discussion will center on identifying test collections and developing experimental tasks and measures that will provide a sound basis for comparing and evaluating the interfaces. 
%M C.IR.95.377 %T Z39.50 and the IR Research Community %S Post-Conference Research Workshops %A Lynch, Clifford %A Larson, Ray %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 377-378 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p377-lynch/p377-lynch.pdf %X The Z39.50 Computer-to-computer retrieval protocol is an increasingly mature US national standard (version 3 is in the ballot process as of early 1995); it is widely implemented in the US and is increasingly seeing use internationally, particularly in Europe. Z39.50 is potentially of great importance to the IR research community for several reasons: * Because Z39.50 provides a means of separating a user interface from a retrieval system, it allows research in clients and user interfaces to proceed independently from research in back-end retrieval engines, and, of particular importance, allows new user interfaces to be tested against very large production databases. It also allows new experimental retrieval systems to be offered to large user communities through familiar interfaces. * Z39.50 can form the linkage between a number of large-scale research projects that involve the IR community, such as the various Digital Library efforts. * Z39.50 raises, and provides a concrete framework to explore, a number of important research issues in its own right about the design of interoperable clients and servers for information retrieval, the representation and exchange of metadata about information servers, and related matters. The workshop has several goals: * To introduce the broad IR community to Z39.50, including its history, its current status, its function, and implementation progress; * To highlight several IR research projects that are exploiting Z39.50 today; * To sketch some of the research issues that are raised by Z39.50. 
After an introduction delineating the history of Z39.50 and the current status of implementations, a short tutorial will explain the operation of the protocol. The second part of the workshop will include two panels: one about the use of Z39.50 to support IR research, and another about research issues in information retrieval protocols. Attendees will be invited to contribute to the discussion. %M C.IR.95.378 %T Information Retrieval and Databases %S Post-Conference Research Workshops %A Harper, David %A Schauble, Peter %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 378 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p378-harper/p378-harper.pdf %X The integration of database management systems and information retrieval systems is of great practical interest. There are, however, hard research problems that remain to be solved. The aim of the workshop is to assist the information retrieval community in understanding the integration problems and to set up a research agenda. The workshop will include short presentations on the following topics: Architecture: loosely coupled, tightly coupled, total integration; does the DBMS control the IRS or vice versa; support for distributed computing. Retrieval Model and Query Language: reconciling classical DB retrieval and classical (weighted) IR retrieval; retrieval models taking advantage of DB schema; treating DB attributes in an IR way, e.g. in a probabilistic way; integration query languages for IR/DB systems; query processing/optimization. Concurrency Control and Transaction Management: concurrency control on the IR index; is ACID enough or is ACID too much for IR; new transaction models (nested transactions); long lived transactions (for indexing). 
Performance: new access structures; new buffering schemes (caches); retrieval performance on dynamic data; insertion, deletion, modification performance; scalability (parallel architectures); identify bottlenecks. After the presentations, attendees will participate in round table discussions about each topic. To allow this to proceed in a workshop atmosphere, the workshop is restricted to 30 participants. %M C.IR.95.378 %T Curriculum Development in Computer Information Science: A Framework for Developing a New Curriculum in IR %S Post-Conference Research Workshops %A Fox, Edward A. %A Lidtke, Doris K. %A Mulder, Michael C. %A Rasmussen, Edie M. %A Taghva, Kazem %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 378-379 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p378-lidtke/p378-lidtke.pdf %X In this one-day workshop, Doris Lidtke and Michael Mulder will report on their extensive experience in the development of new curricula in computer information science, emphasizing preparation of students to deal with large scale information systems AND new paradigms of learning/teaching. Topics to be covered by the workshop leaders include: (1) involvement of the stakeholders -- employers, faculty, and instructional/curriculum designers; (2) determining content -- both depth and breadth; (3) validation by the stakeholders; (4) packaging -- knowledge units vs. courses; (5) special delivery mechanisms, and (6) essential/desired infrastructure to support the new/revised curriculum. These topics will provide a framework for discussion of curriculum development in information retrieval. 
Individuals and groups representing various points of view (library and information science, computing science, MIS, information systems, business, government and academia) will be invited to prepare submissions and act as group leaders. An opportunity will be provided for attendees to participate in working groups developing an IR curriculum in their area of interest. %M C.IR.95.379 %T IR and Automatic Construction of Hypermedia %S Post-Conference Research Workshops %A Agosti, Maristella %A Allan, James %B Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval %D 1995-07-09 %P 379 %* (c) Copyright 1995 Association for Computing Machinery %W http://www.acm.org/pubs/articles/proceedings/ir/215206/p379-agosti/p379-agosti.pdf %X The workshop will address IR methods and tools that can be used in the automatic construction of a hypermedia base to produce an informative hypertext collection of documents that can be searched and browsed by content. Passage retrieval is one of the methods that can be used in the segmentation of documents in a collection of flat documents for hypermedia information retrieval design. This method, as well as other methods for automatic authoring of hypermedia bases, will be presented and discussed in the workshop. Both techniques that construct a hypertext from an unlinked set of data and those that can be applied to an existing hypertext/media, permitting augmentation of its set of links, are relevant to the workshop. Typing of links in the resulting hypertext needs to be addressed, as well as having both static and dynamic links in the resulting hypertext. The workshop will also address evaluation of the quality of hypertext collections and their construction. After the presentations of a few position papers, the participants will discuss specific methods or other topics of interest. 
The workshop will conclude with the approval of a short working paper presenting all the methods that the participants deem useful for automatic construction of hypermedia.