| In Memoriam: Gerard Salton | | BIB | 1 | |
| A Visual Retrieval Environment for Hypermedia Information Systems | | BIBAK | PDF | 3-29 | |
| Dario Lucarella; Antonella Zanzi | |||
| We present a graph-based object model that may be used as a uniform
framework for direct manipulation of multimedia information. After an
introduction motivating the need for abstraction and structuring mechanisms in
hypermedia systems, we introduce the data model and the notion of perspective,
a form of data abstraction that acts as a user interface to the system,
providing control over the visibility of the objects and their properties. A
perspective is defined to include an intension and an extension. The intension
is defined in terms of a pattern, a subgraph of the schema graph, and the
extension is the set of pattern-matching instances. Perspectives, as well as
database schema and instances, are graph structures that can be manipulated in
various ways. The resulting uniform approach is well suited to a visual
interface. A visual interface for complex information systems provides high
semantic power, thus exploiting the semantic expressibility of the underlying
data model, while maintaining ease of interaction with the system. In this
way, we reach the goal of decreasing cognitive load on the user, with the
additional advantage of always maintaining the same interaction style. We
present a visual retrieval environment that effectively combines filtering,
browsing, and navigation to provide an integrated view of the retrieval
problem. Design and implementation issues are outlined for MORE (Multimedia
Object Retrieval Environment), a prototype system relying on the proposed
model. The focus is on the main user interface functionalities, and actual
interaction sessions are presented including schema creation, information
loading, and information retrieval. Keywords: Browsing, Complex objects, Direct object manipulation, Graph-Oriented
models, Hypermedia applications, Information filtering, Visual interface,
Design, Human factors, Management, H.5.1 Information Systems, Information
interfaces and presentation, Multimedia Information Systems, Hypertext
navigation and maps, H.2.1 Information Systems, Database management, Logical
Design, Data models, H.3.3 Information Systems, Information storage and
retrieval, Information Search and Retrieval, Query formulation, H.3.3
Information Systems, Information storage and retrieval, Information Search and
Retrieval, Selection process, H.5.2 Information Systems, Information interfaces
and presentation, User Interfaces, Interaction styles | |||
| Sequential Patterns in Information Systems Development: An Application of a Social Process Model | | BIBAK | PDF | 30-63 | |
| Daniel Robey; Michael Newman | |||
| We trace the process of developing and implementing a materials management
system in one company over a 15-year period. Using a process research model
developed by Newman and Robey, we identify 44 events in the process and define
them as either encounters or episodes. Encounters are concentrated events,
such as meetings and announcements, that separate episodes, which are events of
longer duration. By examining the sequence of events over the 15 years of the
case, we identify a pattern of repeated failure, followed by success. Our
discussion centers on the value of detecting and displaying such patterns and
the need for theoretical interpretation of recurring sequences of events. Five
alternative theoretical perspectives, originally proposed by Kling, are used to
interpret the sequential patterns identified by the model. We conclude that
the form of the process model allows researchers who operate from different
perspectives to enrich their understanding of the process of system
development. Keywords: Social processes, System implementation, Human factors, Management, K.6.1
Computing Milieux, Management of computing and information systems, Project and
People Management, H.4.2 Information Systems, Information systems applications,
Types of Systems | |||
| Evaluation of Model-Based Retrieval Effectiveness with OCR Text | | BIBAK | PDF | 64-93 | |
| Kazem Taghva; Julie Borsack; Allen Condit | |||
| We give a comprehensive report on our experiments with retrieval from
OCR-generated text using systems based on standard models of retrieval. More
specifically, we show that average precision and recall are not affected by OCR
errors across systems for several collections. The collections used in these
experiments include both actual OCR-generated text and standard information
retrieval collections corrupted through the simulation of OCR errors. Both the
actual and simulation experiments include full-text and abstract-length
documents. We also demonstrate that the ranking and feedback methods
associated with these models are generally not robust enough to deal with OCR
errors. It is further shown that the OCR errors and garbage strings generated
from the mistranslation of graphic objects increase the size of the index by a
wide margin. We not only point out problems that can arise from applying OCR
text within an information retrieval environment but also suggest solutions to
overcome some of these problems. Keywords: Error correction, Feedback, Optical character recognition, Ranking
algorithms, Experimentation, Performance, H.3.3 Information Systems,
Information storage and retrieval, Information Search and Retrieval, Retrieval
models, H.3.1 Information Systems, Information storage and retrieval, Content
Analysis and Indexing, Indexing methods, H.3.3 Information Systems, Information
storage and retrieval, Information Search and Retrieval, Search process, I.4.1
Computing Methodologies, Image processing and computer vision, Digitization and
Image Capture, Scanning | |||
| An Extension of Ukkonen's Enhanced Dynamic Programming ASM Algorithm | | BIBAK | PDF | 94-106 | |
| Hal Berghel; David Roach | |||
| We describe an improvement on Ukkonen's Enhanced Dynamic Programming (EHD)
approximate string-matching algorithm for unit-penalty four-edit comparisons.
The new algorithm has an asymptotic complexity similar to that of Ukkonen's but
is significantly faster due to a decrease in the number of array cell
calculations. A 42% speedup was achieved in an application involving name
comparisons. Even greater improvements are possible when comparing longer and
more dissimilar strings. Although the speed of the algorithm under
consideration is comparable to other fast ASM algorithms, it has greater
effectiveness in text-processing applications because it supports all four
basic Damerau-type editing operations. Keywords: Approximate string matching, Dynamic programming, Enhanced dynamic
programming, Similarity relations, Algorithms, Performance, F.2.2 Theory of
Computation, Analysis of algorithms and problem complexity, Nonnumerical
Algorithms and Problems, Pattern matching, H.3.1 Information Systems,
Information storage and retrieval, Content Analysis and Indexing, H.4.1
Information Systems, Information systems applications, Office Automation, H.3.3
Information Systems, Information storage and retrieval, Information Search and
Retrieval, Search process | |||
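The four Damerau-type editing operations mentioned in the abstract above (insertion, deletion, substitution, and transposition of adjacent characters) can be illustrated with a plain dynamic-programming sketch. This is the textbook full-matrix formulation with unit penalties, not the cell-saving algorithm of Ukkonen or of Berghel and Roach; the function name `damerau_distance` is a hypothetical choice for illustration.

```python
def damerau_distance(a: str, b: str) -> int:
    """Unit-penalty edit distance allowing insert, delete, substitute,
    and transposition of two adjacent characters (restricted variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i          # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j          # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    print(damerau_distance("smith", "smiht"))
```

This version fills every cell of the matrix; the speedups reported in the abstract come from computing fewer cells, which this sketch does not attempt.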
| Document Ranking on Weight-Partitioned Signature Files | | BIBAK | PDF | 109-137 | |
| Dik Lun Lee; Liming Ren | |||
| A signature file organization, called the weight-partitioned signature file,
for supporting document ranking is proposed. It employs multiple signature
files, each of which corresponds to one term frequency, to represent terms with
different term frequencies. Words with the same term frequency in a document
are grouped together and hashed into the signature file corresponding to that
term frequency. This eliminates the need to record the term frequency
explicitly for each word. We investigate the effect of false drops on
retrieval effectiveness if they are not eliminated in the search process. We
have shown that false drops introduce insignificant degradation on precision
and recall when the false-drop probability is below a certain threshold. This
is an important result since false-drop elimination could become the bottleneck
in systems using fast signature file search techniques. We perform an
analytical study on the performance of the weight-partitioned signature file
under different search strategies and configurations. An optimal formula is
obtained to determine for a fixed total storage overhead the storage to be
allocated to each partition in order to minimize the effect of false drops on
document ranks. Experiments were performed using a document collection to
support the analytical results. Keywords: Access method, Document retrieval, Information retrieval, Signature file,
Superimposed coding, Text retrieval, Algorithms, Design, Experimentation,
Performance, H.3.3 Information Systems, Information storage and retrieval,
Information Search and Retrieval, Retrieval models, H.2.2 Information Systems,
Database management, Physical Design, Access methods, H.3.6 Information
Systems, Information storage and retrieval, Library Automation, H.3.1
Information Systems, Information storage and retrieval, Content Analysis and
Indexing | |||
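A minimal sketch of the partitioning idea in the abstract above: each document keeps one superimposed-coding signature per term frequency, so the frequency of a word is implied by the partition in which its bits are found. The signature width, the number of bits per word, the frequency cap, and all function names are assumptions made for illustration, not values or interfaces from the paper.

```python
import hashlib
from collections import Counter

SIG_BITS = 64       # assumed signature width per partition
BITS_PER_WORD = 3   # assumed number of bits set per word
MAX_TF = 4          # assumed cap: higher frequencies share the last partition


def word_mask(word: str) -> int:
    """Hash a word to a bit mask with up to BITS_PER_WORD bits set."""
    mask = 0
    for k in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{k}:{word}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return mask


def build_signatures(text: str) -> dict:
    """Return {term frequency: signature} for one document."""
    counts = Counter(text.lower().split())
    sigs = {tf: 0 for tf in range(1, MAX_TF + 1)}
    for word, tf in counts.items():
        sigs[min(tf, MAX_TF)] |= word_mask(word)
    return sigs


def estimated_tf(sigs: dict, word: str) -> int:
    """Highest partition whose signature covers the word's mask.
    False drops can make this an overestimate, never an underestimate."""
    mask = word_mask(word)
    for tf in range(MAX_TF, 0, -1):
        if (sigs[tf] & mask) == mask:
            return tf
    return 0


def rank(docs: dict, query: str) -> list:
    """Order document ids by the sum of estimated term frequencies."""
    words = query.lower().split()
    scores = {doc_id: sum(estimated_tf(s, w) for w in words)
              for doc_id, s in docs.items()}
    return sorted(scores, key=lambda doc_id: -scores[doc_id])
```

Because false drops are not eliminated here, the scores are upper bounds on the true frequency sums, which mirrors the situation whose effect on precision and recall the abstract investigates.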
| Using Local Optimality Criteria for Efficient Information Retrieval with Redundant Information Filters | | BIBAK | PDF | 138-174 | |
| Neil C. Rowe | |||
| We consider information retrieval when the data -- for instance, multimedia
-- is computationally expensive to fetch. Our approach uses "information
filters" to considerably narrow the universe of possibilities before retrieval.
We are especially interested in redundant information filters that save time
over more general but more costly filters. Efficient retrieval requires
decisions about the necessity, order, and concurrent processing of
proposed filters (an "execution plan"). We develop simple polynomial-time
local criteria for optimal execution plans and show that most forms of
concurrency are suboptimal with information filters. Although the general
problem of finding an optimal execution plan is likely to be exponential in the
number of filters, we show experimentally that our local optimality criteria,
used in a polynomial-time algorithm, nearly always find the global optimum with
15 filters or fewer, a sufficient number of filters for most applications. Our
methods require no special hardware and avoid the high processor idleness that
is characteristic of massive-parallelism solutions to this problem. We apply
our ideas to an important application, information retrieval of captioned data
using natural-language understanding, a problem for which the natural-language
processing can be the bottleneck if not implemented well. Keywords: Boolean algebra, Conjunction, Filters, Natural language, Optimization,
Queries, Performance, H.3.3 Information Systems, Information storage and
retrieval, Information Search and Retrieval, Search process | |||
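To make the notion of a local criterion for ordering filters concrete, here is a sketch of the standard textbook rule for independent filters applied one after another: sort by cost divided by rejection probability. It is offered only as an illustration under that independence assumption and is not claimed to be the criteria developed in the article; the tuple layout `(cost, pass_probability)` and the function names are hypothetical.

```python
from itertools import permutations


def expected_cost(plan):
    """Expected cost per item of running filters in the given order.
    Each filter is a (cost, pass_probability) pair; a later filter runs
    only on items that passed every earlier one."""
    total, reach = 0.0, 1.0
    for cost, pass_prob in plan:
        total += reach * cost
        reach *= pass_prob
    return total


def order_by_local_rule(filters):
    """Sort by cost / (1 - pass_probability), cheapest rejection first.
    Assumes independent filters and pass_probability < 1."""
    return sorted(filters, key=lambda f: f[0] / (1.0 - f[1]))


def brute_force_best(filters):
    """Exhaustive search over all orders, exponential in the filter count."""
    return min(permutations(filters), key=expected_cost)


if __name__ == "__main__":
    filters = [(5.0, 0.9), (1.0, 0.5), (20.0, 0.1), (2.0, 0.8)]
    print(order_by_local_rule(filters))
    print(expected_cost(order_by_local_rule(filters)))
```

For independent filters the pairwise comparison is enough to reach the optimum; redundant or correlated filters, which the abstract is concerned with, break that assumption and are why a more careful analysis is needed.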
| TROLL: A Language for Object-Oriented Specification of Information Systems | | BIBAK | PDF | 175-211 | |
| Ralf Jungclaus; Gunter Saake; Thorsten Hartmann; Cristina Sernadas | |||
| TROLL is a language particularly suited for the early stages of information
system development, when the universe of discourse must be described. In TROLL
the descriptions of the static and dynamic aspects of entities are integrated
into object descriptions. Sublanguages for data terms, for first-order and
temporal assertions, and for processes, are used to describe respectively the
static properties, the behavior, and the evolution over time of objects. TROLL
organizes system design through object-orientation and the support of
abstractions such as classification, specialization, roles, and aggregation.
Language features for state interactions and dependencies among components
support the composition of the system from smaller modules, as does the
facility of defining interfaces on top of object descriptions. Keywords: Formal specification, Information system design, Object-oriented conceptual
modeling, Design, Languages, D.2.1 Software, Software engineering,
Requirements/Specifications, Languages, D.3.3 Software, Programming languages,
Language Constructs and Features, H.1.0 Information Systems, Models and
principles, General, D.3.2 Software, Programming languages, Language
Classifications | |||
| Computerized Performance Monitors as Multidimensional Systems: Derivation and Application | | BIBAK | PDF | 212-235 | |
| Rebecca A. Grant; Chris A. Higgins | |||
| An increasing number of companies are introducing computer technology into
more aspects of work. Effective use of information systems to support office
and service work can improve staff productivity, broaden a company's market, or
dramatically change its business. It can also increase the extent to which
work is computer mediated and thus within the reach of software known as
Computerized Performance Monitoring and Control Systems (CPMCSs). Virtually
all research has studied CPMCSs as unidimensional systems. Employees are
described as "monitored" or "unmonitored" or as subject to "high," "moderate,"
or "low" levels of monitoring. Research that does not clearly distinguish
among possible monitor designs cannot explain how designs may differ in effect.
Nor can it suggest how to design better monitors. A multidimensional view of
CPMCSs describes monitor designs in terms of object of measurements, tasks
measured, recipient of data, reporting period, and message content. This view
is derived from literature in control systems, organizational behavior, and
management information systems. The multidimensional view can then be
incorporated into causal models to explain contradictory results of earlier
CPMCS research. Keywords: Computerized performance evaluation, Computerized work monitoring, Work
monitoring system design, Measurement, Management, Performance, Theory, H.4.2
Information Systems, Information systems applications, Types of Systems,
Logistics, K.4.3 Computing Milieux, Computers and society, Organizational
Impacts, H.1.2 Information Systems, Models and principles, User/Machine
Systems, Human factors, K.7.m Computing Milieux, The computing profession,
Miscellaneous, C.4 Computer Systems Organization, Performance of systems,
Modeling techniques, H.4.1 Information Systems, Information systems
applications, Office Automation, Time management | |||
| Natural-Language Retrieval of Images Based on Descriptive Captions | | BIBAK | PDF | 237-267 | |
| Eugene J. Guglielmo; Neil C. Rowe | |||
| We describe a prototype intelligent information retrieval system that uses
natural-language understanding to efficiently locate captioned data.
Multimedia data generally require captions to explain their features and
significance. Such descriptive captions often rely on long nominal compounds
(strings of consecutive nouns) which create problems of disambiguating word
sense. In our system, captions and user queries are parsed and interpreted to
produce a logical form using a detailed theory of the meaning of nominal
compounds. A fine-grain match can then compare the logical form of the query
to the logical forms for each caption. To improve system efficiency, we first
perform a coarse-grain match with index files, using nouns and verbs extracted
from the query. Our experiments with randomly selected queries and captions
from an existing image library show an increase of 30% in precision and 50% in
recall over the keyphrase approach currently used. Our processing times have a
median of seven seconds as compared to eight minutes for the existing system,
and our system is much easier to use. Keywords: Captions, Multimedia database, Type hierarchy, Algorithms, Experimentation,
Human factors, Performance, H.3.3 Information Systems, Information storage and
retrieval, Information Search and Retrieval, Selection process, H.3.3
Information Systems, Information storage and retrieval, Information Search and
Retrieval, Search process, H.3.3 Information Systems, Information storage and
retrieval, Information Search and Retrieval, Query formulation, H.3.1
Information Systems, Information storage and retrieval, Content Analysis and
Indexing, Indexing methods, H.3.1 Information Systems, Information storage and
retrieval, Content Analysis and Indexing, Linguistic processing, I.2.4
Computing Methodologies, Artificial intelligence, Knowledge Representation
Formalisms and Methods, Predicate logic, I.2.7 Computing Methodologies,
Artificial intelligence, Natural Language Processing, Language parsing and
understanding | |||
| Extending Object-Oriented Systems with Roles | | BIBAK | PDF | 268-296 | |
| Georg Gottlob; Michael Schrefl; Brigitte Rock | |||
| In many class-based object-oriented systems the association between an
instance and a class is exclusive and permanent. Therefore these systems have
serious difficulties in representing objects taking on different roles over
time. Such objects must be reclassified any time they evolve (e.g., if a
person becomes a student and later an employee). Class hierarchies must be
planned carefully and may grow exponentially if entities may take on several
independent roles. The problem is even more severe for object-oriented
databases than for common object-oriented programming. Databases store objects
over longer periods, during which the represented entities evolve. This
article shows how class-based object-oriented systems can be extended to handle
evolving objects well. Class hierarchies are complemented by role hierarchies,
whose nodes represent role types an object classified in the root may take on.
At any point in time, an entity is represented by an instance of the root and
an instance of every role type whose role it currently plays. The approach
extends traditional object-oriented concepts, such as classification, object
identity, specialization, inheritance, and polymorphism, in a natural way. The
practicability of the approach is demonstrated by an
implementation in Smalltalk. Smalltalk was chosen because it is widely known,
which is not true for any particular class-based object-oriented database
programming language. Roles can be provided in Smalltalk by adding a few
classes. There is no need to modify the semantics of Smalltalk itself. Role
hierarchies are mapped transparently onto ordinary classes. The presented
implementation can easily be ported to object-oriented database programming
languages based on Smalltalk, such as Gemstone's OPAL. Keywords: Delegation, Inheritance, Object-oriented databases, Object specialization,
Roles, Design, Languages, D.1.5 Software, Programming techniques,
Object-oriented Programming, D.2.10 Software, Software engineering, Design,
Methodologies, D.2.10 Software, Software engineering, Design, Representation,
D.3.3 Software, Programming languages, Language Constructs and Features, H.2.3
Information Systems, Database management, Languages, Database (persistent)
programming languages, D.2.1 Software, Software engineering,
Requirements/Specifications | |||
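The idea of representing an entity by one root instance plus one instance per role it currently plays can be sketched as follows. This is a small Python illustration, not the authors' Smalltalk implementation; the class and method names (`Role`, `take_on`, `abandon`, `plays`) are hypothetical.

```python
class Role:
    """A role instance that delegates unknown attributes to its player."""

    def __init__(self, player):
        self.player = player

    def __getattr__(self, name):
        # Invoked only when normal lookup fails: delegate to the root object.
        return getattr(self.player, name)


class Person:
    """Root object: keeps its identity while roles come and go."""

    def __init__(self, name):
        self.name = name
        self.roles = {}

    def take_on(self, role_type, **attrs):
        role = role_type(self)
        for key, value in attrs.items():
            setattr(role, key, value)
        self.roles[role_type] = role
        return role

    def abandon(self, role_type):
        self.roles.pop(role_type, None)

    def plays(self, role_type):
        return role_type in self.roles


class Student(Role):
    pass


class Employee(Role):
    pass


if __name__ == "__main__":
    ann = Person("Ann")
    student = ann.take_on(Student, university="TU Wien")
    ann.take_on(Employee, employer="ACME")
    ann.abandon(Student)
    print(student.name, ann.plays(Student), ann.plays(Employee))
```

The person is never reclassified: acquiring or dropping a role only adds or removes a role instance, while the root object and its identity stay the same.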
| A General Explanation Component for Conceptual Modeling in CASE Environments | | BIBAK | PDF | 297-329 | |
| Jon Atle Gulla | |||
| In information systems engineering, conceptual models are constructed to
assess existing information systems and work out requirements for new ones. As
these models serve as a means for communication between customers and
developers, it is paramount that both parties understand the models, as well as
that the models form a proper basis for the subsequent design and
implementation of the systems. New CASE environments are now experimenting
with formal modeling languages and various techniques for validating conceptual
models, though it seems difficult to come up with a technique that handles the
linguistic barriers between the parties involved in a satisfactory manner. In
this article, we discuss the theoretical basis of an explanation component
implemented for the PPP CASE environment. This component integrates other
validation techniques and provides a very flexible natural-language interface
to complex model information. It describes properties of the modeling language
and the conceptual models in terms familiar to users, and the explanations can
be combined with graphical model views. When models are executed, it can
justify requested inputs and explain computed outputs by relating trace
information to properties of the models. Keywords: Conceptual modeling, Explanation generation, Help systems, Linguistics,
Paraphrasing, Requirements engineering, Design, Documentation, Human factors,
D.2.2 Software, Software engineering, Design Tools and Techniques,
Computer-aided software engineering (CASE), I.2.7 Computing Methodologies,
Artificial intelligence, Natural Language Processing, C.4 Computer Systems
Organization, Performance of systems, Modeling techniques | |||
| Bias in Computer Systems | | BIBAK | PDF | 330-347 | |
| Batya Friedman; Helen Nissenbaum | |||
| From an analysis of actual cases, three categories of bias in computer
systems have been developed: preexisting, technical, and emergent. Preexisting
bias has its roots in social institutions, practices, and attitudes. Technical
bias arises from technical constraints or considerations. Emergent bias arises
in a context of use. Although others have pointed to bias in particular
computer systems and have noted the general problem, we know of no comparable
work that examines this phenomenon comprehensively and offers a framework
for understanding and remedying it. We conclude by suggesting that freedom
from bias should be counted among the select set of criteria -- including
reliability, accuracy, and efficiency -- according to which the quality of
systems in use in society should be judged. Keywords: Bias, Computer ethics, Computers and society, Design methods, Ethics, Human
values, Standards, Social computing, Social impact, System design, Universal
design, Values, Design, Human factors, H.1.2 Information Systems, Models and
principles, User/Machine Systems, D.2.0 Software, Software engineering,
General, K.4.0 Computing Milieux, Computers and society, General | |||
| Self-Indexing Inverted Files for Fast Text Retrieval | | BIBAK | PDF | 349-379 | |
| Alistair Moffat; Justin Zobel | |||
| Query-processing costs on large text databases are dominated by the need to
retrieve and scan the inverted list of each query term. Retrieval time for
inverted lists can be greatly reduced by the use of compression, but this adds
to the CPU time required. Here we show that the CPU component of query
response time for conjunctive Boolean queries and for informal ranked queries
can be similarly reduced, at little cost in terms of storage, by the inclusion
of an internal index in each compressed inverted list. This method has been
applied in a retrieval system for a collection of nearly two million short
documents. Our experimental results show that the self-indexing strategy adds
less than 20% to the size of the compressed inverted file, which itself
occupies less than 10% of the indexed text, yet can reduce processing time for
Boolean queries of 5-10 terms to under one fifth of the previous cost.
Similarly, ranked queries of 40-50 terms can be evaluated in as little as 25%
of the previous time, with little or no loss of retrieval effectiveness. Keywords: Full-text retrieval, Index compression, Information retrieval, Inverted
file, Query processing, Design, Performance, H.3.1 Information Systems,
Information storage and retrieval, Content Analysis and Indexing, Indexing
methods, E.4 Data, Coding and information theory, Data compaction and
compression, H.3.2 Information Systems, Information storage and retrieval,
Information Storage, File organization, H.3.3 Information Systems, Information
storage and retrieval, Information Search and Retrieval, Search process | |||
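The effect of an internal index inside each inverted list can be sketched with uncompressed blocks: a small table of block-leading document numbers lets a conjunctive query touch only the blocks that could contain a candidate. Compression is left out, and the block size, class name, and the `decoded_blocks` counter (a stand-in for decompression work) are assumptions for illustration rather than details from the article.

```python
from bisect import bisect_right


class SelfIndexedList:
    """Sorted document numbers split into blocks, with an internal index
    holding the first document number of each block."""

    def __init__(self, doc_ids, block_size=4):
        self.blocks = [doc_ids[i:i + block_size]
                       for i in range(0, len(doc_ids), block_size)]
        self.first = [block[0] for block in self.blocks]
        self.decoded_blocks = 0  # stands in for blocks that would be decompressed

    def contains(self, doc_id):
        # Locate the last block whose first document number is <= doc_id.
        k = bisect_right(self.first, doc_id) - 1
        if k < 0:
            return False
        self.decoded_blocks += 1
        return doc_id in self.blocks[k]


def conjunctive(candidates, long_list):
    """Keep the candidates that also occur in the long inverted list."""
    return [d for d in candidates if long_list.contains(d)]


if __name__ == "__main__":
    long_list = SelfIndexedList(list(range(0, 1000, 3)), block_size=10)
    print(conjunctive([3, 10, 300, 999], long_list))
    print(long_list.decoded_blocks, "of", len(long_list.blocks), "blocks touched")
```

With a short candidate list and a long inverted list, only a handful of blocks are examined instead of the whole list, which is the source of the CPU savings the abstract describes for Boolean queries.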
| Information System Behavior Specification by High Level Petri Nets | | BIBAK | PDF | 380-420 | |
| Andreas Oberweis; Peter Sander | |||
| The specification of an information system should include a description of
structural system aspects as well as a description of the system behavior. In
this article, we show how this can be achieved by high-level Petri nets --
namely, the so-called NR/T-nets (Nested-Relation/Transition Nets). In
NR/T-nets, the structural part is modeled by nested relations, and the
behavioral part is modeled by a novel Petri net formalism. Each place of a net
represents a nested relation scheme, and the marking of each place is given as
a nested relation of the respective type. Insert and delete operations in a
nested relational database (NF²-database) are expressed by transitions in a
net. These operations may operate not only on whole tuples of a given
relation, but also on "subtuples" of existing tuples. The arcs of a net are
inscribed with so-called Filter Tables, which allow (together with an optional
logical expression as transition inscription) conditions to be formulated on
the specified (sub-) tuples. The occurrence rule for NR/T-net transitions is
defined by the operations union, intersection, and "negative" in lattices of
nested relations. The structure of an NR/T-net, together with the occurrence
rule, defines classes of possible information system procedures, i.e.,
sequences of (possibly concurrent) operations in an information system. Keywords: Behavior specification, Complex objects, Conceptual design, Nested
relations, Petri nets, Design, Languages, Management, H.1.1 Information
Systems, Models and principles, Systems and Information Theory, General systems
theory, H.2.1 Information Systems, Database management, Logical Design, Data
models, H.2.3 Information Systems, Database management, Languages, Data
manipulation languages (DML) | |||
| The Model-Assisted Global Query System for Multiple Databases in Distributed Enterprises | | BIBAK | PDF | 421-470 | |
| Waiman Cheung; Cheng Hsu | |||
| Today's enterprises typically employ multiple information systems, which are
independently developed, locally administered, and different in logical or
physical designs. Therefore, a fundamental challenge in enterprise information
management is the sharing of information for enterprise users across
organizational boundaries; this requires a global query system capable of
providing on-line intelligent assistance to users. Conventional technologies,
such as schema-based query languages and hard-coded schema integration, are not
sufficient to solve this problem. This article develops a new approach, a
"model-assisted global query system," that utilizes an on-line repository of
enterprise metadata -- the Metadatabase -- to facilitate global query
formulation and processing with certain desirable properties such as
adaptiveness and open-systems architecture. A definitional model
characterizing the various classes and roles of the required metadata as
knowledge for the system is presented. The significance of possessing this
knowledge (via a Metadatabase) toward improving the global query capabilities
available previously is analyzed. On this basis, a direct method using model
traversal and a query language using global model constructs are developed
along with other new methods required for this approach. It is then tested
through a prototype system in a computer-integrated manufacturing (CIM)
setting. Keywords: Enterprise integration, Global query system, Heterogeneous distributed
information systems, Metadatabase, Design, Performance, H.2.4 Information
Systems, Database management, Systems, Query processing, H.2.3 Information
Systems, Database management, Languages, Query languages, H.2.4 Information
Systems, Database management, Systems, Distributed databases, H.2.7 Information
Systems, Database management, Database Administration, Data
dictionary/directory, H.3.3 Information Systems, Information storage and
retrieval, Information Search and Retrieval, Query formulation, H.5.2
Information Systems, Information interfaces and presentation, User Interfaces | |||