| Hyperdocuments as Automata: Verification of Trace-Based Browsing Properties by Model Checking | | BIBAK | PDF | 1-30 | |
| P. David Stotts; Richard Furuta; Cyrano Ruiz Cabarrus | |||
| We present a view of hyperdocuments in which each document encodes its own
browsing semantics in its links. This requires a mental shift in how a
hyperdocument is thought of abstractly. Instead of treating the links of a
document as defining a static directed graph, they are thought of as defining
an abstract program, termed the links-automaton of the document. A branching
temporal logic notation, termed HTL*, is introduced for specifying properties a
document should exhibit during browsing. An automated program verification
technique called model checking is used to verify that browsing specifications
in a subset of HTL* are met by the behavior defined in the links-automaton.
We illustrate the generality of these techniques by applying them first to
several Trellis documents and then to a Hyperties document. Keywords: Browsing semantics, Hypermedia, Hypertext, Model checking, Petri nets,
Temporal logic, VERIFICATION, D.2.2 Software, SOFTWARE ENGINEERING, Design
Tools and Techniques, Petri nets, D.2.4 Software, SOFTWARE ENGINEERING,
Software/Program Verification, Assertion checkers, F.3.1 Theory of Computation,
LOGICS AND MEANINGS OF PROGRAMS, Specifying and Verifying and Reasoning about
Programs, Mechanical verification, H.5.1 Information Systems, INFORMATION
INTERFACES AND PRESENTATION, Multimedia Information Systems, Hypertext
navigation and maps, I.7.2 Computing Methodologies, DOCUMENT AND TEXT
PROCESSING, Document Preparation, Hypertext/hypermedia | |||
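The paper's HTL* notation and its Trellis-specific checker are not reproduced in this listing; as a minimal sketch of the underlying idea, the fragment below treats a document's links as the transition relation of an automaton and model-checks one simple branching-time property, EF p ("some browsing path can eventually reach a node satisfying p"), by a standard backward fixpoint. The node names and the toy link table are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the paper's HTL* checker): links define the transition
# relation of an automaton over content nodes; EF p is checked by a backward
# reachability fixpoint.

def check_ef(links, satisfies_p):
    """links: dict node -> set of link targets; satisfies_p: set of nodes.
    Returns the set of nodes from which some browsing path reaches p."""
    reach = set(satisfies_p)
    changed = True
    while changed:
        changed = False
        for node, targets in links.items():
            if node not in reach and targets & reach:
                reach.add(node)
                changed = True
    return reach

# Toy hyperdocument: the "appendix" node is a dead end with no path back
# to the glossary, so EF glossary fails there.
links = {
    "index": {"intro", "appendix"},
    "intro": {"glossary", "index"},
    "glossary": {"index"},
    "appendix": set(),
}
print("EF glossary holds at:", sorted(check_ef(links, {"glossary"})))
```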
| Evaluation of an Algorithm for Finding a Match of a Distorted Texture Pattern in a Large Image Database | | BIBAK | PDF | 31-60 | |
| N. Vujovic; D. Brzakovic | |||
| Evaluation of an algorithm for finding a match for a random texture pattern
in a large image database is presented. The algorithm was designed assuming
that the random pattern may be subject to misregistration relative to its
representation in the database and assuming that it may have missing parts.
The potential applications involve authentication of legal documents, bank
notes, or credit cards, where thin fibers are embedded randomly into the
document medium during medium fabrication. The algorithm achieves image
matching by a three-step hierarchical procedure, which starts by matching parts
of fiber patterns while solving the misregistration problem and ends up by
matching complete fiber patterns. Performance of the algorithm is studied both
theoretically and experimentally. Theoretical analysis includes the study of
the probability that two documents have the same pattern, and the probability
of the algorithm establishing a wrong match, as well as the algorithm's
performance in terms of processing time. Experiments involving over 250,000
trials using databases of synthetic documents, containing up to 100,000
documents, were used to confirm theoretical predictions. In addition,
experiments involving a database containing real images were conducted in order
to confirm that the algorithm has potential in real applications. Keywords: Image database, Image matching, Misregistration, Random pattern, Presentation
of information, ALGORITHMS, EXPERIMENTATION, H.3.3 Information Systems,
INFORMATION STORAGE AND RETRIEVAL, Information Search and Retrieval, I.2.4
Computing Methodologies, ARTIFICIAL INTELLIGENCE, Knowledge Representation
Formalisms and Methods, I.4.3 Computing Methodologies, IMAGE PROCESSING AND
COMPUTER VISION, Enhancement, Filtering, I.4.3 Computing Methodologies, IMAGE
PROCESSING AND COMPUTER VISION, Enhancement, Grayscale manipulation, I.4.7
Computing Methodologies, IMAGE PROCESSING AND COMPUTER VISION, Feature
Measurement, Texture | |||
| Corpus-Based Stemming using Cooccurrence of Word Variants | | BIBAK | PDF | 61-81 | |
| Jinxi Xu; W. Bruce Croft | |||
| Stemming is used in many information retrieval (IR) systems to reduce
variant word forms to common roots. It is one of the simplest applications of
natural-language processing to IR and is one of the most effective in terms of
user acceptance and consistency, though it yields only small retrieval improvements. Current
stemming techniques do not, however, reflect the language use in specific
corpora, and this can lead to occasional serious retrieval failures. We
propose a technique for using corpus-based word variant cooccurrence statistics
to modify or create a stemmer. The experimental results generated using English
newspaper and legal text and Spanish text demonstrate the viability of this
technique and its advantages relative to conventional approaches that only
employ morphological rules. Keywords: Class refinement, Cooccurrence, Corpus analysis, Information retrieval,
n-gram, Stemming, ALGORITHMS, EXPERIMENTATION, PERFORMANCE, H.3.1 Information
Systems, INFORMATION STORAGE AND RETRIEVAL, Content Analysis and Indexing,
Indexing methods, H.3.1 Information Systems, INFORMATION STORAGE AND RETRIEVAL,
Content Analysis and Indexing, Linguistic processing, H.3.3 Information
Systems, INFORMATION STORAGE AND RETRIEVAL, Information Search and Retrieval,
Query formulation, H.3.3 Information Systems, INFORMATION STORAGE AND
RETRIEVAL, Information Search and Retrieval, Search process | |||
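Xu and Croft's exact cooccurrence statistic is not reproduced here; the sketch below only illustrates the general approach the abstract describes: start from crude prefix-based stem classes and refine them with corpus evidence, keeping two variants in the same class only when they cooccur in enough documents. The document-level window, the threshold, and all names are assumptions for illustration.

```python
# Sketch of corpus-based stem-class refinement (not Xu and Croft's statistic):
# crude prefix classes are split using document-level cooccurrence counts.
from collections import defaultdict
from itertools import combinations

def prefix_classes(vocab, k=4):
    classes = defaultdict(set)
    for w in vocab:
        classes[w[:k]].add(w)          # aggressive initial grouping
    return [c for c in classes.values() if len(c) > 1]

def refine(classes, docs, min_cooc=2):
    """docs: list of token lists. Split classes using document cooccurrence."""
    doc_sets = [set(d) for d in docs]
    refined = []
    for cls in classes:
        related = defaultdict(set)
        for a, b in combinations(sorted(cls), 2):
            cooc = sum(1 for s in doc_sets if a in s and b in s)
            if cooc >= min_cooc:       # evidence the variants belong together
                related[a].add(b)
                related[b].add(a)
        # connected components of the "related" graph become the new classes
        seen = set()
        for w in cls:
            if w in seen:
                continue
            comp, stack = set(), [w]
            while stack:
                x = stack.pop()
                if x not in comp:
                    comp.add(x)
                    stack.extend(related[x] - comp)
            seen |= comp
            refined.append(comp)
    return refined
```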
| Electronic Mail as a Coalition-Building Information Technology | | BIBAK | PDF | 82-100 | |
| Celia T. Romm; Nava Pliskin | |||
| One of the most intriguing lines of research within the literature on
diffusion of information technologies (IT) is the study of the power and
politics of this process. The major objective of this article is to build on
the work of Kling and Markus on power and IT, by extending their perspective to
email. To demonstrate how email can be used for political purposes within an
organizational context, a case study is presented. The case study describes a
series of events which took place in a university. In the case, email was used
by a group of employees to stage a rebellion against the university president.
The discussion demonstrates that email features make it amenable to a range of
political uses. The article concludes with a discussion of the implications
of this case for email research and practice. Keywords: Abuse, Coalition building, Email, MIS, Politics, HUMAN FACTORS, K.4.1
Computing Milieux, COMPUTERS AND SOCIETY, Public Policy Issues | |||
| The Knowledge in Multiple Human Relevance Judgments | | BIBAK | PDF | 101-126 | |
| W. John Wilbur | |||
| We show first that the pooling of multiple human judgments of relevance
provides a predictor of relevance that is superior to that obtained from a single
human's relevance judgments. A learning algorithm applied to a set of relevance
judgments obtained from a single human would be expected to perform on new
material at a level somewhat below that human. However, we examine two
learning methods which, when trained on the superior source of pooled human
relevance judgments, are able to perform at the level of a single human on new
material. All performance comparisons are based on an independent human judge.
Both algorithms function by producing term weights -- one by a log odds
calculation and the other by producing a least-squares fit to human relevance
ratings. Some characteristics of the algorithms are examined. Keywords: Document retrieval test set construction, Human relevance judgments, Inverse
document frequency weights, Log odds of relevance, Minimal-length least-squares
fit, Vector model, Word weights, EXPERIMENTATION, MEASUREMENT, PERFORMANCE,
H.1.2 Information Systems, MODELS AND PRINCIPLES, User/Machine Systems, Human
information processing, H.3.3 Information Systems, INFORMATION STORAGE AND
RETRIEVAL, Information Search and Retrieval, Selection process, I.2.6 Computing
Methodologies, ARTIFICIAL INTELLIGENCE, Learning, Knowledge acquisition, I.7.m
Computing Methodologies, DOCUMENT AND TEXT PROCESSING, Miscellaneous | |||
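As a hedged illustration of the kind of log-odds term weight the abstract mentions, the sketch below computes a smoothed log odds of relevance for a term and sums term weights into a document score; the 0.5/1.0 smoothing constants and the names are assumptions, not the article's exact estimator.

```python
# Sketch of a log-odds term weight of the general form the abstract mentions;
# smoothing constants are assumptions, not the article's estimator.
import math

def log_odds_weight(n_rel_with_t, n_rel, n_nonrel_with_t, n_nonrel):
    """Log of the odds that a document containing term t is relevant,
    relative to the odds for a document that does not contain t."""
    p = (n_rel_with_t + 0.5) / (n_rel + 1.0)        # P(t | relevant)
    q = (n_nonrel_with_t + 0.5) / (n_nonrel + 1.0)  # P(t | nonrelevant)
    return math.log((p * (1 - q)) / (q * (1 - p)))

def score(doc_terms, weights):
    """A document's score is the sum of the weights of the terms it contains."""
    return sum(weights.get(t, 0.0) for t in doc_terms)
```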
| A Hypermedia Version Control Framework | | BIBAK | PDF | 127-160 | |
| David L. Hicks; John J. Leggett; Peter J. Nurnberg; John L. Schnase | |||
| The areas of application of hypermedia technology, combined with the
capabilities that hypermedia provides for manipulating structure, create an
environment in which version control is very important. A hypermedia version
control framework has been designed to specifically address the version control
problem in open hypermedia environments. One of the primary distinctions of
the framework is the partitioning of hypermedia version control functionality
into intrinsic and application-specific categories. The framework has
been used as a model for the design of version control services for a hyperbase
management system that provides complete version support for both data and
structural entities. In addition to serving as a version control model for
open hypermedia environments, the framework offers a clarifying and unifying
context in which to examine the issues of version control in hypermedia. Keywords: Hyperbase management system, Hypermedia, DESIGN, MANAGEMENT, D.2.7 Software,
SOFTWARE ENGINEERING, Distribution, Maintenance, and Enhancement, Version
control, H.1.1 Information Systems, MODELS AND PRINCIPLES, Systems and
Information Theory, General systems theory, H.2.1 Information Systems, DATABASE
MANAGEMENT, Logical Design, Data models, H.2.8 Information Systems, DATABASE
MANAGEMENT, Database Applications, H.3.4 Information Systems, INFORMATION
STORAGE AND RETRIEVAL, Systems and Software, H.3.m Information Systems,
INFORMATION STORAGE AND RETRIEVAL, Miscellaneous, H.3.4 Information Systems,
INFORMATION STORAGE AND RETRIEVAL, Systems and Software, Distributed systems,
H.5.1 Information Systems, INFORMATION INTERFACES AND PRESENTATION, Multimedia
Information Systems, Hypertext navigation and maps | |||
| Self-Spatial Join Selectivity Estimation using Fractal Concepts | | BIBAK | PDF | 161-201 | |
| Alberto Belussi; Christos Faloutsos | |||
| The problem of selectivity estimation for queries of nontraditional
databases is still an open issue. In this article, we examine the problem of
selectivity estimation for some types of spatial queries in databases
containing real data. We have shown earlier [Faloutsos and Kamel 1994] that
real point sets typically have a nonuniform distribution, consistently
violating the uniformity and independence assumptions. Moreover, we
demonstrated that the theory of fractals can help to describe real point sets.
In this article we show how the concept of fractal dimension, i.e.,
(noninteger) dimension, can lead to the solution for the selectivity estimation
problem in spatial databases. Among the infinite family of fractal dimensions,
we consider here the Hausdorff fractal dimension D0 and the "Correlation"
fractal dimension D2. Specifically, we show that (a) the average number
of neighbors for a given point set follows a power law, with D2 as
exponent, and (b) the average number of nonempty range queries follows a power
law with E - D0 as exponent (E is the dimension of the embedding space).
We present the formulas to estimate the selectivity for "biased" range queries,
for self-spatial joins, and for the average number of nonempty range queries.
The results of some experiments on real and synthetic point sets are shown. Our
formulas achieve very low relative errors, typically about 10%, versus 40%-100%
of the formulas that are based on the uniformity and independence assumptions. Keywords: Fractal dimension, Range query, Selectivity estimation, Spatial join,
ALGORITHMS, THEORY, H.2.8 Information Systems, DATABASE MANAGEMENT, Database
Applications, Spatial databases and GIS, H.2.4 Information Systems, DATABASE
MANAGEMENT, Systems, Query processing | |||
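Written out, the two power laws the abstract states take roughly the following form; the notation and the box-counting argument for the second law are mine, not quoted from the article.

```latex
% nb(\epsilon):   average number of neighbors within distance \epsilon of a point,
% frac(\epsilon): fraction of randomly placed range queries of side \epsilon
%                 that are nonempty.
\begin{align*}
  \mathrm{nb}(\epsilon) &\propto \epsilon^{D_2}, &
  \mathrm{frac}(\epsilon) &\propto \epsilon^{\,E - D_0},
\end{align*}
% the latter because a grid with cells of side \epsilon has
% \Theta(\epsilon^{-E}) cells in total but only \Theta(\epsilon^{-D_0})
% nonempty ones.
```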
| Augmenting Organizational Memory: A Field Study of Answer Garden | | BIBAK | 203-224 | |
| Mark S. Ackerman | |||
| A growing concern for organizations and groups has been to augment their
knowledge and expertise. One such augmentation is to provide an organizational
memory, some record of the organization's knowledge. However, relatively
little is known about how computer systems might enhance organizational, group,
or community memory. This article presents Answer Garden, a system for growing
organizational memory. The article describes the system and its underlying
implementation. It then presents findings from a field study of Answer Garden.
The article discusses the usage data and qualitative evaluations from the field
study, and then draws a set of lessons for next-generation organizational
memory systems. Keywords: Collective memory, Community memory, Computer-supported cooperative work,
CSCW, Field studies, Group memory, Organizational memory, PERFORMANCE,
RELIABILITY, H.5.3 Information Systems, INFORMATION INTERFACES AND
PRESENTATION, Group and Organization Interfaces, C.2.4 Computer Systems
Organization, COMPUTER-COMMUNICATION NETWORKS, Distributed Systems, Distributed
applications, H.1.2 Information Systems, MODELS AND PRINCIPLES, User/Machine
Systems, H.3.3 Information Systems, INFORMATION STORAGE AND RETRIEVAL,
Information Search and Retrieval, H.4.3 Information Systems, INFORMATION
SYSTEMS APPLICATIONS, Communications Applications, H.5.1 Information Systems,
INFORMATION INTERFACES AND PRESENTATION, Multimedia Information Systems,
Hypertext navigation and maps, H.5.2 Information Systems, INFORMATION
INTERFACES AND PRESENTATION, User Interfaces, I.7.2 Computing Methodologies,
DOCUMENT AND TEXT PROCESSING, Document Preparation, Hypertext/hypermedia, K.4.3
Computing Milieux, COMPUTERS AND SOCIETY, Organizational Impacts | |||
| A Study of Probability Kinematics in Information Retrieval | | BIBAK | PDF | 225-255 | |
| F. Crestani; C. J. Van Rijsbergen | |||
| We analyze the kinematics of probabilistic term weights at retrieval time
for different Information Retrieval models. We present four models based on
different notions of probabilistic retrieval. Two of these models are based on
classical probability theory and can be considered as prototypes of models long
in use in Information Retrieval, like the Vector Space Model and the
Probabilistic Model. The two other models are based on a logical technique of
evaluating the probability of a conditional called imaging; one is a
generalization of the other. We analyze the transfer of probabilities
occurring in the term space at retrieval time for these four models, compare
their retrieval performance using classical test collections, and discuss the
results. We believe that our results provide useful suggestions on how to
improve existing probabilistic models of Information Retrieval by taking into
consideration term-term similarity. Keywords: Logical imaging, Probabilistic modeling, Probabilistic retrieval,
PERFORMANCE, H.3.3 Information Systems, INFORMATION STORAGE AND RETRIEVAL,
Information Search and Retrieval, Retrieval models, F.1.2 Theory of
Computation, COMPUTATION BY ABSTRACT DEVICES, Modes of Computation,
Probabilistic computation | |||
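As a sketch of retrieval by imaging in its simplest form (the article's generalized model and its experimental estimates are not reproduced here), the probability carried by each term is transferred to the most similar term occurring in the document, and the document is scored by how much transferred probability lands on query terms:

```latex
\begin{equation*}
  P(d \to q) \;=\; \sum_{t} P(t)\, q(t_d),
  \qquad t_d = \arg\max_{t' \in d} \operatorname{sim}(t, t'),
\end{equation*}
% where q(\cdot) is 1 on query terms and 0 otherwise; the term-term
% similarity sim is what lets probability "move" between related terms.
```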
| Arithmetic Coding Revisited | | BIBAK | PDF | 256-294 | |
| Alistair Moffat; Radford M. Neal; Ian H. Witten | |||
| Over the last decade, arithmetic coding has emerged as an important
compression tool. It is now the method of choice for adaptive coding on
multisymbol alphabets because of its speed, low storage requirements, and
effectiveness of compression. This article describes a new implementation of
arithmetic coding that incorporates several improvements over a widely used
earlier version by Witten, Neal, and Cleary, which has become a de facto
standard. These improvements include fewer multiplicative operations, greatly
extended range of alphabet sizes and symbol probabilities, and the use of
low-precision arithmetic, permitting implementation by fast shift/add
operations. We also describe a modular structure that separates the coding,
modeling, and probability estimation components of a compression system. To
motivate the improved coder, we consider the needs of a word-based text
compression program. We report a range of experimental results using this and
other models. Complete source code is available. Keywords: Approximate coding, Arithmetic coding, Text compression, Word-based model,
ALGORITHMS, PERFORMANCE, E.4 Data, CODING AND INFORMATION THEORY, Data
compaction and compression, E.1 Data, DATA STRUCTURES | |||
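For context, the sketch below is a minimal integer arithmetic encoder in the style of the earlier Witten-Neal-Cleary coder that this article improves upon; it keeps that scheme's multiplications and divisions and omits the article's low-precision shift/add refinements, the modular model interface, and the decoder. The static two-symbol model at the end is an illustrative assumption.

```python
# Minimal integer arithmetic encoder: interval narrowing plus bit renormalization.

PRECISION = 32
TOP     = (1 << PRECISION) - 1
HALF    = 1 << (PRECISION - 1)
QUARTER = 1 << (PRECISION - 2)

def encode(symbols, cum, total):
    """cum[s] = (lo, hi): cumulative-frequency interval of symbol s;
    total = overall frequency count (must not exceed QUARTER)."""
    low, high, pending, bits = 0, TOP, 0, []

    def emit(bit):
        nonlocal pending
        bits.append(bit)
        bits.extend([1 - bit] * pending)       # flush deferred underflow bits
        pending = 0

    for s in symbols:
        lo, hi = cum[s]
        span = high - low + 1
        high = low + span * hi // total - 1    # narrow the interval to the
        low  = low + span * lo // total        # symbol's probability slice
        while True:
            if high < HALF:                    # lower half: emit 0
                emit(0)
            elif low >= HALF:                  # upper half: emit 1, shift down
                emit(1)
                low -= HALF; high -= HALF
            elif low >= QUARTER and high < 3 * QUARTER:
                pending += 1                   # straddles the middle: defer
                low -= QUARTER; high -= QUARTER
            else:
                break
            low, high = low * 2, high * 2 + 1  # renormalize (double the interval)
    pending += 1
    emit(0 if low < QUARTER else 1)            # disambiguate the final interval
    return bits

# Illustrative static model over symbols 'a' and 'b' (counts 3 and 1).
cum = {"a": (0, 3), "b": (3, 4)}
print(encode("aababaa", cum, total=4))
```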
| Metric Details for Natural-Language Spatial Relations | | BIBAK | PDF | 295-321 | |
| Max J. Egenhofer; A. Rashid B. M. Shariff | |||
| Spatial relations often are desired answers that a geographic information
system (GIS) should generate in response to a user's query. Current GISs
provide only rudimentary support for processing and interpreting
natural-language-like spatial relations, because their models and
representations are primarily quantitative, while natural-language spatial
relations are usually dominated by qualitative properties. Studies of the use
of spatial relations in natural language showed that topology accounts for a
significant portion of the geometric properties. This article develops a
formal model that captures metric details for the description of
natural-language spatial relations. The metric details are expressed as
refinements of the categories identified by the 9-intersection, a model for
topological spatial relations, and provide a more precise measure than does
topology alone as to whether a geometric configuration matches with a spatial
term or not. Similarly, these measures help in identifying the spatial term
that describes a particular configuration. Two groups of metric details are
derived: splitting ratios as the normalized values of lengths and areas of
intersections; and closeness measures as the normalized distances between
disjoint object parts. The resulting model of topological and metric
properties was calibrated for 64 spatial terms in English, providing values for
the best fit as well as value ranges for the significant parameters of each
term. Three examples demonstrate how the framework and its calibrated values
are used to determine the best spatial term for a relationship between two
geometric objects. Keywords: DESIGN, HUMAN FACTORS, H.2.3 Information Systems, DATABASE MANAGEMENT,
Languages, Query languages, H.3.3 Information Systems, INFORMATION STORAGE AND
RETRIEVAL, Information Search and Retrieval, Query formulation, H.3.3
Information Systems, INFORMATION STORAGE AND RETRIEVAL, Information Search and
Retrieval, Search process, H.3.3 Information Systems, INFORMATION STORAGE AND
RETRIEVAL, Information Search and Retrieval, Selection process, I.2.1 Computing
Methodologies, ARTIFICIAL INTELLIGENCE, Applications and Expert Systems,
Cartography, I.2.7 Computing Methodologies, ARTIFICIAL INTELLIGENCE, Natural
Language Processing, Language parsing and understanding, I.5.1 Computing
Methodologies, PATTERN RECOGNITION, Models, Geometric | |||
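The abstract's two groups of measures can be given illustrative forms such as the following; the particular normalizers shown here are assumptions, and the article's calibrated definitions may differ.

```latex
\begin{align*}
  \text{splitting ratio:}\quad
    & \frac{\operatorname{area}(A^{\circ} \cap B^{\circ})}{\operatorname{area}(B)}
      \quad\text{or}\quad
      \frac{\operatorname{length}(\partial A \cap B^{\circ})}{\operatorname{length}(\partial A)},\\[4pt]
  \text{closeness measure:}\quad
    & \frac{\operatorname{dist}(\partial A,\, \partial B)}{\operatorname{length}(\partial B)},
\end{align*}
% i.e., how the 9-intersection's interior/boundary intersections split an
% object, and how far apart disjoint parts are, each normalized by the size
% of one of the objects.
```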
| A Semidiscrete Matrix Decomposition for Latent Semantic Indexing Information Retrieval | | BIBAK | PDF | 322-346 | |
| Tamara G. Kolda; Dianne P. O'Leary | |||
| The vast amount of textual information available today is useless unless it
can be effectively and efficiently searched. The goal in information retrieval
is to find documents that are relevant to a given user query. We can represent
a document collection by a matrix whose (i, j) entry is nonzero only if the
ith term appears in the jth document; thus each document corresponds to a
column vector. The query is also represented as a column vector whose ith entry
is nonzero only if the ith term appears in the query. We score each document
for relevancy by taking its inner product with the query. The highest-scoring
documents are considered the most relevant. Unfortunately, this method does
not necessarily retrieve all relevant documents because it is based on literal
term matching. Latent semantic indexing (LSI) replaces the document matrix
with an approximation generated by the truncated singular-value decomposition
(SVD). This method has been shown to overcome many difficulties associated
with literal term matching. In this article we propose replacing the SVD with
the semidiscrete decomposition (SDD). We will describe the SDD approximation,
show how to compute it, and compare the SDD-based LSI method to the SVD-based
LSI methods. We will show that SDD-based LSI does as well as SVD-based LSI in
terms of document retrieval while requiring only one-twentieth the storage and
one-half the time to compute each query. We will also show how to update the
SDD approximation when documents are added or deleted from the document
collection. Keywords: ALGORITHMS, H.3.3 Information Systems, INFORMATION STORAGE AND RETRIEVAL,
Information Search and Retrieval, G.1.2 Mathematics of Computing, NUMERICAL
ANALYSIS, Approximation | |||
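The sketch below shows the vector-space inner-product scoring and the SVD-based LSI baseline that the abstract describes; the semidiscrete decomposition itself, which approximates the matrix with factors whose entries are restricted to {-1, 0, 1} plus a diagonal scale, is not implemented here, and the toy matrix is an illustrative assumption.

```python
# Inner-product scoring and SVD-based LSI baseline; a rank-k approximation of
# the term-document matrix simply replaces A before scoring.
import numpy as np

def score_exact(A, q):
    """A: term-by-document matrix, q: query vector. Inner-product scores."""
    return A.T @ q

def score_lsi_svd(A, q, k):
    """Replace A by its rank-k truncated SVD before scoring."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = U[:, :k] * s[:k] @ Vt[:k, :]
    return A_k.T @ q

# Toy 4-term, 3-document example (rows = terms, columns = documents).
A = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)
q = np.array([1, 0, 0, 0], dtype=float)   # query containing only term 0
print(score_exact(A, q), score_lsi_svd(A, q, k=2))
```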
| Collaborative Conceptual Schema Design: A Process Model and Prototype System | | BIBAK | PDF | 347-371 | |
| Sudha Ram; V. Ramesh | |||
| Recent years have seen an increased interest in providing support for
collaborative activities among groups of users participating in various
information systems design tasks, such as requirements determination and
process modeling. However, little attention has been paid to the collaborative
conceptual database design process. In this article, we develop a model of the
collaborative conceptual schema development process and describe the design and
implementation of a graphical multiuser conceptual schema design tool that is
based on the model. The system we describe allows a group of users to work
collaboratively on the creation of database schemas in synchronous (same-time)
mode (either in a face-to-face or distributed setting). Extensive modeling
support is provided to assist users in creating semantically correct conceptual
schemas. The system also provides users with several graphical facilities,
such as a large drawing workspace with the ability to scroll or "jump" to any
portion of this workspace, zooming capabilities, and the ability to move
object(s) to any portion of the workspace. The unique component of the system,
however, is its built-in support for collaborative schema design. The system
supports a relaxed WYSIWIS environment, i.e., each user can control the
graphical layout of the same set of schema objects. The system ensures that
changes/additions made by any user are consistent. Any conflicts that may
compromise the integrity of the shared schema are flagged and resolved by
the system. The results from a preliminary experiment suggest that the use of
our system in a collaborative mode improved information sharing among users,
minimized conflicts, and led to a more comprehensive schema definition. Keywords: DESIGN, MANAGEMENT, H.2.1 Information Systems, DATABASE MANAGEMENT, Logical
Design, H.4 Information Systems, INFORMATION SYSTEMS APPLICATIONS, K.6.3
Computing Milieux, MANAGEMENT OF COMPUTING AND INFORMATION SYSTEMS, Software
Management | |||
| Structured Hypertext with Domain Semantics | | BIBAK | PDF | 372-412 | |
| Weigang Wang; Roy Rada | |||
| One important facet of current hypertext research involves using
knowledge-based techniques to develop and maintain document structures. A
semantic net is one such technique. However, most semantic-net-based hypertext
systems leave the linking consistency of the net to individual users. Users
without guidance may accidentally introduce structural and relational
inconsistencies in the semantic nets. The relational inconsistency hinders the
creation of domain information models. The structural inconsistency leads to
unstable documents, especially when a document is composed by computation with
traversal algorithms. This work tackles the above problems by integrating
logical structure and domain semantics into a semantic net. A
semantic-net-based structured-hypertext model has been formalized. The model
preserves structural and relational consistency after changes to the semantic
net. The hypertext system (RICH) based on this model has been implemented and
tested. The RICH system can define and enforce a set of rules to maintain the
integrity of the semantic net and provide particular support for creating
multihierarchies with the reuse of existing contents and structures. Users
have found such flexible but enforceable semantics to be helpful. Keywords: DESIGN, DOCUMENTATION, MANAGEMENT, E.1 Data, DATA STRUCTURES, Graphs and
networks, H.2.1 Information Systems, DATABASE MANAGEMENT, Logical Design, Data
models, H.3.4 Information Systems, INFORMATION STORAGE AND RETRIEVAL, Systems
and Software, H.5 Information Systems, INFORMATION INTERFACES AND PRESENTATION,
I.7.m Computing Methodologies, DOCUMENT AND TEXT PROCESSING, Miscellaneous | |||
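As a purely hypothetical illustration of the kind of relational-consistency rule such a system might enforce (the node types, link types, and rule table below are not RICH's actual rules), a typed-link check could look like:

```python
# Hypothetical relational-consistency check for a semantic-net-based
# hypertext: only certain link types are permitted between node types.

ALLOWED = {                      # (source type, link type) -> allowed targets
    ("chapter", "contains"): {"section"},
    ("section", "contains"): {"paragraph"},
    ("section", "references"): {"section", "paragraph"},
}

def check_link(source_type, link_type, target_type):
    allowed = ALLOWED.get((source_type, link_type))
    if allowed is None:
        return False, f"link type '{link_type}' not defined for '{source_type}'"
    if target_type not in allowed:
        return False, f"'{source_type}' may not '{link_type}' a '{target_type}'"
    return True, "ok"

print(check_link("chapter", "contains", "section"))    # accepted
print(check_link("chapter", "contains", "paragraph"))  # rejected
```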