| VXT: a visual approach to XML transformations | | BIBAK | Full-Text | 1-10 | |
| Emmanuel Pietriga; Jean-Yves Vion-Dury; Vincent Quint | |||
The domain of XML transformations is becoming increasingly important as a
result of the growing number of applications adopting XML as their format
for data exchange or representation. Most existing solutions for
expressing XML transformations are textual languages, such as XSLT or the DOM
combined with a general-purpose programming language. Several tools are built on
top of these languages, providing a graphical environment. Transformations are,
however, still specified textually in the underlying language (often
XSLT), thus requiring the user to learn that textual language.
We believe that visual programming techniques are well suited to representing XML structures and can make the specification of transformations simpler. We present a visual programming language for the specification of XML transformations in an interactive environment, based on a zoomable user interface toolkit. Transformations can be run from the application or exported to two target languages: XSLT and Circus, a general-purpose structure transformation language designed by the second author and briefly introduced in this paper. Keywords: XML transformations, XSLT, circus, visual programming languages, zoomable
user interfaces | |||
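As a point of reference for the kind of XSLT a VXT specification can be exported to, here is a minimal sketch applied through Python's lxml binding; the stylesheet and element names are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch (not VXT output): applying a small XSLT stylesheet
# with lxml to rewrite a <list> of <item>s into an HTML <ul>.
from lxml import etree

STYLESHEET = etree.XML(b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="list">
    <ul><xsl:apply-templates select="item"/></ul>
  </xsl:template>
  <xsl:template match="item">
    <li><xsl:value-of select="."/></li>
  </xsl:template>
</xsl:stylesheet>
""")

transform = etree.XSLT(STYLESHEET)
doc = etree.XML(b"<list><item>one</item><item>two</item></list>")
print(etree.tostring(transform(doc), pretty_print=True).decode())
```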
| Computer assisted processing of large unstructured document sets: a case study in the construction industry | | BIBAK | Full-Text | 11-17 | |
| John McKechnie; Sameh Shaaban; Stephen Lockley | |||
Construction is one of the most information-intensive industries; professionals
typically access the industry's information resources on a daily basis. The
major constraints on the future development of a formally encoded knowledge
base are fragmented information sources and the lack of comprehensive
classification schemes. In agreement with earlier research, and with over twenty
years of practical experience, we have found that manually categorising a large
collection of documents is error-prone, time-consuming, and expensive, and produces
inconsistent results. Attempts in recent years to automate this using
state-of-the-art categorisation techniques have also proven wanting, owing
to the shallow internal representation in the document set. In this paper we
describe an approach to overcome this problem by combining the benefits of
automated categorisation with efficient and effective use of human judgement.
We present a tool based on this philosophy that utilises machine learning,
information retrieval and information visualisation techniques to help
bibliographers analyse the document collection. By analysing the content of
unstructured documents, the tool suggests to the bibliographer keywords,
subject headings, and candidate documents to include under subject headings.
This greatly increases the speed at which bibliographers can process the
documents, increases the accuracy of their work and results in a categorisation
system that reflects the terminology and relationships held in the actual
knowledge base. This work is now being applied to enhance one of the market
leading retrieval products in the construction industry. Keywords: abstracting, automated text categorisation, classification, information
visualization, keyword extraction, machine learning | |||
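A heavily simplified stand-in for the keyword-suggestion step described above (the authors' actual pipeline combines machine learning, information retrieval, and visualisation): ranking candidate keywords for a document by TF-IDF weight across the collection. The corpus and parameters are illustrative assumptions.

```python
# Suggest keywords for one document by TF-IDF weight (simplified sketch).
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "concrete slab reinforcement and curing on site",
    "steel frame erection and bolted connections",
    "site safety regulations for scaffolding work",
]

vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform(docs)          # rows: documents, columns: terms
terms = vec.get_feature_names_out()

def suggest_keywords(doc_index, k=3):
    """Return the k highest-weighted terms for one document."""
    row = tfidf[doc_index].toarray().ravel()
    return [terms[i] for i in row.argsort()[::-1][:k]]

print(suggest_keywords(0))
```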
| Towards static type checking for XSLT | | BIBA | Full-Text | 18-27 | |
| Akihiko Tozawa | |||
We are concerned with the static type checking problem for XSLT. In the context of XSLT and other XML programming, types are DTDs or schemas, and static type checking verifies that a program always converts valid source documents into valid output documents. To achieve static type checking for XSLT, we introduce a subset of XSLT and an efficient backward type inference algorithm for that subset. Although our XSLT subset lacks XPath, it includes recursive calls of templates and is powerful enough to capture basic XSLT transformations. Our method is based on Finite Tree Automata (FTA), which provide a rigorous representation of types in XML. Given the type of the output documents, backward type inference computes a type for the source documents. The idea of backward type inference is borrowed from the work of Milo et al., while we reduce the computational complexity of their framework. | |||
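The correctness condition behind backward type inference can be stated as follows (the notation is ours, consistent with the abstract): a transformation tau type-checks against source type S and output type O exactly when every valid source maps to a valid output.

```latex
% S, O: tree automata for the source and output types; \tau: the transformation.
\forall d \in L(S)\colon\ \tau(d) \in L(O)
\quad\Longleftrightarrow\quad
L(S) \subseteq \tau^{-1}\bigl(L(O)\bigr)
```

Backward type inference computes an FTA recognising the pre-image tau^{-1}(L(O)), which reduces type checking to a decidable inclusion test between tree automata.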
| Authoring graphics-rich and interactive documents in CGLIB: a constraint-based graphics library | | BIBAK | Full-Text | 28-37 | |
| Neng-Fa Zhou | |||
| CGLIB is a high-level graphics library for B-Prolog, a constraint logic
programming system. The library provides primitives for creating and
manipulating graphical objects and a set of constraints, including non-overlap,
grid, table, and tree constraints, that facilitate the specification of
object layouts. The library adopts a construct called action rules
available in B-Prolog for creating agents and programming interactions among
agents or between agents and the user. The library is a fully working system
implemented in B-Prolog, Java and C. It can be used in many areas such as
drawing editors, interactive user interfaces, document authoring, animation,
information visualization, intelligent agents, and games. The high-level
abstraction of the library and the use of constraints and action rules in the
specification of layouts and behaviors can significantly enhance productivity
in graphics development. We demonstrate through several
examples the effectiveness of the library as a tool for developing
graphics-rich and interactive user interfaces. Keywords: Prolog, action rules, agents, constraints, graphical user interface design,
graphics programming, programming languages | |||
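The non-overlap condition such a layout constraint enforces can be sketched as follows (CGLIB itself is a B-Prolog library; the Python names here are illustrative): two axis-aligned boxes do not overlap exactly when one lies entirely to the left of, right of, above, or below the other.

```python
# Non-overlap condition for two axis-aligned boxes (illustrative sketch).
from dataclasses import dataclass

@dataclass
class Box:
    x: int   # left edge
    y: int   # top edge
    w: int   # width
    h: int   # height

def non_overlap(a: Box, b: Box) -> bool:
    return (a.x + a.w <= b.x or b.x + b.w <= a.x or
            a.y + a.h <= b.y or b.y + b.h <= a.y)

print(non_overlap(Box(0, 0, 10, 10), Box(10, 0, 5, 5)))  # True (touching edges)
print(non_overlap(Box(0, 0, 10, 10), Box(5, 5, 5, 5)))   # False (overlapping)
```

A constraint solver such as the one in B-Prolog treats conditions like this declaratively, searching for positions that satisfy all layout constraints at once rather than checking them after the fact.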
| Dynamic documents: authoring, browsing, and analysis using a high-level petri net-based hypermedia system | | BIBAK | Full-Text | 38-47 | |
| Jin-Cheon Na; Richard Furuta | |||
| caT (for Context-Aware Trellis) was initially developed to support
context-aware documents by incorporating high-level Petri-net specification,
context-awareness, user modeling, and fuzzy knowledge handling features into
Trellis, a Petri-net-based hypermedia system. The browsing behavior of
documents specified in the caT model can reflect the reader's context (such
as location and time) and preferences. Recently, to provide a
framework for the authoring, browsing, and analysis of reasonably complex,
dynamic documents, we added (or extended) several features in the caT system,
providing hierarchical Petri net support, a structured authoring tool, browsing
tools for multiple presentations of a particular document's specification, and
a Petri net analysis tool. In this paper, we present the extended features of
caT and give examples of using caT to define and present various documents,
such as formal specification of software requirements and customized Web
documents. Since caT is based on a formal model, the behavioral characteristics
of developed caT models can be analyzed. Current debugging and analysis tools,
integrated into the authoring tool, are also introduced. Keywords: caT, dynamic documents, petri-net-based hypertext, trellis | |||
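The Petri-net firing rule underlying Trellis-style browsing semantics can be sketched in a few lines (place and transition names are illustrative): a transition is enabled when every input place holds a token, and firing it moves tokens from input places to output places, for example when a reader follows a link.

```python
# Minimal Petri-net firing rule (illustrative sketch of browsing semantics).
marking = {"page_A": 1, "page_B": 0}
transitions = {
    "follow_link": {"in": ["page_A"], "out": ["page_B"]},
}

def enabled(t):
    return all(marking[p] > 0 for p in transitions[t]["in"])

def fire(t):
    assert enabled(t)
    for p in transitions[t]["in"]:
        marking[p] -= 1
    for p in transitions[t]["out"]:
        marking[p] += 1

fire("follow_link")
print(marking)   # {'page_A': 0, 'page_B': 1}: the reader is now on page B
```

Because browsing state is an explicit marking governed by formal firing rules, standard Petri-net analysis (reachability, liveness) applies directly, which is what makes the behavioral analysis described in the abstract possible.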
| Towards the convergence between hypermedia authoring languages and architecture description languages | | BIBAK | Full-Text | 48-57 | |
| Débora Christina Muchaluat-Saade; Luiz Fernando Gomes Soares | |||
| This paper presents a detailed comparison between the structural elements
and definitions provided by Hypermedia Authoring Languages and Architecture
Description Languages (ADL). ADLs are formal languages that can be used for
representing a software architecture. Although it may look trivial to make a
direct correspondence between ADL and hypermedia structural entities, such as
components to nodes and connectors to links, interesting differences can be
identified when observing them more closely. Based on the comparison results, a
structural meta-model that can be specialized for use in both domains is
proposed. Furthermore, the paper also presents an example of how the meta-model
can be used for describing hypermedia document structures, showing how some
features found in ADLs can be applied to hypermedia authoring languages. Our
final goal is to integrate the contributions of document engineering and
software architecture engineering, taking advantage of advances in one
area within the other. This paper is a first step in that
direction. Keywords: ADL, architecture description languages, components, connectors, hypermedia
authoring languages, structural meta-model | |||
| The multivalent browser: a platform for new ideas | | BIBAK | Full-Text | 58-67 | |
| Thomas A. Phelps; Robert Wilensky | |||
The Multivalent Browser is built on an architecture that separates
functionality from concrete document format. Almost all functionality is made
available via relatively small modules of code called behaviors that
programmers can write to extend the core system. Behaviors can be as
significant and powerful as parser-renderers for scanned paper, HTML, or TeX
DVI; as fine-grained as hyperlinks, cookies, and the disabling of menu items;
and as innovative or uncommon as in situ annotations, "lenses", collapsible
outline displays, new GUI widgets, and Robust Hyperlink support. Behaviors can
be combined in arbitrary groups for each individual document, in effect
spontaneously creating a custom browser for every one. Common aspects of
document functionality can be shared, so that, for example, the same behavior
that handles multipage support for scanned paper documents also provides such
support for DVI and PDF; similarly, the behaviors that support fine-grain
annotation of HTML also support identical annotation on scanned paper, UNIX
manual pages, DVI, and PDF.
We have designed and implemented this architecture, along with behaviors that support all of the above functionality and more. Here we describe the architecture that allows such power and fine-grained access, yet composes disparate behaviors and resolves their mutual conflicts. Keywords: annotation, architecture, digital, document, multivalent behavior, paper,
scanned | |||
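The behavior-composition idea can be sketched as follows (the real system is written in Java; this Python protocol is an illustrative assumption, not the Multivalent API): small modules are registered per document and each gets a chance to handle an event, so one document can combine, say, hyperlink and annotation behaviors.

```python
# Illustrative sketch of per-document behavior composition.
class Behavior:
    def handle(self, event: str) -> bool:
        """Return True if this behavior consumed the event."""
        return False

class HyperlinkBehavior(Behavior):
    def handle(self, event):
        if event == "click":
            print("follow hyperlink")
            return True
        return False

class AnnotationBehavior(Behavior):
    def handle(self, event):
        if event == "annotate":
            print("attach in situ annotation")
            return True
        return False

class Document:
    def __init__(self, behaviors):
        self.behaviors = behaviors    # per-document behavior stack

    def dispatch(self, event):
        for b in self.behaviors:      # first behavior to consume the event wins
            if b.handle(event):
                return

doc = Document([HyperlinkBehavior(), AnnotationBehavior()])
doc.dispatch("annotate")
```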
| TabulaMagica: an integrated approach to manage complex tables | | BIBAK | Full-Text | 68-75 | |
| Horst Silberhorn | |||
Tables are a special part of documents, and specific means have been
developed to manage them. Over time, the underlying models used to edit and
format tables have been improved or supplemented by new ones. These models led
to a wide variety of table formats and produced "tabular legacies", making it
difficult to edit, use, or modify tables in varying formats. It is even more
time-consuming to convert them for various media or to unify or compare tabular
information. Our approach to tackle these problems is to integrate different
formats. To do so, we recognize the table structure, model the structure and
the presentational form and combine both. This way, one can modify the
structure, the topology, and the layout of tables simultaneously. Table
manipulations may be very complex and hard for the user to understand. In
addition, users are accustomed to WYSIWYG environments and want to be able to
track their operations visually. Therefore, we have developed a
WYSIWYG GUI for working on tables, which we present here, discussing its
advantages, limitations, and remaining work. Keywords: WYSIWYG editor, separation of structure and presentation, table processing,
tabular legacies | |||
| Mobile agent-based compound documents | | BIBA | Full-Text | 76-84 | |
| Ichiro Satoh | |||
This paper presents a mobile agent-based framework for building mobile compound documents, each of which can be dynamically composed of mobile agents and can migrate itself over a network as a whole, with all its embedded agents. The key idea of this framework is that it builds a hierarchical mobile agent system that enables multiple mobile agents to be combined into a single mobile agent. The framework also provides several value-added mechanisms for visually manipulating components embedded in a compound document and for sharing a window on the screen among the components. This paper describes the framework and our experiences implementing a prototype system, currently using Java as both the implementation language and the component development language, and then illustrates several applications that demonstrate the framework's utility and flexibility. | |||
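The hierarchical containment at the heart of the framework can be sketched as follows (the actual system is Java-based; this model and its names are illustrative assumptions): migrating a compound document recursively moves all embedded agents with it.

```python
# Illustrative sketch of hierarchical agent containment and migration.
class Agent:
    def __init__(self, name):
        self.name = name
        self.children = []            # agents embedded in this agent

    def embed(self, agent):
        self.children.append(agent)

    def migrate(self, host):
        print(f"{self.name} -> {host}")
        for child in self.children:   # the whole hierarchy moves as one unit
            child.migrate(host)

doc = Agent("compound-document")
doc.embed(Agent("text-viewer"))
doc.embed(Agent("image-viewer"))
doc.migrate("remote-host")
```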
| Requirements for XML document database systems | | BIBAK | Full-Text | 85-94 | |
| Airi Salminen; Frank Wm. Tompa | |||
| The shift from SGML to XML has created new demands for managing structured
documents. Many XML documents will be transient representations for the purpose
of data exchange between different types of applications, but there will also
be a need for effective means to manage persistent XML data as a database. In
this paper we explore requirements for an XML database management system. The
purpose of the paper is not to suggest a single type of system covering all
necessary features. Instead, the purpose is to initiate discussion of the
requirements arising from document collections, to offer a context in which to
evaluate current and future solutions, and to encourage the development of
proper models and systems for XML database management. Our discussion addresses
issues arising from data modelling, data definition, and data manipulation. Keywords: XML, XML database systems, data definition, data manipulation, data
modelling, structured documents | |||
| The extended XQL for querying and updating large XML databases | | BIBA | Full-Text | 95-104 | |
| Raymond K. Wong | |||
XQL has been criticized as being just a model for requesting specific sets of elements, with very limited query capability. This paper proposes several extensions to XQL that address these issues, including full-text indexed search, path variables, joins, session-based navigation, and updates. Care has been taken to preserve the conciseness of the language's syntax. A corresponding query processor with an optimization mechanism has been prototyped and is available online. Finally, implementation issues are discussed. | |||
| Bridging XML-schema and relational databases: a system for generating and manipulating relational databases using valid XML documents | | BIBAK | Full-Text | 105-114 | |
| Iraklis Varlamis; Michalis Vazirgiannis | |||
| Many organizations and enterprises establish distributed working
environments, where different users need to exchange information based on a
common model. XML is widely used to facilitate this information exchange. The
extensibility of XML allows the creation of generic models that integrate data
from different sources. For these tasks, several applications are used to
import and export information in XML format from the data repositories. In
order to support this process for relational repositories we developed the
X-Database system. The base of this system is an XML-Schema file that describes
the logical model of interchanged information. Initially, the system analyses
the syntax of the XML-Schema file and generates the relational database. Then
it handles the decomposition of valid XML files according to that Schema and
the composition of XML documents from the information in the database. Finally,
the system offers a flexible mechanism for modifying and querying database
contents using only valid XML documents, validated against the
XML-Schema file's rules. Keywords: XML, document storage and retrieval, mapping, metadata, querying, relational
databases | |||
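A much-simplified sketch of the schema-to-database step (the table and column derivation rules here are assumptions, not the X-Database algorithm): each complexType becomes a table and each child element a column.

```python
# Naive XML-Schema to CREATE TABLE mapping (illustrative sketch;
# all columns are mapped to TEXT for brevity).
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
schema = ET.fromstring("""\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="person">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="age" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>""")

for ctype in schema.iter(XS + "complexType"):
    cols = ", ".join(
        f"{el.get('name')} TEXT" for el in ctype.iter(XS + "element"))
    print(f"CREATE TABLE {ctype.get('name')} ({cols});")
```

Decomposing a valid document then reduces to walking its tree and inserting each element's content into the corresponding table, with composition running the mapping in reverse.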
| An integrated environment for the presentation of consistent SMIL 2.0 documents | | BIBAK | Full-Text | 115-124 | |
| P. N. M. Sampaio; C. Lohr; J. P. Courtiat | |||
The use of Interactive Multimedia Documents (IMDs) has been widely
adopted in several fields, such as education and medicine, since these
documents can be distributed and accessed over the World Wide Web. In this
context, the W3C standard Synchronized Multimedia Integration Language (SMIL)
has been proposed for the presentation of IMDs over the Web. However, the
flexibility of the temporal model of SMIL 2.0 allows the author to describe
temporal synchronization relationships that potentially cannot be resolved
during the presentation of the document, known as temporal inconsistencies. For
this reason, an approach that makes it possible to detect and correct these
inconsistencies is needed.
This paper presents a formal approach for the verification, scheduling and presentation of consistent SMIL 2.0 documents based on the RT-LOTOS formal description technique. Thus, the consistency analysis of SMIL 2.0 documents is presented and some solutions are proposed in order to deal with potential state space explosion problems. Further on, some contributions are also presented concerning the scheduling and presentation of SMIL 2.0 documents based on a simple scheduling graph, called a Time Labeled Automata (TLA), derived automatically from the document formal specification. Finally, a global Java-based architecture for the implementation of a player of consistent SMIL 2.0 documents is presented. Keywords: LOTOS, RT-LOTOS, SMIL 2.0, formal methods, interactive multimedia documents,
temporal consistency | |||
| Authoring transformations by direct manipulation for adaptable multimedia presentations | | BIBAK | Full-Text | 125-134 | |
| Lionel Villard | |||
| In this paper, we present a method for authoring generic and adaptable
multimedia presentations. This method relies on document transformations. With
the currently available tools, designing the XML content and the transformation
sheets is a tedious and error-prone experience. We propose a framework based on
an incremental transformation process. Incremental transformation processors
represent a better alternative to help in the design of both the content and
the transformation sheets. We believe that such authoring tools are a first
step toward fully interactive transformation-based authoring environments. In
this paper, we focus on the authoring of transformation sheets by direct
manipulation. In particular, we study the authoring of transformations for the
XSLT language defined at the World Wide Web Consortium. Keywords: XML, XSLT, authoring tools, document model, incremental transformations,
multimedia | |||
| Vector graphics: from PostScript and Flash to SVG | | BIBAK | Full-Text | 135-143 | |
| Steve Probets; Julius Mong; David Evans; David Brailsford | |||
The XML-based specification for Scalable Vector Graphics (SVG), sponsored by
the World Wide Web Consortium, allows for compact and descriptive vector
graphics for the Web.
This paper describes a set of three tools for creating SVG, either from first principles or via the conversion of existing formats. The ab initio generation of SVG is effected by a server-side CGI script using a Perl library of drawing functions; later sections highlight the problems of converting Adobe PostScript and Macromedia's Shockwave format (SWF) into SVG. Keywords: Flash, PDF, PostScript, SVG, SWF | |||
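The paper generates SVG server-side from a Perl drawing library; as a rough Python analogue (the function names are illustrative assumptions, not the paper's API), drawing calls can be turned directly into SVG markup strings.

```python
# Illustrative sketch: drawing functions that emit SVG markup.
def svg_document(width, height, shapes):
    body = "\n  ".join(shapes)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}" height="{height}">\n  {body}\n</svg>')

def line(x1, y1, x2, y2, stroke="black"):
    return (f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" '
            f'stroke="{stroke}"/>')

def circle(cx, cy, r, fill="none", stroke="black"):
    return (f'<circle cx="{cx}" cy="{cy}" r="{r}" '
            f'fill="{fill}" stroke="{stroke}"/>')

print(svg_document(100, 100, [line(0, 0, 100, 100), circle(50, 50, 25)]))
```

Converting PostScript or SWF amounts to mapping each source drawing operator onto generators like these, which is where the problems the paper highlights (path semantics, fonts, frames) arise.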
| Latent semantic linking over homogeneous repositories | | BIBAK | Full-Text | 144-151 | |
| Alessandra Alaniz Macedo; Maria da Graça Campos Pimentel; José Antonio Camacho Guerrero | |||
| We present a framework for the automatic generation of links based on
salient semantic structures extracted from homogeneous web repositories, and
discuss an implementation of the framework. For this study, we treat as
homogeneous the repositories of eClass, an instrumented environment that
automatically captures details of a lecture and provides effective
multimedia-enhanced web-based interfaces for users to review the lecture, and
of CoWeb, a web-based service for the collaborative authoring of web-based
material. We exploited Latent Semantic Analysis over data indexed by a general
public license search engine. We experimented with our service using data from a
graduate course supported by both eClass and CoWeb repositories. We present the
results of the Latent Semantic Analysis linking service in the light of results
obtained in our previous work. Keywords: automatic linking, information integration, information retrieval, semantic
structures | |||
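A sketch of LSA-based link suggestion (the corpus and threshold are illustrative; the authors' service works over eClass/CoWeb indexes): project TF-IDF vectors into a low-rank latent space and propose links between documents whose latent vectors are similar.

```python
# Illustrative sketch of Latent Semantic Analysis for automatic linking.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "lecture slides on sorting algorithms",
    "student notes about quicksort and mergesort",
    "wiki page on course logistics and grading",
]

tfidf = TfidfVectorizer().fit_transform(docs)
latent = TruncatedSVD(n_components=2).fit_transform(tfidf)  # LSA projection
sim = cosine_similarity(latent)

# Suggest a link wherever latent similarity exceeds a threshold.
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        if sim[i, j] > 0.5:
            print(f"link doc {i} <-> doc {j} (sim={sim[i, j]:.2f})")
```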
| A technique for fuzzy document binarization | | BIBAK | Full-Text | 152-156 | |
| Nikos Papamarkos | |||
This paper proposes a new method for the fuzzy binarization of digital documents.
The proposed approach achieves binarization using both the image gray levels
and additional local spatial features. Both gray-level and local feature
values feed a Kohonen Self-Organized Feature Map (SOFM) neural network
classifier. After training, the neurons of the output competition layer of the
SOFM define two bilevel classes. Using the content of these classes, fuzzy
membership functions are obtained that are then used with the Fuzzy C-Means
(FCM) algorithm in order to reduce the character-blurring problem. The method
is suitable for the binarization of blurred documents and can easily be modified
to accommodate any type of spatial characteristics. Keywords: binarization, fuzzy logic, self-organized neural networks, thresholding | |||
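A heavily simplified stand-in for the pipeline above: cluster each pixel's (gray level, local mean) feature vector into two classes and take the darker cluster as ink. The real method uses a Kohonen SOFM and fuzzy memberships; this sketch substitutes plain k-means and only shows the feature-based two-class idea.

```python
# Two-class clustering of (gray level, 3x3 local mean) pixel features
# as a simplified stand-in for SOFM-based binarization.
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32)).astype(float)

# Feature 1: gray level. Feature 2: 3x3 local mean (edges padded).
padded = np.pad(img, 1, mode="edge")
local = sum(padded[dy:dy + 32, dx:dx + 32]
            for dy in range(3) for dx in range(3)) / 9.0
feats = np.stack([img.ravel(), local.ravel()], axis=1)

# Two-class k-means (Lloyd's algorithm).
centers = feats[rng.choice(len(feats), 2, replace=False)]
for _ in range(20):
    d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(axis=1)
    centers = np.array([feats[labels == k].mean(0) if (labels == k).any()
                        else centers[k] for k in range(2)])

ink = int(np.argmin(centers[:, 0]))      # darker cluster = foreground
binary = (labels == ink).reshape(img.shape)
print(binary.mean())                     # fraction of pixels marked as ink
```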
| Extraction of text areas in printed document images | | BIBAK | Full-Text | 157-165 | |
| Jean Duong; Myriam Côte; Hubert Emptoz; Ching Y. Suen | |||
In this paper, we present a document analysis system designed to
extract regions of interest from greyscale document images. The collected areas
are then clustered into text and non-text zones using geometric and texture
features. The system works in two steps. First, regions of interest are
retrieved via cumulative gradient considerations. Then, in the classification
module, we introduce an entropic heuristic. Experiments are conducted on the
MediaTeam Document Database to show the relevance of this criterion. Keywords: entropy, features, text extraction | |||
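A sketch of the cumulative-gradient idea (the thresholds and the synthetic image are illustrative, not the authors' parameters): rows whose summed horizontal gradient magnitude is high are likely to cross text.

```python
# Row-wise cumulative gradient profile for locating text bands (sketch).
import numpy as np

rng = np.random.default_rng(1)
img = np.full((64, 64), 255.0)
img[20:28, 8:56] = rng.integers(0, 120, size=(8, 48))  # a fake text band

grad = np.abs(np.diff(img, axis=1))     # horizontal gradient magnitude
row_profile = grad.sum(axis=1)          # cumulative gradient per row

threshold = row_profile.mean() + row_profile.std()
text_rows = np.flatnonzero(row_profile > threshold)
print(f"candidate text rows: {text_rows.min()}..{text_rows.max()}")
```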