Proceedings of the 2001 ACM Symposium on Document Engineering

Fullname: DocEng'01: Proceedings of the 1st ACM Symposium on Document Engineering
Editor: Ethan V. Munson
Location: Atlanta, Georgia, USA
Dates: 2001-Nov-09 to 2001-Nov-10
Standard No: ISBN 1-58113-432-0; hcibib: DocEng01
  1. Transformations and Experiences
  2. Hypermedia and Graphics 1
  3. Innovative Document Systems
  4. Document Databases
  5. Hypermedia and Graphics 2
  6. Document Analysis and Retrieval

Transformations and Experiences

VXT: a visual approach to XML transformations (pp. 1-10)
  Emmanuel Pietriga; Jean-Yves Vion-Dury; Vincent Quint
The domain of XML transformations is becoming more and more important as a result of the increasing number of applications adopting XML as their format for data exchange or representation. Most of the existing solutions for expressing XML transformations are textual languages, such as XSLT or DOM combined with a general-purpose programming language. Several tools build on top of these languages, providing a graphical environment. Transformations are however still specified in a textual way using the underlying language (often XSLT), thus requiring the user to learn the associated textual language.
   We believe that visual programming techniques are well-suited to representing XML structures and can make the specification of transformations simpler. We present a visual programming language for the specification of XML transformations in an interactive environment, based on a zoomable user interface toolkit. Transformations can be run from the application or exported to two target languages: XSLT and Circus, a general-purpose structure transformation language designed by the second author and briefly introduced in this paper.
Keywords: XML transformations, XSLT, circus, visual programming languages, zoomable user interfaces
Computer assisted processing of large unstructured document sets: a case study in the construction industry (pp. 11-17)
  John McKechnie; Sameh Shaaban; Stephen Lockley
Construction is one of the most information-intensive industries; professionals typically access the industry's information resources on a daily basis. The major constraints on the future development of a formally encoded knowledge base are fragmented information sources and the lack of comprehensive classification schemes. In agreement with earlier research and over twenty years of practical experience, we have found that manually categorising a large collection of documents is error-prone, time-consuming, and expensive, and produces inconsistent results. Attempts over recent years to automate this using state-of-the-art categorisation techniques have also proven wanting, owing to the shallow internal representation in the document set. In this paper we describe an approach that overcomes this problem by combining the benefits of automated categorisation with efficient and effective use of human judgement. We present a tool based on this philosophy that uses machine learning, information retrieval, and information visualisation techniques to help bibliographers analyse the document collection. By analysing the content of the unstructured documents, the tool suggests keywords, subject headings, and candidate documents to include under each subject heading. This greatly increases the speed at which bibliographers can process the documents, increases the accuracy of their work, and results in a categorisation system that reflects the terminology and relationships held in the actual knowledge base. This work is now being applied to enhance one of the market-leading retrieval products in the construction industry.
Keywords: abstracting, automated text categorisation, classification, information visualization, keyword extraction, machine learning
Towards static type checking for XSLT (pp. 18-27)
  Akihiko Tozawa
We are concerned with the static type checking problem for XSLT. In the context of XSLT and other XML programming, types are DTDs or schemas, and static type checking means verifying that a program always converts valid source documents into valid output documents. To achieve static type checking for XSLT, we introduce a subset of XSLT and an efficient algorithm for backward type inference on that subset. Although our XSLT subset lacks XPath, it includes recursive template calls and is powerful enough to capture basic XSLT transformations. Our method is based on Finite Tree Automata (FTA), which provide a rigorous representation of XML types. Given the types of output documents, backward type inference computes the types of source documents. The idea of backward type inference is borrowed from the work of Milo et al., but we reduce the computational complexity of their framework.
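The abstract rests on representing XML types as finite tree automata. As a rough illustration of that representation only, and not of Tozawa's backward type inference, here is a minimal Python sketch of a deterministic bottom-up automaton over unranked trees: each hypothetical rule checks a node's label and the string of its children's states against a regular expression. The `(label, children)` tree encoding, the rule set, and the state names are assumptions made for the example.

```python
import re
from typing import Optional

# A tree is (label, [children]); text content is ignored in this sketch.
# Hypothetical type: each rule is (label, regex over child states, resulting state).
# States are single characters, so a sequence of child states is a plain string.
RULES = [
    ("doc", r"T?P*", "D"),   # doc: optional title, then any number of paras
    ("title", r"", "T"),     # title: no element children
    ("para", r"", "P"),      # para: no element children
]
FINAL = {"D"}

def run(tree) -> Optional[str]:
    """Bottom-up run: return the state reached at the root of `tree`, or None."""
    label, children = tree
    states = []
    for child in children:
        s = run(child)
        if s is None:  # a rejected subtree makes the whole tree invalid
            return None
        states.append(s)
    word = "".join(states)
    for rule_label, pattern, state in RULES:
        if rule_label == label and re.fullmatch(pattern, word):
            return state  # deterministic: the first matching rule wins
    return None

def accepts(tree) -> bool:
    """A tree belongs to the type iff the run ends in a final state."""
    return run(tree) in FINAL
```

For example, `("doc", [("title", []), ("para", [])])` is accepted, while a `doc` whose `para` precedes its `title` is rejected because the child-state string `"PT"` does not match `T?P*`.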

Hypermedia and Graphics 1

Authoring graphics-rich and interactive documents in CGLIB: a constraint-based graphics library (pp. 28-37)
  Neng-Fa Zhou
CGLIB is a high-level graphics library for B-Prolog, a constraint logic programming system. The library provides primitives for creating and manipulating graphical objects, and a set of constraints, including non-overlap, grid, table, and tree constraints, that facilitate the specification of object layouts. The library adopts action rules, a construct available in B-Prolog, for creating agents and programming interactions among agents or between agents and the user. The library is a fully working system implemented in B-Prolog, Java, and C. It can be used in many areas, such as drawing editors, interactive user interfaces, document authoring, animation, information visualization, intelligent agents, and games. The library's high-level abstraction and its use of constraints and action rules to specify layouts and behaviors can significantly improve productivity in graphics development. We demonstrate through several examples the effectiveness of the library as a tool for developing graphics-rich and interactive user interfaces.
Keywords: Prolog, action rules, agents, constraints, graphical user interface design, graphics programming, programming languages
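In CGLIB the non-overlap constraint is stated declaratively and solved by B-Prolog's constraint solver. The sketch below only illustrates what that constraint means, in Python rather than B-Prolog: two boxes `(x, y, w, h)` are disjoint iff one lies entirely to the left of, right of, above, or below the other. The `place_in_row` helper is hypothetical and uses naive generate-and-test over a finite domain, where a real CLP(FD) solver would prune the search instead.

```python
from itertools import product

def non_overlap(a, b):
    """Boxes (x, y, w, h) are disjoint iff one lies fully left, right, above, or below the other."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax + aw <= bx or bx + bw <= ax or ay + ah <= by or by + bh <= ay

def place_in_row(widths, height, max_x):
    """Naive labeling: try x positions from 0..max_x until every pair of boxes is disjoint."""
    for xs in product(range(max_x + 1), repeat=len(widths)):
        boxes = [(x, 0, w, height) for x, w in zip(xs, widths)]
        if all(non_overlap(boxes[i], boxes[j])
               for i in range(len(boxes)) for j in range(i + 1, len(boxes))):
            return boxes
    return None  # no assignment in the domain satisfies the constraint
```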
Dynamic documents: authoring, browsing, and analysis using a high-level petri net-based hypermedia system (pp. 38-47)
  Jin-Cheon Na; Richard Furuta
caT (for Context-Aware Trellis) was initially developed to support context-aware documents by incorporating high-level Petri net specification, context awareness, user modeling, and fuzzy knowledge handling into Trellis, a Petri-net-based hypermedia system. The browsing behavior of documents specified in the caT model can reflect the reader's context (such as location and time) and preferences. Recently, to provide a framework for the authoring, browsing, and analysis of reasonably complex dynamic documents, we added or extended several features of the caT system: hierarchical Petri net support, a structured authoring tool, browsing tools for multiple presentations of a particular document's specification, and a Petri net analysis tool. In this paper, we present the extended features of caT and give examples of using caT to define and present various documents, such as formal specifications of software requirements and customized Web documents. Since caT is based on a formal model, the behavioral characteristics of caT models can be analyzed. The debugging and analysis tools currently integrated into the authoring tool are also introduced.
Keywords: caT, dynamic documents, petri-net-based hypertext, trellis
Towards the convergence between hypermedia authoring languages and architecture description languages (pp. 48-57)
  Débora Christina Muchaluat-Saade; Luiz Fernando Gomes Soares
This paper presents a detailed comparison between the structural elements and definitions provided by hypermedia authoring languages and by Architecture Description Languages (ADLs). ADLs are formal languages that can be used to represent a software architecture. Although a direct correspondence between ADL and hypermedia structural entities, such as components to nodes and connectors to links, may look trivial, interesting differences emerge on closer inspection. Based on the comparison results, a structural meta-model that can be specialized for use in both domains is proposed. The paper also presents an example of how the meta-model can be used to describe hypermedia document structures, showing how some features found in ADLs can be applied to hypermedia authoring languages. Our final goal is to integrate the contributions of document engineering and software architecture engineering so that advances in each area can benefit the other. This paper is a first step in that direction.
Keywords: ADL, architecture description languages, components, connectors, hypermedia authoring languages, structural meta-model

Innovative Document Systems

The multivalent browser: a platform for new ideas (pp. 58-67)
  Thomas A. Phelps; Robert Wilensky
The Multivalent Browser is built on an architecture that separates functionality from concrete document format. Almost all functionality is made available via relatively small modules of code called behaviors that programmers can write to extend the core system. Behaviors can be as significant and powerful as parser-renderers for scanned paper, HTML, or TeX DVI; as fine-grained as hyperlinks, cookies, and the disabling of menu items; and as innovative or uncommon as in situ annotations, "lenses", collapsible outline displays, new GUI widgets, and Robust Hyperlink support. Behaviors can be combined in arbitrary groups for each individual document, in effect spontaneously creating a custom browser for every one. Common aspects of document functionality can be shared, so that, for example, the same behavior that handles multipage support for scanned paper documents also provides such support for DVI and PDF; similarly, the behaviors that support fine-grained annotation of HTML also support identical annotation on scanned paper, UNIX manual pages, DVI, and PDF.
   We have designed and implemented this architecture, and implemented behaviors that support all of the above functionality and more. Here we describe the architecture that allows such power and fine-grained access, yet composes disparate behaviors and resolves their mutual conflicts.
Keywords: annotation, architecture, digital, document, multivalent behavior, paper, scanned
TabulaMagica: an integrated approach to manage complex tables (pp. 68-75)
  Horst Silberhorn
Tables are a special part of documents, and specific means have been developed to manage them. Step by step, the underlying models for editing and formatting tables have been improved or supplemented by new ones. These models have led to a wide variety of table formats and produced "tabular legacies", making it difficult to edit, use, or modify tables in varying formats. It is even more time-consuming to convert them for various media or to unify or compare tabular information. Our approach to these problems is to integrate different formats. To do so, we recognize the table structure, model both the structure and the presentational form, and combine them. This way, one can modify the structure, the topology, and the layout of tables simultaneously. Table manipulations may be very complex and hard for the user to understand. In addition, users are accustomed to WYSIWYG environments and want to be able to track their operations visually. We have therefore developed a WYSIWYG GUI for working with tables, which we present here, discussing its advantages, its limitations, and the work that remains to be done.
Keywords: WYSIWYG editor, separation of structure and presentation, table processing, tabular legacies
Mobile agent-based compound documents (pp. 76-84)
  Ichiro Satoh
This paper presents a mobile agent-based framework for building mobile compound documents, each of which can be dynamically composed of mobile agents and can migrate over a network as a whole, together with all its embedded agents. The key idea of this framework is a hierarchical mobile agent system that enables multiple mobile agents to be combined into a single mobile agent. The framework also provides several value-added mechanisms for visually manipulating components embedded in a compound document and for sharing a window on the screen among the components. This paper describes the framework and some experiences in implementing a prototype system, which currently uses Java as both the implementation language and the component development language, and then illustrates several interesting applications that demonstrate the framework's utility and flexibility.

Document Databases

Requirements for XML document database systems (pp. 85-94)
  Airi Salminen; Frank Wm. Tompa
The shift from SGML to XML has created new demands for managing structured documents. Many XML documents will be transient representations for the purpose of data exchange between different types of applications, but there will also be a need for effective means to manage persistent XML data as a database. In this paper we explore requirements for an XML database management system. The purpose of the paper is not to suggest a single type of system covering all necessary features. Instead the purpose is to initiate discussion of the requirements arising from document collections, to offer a context in which to evaluate current and future solutions, and to encourage the development of proper models and systems for XML database management. Our discussion addresses issues arising from data modelling, data definition, and data manipulation.
Keywords: XML, XML database systems, data definition, data manipulation, data modelling, structured documents
The extended XQL for querying and updating large XML databases (pp. 95-104)
  Raymond K. Wong
It has been argued that XQL is merely a model for requesting specific sets of elements, with very limited query capability. This paper proposes several extensions to XQL to address these limitations. The extensions include full-text indexed search, path variables, joins, session-based navigation, and updates. Effort has been made to preserve the conciseness of the language syntax. A corresponding query processor with an optimization mechanism has been prototyped and is available online. Finally, implementation issues are discussed.
Bridging XML-schema and relational databases: a system for generating and manipulating relational databases using valid XML documents (pp. 105-114)
  Iraklis Varlamis; Michalis Vazirgiannis
Many organizations and enterprises establish distributed working environments in which different users need to exchange information based on a common model. XML is widely used to facilitate this information exchange. The extensibility of XML allows the creation of generic models that integrate data from different sources. For these tasks, several applications are used to import and export information in XML format from the data repositories. To support this process for relational repositories, we developed the X-Database system. The system is based on an XML-Schema file that describes the logical model of the interchanged information. First, the system analyses the syntax of the XML-Schema file and generates the relational database. It then handles the decomposition of valid XML files according to that schema and the composition of XML documents from the information in the database. Finally, the system offers a flexible mechanism for modifying and querying database contents using only valid XML documents, which are validated against the rules of the XML-Schema file.
Keywords: XML, document storage and retrieval, mapping, metadata, querying, relational databases
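The schema-driven round trip described above can be sketched with Python's standard `xml.etree.ElementTree` and `sqlite3` modules. This is a minimal illustration under strong assumptions, not the X-Database implementation: only top-level elements whose content is a flat `xs:sequence` of simple elements become tables, every column is stored as TEXT, the `<library>` root tag and the `book` schema are hypothetical, and no validation is performed (the standard library has no XML Schema validator).

```python
import sqlite3
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: <book> records whose children are simple string elements.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType><xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author" type="xs:string"/>
    </xs:sequence></xs:complexType>
  </xs:element>
</xs:schema>"""

def tables_from_schema(xsd):
    """Map each top-level element with a sequence content model to {table: [columns]}."""
    tables = {}
    for elem in ET.fromstring(xsd).findall(f"{XS}element"):
        seq = elem.find(f"{XS}complexType/{XS}sequence")
        if seq is not None:
            tables[elem.get("name")] = [c.get("name") for c in seq.findall(f"{XS}element")]
    return tables

def create_tables(conn, tables):
    """Generate one relational table per record type, plus a surrogate key."""
    for name, cols in tables.items():
        col_defs = ", ".join(f'"{c}" TEXT' for c in cols)
        conn.execute(f'CREATE TABLE "{name}" (id INTEGER PRIMARY KEY, {col_defs})')

def decompose(conn, tables, xml_doc):
    """Shred a document: one row per record element, one column per child element."""
    root = ET.fromstring(xml_doc)
    for name, cols in tables.items():
        col_list = ", ".join(f'"{c}"' for c in cols)
        marks = ", ".join("?" for _ in cols)
        for rec in root.iter(name):
            conn.execute(f'INSERT INTO "{name}" ({col_list}) VALUES ({marks})',
                         [rec.findtext(c) for c in cols])

def compose(conn, tables, root_tag="library"):
    """Rebuild an XML document from the stored rows, in insertion order."""
    root = ET.Element(root_tag)
    for name, cols in tables.items():
        col_list = ", ".join(f'"{c}"' for c in cols)
        for row in conn.execute(f'SELECT {col_list} FROM "{name}" ORDER BY id'):
            rec = ET.SubElement(root, name)
            for col, val in zip(cols, row):
                ET.SubElement(rec, col).text = val
    return ET.tostring(root, encoding="unicode")
```

Usage: open `sqlite3.connect(":memory:")`, call `create_tables` with the result of `tables_from_schema(SCHEMA)`, then `decompose` a document and `compose` it back.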

Hypermedia and Graphics 2

An integrated environment for the presentation of consistent SMIL 2.0 documents (pp. 115-124)
  P. N. M. Sampaio; C. Lohr; J. P. Courtiat
Interactive Multimedia Documents (IMDs) have been widely studied in fields such as education and medicine, since these documents can be distributed and accessed over the World Wide Web. In this context, the W3C standard Synchronized Multimedia Integration Language (SMIL) has been proposed for presenting IMDs on the Web. However, the flexibility of the SMIL 2.0 temporal model allows authors to describe temporal synchronization relationships that may not be resolvable during the presentation of the document; these are known as temporal inconsistencies. An approach that can detect and correct such inconsistencies is therefore needed.
   This paper presents a formal approach, based on the RT-LOTOS formal description technique, for the verification, scheduling, and presentation of consistent SMIL 2.0 documents. The consistency analysis of SMIL 2.0 documents is presented, and some solutions are proposed to deal with potential state-space explosion. Further contributions concern the scheduling and presentation of SMIL 2.0 documents based on a simple scheduling graph, the Time Labeled Automata (TLA), derived automatically from the document's formal specification. Finally, a global Java-based architecture for implementing a player of consistent SMIL 2.0 documents is presented.
Keywords: LOTOS, RT-LOTOS, SMIL 2.0, formal methods, interactive multimedia documents, temporal consistency
Authoring transformations by direct manipulation for adaptable multimedia presentations (pp. 125-134)
  Lionel Villard
In this paper, we present a method for authoring generic and adaptable multimedia presentations. The method relies on document transformations. With currently available tools, designing the XML content and the transformation sheets is tedious and error-prone. We propose a framework based on an incremental transformation process. Incremental transformation processors are a better alternative for helping to design both the content and the transformation sheets. We believe that such authoring tools are a first step toward fully interactive transformation-based authoring environments. We focus here on authoring transformation sheets by direct manipulation; in particular, we study the authoring of transformations for XSLT, the language defined by the World Wide Web Consortium.
Keywords: XML, XSLT, authoring tools, document model, incremental transformations, multimedia
Vector graphics: from PostScript and Flash to SVG (pp. 135-143)
  Steve Probets; Julius Mong; David Evans; David Brailsford
The XML-based specification for Scalable Vector Graphics (SVG), sponsored by the World Wide Web Consortium, allows for compact and descriptive vector graphics for the Web.
   This paper describes a set of three tools for creating SVG, either from first principles or by converting existing formats. SVG is generated ab initio by a server-side CGI script using a Perl library of drawing functions; later sections highlight the problems of converting Adobe PostScript and Macromedia's Shockwave format (SWF) into SVG.
Keywords: Flash, PDF, PostScript, SVG, SWF

Document Analysis and Retrieval

Latent semantic linking over homogeneous repositories (pp. 144-151)
  Alessandra Alaniz Macedo; Maria da Graça Campos Pimentel; José Antonio Camacho Guerrero
We present a framework for the automatic generation of links based on salient semantic structures extracted from homogeneous web repositories, and discuss an implementation of the framework. For this study, we treat as homogeneous the repositories of eClass, an instrumented environment that automatically captures details of a lecture and provides effective multimedia-enhanced web-based interfaces for users to review the lecture, and of CoWeb, a web-based service for collaborative authoring of web-based material. We applied Latent Semantic Analysis to data indexed by a GPL-licensed search engine. We evaluated our service with data from a graduate course supported by both the eClass and CoWeb repositories. We present the results of the Latent Semantic Analysis linking service in light of the results obtained in our previous work.
Keywords: automatic linking, information integration, information retrieval, semantic structures
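Latent Semantic Analysis projects documents into a low-rank space obtained from the singular value decomposition of the term-document matrix, and documents that end up close in that space become link candidates. The sketch below is a standard-library-only Python illustration, not the authors' service: it uses raw term counts (real systems usually apply tf-idf or log-entropy weighting), obtains the top-k right singular vectors by power iteration with deflation on `A^T A` instead of a library SVD routine, and links pairs whose latent-space cosine similarity reaches a hypothetical threshold of 0.9.

```python
import math

def term_doc_matrix(docs):
    """Rows = terms, columns = documents, entries = raw term counts."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    return [[d.lower().split().count(t) for d in docs] for t in vocab]

def gram(A):
    """A^T A: the document-by-document matrix whose eigenvectors are A's right singular vectors."""
    n = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]

def top_eigenpairs(M, k, iters=500):
    """Power iteration with deflation on a symmetric positive semidefinite matrix."""
    n = len(M)
    M = [row[:] for row in M]
    pairs = []
    for _ in range(k):
        v = [1.0] * n
        for _ in range(iters):
            w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(M[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate: subtract the eigen-component just found so the next one emerges.
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs

def lsa_doc_vectors(docs, k=2):
    """Rows of V_k S_k: each document as a k-dimensional latent vector (sigma = sqrt(lambda))."""
    pairs = top_eigenpairs(gram(term_doc_matrix(docs)), k)
    return [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs] for j in range(len(docs))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def suggest_links(docs, k=2, threshold=0.9):
    """Return index pairs (i, j) whose latent vectors are similar enough to link."""
    vecs = lsa_doc_vectors(docs, k)
    return [(i, j) for i in range(len(docs)) for j in range(i + 1, len(docs))
            if cosine(vecs[i], vecs[j]) >= threshold]
```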
A technique for fuzzy document binarization (pp. 152-156)
  Nikos Papamarkos
This paper proposes a new method for the fuzzy binarization of digital documents. The proposed approach performs binarization using both the image gray levels and additional local spatial features. Both the gray-level values and the local feature values are fed to a Kohonen Self-Organized Feature Map (SOFM) neural network classifier. After training, the neurons of the SOFM's competitive output layer define two bilevel classes. From the content of these classes, fuzzy membership functions are obtained, which are then used with the Fuzzy C-means (FCM) algorithm to reduce the character-blurring problem. The method is suitable for binarizing blurred documents and can easily be modified to accommodate any type of spatial characteristics.
Keywords: binarization, fuzzy logic, self-organized neural networks, thresholding
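The final clustering stage can be illustrated with plain Fuzzy C-means on gray levels. This Python sketch is not Papamarkos's method: it omits the SOFM stage and the local spatial features, initializes the two centers evenly across the gray-level range instead of from SOFM classes, and labels each pixel 0 (ink) or 1 (background) by its membership in the darker cluster.

```python
def memberships(values, centers, m):
    """FCM membership update: u[i][k] = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))."""
    u = []
    for x in values:
        d = [abs(x - v) for v in centers]
        zeros = d.count(0.0)
        if zeros:
            # Pixel coincides with a center: give it crisp membership there.
            u.append([1.0 / zeros if dk == 0.0 else 0.0 for dk in d])
        else:
            u.append([1.0 / sum((dk / dj) ** (2 / (m - 1)) for dj in d) for dk in d])
    return u

def fuzzy_c_means_1d(values, c=2, m=2.0, iters=100, tol=1e-6):
    """Fuzzy C-means on scalar gray levels; returns (centers, memberships)."""
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly between the darkest and brightest value.
    centers = [lo + (hi - lo) * (k + 0.5) / c for k in range(c)]
    for _ in range(iters):
        u = memberships(values, centers, m)
        # Center update: mean of the values weighted by u ** m.
        new_centers = []
        for k in range(c):
            w = [row[k] ** m for row in u]
            new_centers.append(sum(wi * x for wi, x in zip(w, values)) / sum(w))
        converged = max(abs(a - b) for a, b in zip(centers, new_centers)) < tol
        centers = new_centers
        if converged:
            break
    return centers, memberships(values, centers, m)

def binarize(image, m=2.0):
    """Label each pixel 0 (ink) or 1 (background) by its membership in the darker cluster."""
    flat = [p for row in image for p in row]
    centers, u = fuzzy_c_means_1d(flat, c=2, m=m)
    dark = 0 if centers[0] < centers[1] else 1
    labels = [0 if mem[dark] >= 0.5 else 1 for mem in u]
    width = len(image[0])
    return [labels[i:i + width] for i in range(0, len(labels), width)]
```

Keeping the memberships rather than thresholding immediately is what lets a fuzzy method treat blurred pixels, whose membership sits near 0.5, differently from clear ones.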
Extraction of text areas in printed document images (pp. 157-165)
  Jean Duong; Myriam Côte; Hubert Emptoz; Ching Y. Suen
In this paper, we present a document analysis system designed to extract regions of interest from greyscale document images. The collected areas are then classified into text zones and non-text areas using geometric and texture features. The system works in two steps. First, regions of interest are retrieved using cumulative gradient considerations. Second, in the classification module, we introduce an entropy-based heuristic. Experiments on the MediaTeam Document Database are reported to show the relevance of this criterion.
Keywords: entropy, features, text extraction
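The abstract does not say which entropic heuristic is used. As one hypothetical example of an entropy-based feature, this Python sketch computes the Shannon entropy of a region's gray-level histogram: near two-tone printed text tends to give low entropy, while photographs and halftones spread over many gray levels. The 16-bin quantization and the 2-bit threshold are illustrative assumptions, not values from the paper.

```python
import math
from collections import Counter

def gray_entropy(region, bins=16):
    """Shannon entropy (bits) of a region's histogram, with 0-255 quantized into `bins` bins."""
    counts = Counter(min(p * bins // 256, bins - 1) for row in region for p in row)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def classify_region(region, threshold=2.0):
    """Hypothetical rule: few gray levels (low entropy) suggests printed text."""
    return "text" if gray_entropy(region) < threshold else "non-text"
```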