| Aggregate documents: making sense of a patchwork of topical documents | | BIBAK | Full-Text | 3-7 | |
| Michael Shilman | |||
| This working session will be an interactive discussion about multimedia
content transformation. The basic assumption is that content transformation
activities should be provided as non-destructive operations. The goal of the
panel is to gather researchers within the community who are interested in
manipulating multimedia content to provide rich user experiences. The
organizers of the panel will moderate and shape the discussion; position papers
from the participants are nevertheless expected. Keywords: content adaptation, content transformation, multimedia content, structured
multimedia | |||
| Interactive office documents: a new face for web 2.0 applications | | BIBAK | Full-Text | 8-17 | |
| John M. Boyer | |||
| As the world wide web transforms from a vehicle of information dissemination
and e-commerce transactions into a writable nexus of human collaboration, the
Web 2.0 technologies at the forefront of the transformation may be seen as
special cases of a more general shift in the conceptual application model of
the web. This paper recognizes the conceptual transition and explores the
connections to a new class of interactive office documents that become possible
by tighter integration of the Open Document Format with the W3C's next
generation web forms technology (XForms). The connections transcend simple
provisioning of office document editing and persistence capabilities on the
web. Rather, the advantages of office documents as self-contained entities that
flow through a collaborative network or business process are combined with web
application qualities such as intelligent behavioral interaction, in-process
web service access, and control of server submission content. An office
document mashup called 'Dual Forms' is presented to demonstrate the feasibility
of office document centric web applications. Keywords: ODF, SOA, XForms, XML signature, business process, office document, user
interaction, web service | |||
| Enabling adaptive time-based web applications with SMIL state | | BIBAK | Full-Text | 18-27 | |
| Jack Jansen; Dick C. A. Bulterman | |||
| In this paper we examine adaptive time-based web applications (or
presentations): presentations in which time dictates the major structure and
which also require interactivity and other forms of dynamic adaptation.
We investigate the current technologies available to create such presentations
and their shortcomings, and suggest a mechanism for addressing these
shortcomings. This mechanism, SMIL State, can be used to add user-defined state
to declarative time-based languages such as SMIL or SVG animation, thereby
enabling the author to create control flows that are difficult to realize
within the temporal containment model of the host languages. In addition, SMIL
State can be used as a bridging mechanism between languages, enabling easy
integration of external components into the web application. Keywords: SMIL, declarative languages, delayed ad viewing, multimedia web applications | |||
| An export architecture for a multimedia authoring environment | | BIBAK | Full-Text | 28-31 | |
| Jan Mikác; Cécile Roisin; Bao Le Duc | |||
| In this paper, we propose an export architecture that provides a clear
separation of multimedia authoring services from publication services. We
illustrate this architecture with the LimSee3 authoring tool and several
standard publication formats: Timesheets, SMIL, and XHTML. Keywords: SMIL, export, multimedia document, publishing format, timesheets | |||
| Adaptation of scalable multimedia documents | | BIBAK | Full-Text | 32-41 | |
| Benoît Pellan; Cyril Concolato | |||
| Several scalable media codecs have been standardized in recent years to cope
with heterogeneous usage conditions and to provide audio, video and image
content at the best possible quality. Today, interactive multimedia
presentations are becoming accessible on handheld terminals and face the same
adaptation challenges as the media elements they present: quite diversified
screen, memory and processing power capabilities. In this paper, we address the
adaptation of multimedia documents by applying the concept of scalability to
their presentation.
The Scalable MSTI document model introduced in this paper has been designed with two main requirements in mind. First, the adaptation process must be simple to execute because it may be performed on limited terminals in broadcast scenarios. Second, the adaptation process must be simple to describe so that authored adaptation directives can be transported along with the document with a limited bandwidth overhead. The Scalable MSTI model achieves both objectives by specifying Spatial, Temporal and Interactive scalability axes on which incremental authoring can be performed to create progressive presentation layers. Our experiments are conducted on scalable multimedia documents designed for Digital Radio services on DMB channels using MPEG-4 BIFS and also for web services using XHTML, SVG, SMIL and Flash. A scalable image gallery is described throughout this article and illustrates the features offered by our document model in a rich multimedia example. Keywords: document adaptation, document model, multimedia scalability | |||
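The layered adaptation idea described in this abstract can be pictured with a small sketch. The class and field names below are illustrative assumptions rather than the paper's actual MSTI data structures: each presentation layer is tagged with the scalability axis it extends, and an adapter keeps only the layers a terminal can afford.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    axis: str      # "spatial", "temporal" or "interactive"
    level: int     # 0 = base layer, higher = progressive enhancement
    content: str   # placeholder for the layer's document fragment

def adapt(layers, budget):
    """Keep, per axis, only the layers whose level fits the terminal's budget."""
    return [l for l in layers if l.level <= budget.get(l.axis, 0)]

document = [
    Layer("spatial", 0, "single-column gallery"),
    Layer("spatial", 1, "two-column gallery with captions"),
    Layer("temporal", 1, "slideshow timing"),
    Layer("interactive", 1, "next/previous buttons"),
]

# A constrained broadcast receiver: base spatial layer only, no interactivity.
print([l.content for l in adapt(document, {"spatial": 0})])
```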
| Automated repurposing of implicitly structured documents | | BIBAK | Full-Text | 42-51 | |
| Helen Balinsky; Anthony Wiley; Michael Rhodes; Alfie Abdul-Rahman | |||
| The different visual cues present in a document -- such as spatial intervals
and positions, contrast in font families, sizes and weights -- combine to form
the document's visual hierarchy. This hierarchy is essential to the reader,
allowing scanning and comprehension; in contrast, this information is often
ignored by machine processing. At the same time, the document structure is
often not available in a machine readable form due to the ways documents were
originally created or later transformed. This paper addresses the challenge of
automatic document repurposing -- applying styling and formatting from one
'implicitly' structured document to another, whilst preserving the underlying
visual hierarchy. Using visual perception analysis, the proportionality mapping
is established, according to which the original document content is transformed
into the new style without breaking the original hierarchical structure.
Spatial relationships, location and frequency analysis are then used to
fine-tune the transformation. Keywords: cap-height, document repurposing, hierarchical metrics and structure,
injective mapping, x-height | |||
| Merging changes in XML documents using reliable context fingerprints | | BIBAK | Full-Text | 52-61 | |
| Sebastian Rönnau; Christian Pauli; Uwe M. Borghoff | |||
| Different dialects of XML have emerged as ubiquitous document exchange
formats. For effective collaboration based on such documents, the capability to
propagate edit operations performed on a document is indispensable. In order to
avoid the transmission of whole documents, deltas are used to describe these
edit operations, allowing the construction of a new version of a document.
However, patching a document with a delta it was not generated for is
error-prone, and any insert or delete operations performed on the document are
likely to affect all subsequent paths within that document.
In this paper, we present a delta format for XML documents that uses context-aware fingerprints to identify edit operations. This allows our XML patch procedure to find the correct position of an edit operation, even if the document was updated in the meantime. Possible conflicts are detected. Experimental results show the reliability of the presented fingerprinting technique and prove the high quality of the resulting patched documents. Keywords: CSCW, XML diff, XML patch, fingerprint, office applications, version control | |||
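As a rough illustration of the context-fingerprint idea (the names and hashing choices below are assumptions, not the authors' delta format), an edit operation can carry a hash of the nodes surrounding its target, so that a patch tool can relocate the operation even if the document has shifted in the meantime.

```python
import hashlib

def fingerprint(nodes, index, radius=1):
    """Hash the textual context around position `index` (illustrative only)."""
    context = nodes[max(0, index - radius): index + radius + 1]
    return hashlib.sha1("|".join(context).encode("utf-8")).hexdigest()

def locate(nodes, fp, radius=1):
    """Find the position whose context still matches the stored fingerprint."""
    for i in range(len(nodes)):
        if fingerprint(nodes, i, radius) == fp:
            return i
    return None  # conflict: the fingerprinted context no longer exists

original = ["<title>", "<author>", "<year>", "<publisher>"]
fp = fingerprint(original, 2)          # fingerprint taken around <year>

updated = ["<isbn>", "<title>", "<author>", "<year>", "<publisher>"]
print(locate(updated, fp))             # -> 3: the edit is re-anchored correctly
```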
| A concise XML binding framework facilitates practical object-oriented document engineering | | BIBAK | Full-Text | 62-65 | |
| Andruid Kerne; Zachary O. Toups; Blake Dworaczyk; Madhur Khandelwal | |||
| Semantic web researchers tend to assume that XML Schema and OWL-S are the
correct means for representing the types, structure, and semantics of XML data
used for documents and interchange between programs and services. These
technologies separate information representation from implementation. The
separation may seem like a benefit, because it is platform-agnostic. The
problem is that the separation interferes with writing correct programs for
practical document engineering, because it violates a primary principle of
object-oriented programming: integration of data structures and algorithms. We
develop an XML binding framework that connects Java object declarations with
serialized XML representation. A basis of the framework is a metalanguage,
embedded in Java object and field declarations, designed to be particularly
concise, to facilitate the authoring and maintenance of programs that generate
and manipulate XML documents. The framework serves as the foundation for a
layered software architecture that includes meta-metadata descriptions for
multimedia information extraction, modeling, and visualization; Lightweight
Semantic Distributed Computing Services; interaction logging services; and a
user studies framework. Keywords: Java, XML, binding framework, metalanguage, object-oriented programming,
translation | |||
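The framework described here is Java-based; as a loose analogy only (hypothetical names, not the authors' metalanguage), the core idea of binding typed field declarations directly to serialized XML can be sketched in a few lines.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields

@dataclass
class Metadata:
    title: str
    year: int

def to_xml(obj):
    """Serialize a dataclass by walking its field declarations."""
    root = ET.Element(type(obj).__name__.lower())
    for f in fields(obj):
        ET.SubElement(root, f.name).text = str(getattr(obj, f.name))
    return ET.tostring(root, encoding="unicode")

def from_xml(cls, text):
    """Rebuild the object from XML using the same declarations."""
    root = ET.fromstring(text)
    kwargs = {f.name: f.type(root.findtext(f.name)) for f in fields(cls)}
    return cls(**kwargs)

xml = to_xml(Metadata("Document Engineering", 2008))
print(xml)
print(from_xml(Metadata, xml))
```

The point of the analogy is the tight coupling the paper argues for: one declaration drives both the in-memory type and its XML representation.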
| Malan: a mapping language for the data manipulation | | BIBAK | Full-Text | 66-75 | |
| Arnaud Blouin; Olivier Beaudoux; Stéphane Loiseau | |||
| Malan is a MApping LANguage that allows the generation of transformation
programs by specifying a schema mapping between a source and target data
schema. By working at the schema level, Malan remains independent of any
transformation process; it also naturally guarantees the correctness of the
transformation target relative to its schema. Moreover, by expressing schemas
as UML class diagrams, Malan schema mappings can be written on top of UML
modellers. This paper describes the overall approach by focusing on the Malan
language itself, and its use within a transformation process. Keywords: UML, data manipulation, malan, mapping, schema transformation, schema
translation | |||
| Configurable editing of XML-based variable-data documents | | BIBAK | Full-Text | 76-85 | |
| John Lumley; Roger Gimson; Owen Rees | |||
| Variable data documents can be considered as functions of their bindings to
values, and this function can be arbitrarily complex in order to build
strongly-customised but high-value documents. We outline an approach for
editing such documents from example instances, which is highly configurable in
terms of controlling exactly what is editable and how, can be used with a wide
variety of XML-based document formats and processing pipelines provided certain
reasonable properties are supported, and can generate appropriate editors
automatically, including web-service deployment. Keywords: SVG, XSLT, document construction, document editing, functional programming | |||
| Tracking sub-page components in document workflows | | BIBAK | Full-Text | 86-89 | |
| James A. Ollis; Steven R. Bagley; David F. Brailsford | |||
| Documents go through numerous transformations and intermediate formats as
they are processed, in a workflow, from abstract markup into final printable
form. Unfortunately, it is common to find that ideas about document components,
which might exist in the source code for the document, become completely lost
within an amorphous, unstructured page of PDF prior to being rendered. Given
the importance of a component-based approach in Variable Data Printing (VDP) we
have developed a collection of tools that allow information about the various
transformations to be embedded at each stage in the workflow, together with a
visualization tool that uses this embedded information to display the
relationships between the various intermediate documents.
We demonstrate these tools in the context of an example workflow using DocBook markup but the techniques described are widely applicable and would be easily adaptable to other workflows and for use in teaching tools to illustrate document component and VDP concepts. Keywords: COGs, DocBook, PDF, VDP, XSL-FO, XSLT, document components, document
workflows, education | |||
| Higher-level layout through topological abstraction | | BIBAK | Full-Text | 90-99 | |
| Angelo Di Iorio; Luca Furini; Fabio Vitali; John Lumley; Tony Wiley | |||
| Existing layout languages provide support for geometric properties allowing
-- and in a sense forcing -- users to give a complete geometric description of
the desired output: if the characteristics of the output medium change, the
layout of the whole document has to be reworked completely, as the properties
set by the user are no longer appropriate for the modified context.
In this paper we propose a different paradigm which allows users to produce layouts by describing their topological and abstract properties, rather than geometric ones. We first define and detail topological properties as abstract relationships between the document components, independent from the output characteristics, and then describe an XML-based layout language based on these concepts, called TALL. A running engine able to transform topological layouts into actual PDF files, based on XSLT and the DDF framework, is presented as well. Keywords: DDF, TALL, XSLT, automatic layouts, topological layouts | |||
| An office document mashup for document-centric business processes | | BIBAK | Full-Text | 100-101 | |
| John M. Boyer; Eric Dunn; Maureen Kraft; Jun S. H. Liu; Mihir R. Shah; He Feng Su; Saurabh Tiwari | |||
| An office document mashup called 'Dual Forms' is presented to demonstrate
the feasibility and advantages of imbuing an office document with intelligent
interaction capabilities, access to web services of a service-oriented
architecture (SOA), digital signatures for legally binding contractual
agreements, and a self-submission capability that allows the document to flow
through a collaborative network or business process. Keywords: ODF, SOA, XForms, XML signature, business process, office document, user
interaction, web service | |||
| Image collection taxonomies for photo-book auto-population with intuitive interaction | | BIBAK | Full-Text | 102-103 | |
| Pere Obrador; Nathan Moroney; Ian MacDowell; Eamonn O'Brien-Strain | |||
| We demonstrate a system for automatic image selection for photobook
creation, along with an intuitive user interface for fine tuning of the
selection results. A versatile image collection representation is introduced,
which allows for automatic scalable selection in order to target a specific
image count for a predetermined size photobook. The images are selected based
on their relevance, while preserving a good coverage of the event (time plus
people) in order to maintain the storytelling potential of the selection. The
selected images are laid out and presented to the user through an Adobe Flex
user interface, which allows them to select images and swap them for
semantically related ones in an intuitive manner. The final result is output
to a PDF file. Keywords: automatic photo selection, hierarchy, image appeal, image collection,
near-duplicate detection, scalability, time clustering | |||
| A prototype documenter system for medical grand rounds | | BIBAK | Full-Text | 104-105 | |
| Renato de Freitas Bulcão-Neto; José Antonio Camacho-Guerrero; Alessandra Alaniz Macedo | |||
| This paper demonstrates our ongoing work on a documenter system for
medical grand rounds. The system captures and synchronizes the set of material
presented and corresponding physicians' interactions, automatically relates
clinical cases of patients, and then generates web-accessible documents with
all information captured. The resulting documentation can be used for several
purposes such as teaching, research and presurgical decision making. Keywords: documentation, extension, pervasive healthcare | |||
| A content-based approach for document representation and retrieval | | BIBAK | Full-Text | 106-109 | |
| Antonio M. Rinaldi | |||
| In the last few years, the problem of defining efficient techniques for
knowledge representation has become a challenging topic in both the academic
and industrial communities. The large amount of available data creates several
problems in terms of information overload. In this framework, we assume that
new approaches for knowledge definition and representation may be useful, in
particular the ones based on the concept of ontology. In this paper we propose
a suitable model for knowledge representation purposes using linguistic
concepts and properties. We implement our model in a system which, using novel
techniques and metrics, analyzes documents from a semantic point of view using
as context of interest the Web. Experiments are performed on a test set built
using a directory service to obtain information about the analyzed documents.
The results, compared with those of similar systems, show an effective
improvement. Keywords: WordNet, ontologies, semantic relatedness metrics | |||
| No mining, no meaning: relating documents across repositories with ontology-driven information extraction | | BIBAK | Full-Text | 110-118 | |
| Víctor Codocedo; Hernán Astudillo | |||
| Far from eliminating documents as some expected, the Internet has led to a
proliferation of digital documents, without centralized control or indexing.
Thus, identifying relevant documents becomes simultaneously more important and
much harder, since what users require may be dispersed across many documents
and many repositories. This paper describes Ontologic Anchoring, a technique to
relate documents to domain ontologies, using named entity recognition (a
natural-language processing approach) and semantic annotation to relate
individual documents to elements in ontologies. This approach allows document
retrieval using domain-level inferences, and integration of repositories with
heterogeneous media, languages and structure. Ontological anchoring is a
two-way street: ontologies allow semantic indexing of documents, and
simultaneously new documents enrich ontologies. The approach is illustrated
with an initial deployment for heritage documents in Spanish. Keywords: NLP, human-in-the-loop, information extraction, metadata creation,
ontological anchoring, ontology | |||
| Document logs: a distributed approach to metadata for better security and flexibility | | BIBAK | Full-Text | 119-122 | |
| Michael Gormish; Greg Wolff; Kurt Piersol; Peter Hart | |||
| A document log is an ordered list of entries providing a history for any
sort of media or file, just as a logfile provides a history of a computer
program and a logbook provides a history of a journey. The history of a
document may consist of copyright information, approvals, annotations, or any
sort of metadata. This paper describes a metadata architecture using Content
Based Identifiers and Document Logs that facilitates location of metadata from
distributed sources, caching, ordering of log entries, and detection of changes
in metadata or documents. The techniques used complement existing metadata
format standards and are contrasted with storage of metadata in a file or
document management system. Keywords: hash chain, time-stamp, uuid | |||
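A minimal sketch of the hash-chain idea behind such logs (field names are illustrative, not the paper's format): each entry carries the hash of its predecessor, so reordering or tampering with the history is detectable, and a content-based identifier ties log entries to the document bytes they describe.

```python
import hashlib, json

def cbid(document_bytes):
    """Content Based Identifier: a hash of the document itself."""
    return hashlib.sha256(document_bytes).hexdigest()

def append_entry(log, metadata):
    prev = log[-1]["hash"] if log else ""
    body = json.dumps({"prev": prev, "metadata": metadata}, sort_keys=True)
    log.append({"prev": prev, "metadata": metadata,
                "hash": hashlib.sha256(body.encode()).hexdigest()})

def verify(log):
    for i, entry in enumerate(log):
        prev = log[i - 1]["hash"] if i else ""
        body = json.dumps({"prev": prev, "metadata": entry["metadata"]},
                          sort_keys=True)
        if entry["prev"] != prev or \
           entry["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
    return True

log = []
append_entry(log, {"doc": cbid(b"report v1"), "event": "created"})
append_entry(log, {"doc": cbid(b"report v1"), "event": "approved by legal"})
print(verify(log))   # True; any edit or reordering of entries breaks the chain
```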
| The CONCUR framework for community maintenance of curated resources | | BIBAK | Full-Text | 123-126 | |
| Patrick Schmitz | |||
| The increasing use of computational linguistics for semantic search and
discovery tools requires much work on development and maintenance of associated
ontologies. Related applications depend upon curated resources like
dictionaries, gazetteers, etc. In order to scale these application models and
leverage the respective communities of interest, a new set of tools is needed
that facilitates community development and extension of these resources while
retaining the curatorial model to ensure a reliable, high quality resource. We
describe the requirements and principles for such a system, and present the
CONCUR framework that addresses these needs. CONCUR defines a reputation model
and a set of reusable infrastructure services to maintain the resource. The
reputation model combines correctness as well as utility of participants'
contributions, tracked over time and by sub-domain within the resource. We
describe the architectural issues of the model, potential applications, and
continuing research on the model. Keywords: SOA, community, curation, ontology, structured information | |||
| Online ancient documents: Armarius | | BIBAK | Full-Text | 127-130 | |
| Reim Doumat; Elöd Egyed-Zsigmond; Jean-Marie Pinon; Emese Csiszar | |||
| Many museums and libraries digitize their collections of historical
manuscripts to preserve the historic documents and to facilitate their
browsing. The collections are available as digital images and they need
annotation to be accessible and exploitable. The annotations can be created
manually, automatically or semi-automatically. Manual annotation is expensive
and tedious; hence the reuse of users' experiences, by tracing their actions
during the annotation process, helps other users to accomplish repetitive tasks
in a semi-automatic manner. In this article we present a digital archive model
and prototype of a collaborative system for the management of online ancient
manuscripts. The application offers an online annotation service, an assistant
for semi-automatic annotation, and a tracing system that saves traces of
important actions in order to reuse them in a recommender system afterward. Keywords: document categorization and classification, integrating documents with other
digital artifacts, system | |||
| Satisficing scrolls: a shortcut to satisfactory layout | | BIBAK | Full-Text | 131-140 | |
| Nathan Hurst; Kim Marriott | |||
| We present a new approach to finding aesthetically pleasing page layouts.
We do not aim to find an optimal layout, rather the aim is to find a layout
which is not obviously wrong. We consider vertical scroll-like layout with
floating figures referenced within the text where floats can have alternate
sizes, may be optional, move from one side to the other and change their order.
We also allow pagination. Our approach is to use a randomised local search
algorithm to explore different configurations of floats, i.e. choice of floats
and relative ordering. For a particular float configuration we use an efficient
gradient projection-like continuous optimization algorithm. The resulting
system is fast and provides an efficient warm start option to improve
interactive support. Keywords: floating figure, multi-column layout, optimisation techniques | |||
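A toy version of the "satisficing" search loop described above (the scoring function and moves are placeholders, not the authors' algorithm): random local moves over float configurations are accepted whenever they do not make the layout worse, and the search stops as soon as the layout is good enough rather than optimal.

```python
import random

def badness(config):
    """Placeholder cost: distance of each float from its text reference."""
    return sum(abs(pos - ref) for pos, ref in config)

def satisfice(config, threshold=2, iterations=1000):
    best = list(config)
    for _ in range(iterations):
        if badness(best) <= threshold:          # good enough: stop early
            break
        trial = list(best)
        i = random.randrange(len(trial))
        pos, ref = trial[i]
        trial[i] = (pos + random.choice([-1, 1]), ref)   # local move
        if badness(trial) <= badness(best):
            best = trial
    return best

floats = [(10, 3), (5, 5), (0, 8)]   # (current position, referencing paragraph)
print(satisfice(floats))
```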
| Two algorithms for automatic document page layout | | BIBAK | Full-Text | 141-149 | |
| João Batista S. de Oliveira | |||
| This paper describes two approaches to the problem of automatically placing
document items on pages of some output device. Both solutions partition the
page into regions where each item is to be placed, but work on different input
data according to the application: One approach assumes that previously defined
rectangular items are to be placed freely on the page (as in a sales brochure),
whereas the second approach places free-form items on pages divided into
columns (as in a newspaper). Moreover, both approaches try to preserve the
reading order provided by the input and use all available area on the page. The
algorithms implementing those approaches and based on recursive page division
are presented, as well as test results, possible changes and research
directions. Keywords: automatic page layout, packing, placement algorithms | |||
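The recursive page-division idea can be sketched as follows (a simplification under assumed inputs, not the paper's algorithms): the page is split in two, the items are divided between the halves in reading order, and each half is laid out recursively.

```python
def layout(items, x, y, w, h, vertical=True):
    """Recursively partition an (x, y, w, h) region among items in reading order."""
    if len(items) == 1:
        return [(items[0], (x, y, w, h))]
    mid = len(items) // 2              # reading order preserved: first half, then second
    if vertical:                       # split left/right
        return (layout(items[:mid], x, y, w / 2, h, False) +
                layout(items[mid:], x + w / 2, y, w / 2, h, False))
    else:                              # split top/bottom
        return (layout(items[:mid], x, y, w, h / 2, True) +
                layout(items[mid:], x, y + h / 2, w, h / 2, True))

for item, region in layout(["A", "B", "C", "D", "E"], 0, 0, 210, 297):
    print(item, region)
```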
| PDF document restoration and optimization during image enhancement | | BIBAK | Full-Text | 150-153 | |
| Hui Chao; Carl Staelin; Sagi Schein; Marie Vans; John Lumley | |||
| We present a document processing method that addresses some of the practical
challenges in image enhancement for digital photo albums in PDF documents. With
the advent of digital offset presses, consumer photo books are becoming
increasingly popular, and most such workflows convert the consumer's photos and
layout into PDF documents. In order to produce appealing photo albums from
consumer photographs, some form of automatic enhancement is usually required,
and this enhancement is often done late in the workflow just before printing,
and therefore it is done on the PDF file. If each and every PDF generation tool
simply inserted a single complete image each time an image appeared in the
document, then the process of opening a PDF document, iterating through the
document, extracting, enhancing, and replacing images, and then saving the
enhanced document would be relatively easy. Unfortunately, PDF generation tools
often violate that assumption in two ways. Firstly, large images are often
written as a set of small images in strips or tiles, which visually appear to
be a single image. Secondly, an image in a PDF document may be reused in the
document at different positions and on different pages; directly enhancing
images without consideration of this reuse model could result in a great increase in the document
size and poor system performance. Therefore, image reconstruction and document
optimization were performed in our PDF photo album enhancement solution. Keywords: PDF optimization, document enhancement, image stitching | |||
| Authoring adaptive diagrams | | BIBAK | Full-Text | 154-163 | |
| Cameron McCormack; Kim Marriott; Bernd Meyer | |||
| The web and digital media require intelligent, adaptive documents whose
appearance and content adapts to the viewing context and which support user
interaction. While previous research has focussed on textual and multimedia
content, this is also true for diagrammatic content. We have designed and
implemented an authoring tool which supports the construction of adaptive
diagrams. Adaptive layout behaviour is specified by using constraint-based
placement tools as well as by allowing the author to specify more radical
layout changes using alternate layout configurations. As well as specifying
alternate layouts, the author can specify alternate representations for an
object, alternate styles and alternate textual content. The resulting space of
different versions of the diagram is the cross product of these different
alternatives. At display time the version is constructed dynamically, taking
into account the author specified preference order on the alternatives, current
viewing environment, and user interaction. Keywords: adaptive layout, authoring, diagrams | |||
| Towards extending and using SPARQL for modular document generation | | BIBAK | Full-Text | 164-172 | |
| Faisal Alkhateeb; Sébastien Laborie | |||
| RDF is one of the most used languages for resource description and SPARQL
has become its standard query language. Nonetheless, SPARQL is of limited use
for automatically generating documents from RDF repositories, as it can only
construct RDF documents. In this paper we propose an extension to SPARQL that
allows any kind of XML document to be generated from multiple RDF data sources
and a given XML template. Thanks to this extension, an XML template can itself
contain SPARQL queries that import template instances. Such an approach allows
templates to be reused, related information to be divided across templates, and
templates that mix languages to be avoided. Moreover, reasoning capabilities
can be exploited using RDF Schema (RDFS). Keywords: RDF, SPARQL, XML document generation, semantic web, template | |||
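Standard SPARQL tooling already hints at the gap this paper addresses: with rdflib, for instance, one can run a SELECT query and splice the results into XML by hand, which is roughly what the proposed extension automates and generalizes (the glue code below is a made-up stand-in, not the authors' template notation).

```python
from rdflib import Graph
from xml.sax.saxutils import escape

g = Graph()
g.parse(data="""
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
[] foaf:name "Alice" . [] foaf:name "Bob" .
""", format="turtle")

rows = g.query("""
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name WHERE { ?p foaf:name ?name } ORDER BY ?name
""")

# Hand-rolled template instantiation; the paper's extension embeds the query
# in the XML template itself instead of requiring this glue code.
items = "".join("<author>%s</author>" % escape(str(r.name)) for r in rows)
print("<authors>%s</authors>" % items)
```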
| Fast identification of visual documents using local descriptors | | BIBAK | Full-Text | 173-176 | |
| Eduardo Valle; Matthieu Cord; Sylvie Philipp-Foliguet | |||
| In this paper we introduce a system for the identification of visual
documents. Since it stems from content-based document indexing and retrieval,
our system does not need to rely on textual annotations, watermarks or other
metadata, which can be missing or incorrect. Our retrieval system is based on
local descriptors, which have been shown to provide accurate and robust
description. Because of the high computational costs associated with the matching
of local descriptors, we propose Projection KD-Forest: an indexing technique
which allows efficient approximate k nearest neighbors search. Experiments
demonstrate that the Projection KD-Forest allows the system to provide prompt
results with negligible loss in accuracy. The Projection KD-Forest also
compares favorably with other strategies for k nearest neighbors
search. Keywords: copy detection, document identification, image retrieval, k nearest
neighbors search, local descriptors, multidimensional indexing | |||
| Improving query performance on XML documents: a workload-driven design approach | | BIBAK | Full-Text | 177-186 | |
| Rebeca Schroeder; Ronaldo dos Santos Mello | |||
| As XML has emerged as a data representation format and as great quantities
of data have been stored in the XML format, XML document design has become an
important and evident issue in several application contexts. Methodologies
based on conceptual modeling are widely applied for designing XML
documents. However, the conversion of a conceptual schema to an XML schema is a
complex process. In many cases, conceptual relationships cannot be represented
in a hierarchy and therefore have to be represented by reference relationships
in the XML schema. The problem is that reference relationships generate a
disconnected XML structure and, consequently, produce an overhead cost for
query processing on XML documents.
This paper presents a design approach for generating XML schemas from conceptual schemas considering the expected workload of the XML applications. Query workload is used to produce XML schemas which minimize the impact of the reference relationships on query performance. We evaluate our approach through a case study where a set of XML documents are redesigned by our methodology. The results demonstrate that query performance is improved in terms of the number of accesses generated by the queries on the XML documents designed by our approach. Keywords: XML schemas, conceptual schemas, query performance | |||
| Similarity of XML schema definitions | | BIBAK | Full-Text | 187-190 | |
| Irena Mlýnková | |||
| In this paper we propose a technique for evaluating similarity of XML Schema
fragments. Firstly, we define classes of structurally and semantically
equivalent XSD constructs. Then we propose a similarity measure that is based
on the idea of edit distance applied to XSD constructs and enables one to
incorporate various additional similarity aspects. In particular, we exploit the
equivalence classes and semantic similarity of element/attribute names. Using
experiments we show the behavior and advantages of the proposal. Keywords: XML schema, equivalence of XSD constructs, similarity | |||
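A bare-bones sketch of edit distance over XSD constructs (the equivalence classes and costs below are invented for illustration and are much cruder than the measure proposed in the paper): structurally equivalent constructs are first mapped to a common class, then a standard edit distance is computed over the resulting sequences.

```python
# Map syntactically different but structurally equivalent XSD constructs
# to a shared class (illustrative classes only).
EQUIV = {"xs:sequence": "ordered", "xs:all": "unordered", "xs:choice": "choice",
         "xs:element": "element", "xs:attribute": "attribute"}

def edit_distance(a, b):
    a, b = [EQUIV.get(x, x) for x in a], [EQUIV.get(x, x) for x in b]
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(a)][len(b)]

frag1 = ["xs:element", "xs:sequence", "xs:element", "xs:attribute"]
frag2 = ["xs:element", "xs:sequence", "xs:element"]
print(edit_distance(frag1, frag2))   # 1: one construct inserted/deleted
```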
| Matching XML documents in highly dynamic applications | | BIBAK | Full-Text | 191-198 | |
| Adrovane M. Kade; Carlos A. Heuser | |||
| Highly dynamic applications like the Web and peer-to-peer systems require a
great deal of effort in document management. Documents from different sources
may contain parts that, although having different structure or different
contents, may be considered as representing the same conceptual information.
One essential task in this scenario is the identification of complementary or
overlapping documents that need to be integrated. In this paper, we deal
specifically with documents represented in the XML format. XML document
integration is an important process in highly dynamic applications, for the
volume of data available in this format is constantly growing. XML integration
is also a challenging task, due to the flexible nature of XML, which may lead
to structure divergences and content conflicts between the documents. In this
work, we present a novel approach to the matching problem, i.e., the problem of
defining which parts of two documents contain the same information. Matching is
usually the first step of an integration process. Our approach is novel in the
sense that it combines similarity information from the content of the elements with
information from the structure of the documents. This feature, as our
experiments confirm, makes our approach capable of dealing with content as well
as structural divergences. Keywords: XML, document management, matching, similarity measure | |||
| Automatic keyphrase extraction from scientific documents using N-gram filtration technique | | BIBAK | Full-Text | 199-208 | |
| Niraj Kumar; Kannan Srinathan | |||
| In this paper we present an automatic keyphrase extraction technique for
English documents of the scientific domain. The devised algorithm uses an
n-gram filtration technique, which filters sophisticated n-grams (1 ≤ n ≤ 4)
along with their weights from the words of the input document. To develop the
n-gram filtration technique, we use (1) an LZ78 data compression based
technique, (2) a simple refinement step, (3) a simple pattern filtration
algorithm and (4) a term weighting scheme. In the term weighting scheme, we
introduce the importance of the position of the sentence in which a given
phrase first occurs, and of the position of the phrase within that sentence,
for documents of the scientific domain (which are generally more organized than
those of other domains). The entire system is based upon statistical
observations, simple grammatical facts, heuristics, and lexical information of
the English language. We remark that the devised system does not require a
learning phase. Our experimental results on a publicly available text dataset
show that the devised system is comparable with other known algorithms. Keywords: information extraction, information retrieval, keyphrase extraction,
scientific domain | |||
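The position-aware weighting idea can be sketched as follows (the exact weights and the LZ78-based filtration step are not reproduced here; the scoring formula below is an illustrative assumption): candidate n-grams are scored by frequency, boosted when they first appear early in the document and early in their sentence.

```python
import re
from collections import Counter

def candidates(text, max_n=4):
    """Collect word n-grams (1 <= n <= 4) per sentence, with their positions."""
    out = []
    for s_idx, sentence in enumerate(re.split(r"[.!?]\s*", text.lower())):
        words = re.findall(r"[a-z]+", sentence)
        for i in range(len(words)):
            for n in range(1, max_n + 1):
                if i + n <= len(words):
                    out.append((" ".join(words[i:i + n]), s_idx, i))
    return out

def score(text):
    cands = candidates(text)
    freq = Counter(p for p, _, _ in cands)
    first = {}
    for phrase, s_idx, w_idx in cands:
        first.setdefault(phrase, (s_idx, w_idx))
    # Illustrative weighting: frequency damped by first-occurrence positions.
    return {p: freq[p] / (1 + first[p][0]) / (1 + first[p][1]) for p in freq}

text = "Keyphrase extraction locates key phrases. Keyphrase extraction needs no training."
print(sorted(score(text).items(), key=lambda kv: -kv[1])[:3])
```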
| Semantic impact graphs for information valuation | | BIBAK | Full-Text | 209-212 | |
| Sinan al-Saffar; Gregory L. Heileman | |||
| Information valuation has typically been carried out implicitly in
question-answering and document retrieval systems. We argue that explicit
information valuation is needed to move away from the system and
process-centric nature of implicit valuation which has also hindered the
theoretical study of information value under a unified and explicit framework.
In this paper we present a graph-based model for explicit information
valuation. Our model caters to the subjective nature of information quality by
measuring the impact a candidate piece of information may have on a knowledge
base representing the recipient's world view. Our model is capable of
evaluating information semantically at the statement level and is in effect
basing information-valuation on information-understanding. However, information
value can be computed and predicted using our causal graph model without
requiring full logical inference typically needed for
information-understanding. Keywords: document ranking, information retrieval, information valuation, semantic web
search | |||
| Identifying and expanding titles in web texts | | BIBAK | Full-Text | 213-216 | |
| Clémentine Adam; Estelle Delpech; Patrick Saint-Dizier | |||
| In this paper, we present an analysis based on linguistic and typographic
features that allows for the identification of titles in web documents. We
focus in particular on procedural texts. Identifying titles is a difficult task
because ways of encoding them are very diverse. A number of titles are also
incomplete because of context; we therefore propose a way to retrieve the
missing elements, in particular predicates, so that titles are fully
intelligible. Keywords: structure analysis, text semantics, text titles | |||
| A demonstration of a configurable editing framework | | BIBAK | Full-Text | 217-218 | |
| John Lumley; Roger Gimson; Owen Rees | |||
| XML-based variable data documents are special cases of XML documents
subjected to processing before final visualisation. We demonstrate how such
'templates' can be edited from specific instances in a generalised manner and
that this can be supported by a highly extensible and configurable editing
framework. The demonstration covers simple authoring actions, higher-level
authoring control (altering the editability within a document), reconfiguring
the overall editor capability, using alternative 'views' of documents and
exploiting the framework to modify generalised XML 'files', including some of
those that define the editor itself. Keywords: SVG, XSLT, document construction, document editing, functional programming | |||
| Playback of mixed multimedia document | | BIBAK | Full-Text | 219-220 | |
| Cyril Concolato; Jean Le Feuvre | |||
| Many multimedia languages exist today to describe animated, interactive, 2D
or 3D graphics and media elements, and each language has its merits. We studied
the problems underlying the integration of all these languages in a single
player. We present here the result of this work, and in particular, we
demonstrate the mixed playback of SVG, BIFS, LASeR, Flash or VRML/X3D content. Keywords: BIFS, SVG, VRML, mixed documents, multimedia player | |||
| Scalable multimedia documents for digital radio | | BIBAK | Full-Text | 221-222 | |
| Benoit Pellan; Cyril Concolato | |||
| In this paper, we demonstrate the adaptation of multimedia digital radio
services in broadcast environments based on scalable multimedia documents. The
authoring of our multimedia services relies on the Scalable MSTI model that
decomposes multimedia documents into three ordered dimensions: Spatial,
Temporal and Interactive descriptions. Our demonstration shows Scalable MSTI
multimedia documents that can be adapted to typical T-DMB digital radio usage
scenarios. Keywords: DMB digital radio, digital radio, document adaptation, multimedia radio
services, multimedia scalability | |||
| An exploratory mapping strategy for web-driven magazines | | BIBAK | Full-Text | 223-229 | |
| Fabio Giannetti | |||
| "There will always (I hope) be print books, but just as the advent of
photography changed the role of painting or film changed the role of theater in
our culture, electronic publishing is changing the world of print media. To
look for a one-to-one transposition to the new medium is to miss the future
until it has passed you by." -- Tim O'Reilly [1].
It is not hard to envisage that publishers will leverage subscribers' information, interest groups' shared knowledge and other sources to enhance their publications. While this enhances the value of the publication through more accurate and personalized content, it also brings a new set of challenges to the publisher. Content is now driven by the web, and in a truly automated system no designer "re-touch" intervention can be envisaged. The paper introduces an exploratory mapping strategy to allocate web driven content in a highly graphical publication like a traditional magazine. Two major aspects of the mapping are covered, which enable different levels of flexibility and address different content flowing strategies. The last contribution is an evaluation of existing standards, which potentially can leverage this work to incorporate more flexible mapping, and subsequently, composition capabilities. Keywords: SVG, XML, XPS, XSL-FO, content driven pagination, layout, print, template,
transactional printing, variable data print | |||
| PrintMonkey: giving users a grip on printing the web | | BIBAK | Full-Text | 230-239 | |
| Jennifer Baldwin; James A. Rowson; Yvonne Coady | |||
| Web content is notoriously difficult to capture on a printed page due to
inconsistent and undesired results. Items that users may not want to print,
such as media, navigation menus and more, show up on the printed page. Other items
that they may care about are truncated or spread across several pages. Some
tools exist to help users with what is printed, but they often are cumbersome
to use or are costly for a company to maintain. Therefore, we introduce
PrintMonkey, which allows users to write their own printing templates and share
them with others on the web. No modifications to the original webpages are
required and users with less development experience can use and develop
templates. A comparison with four alternative solutions reveals the concrete
ways in which PrintMonkey improves upon existing approaches in terms of
functionality, customizability and scalability. Keywords: JavaScript, customized browsing, print templates, printing the web, screen
scraping | |||
| Towards Brazilian Portuguese automatic text simplification systems | | BIBAK | Full-Text | 240-248 | |
| Sandra M. Aluísio; Lucia Specia; Thiago A. S. Pardo; Erick G. Maziero; Renata P. M. Fortes | |||
| In this paper we investigate the main linguistic phenomena that can make
texts complex and how they could be simplified. We focus on a corpus analysis
of simple account texts available on the web for Brazilian Portuguese and
propose simplification strategies for this language. This study illustrates the
need for text simplification to facilitate accessibility to information by poor
literacy readers and potentially by people with other cognitive disabilities.
It also highlights characteristics of simplification for Portuguese, which may
differ from other languages. This study constitutes the first step towards
building Brazilian Portuguese text simplification systems. One of the scenarios
in which these systems could be used is that of reading electronic texts
produced, e.g., by the Brazilian government or by relevant news agencies. Keywords: Brazilian Portuguese, corpus analysis, natural language processing, poor
literacy readers, text simplification | |||
| Constructing a know-how repository of advices and warnings from procedural texts | | BIBAK | Full-Text | 249-252 | |
| Lionel Fontan; Patrick Saint-Dizier | |||
| In this paper, we show how a domain dependent know-how textual database of
advices and warnings can be constructed from procedural texts. We show how
arguments of type warnings and advices can be annotated and extracted from
procedural texts, and propose a format and a strategy to automatically generate
a know-how textual database. Keywords: automatically generated document, structure and content analysis, text
semantics | |||
| Summarizing and referring: towards cohesive extracts | | BIBAK | Full-Text | 253-256 | |
| Patricia Nunes Gonçalves; Lucia Rino; Renata Vieira | |||
| In this paper we propose and evaluate a system for summary post-edition,
which aims at replacing referential expressions, trying to avoid referential
cohesion problems. To propose expressions that best represent the evoked
entity, the system uses knowledge about coreference chains. We evaluate the
system both with knowledge provided by manual and automatic annotation of
coreference chains. Keywords: automatic summarization, coreference chains, referential cohesion | |||
| Keeping a digital library clean: new solutions to old problems | | BIBAK | Full-Text | 257-262 | |
| Alberto H. F. Laender; Marcos André Gonçalves; Ricardo G. Cota; Anderson A. Ferreira; Rodrygo L. T. Santos; Allan J. C. Silva | |||
| Digital Libraries are complex information systems that involve rich sets of
digital objects and their respective metadata, along with multiple
organizational structures and services (e.g., searching, browsing, and
personalization), and are normally built having a target community of users
with specific interests. Central to the success of this type of system is the
quality of their services and content. In the context of DLs of scientific
literature, among the many problems faced to sustain their information quality,
two specific ones, related to information consistency, have attracted a lot of
attention from the research community: name disambiguation and lack of
information to access the full-text of cataloged documents. In this paper, we
examine these two problems and describe the solutions we have proposed to solve
them. Keywords: citation management, digital libraries, full-text management, information
quality, name disambiguation | |||
| An optical character recognition approach to qualifying thresholding algorithms | | BIBAK | Full-Text | 263-266 | |
| Margaret Sturgill; Steven J. Simske | |||
| Pre-processing for raster image based document segmentation begins with
image thresholding, which is a binarization process separating foreground from
background. In this paper, we compare an existing (Otsu), modified existing
(Kittler-Illingworth) and simple peak-based thresholding approach on a set of
982 documents for which existing ground truth (full text) is available. We use
the output of an open source OCR engine which incorporates an adaptive/dynamic
thresholder that can be bypassed by one of the three global thresholds we
tested. This allowed comparison of these three approaches in the aggregate. We
then used an independently-generated dictionary as a means of characterizing
thresholder efficacy. Such an approach, if successful, will provide the means
for selecting an optimal thresholder in the absence of a large set of ground
truthed documents. Our preliminary findings here indicate that this approach
may provide a reliable means for thresholder comparison and eventually preclude
the need for time-intensive human ground truthing. Keywords: Kittler-Illingworth, OCR, accuracy, meta-algorithms, Otsu, testing,
threshold | |||
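To make the evaluation pipeline concrete, here is a minimal sketch using Otsu's method on a toy histogram and a dictionary hit-rate as the efficacy proxy; the OCR step itself is stubbed out, and none of this is the authors' code.

```python
def otsu_threshold(pixels):
    """Classic Otsu: maximize between-class variance over the 0..255 histogram."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total, best_t, best_var = len(pixels), 0, -1.0
    sum_all = sum(i * hist[i] for i in range(256))
    w_b = sum_b = 0.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0 or w_b == total:
            continue
        sum_b += t * hist[t]
        m_b = sum_b / w_b
        m_f = (sum_all - sum_b) / (total - w_b)
        var = w_b * (total - w_b) * (m_b - m_f) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def dictionary_hit_rate(ocr_words, dictionary):
    """Proxy for thresholder efficacy: fraction of OCR words found in a lexicon."""
    return sum(w.lower() in dictionary for w in ocr_words) / max(1, len(ocr_words))

pixels = [30] * 500 + [200] * 500            # toy bimodal page: ink vs. paper
t = otsu_threshold(pixels)
print(t, dictionary_hit_rate(["document", "enqineering"], {"document", "engineering"}))
```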
| A rotation method for binary document images using DDA algorithm | | BIBAK | Full-Text | 267-270 | |
| Duc Thanh Nguyen | |||
| DDA (Digital Differential Analyzer) is a well-known algorithm commonly used in
computer graphics to interpolate the integer coordinate pixels of a straight line.
In this paper, we introduce a method of image rotation for binary document
images using the DDA algorithm, under the assumption that the true skew angles
of the documents have already been computed. The proposed method applies the
main idea of the DDA algorithm, with some modifications, to scan lines skewed
along the inverse direction of the skew angle. In this method the ratios between
the length of the black runs and the whole scan line are preserved. Thus the
algorithm can overcome disadvantages of mathematical rotation such as white
holes and over-segmentation. Moreover, using the DDA algorithm to approximate
integer points helps the method reduce the number of rotation operations. Keywords: DDA, rotation algorithm, skew correction | |||
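The core DDA step the method builds on can be sketched as follows (this is the generic line-interpolation routine, not the authors' full rotation procedure): integer pixel coordinates along a skewed scan line are generated by stepping along the faster-changing axis and accumulating the slope on the other.

```python
def dda_line(x0, y0, x1, y1):
    """Interpolate integer pixel coordinates along a straight line (DDA)."""
    steps = max(abs(x1 - x0), abs(y1 - y0))
    if steps == 0:
        return [(x0, y0)]
    dx, dy = (x1 - x0) / steps, (y1 - y0) / steps
    return [(round(x0 + i * dx), round(y0 + i * dy)) for i in range(steps + 1)]

# Scan lines tilted by the negative of the detected skew angle can be read off
# a binary image with this routine, preserving the black-run/scan-line ratios.
print(dda_line(0, 0, 9, 3))
```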
| Segmentation of overlapping cursive handwritten digits | | BIBAK | Full-Text | 271-274 | |
| Carlos A. B. Mello; Edward Roe; Everton B. Lacerda | |||
| In this paper, we describe an approach for the problem of segmenting
overlapping characters. We are working with digit segmentation for bank check
processing. Our method is based on the idea of a hypothetical ball traversing
the number. The inertia of the movement segments the overlapping digits. Rules
are defined for this movement. Our initial proposal achieved very good results
with O(n²) complexity. Keywords: document processing, overlapping digits, segmentation | |||
| Multimedia adaptation in ubiquitous environments: benefits of structured multimedia documents | | BIBAK | Full-Text | 275-284 | |
| Pablo Cesar; Ishan Vaishnavi; Ralf Kernchen; Stefan Meissner; Cristian Hesselman; Matthieu Boussard; Antonietta Spedalieri; Dick C. A. Bulterman; Bo Gao | |||
| This paper demonstrates the advantages of using structured multimedia
documents for session management and media distribution in ubiquitous
environments. We show how document manipulations can be used to perform
powerful operations such as content to context adaptation and presentation
continuity. When consuming media in ubiquitous environments, where the set of
devices surrounding a user may change, dynamic media adaptation and session
transfer become primary requirements. This paper presents a working system,
based on a representative scenario, in which multimedia content is distributed
and adapted to a mobile user to best suit his/her contextual situation. The
implemented scenario includes the following scenes: content selection using a
personal mobile phone, content distribution to the most suitable device
according to the user's context, and presentation continuity when the user
moves to another location. This paper introduces the underlying document
manipulations that turn the scenario into a working system. Keywords: SMIL, multimedia adaptation, session continuity, structured multimedia
documents | |||
| A visual approach for modeling spatiotemporal relations | | BIBAK | Full-Text | 285-288 | |
| Rodrigo Laiola Guimarães; Carlos de Salles Soares Neto; Luiz Fernando Gomes Soares | |||
| Textual programming languages have proven to be difficult to learn and to
use effectively for many people. For this reason, visual tools can be useful to
abstract the complexity of such textual languages, minimizing the specification
efforts. In this paper we present a visual approach for high level
specification of spatiotemporal relations. In order to accomplish this task,
our visual representation provides an intuitive way to specify complex
synchronization events amongst media. Finally, to validate our work, the visual
specification is mapped to NCL (Nested Context Language), the standard
declarative language of the Brazilian Terrestrial Digital TV System. Keywords: NCL, SBTVD, connector, spatiotemporal relations, synchronization, visual
representation, visual specification | |||
| Intermedia synchronization management in DTV systems | | BIBAK | Full-Text | 289-297 | |
| Romualdo Monteiro de Resende Costa; Marcelo Ferreira Moreno; Luiz Fernando Gomes Soares | |||
| Intermedia synchronization is concerned with the spatial and temporal
relationships among media objects that compose a DTV application. From the
server side (usually a broadcaster's server or a Web Server) to receivers,
end-to-end intermedia synchronization support must be provided. Based on
application specifications, several abstract data structures should be created
to guide all synchronization control processes. A special data structure, a
labeled digraph called HTG (Hypermedia Temporal Graph) is proposed in this
paper as the basis of all other data structures. From HTG, receivers derive a
presentation plan to orchestrate media content presentations that make up a DTV
application. From this plan other data structures are derived to estimate when
media players should be instantiated and when data contents should be retrieved
from a DSM-CC carousel or from a return channel. If the return channel provides
QoS support, another data structure is derived from the presentation plan, in
order to determine when resource reservation should take place. For content
pushed by broadcasters, HTG is used in the server side as the basis for
building the carousel plan, a data structure that guides the order and
frequency with which media objects should be broadcast.
The paper's proposals were partially put into practice in the current open source reference implementation of the standard middleware of the Brazilian Terrestrial Digital TV System. However, this reference implementation is used just as a proof of concept. The ideas presented can be extended to any multimedia document presentation player (user agent) and content distribution server. Keywords: NCL, digital TV, intermedia synchronization, middleware, temporal graph | |||
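The step from a temporal graph to a presentation plan can be pictured with a small sketch (the graph encoding below is an assumption for illustration, not the HTG definition): edges carry delays between events, and a longest-path traversal from the start event yields the time at which each media object should be triggered, which in turn tells the receiver when to instantiate players and prefetch content.

```python
from collections import defaultdict

# Edges: (from_event, to_event, delay in seconds) -- illustrative encoding only.
edges = [("start", "video", 0), ("video", "slide1", 5),
         ("slide1", "slide2", 10), ("video", "audio_comment", 2)]

def presentation_plan(edges, source="start"):
    """Compute trigger times by relaxing edges in topological order (DAG assumed)."""
    graph, indeg, time = defaultdict(list), defaultdict(int), {source: 0}
    nodes = set()
    for u, v, d in edges:
        graph[u].append((v, d))
        indeg[v] += 1
        nodes.update((u, v))
    queue = [n for n in nodes if indeg[n] == 0]
    while queue:
        u = queue.pop()
        for v, d in graph[u]:
            time[v] = max(time.get(v, 0), time.get(u, 0) + d)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return dict(sorted(time.items(), key=lambda kv: kv[1]))

print(presentation_plan(edges))
# e.g. the player for 'slide2' can be instantiated, and its data retrieved
# from the carousel or return channel, shortly before t = 15.
```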
| End-user editing of interactive multimedia documents | | BIBAK | Full-Text | 298-301 | |
| Maria da Graça C. Pimentel; Renan G. Cattelan; Erick L. Melo; Cesar A. C. Teixeira | |||
| The problem of allowing user-centric control within multimedia presentations
is important to document engineering when the presentations are specified as
structured multimedia documents. In this paper we investigate the problem in
the context of end-user "real-time" editing of interactive video programs. Keywords: interactive multimedia, interactive video | |||