| The "puzzle" of sensory perception: putting together multisensory information | | BIBA | Full-Text | 1 | |
| Marc O. Ernst | |||
| For perceiving the environment our brain uses multiple sources of sensory
information derived from several different modalities, including vision, touch
and audition. The question of how information derived from these different sensory
modalities converges in the brain to form a coherent and robust percept is
central to understanding the process of perception. My main research interest
is the study of human perception focusing on multimodal integration and
visual-haptic interaction. For this, I use quantitative
computational/statistical models together with psychophysical and
neuropsychological methods.
| A desirable goal for the perceptual system is to maximize the reliability of the various perceptual estimates. From a statistical viewpoint, the optimal strategy for achieving this goal is to integrate all available sensory information. This may be done using a "maximum-likelihood-estimation" (MLE) strategy: the combined percept is then a weighted average of the individual estimates, with weights proportional to their reliabilities (see the worked form below). In a recent study we showed that humans actually integrate visual and haptic information in such a statistically optimal fashion (Ernst & Banks, Nature, 2002). Others have since demonstrated that this finding holds not only for integration across vision and touch, but also for the integration of information across and within other modalities, such as audition or vision. This suggests that maximum-likelihood estimation is an effective and widely used strategy exploited by the perceptual system. By integrating sensory information the brain may or may not lose access to the individual input signals feeding into the integrated percept. The degree to which the original information is still accessible defines the strength of coupling between the signals. We found that the strength of coupling varies with the set of signals used, e.g., strong coupling for stereo and texture signals to slant and weak coupling for visual and haptic signals to size (Hillis, Ernst, Banks, & Landy, Science, 2002). As suggested by one of our recent learning studies, the strength of coupling, which can be modeled using Bayesian statistics, seems to depend on the natural statistical co-occurrence of the signals (Jäkel & Ernst, in prep.). An important precondition for integrating signals is to know which signals derived from the different modalities belong together and how reliable they are. Recently we showed that touch can teach the visual modality how to interpret its signals and their reliabilities. More specifically, we showed that by exploiting touch we can alter the visual perception of slant (Ernst, Banks & Bulthoff, Nature Neuroscience, 2000). This finding contributes to a very old debate postulating that we only perceive the world because of our interactions with the environment. Similarly, in one of our latest studies we showed that experience can change the so-called "light-from-above" prior. Prior knowledge is essential for the interpretation of sensory signals during perception. Consequently, with the change in the prior we introduced a change in the perception of shape (Adams, Graf & Ernst, Nature Neuroscience, 2004). Integration is only sensible if the information sources carry redundant information. If the information sources are complementary, different combination strategies have to be exploited. Complementation of cross-modal information was demonstrated in a recent study investigating visual-haptic shape perception (Newell, Ernst, Tjan, & Bulthoff, Psychological Science, 2001). | |||
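As a brief illustration of the MLE strategy summarized above, the standard reliability-weighted combination rule for two cues can be written as follows (this is the textbook formulation under independent Gaussian noise; the notation is not taken from the cited papers):

```latex
% Visual and haptic estimates \hat{S}_V, \hat{S}_H with noise variances \sigma_V^2, \sigma_H^2.
% The maximum-likelihood combination is the reliability-weighted average
\hat{S}_{VH} = w_V \hat{S}_V + w_H \hat{S}_H,
\qquad w_i = \frac{1/\sigma_i^2}{1/\sigma_V^2 + 1/\sigma_H^2}, \quad i \in \{V, H\},
% and its variance is never larger than that of either single cue:
\sigma_{VH}^2 = \frac{\sigma_V^2\,\sigma_H^2}{\sigma_V^2 + \sigma_H^2} \le \min(\sigma_V^2, \sigma_H^2).
```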
| Integrating sketch and speech inputs using spatial information | | BIBAK | Full-Text | 2-9 | |
| Bee-Wah Lee; Alvin W. Yeo | |||
| Since the development of multimodal spatial queries, determining the correct
pairing of multimodal inputs has remained a problem in multimodal fusion.
Although integration techniques have been proposed to resolve this problem,
they are limited to interaction with predefined speech and sketch commands.
Furthermore, they are designed to resolve only spatial queries with a single
speech input and a single sketch input. Consequently, when multiple speech and
sketch inputs are introduced in a single query, none of the existing
integration techniques can resolve it. To date, no integration technique has
been found that can resolve the Multiple Sentences and Sketch Objects Spatial
Query. In this paper, the limitations of the existing integration techniques
are discussed. A new integration technique for resolving this problem is
described and compared with the widely used Unification-based Integration Technique. Keywords: multimodal interaction, multimodal spatial query, multimodal spatial scene
description, multiple sentences and sketch objects spatial query, spatial query | |||
| Distributed pointing for multimodal collaboration over sketched diagrams | | BIBAK | Full-Text | 10-17 | |
| Paulo Barthelmess; Ed Kaiser; Xiao Huang; David Demirdjian | |||
| A problem faced by groups that are not co-located but need to collaborate on
a common task is the reduced access to the rich multimodal communicative
context that they would have access to if they were collaborating face-to-face.
Collaboration support tools aim to reduce the adverse effects of this
restricted access to the fluid intermixing of speech, gesturing, writing and
sketching by providing mechanisms to enhance the awareness of distributed
participants of each other's actions.
In this work we explore novel ways to leverage the capabilities of multimodal context-aware systems to bridge co-located and distributed collaboration contexts. We describe a system that allows participants at remote sites to collaborate in building a project schedule via sketching on multiple distributed whiteboards, and show how participants can be made aware of naturally occurring pointing gestures that reference diagram constituents as they are performed by remote participants. The system explores the multimodal fusion of pen, speech and 3D gestures, coupled to the dynamic construction of a semantic representation of the interaction, anchored on the sketched diagram, to provide feedback that overcomes some of the intrinsic ambiguities of pointing gestures. Keywords: collaborative interaction, gesture, intelligent interfaces, multimodal
processing | |||
| Contextual recognition of head gestures | | BIBAK | Full-Text | 18-24 | |
| Louis-Philippe Morency; Candace Sidner; Christopher Lee; Trevor Darrell | |||
| Head pose and gesture offer several key conversational grounding cues and
are used extensively in face-to-face interaction among people. We investigate
how dialog context from an embodied conversational agent (ECA) can improve
visual recognition of user gestures. We present a recognition framework which
(1) extracts contextual features from an ECA's dialog manager, (2) computes a
prediction of head nods and head shakes, and (3) integrates the contextual
predictions with the visual observation of a vision-based head gesture
recognizer. We found a subset of lexical, punctuation and timing features that
are easily available in most ECA architectures and can be used to learn how to
predict user feedback. Using a discriminative approach to contextual prediction
and multi-modal integration, we were able to improve the performance of head
gesture detection even when the topic of the test set was significantly
different from that of the training set. Keywords: context-based recognition, dialog context, embodied conversational agent,
head gestures, human-computer interaction | |||
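A minimal sketch of how contextual predictions from a dialog manager might be fused with a vision-based head-gesture recognizer, in the spirit of the discriminative approach described above. The feature set, weights, and the log-odds fusion rule are illustrative assumptions, not the paper's actual model:

```python
import numpy as np

def contextual_prior(features, weights, bias):
    """Discriminative context model (here a plain logistic unit) mapping
    dialog-manager features -- e.g. whether the agent just asked a yes/no
    question, time since the agent stopped speaking -- to P(nod | context).
    The feature set and weights are illustrative, not the paper's."""
    return 1.0 / (1.0 + np.exp(-(np.dot(weights, features) + bias)))

def fuse(p_context, p_vision):
    """Combine the contextual prediction with the vision-based head-gesture
    recognizer's posterior by adding log-odds (naive-Bayes-style fusion)."""
    logit = lambda p: np.log(p / (1.0 - p))
    z = logit(p_context) + logit(p_vision)
    return 1.0 / (1.0 + np.exp(-z))

# Example: the ECA just asked a yes/no question (1.0) and stopped speaking
# 0.4 s ago; the vision-based recognizer is only mildly confident of a nod.
features = np.array([1.0, 0.4])
p = fuse(contextual_prior(features, np.array([1.2, -0.8]), -0.5), p_vision=0.55)
print(f"P(head nod) = {p:.2f}")
```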
| Combining environmental cues & head gestures to interact with wearable devices | | BIBAK | Full-Text | 25-31 | |
| M. Hanheide; C. Bauckhage; G. Sagerer | |||
| As wearable sensors and computing hardware are becoming a reality, new and
unorthodox approaches to seamless human-computer interaction can be explored.
This paper presents the prototype of a wearable, head-mounted device for
advanced human-machine interaction that integrates speech recognition and
computer vision with head gesture analysis based on inertial sensor data. We
will focus on the innovative idea of integrating visual and inertial data
processing for interaction. Fusing head gestures with results from visual
analysis of the environment provides rich vocabularies for human-machine
communication because it renders the environment into an interface: if objects
or items in the surroundings are being associated with system activities, head
gestures can trigger commands if the corresponding object is being looked at.
We will explain the algorithmic approaches applied in our prototype and present
experiments that highlight its potential for assistive technology. Apart from
pointing out a new direction for seamless interaction in general, our approach
provides a new and easy-to-use interface for disabled and paralyzed users in
particular. Keywords: inertial and visual sensors, seamless interaction, wearable intelligent
interfaces | |||
| Automatic detection of interaction groups | | BIBAK | Full-Text | 32-36 | |
| Oliver Brdiczka; Jérôme Maisonnasse; Patrick Reignier | |||
| This paper addresses the problem of detecting interaction groups in an
intelligent environment. To understand human activity, we need to identify
human actors and their interpersonal links. An interaction group can be seen as
a basic entity within which individuals collaborate in order to achieve a common
goal. In this regard, dynamic changes in interaction group configuration,
i.e. the splitting and merging of interaction groups, can be seen as indicators of new
activities. Our approach takes the speech activity detection of the individuals forming
interaction groups as input. A classical HMM-based approach that learns a different
HMM for each group configuration did not produce promising results.
We propose an approach for detecting interaction group configurations based on
the assumption that conversational turn taking is synchronized inside groups.
The proposed detector is based on one HMM constructed upon conversational
hypotheses. The approach shows good results and thus confirms our
conversational hypotheses. Keywords: clustering interaction groups, conversational analysis, hidden markov model,
intelligent environment, speech detection, ubiquitous computing | |||
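A toy sketch of a single-HMM detector built on the turn-taking hypothesis described above: configurations in which hypothesized group members rarely talk over each other score higher. The state space, probabilities, and the Viterbi decoding shown here are my assumptions, not the paper's implementation:

```python
import numpy as np

# Hypothetical setup for four participants A-D and two candidate configurations:
# everyone in one group, or two pairs (A,B) and (C,D).
CONFIGS = [[("A", "B", "C", "D")], [("A", "B"), ("C", "D")]]

def obs_likelihood(speaking, config, p_overlap=0.1, p_clean=0.9):
    """Turn-taking hypothesis: within a group, speakers rarely overlap.
    `speaking` is the set of participants currently detected as speaking."""
    lik = 1.0
    for group in config:
        n_active = sum(1 for p in group if p in speaking)
        lik *= p_overlap if n_active > 1 else p_clean
    return lik

def viterbi(frames, p_stay=0.95):
    """Most likely sequence of group configurations given per-frame
    speech-activity sets, with a sticky transition model."""
    n = len(CONFIGS)
    trans = np.full((n, n), (1 - p_stay) / (n - 1)); np.fill_diagonal(trans, p_stay)
    logd = np.log(np.ones(n) / n) + np.log([obs_likelihood(frames[0], c) for c in CONFIGS])
    back = []
    for speaking in frames[1:]:
        scores = logd[:, None] + np.log(trans)          # previous state x current state
        back.append(scores.argmax(axis=0))
        logd = scores.max(axis=0) + np.log([obs_likelihood(speaking, c) for c in CONFIGS])
    path = [int(logd.argmax())]
    for b in reversed(back):
        path.append(int(b[path[-1]]))
    return list(reversed(path))

frames = [{"A"}, {"A", "C"}, {"B", "C"}, {"D"}]
print(viterbi(frames))  # index of the hypothesized configuration per frame
```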
| Meeting room configuration and multiple camera calibration in meeting analysis | | BIBAK | Full-Text | 37-44 | |
| Yingen Xiong; Francis Quek | |||
| In video-based cross-modal analysis of planning meetings, the meeting events
are recorded by multiple cameras distributed throughout the meeting room.
Subjects' hand gestures, hand motion, head orientations, gaze targets, and body
poses are very important for meeting event analysis. In order to register
everything to the same global coordinate system, build 3D models, and obtain 3D data
from the video, we need to create a proper meeting room configuration and
calibrate all cameras to obtain their intrinsic and extrinsic parameters.
However, the calibration of multiple cameras distributed over the entire meeting
room area is a challenging task, because it is impossible for all cameras in
the meeting room to see a reference object at the same time, and wide field-of-view
cameras suffer from radial distortion. In this paper, we propose a simple
approach to create a good meeting room configuration and calibrate multiple
cameras in the meeting room. The proposed approach includes several steps.
First, we create stereo camera pairs according to the room configuration and
the requirements of the targets, the participants of the meeting. Second, we
apply Tsai's algorithm to calibrate each stereo camera pair and obtain the
parameters in its own local coordinate system. Third, we use Vicon motion
capture data to transfer all local coordinate systems of stereo camera pairs
into a global coordinate system in the meeting room. We can obtain the
positions, orientations, and parameters for all cameras in the same global
coordinate system, so that we can register everything into this global
coordinate system. Next, we perform a calibration error analysis for the current
camera and meeting room configuration and obtain the error distribution over the
entire meeting room area. Finally, we improve the current camera and meeting
room configuration according to the error distribution. By repeating these
steps, we can obtain a good meeting room configuration and parameters of all
cameras for this room configuration. Keywords: camera calibration, error analysis, meeting analysis, meeting room
configuration | |||
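The third step above, registering each stereo pair's local coordinate system into the global room frame via Vicon data, amounts to estimating a rigid transform from corresponding 3D points. A minimal sketch using the standard Kabsch/Procrustes solution (the correspondence source and data layout are assumptions, not the paper's exact procedure):

```python
import numpy as np

def rigid_transform(local_pts, global_pts):
    """Least-squares rigid transform (rotation R, translation t) such that
    R @ local + t ~= global, via the Kabsch / Procrustes solution.
    `local_pts`, `global_pts`: (N, 3) arrays of corresponding points, e.g.
    Vicon marker positions triangulated by a stereo pair (local frame)
    and reported by the Vicon system (global room frame)."""
    mu_l, mu_g = local_pts.mean(axis=0), global_pts.mean(axis=0)
    H = (local_pts - mu_l).T @ (global_pts - mu_g)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_l
    return R, t

# Once R, t are known, any point reconstructed by that stereo pair can be
# registered into the shared room coordinate system: p_global = R @ p_local + t
```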
| A multimodal perceptual user interface for video-surveillance environments | | BIBAK | Full-Text | 45-52 | |
| Giancarlo Iannizzotto; Carlo Costanzo; Francesco La Rosa; Pietro Lanzafame | |||
| In this paper a perceptual user interface (PUI) for video-surveillance
environments is introduced. This system provides a tool for a
video-surveillance control-room, and exploits a novel multimodal user
interaction paradigm based on hand gesture and perceptual user interfaces. The
proposed system, being simple and intuitive, is expected to be useful in the
control of large and dynamic environments. To illustrate our work, we introduce
a proof-of-concept multimodal, bare-hand gesture-based application and discuss
its implementation and the obtained experimental results. Keywords: computer vision, human-computer interaction, video surveillance | |||
| Inferring body pose using speech content | | BIBAK | Full-Text | 53-60 | |
| Sy Bor Wang; David Demirdjian | |||
| Untethered multimodal interfaces are more attractive than tethered ones
because they are more natural and expressive for interaction. Such interfaces
usually require robust vision-based body pose estimation and gesture
recognition. In interfaces where a user is interacting with a computer using
speech and arm gestures, the user's spoken keywords can be recognized in
conjunction with a hypothesis of body poses. This co-occurrence can reduce the
number of body pose hypotheses for the vision-based tracker. In this paper we
show that incorporating speech-based body pose constraints can increase the
robustness and accuracy of vision-based tracking systems.
Next, we describe an approach for gesture recognition. We show how Linear Discriminant Analysis (LDA) can be employed to estimate 'good features' that can be used in a standard HMM-based gesture recognition system. We show that, by applying our LDA scheme, recognition errors can be significantly reduced relative to a standard HMM-based technique. We applied both techniques in a Virtual Home Desktop scenario. Experiments in which users controlled a desktop system using gestures and speech were conducted, and the results show that speech recognized in conjunction with body poses increased the accuracy of the vision-based tracking system. Keywords: arm gesture recognition, audio-visual tracking, untethered body pose
tracking | |||
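A rough sketch of the LDA-before-HMM recipe described above, assuming scikit-learn for the LDA projection and the third-party hmmlearn package for per-class Gaussian HMMs; the feature dimensions, class counts, and placeholder data are illustrative only:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from hmmlearn import hmm   # third-party package; one Gaussian HMM per gesture class

# Illustrative shapes only: per-frame body-pose features (e.g. joint angles)
# labelled with the gesture class they belong to.
X_train = np.random.randn(2000, 24)           # placeholder pose features
y_train = np.random.randint(0, 6, 2000)       # placeholder gesture labels

# 1) LDA learns a projection that maximizes between-class separation,
#    yielding low-dimensional "good features" for the HMMs.
lda = LinearDiscriminantAnalysis(n_components=5).fit(X_train, y_train)
Z_train = lda.transform(X_train)

# 2) Standard HMM-based recognition: fit one HMM per gesture class on the
#    projected sequences, then classify a new sequence by log-likelihood.
def train_class_hmm(sequences, n_states=4):
    lengths = [len(s) for s in sequences]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
    model.fit(np.vstack(sequences), lengths)
    return model

def classify(seq, models, lda):
    """`models` maps gesture label -> trained GaussianHMM."""
    z = lda.transform(seq)
    return max(models, key=lambda label: models[label].score(z))
```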
| A joint particle filter for audio-visual speaker tracking | | BIBAK | Full-Text | 61-68 | |
| Kai Nickel; Tobias Gehrig; Rainer Stiefelhagen; John McDonough | |||
| In this paper, we present a novel approach for tracking a lecturer during
the course of his speech. We use features from multiple cameras and
microphones, and process them in a joint particle filter framework. The filter
performs sampled projections of 3D location hypotheses and scores them using
features from both audio and video. On the video side, the features are based
on foreground segmentation, multi-view face detection and upper body detection.
On the audio side, the time delays of arrival between pairs of microphones are
estimated with a generalized cross correlation function. Computationally
expensive features are evaluated only at the particles' projected positions in
the respective camera images, thus the complexity of the proposed algorithm is
low. We evaluated the system on data that was recorded during actual lectures.
The results of our experiments were an average error of 36 cm for video-only
tracking, 46 cm for audio-only tracking, and 31 cm for the combined audio-video system. Keywords: multimodal systems, particle filters, speaker tracking | |||
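A condensed sketch of one update step of a joint audio-visual particle filter of the kind described above. The motion model, the likelihood callables, and the resampling scheme are generic placeholders, not the authors' implementation:

```python
import numpy as np

def update_particles(particles, weights, audio_lik, video_liks, sigma=0.15):
    """One step of a joint audio-visual particle filter over 3D speaker
    position. `audio_lik(p)` and each `video_liks[c](p)` are placeholder
    callables scoring a 3D hypothesis p against the audio (e.g. GCC-based
    TDOA) and per-camera visual features; their implementations are not shown."""
    # Propagate: simple random-walk motion model.
    particles = particles + np.random.normal(scale=sigma, size=particles.shape)

    # Score: evaluate features only at each particle's (projected) position,
    # and combine the modalities multiplicatively.
    for i, p in enumerate(particles):
        w = audio_lik(p)
        for cam_lik in video_liks:
            w *= cam_lik(p)          # e.g. face / upper-body detector response
        weights[i] *= w

    # Normalize and resample (systematic resampling) to avoid degeneracy.
    weights = weights / weights.sum()
    idx = np.searchsorted(np.cumsum(weights),
                          (np.arange(len(weights)) + np.random.rand()) / len(weights))
    return particles[idx], np.full(len(weights), 1.0 / len(weights))
```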
| The connector: facilitating context-aware communication | | BIBAK | Full-Text | 69-75 | |
| M. Danninger; G. Flaherty; K. Bernardin; H. K. Ekenel; T. Köhler; R. Malkin; R. Stiefelhagen; A. Waibel | |||
| We present the Connector, a context-aware service that intelligently
connects people. It maintains an awareness of its users' activities,
preoccupations and social relationships to mediate a proper connection at the
right time between them. In addition to providing users with important
contextual cues about the availability of potential callees, the Connector
adapts the behavior of the contactee's device automatically in order to avoid
inappropriate interruptions.
To acquire relevant context information, perceptual components analyze sensor input obtained from a smart mobile phone and -- if available -- from a variety of audio-visual sensors built into a smart meeting room environment. The Connector also uses any available multimodal interface (e.g. a speech interface to the smart phone, steerable camera-projector, targeted loudspeakers) in the smart meeting room, to deliver information to users in the most unobtrusive way possible. Keywords: context-aware communication, multimodal interfaces | |||
| A user interface framework for multimodal VR interactions | | BIBAK | Full-Text | 76-83 | |
| Marc Erich Latoschik | |||
| This article presents a User Interface (UI) framework for multimodal
interactions targeted at immersive virtual environments. Its configurable input
and gesture processing components provide an advanced behavior graph capable of
routing continuous data streams asynchronously. The framework introduces a
Knowledge Representation Layer which augments objects of the simulated
environment with Semantic Entities as a central object model that bridges and
interfaces Virtual Reality (VR) and Artificial Intelligence (AI)
representations. Specialized node types use these facilities to implement
required processing tasks like gesture detection, preprocessing of the visual
scene for multimodal integration, or translation of movements into multimodally
initialized gestural interactions. A modified Augmented Transition Network
(ATN) approach accesses the knowledge layer as well as the preprocessing
components to integrate linguistic, gestural, and context information in
parallel. The overall framework emphasizes extensibility, adaptivity and
reusability, e.g., by utilizing persistent and interchangeable XML-based
formats to describe its processing stages. Keywords: gesture and speech processing, multimodal interaction, semantic scene
description, user interface framework, virtual reality | |||
| Multimodal output specification / simulation platform | | BIBAK | Full-Text | 84-91 | |
| Cyril Rousseau; Yacine Bellik; Frédéric Vernier | |||
| The design of an output multimodal system is a complex task due to the
richness of today's interaction contexts. The diversity of environments, systems
and user profiles requires a new generation of software tools to specify
complete and valid output interactions. In this paper, we present a multimodal
output specification and simulation platform. After introducing the design
process that inspired this platform, we describe the platform's two main tools,
which allow the specification and the simulation of a multimodal system's
outputs, respectively. Finally, an application of the platform is illustrated
through the design of the outputs of a mobile phone application. Keywords: human-computer interaction, output multimodality, outputs simulation,
outputs specification | |||
| Migratory MultiModal interfaces in MultiDevice environments | | BIBAK | Full-Text | 92-99 | |
| Silvia Berti; Fabio Paternò | |||
| This paper describes an environment able to support migratory multimodal
interfaces in multidevice environments. We introduce the software architecture
and the device-independent languages used by our tool, which provides services
enabling users to freely move about, change devices and continue the current
task from the point where they left off in the previous device. Our environment
currently supports interaction with applications through graphical and vocal
modalities, either separately or together. Such applications are implemented in
Web-based languages. We discuss how the features of the device at hand, desktop
or mobile, are considered when generating the multimodal user interface. Keywords: MultiModal user interfaces, migratory interfaces, model-based design,
multi-device environments, ubiquitous systems | |||
| Exploring multimodality in the laboratory and the field | | BIBAK | Full-Text | 100-107 | |
| Lynne Baillie; Raimund Schatz | |||
| As researchers, we face new challenges in designing and evaluating
new mobile applications, because these applications give users access to powerful
computing devices through small interfaces, which typically have limited input
facilities. One way of overcoming these shortcomings is to utilize the
possibilities of multimodality. We report in this paper how we designed,
developed, and evaluated a multimodal mobile application through a combination
of laboratory and field studies. This is the first time, as far as we know,
that a multimodal application has been developed in such a way. We did this so
that we would understand more about where and when users envisioned using
different modes of interaction and what problems they may encounter when using
an application in context. Keywords: action scenarios, mobile applications and devices, multimodal interaction | |||
| Understanding the effect of life-like interface agents through users' eye movements | | BIBAK | Full-Text | 108-115 | |
| Helmut Prendinger; Chunling Ma; Jin Yingzi; Arturo Nakasone; Mitsuru Ishizuka | |||
| We motivate an approach to evaluating the utility of life-like interface
agents that is based on human eye movements rather than questionnaires. An eye
tracker is employed to obtain quantitative evidence of a user's focus of
attention. The salient feature of our evaluation strategy is that it allows us
to measure important properties of a user's interaction experience on a
moment-by-moment basis in addition to a cumulative (spatial) analysis of the
user's areas of interest. We describe an empirical study in which we compare
attending behavior of subjects watching the presentation of an apartment by
three types of media: an animated agent, a text box, and speech only. The
investigation of users' eye movements reveals that agent behavior may trigger
natural and social interaction behavior of human users. Keywords: animated interface agents, eye tracking, user study, web-based presentation | |||
| Analyzing and predicting focus of attention in remote collaborative tasks | | BIBAK | Full-Text | 116-123 | |
| Jiazhi Ou; Lui Min Oh; Susan R. Fussell; Tal Blum; Jie Yang | |||
| To overcome the limitations of current technologies for remote
collaboration, we propose a system that changes a video feed based on task
properties, people's actions, and message properties. First, we examined how
participants manage different visual resources in a laboratory experiment using
a collaborative task in which one partner (the helper) instructs another (the
worker) how to assemble online puzzles. We analyzed helpers' eye gaze as a
function of the aforementioned parameters. Helpers gazed at the set of
alternative pieces more frequently when it was harder for workers to
differentiate these pieces, and less frequently over repeated trials. The
results further suggest that a helper's desired focus of attention can be
predicted based on task properties, his/her partner's actions, and message
properties. We propose a conditional Markov model classifier to explore the
feasibility of predicting gaze based on these properties. The accuracy of the
model ranged from 65.40% for puzzles with easy-to-name pieces to 74.25% for
puzzles with more difficult-to-name pieces. The results suggest that we can use
our model to automatically manipulate video feeds to show what helpers want to
see when they want to see it. Keywords: computer-supported cooperative work, eye tracking, focus of attention,
keyword spotting, remote collaborative tasks | |||
| Gaze-based selection of standard-size menu items | | BIBAK | Full-Text | 124-128 | |
| Oleg Spakov; Darius Miniotas | |||
| With recent advances in eye tracking technology, eye gaze gradually gains
acceptance as a pointing modality. Its relatively low accuracy, however,
dictates the use of enlarged controls in eye-based interfaces, rendering
their design rather peculiar. Another factor impairing pointing performance is
deficient robustness of an eye tracker's calibration. To facilitate pointing at
standard-size menus, we developed a technique that uses dynamic target
expansion for on-line correction of the eye tracker's calibration. Correction
is based on the relative change in the gaze point location upon the expansion.
A user study suggests that the technique affords a dramatic six-fold
improvement in selection accuracy. This is traded off against a much smaller
reduction in performance speed (39%). The technique is thus believed to
contribute to development of universal-access solutions supporting navigation
through standard menus by eye gaze alone. Keywords: eye tracking, eye-based interaction, human performance, menus, pointing,
target expansion | |||
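One plausible reading of the on-line calibration correction described above, sketched as a running drift estimate updated whenever a target expands and the gaze follows it; the update rule and gain are assumptions, not the paper's algorithm:

```python
def update_drift(offset, gaze_before, gaze_after, target_before, target_after, alpha=0.5):
    """On-line correction of eye-tracker drift, roughly in the spirit of the
    expansion technique above (details are assumptions, not the paper's).

    When a menu item is expanded, the user's fixation follows it; the part of
    the gaze shift that is *not* explained by the item's own displacement is
    attributed to calibration error and folded into a running offset that is
    subtracted from subsequent gaze samples."""
    expected_shift = (target_after[0] - target_before[0],
                      target_after[1] - target_before[1])
    observed_shift = (gaze_after[0] - gaze_before[0],
                      gaze_after[1] - gaze_before[1])
    residual = (observed_shift[0] - expected_shift[0],
                observed_shift[1] - expected_shift[1])
    return (offset[0] + alpha * residual[0], offset[1] + alpha * residual[1])

def corrected(gaze, offset):
    """Apply the current drift estimate to a raw gaze sample."""
    return (gaze[0] - offset[0], gaze[1] - offset[1])
```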
| Region extraction of a gaze object using the gaze point and view image sequences | | BIBAK | Full-Text | 129-136 | |
| Norimichi Ukita; Tomohisa Ono; Masatsugu Kidode | |||
| Analysis of the human gaze is a basic way to investigate human attention.
Similarly, the view image of a human being contains visual information about
what he/she is paying attention to.
This paper proposes an interface system for extracting the region of an object viewed by a human from a view image sequence by analyzing the history of gaze points. All the gaze points, each of which is recorded as a 2D point in a view image, are transferred to an image in which the object region is extracted. These points are then divided into several groups based on their colors and positions. The gaze points in each group compose an initial region. After all the regions are extended, outlier regions are removed by comparing the colors and optical flows in the extended regions. All the remaining regions are merged into one in order to compose a gaze region. Keywords: gaze object, gaze points, region extraction, view image sequence | |||
| Interactive humanoids and androids as ideal interfaces for humans | | BIBA | Full-Text | 137 | |
| Hiroshi Ishiguro | |||
| Many robotics researchers are exploring new possibilities for intelligent
robots in our everyday life. Humanoids and androids, which have various
modalities, can communicate with humans as new information media. In this talk,
we discuss how to develop interactive robots and how to evaluate them,
introducing several robots developed at ATR Intelligent Robotics and
Communications Laboratories and the Department of Adaptive Machine Systems, Osaka
University. In particular, we focus on a constructive approach to developing
interactive robots, cognitive studies using the humanoids and androids for
evaluating the interactions, and long-term field experiments in an elementary
school.
The talk consists of two parts. There are two kinds of relationships between robots and humans: one is inter-personal and the other is social. In inter-personal relationships, the appearance of the robot is a new and important research issue. In social relationships, a function to recognize human relationships through interactions is needed for robots of the next generation. These two issues open up new possibilities for robots. Among these issues, the appearance problem bridges science and engineering. In the development of humanoids, both the appearance and the behavior of the robots are significant issues. However, designing the robot's appearance, especially giving it a humanoid one, has always been the role of the industrial designer. To tackle the problem of appearance and behavior, two approaches are necessary: one from robotics and the other from cognitive science. The approach from robotics tries to build very human-like robots based on knowledge from cognitive science. The approach from cognitive science uses the robot to verify hypotheses for understanding humans. We call this cross-interdisciplinary framework android science (www.androidscience.com). The speaker hopes that attendees will catch these new waves in robotics, media research, and our future life. | |||
| Probabilistic grounding of situated speech using plan recognition and reference resolution | | BIBAK | Full-Text | 138-143 | |
| Peter Gorniak; Deb Roy | |||
| Situated, spontaneous speech may be ambiguous along acoustic, lexical,
grammatical and semantic dimensions. To understand such a seemingly difficult
signal, we propose to model the ambiguity inherent in acoustic signals and in
lexical and grammatical choices using compact, probabilistic representations of
multiple hypotheses. To resolve semantic ambiguities we propose a situation
model that captures aspects of the physical context of an utterance as well as
the speaker's intentions, in our case represented by recognized plans. In a
single, coherent Framework for Understanding Situated Speech (FUSS) we show how
these two influences, acting on an ambiguous representation of the speech
signal, complement each other to disambiguate form and content of situated
speech. This method produces promising results in a game playing environment
and leaves room for other types of situation models. Keywords: grounding, language, plan recognition, situated, speech, understanding | |||
| Augmenting conversational dialogue by means of latent semantic googling | | BIBAK | Full-Text | 144-150 | |
| Robin Senior; Roel Vertegaal | |||
| This paper presents Latent Semantic Googling, a variant of Landauer's Latent
Semantic Indexing that uses the Google search engine to judge the semantic
closeness of sets of words and phrases. This concept is implemented via Ambient
Google, a system for augmenting conversations through the classification of
discussed topics. Ambient Google uses a speech recognition engine to generate
Google keyphrase queries directly from conversations. These queries are used to
analyze the semantics of the conversation, and infer related topics that have
been discussed. Conversations are visualized using a spring-model algorithm
representing common topics. This allows users to browse their conversation as a
contextual relationship between discussed topics, and augment their discussion
through the use of related websites discovered by Google. An evaluation of
Ambient Google is presented, discussing user reaction to the system. Keywords: augmented intelligence, context, latent semantic indexing, speech | |||
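A hedged sketch of how search hit counts can be turned into a semantic-closeness score in the general spirit of Latent Semantic Googling; the PMI-style formula and the pluggable hit-count callable are illustrative, not the paper's actual scoring function:

```python
import math

def semantic_closeness(a: str, b: str, hit_count, n_pages: float = 1e10) -> float:
    """PMI-style closeness between two phrases, computed from search-engine
    result counts: how much more often a and b co-occur than chance predicts.
    `hit_count(query)` must be supplied by the caller (e.g. a wrapper around a
    web-search API); the formula is one plausible way to 'google' semantic
    distance and is not reproduced from the paper."""
    fa, fb = hit_count(a), hit_count(b)
    fab = hit_count(f'"{a}" "{b}"')
    if min(fa, fb, fab) == 0:
        return 0.0
    return math.log((fab * n_pages) / (fa * fb))

# Example with canned counts standing in for real web queries:
fake_counts = {"piano": 5_000_000, "keyboard": 8_000_000, '"piano" "keyboard"': 900_000}
print(semantic_closeness("piano", "keyboard", lambda q: fake_counts.get(q, 0)))
```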
| Human-style interaction with a robot for cooperative learning of scene objects | | BIBA | Full-Text | 151-158 | |
| Shuyin Li; Axel Haasch; Britta Wrede; Jannik Fritsch; Gerhard Sagerer | |||
| In research on human-robot interaction, interest is currently shifting from uni-modal dialog systems to multi-modal interaction schemes. We present a system for human-style interaction with a robot that is integrated on our mobile robot BIRON. To model the dialog, we adopt an extended grounding concept with a mechanism to handle multi-modal input and output, where object references are resolved through interaction with an object attention system (OAS). The OAS integrates multiple inputs from, e.g., the object and gesture recognition systems and provides the information for a common representation. This representation can be accessed by both modules and combines symbolic verbal attributes with sensor-based features. We argue that such a representation is necessary to achieve robust and efficient information processing. | |||
| A look under the hood: design and development of the first SmartWeb system demonstrator | | BIBAK | Full-Text | 159-166 | |
| Norbert Reithinger; Simon Bergweiler; Ralf Engel; Gerd Herzog; Norbert Pfleger; Massimo Romanelli; Daniel Sonntag | |||
| Experience shows that decisions in the early phases of the development of a
multimodal system prevail throughout the life-cycle of a project. The
distributed architecture and the requirement for robust multimodal interaction
in our project SmartWeb resulted in an approach that uses and extends W3C
standards like EMMA and RDFS. These standards for the interface structure and
content allowed us to integrate available tools and techniques. However, the
requirements in our system called for various extensions, e.g., to introduce
result feedback tags for an extended version of EMMA. The interconnection
framework depends on a commercial telephone voice dialog system platform for
the dialog-centric components while the information access processes are linked
using web service technology. Also in the area of this underlying
infrastructure, enhancements and extensions were necessary. The first
demonstration system is operable now and will be presented at the Football
World Cup 2006 in Germany. Keywords: interaction design, multimodality, semantic web | |||
| Audio-visual cues distinguishing self- from system-directed speech in younger and older adults | | BIBAK | Full-Text | 167-174 | |
| Rebecca Lunsford; Sharon Oviatt; Rachel Coulston | |||
| In spite of interest in developing robust open-microphone engagement
techniques for mobile use and natural field contexts, there currently are no
reliable techniques available. One problem is the lack of empirically-grounded
models as guidance for distinguishing how users' audio-visual activity actually
differs systematically when addressing a computer versus human partner. In
particular, existing techniques have not been designed to handle high levels of
user self talk as a source of "noise," and they typically assume that a user is
addressing the system only when facing it while speaking. In the present
research, data were collected during two related studies in which adults aged
18-89 interacted multimodally using speech and pen with a simulated map system.
Results revealed that people engaged in self talk prior to addressing the
system over 30% of the time, with no decrease in younger adults' rate of self
talk compared with elders. Speakers' amplitude was lower during 96% of their
self talk, with a substantial 26 dBr amplitude separation observed between
self- and system-directed speech. The magnitude of speaker's amplitude
separation ranged from approximately 10-60 dBr and diminished with age, with
79% of the variance predictable simply by knowing a person's age. In contrast
to the clear differentiation of intended addressee revealed by amplitude
separation, gaze at the system was not a reliable indicator of speech directed
to the system, with users looking at the system over 98% of the time during
both self- and system-directed speech. Results of this research have
implications for the design of more effective open-microphone engagement for
mobile and pervasive systems. Keywords: gaze, individual differences, intended addressee, multimodal interaction,
open-microphone engagement, spoken amplitude, system adaptation, universal
access, user modeling | |||
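A toy classifier motivated by the amplitude findings above (self talk markedly quieter than system-directed speech, gaze uninformative). The running-mean adaptation and the initial dBr levels are illustrative assumptions, not part of the study:

```python
class AmplitudeAddresseeClassifier:
    """Toy open-microphone engagement heuristic: keep running per-user
    estimates of the self-talk and system-directed amplitude levels and
    assign each utterance to the nearer one. The update rule and initial
    values below are placeholders; a deployed system would calibrate them
    per user (the reported separation shrinks with age)."""

    def __init__(self, system_dbr=-20.0, self_dbr=-46.0, rate=0.1):
        self.system_dbr, self.self_dbr, self.rate = system_dbr, self_dbr, rate

    def classify(self, utterance_dbr):
        system_directed = (abs(utterance_dbr - self.system_dbr)
                           < abs(utterance_dbr - self.self_dbr))
        # Adapt the per-user level of whichever class was chosen.
        if system_directed:
            self.system_dbr += self.rate * (utterance_dbr - self.system_dbr)
        else:
            self.self_dbr += self.rate * (utterance_dbr - self.self_dbr)
        return system_directed

clf = AmplitudeAddresseeClassifier()
print(clf.classify(-22.0))   # near the system-directed level -> True
print(clf.classify(-45.0))   # much quieter -> treated as self talk, False
```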
| Identifying the intended addressee in mixed human-human and human-computer interaction from non-verbal features | | BIBAK | Full-Text | 175-182 | |
| Koen van Turnhout; Jacques Terken; Ilse Bakx; Berry Eggen | |||
| Against the background of developments in the area of speech-based and
multimodal interfaces, we present research on determining the addressee of an
utterance in the context of mixed human-human and multimodal human-computer
interaction. Working with data that are taken from realistic scenarios, we
explore several features with respect to their relevance to the question of who is
the addressee of an utterance: the eye gaze of both speaker and listener, dialogue
history and utterance length. With respect to eye gaze, we inspect the detailed
timing of shifts in eye gaze between different communication partners (human or
computer). We show that these features result in an improved classification of
utterances in terms of addressee-hood relative to a simple classification
algorithm that assumes that "the addressee is where the eye is", and compare
our results to alternative approaches. Keywords: eye gaze, multi party interaction, perceptive user interfaces | |||
| Multimodal multispeaker probabilistic tracking in meetings | | BIBAK | Full-Text | 183-190 | |
| Daniel Gatica-Perez; Guillaume Lathoud; Jean-Marc Odobez; Iain McCowan | |||
| Tracking speakers in multiparty conversations constitutes a fundamental task
for automatic meeting analysis. In this paper, we present a probabilistic
approach to jointly track the location and speaking activity of multiple
speakers in a multisensor meeting room, equipped with a small microphone array
and multiple uncalibrated cameras. Our framework is based on a mixed-state
dynamic graphical model defined on a multiperson state-space, which includes
the explicit definition of a proximity-based interaction model. The model
integrates audio-visual (AV) data through a novel observation model. Audio
observations are derived from a source localization algorithm. Visual
observations are based on models of the shape and spatial structure of human
heads. Approximate inference in our model, needed given its complexity, is
performed with a Markov Chain Monte Carlo particle filter (MCMC-PF), which
results in high sampling efficiency. We present results -- based on an objective
evaluation procedure -- that show that our framework (1) is capable of locating
and tracking the position and speaking activity of multiple meeting
participants engaged in real conversations with good accuracy; (2) can deal
with cases of visual clutter and partial occlusion; and (3) significantly
outperforms a traditional sampling-based approach. Keywords: MCMC, audio-visual speaker tracking, particle filters | |||
| A probabilistic inference of multiparty-conversation structure based on Markov-switching models of gaze patterns, head directions, and utterances | | BIBAK | Full-Text | 191-198 | |
| Kazuhiro Otsuka; Yoshinao Takemae; Junji Yamato | |||
| A novel probabilistic framework is proposed for inferring the structure of
conversation in face-to-face multiparty communication, based on gaze patterns,
head directions and the presence/absence of utterances. As the structure of
conversation, this study focuses on the combination of participants and their
participation roles. First, we assess the gaze patterns that frequently appear
in conversations, and define typical types of conversation structure, called
conversational regime, and hypothesize that the regime represents the
high-level process that governs how people interact during conversations. Next,
assuming that the regime changes over time exhibit Markov properties, we
propose a probabilistic conversation model based on Markov-switching; the
regime controls the dynamics of utterances and gaze patterns, which
stochastically yield measurable head-direction changes. Furthermore, a Gibbs
sampler is used to realize the Bayesian estimation of regime, gaze pattern, and
model parameters from observed head directions and utterances. Experiments on
four-person conversations confirm the effectiveness of the framework in
identifying conversation structures. Keywords: Gibbs sampler, Markov chain Monte Carlo, Markov-switching model, dynamic
Bayesian network, eye gaze, face-to-face multiparty conversation, nonverbal
cues | |||
| Socially aware computation and communication | | BIBAK | Full-Text | 199 | |
| Alex (Sandy) Pentland | |||
| By building machines that understand social signaling and social context, we
can dramatically improve collective decision making and help keep remote users
'in the loop.' I will describe three systems that have a substantial
understanding of social context, and use this understanding to improve human
group performance. The first system is able to interpret social displays of
interest and attraction, and uses this information to improve conferences and
meetings. The second is able to infer friendship, acquaintance, and workgroup
relationships, and uses this to help people build social capital. The third is
able to examine human interactions and categorize participants' attitudes
(attentive, agreeable, determined, interested, etc), and uses this information
to proactively promote group cohesion and to match participants on the basis of
their compatibility. Keywords: affect, non-linguistic communication, social signals | |||
| Synthetic characters as multichannel interfaces | | BIBAK | Full-Text | 200-207 | |
| Elena Not; Koray Balci; Fabio Pianesi; Massimo Zancanaro | |||
| Synthetic characters are an effective modality to convey messages to the
user, provide visual feedback about the system internal understanding of the
communication, and engage the user in the dialogue through emotional
involvement. In this paper we argue for a fine-grain distinction of the
expressive capabilities of synthetic agents: avatars should not be considered
as an indivisible modality but as the synergic contribution of different
communication channels that, properly synchronized, generate an overall
communication performance. In this view, we propose SMIL-AGENT as a
representation and scripting language for synthetic characters, which abstracts
away from the specific implementation and context of use of the character.
SMIL-AGENT has been defined starting from SMIL 0.1 standard specification and
aims at providing a high-level standardized language for presentations by
different synthetic agents within diverse communication and application
contexts. Keywords: SMIL, multimodal presentations, synthetic characters | |||
| XfaceEd: authoring tool for embodied conversational agents | | BIBAK | Full-Text | 208-213 | |
| Koray Balci | |||
| In this paper, XfaceEd, our open source, platform independent tool for
authoring 3D embodied conversational agents (ECAs) is presented. Following
MPEG-4 Facial Animation (FA) standard, XfaceEd provides an easy to use
interface to generate MPEG-4 ready ECAs from static 3D models. Users can set
MPEG-4 Facial Definition Points (FDP) and Facial Animation Parameter Units
(FAPU), define the zone of influence of each feature point and how this
influence is propagated among the neighboring vertices. As an alternative to
MPEG-4, one can also specify morph targets for different categories such as
visemes, emotions and expressions, in order to achieve facial animation using
the keyframe interpolation technique. Morph targets from different categories
are blended to create more lifelike behaviour.
Results can be previewed and parameters can be tweaked in real time within the application for fine tuning. Changes made take effect immediately, which in turn ensures rapid production. The final output is a configuration file in XML format that can be interpreted by XfacePlayer or other applications for easy authoring of embodied conversational agents for multimodal environments. Keywords: 3D facial animation, MPEG-4, embodied conversational agents, open source,
talking heads | |||
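A small sketch of the morph-target alternative mentioned above: blending weighted targets from different categories and interpolating their weights between keyframes. The mesh layout and the linear blend are assumptions; XfaceEd's internal representation is not reproduced here:

```python
import numpy as np

def blend_morph_targets(neutral, targets, weights):
    """Blend a neutral face mesh with weighted morph-target offsets, the way
    targets from different categories (a viseme plus an emotion, say) can be
    combined into one expression. `neutral` and each target are (V, 3) vertex
    arrays sharing the same topology; weights are in [0, 1]."""
    blended = neutral.astype(float)
    for name, w in weights.items():
        blended = blended + w * (targets[name] - neutral)
    return blended

def interpolate_keyframes(key_times, key_weights, t):
    """Keyframe interpolation of blend weights: linearly interpolate each
    morph-target weight between the surrounding keyframes at time t."""
    names = key_weights[0].keys()
    return {n: float(np.interp(t, key_times, [kw[n] for kw in key_weights])) for n in names}
```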
| A first evaluation study of a database of kinetic facial expressions (DaFEx) | | BIBAK | Full-Text | 214-221 | |
| Alberto Battocchi; Fabio Pianesi; Dina Goren-Bar | |||
| In this paper we present DaFEx (Database of Facial Expressions), a database
created with the purpose of providing a benchmark for the evaluation of the
facial expressivity of Embodied Conversational Agents (ECAs). DaFEx consists of
1008 short videos containing emotional facial expressions of Ekman's six
emotions plus the neutral expression. The facial expressions were recorded by 8
professional actors (male and female) in two acting conditions ("utterance" and
"no-utterance") and at 3 intensity levels (high, medium, low). The properties
of DaFEx were studied by having 80 subjects classify the emotion expressed in
the videos. High rates of accuracy were obtained for most of the emotions
displayed. We also tested the effect of the intensity level, of the
articulatory movements due to speech, and of the actors' and subjects' gender,
on classification accuracy. The results showed that decoding accuracy decreases
with the intensity of emotions; that the presence of articulatory movements
negatively affects the recognition of fear, surprise and of the neutral
expression, while it improves the recognition of anger; and that facial
expressions seem to be recognized (slightly) better when acted by actresses
than by actors. Keywords: databases, emotion recognition, expressiveness, quality of facial displays,
user study | |||
| Hapticat: exploration of affective touch | | BIBAK | Full-Text | 222-229 | |
| Steve Yohanan; Mavis Chan; Jeremy Hopkins; Haibo Sun; Karon MacLean | |||
| This paper describes the Hapticat, a device we developed to study affect
through touch. Though intentionally not highly zoomorphic, the device borrows
behaviors from pets and the rich manner in which they haptically communicate
with humans. The Hapticat has four degrees of freedom to express itself: a pair
of ear-like appendages, a breathing mechanism, a purring mechanism, and a
warming element. Combinations of levels for these controls are used to define
the five active haptic responses: playing dead, asleep, content, happy, and
upset. In the paper we present the design considerations and implementation
details of the device. We also detail a preliminary observational study where
participants interacted with the Hapticat through touch. To compare the effects
of haptic feedback, the device presented either active haptic renderings or
none at all. Participants reported which of the five responses they believed
the Hapticat rendered, as well as their degree of affect to the device. We
observed that participants' expectations of the device's response to various
haptic stimuli correlated with our mappings. We also observed that participants
were able to reasonably recognize three of the five response renderings, while
having difficulty discriminating between happy and content states. Finally, we
found that participants registered a broader range of affect when active haptic
renderings were applied as compared to when none were presented. Keywords: affect, affective computing, affective touch, emotion, haptics, robot pets,
socially interactive robots | |||
| Using observations of real designers at work to inform the development of a novel haptic modeling system | | BIBAK | Full-Text | 230-235 | |
| Umberto Giraudo; Monica Bordegoni | |||
| Gestures, besides speech, are among the most frequently used means of expression by
humans. In the product design field, designers have multiple ways
of communicating their ideas and concepts. One of them is the model
making activity, in which designers make their concepts explicit by using
appropriate tools and specific hand movements on plastic material with the
intent of obtaining a shape. Some studies have demonstrated that visual,
tactile and kinesthetic feedbacks are equally important in the shape creation
and evaluation process [1]. The European project "Touch and Design" (T'nD)
(www.kaemart.it/touch-and-design) proposes the implementation of an innovative
virtual clay modeling system based on novel haptic interaction modality
oriented to industrial designers. In order to develop an intuitive and
easy-to-use system, a study of designers' hand modeling activities has been
carried out by the project industrial partners supported by cognitive
psychologists. The users' manual operators and tools have been translated into
corresponding haptic tools and multimodal interaction modalities in the virtual
free-form shape modeling system. The paper presents the project research
activities and the results achieved so far. Keywords: haptic interaction, haptic modeling, virtual prototyping | |||
| A comparison of two methods of scaling on form perception via a haptic interface | | BIBAK | Full-Text | 236-243 | |
| Mounia Ziat; Olivier Gapenne; John Stewart; Charles Lenay | |||
| In this fundamental study, we compare two scaling methods by focusing on the
strategies of subjects who are using a sensory substitution device. Method 1
consists of a reduction of the sensor size and its displacement speed; here,
the speed reduction is obtained by reducing the "human" movement (hand speed
reduction). Method 2 consists of a classical increase of the image dimension.
The experimental device couples the pen of a graphics tablet with tactile
sensory stimulators. The latter are activated when the sensor crosses the
figure on the computer screen. This virtual sensor (a square matrix composed of
16 elementary fields) is displaced when the pen, guided by the human hand's
displacements, moves on the graphics tablet. Even though the two methods might
appear equivalent, the results show that the recognition rate
is closely dependent on the figure size, and the strategies used by the subjects
are more suitable for method 2 than for method 1. In fact, half of the subjects
found that method 1 inhibited their movements, and the majority of them did not
feel the scaling effect, whereas it was clearly felt in method 2. Keywords: PDA (personal digital assistant), ZUI (zoomable user interfaces), haptic
perception, sensory substitution | |||
| An initial usability assessment for symbolic haptic rendering of music parameters | | BIBAK | Full-Text | 244-251 | |
| Meghan Allen; Jennifer Gluck; Karon MacLean; Erwin Tang | |||
| Current methods of playlist creation and maintenance do not support user
needs, especially in a mobile context. Furthermore, they do not scale: studies
show that users with large mp3 collections have abandoned the concept of
playlists. To remedy the usability problems associated with playlist creation
and navigation -- in particular, reliance on visual feedback and the absence of
rapid content scanning mechanisms -- we propose a system that utilizes the
haptic channel. A necessary first step in this objective is the creation of a
haptic mapping for music. In this paper, we describe an exploratory study
addressed at understanding the feasibility, with respect to learnability and
usability, of efficient, eyes-free playlist navigation based on symbolic haptic
renderings of key song parameters. Users were able to learn haptic mappings for
music parameters to usable accuracy with 4 minutes of training. These results
indicate promise for the approach and support for continued effort in both
improving the rendering scheme and implementing the haptic playlist system. Keywords: digital music, force feedback, haptics, mp3, music classification, physical
interfaces, playlist creation, vibrotactile feedback | |||
| Tangible user interfaces for 3D clipping plane interaction with volumetric data: a case study | | BIBAK | Full-Text | 252-258 | |
| Wen Qi; Jean-Bernard Martens | |||
| Visualization via direct volume rendering is a potentially very powerful
technique for exploring and interacting with large amounts of scientific data.
However, the available two-dimensional (2D) interfaces make three-dimensional
(3D) manipulation with such data very difficult. Many usability problems during
interaction in turn discourage the widespread use of volume rendering as a
scientific tool. In this paper, we present a more in-depth investigation into
one specific interface aspect, i.e., the positioning of a clipping plane within
volume-rendered data. More specifically, we propose three different interface
prototypes that have been realized with the help of wireless vision-based
tracking. These three prototypes combine aspects of 2D graphical user
interfaces with 3D tangible interaction devices. They allow users to experience and
compare different user interface strategies for performing the clipping plane
interaction task. They also provide a basis for carrying out user evaluations
in the near future. Keywords: intersection, tangible interface, volume visualization, volumetric data | |||
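A minimal sketch of how a tracked tangible prop's 6-DOF pose can be turned into a clipping plane for volume-rendered data, as a way to picture the interaction task above; the choice of the prop's local normal and the half-space convention are assumptions:

```python
import numpy as np

def clipping_plane_from_pose(R, t, local_normal=(0.0, 0.0, 1.0)):
    """Derive a clipping plane from the tracked 6-DOF pose of a tangible prop
    (rotation R, position t in volume coordinates). The prop's flat face is
    assumed to define the plane; its local normal is a modelling choice here.
    Returns (n, d) for the plane n . x = d."""
    n = R @ np.asarray(local_normal, dtype=float)
    n /= np.linalg.norm(n)
    return n, float(n @ t)

def clip_mask(points, n, d):
    """Boolean mask of the sample points kept by the plane (the half-space
    on the normal's positive side); the renderer would discard the rest."""
    return points @ n >= d
```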
| A transformational approach for multimodal web user interfaces based on UsiXML | | BIBAK | Full-Text | 259-266 | |
| Adrian Stanciulescu; Quentin Limbourg; Jean Vanderdonckt; Benjamin Michotte; Francisco Montero | |||
| A transformational approach for developing multimodal web user interfaces is
presented that progressively moves from a task model and a domain model to a
final user interface. This approach consists of three steps: deriving one or
many abstract user interfaces from a task model and a domain model, deriving
one or many concrete user interfaces from each abstract one, and producing the
code of the corresponding final user interfaces. To ensure these steps,
transformations are encoded as graph transformations performed on the involved
models expressed in their graph equivalent. For each step, a graph grammar
gathers relevant graph transformations for accomplishing the sub-steps. The
final user interface is multimodal as it involves graphical (keyboard, mouse)
and vocal interaction. The approach outlined in the paper is illustrated
through a running example for a graphical interface, a vocal interface, and
two multimodal interfaces with graphical and vocal predominances, respectively. Keywords: model-driven development, multimodal interaction, transformational approach,
user interface eXtensible markup language | |||
| A pattern mining method for interpretation of interaction | | BIBAK | Full-Text | 267-273 | |
| Tomoyuki Morita; Yasushi Hirano; Yasuyuki Sumi; Shoji Kajita; Kenji Mase | |||
| This paper proposes a novel mining method for multimodal interactions to
extract important patterns of group activities. These extracted patterns can be
used as machine-readable event indices in developing an interaction corpus
based on a huge collection of human interaction data captured by various
sensors. The event indices can be used, for example, to summarize a set of
events and to search for particular events because they contain various pieces
of context information. The proposed method extracts simultaneously occurring
patterns of primitive events in interaction, such as gaze and speech, that in
combination occur more consistently than randomly. The proposed method provides
a statistically plausible definition of interaction events that is not possible
through intuitive top-down definitions. We demonstrate the effectiveness of our
method for the data captured in an experimental setup of a poster-exhibition
scene. Several interesting patterns are extracted by the method, and we
examined their interpretations. Keywords: activity patterns, behavior mining, interaction corpus, multimodal
interaction patterns | |||
| A study of manual gesture-based selection for the PEMMI multimodal transport management interface | | BIBAK | Full-Text | 274-281 | |
| Fang Chen; Eric Choi; Julien Epps; Serge Lichman; Natalie Ruiz; Yu Shi; Ronnie Taib; Mike Wu | |||
| Operators of traffic control rooms are often required to quickly respond to
critical incidents using a complex array of multiple keyboards, mice, very
large screen monitors and other peripheral equipment. To support the aim of
finding more natural interfaces for this challenging application, this paper
presents PEMMI (Perceptually Effective Multimodal Interface), a transport
management system control prototype taking video-based manual gesture and
speech recognition as inputs. A specific theme within this research is
determining the optimum strategy for gesture input in terms of both
single-point input selection and suitable multimodal feedback for selection. It
has been found that users tend to prefer larger selection areas for targets in
gesture interfaces, and tend to select within 44% of this selection radius. The
minimum effective size for targets when using 'device-free' gesture interfaces
was found to be 80 pixels (on a 1280x1024 screen). This paper also shows that
feedback on gesture input via large screens is enhanced by the use of both
audio and visual cues to guide the user's multimodal input. Audio feedback in
particular was found to improve user response time by an average of 20% over
existing gesture selection strategies for multimodal tasks. Keywords: manual gesture, multimodal fusion, multimodal interaction, multimodal output
generation, speech | |||
| Recognition of sign language subwords based on boosted hidden Markov models | | BIBAK | Full-Text | 282-287 | |
| Liang-Guo Zhang; Xilin Chen; Chunli Wang; Yiqiang Chen; Wen Gao | |||
| Sign language recognition (SLR) plays an important role in human-computer
interaction (HCI), especially for convenient communication between the deaf and
hearing communities. How to enhance traditional hidden Markov model (HMM)
based SLR is an important issue in the SLR community, as is how to refine the
decision boundaries of the classifiers to effectively characterize the
spread of the training samples. In this paper,
a new classification framework applying an adaptive boosting (AdaBoost) strategy
to the continuous HMM (CHMM) training procedure at the subword classification
level for SLR is presented. The ensemble of multiple composite CHMMs for each
subword trained over the boosting iterations concentrates more on the
hard-to-classify samples and so generates a more complex decision boundary than
that of a single HMM classifier. Experimental results on a vocabulary of
frequently used Chinese sign language (CSL) subwords show that the proposed
boosted CHMM outperforms the conventional CHMM for SLR. Keywords: AdaBoost, HMM, human-computer interaction, sign language recognition | |||
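A simplified sketch of an AdaBoost-style loop around whole-sequence classifiers, with the per-class continuous-HMM training abstracted behind a `train(sequences, labels)` callback returning objects with `predict(seq)`; the weighted-resampling shortcut and hyperparameters are assumptions, not the authors' exact boosted-CHMM procedure.

```python
import math, random

def boost_sequence_classifiers(sequences, labels, train, rounds=5, rng=random.Random(0)):
    n = len(sequences)
    w = [1.0 / n] * n
    ensemble = []                                     # list of (alpha, classifier)
    for _ in range(rounds):
        idx = rng.choices(range(n), weights=w, k=n)   # weighted resampling of training sequences
        clf = train([sequences[i] for i in idx], [labels[i] for i in idx])
        miss = [clf.predict(sequences[i]) != labels[i] for i in range(n)]
        err = sum(wi for wi, m in zip(w, miss) if m)
        if err == 0 or err >= 0.5:                    # stop if perfect or no better than chance
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, clf))
        w = [wi * math.exp(alpha if m else -alpha) for wi, m in zip(w, miss)]
        total = sum(w)
        w = [wi / total for wi in w]                  # renormalise; hard samples gain weight
    return ensemble

def predict_ensemble(ensemble, seq):
    votes = {}
    for alpha, clf in ensemble:
        label = clf.predict(seq)
        votes[label] = votes.get(label, 0.0) + alpha  # weighted vote over boosting rounds
    return max(votes, key=votes.get)
```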
| Gesture-driven American sign language phraselator | | BIBAK | Full-Text | 288-292 | |
| Jose L. Hernandez-Rebollar | |||
| This paper describes a portable American Sign Language (ASL)-to-English
phraselator. This wearable device is based on an Acceleglove, originally
developed for recognizing the hand alphabet, and a two-link arm skeleton that
detects hand location and movement with respect to the body. The
phraselator is therefore able to recognize finger-spelled words as well as hand gestures
and translate them into speech through a speech synthesizer. To speed up
the recognition process, a simple prediction algorithm has been introduced so that
the phraselator predicts words from the current letter being entered, or
complete sentences from the current sign being translated. The user selects
the rest of the sentence (or word) by means of a predefined hand gesture, upon which
the phraselator speaks the sentence in English or Spanish. New words or
phrases are automatically added to the lexicon for future predictions. Keywords: ASL translation, gestural interfaces, gesture recognition | |||
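A minimal sketch of the prefix-prediction idea described above (not the device firmware): as fingerspelled letters arrive, the lexicon is filtered by the current prefix and the most frequent completions are offered; a dedicated hand gesture would accept one and send it to the synthesizer. The lexicon and frequencies are invented for illustration.

```python
# Toy lexicon with usage counts (made up for the example).
LEXICON = {"hello": 120, "help": 90, "here": 60, "how": 200, "house": 45}

def predict(prefix, lexicon=LEXICON, k=3):
    """Return the k most frequent lexicon words starting with the prefix."""
    candidates = [w for w in lexicon if w.startswith(prefix)]
    return sorted(candidates, key=lambda w: -lexicon[w])[:k]

typed = ""
for letter in "he":            # letters arriving from fingerspelling recognition
    typed += letter
    print(typed, "->", predict(typed))
# h  -> ['how', 'hello', 'help']
# he -> ['hello', 'help', 'here']
```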
| Interactive vision to detect target objects for helper robots | | BIBAK | Full-Text | 293-300 | |
| Altab Hossain; Rahmadi Kurnia; Akio Nakamura; Yoshinori Kuno | |||
| Effective human-robot interaction is essential for wide market penetration of
service robots. Such robots need vision systems to recognize
objects, yet it is difficult to realize vision systems that work reliably under
varied conditions, so more robust techniques for object recognition and image
segmentation are essential. We have therefore proposed to enlist the human user's
assistance for object recognition through speech. The robot asks a question that
the user can easily answer and whose answer efficiently reduces the
number of candidate objects, even if the scene contains occluded objects and/or objects
composed of multicolor parts. The method considers the characteristics of the
features used for object recognition, such as how easily humans can name
them, and thus generates a user-friendly and efficient sequence of
questions. Experimental results show that the robot can detect target objects
by asking the questions generated by the method. Keywords: human robot interaction, interactive object recognition, multimodal
interface, object recognition, segmentation | |||
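One hedged reading of the question-generation step: among attributes that are easy for a person to name (e.g., colour, shape), ask about the one whose answer is expected to prune the candidate set the most. The candidate descriptions and attribute set below are assumptions, not the authors' feature model.

```python
from collections import Counter

def best_question(candidates, attributes):
    """Pick the attribute whose answer minimises the expected number of
    remaining candidate objects."""
    def expected_remaining(attr):
        counts = Counter(obj[attr] for obj in candidates)
        n = len(candidates)
        return sum(c * (c / n) for c in counts.values())
    return min(attributes, key=expected_remaining)

candidates = [
    {"name": "cup1", "colour": "red",  "shape": "cylinder"},
    {"name": "cup2", "colour": "blue", "shape": "cylinder"},
    {"name": "box1", "colour": "red",  "shape": "box"},
    {"name": "ball1", "colour": "red", "shape": "sphere"},
]
attr = best_question(candidates, ["colour", "shape"])
print(f"Robot asks: what is the {attr} of the object you want?")   # -> shape
```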
| The contrastive evaluation of unimodal and multimodal interfaces for voice output communication aids | | BIBAK | Full-Text | 301-308 | |
| Melanie Baljko | |||
| For computational Augmentative and Alternative Communication (AAC) aids, it
has often been asserted that multimodal interfaces have benefits over unimodal
ones. Several such benefits have been described informally, but, to date, few
have actually been formalized or quantified. In this paper, some of the special
considerations of this application domain are described. Next, the hypothesized
benefits of semantically nonredundant multimodal input actions over unimodal
input actions are described formally. The notion of information rate, already
well established as a dependent variable in evaluations of AAC devices, is
quantified in this paper, using the formalisms provided by Information Theory
(as opposed to other, idiosyncratic approaches that have been employed
previously). A comparative analysis was performed between interfaces that
afford unimodal input actions and those that afford semantically nonredundant
multimodal input actions. This analysis permitted generalized conclusions,
which have been synthesized with those of another, recently completed analysis
in which unimodal and semantically redundant multimodal input actions were
compared. A reinterpretation of Keates and Robinson's empirical data (1998)
shows that their criticism of multimodal interfaces for AAC devices was, in
part, unfounded. Keywords: augmentative and alternative communication (AAC), interventions for
communication disorders, multimodal interfaces, speech generating devices
(SGD), voice output communication aids (VOCA) | |||
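For reference, a generic Information-Theoretic formalisation of information rate (a textbook form, not necessarily the exact definition used in the paper): with selection probabilities p_i over the input actions and a mean selection time per action of t-bar,

```latex
% Textbook form of information rate; not necessarily the paper's exact definition.
H(X) = -\sum_{i} p_i \log_2 p_i \quad \text{[bits per input action]},
\qquad
R = \frac{H(X)}{\bar{t}} \quad \text{[bits per second]}.
```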
| Agent-based architecture for implementing multimodal learning environments for visually impaired children | | BIBAK | Full-Text | 309-316 | |
| Rami Saarinen; Janne Järvi; Roope Raisamo; Jouni Salo | |||
| Visually impaired children are at a great disadvantage in modern society
because their ability to use modern computer technology is limited by
inappropriate user interfaces. The aim of the work presented in this paper was
to develop a multimodal software architecture and applications that support
visually impaired children and enable them to interact on equal terms with sighted
children in learning situations. The architecture is based on software agents
and has specific support for visual, auditory and haptic interaction. It has
been used successfully with different groups of 7-8-year-old and 12-year-old
visually impaired children. In this paper we focus on the enabling software
technology and the interaction techniques aimed at realizing this goal. Keywords: auditory feedback, haptics, inclusion, multimodal software architectures,
navigation, teaching programs, visually impaired children | |||
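A very small sketch of one way an agent-based multimodal architecture can be wired (illustrative only; not the paper's agent framework): independent output agents subscribe to events on a shared bus, so a single focus event can be rendered as speech and haptics at once. Topic names and payload fields are assumptions.

```python
from collections import defaultdict

class Bus:
    """Tiny publish/subscribe event bus connecting modality agents."""
    def __init__(self):
        self._subs = defaultdict(list)
    def subscribe(self, topic, agent):
        self._subs[topic].append(agent)
    def publish(self, topic, payload):
        for agent in self._subs[topic]:
            agent.handle(topic, payload)

class SpeechAgent:
    def handle(self, topic, payload):
        print(f"[speech] reading aloud: {payload['label']}")

class HapticAgent:
    def handle(self, topic, payload):
        print(f"[haptics] bump of strength {payload['importance']}")

bus = Bus()
bus.subscribe("item_focused", SpeechAgent())
bus.subscribe("item_focused", HapticAgent())
bus.publish("item_focused", {"label": "exit door", "importance": 2})
```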
| Perceiving ordinal data haptically under workload | | BIBAK | Full-Text | 317-324 | |
| Anthony Tang; Peter McLachlan; Karen Lowe; Chalapati Rao Saka; Karon MacLean | |||
| Visual information overload is a threat to the interpretation of displays
presenting large data sets or complex application environments. To combat this
problem, researchers have begun to explore how haptic feedback can be used as
another means for information transmission. In this paper, we show that people
can perceive and accurately process haptically rendered ordinal data while
under cognitive workload. We evaluated three haptic models for rendering ordinal
data with participants who were concurrently performing a taxing visual tracking task. The
evaluation demonstrates that information rendered by these models is
perceptually available even when users are visually busy. This preliminary
research has promising implications for haptic augmentation of visual displays
for information visualization. Keywords: 1-DOF, graspable user interface, haptic perception, haptics, information
visualization, multimodal displays, tangible user interface | |||
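As a hedged illustration (not one of the three haptic models evaluated in the paper), an ordinal level can be rendered eyes-free on a 1-DOF display by mapping it to a count of short force pulses; the actuator call here is a stub.

```python
import time

def render_ordinal(level: int, pulse=lambda: print("pulse"), gap_s: float = 0.15):
    """Emit `level` discrete pulses, e.g. priority 3 -> three bumps."""
    for _ in range(level):
        pulse()            # replace the stub with the real actuator call
        time.sleep(gap_s)

render_ordinal(3)
```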
| Virtual tangible widgets: seamless universal interaction with personal sensing devices | | BIBAK | Full-Text | 325-332 | |
| Eiji Tokunaga; Hiroaki Kimura; Nobuyuki Kobayashi; Tatsuo Nakajima | |||
| Using a single personal device as a universal controller for diverse
services is a promising approach to solving the problem of too many controllers
in ubiquitous multimodal environments. However, current approaches to
universal controllers cannot provide intuitive control because they are
restricted to traditional mobile user interfaces such as small keys or small
touch panels. We propose Vidgets, short for virtual tangible widgets,
an approach to selecting and controlling ubiquitous services with virtually
implemented tangible user interfaces based on a single sensing personal device
equipped with a digital camera and several physical sensors. We classify the
use of the universal controller into three stages: (a) searching for a
service, (b) grasping the service and (c) using the service. User studies with
our prototype implementation indicate that the smooth transition and
integration of the three stages improve the overall interaction with our
universal controller. Keywords: handheld augmented reality, interaction techniques, personal mobile devices,
ubiquitous computing, universal controllers, virtual tangible user interface | |||
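A hedged sketch of the three-stage interaction as a small state machine, based only on the abstract above rather than the Vidgets implementation: camera events move the controller from searching to grasping to using a service, and sensor events are interpreted as control input only once a service is in use. Event and service names are invented for the example.

```python
class UniversalController:
    def __init__(self):
        self.state, self.service = "searching", None

    def on_marker_recognized(self, service):       # (a) search -> (b) grasp
        if self.state == "searching":
            self.state, self.service = "grasping", service

    def on_confirm_gesture(self):                  # (b) grasp -> (c) use
        if self.state == "grasping":
            self.state = "using"

    def on_sensor_event(self, event):              # (c) control the grasped service
        if self.state == "using":
            print(f"send '{event}' to {self.service}")

c = UniversalController()
c.on_marker_recognized("livingRoomLight")
c.on_confirm_gesture()
c.on_sensor_event("tilt:+10")    # -> send 'tilt:+10' to livingRoomLight
```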