| Experimental evaluation of vision and speech based multimodal interfaces | | BIBA | Full-Text | 1 | |
| Emilio Schapira; Rajeev Sharma | |||
| Progress in computer vision and speech recognition technologies has recently
enabled multimodal interfaces that use speech and gestures. These technologies
offer promising alternatives to existing interfaces because they emulate the
natural way in which humans communicate. However, no systematic work has been
reported that formally evaluates the new speech/gesture interfaces. This paper
is concerned with formal experimental evaluation of new human-computer
interactions enabled by speech and hand gestures.
| The paper describes an experiment conducted with 23 subjects that evaluates selection strategies for interaction with large screen displays. The multimodal interface designed for this experiment does not require the user to be in physical contact with any device. Video cameras and long-range microphones are used as input for the system. Three selection strategies are evaluated, and results for different target sizes and positions are reported in terms of accuracy, selection times and user preference. Design implications for vision/speech-based interfaces are inferred from these results. This study also raises new questions and topics for future research. Note: 9 pages | |||
| Human-robot interface based on the mutual assistance between speech and vision | | BIBA | Full-Text | 2 | |
| Mitsutoshi Yoshizaki; Yoshinori Kuno; Akio Nakamura | |||
| This paper presents a user interface for a service robot that can bring the
objects requested by the user. A speech-based interface is appropriate for this
application. However, it alone is not sufficient. The system needs a
vision-based interface to recognize gestures as well. Moreover, it needs vision
capabilities to obtain real-world information about the objects mentioned in
the user's speech. For example, the robot needs to find the target object
ordered by speech in order to carry out the task. This can be seen as vision
assisting speech. However, vision sometimes fails to detect the objects, and
there are objects for which vision cannot be expected to work well. In these
cases, the robot reports its current status to the user so that he/she can
give advice to the robot by speech. This can be seen as speech assisting vision
through the user. This paper presents how this mutual assistance between speech
and vision works and demonstrates promising results through experiments. Note: 4 pages | |||
| A visual modality for the augmentation of paper | | BIBA | Full-Text | 3 | |
| David R. McGee; Misha Pavel; Adriana Adami; Guoping Wang; Philip R. Cohen | |||
| In this paper we describe how we have enhanced our multimodal paper-based
system, Rasa, with visual perceptual input. We briefly explain how Rasa
improves upon current decision-support tools by augmenting, rather than
replacing, the paper-based tools that people in command and control centers
have come to rely upon. We note shortcomings in our initial approach, discuss
how we have added computer-vision as another input modality in our multimodal
fusion system, and characterize the advantages that it has to offer. We
conclude by discussing our current limitations and the work we intend to pursue
to overcome them in the future. Note: 7 pages | |||
| Signal level fusion for multimodal perceptual user interface | | BIBA | Full-Text | 4 | |
| John W. Fisher; Trevor Darrell | |||
| Multi-modal fusion is an important, yet challenging task for perceptual user
interfaces. Humans routinely perform complex and simple tasks in which
ambiguous auditory and visual data are combined in order to support accurate
perception. By contrast, automated approaches for processing multi-modal data
sources lag far behind. This is primarily due to the fact that few methods
adequately model the complexity of the audio/visual relationship. We present an
information-theoretic approach for fusion of multiple modalities. Furthermore,
we discuss a statistical model under which our approach to fusion is justified.
We present empirical results demonstrating audio-video localization and
consistency measurement. We show examples determining where a speaker is within
a scene, and whether they are producing the specified audio stream. Note: 7 pages | |||
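A minimal sketch of the underlying idea, under a joint-Gaussian assumption (the paper's actual estimator is more general): mutual information between per-pixel intensity change and audio energy highlights the image region most consistent with the audio stream. The array shapes and the Gaussian MI formula are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def audio_video_mi_map(frames, audio_energy):
    """Per-pixel mutual information between pixel-intensity change and audio energy.

    frames: (T, H, W) grayscale video; audio_energy: (T,) per-frame audio energy.
    Under a joint-Gaussian assumption, I(X;Y) = -0.5 * log(1 - rho^2),
    where rho is the correlation coefficient between the two signals.
    """
    diffs = np.abs(np.diff(frames.astype(float), axis=0))   # (T-1, H, W) motion proxy
    a = audio_energy[1:] - audio_energy[1:].mean()           # zero-mean audio energy
    v = diffs - diffs.mean(axis=0)                           # zero-mean pixel changes
    cov = (v * a[:, None, None]).mean(axis=0)
    rho = cov / (v.std(axis=0) * a.std() + 1e-9)
    return -0.5 * np.log(1.0 - np.clip(rho ** 2, 0.0, 0.999))   # (H, W) MI map

# The speaker region is taken to be the pixels with the highest MI values, and
# the mean MI over that region can serve as an audio-video consistency score.
```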
| Perceptive user interfaces workshop | | BIB | Full-Text | 5 | |
| Dylan Schmorrow; Jim Patrey | |||
| Sketch based interfaces: early processing for sketch understanding | | BIBA | Full-Text | 6 | |
| Tevfik Metin Sezgin; Thomas Stahovich; Randall Davis | |||
| Freehand sketching is a natural and crucial part of everyday human
interaction, yet is almost totally unsupported by current user interfaces. We
are working to combine the flexibility and ease of use of paper and pencil with
the processing power of a computer, to produce a user interface for design that
feels as natural as paper, yet is considerably smarter. One of the most basic
steps in accomplishing this is converting the original digitized pen strokes in
a sketch into the intended geometric objects. In this paper we describe an
implemented system that combines multiple sources of knowledge to provide
robust early processing for freehand sketching. Note: 8 pages | |||
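As one illustration of the kind of early processing involved (a heuristic stand-in, not the authors' implemented method), candidate segment boundaries in a pen stroke can be located where pen speed drops sharply, since drawers tend to slow down at corners:

```python
import numpy as np

def corner_candidates(x, y, t, speed_frac=0.25):
    """Return indices of likely segment boundaries in a digitized pen stroke.

    x, y, t: 1-D arrays of pen samples (position and timestamp).
    Points where pen speed falls well below its mean are treated as corner
    candidates; the 0.25 fraction is an illustrative threshold.
    """
    dx, dy = np.diff(x), np.diff(y)
    dt = np.diff(t) + 1e-9
    speed = np.hypot(dx, dy) / dt
    slow = set(np.where(speed < speed_frac * speed.mean())[0].tolist())
    # collapse runs of consecutive slow samples into single candidate corners
    corners = sorted(i for i in slow if i > 0 and i - 1 not in slow)
    return [0] + corners + [len(x) - 1]
```

Each resulting sub-stroke could then be fit with a geometric primitive (line, arc) and the fit residual used to accept or further split it.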
| Speech driven facial animation | | BIBA | Full-Text | 7 | |
| P. Kakumanu; R. Gutierrez-Osuna; A. Esposito; R. Bryll; A. Goshtasby; O. N. Garcia | |||
| The results reported in this article are an integral part of a larger
project aimed at achieving perceptually realistic animations, including the
individualized nuances, of three-dimensional human faces driven by speech. The
audiovisual system that has been developed for learning the spatio-temporal
relationship between speech acoustics and facial animation is described,
including video and speech processing, pattern analysis, and MPEG-4 compliant
facial animation for a given speaker. In particular, we propose a perceptual
transformation of the speech spectral envelope, which is shown to capture the
dynamics of articulatory movements. An efficient nearest-neighbor algorithm is
used to predict novel articulatory trajectories from the speech dynamics. The
results are very promising and suggest a new way to approach the modeling of
synthetic lip motion of a given speaker driven by his/her speech. This would
also provide clues toward a more general cross-speaker realistic animation. Note: 5 pages | |||
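A minimal sketch of the nearest-neighbor prediction step, assuming hypothetical feature files and using scikit-learn; the perceptual transformation of the spectral envelope itself is not shown.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical training data: acoustic feature frames (e.g., a perceptually
# transformed spectral envelope) paired with facial animation parameters
# (e.g., MPEG-4 FAPs) recorded audiovisually for one speaker.
X_train = np.load("speech_features.npy")    # (N, d_audio)  -- assumed file
Y_train = np.load("face_params.npy")        # (N, d_face)   -- assumed file

knn = KNeighborsRegressor(n_neighbors=1)    # nearest-neighbour lookup
knn.fit(X_train, Y_train)

X_new = np.load("new_speech_features.npy")  # frames from novel speech
predicted_trajectory = knn.predict(X_new)   # (T, d_face) articulatory trajectory
```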
| An experimental multilingual speech translation system | | BIBA | Full-Text | 8 | |
| Kenji Matsui; Yumi Wakita; Tomohiro Konuma; Kenji Mizutani; Mitsuru Endo; Masashi Murata | |||
| In this paper, we describe an experimental speech translation system
utilizing small, PC-based hardware with a multi-modal user interface. Two major
problems for people using an automatic speech translation device are speech
recognition errors and language translation errors. In this paper we focus on
developing techniques to overcome these problems. The techniques include a new
language translation approach based on example sentences, simplified expression
rules, and a multi-modal user interface which shows possible speech recognition
candidates retrieved from the example sentences. Combination of the proposed
techniques can provide accurate language translation performance even if the
speech recognition result contains some errors. We propose using keyword
classes, based on the dependencies between keywords, to detect misrecognized
keywords and to search the example expressions. Then, the
suitable example expression is chosen using a touch panel or by pushing
buttons. The language translation picks up the expression in the other
language, which should always be grammatically correct. Simplified translated
expressions are realized by speech-act based simplifying rules so that the
system can avoid various redundant expressions. A simple comparison study
showed that the proposed method produces output almost 2 to 10 times faster
than a conventional translation device. Note: 4 pages | |||
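A toy sketch of example-sentence retrieval by keyword classes; the classes, example sentences, and scoring rule below are invented stand-ins for the paper's dependency-based matching.

```python
# Hypothetical keyword classes and example sentences for illustration only.
KEYWORD_CLASSES = {
    "station": "PLACE", "airport": "PLACE",
    "ticket": "OBJECT", "coffee": "OBJECT",
    "buy": "ACTION", "go": "ACTION",
}

EXAMPLES = [
    ("I want to buy a ticket", {"ACTION", "OBJECT"}),
    ("How do I go to the station", {"ACTION", "PLACE"}),
]

def rank_examples(recognized_words):
    """Rank example sentences by overlap between the keyword classes found in
    the recognition result and the classes each example expects; misrecognized
    words simply fail to contribute, which tolerates some recognition errors."""
    classes = {KEYWORD_CLASSES[w] for w in recognized_words if w in KEYWORD_CLASSES}
    scored = [(len(classes & expected), sent) for sent, expected in EXAMPLES]
    return [sent for _, sent in sorted(scored, reverse=True)]

# The top-ranked candidates would be shown on the touch panel for the user to pick.
print(rank_examples(["buy", "coffee", "umm"]))
```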
| A multimodal presentation planner for a home entertainment environment | | BIBA | Full-Text | 9 | |
| Christian Elting; Georg Michelitsch | |||
| In this paper we outline the design and the implementation of the multimodal
presentation planner PMO, which is part of the EMBASSI intelligent user
interface for home entertainment devices. We provide details about the concepts
we use to produce cohesive and coherent output as well as illustrate the
software architecture of the PMO. We compare our approach with the state of the
art in presentation planning and conclude with an illustration of our future
work. Note: 5 pages | |||
| Physiological data feedback for application in distance education | | BIBA | Full-Text | 10 | |
| Martha E. Crosby; Brent Auernheimer; Christoph Aschwanden; Curtis Ikehara | |||
| This paper describes initial experiments collecting physiological data from
subjects performing computer tasks. A prototype realtime Emotion Mouse
collected skin temperature, galvanic skin response (GSR), and heartbeat data.
Possible applications to distance education and a second-generation system are
discussed. Note: 5 pages | |||
| The Bayes Point Machine for computer-user frustration detection via PressureMouse | | BIBA | Full-Text | 11 | |
| Yuan Qi; Carson Reynolds; Rosalind W. Picard | |||
| We mount eight pressure sensors on a computer mouse and collect mouse
pressure signals from subjects who fill out web forms containing usability
bugs. This approach is based on a hypothesis that subjects tend to apply excess
pressure to the mouse after encountering frustrating events. We then train a
Bayes Point Machine in an attempt to classify two regions of each user's
behavior: mouse pressure where the form-filling process is proceeding smoothly,
and mouse pressure following a usability bug. Unlike currently popular
classifiers such as the Support Vector Machine, the Bayes Point Machine is a
newer classification technique rooted in Bayesian theory. Trained with a new
efficient Bayesian approximation algorithm, Expectation Propagation, the Bayes
Point Machine achieves a person-dependent classification accuracy rate of 88%,
which outperforms the Support Vector Machine in our experiments. The resulting
system can be used for many applications in human-computer interaction
including adaptive interface design. Note: 5 pages | |||
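The paper trains the Bayes Point Machine with Expectation Propagation; the sketch below instead uses a much cruder stand-in for the Bayes-point idea, averaging the weight vectors of many linear classifiers trained on bootstrap resamples, only to make the "center of version space" intuition concrete.

```python
import numpy as np
from sklearn.linear_model import Perceptron

def approximate_bayes_point(X, y, n_models=50, seed=0):
    """Crude Bayes-point approximation (not Expectation Propagation): average
    the normalized weight vectors of many linear classifiers trained on
    bootstrap resamples.
    X: (N, 8) windows of mouse-pressure features; y: 0 = smooth, 1 = post-bug."""
    rng = np.random.default_rng(seed)
    ws, bs = [], []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), len(X))
        while len(np.unique(y[idx])) < 2:          # ensure both classes present
            idx = rng.integers(0, len(X), len(X))
        clf = Perceptron().fit(X[idx], y[idx])
        w = clf.coef_.ravel()
        norm = np.linalg.norm(w) + 1e-9
        ws.append(w / norm)
        bs.append(clf.intercept_[0] / norm)
    return np.mean(ws, axis=0), np.mean(bs)

def predict(w, b, X):
    """Label new pressure windows with the averaged linear decision rule."""
    return (X @ w + b > 0).astype(int)
```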
| Using eye movements to determine referents in a spoken dialogue system | | BIBA | Full-Text | 12 | |
| Ellen Campana; Jason Baldridge; John Dowding; Beth Ann Hockey; Roger W. Remington; Leland S. Stone | |||
| Most computational spoken dialogue systems take a "literary" approach to
reference resolution. With this type of approach, entities that are mentioned
by a human interactor are unified with elements in the world state based on the
same principles that guide the process during text interpretation. In
human-to-human interaction, however, referring is a much more collaborative
process. Participants often under-specify their referents, relying on their
discourse partners for feedback if more information is needed to uniquely
identify a particular referent. By monitoring eye-movements during this
interaction, it is possible to improve the performance of a spoken dialogue
system on referring expressions that are underspecified according to the
literary model. This paper describes a system currently under development that
employs such a strategy. Note: 5 pages | |||
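A minimal sketch of how gaze could disambiguate an underspecified referring expression, assuming fixation events already tagged with the object being looked at; the data structures are hypothetical, not the system's.

```python
import time

def resolve_referent(candidates, fixations, window=2.0):
    """Pick a referent for an underspecified expression (e.g., "that one").

    candidates: object ids that match the spoken description.
    fixations: list of (timestamp, object_id) gaze fixations.
    Returns the candidate fixated most often within the last `window` seconds,
    a simple stand-in for gaze-informed reference resolution.
    """
    now = time.time()
    counts = {c: 0 for c in candidates}
    for ts, obj in fixations:
        if now - ts <= window and obj in counts:
            counts[obj] += 1
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None   # None: ask the user for more detail
```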
| An automatic sign recognition and translation system | | BIBA | Full-Text | 13 | |
| Jie Yang; Jiang Gao; Ying Zhang; Xilin Chen; Alex Waibel | |||
| A sign is something that suggests the presence of a fact, condition, or
quality. Signs are everywhere in our lives. They make our lives easier when we
are familiar with them. But sometimes they pose problems. For example, a
tourist might not be able to understand signs in a foreign country. This paper
discusses problems of automatic sign recognition and translation. We present a
system capable of capturing images, detecting and recognizing signs, and
translating them into a target language. We describe methods for automatic sign
extraction and translation. We use a user-centered approach in system
development that takes advantage of human intelligence when needed and
leverages human capabilities. We are currently working on Chinese sign
translation. We have developed a prototype system that can recognize Chinese
signs captured by a video camera, a common gadget for tourists, and translate
them into English text or a voice stream. The sign translation, in
conjunction with spoken language translation, can help international tourists
to overcome language barriers. The technology can also help a visually
handicapped person to increase environmental awareness. Note: 8 pages | |||
| Multimodal optimizations: can legacy systems defeat them? | | BIBA | Full-Text | 14 | |
| John Harper; Donal Sweeney | |||
| This paper describes several results obtained during the implementation and
evaluation of a speech complemented interface to a vehicle monitoring system. A
speech complemented interface is one wherein the operations at the interface
(keyboard and mouse, for instance) are complemented by operator speech not
directly processed by the computer. From an interface perspective, such systems
have 'low-brow' multimodal characteristics. Typical domains include vehicle
tracking applications (taxis, buses, freight) where operators frequently use
speech to confirm displayed vehicle properties with a driver. Note: 8 pages | |||
| Using multimodal interaction to navigate in arbitrary virtual VRML worlds | | BIBA | Full-Text | 15 | |
| Frank Althoff; Gregor McGlaun; Björn Schuller; Peter Morguet; Manfred Lang | |||
| In this paper we present a multimodal interface for navigating in arbitrary
virtual VRML worlds. Conventional haptic devices like keyboard, mouse, joystick
and touchscreen can freely be combined with special Virtual-Reality hardware
like spacemouse, data glove and position tracker. As a key feature, the system
additionally provides intuitive input by command and natural speech utterances
as well as dynamic head and hand gestures. The communication of the interface
components is based on the abstract formalism of a context-free grammar,
allowing the representation of device-independent information. Taking into
account the current system context, user interactions are combined in a
semantic unification process and mapped on a model of the viewer's
functionality vocabulary. To integrate the continuous multimodal information
stream we use a straight-forward rule-based approach and a new technique based
on evolutionary algorithms. Our navigation interface has been extensively
evaluated in usability studies, with excellent results. Note: 8 pages | |||
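A toy sketch of the rule-based side of such fusion, pairing a speech command with the most recent gesture inside a small time window; the event formats and the 0.5 s window are assumptions, and the paper's grammar-based representation and evolutionary integration are not shown.

```python
import time

def fuse(speech_event, gesture_events, max_gap=0.5):
    """Merge a speech command with a temporally close gesture into one action.

    speech_event: (timestamp, command); gesture_events: list of (timestamp, direction).
    """
    t_speech, command = speech_event
    recent = [g for g in gesture_events if abs(g[0] - t_speech) <= max_gap]
    if command == "move" and recent:
        _, direction = max(recent, key=lambda g: g[0])   # latest gesture wins
        return {"action": "move", "direction": direction}
    if command in ("stop", "reset view"):
        return {"action": command}                        # speech alone suffices
    return None                                           # defer: wait for more input

print(fuse((time.time(), "move"), [(time.time() - 0.2, "left")]))
```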
| Towards reliable multimodal sensing in aware environments | | BIBA | Full-Text | 16 | |
| Scott Stillman; Irfan Essa | |||
| A prototype system for implementing a reliable sensor network for large
scale smart environments is presented. Most applications within any form of
smart environments (rooms, offices, homes, etc.) are dependent on reliable who,
where, when, and what information about their inhabitants (users). This
information can be inferred from different sensors spread throughout the space.
However, isolated sensing technologies provide limited information under the
varying, dynamic, and long-term (24/7) scenarios that are inherent in
applications for
intelligent environments. In this paper, we present a prototype system that
provides an infrastructure for leveraging the strengths of different sensors
and processes used for the interpretation of their collective data. We describe
the needs of such systems, propose an architecture to deal with such
multi-modal fusion, and discuss the initial set of sensors and processes used
to address such needs. Note: 6 pages | |||
| Visually prototyping perceptual user interfaces through multimodal storyboarding | | BIBA | Full-Text | 17 | |
| Anoop K. Sinha; James A. Landay | |||
| We are applying our knowledge in designing informal prototyping tools for
user interface design to create an interactive visual prototyping tool for
perceptual user interfaces. Our tool allows a designer to quickly map out
certain types of multimodal, cross-device user interface scenarios. These
sketched designs form a multimodal storyboard that can then be executed,
quickly testing the interaction and collecting feedback about refinements
necessary for the design. By relying on visual prototyping, our multimodal
storyboarding tool simplifies and speeds perceptual user interface prototyping
and opens up the challenging space of perceptual user interface design to
non-programmers. Note: 4 pages | |||
| Naturally conveyed explanations of device behavior | | BIBA | Full-Text | 18 | |
| Michael Oltmans; Randall Davis | |||
| Designers routinely explain their designs to one another using sketches and
verbal descriptions of behavior, both of which can be understood long before
the device has been fully specified. But current design tools fail almost
completely to support this sort of interaction, instead not only forcing
designers to specify details of the design, but typically requiring that they
do so by navigating a forest of menus and dialog boxes, rather than directly
describing the behaviors with sketches and verbal explanations. We have created
a prototype system, called ASSISTANCE, capable of interpreting multimodal
explanations for simple 2-D kinematic devices. The program generates a model of
the events and the causal relationships between events that have been described
via hand drawn sketches, sketched annotations, and verbal descriptions. Our
goal is to make the designer's interaction with the computer more like
interacting with another designer. This requires the ability not only to
understand physical devices but also to understand the means by which the
explanations of these devices are conveyed. Note: 8 pages | |||
| Audio-video array source separation for perceptual user interfaces | | BIBA | Full-Text | 19 | |
| Kevin Wilson; Neal Checka; David Demirdjian; Trevor Darrell | |||
| Steerable microphone arrays provide a flexible infrastructure for audio
source separation. In order for them to be used effectively in perceptual user
interfaces, there must be a mechanism in place for steering the focus of the
array to the sound source. Audio-only steering techniques often perform poorly
in the presence of multiple sound sources or strong reverberation. Video-only
techniques can achieve high spatial precision but require that the audio and
video subsystems be accurately calibrated to preserve this precision. We
present an audio-video localization technique that combines the benefits of the
two modalities. We implement our technique in a test environment containing
multiple stereo cameras and a room-sized microphone array. Our technique
achieves an 8.9 dB improvement over a single far-field microphone and a 6.7 dB
improvement over source separation based on video-only localization. Note: 7 pages | |||
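A minimal delay-and-sum sketch of steering an array toward a visually estimated source position, using integer-sample delays for simplicity; a real implementation would use fractional delays or frequency-domain steering.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, source_pos, fs, c=343.0):
    """Steer a microphone array toward a (visually estimated) 3-D source position.

    signals: (M, N) microphone signals, mic_positions: (M, 3) in metres,
    source_pos: (3,), fs: sample rate in Hz, c: speed of sound (m/s).
    """
    dists = np.linalg.norm(mic_positions - source_pos, axis=1)   # metres to each mic
    delays = (dists - dists.min()) / c                            # relative delays (s)
    shifts = np.round(delays * fs).astype(int)
    # advance the later-arriving channels so all copies of the source align
    # (np.roll wrap-around at the ends is ignored in this sketch)
    aligned = np.stack([np.roll(sig, -shift) for sig, shift in zip(signals, shifts)])
    return aligned.mean(axis=0)                                   # enhanced source signal
```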
| Estimating focus of attention based on gaze and sound | | BIBA | Full-Text | 20 | |
| Rainer Stiefelhagen; Jie Yang; Alex Waibel | |||
| Estimating a person's focus of attention is useful for various
human-computer interaction applications, such as smart meeting rooms, where a
user's goals and intent have to be monitored. In the work presented here, we are
interested in modeling focus of attention in a meeting situation. We have
developed a system capable of estimating participants' focus of attention from
multiple cues. We employ an omnidirectional camera to simultaneously track
participants' faces around a meeting table and use neural networks to estimate
their head poses. In addition, we use microphones to detect who is speaking.
The system predicts participants' focus of attention from acoustic and visual
information separately, and then combines the output of the audio- and
video-based focus of attention predictors. We have evaluated the system using
the data from three recorded meetings. The acoustic information provided an 8%
error reduction on average compared to using a single modality. Note: 9 pages | |||
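A minimal sketch of the final combination step, linearly weighting per-participant focus probabilities from the video-based and audio-based predictors; the 0.7 weight and the dictionary format are assumptions, not the paper's values.

```python
def combine_focus_estimates(p_video, p_audio, w_video=0.7):
    """Combine focus-of-attention probabilities from head pose (video) and
    speaker activity (audio) with a simple linear weighting.

    p_video, p_audio: dicts mapping participant -> probability of being attended.
    """
    targets = set(p_video) | set(p_audio)
    combined = {t: w_video * p_video.get(t, 0.0) + (1 - w_video) * p_audio.get(t, 0.0)
                for t in targets}
    z = sum(combined.values()) or 1.0
    return {t: p / z for t, p in combined.items()}   # renormalized distribution

print(combine_focus_estimates({"A": 0.6, "B": 0.4}, {"A": 0.2, "B": 0.8}))
```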
| A pneumatic tactile alerting system for the driving environment | | BIBA | Full-Text | 21 | |
| Mario Enriquez; Oleg Afonin; Brent Yager; Karon Maclean | |||
| Sensory overloaded environments present an opportunity for innovative design
in the area of Human-Machine Interaction. In this paper we study the usefulness
of a tactile display in the automobile environment. Our approach uses a simple
pneumatic pump to produce pulsations of varying frequencies on the driver's
hands through a car steering wheel fitted with inflatable pads. The goal of the
project is to evaluate the effectiveness of such a system in alerting the
driver to a possible problem when it is used to augment the visual display
presently used in automobiles. A steering wheel that provides haptic feedback
using pneumatic pockets was developed to test our hypothesis. The steering
wheel can pulsate at different frequencies. The system was tested in a simple
multitasking paradigm on several subjects and their reaction times to different
stimuli were measured and analyzed. For these experiments, we found that using
a tactile feedback device lowers reaction time significantly and that
modulating frequency of vibration provides extra information that can reduce
the time necessary to identify a problem. Note: 7 pages | |||
| A robust algorithm for reading detection | | BIBA | Full-Text | 22 | |
| Christopher S. Campbell; Paul P. Maglio | |||
| As video cameras become cheaper and more pervasive, there is now increased
opportunity for user interfaces to take advantage of user gaze data. Eye
movements provide a powerful source of information that can be used to
determine user intentions and interests. In this paper, we develop and test a
method for recognizing when users are reading text based solely on eye-movement
data. The experimental results show that our reading detection method is robust
to noise, individual differences, and variations in text difficulty. Compared
to a simple detection algorithm, our algorithm reliably, quickly, and
accurately recognizes and tracks reading. Thus, we provide a means to capture
normal user activity, enabling interfaces that incorporate more natural
interactions of human and computer. Note: 7 pages | |||
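A heuristic sketch of eye-movement-based reading detection (an illustration, not the authors' algorithm): reading produces many short rightward saccades along a line of text plus occasional long leftward return sweeps.

```python
def reading_score(fixations, forward_max=120, sweep_min=-300):
    """Accumulate evidence that a sequence of fixations is reading.

    fixations: list of (x, y) fixation centres in screen pixels, in time order.
    The pixel thresholds are illustrative and would depend on text layout.
    """
    score = 0
    for (x0, y0), (x1, y1) in zip(fixations, fixations[1:]):
        dx = x1 - x0
        if 0 < dx <= forward_max and abs(y1 - y0) < 30:
            score += 1          # short forward saccade along a line of text
        elif dx <= sweep_min:
            score += 2          # return sweep to the start of the next line
        else:
            score -= 1          # off-text or scanning movement
    return score                # e.g., treat score > threshold as "reading"
```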
| A perceptual user interface for recognizing head gesture acknowledgements | | BIBA | Full-Text | 23 | |
| James W. Davis; Serge Vaks | |||
| We present the design and implementation of a perceptual user interface for
a responsive dialog-box agent that employs real-time computer vision to
recognize user acknowledgements from head gestures (e.g., nod = yes). IBM
Pupil-Cam technology together with anthropometric head and face measures are
used to first detect the location of the user's face. Salient facial features
are then identified and tracked to compute the global 2-D motion direction of
the head. For recognition, timings of natural gesture motion are incorporated
into a state-space model. The interface is presented in the context of an
enhanced text editor employing a perceptual dialog-box agent. Note: 7 pages | |||
| Perception and haptics: towards more accessible computers for motion-impaired users | | BIBA | Full-Text | 24 | |
| Faustina Hwang; Simeon Keates; Patrick Langdon; P. John Clarkson; Peter Robinson | |||
| For people with motion impairments, access to and independent control of a
computer can be essential. Symptoms such as tremor and spasm, however, can make
the typical keyboard and mouse arrangement for computer interaction difficult
or even impossible to use. This paper describes three approaches to improving
computer input effectiveness for people with motion impairments. The three
approaches are: (1) to increase the number of interaction channels, (2) to
enhance commonly existing interaction channels, and (3) to make more effective
use of all the available information in an existing input channel. Experiments
in multimodal input, haptic feedback, user modelling, and cursor control are
discussed in the context of the three approaches. A haptically enhanced
keyboard emulator with perceptive capability is proposed, combining approaches
in a way that improves computer access for motion-impaired users. Note: 9 pages | |||
| A real-time head nod and shake detector | | BIBA | Full-Text | 25 | |
| Ashish Kapoor; Rosalind W. Picard | |||
| Head nods and head shakes are non-verbal gestures often used to communicate
intent and emotion and to perform conversational functions. We describe a
vision-based system that detects head nods and head shakes in real time and can
act as a useful and basic interface to a machine. We use an infrared sensitive
camera equipped with infrared LEDs to track pupils. The directions of head
movements, determined using the position of pupils, are used as observations by
a discrete Hidden Markov Model (HMM) based pattern analyzer to detect when a
head nod/shake occurs. The system is trained and tested on natural data from
ten users gathered in the presence of varied lighting and varied facial
expressions. The system as described achieves a real time recognition accuracy
of 78.46% on the test dataset. Note: 5 pages | |||
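A minimal sketch of the recognition step, scoring a sequence of discrete head-motion observations under two small discrete HMMs with the standard forward algorithm; the symbol set and all probabilities below are illustrative, not the trained models from the paper.

```python
import numpy as np

# Observation symbols from pupil-based head-motion direction:
# 0 = up, 1 = down, 2 = left, 3 = right, 4 = still.

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM
    (forward algorithm with per-step scaling)."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        loglik += np.log(s)
        alpha /= s
    return loglik

NOD = (np.array([0.5, 0.5]),                       # states: moving-up, moving-down
       np.array([[0.2, 0.8], [0.8, 0.2]]),
       np.array([[0.7, 0.1, 0.05, 0.05, 0.1],      # emissions for "up" state
                 [0.1, 0.7, 0.05, 0.05, 0.1]]))    # emissions for "down" state
SHAKE = (np.array([0.5, 0.5]),                     # states: moving-left, moving-right
         np.array([[0.2, 0.8], [0.8, 0.2]]),
         np.array([[0.05, 0.05, 0.7, 0.1, 0.1],
                   [0.05, 0.05, 0.1, 0.7, 0.1]]))

def classify(obs):
    """Label a segment of head motion as a nod or a shake."""
    return "nod" if forward_loglik(obs, *NOD) > forward_loglik(obs, *SHAKE) else "shake"

print(classify([0, 1, 0, 1, 4, 0, 1]))   # alternating up/down -> "nod"
```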
| "Those look similar!" issues in automating gesture design advice | | BIBA | Full-Text | 26 | |
| A. Chris Long; James A. Landay; Lawrence A. Rowe | |||
| Today, state-of-the-art user interfaces often include new interaction
technologies, such as speech recognition, computer vision, or gesture
recognition. Unfortunately, these technologies are difficult for most interface
designers to incorporate into their interfaces, and traditional tools do not
help designers with these technologies. One such technology is pen gestures,
which are valuable as a powerful pen-based interaction technique, but are
difficult to design well. We developed an interface design tool that uses
unsolicited advice to help designers of pen-based user interfaces create pen
gestures. Specifically, the tool warns designers when their gestures will be
perceived to be similar and advises designers how to make their gestures less
similar. We believe that the issues we encountered while designing an interface
for advice and implementing this advice will reappear in design tools for other
novel input technologies, such as hand and body gestures. Note: 5 pages | |||
| Design issues for vision-based computer interaction systems | | BIBA | Full-Text | 27 | |
| Rick Kjeldsen; Jacob Hartman | |||
| Computer Vision and other direct sensing technologies have progressed to the
point where we can detect many aspects of a user's activity reliably and in
real time. Simply recognizing the activity is not enough, however. If
perceptual interaction is going to become a part of the user interface, we must
turn our attention to the tasks we wish to perform and methods to effectively
perform them.
This paper attempts to further our understanding of vision-based interaction by looking at the steps involved in building practical systems, giving examples from several existing systems. We classify the types of tasks well suited to this type of interaction as pointing, control or selection, and discuss interaction techniques for each class. We address the factors affecting the selection of the control action, and various types of control signals that can be extracted from visual input. We present our design for widgets to perform different types of tasks, and techniques, similar to those used with established user interface devices, to give the user the type of control they need to perform the task well. We look at ways to combine individual widgets into Visual Interfaces that allow the user to perform these tasks both concurrently and sequentially. Note: 8 pages | |||
| Hand tracking for human-computer interaction with Graylevel VisualGlove: turning back to the simple way | | BIBA | Full-Text | 28 | |
| Giancarlo Iannizzotto; Massimo Villari; Lorenzo Vita | |||
| Recent developments in the manufacturing and marketing of low
power-consumption computers, small enough to be "worn" by users and remain
almost invisible, have reintroduced the problem of overcoming the outdated
paradigm of human-computer interaction based on the use of a keyboard and a mouse.
Approaches based on visual tracking seem to be the most promising, as they do
not require any additional devices (gloves, etc.) and can be implemented with
off-the-shelf devices such as webcams. Unfortunately, extremely variable
lighting conditions and the high degree of computational complexity of most of
the algorithms available make these techniques hard to use in systems where CPU
power consumption is a major issue (e.g. wearable computers) and in situations
where lighting conditions are critical (outdoors, in the dark, etc.). This
paper describes the work carried out at VisiLAB at the University of Messina as
part of the VisualGlove Project to develop a real-time, vision-based device
able to operate as a substitute for the mouse and other similar input devices.
It is able to operate in a wide range of lighting conditions, using a low-cost
webcam and running on an entry-level PC. As explained in detail below,
particular care has been taken to reduce computational complexity, in the
attempt to reduce the amount of resources needed for the whole system to work. Note: 7 pages | |||
| Robust finger tracking for wearable computer interfacing | | BIBA | Full-Text | 29 | |
| Sylvia M. Dominguez; Trish Keaton; Ali H. Sayed | |||
| Key to the design of human-machine gesture interface applications is the
ability of the machine to quickly and efficiently identify and track the hand
movements of its user. In a wearable computer system equipped with head-mounted
cameras, this task is extremely difficult due to the uncertain camera motion
caused by the user's head movement, the user standing still then randomly
walking, and the user's hand or pointing finger abruptly changing directions at
variable speeds. This paper presents a tracking methodology based on a robust
state-space estimation algorithm, which attempts to control the influence of
uncertain environment conditions on the system's performance by adapting the
tracking model to compensate for the uncertainties inherent in the data. Our
system tracks a user's pointing gesture from a single head mounted camera, to
allow the user to encircle an object of interest, thereby coarsely segmenting
the object. The snapshot of the object is then passed to a recognition engine
for identification, and retrieval of any pre-stored information regarding the
object. A comparison of our robust tracker against a plain Kalman tracker
showed a 15% improvement in the estimated position error, and exhibited a
faster response time. Note: 5 pages | |||
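A sketch of the plain Kalman baseline that the robust tracker is compared against, using a constant-velocity state model for the 2-D fingertip position; the noise levels are illustrative.

```python
import numpy as np

class ConstantVelocityKalman:
    """Plain Kalman tracker for 2-D fingertip position (baseline only)."""

    def __init__(self, dt=1 / 30, q=5.0, r=4.0):
        self.x = np.zeros(4)                         # state: [px, py, vx, vy]
        self.P = np.eye(4) * 100.0                   # initial uncertainty
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], float)     # constant-velocity dynamics
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], float)     # we only observe position
        self.Q = np.eye(4) * q                       # process noise
        self.R = np.eye(2) * r                       # measurement noise (pixels^2)

    def step(self, z):
        # predict
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # update with the measured fingertip position z = (px, py)
        y = np.asarray(z, float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]                            # filtered position estimate
```

The robust tracker described in the paper replaces this fixed-noise update with one that adapts to uncertainty in the head-mounted camera motion.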
| Privacy protection by concealing persons in circumstantial video image | | BIBA | Full-Text | 30 | |
| Suriyon Tansuriyavong; Shin-ichi Hanaki | |||
| A circumstantial video image should convey sufficient situational information
while protecting the privacy of specific persons in the scene. This paper
proposes a system that automatically identifies a person by face recognition,
tracks him or her, and displays the image of the person in a modified form,
such as a silhouette with or without a name, or only the name in characters
(i.e. an invisible person). A subjective evaluation experiment was carried out
to determine how people rate each modified video image, from either an
observer's or a subject's viewpoint. The silhouette display with a name list
appears to be the most appropriate, balancing privacy protection against
conveying situational information in the circumstantial video image. Note: 4 pages | |||
| Bare-hand human-computer interaction | | BIBA | Full-Text | 31 | |
| Christian von Hardenberg; François Bérard | |||
| In this paper, we describe techniques for barehanded interaction between
human and computer. Barehanded means that no device and no wires are attached
to the user, who controls the computer directly with the movements of his/her
hand.
Our approach is centered on the needs of the user. We therefore define requirements for real-time barehanded interaction, derived from application scenarios and usability considerations. Based on those requirements a finger-finding and hand-posture recognition algorithm is developed and evaluated. To demonstrate the strength of the algorithm, we build three sample applications. Finger tracking and hand posture recognition are used to paint virtually onto the wall, to control a presentation with hand postures, and to move virtual items on the wall during a brainstorming session. We conclude the paper with user tests, which were conducted to prove the usability of bare-hand human computer interaction. Note: 8 pages | |||
| User and social interfaces by observing human faces for intelligent wheelchairs | | BIBA | Full-Text | 32 | |
| Yoshinori Kuno; Yoshifumi Murakami; Nobutaka Shimada | |||
| With the increase in the number of senior citizens, there is a growing
demand for human-friendly wheelchairs as mobility aids. Thus several
intelligent wheelchairs have been proposed recently. However, they consider
friendliness only to their users. Since wheelchairs move among people, they
should also be friendly to people around them. In other words, they should have
a social-friendly interface as well as a user-friendly interface. We propose an
intelligent wheelchair that is friendly to both user and people around it by
observing the faces of both user and others. The user can control it by turning
his/her face in the direction in which he/she would like to turn. It observes a
pedestrian's face and changes its collision-avoidance method depending on
whether or not he/she notices it. Here we assume that the pedestrian notices
the wheelchair if his/her face often faces toward the wheelchair. Note: 4 pages | |||
| First steps towards automatic recognition of spontaneous facial action units | | BIBA | Full-Text | 33 | |
| B. Braathen; M. S. Bartlett; G. Littlewort; J. R. Movellan | |||
| We present ongoing work on a project for automatic recognition of
spontaneous facial actions (FACs). Current methods for automatic facial
expression recognition assume images are collected in controlled environments
in which the subjects deliberately face the camera. Since people often nod or
turn their heads, automatic recognition of spontaneous facial behavior requires
methods for handling out-of-image-plane head rotations. There are many
promising approaches to address the problem of out-of-image plane rotations. In
this paper we explore an approach based on 3-D warping of images into canonical
views. Since our goal is to explore the potential of this approach, we first
tested it on images with 8 hand-labeled facial landmarks. However, the approach
can be generalized in a straightforward manner to work automatically based on
the output of automatic feature detectors. A front-end system was developed
that jointly estimates camera parameters, head geometry and 3-D head pose
across entire sequences of video images. Head geometry and image parameters
were assumed constant across images, while 3-D head pose was allowed to vary. First,
a small set of images was used to estimate camera parameters and 3D face
geometry. Markov chain Monte-Carlo methods were then used to recover the
most-likely sequence of 3D poses given a sequence of video images. Once the 3D
pose was known, we warped each image into frontal views with a canonical face
geometry. We evaluate the performance of the approach as a front-end for a
spontaneous expression recognition task. Note: 5 pages | |||
| A video joystick from a toy | | BIBA | Full-Text | 34 | |
| Gary Bradski; Victor Eruhimov; Sergey Molinov; Valery Mosyagin; Vadim Pisarevsky | |||
| The paper describes an algorithm for 3D reconstruction of a toy composed of
rigid, brightly colored blocks with the help of a conventional video camera.
The blocks are segmented using histogram thresholds and merged into one
connected component corresponding to the whole toy. We also present an
algorithm for extracting the color structure and matching feature points
across frames, and discuss robust structure-from-motion and recognition
related to this task. Note: 4 pages | |||
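A minimal OpenCV sketch of the color-segmentation step, thresholding block colors in HSV and merging them into one connected component; the hue ranges and morphology kernel are assumptions.

```python
import cv2
import numpy as np

def segment_toy(frame_bgr, hue_ranges):
    """Segment brightly coloured blocks by HSV threshold ranges and merge them
    into one connected component covering the whole toy.

    hue_ranges: list of (lower_hsv, upper_hsv) tuples, chosen per block colour.
    """
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = np.zeros(hsv.shape[:2], np.uint8)
    for lo, hi in hue_ranges:
        mask |= cv2.inRange(hsv, np.array(lo, np.uint8), np.array(hi, np.uint8))
    # close small gaps so adjacent blocks join into a single component
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((15, 15), np.uint8))
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if n <= 1:
        return None
    biggest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])   # skip background label 0
    return (labels == biggest).astype(np.uint8)             # binary toy mask
```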
| WebContext: remote access to shared context | | BIBA | Full-Text | 35 | |
| Robert G., III Capra; Manuel A. Pérez-Quiñones; Naren Ramakrishnan | |||
| In this paper, we describe a system and architecture for building and
remotely accessing shared context between a user and a computer. The system is
designed to allow a user to browse web pages on a personal computer and then
remotely make queries about information seen on the web pages using a
telephone-based voice user interface. Note: 9 pages | |||
| Recognizing movements from the ground reaction force | | BIBA | Full-Text | 36 | |
| Robert Headon; Rupert Curwen | |||
| This paper presents a novel approach to movement recognition, using the
vertical component of a person's Ground Reaction Force (GRF). Typical primitive
movements such as taking a step, jumping, drop-landing, sitting down, rising to
stand and crouching are decomposed and recognized in terms of the GRF signal
observed by a weight-sensitive floor. Previous work has focused on vision
processing for movement recognition. This work provides a new sensor modality
for a larger research effort, that of sentient computing, which is concerned
with giving computers awareness of their environment and inhabitants. Note: 8 pages | |||
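A heuristic sketch of recognizing one primitive movement (a jump) from the vertical GRF trace, using illustrative thresholds rather than the paper's decomposition.

```python
import numpy as np

def detect_jump(grf, body_weight, fs=100):
    """Detect a jump from the vertical ground reaction force (GRF) trace.

    grf: 1-D force samples (N), body_weight: standing weight (N), fs: sample rate.
    Heuristic: a jump shows a push-off peak above body weight, a flight phase
    with near-zero force, then a landing spike. Thresholds are illustrative.
    """
    flight = grf < 0.1 * body_weight
    if not flight.any():
        return None
    takeoff = int(np.argmax(flight))                       # first flight sample
    landing = takeoff + int(np.argmax(~flight[takeoff:]))  # first contact after flight
    pushoff_peak = grf[:takeoff].max() if takeoff else 0.0
    if pushoff_peak > 1.2 * body_weight and (landing - takeoff) > 0.05 * fs:
        return {"takeoff_s": takeoff / fs, "flight_s": (landing - takeoff) / fs}
    return None
```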
| The Infocockpit: providing location and place to aid human memory | | BIBA | Full-Text | 37 | |
| Desney S. Tan; Jeanine K. Stefanucci; Dennis R. Proffitt; Randy Pausch | |||
| Our work focuses on building and evaluating computer system interfaces that
make information memorable. Psychology research tells us people remember
spatially distributed information based on its location relative to their body,
as well as the environment in which the information was learned. We apply these
principles in the implementation of a multimodal prototype system, the
Infocockpit (for "Information Cockpit"). The Infocockpit not only uses multiple
monitors surrounding the user to engage human memory for location, but also
provides ambient visual and auditory displays to engage human memory for place.
We report a user study demonstrating a 56% increase in memory for information
presented with our Infocockpit system as compared to a standard desktop system. Note: 4 pages | |||
| Visual panel: virtual mouse, keyboard and 3D controller with an ordinary piece of paper | | BIBA | Full-Text | 38 | |
| Zhengyou Zhang; Ying Wu; Ying Shan; Steven Shafer | |||
| This paper presents a vision-based interface system, VISUAL PANEL, which
employs an arbitrary quadrangle-shaped panel (e.g., an ordinary piece of paper)
and a tip pointer (e.g., fingertip) as an intuitive, wireless and mobile input
device. The system can accurately and reliably track the panel and the tip
pointer. The panel tracking continuously determines the projective mapping
between the panel at the current position and the display, which in turn maps
the tip position to the corresponding position on the display. By detecting the
clicking and dragging actions, the system can fulfill many tasks such as
controlling a remote large display, and simulating a physical keyboard. Users
can naturally use their fingers or other tip pointers to issue commands and
type text. Furthermore, by tracking the 3D position and orientation of the
visual panel, the system can also provide 3D information, serving as a virtual
joystick, to control 3D virtual objects. Note: 8 pages | |||
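A minimal sketch of the projective-mapping step, computing a homography from the tracked panel corners to the display and mapping the fingertip through it; the corner ordering and the use of OpenCV are assumptions about one way to realize this.

```python
import cv2
import numpy as np

def panel_to_display(panel_corners, display_size):
    """Homography from the tracked panel quadrangle to the display rectangle.

    panel_corners: 4x2 image coordinates of the paper's corners, ordered
    top-left, top-right, bottom-right, bottom-left; display_size: (w, h).
    """
    w, h = display_size
    dst = np.array([[0, 0], [w, 0], [w, h], [0, h]], np.float32)
    return cv2.getPerspectiveTransform(np.float32(panel_corners), dst)

def map_tip(H, tip_xy):
    """Map the tracked fingertip image position to display coordinates."""
    p = np.array([tip_xy[0], tip_xy[1], 1.0])
    q = H @ p
    return q[0] / q[2], q[1] / q[2]

H = panel_to_display([(120, 80), (520, 95), (540, 390), (105, 370)], (1920, 1080))
print(map_tip(H, (300, 220)))      # cursor position on the remote display
```

Recomputing the homography every frame is what lets the panel move freely while the mapped cursor stays consistent on the display.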