| Research Challenges in Gesture: Open Issues and Unsolved Problems | | BIBA | Full-Text | 1-11 | |
| Alan Wexelblat | |||
| Gesture today remains a sideline in computer interfaces. I argue that this is due to several longstanding deficiencies in the theoretical foundations of the field. We must act to correct these deficiencies and strengthen the research community in order to avoid becoming a footnote in the history of computer science. I specify fundamental unsolved problems in the areas of naturalness, anthropology and systems building. I also suggest some things that we could do to make our research community stronger and more able to tackle these problems. | |||
| Progress in Sign Languages Recognition | | BIBA | Full-Text | 13-21 | |
| Alistair D. N. Edwards | |||
| The automatic recognition of sign language is an attractive prospect; the technology exists to make it possible, while the potential applications are exciting and worthwhile. To date, the research emphasis has been on the capture and classification of the gestures of sign language, and progress in that work is reported. However, it is suggested that there are some greater, broader research questions to be addressed before full sign language recognition is achieved. The main areas to be addressed are sign language representation (grammars) and facial expression recognition. | |||
| Movement Phase in Signs and Co-Speech Gestures, and Their Transcriptions by Human Coders | | BIBA | Full-Text | 23-35 | |
| Sotaro Kita; Ingeborg van Gijn; Harry van der Hulst | |||
| The previous literature has suggested that the hand movement in co-speech gestures and signs consists of a series of phases with qualitatively different dynamic characteristics. In this paper, we propose a syntagmatic rule system for movement phases that applies to both co-speech gestures and signs. Descriptive criteria for the rule system were developed for the analysis of video-recorded continuous production of signs and gestures. The analysis involves segmenting a stream of body movement into phases and identifying different phase types. Two human coders used the criteria to analyze signs and co-speech gestures produced in natural discourse. It was found that the criteria yielded good inter-coder reliability. These criteria can be used in the technology of automatic recognition of signs and co-speech gestures in order to segment continuous production and identify the potentially meaning-bearing phases. | |||
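As a rough illustration of how such phase criteria might be operationalised for automatic segmentation, the sketch below labels frames of a wrist trajectory as hold, transition, or stroke from its speed profile. The thresholds and the choice of the wrist as the tracked point are illustrative assumptions, not the coding criteria proposed in the paper.

```python
import numpy as np

def segment_phases(positions, dt, hold_speed=0.05, stroke_speed=0.5):
    """Label each frame of a wrist trajectory as 'hold', 'transition', or 'stroke'.

    positions: (N, 3) array of wrist coordinates in metres, sampled every dt seconds.
    hold_speed / stroke_speed: illustrative speed thresholds in m/s (assumptions).
    """
    velocity = np.gradient(positions, dt, axis=0)   # frame-wise velocity
    speed = np.linalg.norm(velocity, axis=1)        # scalar speed profile
    labels = np.where(speed < hold_speed, "hold",
             np.where(speed > stroke_speed, "stroke", "transition"))
    return labels, speed
```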
| Classifying Two Dimensional Gestures in Interactive Systems | | BIBA | Full-Text | 37-48 | |
| Axel Kramer | |||
| This paper motivates and presents a classification scheme for two-dimensional gestures in interactive systems. Most pen-based systems allow the user to perform gestures in order to enter and execute commands, but the usage of gestures can be found in other interactive systems as well. Much research so far has focused on how to implement two-dimensional gestures, how to recognize the user's input, or what context to use gestures in. Instead, the focus of this paper is to explore and classify interactive characteristics of two-dimensional gestures as they are used in interactive systems. The benefits for the field are three-fold. First, such a classification describes one design space for the usage of two-dimensional gestures in interactive systems and thus presents possible choices to system designers. Second, empirical researchers can make use of such a classification to make systematic choices about aspects of gesture-based systems that are worth studying. Finally, it can serve as a starting point for drawing parallels to, and exploring differences from, gestures used in three-dimensional interfaces. | |||
| Are Listeners Paying Attention to the Hand Gestures of an Anthropomorphic Agent? An Evaluation Using a Gaze Tracking Method | | BIBAK | Full-Text | 49-59 | |
| Shuichi Nobe; Satoru Hayamizu; Osamu Hasegawa; Hideaki Takahashi | |||
| Information about what listeners are looking at and paying attention to is significant in the evaluation of a human-anthropomorphic agent interaction system. A pilot study was conducted, using a gaze tracking method, on relevant aspects of an anthropomorphic agent's hand gestures in a real-time setting. It revealed that a highly informative, one-handed gesture with seemingly-interactive speech attracted attention when it had a slower stroke and/or a long post-stroke hold at the Center-Center space and upper position. Keywords: gestures; anthropomorphic agents; gaze tracking method; human-computer interaction | |||
| Gesture-Based and Haptic Interaction for Human Skill Acquisition | | BIBA | Full-Text | 61-68 | |
| Monica Bordegoni; Franco De Angelis | |||
| This paper describes the preliminary results of research work currently ongoing at the University of Parma and partially carried out within a basic research project funded by the European Union. The research work aims at applying techniques from gesture analysis and recognition to understanding the human skill involved in grasping and manipulating non-rigid objects. The various grasping gestures have been classified on the basis of quantitative features extracted from the hand gesture analysis. Finally, it is planned to map the formalized skill onto a robotic system that will be able to grasp and manipulate non-rigid objects, even in unexpected situations. | |||
| High Performance Real-Time Gesture Recognition Using Hidden Markov Models | | BIBA | Full-Text | 69-80 | |
| Gerhard Rigoll; Andreas Kosmala; Stefan Eickeler | |||
| An advanced real-time system for gesture recognition is presented, which is able to recognize complex dynamic gestures such as "hand waving", "spin", "pointing", and "head moving". The recognition is based on global motion features extracted from each difference image of the image sequence. The system uses Hidden Markov Models (HMMs) as a statistical classifier. These HMMs are trained on a database of 24 isolated gestures performed by 14 different people. With the use of global motion features, a recognition rate of 92.9% is achieved for person- and background-independent recognition. | |||
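One plausible reading of "global motion features extracted from each difference image" is the centre and spread of the changed pixels per frame, which would then form the observation sequence for an HMM classifier. The sketch below follows that reading; the threshold and the exact feature set are assumptions rather than the paper's definition.

```python
import numpy as np

def global_motion_features(frames, diff_threshold=20):
    """Compute per-frame global motion features from difference images.

    frames: (T, H, W) grey-scale image sequence.
    Returns a (T-1, 4) feature sequence: centre of motion (x, y) and
    mean absolute deviation of the changed pixels (x, y).
    """
    features = []
    for prev, curr in zip(frames[:-1], frames[1:]):
        diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
        ys, xs = np.nonzero(diff > diff_threshold)            # changed pixels
        if len(xs) == 0:
            features.append([0.0, 0.0, 0.0, 0.0])             # no motion this frame
            continue
        cx, cy = xs.mean(), ys.mean()                          # centre of motion
        sx, sy = np.abs(xs - cx).mean(), np.abs(ys - cy).mean()  # spatial spread
        features.append([cx, cy, sx, sy])
    return np.asarray(features)
```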
| Velocity Profile Based Recognition of Dynamic Gestures with Discrete Hidden Markov Models | | BIBA | Full-Text | 81-95 | |
| Frank G. Hofmann; Peter Heyer; Günter Hommel | |||
| In this paper we present a method for the recognition of dynamic gestures with discrete Hidden Markov Models (HMMs) from a continuous stream of gesture input data. The segmentation problem is addressed by extracting two velocity profiles from the gesture data and using their extrema as segmentation cues. Gestures are captured with a TUB-SensorGlove. The paper focuses on the description of the gesture recognition method (including data preprocessing) and describes experiments for the evaluation of the performance of the recognition method. The paper combines and further develops ideas from some of our previous work. | |||
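A minimal sketch of the segmentation idea, assuming the glove delivers a fixed-rate stream of sensor samples: compute a speed profile, smooth it, and take its local minima as candidate gesture boundaries. The filter width and the use of a single speed profile (rather than the paper's two) are simplifications.

```python
import numpy as np

def segmentation_cues(samples, dt, smooth=5):
    """Return indices of local speed minima, usable as gesture boundary cues.

    samples: (N, D) stream of glove sensor readings sampled every dt seconds.
    smooth: width of a simple moving-average filter applied to the speed profile.
    """
    velocity = np.diff(samples, axis=0) / dt
    speed = np.linalg.norm(velocity, axis=1)
    kernel = np.ones(smooth) / smooth
    speed = np.convolve(speed, kernel, mode="same")   # suppress sensor jitter
    # a frame is a local minimum if it is lower than both of its neighbours
    minima = np.where((speed[1:-1] < speed[:-2]) & (speed[1:-1] < speed[2:]))[0] + 1
    return minima, speed
```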
| Video-Based Sign Language Recognition Using Hidden Markov Models | | BIBA | Full-Text | 97-109 | |
| Marcell Assan; Kirsti Grobel | |||
| This paper is concerned with the video-based recognition of signs. Concentrating on the manual parameters of sign language, the system aims at the signer-dependent recognition of 262 different signs taken from Sign Language of the Netherlands. For Hidden Markov Modelling, a sign is considered a doubly stochastic process represented by an unobservable state sequence. The observations emitted by the states are regarded as feature vectors extracted from video frames. This work deals with three topics: firstly, the recognition of isolated signs; secondly, the influence of variations of the feature vector on the recognition rate; and thirdly, an approach to the recognition of connected signs. The system achieves recognition rates of up to 94% for isolated signs and 73% for a reduced vocabulary of connected signs. | |||
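The classification scheme described, one HMM per sign with the most likely model winning, can be sketched with a generic HMM toolkit. hmmlearn here is only a stand-in for whatever modelling software the authors used, and the feature extraction from video frames is assumed to happen elsewhere.

```python
import numpy as np
from hmmlearn import hmm   # stand-in HMM toolkit, not the one used in the paper

def train_sign_models(training_data, n_states=5):
    """training_data: dict mapping sign label -> list of (T_i, D) feature sequences."""
    models = {}
    for label, sequences in training_data.items():
        X = np.concatenate(sequences)                  # stack all sequences
        lengths = [len(seq) for seq in sequences]      # remember their boundaries
        model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
        model.fit(X, lengths)                          # Baum-Welch training
        models[label] = model
    return models

def classify(models, sequence):
    """Return the sign whose HMM assigns the highest log-likelihood to the sequence."""
    return max(models, key=lambda label: models[label].score(sequence))
```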
| Corpus of 3D Natural Movements and Sign Language Primitives of Movement | | BIBA | Full-Text | 111-121 | |
| Sylvie Gibet; James Richardson; Thierry Lebourque; Annelies Braffort | |||
| This paper describes the development of a corpus, or database, of hand-arm pointing gestures, considered as a basic element of gestural communication. The structure of the corpus is defined for natural pointing movements carried out in different directions and with different heights and amplitudes. It is then extended to movement primitives habitually used in sign language communication. The corpus is based on movements recorded using an optoelectronic recording system that allows the 3D description of movement trajectories in space. The main technical characteristics of the capture and preprocessing system are presented, and perspectives are highlighted for recognition and generation purposes. | |||
| On the Use of Context and A Priori Knowledge in Motion Analysis for Visual Gesture Recognition | | BIBA | Full-Text | 123-134 | |
| Karin Husballe Munk; Erik Granum | |||
| The correspondence analysis part of a model-based vision system is investigated theoretically and through a synthetic image sequence showing a human hand gesture. The purpose of the study is to find and describe ways of improving the conditions for robust tracking by introducing a priori knowledge such as structural information from the model and the temporal context of the observed motion. The primary performance characteristics are the size of the search space for correspondence analysis and the prediction error under various conditions. Theoretical models of how the search space depends on connectivity properties and on prediction accuracy are developed. Observations from the image sequence suggest simple predictors for the context of smooth motion, and their expected influence on the search space is verified. Special consideration must be given to the handling of motion trajectory discontinuities, and alternatives are suggested. | |||
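A hedged illustration of what a "simple predictor for the context of smooth motion" and the associated search-space restriction could look like: constant-velocity extrapolation of a tracked feature, plus a circular gate around the prediction that limits the candidates considered in correspondence analysis. Tying the gate radius to the expected prediction error is an assumption.

```python
import numpy as np

def predict_constant_velocity(prev_pos, curr_pos):
    """Extrapolate the next 2D feature position assuming smooth (constant-velocity) motion."""
    return curr_pos + (curr_pos - prev_pos)

def candidates_in_gate(predicted, detections, gate_radius):
    """Restrict correspondence analysis to detections inside a circular gate
    around the prediction; the gate radius would grow with the expected
    prediction error (e.g. near trajectory discontinuities)."""
    distances = np.linalg.norm(detections - predicted, axis=1)
    return detections[distances <= gate_radius]
```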
| Automatic Estimation of Body Regions from Video Images | | BIBAK | Full-Text | 135-145 | |
| Hermann Hienz; Kirsti Grobel | |||
| In our approach, video-based recognition of sign language requires the extraction of sign parameters. Each sign can be characterised by means of manual (handshape, hand orientation, location and movement) and non-manual (trunk, head, gaze, facial expression, mouth) parameters. This paper introduces a software module which, as part of the automatic sign language recognition system under development, is able to extract relevant body regions from digitised video images. The recognition of body regions is crucial for determining the location of signs. The proposed software module uses a rule-based system for analysing the body contour in order to compute the 2D positions of the shoulders, the top of the head and the vertical axis of the body. Based on these results, the position of the eyes is calculated directly from the segmented face of the signer. The positions of the remaining face regions (nose, forehead, mouth, cheek, chin) and trunk regions (shoulder belt, chest, belly, hip) are determined by means of two estimators, which use a priori known geometric data of the face and fuzzy techniques. Experiments indicate that our approach leads to good estimation of the body regions, all of which are computed in real time. Keywords: Estimation of body regions; sign language recognition; digital image processing; gesture analysis | |||
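A speculative sketch of rule-based contour analysis of the kind described: from a binary silhouette of the signer, take the highest foreground row as the top of the head, the column-wise centroid as the vertical body axis, and the first row where the contour widens sharply below the head as the shoulder line. The widening rule and the 10-row head window are illustrative assumptions, not the paper's rules.

```python
import numpy as np

def body_landmarks(silhouette, widen_factor=1.6):
    """Estimate head top, vertical body axis, and shoulder row from a binary silhouette.

    silhouette: (H, W) boolean array, True where the signer's body is segmented.
    widen_factor: how much the contour must widen relative to the head width
                  to count as the shoulder line -- an illustrative rule.
    """
    rows = np.where(silhouette.any(axis=1))[0]
    head_top = rows[0]                                       # highest foreground row
    body_axis_x = int(np.mean(np.nonzero(silhouette)[1]))    # column-wise centroid

    widths = silhouette.sum(axis=1)                          # contour width per row
    head_width = widths[head_top:head_top + 10].max()        # assume head in top rows
    below = np.where(widths[head_top:] > widen_factor * head_width)[0]
    shoulder_row = head_top + below[0] if len(below) else None
    return head_top, body_axis_x, shoulder_row
```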
| Rendering Gestures as Line Drawings | | BIBA | Full-Text | 147-157 | |
| Frank Godenschweger; Thomas Strothotte; Hubert Wagener | |||
| This paper discusses computer-generated illustrations and animation sequences of hand gestures. The animation of gestures is especially useful in teaching sign language. We propose algorithms for rendering 3D models of hands as line drawings and for designing animations of line-drawn gestures. Presentations of gestures as line drawings, as opposed to photorealistic representations, have several advantages. Most importantly, the abstract nature of line drawings emphasizes the essential information a picture is to express and thus supports easier cognition. Especially when line drawings are rendered from simple 3D models (of human parts), they are aesthetically more pleasing than photorealistic renderings of the same model. This leads us to the assumption that simpler 3D models suffice for line-drawn illustrations and animations of gestures, which in consequence facilitates the 3D modelling task and speeds up the rendering. Other advantages of line drawings include fast transmission over networks such as the Internet, and the wide scale-independence they exhibit. | |||
| Investigating the Role of Redundancy in Multimodal Input Systems | | BIBA | Full-Text | 159-171 | |
| Karen McKenzie Mills; James L. Alty | |||
| A major concern of Human Computer Interaction is to improve communication between people and computer applications. One possible way of improving such communication is to capitalise on the way human beings use speech and gesture in a complementary manner, exploiting the redundancy of information between these modes. Redundant data input via multiple modalities gives considerable scope for the resolution of error and ambiguity. This paper describes the implementation of a simple, inexpensive tri-modal input system accepting touch, two-dimensional gesture and speech input. Currently the speech and gesture recognition systems operate separately. Truth maintenance and blackboard system architectures in a multimodal interpreter are proposed for handling the integration between modes and task knowledge. Rule induction is used for analysis of the gesture data, and preliminary classification results from the two-dimensional gesture recognition system are presented. Current implementations and future work on redundancy are also discussed. | |||
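A minimal sketch of blackboard-style integration under the redundancy idea: each modality posts time-stamped hypotheses, and a resolver prefers meanings confirmed by more than one mode within a short time window. The data structures and scoring here are assumptions for illustration, not the architecture proposed in the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    modality: str      # "speech", "gesture", or "touch"
    meaning: str       # e.g. "delete", "move"
    time: float        # seconds since session start
    confidence: float

@dataclass
class Blackboard:
    entries: list = field(default_factory=list)

    def post(self, hypothesis):
        self.entries.append(hypothesis)

    def resolve(self, window=1.0):
        """Prefer meanings confirmed by several modalities within a time window."""
        best, best_score = None, 0.0
        for h in self.entries:
            supporters = {e.modality for e in self.entries
                          if e.meaning == h.meaning and abs(e.time - h.time) <= window}
            score = h.confidence * len(supporters)   # redundancy boosts the score
            if score > best_score:
                best, best_score = h.meaning, score
        return best
```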
| Gesture Recognition of the Upper Limbs -- From Signal to Symbol | | BIBA | Full-Text | 173-184 | |
| Martin Fröhlich; Ipke Wachsmuth | |||
| To recognise gestures performed by people without disabilities during verbal communication, so-called coverbal gestures, a flexible system with a task-oriented design is proposed. The issue of flexibility is addressed via different kinds of modules, conceived as agents, which are grouped into different levels and can easily be reconfigured or rewritten to suit another application. This system of layered agents uses an abstract body model to transform the data acquired from the six-degree-of-freedom sensors and the data gloves into a first-level symbolic description of gesture features. In a first integration step, the first-level symbols are integrated into second-level symbols describing a whole gesture. Second-level symbolic gesture descriptions are the entities which can be integrated with speech tokens to form multi-modal utterances. | |||
| Exploiting Distant Pointing Gestures for Object Selection in a Virtual Environment | | BIBA | Full-Text | 185-196 | |
| Marc Erich Latoschik; Ipke Wachsmuth | |||
| Developing state-of-the-art multimedia applications nowadays calls for the use of sophisticated visualisation and immersion techniques, commonly referred to as Virtual Reality. While Virtual Reality now achieves good results both in image quality and in fast user feedback by using parallel computation techniques, the methods for interacting with these systems still need to be improved. In this paper we introduce, first, a multimedia application that uses a gesture-driven interface and, second, the architecture of an expandable gesture recognition system. After different gesture types for interaction in a virtual environment are discussed with respect to the required functionality, the implementation of a specific gesture detection module for distant pointing recognition is described, and the whole system design is tested for its task adequacy. | |||
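Distant pointing selection is commonly reduced to casting a ray from the hand along the pointing direction and choosing the object best aligned with it. The following sketch, with an assumed angular tolerance, illustrates that reduction; it is not the detection module described in the paper.

```python
import numpy as np

def select_pointed_object(hand_pos, point_dir, object_centres, max_angle_deg=10.0):
    """Return the index of the object best aligned with the pointing ray, or None.

    hand_pos: (3,) position of the pointing hand.
    point_dir: (3,) pointing direction (need not be normalised).
    object_centres: (N, 3) centres of selectable objects.
    max_angle_deg: illustrative angular tolerance of the selection cone.
    """
    d = point_dir / np.linalg.norm(point_dir)
    to_objects = object_centres - hand_pos
    dist = np.linalg.norm(to_objects, axis=1)
    cos_angle = (to_objects @ d) / np.maximum(dist, 1e-9)
    angles = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    candidates = np.where(angles <= max_angle_deg)[0]
    if len(candidates) == 0:
        return None
    return int(candidates[np.argmin(angles[candidates])])
```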
| An Intuitive Two-Handed Gestural Interface for Computer Supported Product Design | | BIBA | Full-Text | 197-208 | |
| Caroline Hummels; Gerda Smets; Kees Overbeeke | |||
| More and more researchers emphasize the development of humanising computer interaction, thus bringing us closer to intuitive interfaces. Gestural interface research fits in with these new developments. However, existing gestural interfaces hardly take advantage of the possibilities gestures offer; they even force the user to learn a new language. We propose a gestural interface for product design that exploits the use of gestures. This interface supports the perceptual-motor skills of the designer and the expressive and creative design process. To develop this task-specific gestural interface we emphasize the importance of explorative experiments to obtain the meaning of gestures used for product design. We show with two experiments that an accurate interpretation of a created product can be made, even when designers are allowed full freedom in their gestures. MOVE ON, a computer-supported design application, is our first step towards full-freedom gestural human-computer interaction. Creating task-specific human-computer interaction using limitless gestures is feasible, although extensive research is necessary and ongoing. | |||
| Detection of Fingertips in Human Hand Movement Sequences | | BIBA | Full-Text | 209-218 | |
| Claudia Nölker; Helge Ritter | |||
| This paper presents a hierarchical approach using neural networks to locate the positions of the fingertips in grey-scale images of human hands. The first sections introduce and summarise the research done in this area. Afterwards, our hierarchical approach and the preprocessing of the grey-scale images are described. A low-dimensional encoding of the images is obtained by means of Gabor filters, and a special kind of artificial neural net, the LLM-net, is employed to find the positions of the fingertips. The capabilities of the system are demonstrated on three tasks: locating the tip of the forefinger and of the thumb, finding the pointing direction regardless of the operator's pointing style, and detecting all five fingertips in hand movement sequences. The system is able to perform these tasks even when the fingertips are in an area of low contrast. | |||
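The low-dimensional encoding step can be sketched as sampling the responses of a small Gabor filter bank on a coarse grid over the hand image; the resulting vector would then feed a regression network such as the LLM-net (not shown here). Kernel sizes, wavelengths and grid spacing below are illustrative assumptions.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Real part of a 2D Gabor kernel."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(x_t**2 + y_t**2) / (2 * sigma**2)) * np.cos(2 * np.pi * x_t / wavelength)

def gabor_features(image, size=21, wavelengths=(8, 16), n_orient=4, step=16):
    """Encode a grey-scale hand image as Gabor responses sampled on a coarse grid."""
    half = size // 2
    features = []
    for lam in wavelengths:
        for k in range(n_orient):
            kernel = gabor_kernel(size, lam, theta=k * np.pi / n_orient, sigma=lam / 2)
            # evaluate the filter only at grid points instead of a full convolution
            for cy in range(half, image.shape[0] - half, step):
                for cx in range(half, image.shape[1] - half, step):
                    patch = image[cy - half:cy + half + 1, cx - half:cx + half + 1]
                    features.append(float((patch * kernel).sum()))
    return np.asarray(features)
```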
| Neural Architecture for Gesture-Based Human-Machine-Interaction | | BIBA | Full-Text | 219-232 | |
| Hans-Joachim Böhme; Anja Brakensiek; Ulf-Dietrich Braumann; Markus Krabbes; Horst-Michael Gross | |||
| We present a neural architecture for gesture-based interaction between a mobile robot and human users. One crucial problem for natural interface techniques is robustness under highly varying environmental conditions. Therefore, we propose a multiple-cue approach for the localisation of a potential user in the operation field, followed by the acquisition and interpretation of that user's gestural instructions. The whole approach is motivated in the context of a reliable operation scenario, but can easily be extended to other applications, such as videoconferencing. | |||
| Robotic Gesture Recognition | | BIBA | Full-Text | 233-244 | |
| Jochen Triesch; Christoph von der Malsburg | |||
| Robots of the future should communicate with humans in a natural way. We are especially interested in vision-based gesture interfaces. In the context of robotics several constraints exist, which make the task of gesture recognition particularly challenging. We discuss these constraints and report on progress being made in our lab in the development of techniques for building robust gesture interfaces which can handle these constraints. In an example application, the techniques are shown to be easily combined to build a gesture interface for a real robot grasping objects on a table in front of it. | |||
| Image Based Recognition of Gaze Direction Using Adaptive Methods | | BIBA | Full-Text | 245-257 | |
| Axel Christian Varchmin; Robert Rae; Helge Ritter | |||
| Human-machine interfaces based on gaze recognition can greatly simplify the handling of computer applications. However, most existing systems have problems with changing environments and different users. As a solution we use (i) adaptive components which can be trained online and (ii) detection of common facial features, i.e. eyes, nose and mouth, for gaze recognition. In a first step, an adaptive color histogram segmentation method roughly determines the region of interest containing the user's face. Within this region we then use a hierarchical recognition approach to detect the facial features. In the last stage of our system these feature positions are used to estimate the gaze direction by detailed analysis of the eye region. We achieve an average precision of 1.5 degrees for the gaze pan angle and 2.5 degrees for the tilt angle while the user looks at a computer screen. The system runs at a rate of one frame per second on a common workstation. | |||
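Adaptive colour segmentation of a face region is often realised as histogram back-projection with online re-training. The sketch below, using a hue histogram and a blending rate that are assumptions rather than the authors' parameters, shows the general mechanism.

```python
import numpy as np

def hue_histogram(hue_patch, n_bins=32):
    """Normalised histogram of hue values (0-179 convention) from a confirmed face sample."""
    hist, _ = np.histogram(hue_patch, bins=n_bins, range=(0, 180), density=True)
    return hist

def backproject(hue_image, hist, n_bins=32):
    """Per-pixel likelihood that a pixel belongs to the face, from the hue histogram."""
    bin_idx = np.clip((hue_image.astype(int) * n_bins) // 180, 0, n_bins - 1)
    return hist[bin_idx]

def adapt(hist, new_hist, rate=0.1):
    """Blend in a histogram from the latest confirmed face region (online adaptation)."""
    return (1 - rate) * hist + rate * new_hist
```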
| Towards a Dialogue System Based on Recognition and Synthesis of Japanese Sign Language | | BIBA | Full-Text | 259-271 | |
| Shan Lu; Seiji Igi; Hideaki Matsuo; Yuji Nagashima | |||
| This paper describes a dialogue system based on the recognition and synthesis of Japanese sign language. The purpose of this system is to support conversation between people with hearing impairments and hearing people. The system consists of five main modules: sign-language recognition and synthesis, voice recognition and synthesis, and dialogue control. The sign-language recognition module uses a stereo camera and a pair of colored gloves to track the movements of the signer, and sign-language synthesis is achieved by regenerating the motion data obtained by an optical motion capture system. An experiment was done to investigate changes in the gaze-line of hearing-impaired people when they read sign language, and the results are reported. | |||
| The Recognition Algorithm with Non-contact for Japanese Sign Language Using Morphological Analysis | | BIBA | Full-Text | 273-284 | |
| Hideaki Matsuo; Seiji Igi; Shan Lu; Yuji Nagashima; Yuji Takata; Terutaka Teshima | |||
| This paper documents a recognition method for deciphering Japanese sign language (JSL) using projected images. The goal of the movement recognition is to foster communication between hearing-impaired people and people capable of normal speech. We use a stereo camera for recording three-dimensional movements, an image processing board for tracking movements, and a personal computer as an image processor for recognizing JSL patterns. The system works by formalizing the space around the signer according to the characteristics of the human body, determining components such as location and movement, and then recognizing sign language patterns. The system is able to recognize JSL by determining the extent of similarities in the sign field, and does so even when vibrations in hand movements occur and when there are differences in body build. We obtained useful results from recognition experiments on 38 different JSL signs performed by two signers. | |||
| Special Topics of Gesture Recognition Applied in Intelligent Home Environments | | BIBA | Full-Text | 285-296 | |
| Markus Kohler | |||
| This report shows how to realize a gesture recognition system for controlling appliances in home environments. It gives a brief overview of an existing system and clarifies details of the ergonomic remote control of devices by gestures with the help of a vision system. The focus is on motion detection, object normalization and identification, and the modelling and prediction of motion with the Kalman filter. A main interest was to show, using the example of ARGUS, how the Kalman filter should be modelled and initialized for physical human motion. The initialization problem of the Kalman filter in a vision-based system for human motion tracking differs from initialization for physical systems, where manuals report the measurement errors. Most aspects mentioned in this report were implemented in the ARGUS prototype. | |||
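A minimal constant-velocity Kalman filter for a 2D image-plane hand position, highlighting the initialization issue the abstract singles out: the velocity is unknown at start-up, so its initial error variance is set large. All noise values below are illustrative assumptions, not those of ARGUS.

```python
import numpy as np

class ConstantVelocityKalman:
    """Minimal constant-velocity Kalman filter for a 2D image-plane hand position."""

    def __init__(self, first_meas, dt, pos_var=25.0, vel_var=400.0, meas_var=9.0):
        # State: [x, y, vx, vy]. Velocity is unknown at start, so its initial
        # variance is set large -- the initialization issue discussed in the paper.
        self.x = np.array([first_meas[0], first_meas[1], 0.0, 0.0])
        self.P = np.diag([pos_var, pos_var, vel_var, vel_var])
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)     # constant-velocity model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)     # we only measure position
        self.Q = np.eye(4) * 1.0          # process noise: jerky human motion (assumed)
        self.R = np.eye(2) * meas_var     # measurement noise of the vision system (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                 # predicted image position

    def update(self, z):
        y = np.asarray(z) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```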
| BUILD-IT: An Intuitive Design Tool Based on Direct Object Manipulation | | BIBAK | Full-Text | 297-308 | |
| Morten Fjeld; Martin Bichsel; Matthias Rauterberg | |||
| Natural interaction, in the context of this paper, means human action in a world of tangible objects and live subjects. We introduce the concept of action regulation and relate it to observable human behaviour. A tool bringing together motor and cognitive action is a promising way to assure complete task regulation. Aiming for such tools, we propose a set of guidelines for the next generation of user interfaces, the Natural User Interface (NUI). We present a NUI instantiation called BUILD-IT, featuring video-mediated interaction in a task-specific context. This multi-brick interaction tool renders virtual objects tangible and allows multiple users to interact simultaneously in one common space. A few user experiences are briefly described. Keywords: Augmented Reality; natural interaction; Natural User Interface; graspable objects; computer mediated design | |||