
GW 2001: Gesture Workshop

Fullname: GW 2001: Gesture and Sign Language in Human-Computer Interaction: International Gesture Workshop Revised Papers
Editors: Ipke Wachsmuth; Timo Sowa
Location: London, England
Dates: 2001-Apr-18 to 2001-Apr-20
Publisher: Springer Berlin Heidelberg 2002
Series: Lecture Notes in Computer Science 2298
Standard No: DOI: 10.1007/3-540-47873-6; hcibib: GW01; ISBN: 978-3-540-43678-2 (print), 978-3-540-47873-7 (online)
Papers: 33
Pages: 321
Links: Online Proceedings | Workshop Website
  1. Invited Paper
  2. Gesture Recognition
  3. Recognition of Sign Language
  4. Gesture and Sign Language Synthesis
  5. Nature and Notation of Sign Language
  6. Gestural Action & Interaction
  7. Applications Based on Gesture Control

Invited Paper

Research on Computer Science and Sign Language: Ethical Aspects BIBAFull-Text 1-8
  Annelies Braffort
The aim of this paper is to raise the ethical issues that arise when hearing computer scientists work on the Sign Languages (SL) used by deaf communities, especially in the field of Sign Language recognition. On the one hand, the problematic history of institutionalised SL must be known. On the other hand, the linguistic properties of SL must be learned by computer scientists before they attempt to design systems that automatically translate SL into oral or written language, or vice versa. Oral languages and SL function so differently that it seems impossible to work on this topic without close collaboration with deaf people and with linguists specialised in SL.

Gesture Recognition

An Inertial Measurement Framework for Gesture Recognition and Applications BIBAFull-Text 9-20
  Ari Y. Benbasat; Joseph A. Paradiso
We describe an inertial gesture recognition framework composed of three parts. The first is a compact, wireless six-axis inertial measurement unit to fully capture three-dimensional motion. The second, a gesture recognition algorithm, analyzes the data and categorizes it on an axis-by-axis basis as simple motions (straight line, twist, etc.) with magnitude and duration. The third allows an application designer to combine recognized gestures both concurrently and consecutively to create specific composite gestures, which can then be set to trigger output routines. This framework was created to enable application designers to use inertial sensors with a minimum of knowledge and effort. Sample implementations and future directions are discussed.
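The composite-gesture layer described above might look something like the following sketch, written in Python rather than the authors' own toolkit; the names (AtomicGesture, is_flick_and_twist) and the thresholds are illustrative assumptions, not the framework's actual API.

```python
# Hypothetical sketch of composing per-axis atomic gestures into a
# composite gesture that triggers an output routine (names assumed).
from dataclasses import dataclass

@dataclass
class AtomicGesture:
    axis: str        # e.g. "x", "yaw"
    kind: str        # e.g. "straight_line", "twist"
    magnitude: float
    duration: float  # seconds

def is_flick_and_twist(window):
    """A composite: a straight line on x concurrent with a twist about z."""
    has_line = any(g.axis == "x" and g.kind == "straight_line" and g.magnitude > 1.0
                   for g in window)
    has_twist = any(g.axis == "z" and g.kind == "twist" for g in window)
    return has_line and has_twist

recognised = [AtomicGesture("x", "straight_line", 1.4, 0.3),
              AtomicGesture("z", "twist", 0.8, 0.25)]
if is_flick_and_twist(recognised):
    print("trigger output routine")
```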
Interpretation of Shape-Related Iconic Gestures in Virtual Environments BIBAFull-Text 21-33
  Timo Sowa; Ipke Wachsmuth
So far, approaches to gesture recognition have focused mainly on deictic and emblematic gestures. Iconics, viewed as iconic signs in the sense of Peirce, are different from deictics and emblems, for their relation to the referent is based on similarity. In the work reported here, breaking down the complex notion of similarity provides the key idea towards a computational model of gesture semantics for iconic gestures. Based on an empirical study, we describe first steps towards a recognition model for shape-related iconic gestures and its implementation in a prototype gesture recognition system. Observations focus on spatial concepts and their relation to features of iconic gestural expressions. The recognition model is based on a graph-matching method which compares the decomposed geometrical structures of gesture and object.
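As a rough illustration of the graph-matching idea, the sketch below compares two small decomposed structures by searching for a part-to-part correspondence whose geometric attributes agree within a tolerance; the decomposition and the similarity measure in the paper are considerably richer than this toy version, and all names and values here are assumptions.

```python
# Toy sketch of the graph-matching idea: the gestural trace and the object are
# each decomposed into parts with geometric attributes, and a brute-force
# search looks for a correspondence whose attributes agree.
from itertools import permutations

gesture_parts = [{"shape": "round", "size": 0.3}, {"shape": "elongated", "size": 0.8}]
object_parts  = [{"shape": "elongated", "size": 0.75}, {"shape": "round", "size": 0.32}]

def parts_match(a, b, tol=0.1):
    return a["shape"] == b["shape"] and abs(a["size"] - b["size"]) <= tol

def graphs_match(g, o):
    return any(all(parts_match(a, b) for a, b in zip(g, perm))
               for perm in permutations(o, len(g)))

print(graphs_match(gesture_parts, object_parts))   # True: same decomposed structure
```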
Real-Time Gesture Recognition by Means of Hybrid Recognizers BIBAFull-Text 34-46
  Andrea Corradini
In recent times, there have been significant efforts to develop intelligent and natural interfaces for interaction between human users and computer systems by means of a variety of modes of information (visual, audio, pen, etc.). These modes can be used either individually or in combination with other modes. One of the most promising interaction modes for these interfaces is the human user's natural gesture.
   In this work, we apply computer vision techniques to analyze real-time video streams of a user's freehand gestures from a predefined vocabulary. We propose the use of a set of hybrid recognizers, each of which accounts for a single gesture and consists of one hidden Markov model (HMM) whose state emission probabilities are computed by partially recurrent artificial neural networks (ANN).
   The underlying idea is to take advantage of the strengths of ANNs to capture the nonlinear local dependencies of a gesture, while handling its temporal structure within the HMM formalism. The recognition engine's accuracy outperforms that of HMM- and ANN-based recognizers used individually.
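A minimal sketch of the hybrid scheme, under the usual formulation: some network (here just a random stand-in) supplies per-frame, per-state emission scores, and Viterbi decoding over the HMM transition structure captures the temporal ordering. The topology and numbers are illustrative only, not the paper's models.

```python
# Minimal sketch of a hybrid HMM/ANN recognizer: an (assumed) neural network
# yields per-state emission probabilities for each frame, and Viterbi decoding
# over the HMM transition matrix supplies the temporal structure.
import numpy as np

def viterbi_log(emission_logp, trans_logp, init_logp):
    """emission_logp: (T, S) log p(frame_t | state_s); returns best path score."""
    T, S = emission_logp.shape
    delta = init_logp + emission_logp[0]
    for t in range(1, T):
        delta = np.max(delta[:, None] + trans_logp, axis=0) + emission_logp[t]
    return np.max(delta)

# Toy example: 3-state left-to-right model, 5 frames, emissions from a stand-in
# for the recurrent network (random scores here, for illustration only).
rng = np.random.default_rng(0)
emissions = np.log(rng.dirichlet(np.ones(3), size=5))
transitions = np.log(np.array([[0.6, 0.4, 0.0],
                               [0.0, 0.6, 0.4],
                               [0.0, 0.0, 1.0]]) + 1e-12)
initial = np.log(np.array([1.0, 1e-12, 1e-12]))
print(viterbi_log(emissions, transitions, initial))  # score for one gesture model
```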
Development of a Gesture Plug-In for Natural Dialogue Interfaces BIBAKFull-Text 47-58
  Karin Husballe Munk
This paper describes work in progress on the development of a plug-in automatic gesture recogniser intended for a human-computer natural speech-and-gesture dialogue interface module. A model-based approach is suggested for the gesture recognition. Gesture models are built from models of the trajectories that selected fingertips traverse through the gesturer's physical 3D space when the different gestures of interest are performed. In the initial version, gesture capture employs a data glove; computer vision is intended later. The paper outlines the gesture model design at a general level, argues for its choice, and lays out the rationale behind the work as a whole. As the recogniser is not yet fully implemented, no test results can be presented so far.
Keywords: bi-modal dialogue interface, human natural co-verbal gestures, gesture recognition, model-based, spatial gesture model, finger tip trajectory
A Natural Interface to a Virtual Environment through Computer Vision-Estimated Pointing Gestures BIBAFull-Text 59-63
  Thomas B. Moeslund; Moritz Störring; Erik Granum
This paper describes the development of a natural interface to a virtual environment. The interface is based on a natural pointing gesture and replaces the pointing devices normally used to interact with virtual environments. The pointing gesture is estimated in 3D using kinematic knowledge of the arm during pointing and monocular computer vision. The latter is used to extract the 2D position of the user's hand and map it into 3D. Off-line tests show promising results, with an average error of 8 cm when pointing at a screen 2 m away.
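One plausible way to realise the 2D-to-3D mapping with kinematic knowledge (an assumption for illustration, not necessarily the authors' exact formulation) is to intersect the camera ray through the detected hand with a sphere of arm-length radius centred at the shoulder:

```python
# Sketch (not the authors' exact method): recover the 3D hand position by
# intersecting the camera ray through the detected 2D hand with a sphere of
# radius arm_length centred at the (known) shoulder position.
import numpy as np

def hand_3d(pixel, K, shoulder, arm_length):
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])  # camera ray
    ray /= np.linalg.norm(ray)
    # Solve |t*ray - shoulder|^2 = arm_length^2 for the ray parameter t.
    b = -2.0 * ray @ shoulder
    c = shoulder @ shoulder - arm_length**2
    disc = b * b - 4 * c
    if disc < 0:
        return None                      # no intersection: arm not extended enough
    t = (-b + np.sqrt(disc)) / 2.0       # take the far intersection (extended arm)
    return t * ray

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
print(hand_3d((350, 200), K, shoulder=np.array([0.2, -0.1, 2.0]), arm_length=0.7))
```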

Recognition of Sign Language

Towards an Automatic Sign Language Recognition System Using Subunits BIBAFull-Text 64-75
  Britta Bauer; Karl-Friedrich Kraiss
This paper is concerned with the automatic recognition of German continuous sign language. For maximum user-friendliness, only a single color video camera is used for image recording. The statistical approach is based on the Bayes decision rule for minimum error rate. Following the design of speech recognition systems, which are in general based on subunits, the idea of an automatic sign language recognition system using subunits rather than models for whole signs is outlined. The advantage of such a system will be a reduction of the necessary training material. Furthermore, it is expected to simplify the enlargement of the existing vocabulary. Since it is difficult to define subunits for sign language, this approach employs totally self-organised subunits called fenones. The k-means algorithm is used for the definition of such fenones. The software prototype of the system is currently being evaluated in experiments.
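A sketch of how self-organised subunits can be derived with k-means: frame-level feature vectors are clustered and each frame is relabelled with its cluster index, yielding a fenone transcription for every sign. The feature dimensionality and number of clusters below are placeholders, not the paper's settings.

```python
# Sketch of deriving self-organised subunits ("fenones") with k-means:
# frame-level feature vectors are clustered, and each frame is relabelled with
# its cluster index, giving a subunit transcription of every sign.
import numpy as np

def kmeans(frames, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centres = frames[rng.choice(len(frames), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((frames[:, None] - centres) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = frames[labels == j].mean(axis=0)
    return centres, labels

frames = np.random.default_rng(1).normal(size=(500, 12))   # stand-in features
centres, fenone_labels = kmeans(frames, k=8)
print(fenone_labels[:20])   # subunit sequence for the first frames
```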
Signer-Independent Continuous Sign Language Recognition Based on SRN/HMM BIBAKFull-Text 76-85
  Gaolin Fang; Wen Gao; Xilin Chen; Chunli Wang; Jiyong Ma
A divide-and-conquer approach to signer-independent continuous Chinese Sign Language (CSL) recognition is presented in this paper. The problem of continuous CSL recognition is divided into subproblems of isolated CSL recognition. The simple recurrent network (SRN) and hidden Markov models (HMM) are combined in this approach. An improved SRN is introduced for segmentation of continuous CSL. The outputs of the SRN are regarded as the states of the HMM, and the Lattice Viterbi algorithm is employed to search for the best word sequence within the HMM framework. Experimental results show that the SRN/HMM approach performs better than the standard HMM one.
Keywords: Simple recurrent network; hidden Markov models; continuous sign language recognition; Chinese sign language
A Real-Time Large Vocabulary Recognition System for Chinese Sign Language BIBAFull-Text 86-95
  Chunli Wang; Wen Gao; Jiyong Ma
The major challenge facing Sign Language recognition now is to develop methods that will scale well with increasing vocabulary size. In this paper, a real-time system designed for recognizing Chinese Sign Language (CSL) signs with a 5100-sign vocabulary is presented. The raw data are collected from two CyberGloves and a 3-D tracker. An algorithm based on geometrical analysis is proposed to extract features invariant to signer position. The processed data are then presented as input to hidden Markov models (HMMs) for recognition. To improve recognition performance, several new ideas are proposed in the design and implementation, including modifying the transition probabilities, clustering the Gaussians, and a fast matching algorithm. Experiments show that the techniques proposed in this paper are effective in terms of both recognition speed and recognition performance.
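As a hedged illustration of position-invariant feature extraction (the paper's own geometrical analysis is not detailed in the abstract), one common approach is to re-express the tracked hand coordinates in a body-centred frame rather than in absolute tracker coordinates:

```python
# Sketch of one way to make features invariant to signer position (not the
# paper's exact method): express the tracked hand positions in a body-centred
# frame defined by a reference sensor and its orientation.
import numpy as np

def body_centred(hand_pos, ref_pos, ref_rot):
    """hand_pos, ref_pos: (3,) world coordinates; ref_rot: (3,3) rotation of the
    reference (e.g. torso) sensor. Returns the hand expressed in that frame."""
    return ref_rot.T @ (hand_pos - ref_pos)

hand = np.array([0.45, 1.20, 0.80])
torso = np.array([0.40, 1.00, 0.75])
R = np.eye(3)                        # torso orientation (identity for the example)
print(body_centred(hand, torso, R))  # same output wherever the signer stands
```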
The Recognition of Finger-Spelling for Chinese Sign Language BIBAFull-Text 96-100
  Jiangqin Wu; Wen Gao
In this paper, a three-layer feedforward network is introduced to recognize the Chinese manual alphabet, and the Single Parameter Dynamic Search algorithm (SPDS) is used to learn the network parameters. In addition, a recognition algorithm based on multiple features and multiple classifiers is proposed to improve the recognition performance of finger-spelling. Experimental results show that Chinese finger-spelling recognition based on multiple features and multiple classifiers outperforms recognition based on a single classifier.
Overview of Capture Techniques for Studying Sign Language Phonetics BIBAFull-Text 101-104
  Martha E. Tyrone
The increased availability of technology to measure human movement has presented exciting new possibilities for analysing natural sign language production. Up until now, most descriptions of sign movement have been produced in the context of theoretical phonology. While such descriptions are useful, they have the potential to mask subtle distinctions in articulation across signers or across sign languages. This paper seeks to describe the advantages and disadvantages of various technologies used in sign articulation research.

Gesture and Sign Language Synthesis

Models with Biological Relevance to Control Anthropomorphic Limbs: A Survey BIBAFull-Text 105-119
  Sylvie Gibet; Pierre-Francois Marteau; Frédéric Julliard
This paper is a review of different approaches and models underlying the voluntary control of human hand-arm movement. These models, dedicated to artificial movement simulation with applications to motor control, robotics, and computer animation, are categorized along at least three axes: direct vs. inverse models, dynamic vs. kinematic models, and global vs. local models. We focus on sensory-motor models which have a biologically relevant control scheme for hand-arm reaching movements. Different methods are presented from various points of view, related to kinematics, dynamics, control theory, optimization, and learning theory.
Lifelike Gesture Synthesis and Timing for Conversational Agents BIBAFull-Text 120-133
  Ipke Wachsmuth; Stefan Kopp
Synchronization of synthetic gestures with speech output is one of the goals for embodied conversational agents, which have become a new paradigm for the study of gesture and for human-computer interfaces. In this context, this contribution presents an operational model that enables lifelike gesture animations of an articulated figure to be rendered in real time from representations of spatiotemporal gesture knowledge. Based on various findings on the production of human gesture, the model provides means for motion representation, planning, and control to drive the kinematic skeleton of a figure which comprises 43 degrees of freedom in 29 joints for the main body and 20 DOF for each hand. The model is conceived to enable cross-modal synchrony with respect to the coordination of gestures with the signal generated by a text-to-speech system.
SignSynth: A Sign Language Synthesis Application Using Web3D and Perl BIBAFull-Text 134-145
  Angus B. Grieve-Smith
Sign synthesis (also known as text-to-sign) has recently seen a large increase in the number of projects under development. Many of these focus on translation from spoken languages, but other applications include dictionaries and language learning. I will discuss the architecture of typical sign synthesis applications and mention some of the applications and prototypes currently available. I will focus on SignSynth, a CGI-based articulatory sign synthesis prototype I am developing at the University of New Mexico. SignSynth takes as its input a sign language text in ASCII-Stokoe notation (chosen as a simple starting point) and converts it to an internal feature tree. This underlying linguistic representation is then converted into a three-dimensional animation sequence in Virtual Reality Modeling Language (VRML or Web3D), which is automatically rendered by a Web3D browser.
Synthetic Animation of Deaf Signing Gestures BIBAFull-Text 146-157
  Richard Kennaway
We describe a method for automatically synthesizing deaf signing animations from a high-level description of signs in terms of the HamNoSys transcription system. Lifelike movement is achieved by combining a simple control model of hand movement with inverse kinematic calculations for placement of the arms. The realism can be further enhanced by mixing the synthesized animation with motion capture data for the spine and neck, to add natural "ambient motion".
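The inverse kinematic placement of the arms could, in its simplest planar form, look like the classic two-link solution sketched below; this is a simplification for illustration, not the paper's full hand-movement control model.

```python
# Sketch of planar two-link inverse kinematics of the kind used to place an arm
# so the hand reaches a target location (a deliberate simplification).
import math

def two_link_ik(x, y, upper=0.3, fore=0.3):
    """Return (shoulder, elbow) angles in radians reaching target (x, y)."""
    d2 = x * x + y * y
    cos_elbow = (d2 - upper**2 - fore**2) / (2 * upper * fore)
    cos_elbow = max(-1.0, min(1.0, cos_elbow))       # clamp for unreachable targets
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(fore * math.sin(elbow),
                                             upper + fore * math.cos(elbow))
    return shoulder, elbow

print(two_link_ik(0.4, 0.2))
```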
From a Typology of Gestures to a Procedure for Gesture Production BIBAFull-Text 158-168
  Isabella Poggi
A typology of gestures is presented based on four parameters: whether the gesture necessarily occurs with the verbal signal or not, whether it is represented in memory or created anew, how arbitrary or motivated it is, and what type of meaning it conveys. According to the second parameter, gestures are distinguished into codified gestures, which are represented in memory, and creative gestures, which are created on the spot by applying a set of generative rules. On the basis of this typology, a procedure is presented to generate the different types of gestures in a Multimodal Embodied Agent.
A Signing Avatar on the WWW BIBAKFull-Text 169-172
  Margriet Verlinden; Corrie Tijsseling; Han Frowein
The work described is part of the European project ViSiCAST, which is developing an avatar presented in a web page, signing the current weather forecast. The daily content of the forecast is created by semi-automatic conversion of a written weather forecast into sign language. All signs needed for weather forecasts were separately recorded by way of motion capture. The sign language forecast is displayed on a computer by animating fluently blended sequences of pre-recorded signs. Through a browser plug-in, this application will work on the World Wide Web.
Keywords: animation; avatars; sign language; translation; world-wide-web

Nature and Notation of Sign Language

Iconicity in Sign Language: A Theoretical and Methodological Point of View BIBAFull-Text 173-180
  Marie-Anne Sallandre; Christian Cuxac
This research was carried out within the framework of the linguistic theory of iconicity and cognitive grammar for French Sign Language (FSL). In this paper we briefly explain some crucial elements used to analyse any Sign Language (SL), especially transfer operations, which appear to make up the core of a spatial grammar. Then we present examples taken from our video database of native deaf signers engaged in narrative activities. Finally we discuss the difficulty as well as the importance of studying highly iconic occurrences in uninterrupted spontaneous FSL discourse.
Notation System and Statistical Analysis of NMS in JSL BIBAFull-Text 181-192
  Kazuyuki Kanda; Akira Ichikawa; Yuji Nagashima; Yushi Kato; Mina Terauchi; Daisuke Hara; Masanobu Sato
To describe non-manual signals (NMS's) of Japanese Sign Language (JSL), we have developed the notational system sIGNDEX. The notation describes both JSL words and NMS's. We specify characteristics of sIGNDEX in detail. We have also made a linguistic corpus that contains 100 JSL utterances. We show how sIGNDEX successfully describes not only manual signs but also NMS's that appear in the corpus. Using the results of the descriptions, we conducted statistical analyses of NMS's, which provide us with intriguing facts about frequencies and correlations of NMS's.
Head Movements and Negation in Greek Sign Language BIBAFull-Text 193-196
  Klimis Antzakas; Bencie Woll
This paper is part of a study examining how negation is marked in Greek Sign Language (GSL). Head movements which are reported to mark negation in other sign languages have been examined to see if they are also used in GSL, along with negation signs and signs with incorporated negation. Of particular interest is the analysis of the backward tilt of the head, which is distinctive for marking negation in GSL.
Study on Semantic Representations of French Sign Language Sentences BIBAFull-Text 197-201
  Fanch Lejeune; Annelies Braffort; Jean-Pierre Desclés
This study addresses the problem of semantic representation of French Sign Language (FSL) sentences. We studied in particular static situations (spatial localisations) and situations denoted by motion verbs. The aim is to propose models which could be implemented and integrated into computing systems dedicated to Sign Language (SL). In accordance with how FSL functions, we suggest a framework using representations based on cognitive grammars.
SignWriting-Based Sign Language Processing BIBAKFull-Text 202-205
  Antônio Carlos da Rocha Costa; Graçaliz Pereira Dimuro
This paper proposes an approach to the computer processing of deaf sign languages that uses SignWriting as the writing system for deaf sign languages, and SWML (SignWriting Markup Language) as its computer encoding. Every kind of language and document processing (storage and retrieval, analysis and generation, translation, spellchecking, search, animation, dictionary automation, etc.) can be applied to sign language texts and phrases when they are written in SignWriting and encoded in SWML. This opens the whole area of deaf sign languages to the methods and techniques of text-oriented computational linguistics.
Keywords: Sign language processing; SignWriting; SWML
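To make the encoding idea concrete, here is a deliberately made-up SWML-like fragment and the few lines needed to process it; the element and attribute names are assumptions for the sketch and do not reproduce the actual SWML schema.

```python
# Illustrative only: an invented SWML-like encoding of one sign box, parsed
# with the standard library. Element and attribute names are assumptions.
import xml.etree.ElementTree as ET

swml_text = """
<signbox>
  <symbol code="01-01-001-01" x="12" y="30"/>
  <symbol code="02-03-005-02" x="18" y="44"/>
</signbox>
"""

signbox = ET.fromstring(swml_text)
for sym in signbox.findall("symbol"):
    print(sym.get("code"), (int(sym.get("x")), int(sym.get("y"))))
```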

Gestural Action & Interaction

Visual Attention towards Gestures in Face-to-Face Interaction vs. on Screen BIBAFull-Text 206-214
  Marianne Gullberg; Kenneth Holmqvist
Previous eye-tracking studies of whether recipients look at speakers' gestures have yielded conflicting results but have also differed in method. This study aims to isolate the effect of the medium of presentation on recipients' fixation behaviour towards speakers' gestures by comparing fixations of the same gestures either performed live in a face-to-face condition or presented on video, ceteris paribus. The results show that although fewer gestures are fixated on video, fixation behaviour towards gestures is largely similar across conditions. In discussing the absence of a live interlocutor vs. the projection size as a source of this reduction, we touch on some underlying mechanisms governing gesture fixations. The results are pertinent to man-machine interface issues as well as to the ecological validity of video-based paradigms needed to study the relationship between visual and cognitive attention to gestures and Sign Language.
Labeling of Gestures in SmartKom -- The Coding System BIBAKFull-Text 215-227
  Silke Steininger; Bernd Lindemann; Thorsten Paetzold
The SmartKom project is concerned with the development of an intelligent computer-user interface that allows almost natural communication and gesture input. For the training of the gesture analyzer, data are collected in so-called Wizard-of-Oz experiments. Recordings of subjects are made and labeled off-line with respect to the gestures that were used. This article is concerned with the coding of these gestures. The presented concept is the first step in the development of a practical gesture coding system specifically designed for the description of communicative and non-communicative gestures that typically show up in human-machine dialogues. After a short overview of the development process and the special requirements of the project, the labels are described in detail. We conclude with a short outline of open points and differences from Ekman's well-known taxonomy of gestures [1].
Keywords: Human-Machine interaction, annotation of corpora, multimodal dialogue systems, gesture coding system
Evoking Gestures in SmartKom -- Design of the Graphical User Interface BIBAKFull-Text 228-240
  Nicole Beringer
The aim of the SmartKom project is to develop an intelligent, multimodal computer-user interface which can deal with various kinds of input and allows quasi-natural communication between user and machine. This contribution is concerned with the design of the Graphical User Interface (GUI) for the Wizard-of-Oz recordings. Our special interest was to create a display, different from familiar internet applications, that could motivate users to communicate with the machine via gestures as well. The following sections give a short overview of the system itself, followed by a detailed description of the methods used and the results of the first phase. To conclude, a short overview of the methods we implemented in the second phase is given.
Keywords: GUI-design, human-machine interaction, multimodal dialogue systems, gestural input
Quantitative Analysis of Non-obvious Performer Gestures BIBAFull-Text 241-253
  Marcelo M. Wanderley
This article presents preliminary quantitative results from movement analysis of several clarinet performers with respect to non-obvious or ancillary gestures produced while playing a piece. The comparison of various performances of a piece by the same clarinetist shows a high consistency of movement patterns. Different clarinetists show different overall patterns, although clear similarities may be found, suggesting the existence of various levels of information in the resulting movement. The relationship of these non-obvious gestures to material/physiological, structural, and interpretative parameters is highlighted.
Interactional Structure Applied to the Identification and Generation of Visual Interactive Behavior: Robots that (Usually) Follow the Rules BIBAFull-Text 254-267
  Bernard Ogden; Kerstin Dautenhahn; Penny Stribling
This chapter outlines the application of interactional structures observed by various researchers to the development of artificial interactive agents. The original work from which these structures are drawn has been carried out by researchers in a range of fields including anthropology, sociology and social psychology: the 'local approach' described in this paper draws particularly on conversation analysis. We briefly discuss the application of heuristics derived from this work to the development of an interaction tracking system and, in more detail, discuss the use of this work in the development of an architecture for generating action for an interactive agent.
Are Praxical Gestures Semiotised in Service Encounters? BIBAFull-Text 268-271
  Isabelle Dumas
Empirically based on the study of praxical gestures in service encounters, this paper questions the possibility of semiotisation for such gestures. Praxical gestures are supposed to be extra-communicative and thus "unsemiotised". In reality, although they are not coded in a systematic way, their meaning is fundamentally connected to their context of realisation. An analysis of the contexts in which praxical gestures appear in service interactions demonstrates that they become "semiotised" when they are put into context and that they play a full part in the script of service interactions.

Applications Based on Gesture Control

Visually Mediated Interaction Using Learnt Gestures and Camera Control BIBAKFull-Text 272-284
  A. Jonathan Howell; Hilary Buxton
In this paper we introduce connectionist techniques for visually mediated interaction to be used, for example, in video-conferencing applications. First, we briefly present background work on recognition of identity, expression and pose using Radial Basis Function (RBF) networks. Flexible, example-based learning methods allow a set of specialised networks to be trained. Second, we address the problem of gesture-based communication and attentional focus using time-delay versions of the networks. Colour/motion cues are used to direct face detection and the capture of 'attentional frames' surrounding the upper torso and head of the subjects, which focus the processing for visually mediated interaction. Third, we present methods for gesture recognition and behaviour (user-camera) coordination in the system. In this work, we take an appearance-based approach and use the specific phases of communicative gestures to control the camera systems in an integrated system.
Keywords: Gesture Recognition; Computer Vision; Visually Mediated Interaction; Camera Control; Face Recognition; Time-Delay Neural Networks
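For readers unfamiliar with RBF networks, the forward pass the paper builds on can be sketched in a few lines; the centres and weights below are random placeholders rather than a trained model, and a time-delay variant would simply stack several past frames into the input vector.

```python
# Sketch of a radial basis function (RBF) network forward pass: Gaussian
# activations over learned centres, linearly combined into class scores.
import numpy as np

def rbf_forward(x, centres, widths, weights):
    """x: (D,), centres: (K, D), widths: (K,), weights: (K, C) -> class scores."""
    act = np.exp(-np.sum((x - centres) ** 2, axis=1) / (2 * widths ** 2))
    return act @ weights

rng = np.random.default_rng(0)
centres = rng.normal(size=(10, 8))      # 10 hidden units over an 8-D input
widths = np.ones(10)
weights = rng.normal(size=(10, 4))      # 4 output classes (e.g. gesture phases)
print(rbf_forward(rng.normal(size=8), centres, widths, weights))
```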
Gestural Control of Sound Synthesis and Processing Algorithms BIBAFull-Text 285-295
  Daniel Arfib; Loïc Kessous
Computer programs such as MUSIC V or CSOUND have led to a huge number of sound examples, in both the synthesis and the processing domains. The translation of such algorithms to real-time programs such as MAX-MSP allows these digitally created sounds to be used effectively in performance. This includes interpretation, expressivity, or even improvisation and creativity. This particular bias of our project (from sound to gesture) brings about new questions, such as the choice of strategies for gesture control and feedback, as well as the mapping of peripheral data to synthesis and processing data. A learning process is required for these new controls, and the issue of virtuosity versus simplicity is an everyday challenge.
Juggling Gestures Analysis for Music Control BIBAFull-Text 296-306
  Aymeric Willier; Catherine Marque
The aim of this work is to provide jugglers with gestural control of music. It is based on the desire to control music by recycling mastered gestures from another art. We therefore propose a gestural acquisition system based on processing of the electromyographic signal. Electromyograms are recorded during a three-ball cascade from chosen muscles which play a specific role in the juggling gesture. Processing of these signals is proposed in order to control musical events by means of parameters related to the juggling gesture.
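A sketch of the kind of electromyogram processing that could drive a musical parameter, under the common rectify-and-smooth assumption (the paper's actual processing chain and muscle selection are not reproduced here):

```python
# Sketch: rectify the raw EMG and low-pass it into an amplitude envelope
# (simple moving average), which can then be mapped to a musical parameter.
import numpy as np

def emg_envelope(emg, fs=1000, window_ms=100):
    rectified = np.abs(emg)
    n = max(1, int(fs * window_ms / 1000))
    return np.convolve(rectified, np.ones(n) / n, mode="same")

t = np.arange(0, 1.0, 1 / 1000)
emg = np.sin(2 * np.pi * 8 * t) * np.random.default_rng(0).normal(size=t.size)
envelope = emg_envelope(emg)
print(envelope.max())   # e.g. mapped to the loudness of a triggered sound
```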
Hand Postures for Sonification Control BIBAFull-Text 307-316
  Thomas Hermann; Claudia Nölker; Helge Ritter
Sonification is a rather new technique in human-computer interaction which addresses auditory perception. In contrast to speech interfaces, sonification uses non-verbal sounds to present information. The most common sonification technique is parameter mapping, where for each data point a sonic event is generated whose acoustic attributes are determined from the data values by a mapping function. For acoustic data exploration, this mapping must be adjusted or manipulated by the user. We propose the use of hand postures as a particularly natural and intuitive means of parameter manipulation for this data exploration task. As a demonstration prototype we developed a hand posture recognition system for gestural control of sound. The presented implementation applies artificial neural networks for the identification of continuous hand postures from camera images and uses a real-time sound synthesis engine. In this paper, we present our system and first applications of the gestural control of sounds. Techniques for applying gestures to control sonification are proposed and sound examples are given.
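Parameter mapping as described above can be sketched very compactly; the pitch and loudness ranges below are arbitrary choices, and in the paper's setting a recognised hand posture would adjust such a mapping interactively.

```python
# Minimal sketch of parameter-mapping sonification: each data point becomes a
# sonic event whose pitch and loudness are computed from the data values.
import numpy as np

def map_to_events(data, f_lo=220.0, f_hi=880.0, a_lo=0.1, a_hi=1.0):
    d = (data - data.min(axis=0)) / (np.ptp(data, axis=0) + 1e-9)
    return [{"onset_s": 0.1 * i,
             "freq_hz": f_lo + d[i, 0] * (f_hi - f_lo),
             "amp": a_lo + d[i, 1] * (a_hi - a_lo)}
            for i in range(len(d))]

data = np.random.default_rng(0).normal(size=(5, 2))   # stand-in data set
for ev in map_to_events(data):
    print(ev)
```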
Comparison of Feedforward (TDRBF) and Generative (TDRGBN) Network for Gesture Based Control BIBAFull-Text 317-321
  Helen Vassilakis; A. Jonathan Howell; Hilary Buxton
In Visually Mediated Interaction (VMI) there is a range of tasks that need to be supported (face and gesture recognition, gesture-controlled cameras, visual interaction, etc.). These tasks vary in complexity. Generative and self-organising models may offer strong advantages over feedforward ones in cases where a higher degree of generalization is needed. They have the ability to model the density function that generates the data, and this gives the potential of understanding a gesture independently of individual differences in how it is performed. This paper presents a comparison between a feedforward network (RBFN) and a generative one (RGBN), both extended in time-delay versions.