
GW 2003: Gesture Workshop

Fullname: GW 2003: Gesture-Based Communication in Human-Computer Interaction: 5th International Gesture Workshop, Selected Revised Papers
Editors: Antonio Camurri; Gualtiero Volpe
Location: Genova, Italy
Dates: 2003-Apr-15 to 2003-Apr-17
Publisher: Springer Berlin Heidelberg, 2004
Series: Lecture Notes in Computer Science 2915
Standard No: DOI: 10.1007/b95740; hcibib: GW03; ISBN: 978-3-540-21072-6 (print), 978-3-540-24598-8 (online)
Links: Online Proceedings | Conference Website (defunct)
  1. Foundational Issues
  2. Gesture Tracking
  3. Gesture Recognition
  4. Gesture Notation and Synthesis
  5. Multimodal Gestural Interfaces
  6. Gesture in Multimedia and Performing Arts

Foundational Issues

Gesture Analysis: Invariant Laws in Movement, pp. 1-9
  Sylvie Gibet; Jean-François Kamp; Franck Poirier
This paper presents gesture analysis from the perspective of motor control theory. Following the motor program view, studies have revealed a number of invariant features that characterize movement trajectories in human hand-arm gestures. These features express general spatio-temporal laws underlying coordination and motor control processes. Some typical invariants are described and illustrated for planar pointing and tracing gestures. We finally discuss how these invariant laws can be used for motion editing and generation.
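One well-documented invariant of the kind surveyed here, for planar tracing gestures, is the two-thirds power law relating tangential speed to curvature, v(t) = K · κ(t)^(-1/3). As an illustrative sketch (ours, not the authors' code; the ellipse parameters are arbitrary), elliptic harmonic motion obeys the law exactly, which can be checked numerically:

```python
import numpy as np

# Elliptic harmonic motion x = a*cos(t), y = b*sin(t) satisfies the
# two-thirds power law exactly: speed v = K * curvature**(-1/3).
t = np.linspace(0, 2 * np.pi, 2000)
a, b = 3.0, 1.0                      # ellipse semi-axes (arbitrary)
x, y = a * np.cos(t), b * np.sin(t)

# First and second derivatives by finite differences
dx, dy = np.gradient(x, t), np.gradient(y, t)
ddx, ddy = np.gradient(dx, t), np.gradient(dy, t)

speed = np.hypot(dx, dy)
curvature = np.abs(dx * ddy - dy * ddx) / speed**3

# Fit log v = log K + beta * log curvature; the law predicts beta = -1/3
beta, logK = np.polyfit(np.log(curvature), np.log(speed), 1)
print(round(beta, 3))  # close to -0.333
```

Fitting the exponent on recorded hand trajectories, rather than on synthetic ones, is one way such invariants are tested empirically.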
The Features People Use to Recognize Human Movement Style, pp. 10-19
  Frank E. Pollick
Observation of human movement informs a variety of person properties. However, it is unclear how this understanding of person properties is derived from the complex visual stimulus of human movement. I address this topic by first reviewing the literature on the visual perception of human movement and then discussing work that has explored the features used to discriminate between different styles of movement. This discussion includes work on quantifying human performance at style recognition, exaggeration of human movement and finally experimental derivation of a feature space to represent human emotion.
Multimodal Analysis of Expressive Gesture in Music and Dance Performances, pp. 20-39
  Antonio Camurri; Barbara Mazzarino; Matteo Ricchetti; Renee Timmers; Gualtiero Volpe
This paper presents ongoing research on the modelling of expressive gesture in multimodal interaction and on the development of multimodal interactive systems explicitly taking into account the role of non-verbal expressive gesture in the communication process. In this perspective, a particular focus is on dance and music as first-class conveyors of expressive and emotional content. Research outputs include (i) computational models of expressive gesture, (ii) validation by means of continuous ratings on spectators exposed to real artistic stimuli, and (iii) novel hardware and software components for the EyesWeb open platform (www.eyesweb.org), such as the recently developed Expressive Gesture Processing Library. The paper starts with a definition of expressive gesture. A unifying framework for the analysis of expressive gesture is then proposed. Finally, two experiments on expressive gesture in dance and music are discussed. This research work has been supported by the EU IST project MEGA (Multisensory Expressive Gesture Applications, www.megaproject.org) and the EU MOSART TMR Network.
Correlation of Gestural Musical Audio Cues and Perceived Expressive Qualities, pp. 40-54
  Marc Leman; Valery Vermeulen; Liesbeth De Voogdt; Johannes Taelman; Dirk Moelants; Micheline Lesaffre
An empirical study on the perceived semantic quality of musical content and its relationship with perceived structural audio features is presented. In a first study, subjects had to judge a variety of musical excerpts using adjectives describing different emotive/affective/expressive qualities of music. Factor analysis revealed three dimensions, related to valence, activity and interest. In a second study, semantic judgements were then compared with automated and manual structural descriptions of the musical audio signal. Applications of the results in domains of audio-mining, interactive multimedia and brain research are straightforward.
Gestural Imagery in the Service of Musical Imagery, pp. 55-62
  Rolf Inge Godøy
There seem to be strong links between gestural imagery and musical imagery, and it is suggested that gestural imagery can be instrumental in triggering and sustaining mental images of musical sound. Gestural images are seen as integral to most experiences of music, and several practical and theoretical musical disciplines could profit from focusing on these gestural images. Research in support of this is reviewed, and some topics for future research are presented.
The Interaction of Iconic Gesture and Speech in Talk, pp. 63-69
  Judith Holler; Geoffrey Beattie
One traditional view of how speech and gesture interact in talk is that gestures represent information that is largely redundant with respect to the information contained in the speech they accompany. Other researchers, however, have primarily stressed a complementary interaction of gesture and speech, and yet others have emphasised that gesture and speech interact in a very flexible manner. These discrepant views have crucially different implications for the communicative role of gestures. The study reported here offers a systematic and detailed investigation of this issue to gain further insights into how the two modalities interact in the representation of meaning. The findings support the notion of gesture and speech interacting in a highly flexible manner.
Conceptual and Lexical Factors in the Production of Speech and Conversational Gestures: Neuropsychological Evidence, pp. 70-76
  Carla Cristilli; Sergio Carlomagno
Parallel breakdown of gestures and speech following neurological damage supports the notion that "... gestures and speech ... share a computational stage..." (McNeill, 1992) and emphasizes the role of the neuropsychological approach in studying gestures. In this study, patterns of conversational gestures were analysed in subjects with Alzheimer's dementia whose communicative performance indicated either a primary lexical deficit or a deficit in the pragmatic/conceptual elaboration of discourse. Two gesture patterns were identified. The demented subjects with lexical deficit mostly produced iconic gestures, which accompanied paraphasic expressions related to discriminating information. Conversely, the pragmatic/conceptual deficit corresponded to reduced production of iconic and increased production of deictic gestures. These findings indicate that the cognitive impairment underlying a communicative deficit constrains the production of conversational gestures in brain-damaged patients. They also support the hypothesis that the early conceptual processes of the speech production system play a central role in producing gestures.
The Communicative System of Touch. Alphabet, Lexicon, and Norms of Use, pp. 77-89
  Isabella Poggi; Filomena Cirella; Antonietta Zollo; Alessia Agostini
The paper argues that the communicative system of touch includes a lexicon, an alphabet and some norms of use, and presents research aimed at making them explicit. 104 items of touch were analysed in terms of their formational parameters, various semantic criteria, and norms of use, and some hypotheses on the structure of the communicative system of touch in Italy were tested in a pilot study. Then the communicative use of touch in 3 mother-child couples was analysed, showing how the proposed criteria of analysis allow one to distinguish different styles of mother-child interaction.
Some Issues in Sign Language Processing, pp. 90-100
  Bruno Bossard; Annelies Braffort; Michèle Jardino
The aim of this paper is to specify some of the problems raised by the design of a gesture recognition system dedicated to Sign Language, and to propose suited solutions. The three topics considered here concern the simultaneity of information conveyed by manual signs, the possible temporal or spatial synchronicity between the two hands, and the different classes of signs that may be encountered in a Sign Language sentence.
Multimodality and Gestures in the Teacher's Communication, pp. 101-111
  Giorgio Merola; Isabella Poggi
The paper presents research on the multimodal communication of teachers in the classroom. The "musical score", a procedure for the analysis of multimodal communication, is used to analyse aspects of the signals produced and of the meanings conveyed by teachers while interacting with their pupils, focusing not only on affective and interactional aspects but also on the cognitive effects of nonverbal communication. Finally, the paper shows how, based on this procedure, it is possible to analyse chunks of teachers' communication and to distinguish different multimodal communicative styles among teachers.
Deixis in Multimodal Human Computer Interaction: An Interdisciplinary Approach, pp. 112-123
  Alfred Kranstedt; Peter Kühnlein; Ipke Wachsmuth
Focusing on deixis in human-computer interaction, this paper presents interdisciplinary work on the use of co-verbal gesture. Empirical investigations, theoretical modeling, and computational simulations with an anthropomorphic agent are based upon comparable settings and common representations. Findings pertain to the coordination of verbal and gestural constituents in deictic utterances. We discovered high variability in the temporal synchronization of such constituents in task-oriented dialogue, and a theoretical treatment thereof is presented. With respect to simulation, we show by example how the influence of situational characteristics on the choice of verbal and nonverbal constituents can be accounted for. In particular, this depends on the spatio-temporal relations between the speaker and the objects they refer to in dialogue.
The Analysis of Gesture: Establishing a Set of Parameters, pp. 124-131
  Nicla Rossini
Studying gesture has always required choosing among different methods for selecting suitable parameters for its analysis. Several solutions to this problem have been proposed by scholars over the years. These contributions are briefly reviewed and discussed with the aim of deriving a common method for the analysis and definition of gesture.

Gesture Tracking

Holistic Body Tracking for Gestural Interfaces, pp. 132-139
  Christian Lange; Thomas Hermann; Helge Ritter
In this paper we present an approach to tracking a moving body in a sequence of camera images by model adaptation. The parameters of a stick-figure model are varied using a stochastic search algorithm. The similarity between rendered model images and camera images of the user is used as a quality measure. A refinement of the algorithm is introduced by using combined stereo views and relevance maps to infer the responsible joint angles from the difference of successive input images. Finally, the successful application of various versions of the algorithm to sequences of synthetic images is demonstrated.
Recovering Articulated Motion with a Hierarchical Factorization Method, pp. 140-151
  Hanning Zhou; Thomas S. Huang
Recovering articulated human motion is an important task in many applications, including surveillance and human-computer interaction. In this paper, a hierarchical factorization method is proposed for recovering articulated human motion (such as hand gestures) from a sequence of images captured under weak perspective projection. It is robust against missing feature points due to self-occlusion and against various kinds of observation noise. The accuracy of our algorithm is verified by experiments on synthetic data.
An Experimental Comparison of Trajectory-Based and History-Based Representation for Gesture Recognition, pp. 152-163
  Kenny Morrison; Stephen J. McKenna
Automatic visual recognition of gestures can be performed using either a trajectory-based or a history-based representation. The former characterises the gesture using 2D trajectories of the hands. The latter summarises image sequences using values computed from individual pixel histories. A direct experimental comparison of these two approaches is presented using skin colour as a common visual cue and recognition methods based on hidden Markov models, moment features and normalised template matching. Skin history images are proposed as a useful history-based representation. Results are reported on a database of sixty gestures and the relative advantages and disadvantages of the different methods are highlighted.
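The history-based idea can be made concrete with a minimal sketch in the spirit of the paper's skin history images (the function name, decay scheme, and toy data below are ours, not the authors' implementation): each pixel stores how recently it was classified as skin, so a single image summarises a whole sequence.

```python
import numpy as np

def update_history(history, skin_mask, tau=30):
    """Decay old evidence by one step; reset pixels currently skin to tau."""
    decayed = np.maximum(history - 1, 0)
    return np.where(skin_mask, tau, decayed)

# Toy sequence: a 'skin' blob moving right across a 5x5 frame
history = np.zeros((5, 5), dtype=int)
for col in range(3):
    mask = np.zeros((5, 5), dtype=bool)
    mask[2, col] = True
    history = update_history(history, mask, tau=30)

print(history[2, :3])  # older columns have decayed more: [28 29 30]
```

The gradient of recency across the image encodes the direction of motion, which is what makes such a representation usable for template matching or moment features.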
Tracking of Real Time Acrobatic Movements by Image Processing, pp. 164-171
  Ryan Cassel; Christophe Collet
This paper presents the design and evaluation of a video-camera-based system for tracking and evaluating acrobatic movements. The context of this study is human gesture recognition in trampoline competition performance. Our work builds a model of trampoline movement and conducts efficient real-time tracking of the trampolinist in order to recognize and evaluate a competition routine. We describe the prospective architecture of our system, which satisfies the constraints of real-time image processing. The global project includes three consecutive phases: first, body extraction and tracking in the image sequence; second, body part localization; and last, gesture recognition and quantification according to our model of trampoline movement characterization. This paper describes the first phase, which combines image processing techniques to perform fast and efficient extraction and tracking of the trampolinist's body. An evaluation protocol is presented, as well as some results for this first stage.
A Dynamic Model for Real-Time Tracking of Hands in Bimanual Movements, pp. 172-179
  Atid Shamaie; Alistair Sutherland
The problem of hand tracking in the presence of occlusion is addressed. In bimanual movements the hands tend to be synchronised effortlessly. Different aspects of this synchronisation are the basis of our research to track the hands. The spatial synchronisation in bimanual movements is modelled by the position and the temporal synchronisation by the velocity and acceleration of each hand. Based on a dynamic model, we introduce algorithms for occlusion detection and hand tracking.

Gesture Recognition

Robust Video-Based Recognition of Dynamic Head Gestures in Various Domains -- Comparing a Rule-Based and a Stochastic Approach, pp. 180-197
  Gregor McGlaun; Frank Althoff; Manfred K. Lang; Gerhard Rigoll
This work describes two video-based approaches for detecting and classifying dynamic head gestures. We compare a simple, fast, and efficient rule-based algorithm with a powerful, robust, and flexible stochastic implementation. In both realizations, the head is localized via a combination of color- and shape-based segmentation. For continuous feature extraction, the rule-based approach uses template matching of the nose bridge and evaluates this key feature in a finite state machine. The stochastic algorithm instead applies features derived from the optical flow and classifies them with a set of discrete Hidden Markov Models. We extensively tested the systems in two different application domains (a VR desktop scenario vs. an automotive environment). Six different gestures can be classified, with overall recognition rates of 93.7% (rule-based) and 97.3% (stochastic) in the VR setting (92.6% and 95.5%, respectively, in the automotive environment). Both approaches work independently of the image background. With the stochastic concept, further gesture types can easily be added.
Remote Vision-Based Multi-type Gesture Interaction, pp. 198-209
  Christian Brockmann; Heinrich Müller
Gestures offer a possibility of interaction with technical systems when other communication channels are excluded, for instance because of distance, noise, or usage for other purposes. However, gestures as the only mode of interaction lead to the problem of deciding whether a posture or motion of the user is indeed a gesture, in particular if commands are issued only from time to time. In this contribution, we overcome this problem by combining different gesture types. The types we use are static hand gestures based on hand postures, dynamic hand gestures based on hand motions, and pointing gestures based on hand or arm location. The gestures are acquired by computer vision. In remote interaction, a difficulty is that some gesture types require a global view of the interaction space while others, such as hand postures, need local observation. We present a solution in which a camera with computer-controlled pan, tilt, and zoom is controlled both by information captured by this camera and by information captured by static cameras that survey the complete interaction space.
Model-Based Motion Filtering for Improving Arm Gesture Recognition Performance, pp. 210-230
  Greg S. Schmidt; Donald H. House
We describe a model-based motion filtering process that, when applied to human arm motion data, leads to improved arm gesture recognition. Arm movements can be viewed as responses to muscle actuations that are guided by responses of the nervous system. Our motion filtering method makes strides towards capturing this structure by integrating a dynamic model with a control system for the arm. We hypothesize that embedding human performance knowledge into the processing of arm movements will lead to better recognition performance. We present details of the design of our filter and our evaluation of it in both expert-user and multiple-user pilot studies. Our results show that the filter has a positive impact on recognition performance for arm gestures.
GesRec3D: A Real-Time Coded Gesture-to-Speech System with Automatic Segmentation and Recognition Thresholding Using Dissimilarity Measures, pp. 231-238
  Michael P. Craven; K. Mervyn Curtis
A complete microcomputer system, GesRec3D, is described which facilitates the data acquisition, segmentation, learning, and recognition of 3-dimensional arm gestures, with application as an Augmentative and Alternative Communication (AAC) aid for people with motor and speech disabilities. The gesture data is acquired from a Polhemus electro-magnetic tracker system, with sensors attached to the finger, wrist and elbow of one arm. Coded gestures are linked to user-defined text, which is spoken by a text-to-speech engine integrated into the system. A segmentation method and an algorithm for classification are presented which include acceptance/rejection thresholds based on intra-class and inter-class dissimilarity measures. Results for recognition hits, confusion misses and rejection misses are given for two experiments involving predefined and arbitrary 3D gestures.
Classification of Gesture with Layered Meanings, pp. 239-246
  Sylvie C. W. Ong; Surendra Ranganath
Automatic sign language recognition research has largely not addressed an integral aspect of sign language communication -- grammatical inflections, which are conveyed through systematic temporal and spatial movement modifications. We propose to use low-level static and dynamic classifiers, together with Bayesian networks, to classify gestures that include these inflections layered on top of the basic meaning. With a simulated vocabulary of 6 basic signs and 4 different layered meanings, test data from four test subjects was classified with 84.6% accuracy.
Handshapes and Movements: Multiple-Channel American Sign Language Recognition, pp. 247-258
  Christian Vogler; Dimitris N. Metaxas
In this paper we present a framework for recognizing American Sign Language (ASL). The main challenges in developing scalable recognition systems are to devise the basic building blocks from which to build up the signs, and to handle simultaneous events, such as signs where both the hand moves and the handshape changes. The latter challenge is particularly thorny, because a naive approach to handling them can quickly result in a combinatorial explosion.
   We loosely follow the Movement-Hold model to devise a breakdown of the signs into their constituent phonemes, which provide the fundamental building blocks. We also show how to integrate the handshape into this breakdown, and discuss what handshape representation works best. To handle simultaneous events, we split up the signs into a number of channels that are independent from one another. We validate our framework in experiments with a 22-sign vocabulary and up to three channels.
Hand Postures Recognition in Large-Display VR Environments, pp. 259-268
  Jean-Baptiste de la Rivière; Pascal Guitton
Large-display environments like the Reality Center or Powerwall are recent equipment used in the Virtual Reality (VR) field. In contrast to HMDs or similar displays, they allow several unadorned users to visualize a virtual environment. Bringing interaction possibilities to these displays must not suppress the users' liberty. Thus, tracker-based devices like the DataGlove or wand should be avoided, as they oblige users to don such gear. By contrast, video cameras seem very promising in these environments: their use could range from looking for a laser dot on the display to recovering each user's full body posture. The goal we are considering is to film a user's hand in front of a large display in order to recover its posture, which is then interpreted according to a predefined interaction technique. While most such systems rely on appearance-based approaches, we have chosen to investigate how effective a model-based approach can be. This paper presents the first steps of this work, namely the real-time results obtained by using the hand silhouette feature, and some further conclusions related to working in a large-display VR environment.
Developing Task-Specific RBF Hand Gesture Recognition, pp. 269-276
  A. Jonathan Howell; Kingsley Sage; Hilary Buxton
In this paper we develop hand gesture learning and recognition techniques to be used in advanced vision applications, such as the ActIPret system for understanding the activities of expert operators for education and training. Radial Basis Function (RBF) networks have been developed for reactive vision tasks and work well, exhibiting fast learning and classification. Specific extensions of our existing work to allow more general 3-D activity analysis reported here are: 1) action-based representation in a hand frame-of-reference by pre-processing of the trajectory data; 2) adaptation of the time-delay RBF network scheme to use this relative velocity information from the 3-D trajectory information in gesture recognition; and 3) development of multi-task support in the classifications by exploiting prototype similarities extracted from different combinations of direction (target tower) and height (target pod) for the hand trajectory.
Developing Context Sensitive HMM Gesture Recognition, pp. 277-287
  Kingsley Sage; A. Jonathan Howell; Hilary Buxton
We are interested in methods for building cognitive vision systems to understand the activities of expert operators for our ActIPret System. Our approach to the gesture recognition required here is to learn generic models and to develop methods for contextual bias of the visual interpretation in the online system. The paper first introduces issues in the development of such flexible and robust gesture learning and recognition, with a brief discussion of related research. Second, the computational model for the Hidden Markov Model (HMM) is described, and results with varying amounts of noise in the training and testing phases are given. Third, extensions of this work to allow both top-down bias in the contextual processing and bottom-up augmentation by moment-to-moment observation of the hand trajectory are described.
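The HMM classification step underlying this and several other papers in this section can be sketched compactly: score an observation sequence against one discrete HMM per gesture with the forward algorithm and pick the best-scoring model. The sketch below is ours, not the ActIPret system; the gesture names and probability tables are hypothetical.

```python
import numpy as np

def log_forward(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM."""
    alpha = pi * B[:, obs[0]]
    log_p = 0.0
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()          # rescale to avoid numerical underflow
        log_p += np.log(s)
        alpha /= s
    return log_p + np.log(alpha.sum())

# Two toy 2-state models over a 3-symbol alphabet (hypothetical numbers)
pi = np.array([1.0, 0.0])
A = np.array([[0.8, 0.2], [0.0, 1.0]])          # left-to-right topology
B_wave = np.array([[0.9, 0.05, 0.05], [0.05, 0.9, 0.05]])
B_point = np.array([[0.05, 0.05, 0.9], [0.9, 0.05, 0.05]])
models = {"wave": (pi, A, B_wave), "point": (pi, A, B_point)}

obs = [0, 0, 1, 1, 1]            # a quantised feature sequence
best = max(models, key=lambda g: log_forward(obs, *models[g]))
print(best)  # "wave"
```

Contextual bias of the kind the paper describes amounts to adding a prior log-probability per gesture to these scores before taking the maximum.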
Database Indexing Methods for 3D Hand Pose Estimation, pp. 288-299
  Vassilis Athitsos; Stan Sclaroff
Estimation of 3D hand pose is useful in many gesture recognition applications, ranging from human-computer interaction to recognition of sign languages. In this paper, 3D hand pose estimation is treated as a database indexing problem. Given an input image of a hand, the most similar images in a large database of hand images are retrieved. The hand pose parameters of the retrieved images are used as estimates for the hand pose in the input image. Lipschitz embeddings are used to map edge images of hands into a Euclidean space. Similarity queries are initially performed in this Euclidean space, to quickly select a small set of candidate matches. These candidate matches are finally ranked using the more computationally expensive chamfer distance. Using Lipschitz embeddings to select likely candidate matches greatly reduces retrieval time over applying the chamfer distance to the entire database, without significant losses in accuracy.
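The two-stage retrieval idea can be illustrated with a hedged sketch (ours, not the authors' code; the toy database and the stand-in distance function are hypothetical): a Lipschitz-style embedding maps each item to its vector of distances from a few reference objects, a cheap Euclidean search in that space selects candidates, and only the candidates are re-ranked with the expensive original distance.

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.normal(size=(500, 16))          # stand-in for the hand-image database
query = db[123] + 0.01 * rng.normal(size=16)

def expensive_dist(u, v):                # stand-in for the chamfer distance
    return np.linalg.norm(u - v)

refs = db[rng.choice(len(db), 8, replace=False)]   # reference objects

def embed(x):
    # Lipschitz embedding: coordinates are distances to reference objects
    return np.array([expensive_dist(x, r) for r in refs])

emb_db = np.array([embed(x) for x in db])
emb_q = embed(query)

# Stage 1: cheap Euclidean filter down to k candidates
k = 20
candidates = np.argsort(np.linalg.norm(emb_db - emb_q, axis=1))[:k]
# Stage 2: exact re-ranking of the candidates only
best = min(candidates, key=lambda i: expensive_dist(db[i], query))
print(best)  # 123
```

The triangle inequality guarantees the embedding is contractive, which is why near neighbours under the expensive distance tend to survive the cheap filter.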

Gesture Notation and Synthesis

Experience with and Requirements for a Gesture Description Language for Synthetic Animation, pp. 300-311
  Richard Kennaway
We have created software for automatic synthesis of signing animations from the HamNoSys transcription notation. In this process we have encountered certain shortcomings of the notation. We describe these, and consider how to develop a notation more suited to computer animation.
The Development of a Computational Notation for Synthesis of Sign and Gesture, pp. 312-323
  Kirsty Crombie Smith; William H. Edmondson
This paper presents a review of four current notation systems used in sign language research. Their properties are discussed with a view to using such systems for synthesising sign with a computer. The evaluation leads to a proposal for a new notational approach, which distinguishes three layers of description in the production of sign. Experimental work is summarised which constrains the synthesis of signs to match the requirements of visual perception. The new notation is described in detail with illustrative examples at each of the three layers. The notation is being used in experimental work on sign synthesis, and it is envisaged that this work will extend to include synthesis of gesture.
Gesture in Style, pp. 324-337
  Han Noot; Zsófia Ruttkay
GESTYLE is a new markup language for annotating text to be spoken by Embodied Conversational Agents (ECAs), prescribing the usage of hand, head and facial gestures accompanying the speech in order to augment the communication. The annotation ranges from low-level instructions (e.g. perform a specific gesture) to high-level ones (e.g. take a turn in a conversation). On top of that, and central to GESTYLE, is the notion of style, which determines the gesture repertoire and the gesturing manner of the ECA. GESTYLE contains constructs to define and dynamically modify style. The low-level tags prescribing specific gestures to be performed are generated automatically, based on the style definition and the high-level tags. By using GESTYLE, different aspects of an ECA's gesturing can be defined and tailored to the needs of different application situations or user groups.
Gestural Mind Markers in ECAs, pp. 338-349
  Isabella Poggi; Catherine Pelachaud; Emanuela Magno Caldognetto
We aim at creating Embodied Conversational Agents (ECAs) able to communicate multimodally with a user or with other ECAs. In this paper we focus on Gestural Mind Markers, that is, those gestures that convey information about the Speaker's mind; we present the ANVIL-SCORE, a tool to analyze and classify multimodal data that is a semantically augmented version of Kipp's ANVIL [1]. Through an ANVIL-SCORE analysis of a set of Gestural Mind Markers taken from a corpus of video-taped data, we classify gestures at the level of both signal and meaning; finally we show how they can be implemented in an ECA system and integrated with facial and bodily communication.
Audio Based Real-Time Speech Animation of Embodied Conversational Agents, pp. 350-360
  Mario Malcangi; Raffaele de Tintis
A framework for facial animation of embodied agents, based on speech analysis in the presence of background noise, is described. Target application areas are entertainment and mobile visual communication. This novel approach derives from the speech signal all the information needed to drive 3-D facial models. Using both digital signal processing and soft computing (fuzzy logic and neural networks) methodologies, a very flexible and low-cost solution for the extraction of lip- and facial-related information has been implemented. The main advantage of the speech-based approach is that it is not invasive, as speech is captured by means of a microphone and there is no physical contact with the subject (no use of magnetic sensors or optical markers). This gives the application additional flexibility and wider applicability compared to other methodologies. First a speech-based lip driver system was developed to synchronize speech with lip movements; the methodology was then extended to some important facial movements so that a face-synching system could be modeled. The developed system is speaker- and language-independent, so no neural network training operations are required.

Multimodal Gestural Interfaces

Neo Euclide: A Low-Cost System for Performance Animation and Puppetry, pp. 361-368
  Samuele Vacchi; Giovanni Civati; Daniele Marini; Alessandro Rizzi
This paper presents a low-cost, flexible Performance Animation system for gesture generation and mapping, easy to use and general purpose, focusing on the most ancient and classical idea of animation: puppetry. The system is designed for generic puppetry rather than for a special purpose, and can be easily adapted to different scenarios and budgets. For this reason we chose consumer and mainstream technologies, common graphics libraries, PC graphics cards and cheap motion-capture equipment, allowing the user to insert movies and sounds integrated with a three-dimensional scene.
Gesture Desk -- An Integrated Multi-modal Gestural Workplace for Sonification, pp. 369-379
  Thomas Hermann; Thomas Henning; Helge Ritter
This paper presents the gesture desk, a new platform for a human-computer interface at a regular computer workplace. It extends classical input devices like keyboard and mouse with arm and hand gestures, without the need for any inconvenient accessories like data gloves or markers. A central element is a "gesture box" containing two infrared cameras and a color camera, positioned under a glass desk. Arm and hand motions are tracked in three dimensions. A synchronizer board has been developed to provide active glare-free IR illumination for robust body and hand tracking. As a first application, we demonstrate interactive real-time browsing and querying of auditory self-organizing maps (AuSOMs). An AuSOM is a combined visual and auditory presentation of high-dimensional data sets. Moving the hand above the desk surface allows the user to select neurons on the map and to manipulate how they contribute to the data sonification. Each neuron is associated with a prototype vector in high-dimensional space, so that a set of 2D-topologically ordered feature maps is queried simultaneously. The level of detail is selected by hand altitude over the table surface, allowing the user to emphasize or deemphasize neurons on the map.
Gesture Frame -- A Screen Navigation System for Interactive Multimedia Kiosks, pp. 380-385
  Yinlin Li; Christoph Groenegress; Wolfgang Strauss; Monika Fleischmann
In this article we present the gesture frame, a system based on quasi-electrostatic field sensing. The system captures the arm gestures of the user and translates pointing gestures into screen coordinates and selection commands, providing a gesture-based, hands-free interface for browsing and searching the multimedia archives of an information kiosk in public spaces. The interface is intuitive and body-centered, and the playful interaction allows visitors to experience a new and magical means of communicating with computers. The system can be placed on or behind any surface and can be used as a user interface in conjunction with any display device.
Intuitive Manipulation of a Haptic Monitor for the Gestural Human-Computer Interaction BIBAFull-Text 386-398
  Hidefumi Moritani; Yuki Kawai; Hideyuki Sawada
This paper introduces a display device called the intuition-driven monitor. The device combines an LCD, a flexible arm suspending the display, angle sensors and a CCD camera, and is manipulated directly by the user through hand operations, eye gaze and facial expressions. The user is able to explore virtual space and manipulate the virtual environment with gestures and a sense of haptics.
Gesturing with Tangible Interfaces for Mixed Reality BIBAFull-Text 399-408
  José Miguel Salles Dias; Pedro Santos; Rafael Bastos
This work reports a Tangible Mixed Reality system that can be applied to interactive visualisation scenarios in diverse human-assisted operations, such as training, troubleshooting and maintenance tasks on real equipment, or product design scenarios in various sectors: Architecture, Automotive or Aerospace. With the system, the user is able to interact intuitively with an augmented version of real equipment, in normal working settings, where she can observe 3D virtual objects (in the VRML 97 format) registered to the real ones. Totally sensor-less Tangible Interfaces (Paddle and Magic Ring) are used to aid interaction and visualisation tasks in the mixed environment. By means of Tangible Interface gesture recognition, it is possible to activate menus, browse and choose menu items, or pick, move, rotate and scale 3D virtual objects within the user's real working area, or to transport the user from Augmented Reality to a fully Virtual Environment and back.
A Procedure for Developing Intuitive and Ergonomic Gesture Interfaces for HCI BIBAFull-Text 409-420
  Michael Nielsen; Moritz Störring; Thomas B. Moeslund; Erik Granum
Many disciplines of multimedia and communication are moving towards ubiquitous computing and hands-free, no-touch interaction with computers. Application domains in this direction include virtual reality, augmented reality, wearable computing, and smart spaces, where gesturing is a possible method of interaction. This paper presents some important issues in choosing the set of gestures for an interface from a user-centred view, such as learning rate, ergonomics, and intuitiveness. A procedure is proposed that incorporates these issues into the selection of gestures and tests the resulting gesture set. The procedure is tested and demonstrated on an example application with a small test group, and is concluded to be useful for finding a basis for the choice of gestures. The importance of tailoring the gesture vocabulary to the user group was also shown.
Evaluating Multimodal Interaction Patterns in Various Application Scenarios BIBAFull-Text 421-435
  Frank Althoff; Gregor McGlaun; Manfred K. Lang; Gerhard Rigoll
In this work, we present the results of a comparative user study evaluating multimodal user interactions in two different operation scenarios: a desktop Virtual-Reality application (DVA) and an automotive infotainment application (AIA). Besides classical tactile input devices, like touch-screen and key-console, the systems can be controlled by natural speech as well as by hand and head gestures. In both domains, we found that experts tend to use tactile devices, whereas normal users and beginners prefer combinations of more advanced input possibilities. Complementary actions occurred most often in the DVA, whereas in the AIA the use of redundant input clearly dominated the set of multimodal interactions. Concerning time relations, the individual interaction length of speech- and gesture-based input was below 1.5 seconds on average, and staggered intermodal overlapping occurred most often. Additionally, we found that test users try to stay within a chosen interaction form. With regard to the overall subjective user experience, the interfaces were rated very positively.
Imitation Games with an Artificial Agent: From Mimicking to Understanding Shape-Related Iconic Gestures BIBAFull-Text 436-447
  Stefan Kopp; Timo Sowa; Ipke Wachsmuth
We describe an anthropomorphic agent that is engaged in an imitation game with the human user. In imitating natural gestures demonstrated by the user, the agent brings together gesture recognition and synthesis on two levels of representation. On the mimicking level, the essential form features of the meaning-bearing gesture phase (stroke) are extracted and reproduced by the agent. Meaning-based imitation requires extracting the semantic content of such gestures and re-expressing it with possibly alternative gestural forms. Based on a compositional semantics for shape-related iconic gestures, we present first steps towards this higher-level gesture imitation in a restricted domain.
Gesture Components for Natural Interaction with In-Car Devices BIBAFull-Text 448-459
  Martin Zobl; Ralf Nieschulz; Michael Geiger; Manfred K. Lang; Gerhard Rigoll
The integration of more and more functionality into the human-machine interface (HMI) of vehicles increases the complexity of device handling. Making optimal use of the different human sensory channels is thus one approach to simplifying interaction with in-car devices; in this way user convenience increases while distraction decreases. In this paper the gesture part of a multimodal system is described. It consists of a gesture-optimized user interface, a real-time gesture recognition system and an adaptive help system for gesture input. The components were developed in the course of extensive usability studies. The resulting HMI allows intuitive, effective and assisted operation of in-car infotainment devices, such as radio, CD, telephone and navigation system, with hand poses and dynamic hand gestures.

Gesture in Multimedia and Performing Arts

Analysis of Expressive Gesture: The EyesWeb Expressive Gesture Processing Library BIBAFull-Text 460-467
  Antonio Camurri; Barbara Mazzarino; Gualtiero Volpe
This paper presents some results of research concerning algorithms and computational models for real-time analysis of expressive gesture in full-body human movement. As the main concrete result of this research, we present a collection of algorithms and related software modules for the EyesWeb open architecture (freely available from www.eyesweb.org). These software modules, collected in the EyesWeb Expressive Gesture Processing Library, have been used in real scenarios and applications, mainly in the fields of performing arts, therapy and rehabilitation, museum interactive installations, and other immersive augmented-reality and cooperative virtual-environment applications. The work has been carried out at DIST -- InfoMus Lab in the framework of the EU IST Project MEGA (Multisensory Expressive Gesture Applications, www.megaproject.org).
Performance Gestures of Musicians: What Structural and Emotional Information Do They Convey? BIBAFull-Text 468-478
  Bradley W. Vines; Marcelo M. Wanderley; Carol Krumhansl; Regina L. Nuzzo; Daniel J. Levitin
This paper investigates how the expressive gestures of a professional clarinetist contribute to the perception of structure and affect in musical performance. Thirty musically trained subjects saw, heard, or both saw and heard the performance. All subjects made the same judgments, including a real-time judgment of phrasing, which targeted the experience of structure, and a real-time judgment of tension, which targeted emotional experience.
   In addition to standard statistical methods, techniques in the field of Functional Data Analysis were used to interpret the data. These new techniques model data drawn from continuous processes and explore the hidden structures of the data as they change over time.
   Three main findings add to our knowledge of gesture and movement in music: 1) The visual component carries much of the same structural information as the audio. 2) Gestures elongate the sense of phrasing during a pause in the sound and certain gestures cue the beginning of a new phrase. 3) The importance of visual information to the experience of tension changes with certain structural features in the sound. When loudness, pitch height, and note density are relatively low, the effect of removing the visual component is to decrease the experience of tension.
Expressiveness of Musician's Body Movements in Performances on Marimba BIBAFull-Text 479-486
  Sofia Dahl; Anders Friberg
To explore to what extent emotional intentions can be conveyed through musicians' movements, video recordings were made of a marimba player performing the same piece with the intentions Happy, Sad, Angry and Fearful. Twenty subjects were presented with the video clips, without sound, and asked to rate both the perceived emotional content and the movement qualities. The video clips were presented in different conditions, showing the player to different extents. The observers' ratings of the intended emotions confirmed that Happiness, Sadness and Anger were well communicated, while Fear was not. Identification of the intended emotion was only slightly influenced by the viewing condition. The movement ratings indicated that there were cues the observers used to distinguish between intentions, similar to cues found for audio signals in music performance.
Expressive Bowing on a Virtual String Instrument BIBAFull-Text 487-496
  Jean-Loup Florens
Physical models computed in real time with gesture-feedback interfaces provide powerful means for creating new playable and musically interesting instrumental synthesis processes. Bowed-string models can be designed and built with these tools. Like real bowed instruments, they exhibit great sensitivity to the gesture dynamics. Beyond the musical interest of these synthesis processes, the tuning possibilities of the gesture interface and the general context of the modular simulation system provide new means for evaluating and understanding some of the complex gesture-interaction features that characterize the bowing action.
Recognition of Musical Gestures in Known Pieces and in Improvisations BIBAKFull-Text 497-508
  Damien Cirotteau; Giovanni De Poli; Luca Mion; Alvise Vidolin; Patrick Zanon
Understanding the content of musical gestures is an ambitious research issue. Several studies have demonstrated that different expressive intentions can be conveyed by a musical performance and correctly recognized by listeners; several models for synthesis can also be found in the literature. In this paper we give an overview of the studies on automatic recognition of musical gestures carried out at the Center of Computational Sonology (CSC) during the last year. These studies can be grouped into two main branches: analysis with score knowledge and analysis without it. A brief description of the implementations and validations is presented.
Keywords: Gestural communication; Gestural perception and production; Analysis; Segmentation and Synthesis of Gestures
Design and Use of Some New Digital Musical Instruments BIBAFull-Text 509-518
  Daniel Arfib; Jean-Michel Couturier; Loïc Kessous
This article presents some facts about the use of gesture in computer music, more specifically in home-made instruments dedicated to performance on stage. We first give some theoretical and practical ideas for designing an instrument in the following areas: the sound, the gesture and the mapping. Then we introduce three examples of digital instruments we have created, focusing on their design and their musical use.
Analysis of a Genuine Scratch Performance BIBAFull-Text 519-528
  Kjetil Falkenberg Hansen; Roberto Bresin
The art form of manipulating vinyl records practiced by disc jockeys (DJs) is called scratching, and it has become very popular since its beginnings in the seventies. Since then, turntables have come to be commonly used as expressive musical instruments in several musical genres. This phenomenon has had a serious impact on the instrument-making industry, as sales of turntables and related equipment have soared. Despite this, the acoustics of scratching has barely been studied until now. In this paper we illustrate the complexity of scratching by measuring the gestures of one DJ during a performance. The analysis of these measurements is important to consider in the design of a scratch model.
Conducting Audio Files via Computer Vision BIBAFull-Text 529-540
  Declan Murphy; Tue Haste Andersen; Kristoffer Jensen
This paper presents a system to control the playback of audio files by means of the standard classical conducting technique. Computer vision techniques are developed to track a conductor's baton, and the gesture is subsequently analysed. Audio parameters are extracted from the sound-file and are further processed for audio beat tracking. The sound-file playback speed is adjusted in order to bring the audio beat points into alignment with the gesture beat points. The complete system forms all parts necessary to simulate an orchestra reacting to a conductor's baton.
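The core speed-adjustment idea, bringing audio beat points into alignment with gesture beat points, can be sketched as a simple tempo ratio. The function name and the averaging choice below are illustrative assumptions, not the paper's actual controller:

```python
def playback_speed(gesture_beats, audio_beats, nominal_speed=1.0):
    """Estimate a playback-rate factor that aligns audio beats with the
    conductor's gesture beats (a minimal tempo-following sketch).

    Both arguments are lists of recent beat times in seconds.
    """
    if len(gesture_beats) < 2 or len(audio_beats) < 2:
        return nominal_speed  # not enough beats yet to estimate tempo
    # Mean inter-beat interval over the recent beats on each side.
    g_ibi = (gesture_beats[-1] - gesture_beats[0]) / (len(gesture_beats) - 1)
    a_ibi = (audio_beats[-1] - audio_beats[0]) / (len(audio_beats) - 1)
    # If the conductor beats faster (smaller interval), speed up playback.
    return nominal_speed * a_ibi / g_ibi
```

A real conducting system would additionally correct phase (the offset between the nearest gesture and audio beats), not only the tempo ratio.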
A Video System for Recognizing Gestures by Artificial Neural Networks for Expressive Musical Control BIBAFull-Text 541-548
  Paul Modler; Tony Myatt
In this paper we describe a system that recognizes gestures to control musical processes. We applied a Time Delay Neural Network (TDNN) to match gestures processed as variations of luminance information in video streams. This resulted in recognition rates of about 90% for three different types of hand gestures, and the system is presented here as a prototype for a gesture recognition system that is tolerant of ambient conditions and environments. The neural network can be trained to recognize gestures that are difficult to describe by postures or sign language; this can be used to adapt to the unique gestures of a performer, or to video sequences of arbitrary moving objects. We discuss the outcome of extending the system to successfully learn a set of 17 hand gestures. The application was implemented in jMax to achieve real-time operation and easy integration into a musical environment. We describe the design and learning procedure using the Stuttgart Neural Network Simulator. The system aims to integrate into an environment that enables expressive control of musical parameters (KANSEI).
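A luminance-variation front end of the kind that could feed such a TDNN can be sketched as follows. The grid size and feature choice here are hedged assumptions for illustration, not the authors' exact preprocessing:

```python
import numpy as np

def luminance_features(frames, grid=(4, 4)):
    """Reduce a video clip to a coarse time series of luminance-change
    features, the low-dimensional kind of input a time-delay neural
    network can classify.

    frames: array of shape (T, H, W) with grayscale pixel values,
            where H and W are divisible by the grid dimensions.
    Returns an array of shape (T-1, rows*cols): mean absolute luminance
    change per spatial cell between consecutive frames.
    """
    frames = np.asarray(frames, dtype=float)
    t, h, w = frames.shape
    diffs = np.abs(np.diff(frames, axis=0))  # (T-1, H, W)
    rows, cols = grid
    # Average the change inside each cell of a coarse spatial grid.
    cells = diffs.reshape(t - 1, rows, h // rows, cols, w // cols)
    return cells.mean(axis=(2, 4)).reshape(t - 1, rows * cols)
```

Each video frame pair thus becomes one short feature vector, and the TDNN sees a sliding window of these vectors over time.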
Ghost in the Cave -- An Interactive Collaborative Game Using Non-verbal Communication BIBAFull-Text 549-556
  Marie-Louise Rinman; Anders Friberg; Bendik Bendiksen; Damien Cirotteau; Sofia Dahl; Ivar Kjellmo; Barbara Mazzarino; Antonio Camurri
The interactive game environment Ghost in the Cave, presented in this short paper, is still a work in progress. The game involves participants in an activity using non-verbal emotional expressions. Two teams compete using expressive gestures in either voice or body movements. Each team has an avatar controlled either by singing into a microphone or by moving in front of a video camera. Participants/players control their avatars using acoustical or motion cues. The avatar is navigated in a 3D distributed virtual environment using the Octagon server and player system. The voice input is processed using a musical cue analysis module yielding performance variables such as tempo, sound level and articulation, as well as an emotional prediction. Similarly, movements captured from a video camera are analyzed in terms of different movement cues. The target group is young teenagers, and the main purpose is to encourage creative expression through new forms of collaboration.