
HCI International 2007: 12th International Conference on Human-Computer Interaction, Part III: HCI Intelligent Multimodal Interaction Environments

Fullname:HCI International 2007: 12th International Conference on Human-Computer Interaction, Part III: HCI Intelligent Multimodal Interaction Environments
Editors:Julie A. Jacko
Location:Beijing, China
Dates:2007-Jul-22 to 2007-Jul-27
Series:Lecture Notes in Computer Science 4552
Standard No:ISBN: 978-3-540-73108-5 (print), 978-3-540-73110-8 (online); hcibib: HCII07-3
  1. HCII 2007-07-22 Volume 3
    1. Part I: Multimodality and Conversational Dialogue
    2. Part II: Adaptive, Intelligent and Emotional User Interfaces
    3. Part III: Gesture and Eye Gaze Recognition
    4. Part IV: Interactive TV and Media

HCII 2007-07-22 Volume 3

Part I: Multimodality and Conversational Dialogue

Preferences and Patterns of Paralinguistic Voice Input to Interactive Media BIBAKFull-Text 3-12
  Sama'a Al Hashimi
This paper investigates the factors that affect users' preferences of non-speech sound input and determine their vocal and behavioral interaction patterns with a non-speech voice-controlled system. It throws light on shyness as a psychological determinant and on vocal endurance as a physiological factor. It hypothesizes that there are certain types of non-speech sounds, such as whistling, that shy users are more prone to resort to as an input. It also hypothesizes that there are some non-speech sounds which are more suitable for interactions that involve prolonged or continuous vocal control. To examine the validity of these hypotheses, it presents and employs a voice-controlled Christmas tree in a preliminary experimental approach to investigate the factors that may affect users' preferences and interaction patterns during non-speech voice control, and by which the developer's choice of non-speech input to a voice-controlled system should be determined.
Keywords: Paralanguage; vocal control; preferences; voice-physical
"Show and Tell": Using Semantically Processable Prosodic Markers for Spatial Expressions in an HCI System for Consumer Complaints BIBAKFull-Text 13-22
  Christina Alexandris
This work attempts to integrate the observed relation between prosodic information and the degree of precision and lack of ambiguity into the processing of the user's spoken input in the CitizenShield ("POLIAS") system for consumer complaints about commercial products. It attempts to preserve the prosodic information contained in the consumers' spoken descriptions by using semantically processable markers, classifiable within an Ontological Framework and signaling prosodic prominence in the speaker's spoken input. Semantic processability is related to the reusability and/or extensibility of the present system to multilingual applications or even to other types of monolingual applications.
Keywords: Prosodic prominence; Ontology; Selectional Restrictions; Indexical Interpretation for Emphasis; Deixis; Ambiguity resolution; Spatial Expressions
Exploiting Speech-Gesture Correlation in Multimodal Interaction BIBAKFull-Text 23-30
  Fang Chen; Eric H. C. Choi; Ning Wang
This paper introduces a study about deriving a set of quantitative relationships between speech and co-verbal gestures for improving multimodal input fusion. The initial phase of this study explores the prosodic features of two human communication modalities, speech and gestures, and investigates the nature of their temporal relationships. We have studied a corpus of natural monologues with respect to frequent deictic hand gesture strokes, and their concurrent speech prosody. The prosodic features from the speech signal have been co-analyzed with the visual signal to learn the correlation of the prominent spoken semantic units with the corresponding deictic gesture strokes. Subsequently, the extracted relationships can be used for disambiguating hand movements, correcting speech recognition errors, and improving input fusion for multimodal user interactions with computers.
Keywords: Multimodal user interaction; gesture; speech; prosodic features; lexical features; temporal correlation
Pictogram Retrieval Based on Collective Semantics BIBAFull-Text 31-39
  Heeryon Cho; Toru Ishida; Rieko Inaba; Toshiyuki Takasaki; Yumiko Mori
To retrieve pictograms having semantically ambiguous interpretations, we propose a semantic relevance measure which uses pictogram interpretation words collected from a web survey. The proposed measure uses ratio and similarity information contained in a set of pictogram interpretation words to (1) retrieve pictograms having implicit meaning but not explicit interpretation word and (2) rank pictograms sharing common interpretation word(s) according to query relevancy which reflects the interpretation ratio.
Enrich Web Applications with Voice Internet Persona Text-to-Speech for Anyone, Anywhere BIBAKFull-Text 40-49
  Min Chu; Yusheng Li; Xin Zou; Frank K. Soong
To embrace the coming age of rich Internet applications and to enrich applications with voice, we propose a Voice Internet Persona (VIP) service. Unlike current text-to-speech (TTS) applications, in which users need to painstakingly install TTS engines in their own machines and do all customizations by themselves, our VIP service consists of a simple, easy-to-use platform that enables users to voice-empower their content, such as podcasts or voice greeting cards. We offer three user interfaces for users to create and tune new VIPs with built-in tools, share their VIPs via this new platform, and generate expressive speech content with selected VIPs. The goal of this work is to popularize TTS features to additional scenarios such as entertainment and gaming with the easy-to-access VIP platform.
Keywords: Voice Internet Persona; Text-to-Speech; Rich Internet Application
Using Recurrent Fuzzy Neural Networks for Predicting Word Boundaries in a Phoneme Sequence in Persian Language BIBAKFull-Text 50-59
  Mohammad-Reza Feizi-Derakhshi; Mohammad Reza Kangavari
Word boundary detection has applications in speech processing systems. The problem this paper addresses is separating the words in a sequence of phonemes that has no delimiters between phonemes. First, a recurrent fuzzy neural network (RFNN) is proposed together with its structure, and its learning algorithm is presented. Next, this RFNN is used to predict word boundaries. Experiments were carried out to determine the complete structure of the RFNN. Three methods are proposed to encode the input phonemes, and their performance has been evaluated. Further experiments were conducted to determine the required number of fuzzy rules, and the performance of the RFNN in predicting word boundaries was then tested. Experimental results show acceptable performance.
Keywords: Word boundary detection; Recurrent fuzzy neural network (RFNN); Fuzzy neural network; Fuzzy logic; Natural language processing; Speech processing
Subjective Measurement of Workload Related to a Multimodal Interaction Task: NASA-TLX vs. Workload Profile BIBAKFull-Text 60-69
  Dominique Fréard; Eric Jamet; Olivier Le Bohec; Gérard Poulain; Valérie Botherel
This paper addresses workload evaluation in the framework of a multimodal application. Two multidimensional subjective workload rating instruments are compared. The goal is to analyze the diagnostics obtained on four implementations of an applicative task. In addition, an Automatic Speech Recognition (ASR) error was introduced in one of the two trials. Eighty subjects participated in the experiment. Half of them rated their subjective workload with NASA-TLX and the other half rated it with Workload Profile (WP) enriched with two stress-related scales. Discriminant and variance analyses revealed a better sensitivity with WP. The results obtained with this instrument led to hypotheses on the cognitive activities of the subjects during interaction. Furthermore, WP permitted us to classify two strategies offered for error recovery. We conclude that WP is more informative for the task tested. WP seems to be a better diagnostic instrument in multimodal system conception.
Keywords: Human-Computer Dialogue; Workload Diagnostic
Menu Selection Using Auditory Interface BIBAKFull-Text 70-75
  Koichi Hirota; Yosuke Watanabe; Yasushi Ikei
An approach to auditory interaction with a wearable computer is investigated. Menu selection and keyboard input interfaces are experimentally implemented by integrating a pointing interface using motion sensors with an auditory localization system based on HRTF. Performance of users, or the efficiency of interaction, is evaluated through experiments using subjects. The average time for selecting a menu item was approximately 5-9 seconds depending on the geometric configuration of the menu, and average key input performance was approximately 6 seconds per character. The results did not support our expectation that auditory localization of menu items would be a helpful cue for accurate pointing.
Keywords: auditory interface; menu selection; keyboard input
Analysis of User Interaction with Service Oriented Chatbot Systems BIBAKFull-Text 76-83
  Marie-Claire Jenkins; Richard Churchill; Stephen Cox; Dan Smith
Service oriented chatbot systems are designed to help users access information from a website more easily. The system uses natural language responses to deliver the relevant information, acting like a customer service representative. In order to understand what users expect from such a system and how they interact with it we carried out two experiments which highlighted different aspects of interaction. We observed the communication between humans and the chatbots, and then between humans, applying the same methods in both cases. These findings have enabled us to focus on aspects of the system which directly affect the user, meaning that we can further develop a realistic and helpful chatbot.
Keywords: human-computer interaction; chatbot; question-answering; communication; intelligent system; natural language; dialogue
Performance Analysis of Perceptual Speech Quality and Modules Design for Management over IP Network BIBAKFull-Text 84-93
  Jinsul Kim; Hyun-Woo Lee; Won Ryu; Seung Ho Han; Minsoo Hahn
Voice packets with guaranteed QoS (Quality of Service) on the VoIP system are responsible for digitizing, encoding, decoding, and playing out the speech signal. The important point is that different parts of speech over IP networks have different perceptual importance, and each part of speech does not contribute equally to the overall voice quality. In this paper, we propose new additive noise reduction algorithms to improve voice over IP networks and present a performance evaluation of the perceptual speech signal through IP networks in an additive noise environment during real-time phone-call service. The proposed noise reduction algorithm is applied as a pre-processing method before speech coding and as a post-processing method after speech decoding, based on a single-microphone VoIP system. For noise reduction, this paper proposes a Wiener filter optimized to the estimated SNR of noisy speech for speech enhancement. Various noisy conditions including white Gaussian, office, babble, and car noises are considered with the G.711 codec. We also provide critical message report procedures and management schemes to guarantee QoS over IP networks. Finally, the experimental results show that the proposed algorithm and method improve speech quality.
Keywords: VoIP; Noise Reduction; QoS; Speech Packet; IP Network
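The abstract above mentions a Wiener filter tuned to the estimated SNR but gives no formulas. As a rough illustration only, the sketch below uses the textbook Wiener gain G = SNR / (1 + SNR) applied per frequency bin. It is not the authors' optimized filter, and the function names and the per-bin SNR input are assumptions made for this example.

```python
def db_to_linear(snr_db: float) -> float:
    """Convert an SNR in decibels to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def wiener_gain(snr_linear: float) -> float:
    """Textbook Wiener gain for a linear-scale SNR estimate (assumed form)."""
    if snr_linear < 0:
        raise ValueError("SNR must be non-negative")
    return snr_linear / (1.0 + snr_linear)


def enhance(noisy_magnitudes, snr_db_per_bin):
    """Scale each spectral magnitude by the gain for its estimated SNR.

    Both arguments are hypothetical inputs: one magnitude and one SNR
    estimate (in dB) per frequency bin of a single frame.
    """
    return [
        wiener_gain(db_to_linear(snr_db)) * magnitude
        for magnitude, snr_db in zip(noisy_magnitudes, snr_db_per_bin)
    ]
```

At 0 dB the gain is 0.5, so a bin is halved. High-SNR bins pass almost unchanged, and low-SNR bins are strongly attenuated.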
A Tangible User Interface with Multimodal Feedback BIBAKFull-Text 94-103
  Laehyun Kim; Hyunchul Cho; Se Hyung Park; Manchul Han
A tangible user interface allows the user to manipulate digital information intuitively through physical things which are connected to digital contents spatially and computationally. It takes advantage of the human ability to manipulate delicate objects precisely. In this paper, we present a novel tangible user interface, the SmartPuck system, which consists of a PDP-based table display, a SmartPuck with a built-in actuated wheel and button for physical interaction, and a sensing module to track the position of the SmartPuck. Unlike the passive physical things in previous systems, SmartPuck has built-in sensors and an actuator providing multimodal feedback such as visual feedback by LEDs, auditory feedback by a speaker, and haptic feedback by an actuated wheel. It gives the feeling that the user is working with a physical object. We introduce new tangible menus to control digital contents just as we interact with physical devices. In addition, this system is used to navigate geographical information in the Google Earth program.
Keywords: Tangible User Interface; Tabletop display; Smart Puck System
Minimal Parsing Key Concept Based Question Answering System BIBAKFull-Text 104-113
  Sunil Kumar Kopparapu; Akhilesh Srivastava; P. V. S. Rao
The home page of a company is an effective means for showcasing its products and technology. Companies invest major effort, time and money in designing their web pages to enable their users to access the information they are looking for as quickly and as easily as possible. In spite of all these efforts, it is not uncommon for a user to spend a sizable amount of time trying to retrieve the particular information that he is looking for. Today, he has to go through several hyperlink clicks or manually search the pages displayed by the site search engine to get to the information that he is looking for. Much time gets wasted if the required information does not exist on that website. With websites being increasingly used as sources of information about companies and their products, there is a need for a more convenient interface. In this paper we discuss a system based on a set of Natural Language Processing (NLP) techniques which addresses this problem. The system enables a user to ask for information from a particular website in free style natural English. The NLP based system is able to respond to the query by 'understanding' the intent of the query and then using this understanding to retrieve relevant information from its unstructured info-base or structured database for presenting it to the user. The interface is called UniqliQ as it avoids the user having to click through several hyperlinked pages. The core of UniqliQ is its ability to understand the question without formally parsing it. The system is based on identifying key-concepts and keywords and then using them to retrieve information. This approach enables the UniqliQ framework to be used for different input languages with minimal architectural changes. Further, the key-concept -- keyword approach gives the system an inherent ability to provide approximate answers in case the exact answers are not present in the information database.
Keywords: NL Interface; Question Answering System; Site search engine
Customized Message Generation and Speech Synthesis in Response to Characteristic Behavioral Patterns of Children BIBAKFull-Text 114-123
  Ho-Joon Lee; Jong C. Park
There is a growing need for a user-friendly human-computer interaction system that can respond to various characteristics of a user in terms of behavioral patterns, mental state, and personalities. In this paper, we present a system that generates appropriate natural language spoken messages with customization for user characteristics, taking into account the fact that human behavioral patterns usually reveal one's mental state or personality subconsciously. The system is targeted at handling various situations for five-year old kindergarteners by giving them caring words during their everyday lives. With the analysis of each case study, we provide a setting for a computational method to identify user behavioral patterns. We believe that the proposed link between the behavioral patterns and the mental state of a human user can be applied to improve not only user interactivity but also believability of the system.
Keywords: natural language processing; customized message generation; behavioral pattern recognition; speech synthesis; ubiquitous computing
Multi-word Expression Recognition Integrated with Two-Level Finite State Transducer BIBAKFull-Text 124-133
  Keunyong Lee; Ki-Soen Park; Yong-Seok Lee
This paper proposes another two-level finite state transducer to recognize multi-word expressions (MWE) in a two-level morphological parsing environment. In the proposed Finite State Transducer with Bridge State (FSTBS), we define a Bridge State (concerned with the connection of multi-words), a Bridge Character (used in the connection of multi-word expressions) and a two-level rule to extend the existing FST. FSTBS can recognize both Fixed Type MWE and Flexible Type MWE that are expressible as regular expressions, because FSTBS recognizes MWE during morphological parsing.
Keywords: Multi-word Expression; Two-level morphological parsing; Finite State Transducer
Towards Multimodal User Interfaces Composition Based on UsiXML and MBD Principles BIBAKFull-Text 134-143
  Sophie Lepreux; Anas Hariri; José Rouillard; Dimitri Tabary; Jean-Claude Tarby; Christophe Kolski
In software design, the issue of reuse has driven the growth of web services, components and other techniques. These techniques allow reusing code associated with technical aspects (such as software components). With the development of business components that can integrate technical aspects with HCI, the composition issue has appeared. Our previous work concerned GUI composition based on a UIDL such as UsiXML. With the generalization of Multimodal User Interfaces (MUI), MUI composition principles have to be studied. This paper aims to extend existing basic composition principles in order to treat multimodal interfaces. The same principle as in the previous work, based on tree algebra, can be used at another level (AUI) of the UsiXML framework to support the composition of Multimodal User Interfaces. This paper presents a case study of a multimodal food ordering system (coupling GUI and MUI). A conclusion and future work in the HCI domain are presented.
Keywords: User interfaces design; UsiXML; AUI (Abstract User Interface); Multimodal User Interfaces; Vocal User Interfaces
m-LoCoS UI: A Universal Visible Language for Global Mobile Communication BIBAKFull-Text 144-153
  Aaron Marcus
The LoCoS universal visible language developed by the graphic/sign designer Yukio Ota in Japan in 1964 may serve as a usable, useful, and appealing basis for mobile phone applications that can provide capabilities for communication among people who do not share a spoken language. User-interface design issues including display and input are discussed in conjunction with prototype screens showing the use of LoCoS for a mobile phone.
Keywords: design; interface; language; LoCoS; mobile; phone; user
Developing a Conversational Agent Using Ontologies BIBAFull-Text 154-164
  Manish Mehta; Andrea Corradini
We report on the benefits achieved by using ontologies in the context of a fully implemented conversational system that allows for real-time rich communication between human users, primarily 10 to 18 years old, and a 3D graphical character through spontaneous speech and gesture. In this paper, we focus on the categorization of ontological resources into domain independent and domain specific components in the effort of both augmenting the agent's conversational capabilities and enhancing the system's reusability across conversational domains. We also present a novel method of exploiting the existing ontological resources along with Google directory categorization for a semi-automatic understanding of user utterances on general purpose topics such as movies and games.
Conspeakuous: Contextualising Conversational Systems BIBAFull-Text 165-175
  S. Arun Nair; Amit Anil Nanavati; Nitendra Rajput
There has been a tremendous increase in the amount and type of information that is available through the Internet and through various sensors that now pervade our daily lives. Consequentially, the field of context aware computing has also contributed significantly in providing new technologies to mine and use the available context data. We present Conspeakuous -- an architecture for modeling, aggregating and using the context in spoken language conversational systems. Since Conspeakuous is aware of the environment through different sources of context, it helps in making the conversation more relevant to the user, and thus reducing the cognitive load on the user. Additionally, the architecture allows for representing learning of various user/environment parameters as a source of context. We built a sample tourist information portal application based on the Conspeakuous architecture and conducted user studies to evaluate the usefulness of the system.
Persuasive Effects of Embodied Conversational Agent Teams BIBAFull-Text 176-185
  Hien Nguyen; Judith Masthoff; Peter Edwards
In a persuasive communication, not only the content of the message but also its source, and the type of communication can influence its persuasiveness on the audience. This paper compares the effects on the audience of direct versus indirect communication, one-sided versus two-sided messages, and one agent presenting the message versus a team presenting the message.
Exploration of Possibility of Multithreaded Conversations Using a Voice Communication System BIBAKFull-Text 186-195
  Kanayo Ogura; Kazushi Nishimoto; Kozo Sugiyama
Everyday voice conversations require people to obey the turn-taking rule and to keep to a single topic thread; therefore, it is not always an effective way to communicate. Hence, we propose "ChaTEL," a voice communication system for facilitating real-time multithreaded voice communications. ChaTEL has two functions to support multithreaded communications: a function to indicate to whom the user talks and a function to indicate which utterance the user responds to. Comparing ChaTEL with a baseline system that does not have these functions, we show that multithreaded conversations occur more frequently with ChaTEL. Moreover, we discuss why ChaTEL can facilitate multi-threaded conversations based on analyses of users' speaking and listening behaviors.
Keywords: CMC (Computer-Mediated Communication); Multithreaded Conversations
A Toolkit for Multimodal Interface Design: An Empirical Investigation BIBAKFull-Text 196-205
  Dimitrios I. Rigas; Mohammad M. Alsuraihi
This paper introduces a comparative multi-group study carried out to investigate the use of multimodal interaction metaphors (visual, oral, and aural) for improving learnability (or usability from first time use) of interface-design environments. An initial survey was used for taking views about the effectiveness and satisfaction of employing speech and speech-recognition for solving some of the common usability problems. Then, the investigation was done empirically by testing the usability parameters: efficiency, effectiveness, and satisfaction of three design-toolkits (TVOID, OFVOID, and MMID) built especially for the study. TVOID and OFVOID interacted with the user visually only using typical and time-saving interaction metaphors. The third environment MMID added another modality through vocal and aural interaction. The results showed that the use of vocal commands and the mouse concurrently for completing tasks from first time use was more efficient and more effective than the use of visual-only interaction metaphors.
Keywords: interface-design; usability; learnability; effectiveness; efficiency; satisfaction; visual; oral; aural; multimodal; auditory-icons; earcons; speech; text-to-speech; speech recognition; voice-instruction
An Input-Parsing Algorithm Supporting Integration of Deictic Gesture in Natural Language Interface BIBAKFull-Text 206-215
  Yong Sun; Fang Chen; Yu (David) Shi; Vera Chung
A natural language interface (NLI) enables efficient and effective interaction by allowing a user to submit a single phrase in natural language to the system. Free hand gestures can be added to an NLI to specify the referents for deictic terms in speech. By combining an NLI with other modalities into a multimodal user interface, speech utterance length can be reduced, and users need not clearly specify the referent verbally. Integrating deictic terms with deictic gestures is a critical function in a multimodal user interface. This paper presents a novel approach to extend chart parsing used in natural language processing (NLP) to integrate multimodal input based on speech and manual deictic gesture. The effectiveness of the technique has been validated through experiments, using a traffic incident management scenario where an operator interacts with a map on a large display at a distance and issues multimodal commands through speech and manual gestures. The preliminary experiment with the proposed algorithm shows encouraging results.
Keywords: Multimodal chart parsing; Multimodal Fusion; Deictic Gesture; Deictic Terms
Multimodal Interfaces for In-Vehicle Applications BIBAKFull-Text 216-224
  Roman Vilimek; Thomas Hempel; Birgit Otto
This paper identifies several factors that were observed as being crucial to the usability of multimodal in-vehicle applications -- a multimodal system is not of value in itself. Focusing in particular on the typical combination of manual and voice control, this article describes important boundary conditions and discusses the concept of natural interaction.
Keywords: Multimodal; usability; driving; in-vehicle systems
Character Agents in E-Learning Interface Using Multimodal Real-Time Interaction BIBAKFull-Text 225-231
  Hua Wang; Jie Yang; Mark H. Chignell; Mitsuru Ishizuka
This paper describes an e-learning interface with multiple tutoring character agents. The character agents use eye movement information to facilitate empathy-relevant reasoning and behavior. Eye information is used to monitor the user's attention and interests, to personalize the agent behaviors, and to exchange information among different learners. The system reacts to multiple users' eye information in real time, and the empathic character agents owned by each learner exchange learners' information to help form the online learning community. Based on these measures, the interface infers the focus of attention of the learner and responds accordingly with affective and instructional behaviors. The paper also reports on some preliminary usability test results concerning how users respond to the empathic functions and interact with other learners using the character agents.
Keywords: Multiple user interface; e-learning; character agent; tutoring; educational interface
An Empirical Study on Users' Acceptance of Speech Recognition Errors in Text-Messaging BIBAFull-Text 232-242
  Shuang Xu; Santosh Basapur; Mark Ahlenius; Deborah Matteo
Although speech recognition technology and voice synthesis systems have become readily available, recognition accuracy remains a serious problem in the design and implementation of voice-based user interfaces. Error correction becomes particularly difficult on mobile devices due to the limited system resources and constrained input methods. This research aims to investigate users' acceptance of speech recognition errors in mobile text messaging. Our results show that even though the audio presentation of the text messages does help users understand the speech recognition errors, users indicate low satisfaction when sending or receiving text messages with errors. Specifically, senders show significantly lower acceptance than the receivers due to concerns about follow-up clarifications and the reflection on the sender's personality. We also find that different types of recognition errors greatly affect users' overall acceptance of the received message.
Flexible Multi-modal Interaction Technologies and User Interface Specially Designed for Chinese Car Infotainment System BIBAKFull-Text 243-252
  Chen Yang; Nan Chen; Peng-fei Zhang; Zhen Jiao
In this paper, we present a car infotainment prototype system which aims to develop an advanced concept for an intuitive, user-centered human-machine interface designed especially for Chinese users. On the technology side, we apply several innovative interaction technologies (most of which are Chinese language specific) to make interaction easier, more convenient and more effective. Speech interaction design is especially elaborated in this respect. On the user interface design side, we systematically conducted user investigations to obtain guidance for designing the logic flow and aesthetics of the system. Following the user-centered design principle and with a deep understanding of the different interaction technologies, our prototype system makes transitions between interaction modalities quite flexible. Preliminary performance evaluation shows that our system attains high user acceptance.
Keywords: Car Infotainment; Chinese ASR; Chinese TTS; Chinese NLU; Chinese Finger Stroke Recognition; Melody Recognition; User-centered design
A Spoken Dialogue System Based on Keyword Spotting Technology BIBAFull-Text 253-261
  Pengyuan Zhang; Qingwei Zhao; Yonghong Yan
In this paper, a keyword spotting based dialogue system is described. It is critical to understand the user's requests accurately in a dialogue system. But the performance of large vocabulary continuous speech recognition (LVCSR) systems is far from perfect, especially for spontaneous speech. In this work, an improved keyword spotting scheme is adopted instead. A fuzzy search algorithm is proposed to extract keyword hypotheses from syllable confusion networks (CN). CNs are linear and naturally suitable for indexing. To accelerate the search process, CNs are pruned to feasible sizes. Furthermore, we enhance the discriminability of the confidence measure by applying entropy information to the posterior probability of word hypotheses. On Mandarin conversational telephone speech (CTS), the proposed algorithms obtained a 4.7% relative equal error rate (EER) reduction.
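The abstract above refers to entropy information over hypothesis posteriors without saying how it is computed or combined with the confidence score. As a minimal sketch of the underlying quantity only, the code below computes the Shannon entropy of the posterior distribution in one confusion-network slot. The function name and the list-of-posteriors input are assumptions for illustration, not the authors' method.

```python
import math


def slot_entropy(posteriors):
    """Shannon entropy (in bits) of the posteriors in one confusion-network slot.

    `posteriors` is a hypothetical list of competing-hypothesis probabilities
    that should sum to 1. Zero-probability entries contribute nothing.
    """
    total = sum(posteriors)
    if not math.isclose(total, 1.0, rel_tol=1e-9, abs_tol=1e-9):
        raise ValueError("posteriors must sum to 1")
    return -sum(p * math.log2(p) for p in posteriors if p > 0.0)
```

A slot with one certain hypothesis has entropy 0, while two equally likely hypotheses give 1 bit. Higher entropy indicates more competition in the slot, which is the kind of signal a confidence measure could take into account.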

Part II: Adaptive, Intelligent and Emotional User Interfaces

Dynamic Association Rules Mining to Improve Intermediation Between User Multi-channel Interactions and Interactive e-Services BIBAKFull-Text 265-274
  Vincent Chevrin; Olivier Couturier
This paper deals with managing multi-channel interaction through an intermediation between channels and Interactive e-Services (IeS). After work on modeling and a theoretical framework, we implemented a platform, Ubi-Learn, which is able to manage this kind of interaction through an intermediation middleware based on a Multi-Agent System (MAS): Jade. The issue addressed here is how a channel is chosen depending on the user's task. First, we encoded several ad hoc rules (tacit knowledge) into the system. In this paper, we present our new approach, based on association rules mining, which allows us to automatically propose several dynamic rules (explicit knowledge).
Keywords: Interactive e-Services; Intermediation; association rules mining
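The abstract above names association rules mining as the basis for choosing a channel from the user's task but does not describe the algorithm. As a hedged illustration of the general technique, the sketch below computes the two standard measures, support and confidence, for a rule such as "task implies channel". The sample log, its item labels, and the function names are invented for this example and are not taken from Ubi-Learn.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for transaction in transactions if wanted <= set(transaction))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical interaction log: each record pairs a task with the channel used.
sample_log = [
    {"task:pay", "channel:web"},
    {"task:pay", "channel:web"},
    {"task:pay", "channel:sms"},
    {"task:browse", "channel:web"},
]
```

With this invented log, the rule "task:pay -> channel:web" holds in two of the three payment records. A miner would keep the rule only if its support and confidence exceed thresholds chosen by the designer.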
Emotionally Expressive Avatars for Chatting, Learning and Therapeutic Intervention BIBAKFull-Text 275-285
  Marc Fabri; Salima Y. Awad Elzouki; David J. Moore
We present our work on emotionally expressive avatars, animated virtual characters that can express emotions via facial expressions. Because these avatars are highly distinctive and easily recognizable, they may be used in a range of applications. In the first part of the paper we present their use in computer mediated communication where two or more people meet in virtual space, each represented by an avatar. Study results suggest that social interaction behavior from the real-world is readily transferred to the virtual world. Empathy is identified as a key component for creating a more enjoyable experience and greater harmony between users. In the second part of the paper we discuss the use of avatars as an assistive, educational and therapeutic technology for people with autism. Based on the results of a preliminary study, we provide pointers regarding how people with autism may overcome some of the limitations that characterize their condition.
Keywords: Emotion; avatar; virtual reality; facial expression; instant messaging; empathy; autism; education; therapeutic intervention
Can Virtual Humans Be More Engaging Than Real Ones? BIBAFull-Text 286-297
  Jonathan Gratch; Ning Wang; Anna Okhmatovskaia; Francois Lamothe; Mathieu Morales; Rick J. van der Werf; Louis-Philippe Morency
Emotional bonds don't arise from a simple exchange of facial displays, but often emerge through the dynamic give and take of face-to-face interactions. This article explores the phenomenon of rapport, a feeling of connectedness that seems to arise from rapid and contingent positive feedback between partners and is often associated with socio-emotional processes. Rapport has been argued to lead to communicative efficiency, better learning outcomes, improved acceptance of medical advice and successful negotiations. We provide experimental evidence that a simple virtual character that provides positive listening feedback can induce stronger rapport-like effects than face-to-face communication between human partners. Specifically, this interaction can be more engaging to storytellers than speaking to a human audience, as measured by the length and content of their stories.
Automatic Mobile Content Conversion Using Semantic Image Analysis BIBAFull-Text 298-307
  Eunjung Han; JongYeol Yang; HwangKyu Yang; Keechul Jung
An approach to knowledge-assisted semantic offline content re-authoring based on an automatic content conversion (ACC) ontology infrastructure is presented. Semantic concepts in the context are defined in ontology, text detection (e.g. connected component based), feature (e.g. texture homogeneity), feature parameter (e.g. texture model distribution), clustered feature (e.g. k-means algorithm). We will show how the adaptation of the layout can facilitate browsing with mobile devices, especially small-screen mobile phones. In a second stage we address the topic of content personalization by providing a personalization scheme that is based on the ontology technology. Our experiment shows that the proposed ACC is more efficient than the existing methods in providing mobile comic contents.
History Based User Interest Modeling in WWW Access BIBAKFull-Text 308-312
  Shuang Han; Wenguang Chen; Heng Wang
The WWW cache stores a user's browsing history, which contains a large amount of information that may be accessed again but has not yet been added to the user's favorites folder. The cached WWW pages can be used to abstract the user's interests and predict user interaction. To that end, a model that describes the user's interests is needed. In this paper, we discuss two methods relating the WWW cache, data mining and user interest: a simple user interest model and a real-time two-dimensional interest model. The latter is described in detail and applied to user interest modeling. An experiment performed on 20 users' interest data sets shows that the real-time two-dimensional interest model is more effective in WWW cache modeling.
Keywords: www cache; user interest; interest model; data mining
Development of a Generic Design Framework for Intelligent Adaptive Systems BIBAKFull-Text 313-320
  Ming Hou; Michelle Sylvia Gauthier; Simon Banbury
A lack of established design guidelines for intelligent adaptive systems is a challenge in designing a human-machine performance maximization system. An extensive literature review was conducted to examine existing approaches in the design of intelligent adaptive systems. A unified framework to describe design approaches using consistent and unambiguous terminology was developed. Combining design methodologies from both Human Computer Interaction and Human Factors fields, conceptual and design frameworks were also developed to provide guidelines for the design and implementation of intelligent adaptive systems. A number of criteria for the selection of appropriate analytical techniques are recommended. The proposed frameworks will not only provide guidelines for designing intelligent adaptive systems in the military domain, but also broadly guide the design of other generic systems to optimize human-machine system performance.
Keywords: design guidance; design framework; intelligent adaptive interface; intelligent adaptive system
Three Way Relationship of Human-Robot Interaction BIBAKFull-Text 321-330
  Jung-Hoon Hwang; Kang-Woo Lee; Dong-Soo Kwon
In this paper, we conceptualize human-robot interaction (HRI) such that a 3-way relationship among a human, robot and environment can be established. Various interactive patterns that may occur are analyzed on the basis of shared ground. The model sheds light on how uncertainty caused by lack of knowledge may be resolved and how shared ground can be established through interaction. We also develop measures to evaluate interactivity, such as Interaction Effort and Interaction Situation Awareness, using information theory as well as Markovian transitions. An experiment is carried out in which human subjects are asked to explain or answer questions about objects through interaction. The results of the experiment show the feasibility of the proposed model and the usefulness of the measures. It is expected that the presented model and measures will serve to increase understanding of the patterns of HRI and to evaluate the interactivity of HRI systems.
Keywords: Human-Robot Interaction; Shared Ground; Metrics; Interaction Effort; Interaction SA
MEMORIA: Personal Memento Service Using Intelligent Gadgets BIBAKFull-Text 331-339
  Hyeju Jang; Jongho Won; Changseok Bae
People would like to record what they experience in order to recall earlier events, share them with others, or even hand them down to the next generations. In addition, our environment is becoming increasingly digital and the cost of storing media has been falling. This has led to research on the life log, which stores people's daily lives. The research area includes collecting experience information conveniently, manipulating and recording the collected information efficiently, and retrieving and providing the stored information to users effectively. This paper describes a personalized memory augmentation service, called MEMORIA, that collects, stores and retrieves various kinds of experience information in real time using a specially designed wearable intelligent gadget (WIG).
Keywords: Intelligent Gadget; Smart Object; Personalized Service; Memory Assistant System; Memory Augmentation Service
A Location-Adaptive Human-Centered Audio Email Notification Service for Multi-user Environments BIBAKFull-Text 340-348
  Ralf Jung; Tim Schwartz
In this paper, we introduce an application for the discreet notification of mobile persons in a multi-user environment. In particular, we use the current user position to provide a personalized email notification with non-speech audio cues embedded in aesthetic background music. The notification is done in a peripheral way to avoid distracting other people in the surroundings.
Keywords: Auditory Display; Ambient Soundscapes; Indoor Positioning
Emotion-Based Textile Indexing Using Neural Networks BIBAKFull-Text 349-357
  Na Yeon Kim; Yunhee Shin; Eun Yi Kim
This paper proposes a neural network based approach for emotion-based textile indexing. Generally, human emotion can be affected by physical features such as color, texture, and pattern. In previous work, we investigated the correlation between human emotion and color or texture. Here, we aim to investigate the correlation between emotion and pattern, and to develop a textile indexing system using the pattern information. A survey is first conducted to investigate the correlation between emotion and pattern. The result shows that human emotion is deeply affected by certain patterns. Based on that result, an automatic indexing system is developed. The proposed system is composed of feature extraction and classification. The wavelet transform is used to describe the pattern information in the textiles, and a neural network is used as the classifier. To assess the validity of the proposed method, it was applied to recognize human emotions in 100 textiles, and our system achieved an accuracy of 90%. This result confirms that our system has the potential to be applied in various areas such as the textile industry and e-business.
Keywords: Emotion recognition; neural networks; pattern recognition; feature extraction; wavelet transform
Decision Theoretic Perspective on Optimizing Intelligent Help BIBAKFull-Text 358-365
  Chulwoo Kim; Mark R. Lehto
With the increasing complexity of systems and information overload, agent technology has become widely used to provide personalized advice (help message) to users with their computer-based tasks. The purpose of this study is to investigate the way to optimize advice provided by the intelligent agent from a decision theoretic perspective. The study utilizes the time associated with processing a help message as the trade-off criterion of whether to present a help message or not. The proposed approach is expected to provide guidance as to where, when and why help messages are likely to be effective or ineffective by providing quantitative predictions of value of help messages in time.
Keywords: intelligent agent; intelligent help; decision theoretic perspective; help optimization
Human-Aided Cleaning Algorithm for Low-Cost Robot Architecture BIBAKFull-Text 366-375
  Seungyong Kim; Kiduck Kim; Tae-Hyung Kim
This paper presents a human-aided cleaning algorithm that can be implemented on a low-cost robot architecture while achieving cleaning performance that far exceeds conventional random style cleaning. We clarify the advantages and disadvantages of the two notable cleaning robot styles, the random and the mapping styles, and show how the performance of the complicated mapping style can be achieved on a random style-like robot architecture using the idea of a human-aided cleaning algorithm. Experimental results are presented to show the cleaning performance.
Keywords: Cleaning robots; Random style cleaning; Mapping style cleaning; Human-robot interaction
The Perception of Artificial Intelligence as "Human" by Computer Users BIBAKFull-Text 376-384
  Jurek Kirakowski; Patrick O'Donnell; Anthony Yiu
This paper deals with the topic of 'humanness' in intelligent agents. Chatbot agents (e.g. Eliza, Encarta) have been criticized on their ability to communicate in human-like conversation. In this study, a Critical Incident Technique (CIT) approach was used for analyzing the human and non-human parts of Eliza's conversation. The results showed that Eliza could act like a human insofar as it could greet, maintain a theme, apply damage control, react appropriately to a cue, offer a cue, use appropriate language style and have a personality. It was non-human insofar as it used formal or unusual treatment of language, failed to respond to a specific question, failed to respond to a general question or implicit cue, and evidenced time delays and phrases delivered at inappropriate times.
Keywords: chatbot; connectionist network; Eliza; Critical Incident Technique; humanness
Speaker Segmentation for Intelligent Responsive Space BIBAKFull-Text 385-392
  Soonil Kwon
Information drawn from conversational speech can be useful for enabling intelligent interactions between humans and computers. Speaker information can be obtained from speech signals by performing Speaker Segmentation. In this paper, a method for Speaker Segmentation is presented to address the challenge of identifying speakers even when utterances are very short (0.5 sec). This method, involving the selective use of feature vectors, experimentally reduced the relative error rates by 27-42% for groups of 2 to 16 speakers as compared to the conventional approach for Speaker Segmentation. Thus, this new approach offers a way to significantly improve speech-data classification and retrieval systems.
Keywords: Speaker Segmentation; Speaker Recognition; Intelligent Responsive Space (IRS); Human Computer Interaction (HCI)
Emotion and Sense of Telepresence: The Effects of Screen Viewpoint, Self-transcendence Style, and NPC in a 3D Game Environment BIBAKFull-Text 393-400
  Jim Jiunde Lee
Telepresence, or the sense of "being there", has been discussed in the literature as an essential, defining aspect of a virtual environment, including definitions rooted in behavioral response, signal detection theory, and philosophy, but the literature has generally ignored the emotional aspects of the virtual experience. The purpose of this study is to examine the concept of presence in terms of people's emotional engagement within an immersive mediated environment. Three main theoretical statements are discussed: a) Objective telepresence: display viewpoint; b) Subjective telepresence: emotional factors and individual self-transcendence styles; c) Social telepresence: program-controlled entities in an on-line game environment. This study has implications for how research could be conducted to further our understanding of telepresence. Validated psychological subjective techniques for assessing emotions and a sense of telepresence will be applied. The study results could improve our knowledge of the construct of telepresence, as well as better inform us about how a virtual environment, such as an online game, can be managed in creating and designing emotional effects.
Keywords: Computer game; emotion; self-transcendence style; telepresence
Emotional Interaction Through Physical Movement BIBAKFull-Text 401-410
  Jong-Hoon Lee; Jin-Yung Park; Tek-Jin Nam
As everyday products become more intelligent and interactive, there is growing interest in methods to improve the emotional value attached to products. This paper presents a basic method of using temporal and dynamic design elements, in particular physical movements, to improve the emotional value of products. To utilize physical movements in design, a relation framework between movement and emotion was developed as the first step of the research. In the framework, the movement representing emotion was structured in terms of three properties: velocity, smoothness and openness. Based on this framework, a new interactive device, 'Emotion Palpus', was developed, and a user study was also conducted. The result of the research is expected to improve emotional user experience when used as a design method or directly applied to design practice as an interactive element of products.
Keywords: Emotion; Physical Movement Design; Interaction Design; Interactive Product Design; Design Method
Towards Affective Sensing BIBAFull-Text 411-420
  Gordon McIntyre; Roland Göcke
This paper describes ongoing work towards building a multimodal computer system capable of sensing the affective state of a user. Two major problem areas exist in the affective communication research. Firstly, affective states are defined and described in an inconsistent way. Secondly, the type of training data commonly used gives an oversimplified picture of affective expression. Most studies ignore the dynamic, versatile and personalised nature of affective expression and the influence that social setting, context and culture have on its rules of display. We present a novel approach to affective sensing, using a generic model of affective communication and a set of ontologies to assist in the analysis of concepts and to enhance the recognition process. Whilst the scope of the ontology provides for a full range of multimodal sensing, this paper focuses on spoken language and facial expressions as examples.
Affective User Modeling for Adaptive Intelligent User Interfaces BIBAKFull-Text 421-430
  Fatma Nasoz; Christine L. Lisetti
In this paper we describe the User Modeling phase of our general research approach: developing Adaptive Intelligent User Interfaces to facilitate enhanced natural communication during the Human-Computer Interaction. Natural communication is established by recognizing users' affective states (i.e., emotions experienced by the users) and responding to those emotions by adapting to the current situation via an affective user model. Adaptation of the interface was designed to provide multi-modal feedback to the users about their current affective state and to respond to users' negative emotional states in order to compensate for the possible negative impacts of those emotions. Bayesian Belief Networks formalization was employed to develop the User Model to enable the intelligent system to appropriately adapt to the current context and situation by considering user-dependent factors, such as: personality traits and preferences.
Keywords: User Modeling; Bayesian Belief Networks; Intelligent Interfaces; Human Computer Interaction
A Multidimensional Classification Model for the Interaction in Reactive Media Rooms BIBAFull-Text 431-439
  Ali A. Nazari Shirehjini
We are already living in a world where we are surrounded by intelligent devices which support us in planning, organizing, and performing our daily lives. Their number is constantly increasing. At the same time, the complexity of the environment and the number of intelligent devices must not distract the user from his original tasks. Therefore a primary goal is to reduce the user's mental workload. With the emergence of newly available technology, the challenge to maintain control increases, while the additional value decreases. A closer look at enriched environments raises the question of how to build a more intuitive way for people to interact with such an environment. As a result, the design of proper interaction models appears to be crucial for AmI systems. To facilitate the design of proper interaction models, we introduce a multidimensional classification model for the interaction in reactive media rooms. It describes the various dimensions of interaction and outlines the design space for the creation of interaction models. By doing so, the proposed work can also be used as a meta-model for interaction design.
An Adaptive Web Browsing Method for Various Terminals: A Semantic Over-Viewing Method BIBAKFull-Text 440-448
  Hisashi Noda; Teruya Ikegami; Yushin Tatsumi; Shin'ichi Fukuzumi
This paper proposes a semantic over-viewing method. This method extracts headings and semantic blocks by analyzing the layout structure of a web page and can provide a semantic overview of the web page. This method allows users to grasp the overall structure of pages. It also reduces the number of operations to target information to about 6% by moving along semantic blocks. Additionally, it reduces the cost of Web page creation by adapting the content of one Web page to multiple terminals. The evaluations were conducted with respect to effectiveness, efficiency and satisfaction. The results confirmed that the proposed browser is more usable than the traditional method.
Keywords: cellular phone; mobile phone; non-PC terminal; remote controller; web browsing; overview
Evaluation of P2P Information Recommendation Based on Collaborative Filtering BIBAKFull-Text 449-458
  Hidehiko Okada; Makoto Inoue
Collaborative filtering is a social information recommendation/filtering method, and the peer-to-peer (P2P) computer network is a network on which information is distributed on a peer-to-peer basis (each peer node works as a server, a client, and even a router). This research aims to develop a model of a P2P information recommendation system based on collaborative filtering and to evaluate the ability of the system by computer simulations based on the model. We previously proposed a simple model, and the model in this paper is a modified one that is more focused on recommendation agents and user-agent interactions. We have developed a computer simulator program and tested simulations with several parameter settings. From the results of the simulations, recommendation recall and precision are evaluated. The findings are that the agents are likely to recommend excessively, so that the recall score becomes high but the precision score becomes low.
Keywords: Multi agents; P2P network; information recommendation; collaborative filtering; simulation
Understanding the Social Relationship Between Humans and Virtual Humans BIBAKFull-Text 459-464
  Sung Park; Richard Catrambone
Our review surveys a range of human-human relationship models and research that might provide insights to understanding the social relationship between humans and virtual humans. This involves investigating several social constructs (expectations, communication, trust, etc.) that are identified as key variables that influence the relationship between people and how these variables should be implemented in the design for an effective and useful virtual human. This theoretical analysis contributes to the foundational theory of human computer interaction involving virtual humans.
Keywords: Embodied conversational agent; virtual agent; animated character; avatar; social interaction
EREC-II in Use -- Studies on Usability and Suitability of a Sensor System for Affect Detection and Human Performance Monitoring BIBAKFull-Text 465-474
  Christian Peter; Randolf Schultz; Jörg Voskamp; Bodo Urban; Nadine Nowack; Hubert Janik; Karin Kraft; Roland Göcke
Interest in emotion detection is increasing significantly. For research and development in the field of Affective Computing, in smart environments, but also for reliable non-lab medical and psychological studies or human performance monitoring, robust technologies are needed for detecting evidence of emotions in persons under everyday conditions. This paper reports on evaluation studies of the EREC-II sensor system for acquisition of emotion-related physiological parameters. The system has been developed with a focus on easy handling, robustness, and reliability. Two sets of studies have been performed covering 4 different application fields: medical, human performance in sports, driver assistance, and multimodal affect sensing. Results show that the different application fields pose different requirements mainly on the user interface, while the hardware for sensing and processing the data proved to be in an acceptable state for use in different research domains.
Keywords: Physiology sensors; Emotion detection; Evaluation; Multimodal affect sensing; Driver assistance; Human performance; Cognitive load; Medical treatment; Peat baths
Development of an Adaptive Multi-agent Based Content Collection System for Digital Libraries BIBAKFull-Text 475-485
  R. Ponnusamy; T. V. Gopal
Relevant digital content collection and access are huge problems in digital libraries. They pose a great challenge to digital library users and content builders. In the present work, an attempt has been made to design and develop a user-adaptive multi-agent system approach to recommend contents automatically for the digital library. An adaptive dialogue based user-interaction screen has also been provided to access the contents. Once new contents are added to the collection, the system should automatically alert the appropriate users about the new content arrivals based on their interests. The user interactive Question Answering (QA) system provides enough knowledge about the user requirements.
Keywords: Question Answering (QA) systems; Adaptive Interaction; Digital Libraries; Multi-Agent System
Using Content-Based Multimedia Data Retrieval for Multimedia Content Adaptation BIBAKFull-Text 486-492
  Adriana Reveiu; Marian Dardala; Felix Furtuna
Effective retrieval and multimedia data management techniques that facilitate the searching and querying of large multimedia data sets are very important in multimedia application development. Content-based retrieval systems must use the multimedia content to represent and index data. Representing multimedia data requires identifying the most useful features for representing the multimedia content and the approaches needed for coding the attributes of multimedia data. Multimedia content adaptation manipulates multimedia resources while respecting specific quality parameters, according to the limits imposed by networks and terminal devices. The goal of the paper is to identify a design model for using content-based multimedia data retrieval in multimedia content adaptation. The goal of this design is to deliver multimedia content over various networks and to different types of peripheral devices, in the most appropriate format and according to their specific characteristics.
Keywords: multimedia streams; content based data retrieval; content adaptation; media type
Coping with Complexity Through Adaptive Interface Design BIBAKFull-Text 493-498
  Nadine B. Sarter
Complex systems are characterized by a large number and variety of, and often a high degree of dependency between, subsystems. Complexity, in combination with coupling, has been shown to lead to difficulties with monitoring and comprehending system status and activities and thus to an increased risk of breakdowns in human-machine coordination. In part, these breakdowns can be explained by the fact that increased complexity tends to be paralleled by an increase in the amount of data that is made available to operators. Presenting this data in an appropriate form is crucial to avoiding problems with data overload and attention management. One approach for addressing this challenge is to move from fixed display designs to adaptive information presentation, i.e., information presentation that changes as a function of context. This paper will discuss possible approaches to, challenges for, and effects of increasing the flexibility of information presentation.
Keywords: interface design; adaptive; adaptable; complex systems; adaptation drivers
Region-Based Model of Tour Planning Applied to Interactive Tour Generation BIBAFull-Text 499-507
  Inessa Seifert
The paper addresses a tour planning problem, which encompasses weakly specified constraints such as different kinds of activities together with corresponding spatial assignments such as locations and regions. Alternative temporal orders of planned activities together with underspecified spatial assignments available at different levels of granularity lead to a high computational complexity of the given tour planning problem. The paper introduces the results of an exploratory tour planning study and a Region-based Direction Heuristic, derived from the acquired data. A gesture-based interaction model is proposed, which allows structuring the search space by a human user at a high level of abstraction for the subsequent generation of alternative solutions so that the proposed Region-based Direction Heuristic can be applied.
A Learning Interface Agent for User Behavior Prediction BIBAKFull-Text 508-517
  Gabriela Serban; Adriana Tarta; Grigoreta Sofia Moldovan
Predicting user behavior is an important issue in Human Computer Interaction ([5]) research, having an essential role when developing intelligent user interfaces. A possible solution to deal with this challenge is to build an intelligent interface agent ([8]) that learns to identify patterns in users' behavior. The aim of this paper is to introduce a new agent based approach to predicting users' behavior, using a probabilistic model. We propose an intelligent interface agent that uses a supervised learning technique in order to achieve the desired goal. We have used Aspect Oriented Programming ([7]) in the development of the agent in order to benefit from the advantages of this paradigm. Based on a newly defined evaluation measure, we have determined the accuracy of the agent's prediction on a case study.
Keywords: user interface; interface agent; supervised learning; aspect oriented programming
Sharing Video Browsing Style by Associating Browsing Behavior with Low-Level Features of Videos BIBAKFull-Text 518-526
  Akio Takashima; Yuzuru Tanaka
This paper focuses on a method for extracting video browsing styles and reusing them. In the video browsing process for knowledge work, users often develop their own browsing styles to explore the videos because their domain knowledge of the contents is insufficient, and the users then interact with videos according to their browsing style. The User Experience Reproducer enables users to browse new videos according to their own browsing style or other users' browsing styles. The preliminary user studies show that video browsing styles can be reused for other videos.
Keywords: video browsing; active watching; tacit knowledge
Adaptation in Intelligent Tutoring Systems: Development of Tutoring and Domain Models BIBAKFull-Text 527-534
  Oswaldo Velez-Langs; Xiomara Argüello
This paper describes the aspects considered in the development of the tutoring and domain models of an Intelligent Tutoring System (ITS), in which the type of instruction the tutoring system will provide, the pedagogic strategies and the structure of the course are established. The software development process and its principal functions are also described. This work is part of a research project on the adaptation process of interfaces in Intelligent Tutoring Systems at the University of Sinu's TESEEO Research Group ([2]). The final objective of this work is to provide mechanisms for the design and development of system interfaces for tutoring/training that are effective and at the same time modular, structured, configurable, flexible and adaptable.
Keywords: Adaptive Interfaces; Tutoring Model; Domain Model; Tutoring Intelligent Systems; Instructional Cognitive Theory
Confidence Measure Based Incremental Adaptation for Online Language Identification BIBAKFull-Text 535-543
  Shan Zhong; Yingna Chen; Chunyi Zhu; Jia Liu
This paper proposes a novel two-pass adaptation method for online language identification using confidence measure based incremental language model adaptation. In this system, we first use semi-supervised language model adaptation to solve the problem of channel mismatch, and then use unsupervised incremental adaptation to adjust the new language model during online language identification. For robust adaptation, we compare three confidence measures and then present a new fusion method with a Bayesian classifier. Tested on the RMTS (Real-world Multi-channel Telephone Speech) database, experiments show that with semi-supervised language model adaptation the target language detection rate rises from 73.26% to 80.02%, and after unsupervised incremental language model adaptation a further rise of 3.91% (from 80.02% to 83.93%) is obtained.
Keywords: Language Identification; Language Model Adaptation; Confidence Measure; Bayesian Fusion
Study on Speech Emotion Recognition System in E-Learning BIBAKFull-Text 544-552
  Aiqin Zhu; Qi Luo
To address the lack of emotion in present E-Learning systems, a speech emotion recognition system is proposed in this paper. A corpus of emotional speech from various subjects speaking different languages is collected for developing and testing the feasibility of the system. The potential prosodic features are first identified and extracted from the speech data. Then we introduce a systematic feature selection approach which involves the application of Sequential Forward Selection (SFS) with a General Regression Neural Network (GRNN) in conjunction with a consistency-based selection method. The selected features are employed as the input to a Modular Neural Network (MNN) to realize the classification of emotions. Our simulation experiment results show that the proposed system gives high recognition performance.
Keywords: E-learning; SFS; GRNN; MNN; Affective computing

Part III: Gesture and Eye Gaze Recognition

How Do Adults Solve Digital Tangram Problems? Analyzing Cognitive Strategies Through Eye Tracking Approach BIBAKFull-Text 555-563
  Bahar Baran; Berrin Dogusoy; Kursat Cagiltay
The purpose of this study is to investigate how adults solve tangram based geometry problems on a computer screen. Two problems with different difficulty levels were presented to 20 participants. The participants tried to solve the problems by placing seven geometric objects into the correct locations. In order to analyze the process, the participants and their eye movements were recorded by a Tobii Eye Tracking device while solving the problems. The results showed that the participants employed different strategies while solving problems with different difficulty levels.
Keywords: Tangram; problem solving; eye tracking; spatial ability
Gesture Interaction for Electronic Music Performance BIBAKFull-Text 564-572
  Reinhold Behringer
This paper describes an approach for a system which analyses an orchestra conductor in real-time, with the purpose of using the extracted information on time pace and expression to automatically play a computer-controlled instrument (synthesizer). The system in its final stage will use non-intrusive computer vision methods to track the hands of the conductor. The main challenge is to interpret the motion of the hand/baton/mouse as beats for the timeline. The current implementation uses mouse motion to simulate the movement of the baton. It allows the user to "conduct" a pre-stored MIDI file of a classical orchestral work on a PC.
Keywords: Computer music; human-computer interaction; gesture interaction
A New Method for Multi-finger Detection Using a Regular Diffuser BIBAKFull-Text 573-582
  Li-Wei Chan; Yi-Fan Chuang; Yi-Wei Chia; Yi-Ping Hung; Jane Yung-jen Hsu
In this paper, we develop a fingertip-finding algorithm that works with a regular diffuser. The proposed algorithm works on images captured by infra-red cameras placed on one side of the diffuser, observing human gestures taking place on the other side. Using the diffusion characteristics of the diffuser, we can separate finger-touch from palm-hover events when the user interacts with the diffuser. The contributions of this paper are as follows. First, the technique works with a regular diffuser and an infra-red camera coupled with an infra-red illuminator, which is easy to deploy and cost effective. Second, the proposed algorithm is designed to be robust on casually illuminated surfaces. Last, using the diffusion characteristics of the diffuser, we can detect finger-touch and palm-hover events, which is useful for natural user interface design. We have deployed the algorithm on a rear-projection multi-resolution tabletop called I-M-Top. A video retrieval application whose UI design uses the two events is implemented to show its intuitiveness on the tabletop system.
Keywords: Multi-Finger Detection; Intuitive Interaction
Lip Contour Extraction Using Level Set Curve Evolution with Shape Constraint BIBAFull-Text 583-588
  Jae Sik Chang; Eun Yi Kim; Se Hyun Park
In this work, a novel method for lip contour extraction based on level set curve evolution is presented. This method uses not only color information but also a lip contour shape constraint, represented by a distance function between the evolving curve and a parametric shape model. In this method, the curve is evolved by minimizing an energy function that incorporates the shape constraint function as internal energy, whereas previous curve evolution methods use a simple smoothing function. The new shape constraint function prevents the curve from evolving into arbitrary shapes that arise from weak color contrast between lip and skin regions. Comparisons with other methods are conducted to evaluate the proposed method. They show that the proposed method provides more accurate results than the other methods.
Visual Foraging of Highlighted Text: An Eye-Tracking Study BIBAKFull-Text 589-598
  Ed Huai-hsin Chi; Michelle Gumbrecht; Lichan Hong
The wide availability of digital reading material online is causing a major shift in everyday reading activities. Readers are skimming instead of reading in depth [Nielson 1997]. Highlights are increasingly used in digital interfaces to direct attention toward relevant passages within texts. In this paper, we study the eye-gaze behavior of subjects using both keyword highlighting and ScentHighlights [Chi et al. 2005]. In this first eye-tracking study of highlighting interfaces, we show that there is direct evidence of the von Restorff isolation effect [VonRestorff 1933] in the eye-tracking data, in that subjects focused on highlighted areas when highlighting cues are present. The results point to future design possibilities in highlighting interfaces.
Keywords: Automatic text highlighting; dynamic summarization; contextualization; personalized information access; eBooks; Information Scent
Effects of a Dual-Task Tracking on Eye Fixation Related Potentials (EFRP) BIBAFull-Text 599-604
  Hiroshi Daimoto; Tsutomu Takahashi; Kiyoshi Fujimoto; Hideaki Takahashi; Masaaki Kurosu; Akihiro Yagi
Eye fixation related brain potentials (EFRP), which are associated with the occurrence of fixation pauses, can be obtained by averaging EEGs at the offset of saccades. EFRP is a kind of event-related brain potential (ERP) that can be measured during eye movement. In this experiment, EFRP were examined concurrently with performance and subjective measures to compare the effects of tracking difficulty during a dual task. Twelve participants performed four different types of tracking task for 5 min each. Task difficulty was manipulated through how easy it was to track a target with a trackball and how easy it was to respond correctly to a numerical problem. The workload of each tracking condition differed in task quality (difficulty at the perceptual-motor level and/or the cognitive level). As a result, the most prominent positive component in EFRP, with a latency of about 100 ms, was observed under all tracking conditions. The amplitude in the condition with the highest workload was smaller than that in the condition with the lowest workload, while no effect of task quality and no stepwise correspondence with subjective difficulty were found in this experiment. The results suggest that EFRP is a useful index of excessive mental workload.
Effect of Glance Duration on Perceived Complexity and Segmentation of User Interfaces BIBAKFull-Text 605-614
  Yifei Dong; Chen Ling; Lesheng Hua
Computer users who handle complex tasks like air traffic control (ATC) need to quickly detect updated information from multiple graphical user interface displays. The objectives of this study are to investigate how well computer users can segment a GUI display into distinct objects within very short glances, and whether people perceive complexity differently after different durations of exposure. Subjects in this empirical study were presented with 20 screenshots of web pages and software interfaces for different short durations (100ms, 500ms, 1000ms) and were asked to recall the visual objects and rate the complexity of the images. The results indicate that subjects can reliably recall 3-5 objects regardless of image complexity and exposure duration up to 1000ms. This result agrees with the "magic number 4" of visual short-term memory (VSTM). Perceived complexity is consistent across the different exposure durations, and it is highly correlated with subjects' ratings of the ease of segmentation as well as with the image characteristics of density, layout, and color use.
Keywords: Visual Segmentation; Perceptual Complexity; Rapid Glance
Movement-Based Interaction and Event Management in Virtual Environments with Optical Tracking Systems BIBAKFull-Text 615-624
  Maxim Foursa; Gerold Wesche
In this paper we present our experience in using optical tracking systems in Virtual Environment applications. First we briefly describe the tracking systems we used, and then we describe the application scenarios and present how we adapted the scenarios for the tracking systems. One of the tracking systems is markerless, which means that a user does not have to wear any specific devices to be tracked and can interact with an application using free hand movements. With our application we compare the performance of different tracking systems and demonstrate that it is possible to perform complex actions in an intuitive way with only a little special knowledge of the system and without any specific devices. This is a step toward a more natural human-computer interface.
Keywords: tracking systems; virtual environments; application scenarios; interaction techniques
Multiple People Gesture Recognition for Human-Robot Interaction BIBAKFull-Text 625-633
  Seok-Ju Hong; Nurul Arif Setiawan; Chil-Woo Lee
In this paper, we propose a gesture recognition method for multiple-people environments. Our system is divided into two modules: segmentation and recognition. In the segmentation module, we extract the foreground area from the input image and select the closest person as the recognition subject. In the recognition module, we first extract feature points of both of the subject's hands using a contour-based method and a skin-based method. The extracted points are tracked using a Kalman filter. We use the trajectories of both hands to recognize gestures, with a simple queue matching method as the recognition method. We also apply our system as an animation system. Our method can select the subject effectively and recognize gestures in a multiple-people environment. Therefore, the proposed method can be used for real-world applications such as home appliances and humanoid robots.
Keywords: Context Aware; Gesture Recognition; Multiple People
Position and Pose Computation of a Moving Camera Using Geometric Edge Matching for Visual SLAM BIBAKFull-Text 634-641
  HyoJong Jang; Gye-Young Kim; Hyung-Il Choi
A prerequisite component of an autonomous mobile vehicle system is the self-localization ability to recognize its environment and to estimate where it is. Generally, the position and pose can be determined using a homography approach, but this has errors, especially when position and pose change simultaneously. In this paper, we propose a method for computing the position and pose of a camera by analyzing images obtained from a camera-equipped mobile robot. The proposed method is made up of two steps. The first step is to extract feature points and match them across sequential images. The second step is to compute the accurate camera position and pose using geometric edge matching. In the first step, we use KLT tracking to extract and match the feature points. In the second step, we propose an iterative matching method between edge models predicted through a perspective transform, using the result calculated by the homography of the matched feature points, and edge models generated at the corresponding points, iterating until the matching error no longer changes. For performance evaluation, we tested compensation of the position and pose of a camera installed on a wireless-controlled vehicle, using a video stream captured at a 15 Hz frame rate, and we show the experimental results.
Keywords: vSLAM; Perspective Transformation; KLT tracking; Geometric Edge Matching
"Shooting a Bird": Game System Using Facial Feature for the Handicapped People BIBAKFull-Text 642-648
  Jinsun Ju; Yunhee Shin; Eun Yi Kim
This paper presents a novel computer game system that controls a game using only the movement of a person's facial features. Our system is specifically designed for people with severe disabilities and for people without experience of using a computer. Using an ordinary PC camera, the proposed game system detects the user's eye movement and mouth movement, and then interprets the communication intent to play a game. The game system was tested with 42 people, and the results show that it can be used efficiently and effectively as an interface for disabled people.
Keywords: Augmented game; HCI; Facial feature tracking; neural network
Human Pose Estimation Using a Mixture of Gaussians Based Image Modeling BIBAKFull-Text 649-658
  Do Joon Jung; Kyung Su Kwon; Hang Joon Kim
In this paper, we propose an approach toward body parts representation, localization, and human pose estimation from an image. In the image, the human body parts and the background are represented by a mixture of Gaussians, and the body parts configuration is modeled by a Bayesian network. In this model, state nodes represent the pose parameters of each body part, and arcs represent spatial constraints. The Gaussian mixture distribution is used to model the prior distribution for the body parts and the background as a parametric model. We estimate the human pose by optimizing the pose parameters using likelihood objective functions. The performance of the proposed approach is illustrated on various single images, and it improves the human pose estimation quality.
Keywords: Human Pose Estimation; Mixture of Gaussians; Bayesian Network
Human Motion Modeling Using Multivision BIBAFull-Text 659-668
  Byoung-Doo Kang; Jae-Seong Eom; Jong-Ho Kim; Chulsoo Kim; Sang-Ho Ahn; Bum-Joo Shin; Sang-Kyoon Kim
In this paper, we propose a gesture modeling system based on computer vision that recognizes gestures naturally, without friction between the system and the user, using real-time 3D modeling information on multiple objects. It recognizes a gesture after 3D modeling and analyzing information about the user's body shape in stereo views of human movement. In the 3D-modeling step, 2D information is extracted from each view by using an adaptive color difference detector. Potential objects such as faces, hands, and feet are labeled by using the information from 2D detection. We identify reliable objects by comparing the similarities of the potential objects obtained from the two views. We acquire 2D tracking information from the selected objects by using the Kalman filter and reconstruct it as a 3D gesture. A joint for each body part is generated in the combined objects. We tested the system on ambiguous cases involving occlusion, clutter, and irregular 3D gestures to analyze its efficiency. In this experiment, the proposed gesture modeling system showed good detection and a processing rate of 30 frames per second, which is suitable for real-time use.
Real-Time Face Tracking System Using Adaptive Face Detector and Kalman Filter BIBAFull-Text 669-678
  Jong-Ho Kim; Byoung-Doo Kang; Jae-Seong Eom; Chulsoo Kim; Sang-Ho Ahn; Bum-Joo Shin; Sang-Kyoon Kim
In this paper, we propose a real-time face tracking system using an adaptive face detector and the Kalman filter. The features used for face detection are five types of simple Haar-like features. To extract only the more significant of these features, we employ principal component analysis (PCA). The extracted features are used as the learning vector for a support vector machine (SVM), which classifies faces and non-faces. The face detector locates faces among the face candidates separated from the background by using skin color information updated in real time. We trace the moving faces with the Kalman filter, which uses the static information of the detected faces and the dynamic information of changes between the previous and current frames. In this experiment, the proposed system showed an average tracking rate of 97.3% and a frame rate of 23.5 frames per second, which is suitable for a real-time tracking system.
Kalman Filtering in the Design of Eye-Gaze-Guided Computer Interfaces BIBAKFull-Text 679-689
  Oleg Komogortsev; Javed I. Khan
In this paper, we design an Attention Focus Kalman Filter (AFKF) -- a framework that offers interaction capabilities by constructing an eye-movement language, provides real-time perceptual compression through Human Visual System (HVS) modeling, and improves the system's reliability. The AFKF achieves these goals through identification of basic eye-movement types in real-time, prediction of a user's perceptual attention focus, and the use of the eye's visual sensitivity function and de-noising of the eye-position data signal.
Keywords: Human Visual System Modeling; Kalman Filter; Human Computer Interaction; Perceptual Compression
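To make the de-noising idea in the abstract above concrete, here is a minimal sketch of a scalar Kalman filter applied to one eye-position coordinate. It is not the AFKF itself: the constant-position state model, the function name `kalman_smooth`, and the variance values are assumptions chosen purely for illustration.

```python
from typing import Iterable, List, Optional


def kalman_smooth(
    measurements: Iterable[float],
    process_var: float = 1.0,       # assumed variance of true gaze drift per sample
    measurement_var: float = 25.0,  # assumed variance of eye-tracker noise
) -> List[float]:
    """Filter a noisy 1-D signal with a scalar Kalman filter (constant-position model)."""
    estimates: List[float] = []
    x: Optional[float] = None  # current state estimate
    p = 0.0                    # variance of the current estimate
    for z in measurements:
        if x is None:
            # Initialise from the first sample.
            x, p = z, measurement_var
        else:
            # Predict: position is assumed unchanged, so only uncertainty grows.
            p = p + process_var
            # Update: blend prediction and measurement using the Kalman gain.
            k = p / (p + measurement_var)
            x = x + k * (z - x)
            p = (1.0 - k) * p
        estimates.append(x)
    return estimates


smoothed = kalman_smooth([100.0, 110.0, 95.0, 104.0])
```

A larger `measurement_var` relative to `process_var` makes the output smoother but slower to follow real gaze shifts, which is the usual trade-off when tuning such a filter.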
Human Shape Tracking for Gait Recognition Using Active Contours with Mean Shift BIBAKFull-Text 690-699
  Kyung Su Kwon; Se Hyun Park; Eun Yi Kim; Hang Joon Kim
In this paper, we present a human shape extraction and tracking method for gait recognition using geodesic active contour models (GACMs) combined with the mean-shift algorithm. Active contour models (ACMs) are very effective for non-rigid objects because of their elastic property, but they have the limitation that their performance depends mainly on the initial curve. To overcome this problem, we combine the mean-shift algorithm with the traditional GACMs. The main idea is very simple. Before evolving with the level-set method, the initial curve in each frame is re-localized near the human region and resized to include the target object. This mechanism reduces the number of iterations and handles large object motion. Our system is composed of human region detection and human shape tracking. In the human region detection module, the silhouette of a walking person is extracted by background subtraction and morphological operations. Then the human shape is obtained by the GACMs with the mean-shift algorithm. To evaluate its effectiveness, the proposed method is applied to common gait data; the results show that it efficiently extracts and tracks accurate shapes for gait recognition.
Keywords: Human Shape Tracking; Geodesic Active Contour Models; Mean Shift; Gait Recognition
Robust Gaze Tracking Method for Stereoscopic Virtual Reality Systems BIBAFull-Text 700-709
  Eui Chul Lee; Kang Ryoung Park; Min Cheol Whang; Junseok Park
In this paper, we propose a new face and eye gaze tracking method that works by attaching gaze tracking devices to stereoscopic shutter glasses. This paper presents six advantages over previous works. First, by using the proposed method with stereoscopic VR systems, users feel more immersed and comfortable. Second, by capturing reflected eye images with a hot mirror, we were able to increase eye gaze accuracy in the vertical direction. Third, by attaching an infrared passing filter and using an IR illuminator, we were able to obtain robust gaze tracking performance irrespective of environmental lighting conditions. Fourth, we used a simple 2D-based eye gaze estimation method based on the detected pupil center and the 'geometric transform' process. Fifth, to prevent gaze positions from being unintentionally moved by natural eye blinking, we discriminated between different kinds of eye blinking by measuring pupil sizes. This information was also used for button clicking or mode toggling. Sixth, the final gaze position was calculated by the vector summation of face and eye gaze positions, which allows for natural face and eye movements. Experimental results showed that the face and eye gaze estimation error was less than one degree.
EyeScreen: A Gesture Interface for Manipulating On-Screen Objects BIBAFull-Text 710-717
  Shanqing Li; Jingjun Lv; Yihua Xu; Yunde Jia
This paper presents a gesture-based interaction system which provides a natural way of manipulating on-screen objects. We generate a synthetic image by linking images from two cameras to recognize hand gestures. The synthetic image contains all the features captured from two different views, which can be used to alleviate the self-occlusion problem and improve the recognition rate. The MDA and EM algorithms are used to obtain parameters for pattern classification. To compute more detailed pose parameters such as fingertip positions and hand contours in the image, a random sampling method is introduced in our system. We describe a method based on projective geometry for background subtraction to improve the system performance. Robustness of the system has been verified by extensive experiments with different user scenarios. Picture browser and visual pilot applications are discussed in this paper.
GART: The Gesture and Activity Recognition Toolkit BIBAKFull-Text 718-727
  Kent Lyons; Helene Brashear; Tracy L. Westeyn; Jungsoo Kim; Thad Starner
The Gesture and Activity Recognition Toolkit (GART) is a user interface toolkit designed to enable the development of gesture-based applications. GART provides an abstraction to machine learning algorithms suitable for modeling and recognizing different types of gestures. The toolkit also provides support for the data collection and the training process. In this paper, we present GART and its machine learning abstractions. Furthermore, we detail the components of the toolkit and present two example gesture recognition applications.
Keywords: Gesture recognition; user interface toolkit
Static and Dynamic Hand-Gesture Recognition for Augmented Reality Applications BIBAKFull-Text 728-737
  Stefan Reifinger; Frank Wallhoff; Markus Ablaßmeier; Tony Poitschke; Gerhard Rigoll
This contribution presents our approach for an instrumented automatic gesture recognition system for use in Augmented Reality, which is able to differentiate static and dynamic gestures. Based on an infrared tracking system, infrared targets mounted on the user's thumbs and index fingers are used to retrieve information about the position and orientation of each finger. Our system receives this information and extracts static gestures using distance classifiers and dynamic gestures using statistical models. The recognized gesture is provided to any connected application. We introduce a small demonstration as the basis for a short evaluation. In this evaluation we compare interaction in a real environment, in Augmented Reality with a mouse/keyboard, and with our gesture recognition system with respect to properties such as task execution time and intuitiveness of interaction. The results show that tasks executed with our gesture recognition system are faster than with the mouse/keyboard. However, this enhancement entails slightly lower wearing comfort.
Keywords: Augmented Reality; Gesture Recognition; Human Computer Interaction
Multiple People Labeling and Tracking Using Stereo for Human Computer Interaction BIBAKFull-Text 738-746
  Nurul Arif Setiawan; Seok-Ju Hong; Chil-Woo Lee
In this paper, we propose a system for multiple people tracking using fragment-based histogram matching. The appearance model is based on an Improved HLS color histogram, which can be calculated efficiently using an integral histogram representation. Since histograms lose all spatial information, we define a fragment-based region representation which retains spatial information and is robust against occlusion and scale issues by using disparity information. Multiple people labeling is maintained by creating an online appearance representation for each person detected in the scene and calculating a fragment vote map. Initialization is performed automatically from the background segmentation step.
Keywords: Integral Histogram; Fragment Based Tracking; Multiple People; Stereo Vision
A Study of Human Vision Inspection for Mura BIBAKFull-Text 747-754
  Pei-Chia Wang; Sheue-Ling Hwang; Chao-Hua Wen
In the present study, several factors were considered, such as the various types and sizes of real Mura and Mura inspection experience. The steps of data collection and experiments were conducted systematically from the viewpoint of human factors. According to the experimental results, Mura size was the most important factor affecting the visual contrast threshold. The purpose of this research was to objectively describe the relationships between the Mura characteristics and visual contrast thresholds. Furthermore, a domestic JND model for the LCD industry was constructed. This model could serve as an inspection criterion for the LCD industry.
Keywords: Mura; JND; vision; LCD
Tracing Users' Behaviors in a Multimodal Instructional Material: An Eye-Tracking Study BIBAKFull-Text 755-762
  Esra Yecan; Evren Sumuer; Bahar Baran; Kursat Cagiltay
This study aims to explore user behaviors in instructional environments combining multimodal presentation of information. Cognitive load theory and dual coding theory were taken as the theoretical perspectives for the analyses. For this purpose, user behaviors were analyzed by recording participants' eye movements while they were using an instructional material with synchronized video and PowerPoint slides. 15 participants' eye fixation counts and durations for specific parts of the material were collected. Findings of the study revealed that the participants used the slide and video presentations in a complementary way.
Keywords: Producer; PowerPoint; video; eye tracking; cognitive load; dual coding; multiple channels
A Study on Interactive Artwork as an Aesthetic Object Using Computer Vision System BIBAKFull-Text 763-768
  Joonsung Yoon; Jaehwa Kim
With the recent rapid rise of Human-Computer Interaction and surveillance systems, various application systems have become a matter of primary concern. However, these application systems mostly deal with technologies for recognizing facial characteristics, analyzing facial expressions and automatic face recognition. By applying these various face recognition technologies and methods, I made an interactive artwork after computing the range of hands. This study is about artwork application theory and the use of computer vision methods. The approach of this study makes it possible to create artwork applications in real time. I propose how to utilize, analyze and make interactions with the created artworks, and I also explain the immersion of the viewers. The viewers can express their imagination freely, and artists give viewers an opportunity not only to enjoy a visual experience but also to interact with and be immersed in the works via the interface. This interactive art lets viewers actually take part in the works.
Keywords: aesthetic object; artistic desire; interactive art; art and science
Human-Computer Interaction System Based on Nose Tracking BIBAKFull-Text 769-778
  Lumin Zhang; Fuqiang Zhou; Weixian Li; Xiaoke Yang
This paper presents a novel Human-Computer Interaction (HCI) system with a calibrated mono-camera which integrates active computer vision technology and embedded speech command recognition technology. Mainly by robustly tracking the nose tip motion as the mouse trace, this system completes mouse tasks with a recognition rate of more than 85% at a speed of 15 frames per second. To achieve this goal, we adopt a novel approach based on the symmetry of the nose plane feature to localize and track the nose invariantly under varied environments. Compared with other kinds of pointing devices, this HCI system is hands-free, cheap, real-time, convenient and unpolluted, and it can be used in the fields of disability aid, entertainment and remote control.
Keywords: HCI; Nose Tracking; Calibration
Evaluating Eye Tracking with ISO 9241 -- Part 9 BIBAKFull-Text 779-788
  Xuan Zhang; I. Scott MacKenzie
The ISO 9241-9 standard for computer pointing devices proposes an evaluation of performance and comfort [4]. This paper is the first eye tracking evaluation conforming to ISO 9241-9. We evaluated three techniques and compared them with a standard mouse. The evaluation used throughput (in bits/s) as a measurement of user performance in a multi-directional point-select task. The "Eye Tracking Long" technique required participants to look at an on-screen target and dwell on it for 750 ms for selection. Results revealed a lower throughput than for the "Eye Tracking Short" technique with a 500 ms dwell time. The "Eye+Spacebar" technique allowed participants to "point" with the eye and "select" by pressing the spacebar upon fixation. This eliminated the need to wait for selection. It was the best among the three eye tracking techniques with a throughput of 3.78 bits/s, which was close to the 4.68 bits/s for the mouse.
Keywords: Pointing devices; ISO 9241; Fitts' law; performance evaluation; eye movement; eye tracking
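For readers unfamiliar with the throughput measure mentioned in the abstract above, the following sketch shows the calculation commonly used with ISO 9241-9 style evaluations: effective index of difficulty ID_e = log2(D_e / W_e + 1), with effective width W_e = 4.133 × SD of the selection endpoints, divided by mean movement time. The function name, argument names, and sample values are hypothetical and are not data from the paper.

```python
import math
from statistics import stdev
from typing import Sequence


def throughput_bits_per_s(
    effective_distance: float,
    endpoint_offsets: Sequence[float],
    mean_movement_time_s: float,
) -> float:
    """Throughput = ID_e / MT, where ID_e = log2(D_e / W_e + 1), W_e = 4.133 * SD."""
    if len(endpoint_offsets) < 2:
        raise ValueError("need at least two selection endpoints")
    effective_width = 4.133 * stdev(endpoint_offsets)
    if effective_width == 0.0 or mean_movement_time_s <= 0.0:
        raise ValueError("endpoints must vary and movement time must be positive")
    id_e = math.log2(effective_distance / effective_width + 1.0)
    return id_e / mean_movement_time_s


# Hypothetical condition: offsets (in pixels) of selections from the target
# centre along the task axis, and the mean time per selection in seconds.
example_tp = throughput_bits_per_s(
    effective_distance=400.0,
    endpoint_offsets=[-12.0, 5.0, 9.0, -3.0, 1.0],
    mean_movement_time_s=0.9,
)
```

Because dwell-based selection adds a fixed waiting period to every trial, a longer dwell time increases movement time and so lowers throughput, which is consistent with the ordering of techniques reported in the abstract.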
Impact of Mental Rotation Strategy on Absolute Direction Judgments: Supplementing Conventional Measures with Eye Movement Data BIBAKFull-Text 789-798
  Ronggang Zhou; Kan Zhang
By training participants to use map-first mental rotation as their primary strategy in an absolute navigational task, this study focused on how the integration of heading information (from the exocentric reference frame) with target position information (from the egocentric reference frame) affects absolute direction judgments. Compared with previous studies, the results of this study showed that (1) response was not better for north than for south, (2) response was slowest for the back position in the canonical position condition, and (3) the cardinal direction advantage of the right-back position was not impaired. Eye movement data only partially supported these conclusions, and such data should be used with caution for similar goals. These findings can be applied to navigational training and to the design of interfaces such as electric space.
Keywords: absolute direction judgments; mental rotation strategy; eye movement; reference frame

Part IV: Interactive TV and Media

Beyond Mobile TV: Understanding How Mobile Interactive Systems Enable Users to Become Digital Producers BIBAKFull-Text 801-810
  Anxo Cereijo Roibás; Riccardo Sala
This paper aims to explore the quality of the user experience with mobile and pervasive interactive multimedia systems that enable the creation and sharing of digital content through mobile phones. It also discusses the use and validity of different experimental in-situ and other data gathering and evaluation techniques for assessing how the physical and social contexts might influence the use of these systems. This scenario represents an important shift away from professionally produced digital content for the mass-market. It addresses methodologies and techniques that are suitable for designing co-creative applications for non-professional users in different contexts of use at home or in public spaces. Special focus is given to understanding how user participation and motivation in small themed communities can be encouraged, and how social interaction can be enabled through mobile interfaces. An enhancement of users' creativity, self-authored content sharing, sociability and co-experience can be evidence of how creative people can benefit from Information and Communication Technologies.
Keywords: users' generated content; pervasive multimedia; mobileTV
Media Convergence, an Introduction BIBAFull-Text 811-814
  Sepideh Chakaveh; Manfred Bogen
Media convergence is a theory in communications in which every mass medium eventually merges to the point where they become one medium due to the advent of new communication technologies. The Media Convergence research theme normally refers to the entire production, distribution, and use process of future digital media services, from content production to service delivery through various channels such as mobile terminals, digital TV, or the Internet.
An Improved H.264 Error Concealment Algorithm with User Feedback Design BIBAKFull-Text 815-820
  XiaoMing Chen; Yuk Ying Chung
This paper proposes a new Error Concealment (EC) method for the H.264/AVC [1] video coding standard using both spatial and temporal information for intra-frame concealment. Five error concealing modes are offered by this method. The proposed EC method also allows feedback from users. It allows users to define and change the thresholds for switching between five different modes during the error concealing procedure. As a result, the concealing result for a video sequence can be optimized by taking advantage of relevant user feedback. The concealed video quality has been measured by a group of users and compared with the H.264 EC method which is without user feedback. The experimental results show that the proposed new EC algorithm with the user feedback performs better (3 dB gains) than the H.264 EC without user feedback.
Keywords: H.264; Error Concealment; User Feedback; Video Compression
Classification of a Person Picture and Scenery Picture Using Structured Simplicity BIBAKFull-Text 821-828
  Myoung-Bum Chung; Il-Ju Ko
We can classify various images as either people pictures, if they contain one or more persons, or scenery pictures, if they lack people, by using face region detection. However, classification precision is low when only existing face region detection techniques are used. This paper proposes an algorithm based on the structured simplicity of a picture to classify with higher accuracy. To verify the usefulness of the proposed method, we ran a classification experiment using 500 people pictures and scenery pictures. Using only face region detection in OpenCV gave an accuracy of 79%, whereas combining structured simplicity with OpenCV face region detection gave an accuracy of 86.4%. Therefore, by using structured simplicity together with face region detection, we can efficiently classify pictures as person pictures or scenery pictures.
Keywords: Face region detection; Picture classification; Structured simplicity
Designing Personalized Media Center with Focus on Ethical Issues of Privacy and Security BIBAKFull-Text 829-835
  Alma Leora Culén; Yonggong Ren
While considering the development of interactive television (iTV), we also need to consider new possibilities for personalization of its audio-video content as well as ethical issues related to such personalization. While offering immense possibilities for new ways of informing, communicating, gaming as well as watching selected and personalized broadcasted content, doors also open to misuse, manipulation and destructive behavior. Our goal is to propose and analyze a user-centered prototype for iTV, while keeping in mind ethical principles that we hope would lead to a positive experience of this forthcoming technology.
Keywords: interactive television; experience; ethics; privacy; multi-touch interface
Evaluation of VISTO: A New Vector Image Search TOol BIBAKFull-Text 836-845
  Tania Di Mascio; Daniele Frigioni; Laura Tarantino
We present an experimental evaluation of VISTO (Vector Image Search TOol), a new content-based image retrieval (CBIR) system that deals with vector images in SVG (Scalable Vector Graphics) format, unlike most of the CBIR tools available in the literature, which deal with raster images. The experimental evaluation of retrieval systems is a critical part of the process of continuously improving the existing retrieval metrics. While researchers in text retrieval have long been using a sophisticated set of tools for user-based evaluation, this does not yet apply to image retrieval. In this paper, we take a step in this direction and present an experimental evaluation of VISTO in a framework for the production of 2D animation.
Keywords: Content Based Image Retrieval; vector images; SVG; evaluation
G-Tunes -- Physical Interaction Design of Playing Music BIBAKFull-Text 846-851
  Jia Du; Ying Li
In this paper we present G-tunes, a music player that couples a tangible interface with digital music. The design is based on research into tangible interfaces and interaction engineering. We offer an overview of the design concept, explain the prototyping and discuss the result. One of the goals of this project is to create rich experiences for people to play music; another goal is to explore how external physical expressions relate to people's inner perception and emotion, and how we can couple this with the design of a tangible music player.
Keywords: Interaction design; Tangible interaction; Sensory perception; Music player; Scale; Weight
nan0sphere: Location-Driven Fiction for Groups of Users BIBAFull-Text 852-861
  Kevin Eustice; Venkatraman Ramakrishna; Alison Walker; Matthew Schnaider; Nam T. Nguyen; Peter L. Reiher
We developed a locative fiction application called nan0sphere and deployed it on the UCLA campus. This application presents an interactive narrative to users working in a group as they move around the campus. Based on each user's current location, previously visited locations, actions taken, and on the similar attributes of other users in the same group, the story will develop in different ways. Group members are encouraged by the story to move independently, with their individual actions and progress affecting the narrative and the overall group experience. Eight different locations on campus are involved in this story. Groups consist of four participants, and the complete story unfolds through the actions of all four group members. The supporting system could be used to create other similar types of locative literature, possibly augmented with multimedia, for other purposes and in other locations. We will discuss benefits and challenges of group interactions in locative fiction, infrastructure required to support such applications, issues of determining user locations, and our experiences using the application.
How Panoramic Photography Changed Multimedia Presentations in Tourism BIBAKFull-Text 862-871
  Nelson Gonçalves
An overview of the use of panoramic photography, the panorama concept, and the evolution of presentation and multimedia projects targeting tourism promotion. The purpose is to stress the importance of panoramic pictures in the Portuguese design of multimedia systems for the promotion of tourism. Through photography in on-line and off-line multimedia, the user can go back in time and watch what those landscapes were like in his/her childhood, for example. Consequently, one of the additional quality options in our productions is the diachronic view of the landscape.
Keywords: Design; Multimedia; CD-ROM; DVD; Web; Photography; Panorama; Tourism; Virtual Tour
Frame Segmentation Used MLP-Based X-Y Recursive for Mobile Cartoon Content BIBAKFull-Text 872-881
  Eunjung Han; Kirak Kim; HwangKyu Yang; Keechul Jung
With the rapid growth of the mobile industry, the limitation of small mobile screens is attracting a lot of researchers' attention to transforming on/off-line contents into mobile contents. Frame segmentation for limited mobile browsers is the key point of off-line contents transformation. The X-Y recursive cut algorithm has been widely used for frame segmentation in document analysis. However, this algorithm has drawbacks for cartoon images, which come in various image types and contain noise, especially online cartoon contents obtained by scanning. This makes it difficult for the X-Y recursive cut algorithm to find the exact cutting point. In this paper, we propose a method to segment on/off-line cartoon contents into frames fitted to the mobile screen. We combine two concepts: an X-Y recursive cut algorithm, which performs well on noise-free contents, to extract candidate segmenting positions, and Multi-Layer Perceptrons (MLP) to verify the candidates. Together these increase the accuracy of frame segmentation and are feasible to apply to various off-line cartoon images with frames.
Keywords: MLP; X-Y recursive; frame segmentation; mobile cartoon contents
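For readers unfamiliar with the X-Y recursive cut mentioned in the abstract above, the following is a minimal sketch of the plain algorithm only. It is not the authors' implementation and omits their MLP verification step entirely. The function name `xy_cut`, the `min_gap` parameter, and the list-of-lists binary image format (1 = ink, 0 = blank) are assumptions made for illustration.

```python
def xy_cut(img, top=0, left=0, bottom=None, right=None, min_gap=1):
    """Recursively split a binary image (1 = ink) at blank rows/columns.

    Returns a list of (top, left, bottom, right) boxes, end-exclusive.
    Illustrative sketch; names and parameters are hypothetical.
    """
    if bottom is None:
        bottom, right = len(img), len(img[0])

    # Projection profiles: which rows and columns of the box contain ink.
    rows = [y for y in range(top, bottom) if any(img[y][left:right])]
    cols = [x for x in range(left, right)
            if any(img[y][x] for y in range(top, bottom))]
    if not rows or not cols:
        return []  # an all-blank box yields no frames

    # Shrink the box to its ink content.
    top, bottom, left, right = rows[0], rows[-1] + 1, cols[0], cols[-1] + 1

    # Cut at the first blank run between ink rows, else between ink columns.
    for inked, horizontal in ((rows, True), (cols, False)):
        for a, b in zip(inked, inked[1:]):
            if b - a - 1 >= min_gap:  # at least min_gap blank lines
                if horizontal:
                    return (xy_cut(img, top, left, a + 1, right, min_gap) +
                            xy_cut(img, b, left, bottom, right, min_gap))
                return (xy_cut(img, top, left, bottom, a + 1, min_gap) +
                        xy_cut(img, top, b, bottom, right, min_gap))
    return [(top, left, bottom, right)]  # no cut possible: one frame


# Hypothetical 4x5 page: two frames side by side above one wide frame.
page = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
frames = xy_cut(page)
```

On noisy scans a stray ink pixel inside a gutter removes the blank run, which is the weakness the abstract describes; raising `min_gap` does not help in that case, hence the need for a separate verification step.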
Browsing and Sorting Digital Pictures Using Automatic Image Classification and Quality Analysis BIBAKFull-Text 882-891
  Otmar Hilliges; Peter Kunath; Alexey Pryakhin; Andreas Butz; Hans-Peter Kriegel
In this paper we describe a new interface for browsing and sorting of digital pictures. Our approach is two-fold. First we present a new method to automatically identify similar images and rate them based on sharpness and exposure quality of the images. Second we present a zoomable user interface based on the details-on-demand paradigm enabling users to browse large collections of digital images and select only the best images for further processing or sharing.
Keywords: Photoware; digital photography; image analysis; similarity measurement; informed browsing; zoomable user interfaces; content based image retrieval
A Usability Study on Personalized EPG (pEPG) UI of Digital TV BIBAKFull-Text 892-901
  Myo Ha Kim; Sang Min Ko; Jae Seung Mun; Yong Gu Ji; Moon Ryul Jung
As the use of digital television (D-TV) has spread across the globe, usability problems on D-TV have become an important issue. However, so far, very little has been done in usability studies on D-TV. The aim of this study is to develop evaluation methods for the user interface (UI) of a personalized electronic program guide (pEPG) of D-TV, and to evaluate the UI of a working prototype of pEPG using these methods. To do this, first, the structure of the UI system and navigation for a working prototype of pEPG was designed considering the expanded channel. Second, evaluation principles serving as the usability method for the working prototype were developed. Third, lab-based usability testing of the working prototype was conducted with these evaluation principles. The usability problems found by usability testing were used to improve the UI of the working prototype of pEPG.
Keywords: Usability; User Interface (UI); Evaluation Principles; Personalized EPG (pEPG); Digital Television (D-TV)
Recognizing Cultural Diversity in Digital Television User Interface Design BIBAKFull-Text 902-908
  Joonhwan Kim; Sanghee Lee
Research trends in user interface design and human-computer interaction have been shifting toward the consideration of use context. The reflection of differences in users' cultural diversity is an important topic in the consumer electronics design process, particularly for widely internationally sold products. In the present study, the authors compared users' responses to preference and performance to investigate the effect of different cultural backgrounds. A high-definition display product with digital functions was selected as a major digital product domain. Four user interface design concepts were suggested, and user studies were conducted internationally with 57 participants in three major market countries. The tests included users' subjective preferences on the suggested graphical designs, performances of the on-screen display navigation, and feedback on newly suggested TV features. For reliable analysis, both qualitative and quantitative data were measured. The results reveal that responses to design preference were affected by participants' cultural background. On the other hand, universal conflicts between preference and performance were witnessed regardless of cultural differences. This study indicates the necessity of user studies of cultural differences and suggests an optimized level of localization in the example of digital consumer electronics design.
Keywords: User Interface Design; Cultural Diversity; Consumer Electronics; Digital Television; Usability; Preference; Performance; International User Studies
A Study on User Satisfaction Evaluation About the Recommendation Techniques of a Personalized EPG System on Digital TV BIBAKFull-Text 909-917
  Sang Min Ko; Yeon Jung Lee; Myo Ha Kim; Yong Gu Ji; Soo Won Lee
With the growing popularity of digital broadcasting, viewers have the chance to watch various programs. However, they may have trouble choosing just one among many programs. To solve this problem, various studies about EPG and Personalized EPG have been performed. In this study, we reviewed previous studies about EPG, Personalized EPG and the results of recommendation evaluations, and evaluated the recommendations of a PEPG system that was implemented as a working prototype. We collected preference information about categories and channels from 30 subjects and carried out the evaluation through e-mail. Recall and Precision were calculated by analyzing recommended programs from an e-mail questionnaire, and an evaluation of subjective satisfaction was conducted. As a result, we determined how much the result of an evaluation reflects viewer satisfaction by comparing the variation of subjects' satisfaction and the variation of objective evaluation criteria.
Keywords: EPG; PEPG; Satisfaction; Digital TV; DTV
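The Recall and Precision measures named in the abstract above follow the standard information-retrieval definitions. The sketch below shows how they might be computed for one viewer's recommendation list; the function name and the sample program titles are hypothetical and do not come from the study.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one viewer's recommendation list.

    precision = hits / number of recommended programs
    recall    = hits / number of programs the viewer actually wanted
    """
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four recommended programs, five programs the viewer wanted.
p, r = precision_recall(
    ["news", "drama", "quiz", "sports"],
    ["news", "drama", "sports", "film", "music"],
)
```

Here three of the four recommendations are wanted (precision 3/4) and three of the five wanted programs are recommended (recall 3/5).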
Usability of Hybridmedia Services -- PC and Mobile Applications Compared BIBAKFull-Text 918-925
  Jari Laarni; Liisa Lähteenmäki; Johanna Kuosmanen; Niklas Ravaja
The aim is to present results of a usability test of a prototype of a context-based personalized hybridmedia service for delivering product-specific information to consumers. We recorded participants' eye movements when they used the service either with a camera phone or with the web browser of a PC. The participants' task was to search for product-specific information from the food product database and test calculators by using both a PC and mobile user interface. Eye movements were measured by a head-mounted eye tracking system. Even though the completion of the tasks took longer when the participants used the mobile phone than when they used the PC, they could complete the tasks successfully with both interfaces. Provided that the barcode tag was not very small, taking pictures of the barcodes with a mobile phone was quite easy. Overall, the use of the service via the mobile phone provides quite a good alternative to the PC.
Keywords: Hybridmedia; usability; eye tracking; barcode reading
m-YouTube Mobile UI: Video Selection Based on Social Influence BIBAKFull-Text 926-932
  Aaron Marcus; Angel Perez
The ease-of-use of Web-based video-publishing services provided by applications like YouTube has encouraged a new means of asynchronous communication, in which users can post videos not only to make them public for review and criticism, but also as a way to express moods, feelings, or intentions to an ever-growing network of friends. Following the current trend of porting Web applications onto mobile platforms, the authors sought to explore user-interface design issues of a mobile-device-based YouTube, which they call m-YouTube. They first analyzed the elements of success of the current YouTube Web site and observed its functionality. Then, they looked for unsolved issues that could benefit from information-visualization design for small screens on mobile phones to explore a mobile version of such a product/service. The biggest challenge was to reduce the number of functions and amount of information to fit into a mobile phone screen, but still be usable, useful, and appealing within the YouTube context of use and user experience. Borrowing ideas from social research in the area of social influence processes, they made design decisions aiming to help YouTube users decide what video content to watch and to increase the chances of YouTube authors being evaluated and observed by peers. The paper proposes a means to visualize large amounts of video relevant to YouTube users by using their friendship network as a relevance indicator to help in the decision-making process.
Keywords: design; interface; mobile; network; social; user; YouTube; video
Can Video Support City-Based Communities? BIBAFull-Text 933-942
  Raquel Navarro-Prieto; Nidia Berbegal
The goal of our research has been to investigate the different ways in which new communication technologies, especially mobile multimedia communications, could support city-based communities. In this paper we review the research on the effect of mobile technology, especially mobile video, on communities' communication patterns, and highlight the new challenges and gaps still not covered in this area. Finally, we describe how we have tried to respond to these challenges by using User Centered Design in two very different types of communities: women's associations and elderly people.
Watch, Press, and Catch -- Impact of Divided Attention on Requirements of Audiovisual Quality BIBAKFull-Text 943-952
  Ulrich Reiter; Satu Jumisko-Pyykkö
Many of today's audiovisual application systems offer some kind of interactivity. Yet, quality assessments of these systems are often performed without taking into account the possible effects of divided attention caused by interaction or user task. We present a subjective assessment performed among 40 test subjects to investigate the impact of divided attention on the perception of audiovisual quality in interactive application systems. Test subjects were asked to rate the overall perceived audiovisual quality in an interactive 3D scene with varying degrees of interactive tasks to be performed by the subjects. As a result we found that the experienced overall quality did not vary with the degree of interaction. The results of our study make clear that in the case where interactivity is offered in an audiovisual application, it is not generally possible to technically lower the signal quality without perceptual effects.
Keywords: audiovisual quality; subjective assessment; divided attention; interactivity; task
Media Service Mediation Supporting Resident's Collaboration in ubiTV BIBAFull-Text 953-962
  Choonsung Shin; Hyoseok Yoon; Woontack Woo
A smart home is an intelligent and shared space, where various services coexist and multiple residents with different preferences and habits share these services most of the time. Due to the sharing of space and time, service conflicts may occur when multiple users try to access media services. In this paper, we propose a context-based mediation method, consisting of service mediators and mobile mediators, to resolve the service conflicts in a smart home. The service mediators detect service conflicts among the residents and recommend their preferred media contents on a shared screen and their own mobile devices by exploiting users' preferences and service profiles. The mobile mediators collect the recommendation information and give the users personal recommendation. With combination of the service and mobile mediator, the residents are allowed to negotiate the media contents in the conflict situation. Based on experiments in the ubiHome, we observed that mediation is useful to encourage discussion and helps to choose a proper service in a conflict situation. Therefore, we expect the proposed mediation method to play a vital role in resolving conflicts and providing multiple residents with harmonized services in a smart home environment.
Implementation of a New H.264 Video Watermarking Algorithm with Usability Test BIBAKFull-Text 963-970
  Mohd Afizi Mohd Shukran; Yuk Ying Chung; XiaoMing Chen
With the proliferation of digital multimedia content, issues of copyright protection have become more important because the copying of digital video does not result in the decrease in quality that occurs when analog video is copied. One method of copyright protection is to embed a digital code, or "watermark", into the video sequence. The watermark can then unambiguously identify the copyright holder of the video sequence. In this paper, we propose a new video watermarking algorithm for H.264 coded video that takes usability factors into account. Usability tests based on the concept of Human Computer Interface (HCI) have been performed on the proposed approach. The usability testing is considered representative of most image manipulations and attacks. The proposed algorithm has passed all the attack tests. Therefore, the watermarking mechanisms in this paper have been shown to be robust and efficient in protecting the copyright of H.264 coded video.
Keywords: Video watermarking; H.264; Human Computer Interface (HCI)
Innovative TV: From an Old Standard to a New Concept of Interactive TV -- An Italian Job BIBAKFull-Text 971-980
  Rossana Simeoni; Linnea Etzler; Elena Guercio; Monica Perrero; Amon Rapp; Roberto Montanari; Francesco Tesauri
The current market of television services adopts several broadcast technologies (e.g. IPTV, DVBH, DTT), delivering different ranges of contents. These services may be extremely heterogeneous, but they're all affected by the continuous increase in quantity of contents and this trend is becoming more and more complicated to manage. Hence, future television services must respond to an emerging question: in what way could the navigation among this increasing volume of multimedia contents be facilitated? To answer this question, a research study was conducted, resulting in a set of guidelines for Interactive TV development. At first, the current scenario was portrayed through a functional analysis of existing TV systems and a survey of actual and potential users. Subsequently, interaction models which could possibly be applied to Interactive TV (e.g.: peer-to-peer programs) were assessed. Guidelines were eventually defined as a synthesis of current best practices and new interactive features.
Keywords: Interactive TV; IPTV; enhanced TV; media consumers; peer-to-peer; focus group; heuristic evaluation
Evaluating the Effectiveness of Digital Storytelling with Panoramic Images to Facilitate Experience Sharing BIBAKFull-Text 981-989
  Zuraidah Sulaiman; Nor Laila Md. Noor; Narinderjit Singh; Suet Peng Yong
Technology advancement has now enabled experience sharing to happen in a digital storytelling environment that is facilitated through different delivery technologies such as panoramic images and virtual reality. However, panoramic images have not been fully explored and formally studied, especially to assist experience sharing in a digital storytelling setting. This research aims to study the effectiveness of interactive digital storytelling in facilitating the sharing of experience. The interactive digital storytelling artifact was developed to convey the look and feel of Universiti Teknologi PETRONAS through panoramic images. The effectiveness of digital storytelling through panoramic images was empirically tested based on the adapted Delone and McLean IS success model. The experiment was conducted on participants who had never visited the university. Six hypotheses were derived, and the experiment showed that there are correlations between user satisfaction with digital storytelling with panoramic images and the individual impact of the application in assisting experience sharing among users. Hence, this research concludes with a model for the production of effective digital storytelling with panoramic images for specific experience sharing to flourish among users.
Keywords: Digital storytelling; interactivity; panoramic images; experience sharing; effective system; effectiveness study; human computer interaction
User-Centered Design and Evaluation of a Concurrent Voice Communication and Media Sharing Application BIBAKFull-Text 990-999
  David Wheatley
This paper describes two user-centered studies undertaken in the development of a concurrent group voice and media sharing application. The first used paper prototyping to identify the user values relating to a number of functional capabilities. These results informed the development of a prototype application, which was ported to a 3G handset and evaluated in the second study using a conjoint analysis approach. Results indicated that concurrent photo sharing was of high user value, while the value of video sharing was limited by established mental models of file sharing. Overall higher ratings were found among female subjects and among less technologically aware subjects and most media sharing would be with those who are close and trusted. This, and other results suggest that the reinforcement of social connections, spontaneity and emotional communications would be important user objectives of such a media sharing application.
Keywords: User centered design; wireless communications; concurrent media sharing; cell-phone applications
Customer-Dependent Storytelling Tool with Authoring and Viewing Functions BIBAKFull-Text 1000-1009
  Sunhee Won; Mi Young Choi; Gye-Young Kim; Hyung-Il Choi
Animation is the main content of digital storytelling. It usually has a fixed number of characters. We want a customer to appear in the animation as a main character. For this purpose, we have developed a tool that helps to automatically implant the facial shape of a customer into existing animation images. Our tool first takes an image of a customer and extracts a face region and some valuable features that depict the shape and facial expression of the customer. Our tool has a module that replaces an existing character's face with that of the customer. This module employs facial expression recognition and warping functions so that the customer's face fits into the confined region with a similar facial expression. Our tool also has a module that shows the sequence of images in the form of animation. This module employs a data compression function, produces AVI format files, and sends them to the graphics board.
Keywords: facial expression recognition; warping functions
Reliable Partner System Always Providing Users with Companionship Through Video Streaming BIBAKFull-Text 1010-1018
  Takumi Yamaguchi; Kazunori Shimamura; Haruya Shiba
This paper presents a basic configuration of a system that provides dynamic delivery of full-motion video while following target users in ubiquitous computing environments. The proposed system is composed of multiple computer displays with radio frequency identification (RFID) tag readers, which are automatically connected to a network via IP, and RFID tags worn by users and some network servers. We adopted a passive tag RFID system. The delivery of full-motion video uses adaptive broadcasting. The system can continuously deliver streaming data, such as full-motion video, to the display, through the database and the streaming server on the network, moving from one display to the next as the user moves through the network. Because it maintains the information about the user's location in real time, it supports the user wherever he or she is, without requiring a conscious request to obtain their information. This paper describes a prototype implementation of this framework and a practical application.
Keywords: Ubiquitous; Partner system; Video streaming; Awareness
Modeling of Places Based on Feature Distribution BIBAFull-Text 1019-1027
  Yi Hu; Chang Woo Lee; Jong Yeol Yang; Bum Joo Shin
In this paper, a place model based on a feature distribution is proposed for place recognition. In many previously proposed methods, places are modeled as images or as sets of extracted features. In those methods, a database of images or feature sets must be built, and the search time grows exponentially as the database becomes large. The proposed feature distribution method uses global information about each place, and the search space grows linearly with the number of places. In the experiments, we evaluate the performance using different numbers of frames and features for each recognition. Additionally, we show that the proposed method is applicable to many real-time applications such as robot navigation and wearable computing systems.
Knowledge Transfer in Semi-automatic Image Interpretation BIBAKFull-Text 1028-1034
  Jun Zhou; Li Cheng; Terry Caelli; Walter F. Bischof
Semi-automatic image interpretation systems utilize interactions between users and computers to adapt and update interpretation algorithms. We have studied the influence of human inputs on image interpretation by examining several knowledge transfer models. Experimental results show that the quality of the system performance depended not only on the knowledge transfer patterns but also on the user input, indicating how important it is to develop user-adapted image interpretation systems.
Keywords: knowledge transfer; image interpretation; road tracking; human influence; performance evaluation