| Human-centered collaborative interaction | | BIBAK | Full-Text | 1-8 | |
| Paulo Barthelmess; Edward Kaiser; Rebecca Lunsford; David McGee; Philip Cohen; Sharon Oviatt | |||
| Recent years have witnessed an increasing shift in interest from single user
multimedia/multimodal interfaces towards support for interaction among groups
of people working closely together, e.g. during meetings or problem-solving
sessions. However, the introduction of technology to support collaborative
practices has not been devoid of problems. It is not uncommon for technology
meant to support collaboration to introduce disruptions and reduce group
effectiveness.
Human-centered multimedia and multimodal approaches hold the promise of providing substantially enhanced user experiences by focusing attention on human perceptual and motor capabilities, and on actual user practices. In this paper we examine the problem of providing effective support for collaboration, focusing on the role of human-centered approaches that take advantage of multimodality and multimedia. We present illustrative examples of human-centered multimodal and multimedia solutions that provide mechanisms for dealing with the intrinsic complexity of supporting human-human interaction. Keywords: design, guidelines, human-centered systems, multimedia, multimodal systems | |||
| Multimedia: is it always better? | | BIBAK | Full-Text | 9-10 | |
| Nahum Gershon | |||
| As with almost every new medium and technology, we become enchanted with it
and very easily take the proclaimed benefits of the new creation for granted.
Slide presentation tools are one example. Since they became available, most
presentations in the technical and professional communities have transitioned
to using them. One of the reasons is that they make the preparation
("production") of presentations, and presenting them, seem easy. Making
the content of the presentation understood and getting its messages across,
however, is another matter. The big question with presentation tools is: when
is it better to use them, and when are other modes of presentation more
appropriate? Now that almost everyone can produce a multimedia presentation, a
similar trend might be developing. This mindless transition, I feel, must be
stopped. First and foremost, we need to understand what the advantages and,
yes, the disadvantages of multimedia are. Since multimedia use and production are
human-centered activities, practical knowledge of how human beings perceive,
process information, and understand is essential to understanding these
advantages and disadvantages of this medium. Once we know the advantages and
disadvantages of multimedia, we should use it only when it offers advantages to
a particular presentation over other media. Sometimes, we might find out that a
simple oral (or even audio) presentation without a single visual might do the
trick. Sometimes not. As Neal Postman pointed out, for example, it could be
more difficult to effectively present a series of logical arguments using video
than with text or oral deliberations. On other occasions, we might find out
that a silent presentation of pictorial slides might deliver the message quite
effectively. Multimedia is not only about presentation. It is also about
production and thinking. As with writing or drawing, composing a multimedia
vignette could help the creative and critical thinking process about a topic.
This too needs an understanding of when multimedia is appropriate and when it's
not. The developer community is not exempt from the need to develop this type
of understanding. Without it, the tools will not be very useful. All of these
communities, the multimedia users, the production crowd, and the multimedia
developer community need to become more multimedia literate through training in
school, college, and work or through a personal transformative quest. This, I
believe, is essential yet possible. Keywords: human-centered multimedia, multimedia advantages, multimedia disadvantages,
multimedia literacy, multimedia presentation, multimedia production, thinking
through multimedia production | |||
| Human-centered multimedia: representations and challenges | | BIBAK | Full-Text | 11-18 | |
| Ahmed Elgammal | |||
| Humans have always been a part of the computational loop. So, what do we mean
by human-centered computing (HCC)? Aren't humans always, somehow, the focus of
computation? The goal of this paper is to help answer this question
within the context of multimedia applications. So, what do we mean by
human-centered multimedia systems? We discuss some issues and challenges in
developing real human-centered multimedia applications. Keywords: human-centered computing, multimedia systems | |||
| What should be automated?: The fundamental question underlying human-centered computing | | BIBAK | Full-Text | 19-24 | |
| Matti Tedre | |||
| In 1989 the ACM task force on the Core of Computer Science argued that "What
can be (effectively) automated?" is "the fundamental question underlying all of
computing". The task force's view of computing was a machine-oriented one; the
task force recognized the theoretical, empirical, and design-oriented aspects
of computer science. The question "What can be effectively automated?" indeed
draws some fundamental limits of automatic computation. However, since the
1980s there has been an ongoing shift away from the machine-centered view of
computing, towards a human-centered view of computing. In this paper I argue
that human-centered computing necessitates a perspective shift in computer
science. I note that the central question of machine-centered computing fails
to recognize the driving issues of human-centered computing. I argue that in
all branches of human-centered computing there is another fundamental question
that should be asked: "What should be automated?" Keywords: ethical questions, fundamental questions, human-centered computing,
normative questions | |||
| Lifetrak: music in tune with your life | | BIBAK | Full-Text | 25-34 | |
| Sasank Reddy; Jeff Mascia | |||
| Advances in sensing technology and the wider availability of network services
are encouraging the use of context-awareness in ubiquitous computing applications.
One area in which these technologies can play a major role is
entertainment. In particular, context-awareness can be used to provide
higher-quality interaction between humans and the media they engage with. We
propose a music player, Lifetrak, that is in tune with a person's life by using
a context-sensitive music engine to drive what music is played. This context
engine is influenced by (i) the location of the user, (ii) the time of
operation, (iii) the velocity of the user, and (iv) urban environment
information such as traffic, weather, and sound modalities. Furthermore, we
adjust the context engine by implementing a learning model that is based on
user feedback on whether a certain song is appropriate for a particular
context. Also, we introduce the idea of a context equalizer that adjusts how
much a certain sensing modality affects what song is chosen. Since the music
player will be implemented on a mobile device, there is a strong focus on
creating a user interface that can be manipulated by users on the go. The goal
of Lifetrak is to liberate a user from having to consciously specify the music
that they want to play. Instead, Lifetrak intends to create a music experience
for the user that is in rhythm with themselves and the space they reside in. Keywords: context, entertainment, mobile, music, sensors | |||
| Human-centered interaction with documents | | BIBAK | Full-Text | 35-44 | |
| Andreas Dengel; Stefan Agne; Bertin Klein; Achim Ebert; Matthias Deller | |||
| In this paper, we discuss a new user interface, a complementary environment
for working with personal document archives, i.e. for document filing and
retrieval. We introduce our implementation of a spatial medium for document
interaction, explorative search and active navigation, which exploits and
further stimulates the human strengths of visual information processing. Our
system achieves a high degree of immersion of the user, so that he/she forgets
the artificiality of his/her environment. This is done through a tripartite
combination: allowing users to interact naturally with gestures and postures
(which can optionally be taught to the system individually by each user),
exploiting 3D technology, and supporting the user in maintaining structures
he/she discovers, as well as providing computer-calculated semantic
structures. Our ongoing evaluation shows that even non-expert users can
efficiently work with the information in a document collection, and have fun. Keywords: 3D displays, 3D user interface, data glove, gesture recognition, immersion | |||
| Creating serendipitous encounters in a geographically distributed community | | BIBAK | Full-Text | 45-54 | |
| Adithya Renduchintala; Aisling Kelliher; Hari Sundaram | |||
| This paper is focused on the development of serendipitous interfaces that
promote casual and chance encounters within a geographically distributed
community. The problem is particularly important for distributed workforces,
where there is little opportunity for chance encounters that are crucial to the
formation of a sense of community. This paper makes three contributions:
(a) the development of a robust communication architecture facilitating
serendipitous casual interaction using online media repositories coupled to two
multimodal interfaces; (b) the development of multimodal interfaces that allow
users to browse, leave audio comments, and asynchronously listen to other
community members; and (c) a multimodal, gesture-driven control (vision and
ultrasonic) of the audio-visual display. Our user studies reveal that the interfaces are well
liked, and promote social interaction. Keywords: image repository, mediated communication, online media repository, remote
interfaces, serendipitous interaction, social computing | |||
| Discovering groups of people in Google news | | BIBAK | Full-Text | 55-64 | |
| Dhiraj Joshi; Daniel Gatica-Perez | |||
| In this paper, we study the problem of content-based social network
discovery among people who frequently appear in world news. Google news is used
as the source of data. We describe a probabilistic framework for associating
people with groups. A low-dimensional topic-based representation is first
obtained for news stories via probabilistic latent semantic analysis (PLSA).
This is followed by construction of semantic groups by clustering such
representations. Unlike many existing social network analysis approaches, which
discover groups based only on binary relations (e.g. co-occurrence of people in
a news article), our model clusters people using their topic distribution,
which introduces contextual information in the group formation process (e.g.
some people belong to several groups depending on the specific subject). The
model has been used to study the evolution of people with respect to topics over
time. We also illustrate the advantages of our approach over a simple
co-occurrence-based social network extraction method. Keywords: probabilistic latent semantic indexing, social network analysis, text
mining, topic evolution | |||
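The two-stage pipeline in this abstract (topic modeling of news stories, then clustering people by their topic distributions) can be sketched as follows. scikit-learn provides no PLSA implementation, so LatentDirichletAllocation is used here as a related stand-in, and the stories and names are toy data invented purely for illustration.

```python
# Sketch of the two-stage idea: (1) learn a low-dimensional topic
# representation of news stories, (2) cluster people by the average topic
# distribution of the stories they appear in. scikit-learn lacks PLSA, so
# LatentDirichletAllocation stands in for it here; data is illustrative.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

stories = [
    "peace talks between the two governments resumed today",
    "the striker scored twice in the championship final",
    "parliament debated the new trade agreement",
    "the goalkeeper signed a record transfer deal",
]
# Which people are mentioned in which story (toy data).
people_in_story = [["Leader A", "Leader B"], ["Player X"],
                   ["Leader A"], ["Player X", "Player Y"]]

# Stage 1: topic representation of stories.
X = CountVectorizer().fit_transform(stories)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
story_topics = lda.fit_transform(X)              # each row sums to ~1

# Stage 2: average topic distribution per person, then cluster people.
people = sorted({p for ps in people_in_story for p in ps})
person_topics = np.array([
    story_topics[[i for i, ps in enumerate(people_in_story) if p in ps]].mean(axis=0)
    for p in people
])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(person_topics)
for person, label in zip(people, labels):
    print(person, "-> group", label)
```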
| Interactive video authoring and sharing based on two-layer templates | | BIBAK | Full-Text | 65-74 | |
| Xian-Sheng Hua; Shipeng Li | |||
| The rapid adoption of digital cameras and camcorders has led to a huge demand
for new tools and systems that enable average users to process, manage, author,
and share digital media content more efficiently and effectively, in
particular a powerful video authoring tool that can dramatically reduce
users' effort in editing and sharing home video. Though many
commercial video authoring tools are available today, video authoring remains a
tedious and extremely time-consuming task that often requires trained
professional skills. To tackle this problem, this paper presents a novel
interactive end-to-end system that enables fast, flexible and personalized
video authoring and sharing. The novel system, called LazyMedia, is based on
both content analysis techniques and the proposed content-aware two-layer
authoring templates: Composition Template and Presentation Template. Moreover,
it is designed as an open and extensible framework that can support dynamic
update of core components such as content analysis algorithms, editing methods,
and the two-layer authoring templates. Furthermore, the two layers of authoring
templates separate the video authoring from video presentation. Once authored
with LazyMedia, the video contents can be easily and flexibly presented in
other forms according to users' preference. LazyMedia provides a semiautomatic
video authoring and sharing system that significantly reduces users' efforts in
video editing while preserving sufficient flexibility and personalization. Keywords: interactive multimedia, multimedia authoring, multimedia management,
template, video editing | |||
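One way to picture the separation between the Composition Template and the Presentation Template is the sketch below: the first layer decides what goes into the authored video, the second decides how that result is shown, so the same edit can later be re-presented in another style. The class names, fields, and selection heuristic are illustrative assumptions, not the LazyMedia design.

```python
# Hypothetical sketch of the two-layer idea: a composition template produces
# an edit decision list from analyzed clips, while a presentation template
# describes how that list is rendered. Names are illustrative only.

from dataclasses import dataclass
from typing import List

@dataclass
class Clip:
    source: str
    start: float    # seconds
    end: float
    quality: float  # e.g. from content analysis, in 0..1

@dataclass
class CompositionTemplate:
    max_duration: float       # total length of the authored video
    min_clip_quality: float   # drop low-quality segments

    def compose(self, clips: List[Clip]) -> List[Clip]:
        chosen, used = [], 0.0
        for clip in sorted(clips, key=lambda c: -c.quality):
            length = clip.end - clip.start
            if clip.quality >= self.min_clip_quality and used + length <= self.max_duration:
                chosen.append(clip)
                used += length
        return chosen

@dataclass
class PresentationTemplate:
    transition: str
    music_track: str

    def describe(self, edit_list: List[Clip]) -> str:
        return (f"{len(edit_list)} clips joined with '{self.transition}' "
                f"transitions over '{self.music_track}'")

if __name__ == "__main__":
    clips = [Clip("a.mp4", 0, 10, 0.9), Clip("b.mp4", 5, 20, 0.4), Clip("c.mp4", 0, 8, 0.7)]
    edit = CompositionTemplate(max_duration=20, min_clip_quality=0.5).compose(clips)
    # The same edit list can be re-presented with a different style later.
    print(PresentationTemplate("crossfade", "ambient.mp3").describe(edit))
```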
| User modeling in a speech translation driven mediated interaction setting | | BIBAK | Full-Text | 75-80 | |
| JongHo Shin; Panayiotis G. Georgiou; Shrikanth Narayanan | |||
| The paper addresses user behavior modeling in a machine-mediated setting
involving bidirectional speech translation. Specifically, usability data from
doctor-patient dialogs involving a two-way English-Persian speech translation
system are analyzed to understand the nature and extent of user accommodation
to machine errors. We consider user type (categorized into the classes
Accommodating, Normal, and Picky) as it relates to the user's tendency to accept
poor speech recognition and translation or to retry the utterance. For
modeling, we employ a dynamic Bayesian network that can identify the user type
with high accuracy after a few interactions of consistent user behavioral
patterns. This model can be utilized for the design of machine strategies that
can aid a user in operating the device more efficiently. Keywords: dynamic Bayesian network, inference, reasoning, speech-to-speech,
translation, user interaction, user modeling, user type, user-centered | |||
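A much simplified illustration of inferring a latent user type from accept/retry behavior follows. The paper employs a dynamic Bayesian network over dialog turns; the sketch below collapses this to a recursive Bayes update over a static type with made-up per-type retry probabilities, purely to show the flavor of the inference.

```python
# Simplified sketch of inferring a latent user type from repeated
# accept/retry decisions. A full dynamic Bayesian network as in the paper
# would also model dialog state; here the type is static and observations
# are conditionally independent, so this reduces to a recursive Bayes update.
# The per-type retry probabilities are invented for illustration.

# P(retry | user type): Picky users retry often, Accommodating users rarely.
RETRY_PROB = {"Accommodating": 0.1, "Normal": 0.4, "Picky": 0.8}

def update(belief: dict, observed_retry: bool) -> dict:
    """One Bayes update of P(type) given whether the user retried the turn."""
    posterior = {}
    for user_type, prior in belief.items():
        likelihood = RETRY_PROB[user_type] if observed_retry else 1 - RETRY_PROB[user_type]
        posterior[user_type] = prior * likelihood
    norm = sum(posterior.values())
    return {t: p / norm for t, p in posterior.items()}

if __name__ == "__main__":
    belief = {t: 1 / 3 for t in RETRY_PROB}        # uniform prior over types
    for retry in [True, True, False, True, True]:  # observed user behavior
        belief = update(belief, retry)
    print(max(belief, key=belief.get), belief)
```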
| Tillarom: an AJAX based folk song search and retrieval system with gesture interface based on Kodály hand | | BIBAK | Full-Text | 81-88 | |
| Attila Licsár; Tamás Szirányi; László Kovács; Balázs Pataki | |||
| A digital folk song search and retrieval system with a hand gesture based
interface is presented. Tillarom is a comprehensive collection of original
Hungarian folk songs recorded using different technologies such as phonographs
and/or stereo DAT cassettes. This digital archive contains professional quality
metadata records as well as MIDI recordings for presenting the different types
of clustered folk songs. An AJAX based search and retrieval interface was
developed that can be used together with optically recognized Kodály's
hand signs to formulate queries through a web browser. The appearance based
recognition of hand gestures utilizes contour analysis and SVM based
classification. We evaluated the performance of the recognition of hand signs
and investigated the main problems of their usage in our system. Keywords: computer vision, digital archive, vision based hand gesture recognition, web
based information search and retrieval | |||
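The contour-analysis-plus-SVM classification step could look roughly like the sketch below, assuming OpenCV 4 and scikit-learn. The Hu-moment features, the solfège labels, and the synthetic training data are stand-ins, not the features or data actually used in the paper.

```python
# Sketch of the appearance-based classification step: contour descriptors of
# a segmented hand (here, Hu moments via OpenCV) are fed to an SVM.
# The training data below is synthetic; the paper's real features and
# training set are not reproduced here.

import numpy as np
import cv2
from sklearn.svm import SVC

def contour_features(binary_mask: np.ndarray) -> np.ndarray:
    """Largest-contour Hu moments as a simple shape descriptor."""
    contours, _ = cv2.findContours(binary_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    largest = max(contours, key=cv2.contourArea)
    hu = cv2.HuMoments(cv2.moments(largest)).flatten()
    # Log-scale the moments so their magnitudes are comparable.
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)

# Synthetic stand-in training data: 7-D feature vectors and solfège labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 7))
y_train = rng.choice(["do", "re", "mi"], size=60)

clf = SVC(kernel="rbf").fit(X_train, y_train)

# At runtime, a segmented hand mask would be converted to features and classified:
mask = np.zeros((100, 100), dtype=np.uint8)
cv2.circle(mask, (50, 50), 30, 255, -1)          # toy "hand" blob
print(clf.predict([contour_features(mask)])[0])
```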
| Community annotation and remix: a research platform and pilot deployment | | BIBAK | Full-Text | 89-98 | |
| Ryan Shaw; Patrick Schmitz | |||
| We present a platform for community-supported media annotation and remix,
including a pilot deployment with a major film festival. The platform was well
received by users as fun and easy to use. An analysis of the resulting data
yielded insights into user behavior. Completed remixes exhibited a range of
genres, with over a third showing thematic unity and a quarter showing some
attempt at narrative. Remixes were often complex, using many short segments
taken from various source media. Reuse of spoken and written language in source
media, and the use of written language in user-defined overlay text segments
proved to be essential for most users. We describe how community remix
statistics can be leveraged for media summarization, browsing, and editing
support. Further, the platform as a whole provides a solid base for a range of
ongoing research into community annotation and remix including analysis of
remix syntax, identification of reusable segments, media and segment tagging,
structured annotation of media, collaborative media production, and hybrid
content-based and community-in-the-loop approaches to understanding media
semantics. Keywords: HCM, UGC, community media, human-centered multimedia, remix, tagging, video
annotation | |||
| Toward multimodal fusion of affective cues | | BIBAK | Full-Text | 99-108 | |
| Marco Paleari; Christine L. Lisetti | |||
| During face to face communication, it has been suggested that as much as 70%
of what people communicate when talking directly with others is through
paralanguage involving multiple modalities combined together (e.g. voice tone
and volume, body language). In an attempt to render human-computer interaction
more similar to human-human communication and enhance its naturalness, research
on sensory acquisition and interpretation of single modalities of human
expression has seen ongoing progress over the last decade. This progress
is making research on artificial sensor fusion of multiple
modalities an increasingly important domain, in order to reach better
accuracy of congruent messages on the one hand, and possibly to be able to
detect incongruent messages across multiple modalities (incongruency being
itself a message about the nature of the information being conveyed). Accurate
interpretation of emotional signals -- quintessentially multimodal -- would
hence particularly benefit from multimodal sensor fusion and interpretation
algorithms. In this paper we review the state of the art in multimodal fusion and
describe one way to implement a generic framework for multimodal emotion
recognition. The system is developed within the MAUI framework [31] and
Scherer's Component Process Theory (CPT) [49, 50, 51, 24, 52], with the goal of
being modular and adaptive. We want the designed framework to be able to accept
different single- and multi-modality recognition systems and to automatically
adapt the fusion algorithm to find optimal solutions. The system also aims to
be adaptive to channel (and system) reliability. Keywords: HCI, affective computing, emotion recognition, multimodal fusion | |||
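A minimal sketch of one decision-level fusion scheme consistent with the stated goal of adapting to channel reliability is given below: each modality outputs a probability distribution over emotion labels, and the distributions are combined with weights proportional to the current reliability of each channel. The labels, outputs, and reliability scores are invented for illustration and do not describe the MAUI/CPT framework itself.

```python
# Sketch of reliability-weighted decision-level fusion: per-modality emotion
# distributions are averaged with weights given by each channel's current
# reliability estimate. Numbers and labels are illustrative only.

import numpy as np

EMOTIONS = ["anger", "joy", "sadness", "neutral"]

def fuse(modality_outputs: dict, reliability: dict) -> dict:
    """Reliability-weighted average of per-modality emotion distributions."""
    weights = np.array([reliability[m] for m in modality_outputs])
    weights = weights / weights.sum()
    stacked = np.array([modality_outputs[m] for m in modality_outputs])
    fused = weights @ stacked
    return dict(zip(EMOTIONS, fused / fused.sum()))

if __name__ == "__main__":
    outputs = {
        "face":    np.array([0.10, 0.70, 0.10, 0.10]),
        "voice":   np.array([0.40, 0.30, 0.20, 0.10]),
        "posture": np.array([0.25, 0.25, 0.25, 0.25]),
    }
    # e.g. the face channel is currently well lit and trusted, posture is noisy
    reliability = {"face": 0.9, "voice": 0.6, "posture": 0.2}
    fused = fuse(outputs, reliability)
    print(max(fused, key=fused.get), fused)
```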
| Using model trees for evaluating dialog error conditions based on acoustic information | | BIBAK | Full-Text | 109-114 | |
| Abe Kazemzadeh; Sungbok Lee; Shrikanth Narayanan | |||
| This paper examines the use of model trees for evaluating user utterances
for response to system error in dialogs from the Communicator 2000 corpus. The
features used by the model trees are limited to those which can be
automatically obtained through acoustic measurements. These features are
derived from pitch and energy measurements. The curve of the model tree output
versus dialog turn is interpreted to be a measure of the level of user
activation in the dialog. We test the premise that user response to error at
the utterance level is related to user satisfaction at the dialog level.
Several different evaluation tasks are investigated: on an utterance level we
applied the model tree output to detecting response to error and on the dialog
level we analyzed the relation of model tree output to estimating user
satisfaction. For the former, we achieve 65% precision and 63% recall; for
the latter, our predictions show a significant correlation of 0.48 with user surveys. Keywords: evaluation of human-computer dialog systems, paralinguistic feedback, user
response to error | |||
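The feature-to-score idea in this abstract can be sketched as follows: per-utterance pitch and energy statistics feed a tree learner whose output, tracked across dialog turns, serves as an activation curve. scikit-learn offers no M5-style model tree, so a plain regression tree stands in, and the features and target values below are synthetic rather than taken from the Communicator corpus.

```python
# Sketch: acoustic features per utterance -> tree-based score -> a curve of
# scores over dialog turns interpreted as user activation. A regression tree
# approximates the paper's model tree; all data here is synthetic.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)

# Per-utterance acoustic features: [mean pitch, pitch range, mean energy, energy range]
X = rng.normal(size=(200, 4))
# Synthetic "response to error" score: wider pitch/energy range -> higher activation.
y = 0.6 * X[:, 1] + 0.4 * X[:, 3] + rng.normal(scale=0.1, size=200)

tree = DecisionTreeRegressor(max_depth=4).fit(X, y)

# Score each turn of a (toy) dialog; the resulting curve over turns would be
# read as the level of user activation during the dialog.
dialog_turns = rng.normal(size=(6, 4))
activation_curve = tree.predict(dialog_turns)
print(np.round(activation_curve, 2))
```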
| Driver monitoring for a human-centered driver assistance system | | BIBA | Full-Text | 115-122 | |
| Joel McCall; Mohan M. Trivedi | |||
| Driving is a very complex task which, at its core, involves the interaction between the driver and his/her environment. It is therefore extremely important to develop driver assistance systems that are centered around the driver from the ground up. In this paper, we explore one aspect of such a system. Specifically, we focus on monitoring the driver's face and facial regions. We demonstrate a real-world system for tracking the face and facial regions and provide insight into its importance and placement in human-centered driver assistance systems. Results demonstrating its impact on driver assistance systems, as well as its performance in real-world driving scenarios, are shown. | |||
| A methodological study of situation understanding utilizing environments for multimodal observation of infant behavior | | BIBAK | Full-Text | 123-130 | |
| Shogo Ishikawa; Shinya Kiriyama; Hiroaki Horiuchi; Shigeyoshi Kitazawa; Yoichi Takebayashi | |||
| We have developed a framework to understand situations and intentions of
speakers focusing on the utterances of demonstratives. We aim at constructing a
'Multimodal Infant Behavior Corpus', which makes a valuable contribution to the
elucidation of human commonsense knowledge and its acquisition mechanism. For
this purpose, we have constructed environments for multimodal observation of
infant behavior, in particular, environments for infant behavior recording; we
have set up multiple cameras and microphones in the Cedar yurt. We have also
developed a wearable speech recording device of high quality to capture infant
utterances clearly. Moreover, we have developed a comment-collecting system
which allows everyone to make comments easily from multiple viewpoints. These
constructions and developments make it possible to realize a framework for
multimodal observation of infant behavior. Utilizing the multimodal
environments, we propose a situation description model based on observation of
demonstratives uttered by infants, since demonstratives appear frequently in
their conversations and provide a valuable clue for understanding situations. The
proposed model, which represents the mental distances of speakers and listeners
to objects in a general and simple way, enables us to predict speakers' next
behavior. Our results enable us to conclude that the constructed
environments lead to the development and realization of human interaction models
applicable to spoken dialog systems supporting elderly people. Keywords: human interaction modeling, multimodal observation, situation understanding
model | |||