
ACM Transactions on Interactive Intelligent Systems 2

Editors: Anthony Jameson; John Riedl
Dates: 2012
Volume: 2
Publisher: ACM
Standard No: ISSN 2160-6455, EISSN 2160-6463
Papers: 24
Links: Journal Home Page | ACM Digital Library | Table of Contents
  1. TIIS 2012-03 Volume 2 Issue 1
  2. TIIS 2012-06 Volume 2 Issue 2
  3. TIIS 2012-09 Volume 2 Issue 3
  4. TIIS 2012-12 Volume 2 Issue 4

TIIS 2012-03 Volume 2 Issue 1

Introduction to the special issue on affective interaction in natural environments BIBAFull-Text 1
  Ginevra Castellano; Laurel D. Riek; Christopher Peters; Kostas Karpouzis; Jean-Claude Martin; Louis-Philippe Morency
Affect-sensitive systems such as social robots and virtual agents are increasingly being investigated in real-world settings. In order to work effectively in natural environments, these systems require the ability to infer the affective and mental states of humans and to provide appropriate, timely output that helps to sustain long-term interactions. This special issue, which appears in two parts, includes articles on the design of socio-emotional behaviors and expressions in robots and virtual agents and on computational approaches for the automatic recognition of social signals and affective states.
Emotional body language displayed by artificial agents BIBAFull-Text 2
  Aryel Beck; Brett Stevens; Kim A. Bard; Lola Cañamero
Complex and natural social interaction between artificial agents (computer-generated or robotic) and humans necessitates the display of rich emotions in order to be believable, socially relevant, and accepted, and to generate the natural emotional responses that humans show in the context of social interaction, such as engagement or empathy. Whereas some robots use faces to display (simplified) emotional expressions, for other robots such as Nao, body language is the best medium available given their inability to convey facial expressions. Displaying emotional body language that can be interpreted whilst interacting with the robot should significantly improve naturalness. This research investigates the creation of an affect space for the generation of emotional body language to be displayed by humanoid robots. To do so, three experiments investigating how emotional body language displayed by agents is interpreted were conducted. The first experiment compared the interpretation of emotional body language displayed by humans and agents. The results showed that emotional body language displayed by an agent or a human is interpreted in a similar way in terms of recognition. Following these results, emotional key poses were extracted from an actor's performances and implemented in a Nao robot. The interpretation of these key poses was validated in a second study where it was found that participants were better than chance at interpreting the key poses displayed. Finally, an affect space was generated by blending key poses and validated in a third study. Overall, these experiments confirmed that body language is an appropriate medium for robots to display emotions and suggest that an affect space for body expressions can be used to improve the expressiveness of humanoid robots.
Eliciting caregiving behavior in dyadic human-robot attachment-like interactions BIBAFull-Text 3
  Antoine Hiolle; Lola Cañamero; Marina Davila-Ross; Kim A. Bard
We present here the design and applications of an arousal-based model controlling the behavior of a Sony AIBO robot during the exploration of a novel environment: a children's play mat. When the robot experiences too many new perceptions, the increase of arousal triggers calls for attention towards its human caregiver. The caregiver can choose either to calm the robot down by providing it with comfort, or to leave the robot to cope with the situation on its own. When the arousal of the robot has decreased, the robot moves on to further explore the play mat. We gathered results from two experiments using this arousal-driven control architecture. In the first setting, we show that such a robotic architecture allows the human caregiver to greatly influence the learning outcomes of the exploration episode, with some similarities to a primary caregiver during early childhood. In a second experiment, we tested how human adults behaved in a similar setup with two different robots: one "needy", often demanding attention, and one more independent, requesting far less care or assistance. Our results show that human adults recognise each robot profile for what it was designed to be and behave as would be expected, caring more for the needy robot than for the other. Additionally, the subjects exhibited a preference for the robot designed as needy, showing more positive affect while interacting with it and rating it more positively. This experiment leads us to the conclusion that our architecture and setup succeeded in eliciting positive and caregiving behavior from adults of different age groups and technological backgrounds. Finally, the consistency and reactivity of the robot during this dyadic interaction appeared crucial for the enjoyment and engagement of the human partner.
Spotting laughter in natural multiparty conversations: A comparison of automatic online and offline approaches using audiovisual data BIBAFull-Text 4
  Stefan Scherer; Michael Glodek; Friedhelm Schwenker; Nick Campbell; Günther Palm
It is essential for the advancement of human-centered multimodal interfaces to be able to infer the current user's state or communication state. In order to enable a system to do that, the recognition and interpretation of multimodal social signals (i.e., paralinguistic and nonverbal behavior) in real-time applications is required. Since we believe that laughs are one of the most important and widely understood social nonverbal signals indicating affect and discourse quality, we focus in this work on the detection of laughter in natural multiparty discourses. The conversations are recorded in a natural environment without any specific constraint on the discourses, using unobtrusive recording devices. This setup ensures natural and unbiased behavior, which is one of the main foci of this work. Three methods, namely Gaussian Mixture Model (GMM) supervectors as input to a Support Vector Machine (SVM), so-called Echo State Networks (ESNs), and a Hidden Markov Model (HMM) approach, are compared in online and offline detection experiments. The SVM approach proves very accurate in the offline classification task, but is outperformed by the ESN and HMM approaches in the online detection (F1 scores: GMM SVM 0.45, ESN 0.63, HMM 0.72). Further, we were able to utilize the proposed HMM approach in a cross-corpus experiment without any retraining, with respectable generalization capability (F1 score: 0.49). The results and possible reasons for these outcomes are shown and discussed in the article. The proposed methods may be directly utilized in practical tasks such as the labeling or the online detection of laughter in conversational data and affect-aware applications.
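   As a concrete illustration of the offline GMM-supervector approach mentioned above, the following Python sketch MAP-adapts a universal background model to each audio segment and feeds the stacked means to an SVM. It is a minimal sketch under stated assumptions: scikit-learn as the toolkit, and synthetic stand-ins for the corpus's acoustic features (the real pipeline would use MFCC frames per segment).

      import numpy as np
      from sklearn.mixture import GaussianMixture
      from sklearn.svm import SVC

      def supervector(ubm, frames, relevance=16.0):
          # MAP-adapt the UBM means to one segment and stack them.
          post = ubm.predict_proba(frames)            # (n_frames, n_components)
          n_k = post.sum(axis=0)                      # soft counts per component
          f_k = post.T @ frames                       # first-order statistics
          alpha = (n_k / (n_k + relevance))[:, None]  # adaptation weights
          means = alpha * (f_k / np.maximum(n_k[:, None], 1e-8)) \
                  + (1 - alpha) * ubm.means_
          return means.ravel()

      rng = np.random.default_rng(0)
      background = rng.normal(size=(500, 13))         # stand-in feature frames
      segments = [rng.normal(size=(80, 13)) for _ in range(6)]
      labels = [0, 1, 0, 1, 0, 1]                     # laughter vs. speech

      ubm = GaussianMixture(n_components=8, covariance_type='diag').fit(background)
      X = np.array([supervector(ubm, seg) for seg in segments])
      clf = SVC(kernel='linear').fit(X, labels)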
Continuous body and hand gesture recognition for natural human-computer interaction BIBAFull-Text 5
  Yale Song; David Demirdjian; Randall Davis
Intelligent gesture recognition systems open a new era of natural human-computer interaction: Gesturing is instinctive and a skill we all have, so it requires little or no thought, leaving the focus on the task itself, as it should be, not on the interaction modality. We present a new approach to gesture recognition that attends to both body and hands, and interprets gestures continuously from an unsegmented and unbounded input stream. This article describes the whole procedure of continuous body and hand gesture recognition, from the signal acquisition to processing, to the interpretation of the processed signals.
   Our system takes a vision-based approach, tracking body and hands using a single stereo camera. Body postures are reconstructed in 3D space using a generative model-based approach with a particle filter, combining both static and dynamic attributes of motion as the input feature to make tracking robust to self-occlusion. The reconstructed body postures guide searching for hands. Hand shapes are classified into one of several canonical hand shapes using an appearance-based approach with a multiclass support vector machine. Finally, the extracted body and hand features are combined and used as the input feature for gesture recognition. We consider our task as an online sequence labeling and segmentation problem. A latent-dynamic conditional random field is used with a temporal sliding window to perform the task continuously. We augment this with a novel technique called multilayered filtering, which performs filtering both on the input layer and the prediction layer. Filtering on the input layer allows capturing long-range temporal dependencies and reducing input signal noise; filtering on the prediction layer allows taking weighted votes of multiple overlapping prediction results as well as reducing estimation noise.
   We tested our system in a scenario of real-world gestural interaction using the NATOPS dataset, an official vocabulary of aircraft handling gestures. Our experimental results show that: (1) the use of both static and dynamic attributes of motion in body tracking allows statistically significant improvement of the recognition performance over using static attributes of motion alone; and (2) the multilayered filtering statistically significantly improves recognition performance over the nonfiltering method. We also show that, on a set of twenty-four NATOPS gestures, our system achieves a recognition accuracy of 75.37%.
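   To make the multilayered filtering idea concrete, here is a schematic Python sketch of the prediction-layer step: overlapping sliding windows each produce per-frame label scores, and every frame takes a weighted vote over all windows covering it. The window classifier (an LDCRF in the article) is abstracted behind a hypothetical predict_window function, and the moving-average kernel for the input layer is likewise an assumption.

      import numpy as np

      def smooth_input(X, width=5):
          # Input-layer filtering: moving average over each feature column.
          kernel = np.ones(width) / width
          return np.apply_along_axis(
              lambda c: np.convolve(c, kernel, 'same'), 0, X)

      def continuous_labels(X, predict_window, win=30, stride=10, n_classes=24):
          # Prediction-layer filtering: sum the scores of overlapping windows.
          votes = np.zeros((len(X), n_classes))
          for start in range(0, len(X) - win + 1, stride):
              votes[start:start + win] += predict_window(X[start:start + win])
          return votes.argmax(axis=1)                 # per-frame gesture labels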
A multitask approach to continuous five-dimensional affect sensing in natural speech BIBAFull-Text 6
  Florian Eyben; Martin Wöllmer; Björn Schuller
Automatic affect recognition is important for the ability of future technical systems to interact with us socially in an intelligent way by understanding our current affective state. In recent years there has been a shift in the field of affect recognition from "in the lab" experiments with acted data to "in the wild" experiments with spontaneous and naturalistic data. Two major issues thereby are the proper segmentation of the input and adequate description and modeling of affective states. The first issue is crucial for responsive, real-time systems such as virtual agents and robots, where the latency of the analysis must be as small as possible. To address this issue we introduce a novel method of incremental segmentation to be used in combination with supra-segmental modeling. For modeling of continuous affective states we use Long Short-Term Memory Recurrent Neural Networks, with which we can show an improvement in performance over standard recurrent neural networks and feed-forward neural networks as well as Support Vector Regression. For experiments we use the SEMAINE database, which contains recordings of spontaneous and natural human-to-Wizard-of-Oz conversations. The recordings are annotated continuously in time and magnitude with FeelTrace for five affective dimensions, namely activation, expectation, intensity, power/dominance, and valence. To exploit dependencies between the five affective dimensions we investigate multitask learning of all five dimensions augmented with inter-rater standard deviation. We can show improvements for multitask over single-task modeling. Correlation coefficients of up to 0.81 are obtained for the activation dimension and up to 0.58 for the valence dimension. The performance for the remaining dimensions was found to lie between those for activation and valence.
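   A minimal sketch of the multitask idea, under the assumption of a Keras-style toolkit: one shared recurrent network regresses all five FeelTrace dimensions at every frame, so a single loss couples the tasks. The layer sizes, the plain LSTM (rather than the authors' exact architecture), and the feature dimensionality are illustrative choices, not details from the article.

      import tensorflow as tf

      model = tf.keras.Sequential([
          tf.keras.layers.Input(shape=(None, 40)),        # acoustic feature frames
          tf.keras.layers.LSTM(64, return_sequences=True),
          tf.keras.layers.Dense(5),                       # activation, expectation,
      ])                                                  # intensity, power, valence
      model.compile(optimizer='adam', loss='mse')
      # model.fit(features, targets)  # targets shaped (batch, time, 5)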
Affect recognition based on physiological changes during the watching of music videos BIBAFull-Text 7
  Ashkan Yazdani; Jong-Seok Lee; Jean-Marc Vesin; Touradj Ebrahimi
Assessing emotional states of users evoked during their multimedia consumption has received a great deal of attention with recent advances in multimedia content distribution technologies and increasing interest in personalized content delivery. Physiological signals such as the electroencephalogram (EEG) and peripheral physiological signals have been less considered for emotion recognition in comparison to other modalities such as facial expression and speech, although they are of potential interest as alternative or supplementary channels. This article presents our work on: (1) constructing a dataset containing EEG and peripheral physiological signals acquired during presentation of music video clips, which is made publicly available, and (2) conducting binary classification of induced positive/negative valence, high/low arousal, and like/dislike by using the aforementioned signals. The procedure for the dataset acquisition, including stimuli selection, signal acquisition, self-assessment, and signal processing, is described in detail. In particular, we propose a novel asymmetry index based on relative wavelet entropy for measuring the asymmetry in the energy distribution of EEG signals, which is used for EEG feature extraction. Then, the classification systems based on EEG and peripheral physiological signals are presented. Single-trial and single-run classification results indicate that, on average, EEG-based classification outperforms classification based on the peripheral physiological signals. However, the peripheral physiological signals can be considered as a good alternative to EEG signals in the case of assessing a user's preference for a given music video clip (like/dislike), since they have a comparable performance to EEG signals while being more easily measured.
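   The asymmetry feature can be sketched in Python along the following lines, under the assumption of the PyWavelets package: compute the relative wavelet-band energy distribution for a left and a right electrode and compare the two distributions with a Kullback-Leibler style relative entropy. The exact normalization and band selection used in the article may differ.

      import numpy as np
      import pywt

      def band_energy_dist(signal, wavelet='db4', level=5):
          # Relative energy of each wavelet decomposition band.
          coeffs = pywt.wavedec(signal, wavelet, level=level)
          energies = np.array([np.sum(c ** 2) for c in coeffs])
          return energies / energies.sum()

      def relative_wavelet_entropy(left, right):
          # Asymmetry of the band-energy distributions of a left/right pair.
          p, q = band_energy_dist(left), band_energy_dist(right)
          return float(np.sum(p * np.log(p / q)))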

TIIS 2012-06 Volume 2 Issue 2

A Computational Framework for Media Bias Mitigation BIBAFull-Text 8
  Souneil Park; Seungwoo Kang; Sangyoung Chung; Junehwa Song
Bias in the news media is an inherent flaw of the news production process. The bias often causes a sharp increase in political polarization and in the cost of conflict on social issues such as the Iraq war. This article presents NewsCube, a novel Internet news service which aims to mitigate the effect of media bias. NewsCube automatically creates and promptly provides readers with multiple classified views on a news event. As such, it helps readers understand the event from a plurality of views and formulate their own, more balanced, viewpoints. The media bias problem has been studied extensively in mass communications and social science. This article reviews related mass communication and journalism studies and provides a structured view of the media bias problem and its solution. We propose media bias mitigation as a practical solution and demonstrate it through NewsCube. We evaluate and discuss the effectiveness of NewsCube through various performance studies.
Influencing Individually: Fusing Personalization and Persuasion BIBAFull-Text 9
  Shlomo Berkovsky; Jill Freyne; Harri Oinas-Kukkonen
Personalized technologies aim to enhance user experience by taking into account users' interests, preferences, and other relevant information. Persuasive technologies aim to modify user attitudes, intentions, or behavior through computer-human dialogue and social influence. While both personalized and persuasive technologies influence user interaction and behavior, we posit that this influence could be significantly increased if the two technologies were combined to create personalized and persuasive systems. For example, the persuasive power of a one-size-fits-all persuasive intervention could be enhanced by considering the users being influenced and their susceptibility to the persuasion being offered. Likewise, personalized technologies could cash in on increased success, in terms of user satisfaction, revenue, and user experience, if their services used persuasive techniques. Hence, the coupling of personalization and persuasion has the potential to enhance the impact of both technologies. This new, developing area clearly offers mutual benefits to both research areas, as we illustrate in this special issue.
Adaptive Persuasive Systems: A Study of Tailored Persuasive Text Messages to Reduce Snacking BIBAFull-Text 10
  Maurits Kaptein; Boris De Ruyter; Panos Markopoulos; Emile Aarts
This article describes the use of personalized short text messages (SMS) to reduce snacking. First, we describe the development and validation (N=215) of a questionnaire to measure individual susceptibility to different social influence strategies. To evaluate the external validity of this Susceptibility to Persuasion Scale (STPS) we set up a two-week text-messaging intervention that used text messages implementing social influence strategies as prompts to reduce snacking behavior. In this experiment (N=73) we show that messages that are personalized (tailored) to the individual based on their scores on the STPS lead to a greater decrease in snack consumption than randomized messages or messages that are not tailored (contra-tailored) to the individual. We discuss the importance of this finding for the design of persuasive systems and detail how designers can use tailoring at the level of social influence strategies to increase the effects of their persuasive technologies.
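   Tailoring at the level of influence strategies reduces, at its simplest, to choosing the message whose strategy the user scored highest on. The Python toy below illustrates this; the strategy names, messages, and scores are invented examples, not items from the STPS.

      MESSAGES = {
          "authority": "Dietitians recommend replacing one snack a day with fruit.",
          "consensus": "Most participants in this program snack less by week two.",
          "scarcity":  "Only today: log your snacks to unlock a bonus tip.",
      }

      def tailored_message(stps_scores):
          # stps_scores: strategy -> susceptibility score for this user.
          best = max(stps_scores, key=stps_scores.get)
          return MESSAGES[best]

      print(tailored_message({"authority": 4.2, "consensus": 5.1, "scarcity": 2.3}))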
Investigating the Persuasion Potential of Recommender Systems from a Quality Perspective: An Empirical Study BIBAFull-Text 11
  Paolo Cremonesi; Franca Garzotto; Roberto Turrin
Recommender Systems (RSs) help users search large amounts of digital contents and services by allowing them to identify the items that are likely to be more attractive or useful. RSs play an important persuasion role, as they can potentially augment the users' trust in an application and orient their decisions or actions in specific directions. This article explores the persuasiveness of RSs, presenting two vast empirical studies that address a number of research questions.
   First, we investigate whether a design property of RSs, defined by the statistically measured quality of algorithms, is a reliable predictor of their potential for persuasion. This factor is measured in terms of perceived quality, defined by the overall satisfaction, as well as by how users judge the accuracy and novelty of recommendations. For our purposes, we designed an empirical study involving 210 subjects and implemented seven full-sized versions of a commercial RS, each one using the same interface and dataset (a subset of Netflix), but each with a different recommender algorithm. In each experimental configuration we computed the statistical quality (recall and F-measures) and collected data regarding the quality perceived by 30 users. The results show that algorithmic attributes are less crucial than we might expect in determining the user's perception of an RS's quality, and suggest that the user's judgment and attitude towards a recommender are likely to be more affected by factors related to the user experience.
   Second, we explore the persuasiveness of RSs in the context of large interactive TV services. We report a study aimed at assessing whether measurable persuasion effects (e.g., changes of shopping behavior) can be achieved through the introduction of a recommender. Our data, collected for more than one year, allow us to conclude that (1) the adoption of an RS can affect both the lift factor and the conversion rate, determining an increased volume of sales and influencing the user's decision to actually buy one of the recommended products; (2) the introduction of an RS tends to diversify purchases and orient users towards less obvious choices (the long tail); and (3) the perceived novelty of recommendations is likely to be more influential than their perceived accuracy.
   Overall, the results of these studies improve our understanding of the persuasion phenomena induced by RSs, and have implications that can be of interest to academic scholars, designers, and adopters of this class of systems.
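   For reference, the statistical quality mentioned in the first study can be computed along these lines; the Python sketch below shows recall@N over held-out positive ratings, a common evaluation protocol, though the article's exact setup (the value of N, the test-item sampling) is not reproduced here.

      def recall_at_n(recommend, test_pairs, n=10):
          # recommend(user) -> ranked item list; test_pairs: (user, held-out item).
          hits = sum(1 for user, item in test_pairs if item in recommend(user)[:n])
          return hits / len(test_pairs)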
System Personality and Persuasion in Human-Computer Dialogue BIBAFull-Text 12
  Pierre Y. Andrews
The human-computer dialogue research field has been studying interaction with computers since the early days of Artificial Intelligence; however, research has often focused on very practical tasks to be completed with the dialogues. A new trend in the field tries to implement persuasive techniques with automated interactive agents; unlike booking a train ticket, for example, such dialogues require the system to show more anthropomorphic qualities. The influence of such qualities on the effectiveness of persuasive dialogue is only starting to be studied. In this article we focus on one important perceived trait of the system, personality, and explore how it influences the persuasiveness of a dialogue system. We introduce a new persuasive dialogue system and combine it with a state-of-the-art personality utterance generator. By doing so, we can control the system's extraversion personality trait and observe its influence on the user's perception of the dialogue and its output. In particular, we observe that the user's extraversion influences their perception of the dialogue and its persuasiveness, and that the perceived personality of the system can affect its trustworthiness and persuasiveness. We believe that these observations will help to set up guidelines to tailor dialogue systems to the user's interaction expectations and improve persuasive interventions.

TIIS 2012-09 Volume 2 Issue 3

The Tag Genome: Encoding Community Knowledge to Support Novel Interaction BIBAFull-Text 13
  Jesse Vig; Shilad Sen; John Riedl
This article introduces the tag genome, a data structure that extends the traditional tagging model to provide enhanced forms of user interaction. Just as a biological genome encodes an organism based on a sequence of genes, the tag genome encodes an item in an information space based on its relationship to a common set of tags. We present a machine learning approach for computing the tag genome, and we evaluate several learning models on a ground truth dataset provided by users. We describe an application of the tag genome called Movie Tuner which enables users to navigate from one item to nearby items along dimensions represented by tags. We present the results of a 7-week field trial of 2,531 users of Movie Tuner and a survey evaluating users' subjective experience. Finally, we outline the broader space of applications of the tag genome.
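   The data structure itself is simple to sketch: each item is a vector of tag-relevance scores, and a Movie Tuner style critique ("more of this tag") shifts the query vector along one tag dimension and re-ranks items by distance. The Python below uses invented tags and scores for illustration, not values from the actual tag genome.

      import numpy as np

      tags = ["dark", "funny", "twist ending"]
      genome = {                        # item -> relevance to each tag, in [0, 1]
          "Fargo":            np.array([0.7, 0.6, 0.3]),
          "Memento":          np.array([0.8, 0.2, 0.9]),
          "The Big Lebowski": np.array([0.3, 0.9, 0.1]),
      }

      def more_of(item, tag, step=0.25):
          # Shift the item's genome up along one tag; rank neighbors by distance.
          query = genome[item].copy()
          query[tags.index(tag)] = min(1.0, query[tags.index(tag)] + step)
          return sorted((i for i in genome if i != item),
                        key=lambda i: np.linalg.norm(genome[i] - query))

      print(more_of("Fargo", "twist ending"))   # ranks Memento first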
Introduction to the Special Issue on Common Sense for Interactive Systems BIBAFull-Text 14
  Henry Lieberman; Catherine Havasi
This editorial introduction describes the aims and scope of the special issue on Common Sense for Interactive Systems of the ACM Transactions on Interactive Intelligent Systems. It explains why the common sense knowledge problem is crucial for both artificial intelligence and human-computer interaction, and it shows how the four articles selected for this issue fit into the theme.
Capturing Common Knowledge about Tasks: Intelligent Assistance for To-Do Lists BIBAFull-Text 15
  Yolanda Gil; Varun Ratnakar; Timothy Chklovski; Paul Groth; Denny Vrandecic
Although to-do lists are a ubiquitous form of personal task management, there has been no work on intelligent assistance to automate, elaborate, or coordinate a user's to-dos. Our research focuses on three aspects of intelligent assistance for to-dos. We investigated the use of intelligent agents to automate to-dos in an office setting. We collected a large corpus from users and developed a paraphrase-based approach to matching agent capabilities with to-dos. We also investigated to-dos for personal tasks and the kinds of assistance that can be offered to users by elaborating on them on the basis of substep knowledge extracted from the Web. Finally, we explored coordination of user tasks with other users through a to-do management application deployed in a popular social networking site. We discuss the emergence of Social Task Networks, which link users' tasks to their social network as well as to relevant resources on the Web. We show the benefits of using common sense knowledge to interpret and elaborate to-dos. Conversely, we also show that to-do lists are a valuable way to create repositories of common sense knowledge about tasks.
Say Anything: Using Textual Case-Based Reasoning to Enable Open-Domain Interactive Storytelling BIBAFull-Text 16
  Reid Swanson; Andrew S. Gordon
We describe Say Anything, a new interactive storytelling system that collaboratively writes textual narratives with human users. Unlike previous attempts, this interactive storytelling system places no restrictions on the content or direction of the user's contribution to the emerging storyline. In response to these contributions, the computer continues the storyline with narration that is both coherent and entertaining. This capacity for open-domain interactive storytelling is enabled by an extremely large repository of nonfiction personal stories, which is used as a knowledge base in a case-based reasoning architecture. In this article, we describe the three main components of our case-based reasoning approach: a million-item corpus of personal stories mined from internet weblogs, a case retrieval strategy that is optimized for narrative coherence, and an adaptation strategy that ensures that repurposed sentences from the case base are appropriate for the user's emerging fiction. We describe a series of evaluations of the system's ability to produce coherent and entertaining stories, and we compare these narratives with single-author stories posted to internet weblogs.
Planning for Reasoning with Multiple Common Sense Knowledge Bases BIBAFull-Text 17
  Yen-Ling Kuo; Jane Yung-Jen Hsu
Intelligent user interfaces require common sense knowledge to bridge the gap between the functionality of applications and the user's goals. While current reasoning methods have been used to provide contextual information for interface agents, the quality of their reasoning results is limited by the coverage of their underlying knowledge bases. This article presents reasoning composition, a planning-based approach to integrating reasoning methods from multiple common sense knowledge bases to answer queries. The reasoning results of one reasoning method are passed to other reasoning methods to form a reasoning chain to the target context of a query. By leveraging different weak reasoning methods, we are able to find answers to queries that cannot be directly answered by querying a single common sense knowledge base. By conducting experiments on ConceptNet and WordNet, we compare the reasoning results of reasoning composition, directly querying merged knowledge bases, and spreading activation. The results show an 11.03% improvement in coverage over directly querying merged knowledge bases and a 49.7% improvement in accuracy over spreading activation. Two case studies are presented, showing how reasoning composition can improve performance of retrieval in a video editing system and a dialogue assistant.
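   Schematically, reasoning composition can be viewed as a search for a chain of reasoning operators whose composition reaches the query's target context. The Python sketch below performs a bounded breadth-first search over hypothetical operator functions standing in for the ConceptNet and WordNet reasoning methods; the article's planner is more sophisticated than this.

      from collections import deque

      def compose_reasoning(concepts, target_ctx, operators, max_depth=4):
          # operators: list of (name, fn); fn(concepts) -> (context, new_concepts).
          frontier = deque([(concepts, [])])
          while frontier:
              current, chain = frontier.popleft()
              for name, op in operators:
                  ctx, results = op(current)
                  if ctx == target_ctx:
                      return chain + [name], results   # reasoning chain + answers
                  if len(chain) + 1 < max_depth:
                      frontier.append((results, chain + [name]))
          return None, None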
Common Sense Reasoning for Detection, Prevention, and Mitigation of Cyberbullying BIBAFull-Text 18
  Karthik Dinakar; Birago Jones; Catherine Havasi; Henry Lieberman; Rosalind Picard
Cyberbullying (harassment on social networks) is widely recognized as a serious social problem, especially for adolescents. It is as much a threat to the viability of online social networks for youth today as spam once was to email in the early days of the Internet. Current work to tackle this problem has involved social and psychological studies on its prevalence as well as its negative effects on adolescents. While true solutions rest on teaching youth to have healthy personal relationships, few have considered innovative design of social network software as a tool for mitigating this problem. Mitigating cyberbullying involves two key components: robust techniques for effective detection and reflective user interfaces that encourage users to reflect upon their behavior and their choices.
   Spam filters have been successful by applying statistical approaches like Bayesian networks and hidden Markov models. They can, like Google's GMail, aggregate human spam judgments because spam is sent nearly identically to many people. Bullying is more personalized, varied, and contextual. In this work, we present an approach for bullying detection based on state-of-the-art natural language processing and a common sense knowledge base, which permits recognition over a broad spectrum of topics in everyday life. We analyze a narrower range of particular subject matter associated with bullying (e.g., appearance, intelligence, racial and ethnic slurs, social acceptance, and rejection), and construct BullySpace, a common sense knowledge base that encodes particular knowledge about bullying situations. We then perform joint reasoning with common sense knowledge about a wide range of everyday life topics. We analyze messages using our novel AnalogySpace common sense reasoning technique. We also take into account social network analysis and other factors. We evaluate the model on real-world instances that have been reported by users on Formspring, a social networking website that is popular with teenagers.
   On the intervention side, we explore a set of reflective user-interaction paradigms with the goal of promoting empathy among social network participants. We propose an "air traffic control"-like dashboard, which alerts moderators to large-scale outbreaks that appear to be escalating or spreading and helps them prioritize the current deluge of user complaints. For potential victims, we provide educational material that informs them about how to cope with the situation, and connects them with emotional support from others. A user evaluation shows that in-context, targeted, and dynamic help during cyberbullying situations fosters end-user reflection that promotes better coping strategies.
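   The flavor of the AnalogySpace technique can be conveyed in a few lines of numpy: factor a concept-by-feature matrix of commonsense assertions with a truncated SVD, so that concepts sharing many features land close together even without a direct assertion linking them. The tiny assertion matrix below is invented for illustration and is not from BullySpace.

      import numpy as np

      concepts = ["glasses", "braces", "haircut"]
      features = ["worn on face", "about appearance", "teased about"]
      A = np.array([[1, 1, 1],          # rows: concepts, columns: features
                    [0, 1, 1],
                    [0, 1, 0]], dtype=float)

      U, s, Vt = np.linalg.svd(A, full_matrices=False)
      k = 2                             # keep the top-k principal axes
      embed = U[:, :k] * s[:k]          # concept embeddings in the reduced space

      def similarity(c1, c2):
          v1, v2 = embed[concepts.index(c1)], embed[concepts.index(c2)]
          return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))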

TIIS 2012-12 Volume 2 Issue 4

Introduction to the special issue on highlights of the decade in interactive intelligent systems BIBAFull-Text 19
  Anthony Jameson; John Riedl
This editorial introduction explains the motivation and origin of the TiiS special issue on Highlights of the Decade in Interactive Intelligent Systems and shows how its five articles exemplify the types of research contribution that TiiS aims to encourage and publish.
People, sensors, decisions: Customizable and adaptive technologies for assistance in healthcare BIBAFull-Text 20
  Jesse Hoey; Craig Boutilier; Pascal Poupart; Patrick Olivier; Andrew Monk; Alex Mihailidis
The ratio of healthcare professionals to care recipients is dropping at an alarming rate, particularly for the older population. It is estimated that the number of persons with Alzheimer's disease, for example, will top 100 million worldwide by the year 2050 [Alzheimer's Disease International 2009]. It will become harder and harder to provide needed health services to this population of older adults. Further, patients are becoming more aware and involved in their own healthcare decisions. This is creating a void in which technology has an increasingly important role to play as a tool to connect providers with recipients. Examples of interactive technologies range from telecare for remote regions to computer games promoting fitness in the home. Currently, such technologies are developed for specific applications and are difficult to modify to suit individual user needs. The future potential economic and social impact of technology in the healthcare field therefore lies in our ability to make intelligent devices that are customizable by healthcare professionals and their clients, that are adaptive to users over time, and that generalize across tasks and environments.
   A wide application area for technology in healthcare is for assistance and monitoring in the home. As the population ages, it becomes increasingly dependent on chronic healthcare, such as assistance for tasks of everyday life (washing, cooking, dressing), medication taking, nutrition, and fitness. This article will present a summary of work over the past decade on the development of intelligent systems that provide assistance to persons with cognitive disabilities. These systems are unique in that they are all built using a common framework, a decision-theoretic model for general-purpose assistance in the home. In this article, we will show how this type of general model can be applied to a range of assistance tasks, including prompting for activities of daily living, assistance for art therapists, and stroke rehabilitation. This model is a Partially Observable Markov Decision Process (POMDP) that can be customized by end-users, that can integrate complex sensor information, and that can adapt over time. These three characteristics of the POMDP model will allow for increasing uptake and long-term efficiency and robustness of technology for assistance.
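   At the core of such a system is the POMDP belief update: after the system takes an action (for example, issuing a prompt) and receives a sensor observation, it re-estimates a distribution over the hidden user and task state. The Python sketch below shows this update in its generic textbook form; the transition and observation matrices are assumed to come from the customized model rather than from the article.

      import numpy as np

      def belief_update(b, T, O, a, o):
          # b: belief over states; T[a][s, s']: transition probabilities;
          # O[a][s', o]: probability of observation o in next state s'.
          b_pred = b @ T[a]             # predict the next-state distribution
          b_new = b_pred * O[a][:, o]   # weight by the observation likelihood
          return b_new / b_new.sum()    # renormalize to a proper distribution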
Access to multimodal articles for individuals with sight impairments BIBAFull-Text 21
  Sandra Carberry; Stephanie Elzer Schwartz; Kathleen Mccoy; Seniz Demir; Peng Wu; Charles Greenbacker; Daniel Chester; Edward Schwartz; David Oliver; Priscilla Moraes
Although intelligent interactive systems have been the focus of many research efforts, very few have addressed systems for individuals with disabilities. This article presents our methodology for an intelligent interactive system that provides individuals with sight impairments with access to the content of information graphics (such as bar charts and line graphs) in popular media. The article describes the methodology underlying the system's intelligent behavior, its interface for interacting with users, examples processed by the implemented system, and evaluation studies both of the methodology and the effectiveness of the overall system. This research advances universal access to electronic documents.
Multimodal behavior and interaction as indicators of cognitive load BIBAFull-Text 22
  Fang Chen; Natalie Ruiz; Eric Choi; Julien Epps; M. Asif Khawaja; Ronnie Taib; Bo Yin; Yang Wang
High cognitive load arises from complex, time-critical, and safety-critical tasks, for example, mapping out flight paths, monitoring traffic, or even managing nuclear reactors, causing stress, errors, and lowered performance. Over the last five years, our research has focused on using the multimodal interaction paradigm to detect fluctuations in cognitive load in user behavior during system interaction. Cognitive load variations have been found to impact interactive behavior: by monitoring variations in specific modal input features executed in tasks of varying complexity, we gain an understanding of the communicative changes that occur when cognitive load is high. So far, we have identified specific changes in: speech, namely acoustic, prosodic, and linguistic changes; interactive gesture; and digital pen input, both interactive and freeform. As ground-truth measurements, galvanic skin response, subjective ratings, and performance ratings have been used to verify task complexity.
   The data suggest that it is feasible to use features extracted from behavioral changes in multiple modal inputs as indices of cognitive load. The speech-based indicators of load, based on data collected from user studies in a variety of domains, have shown considerable promise. Scenarios include single-user and team-based tasks; think-aloud and interactive speech; and single-word, reading, and conversational speech, among others. Pen-based cognitive load indices have also been tested with some success, specifically with pen-gesture, handwriting, and freeform pen input, including diagramming. After examining some of the properties of these measurements, we present a multimodal fusion model, which is illustrated with quantitative examples from a case study.
   The feasibility of employing user input and behavior patterns as indices of cognitive load is supported by experimental evidence. Moreover, symptomatic cues of cognitive load derived from user behavior, such as acoustic speech signals, transcribed text, and digital pen trajectories of handwriting and shapes, can be supported by well-established theoretical frameworks, including O'Donnell and Eggemeier's workload measurement [1986], Sweller's Cognitive Load Theory [Chandler and Sweller 1991], and Baddeley's model of modal working memory [1992], as well as McKinstry et al.'s [2008] and Rosenbaum's [2005] action dynamics work. The benefit of using this approach to determine the user's cognitive load in real time is that the data can be collected implicitly, that is, during day-to-day use of intelligent interactive systems. This overcomes problems of intrusiveness and increases applicability in real-world environments, while allowing a dynamic computer interface to adapt information selection and presentation with reference to load.
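   As a minimal illustration of the fusion step, per-modality load estimates can be combined with a normalized weighted sum, as in the Python toy below; the modality names, scores, and weights are placeholders rather than the article's fitted model.

      def fuse_load(scores, weights):
          # scores/weights: dicts keyed by modality, e.g. 'speech', 'pen', 'gsr'.
          total_w = sum(weights[m] for m in scores)
          return sum(scores[m] * weights[m] for m in scores) / total_w

      print(fuse_load({"speech": 0.8, "pen": 0.6}, {"speech": 0.7, "pen": 0.3}))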
AutoTutor and Affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back BIBAFull-Text 23
  Sidney D'Mello; Art Graesser
We present AutoTutor and Affective AutoTutor as examples of innovative 21st-century interactive intelligent systems that promote learning and engagement. AutoTutor is an intelligent tutoring system that helps students compose explanations of difficult concepts in Newtonian physics and enhances computer literacy and critical thinking by interacting with them in natural language with adaptive dialog moves similar to those of human tutors. AutoTutor constructs a cognitive model of students' knowledge levels by analyzing the text of their typed or spoken responses to its questions. The model is used to dynamically tailor the interaction toward individual students' zones of proximal development. Affective AutoTutor takes the individualized instruction and human-like interactivity to a new level by automatically detecting and responding to students' emotional states in addition to their cognitive states. Over 20 controlled experiments comparing AutoTutor with ecological and experimental controls, such as reading a textbook, have consistently yielded learning improvements of approximately one letter grade after brief 30-60-minute interactions. Furthermore, Affective AutoTutor shows even more dramatic improvements in learning than the original AutoTutor system, particularly for struggling students with low domain knowledge. In addition to providing a detailed description of the implementation and evaluation of AutoTutor and Affective AutoTutor, we also discuss new and exciting technologies motivated by AutoTutor, such as AutoTutor-Lite, Operation ARIES, GuruTutor, DeepTutor, MetaTutor, and AutoMentor. We conclude this article with our vision for future work on interactive and engaging intelligent tutoring systems.
Creating personalized systems that people can scrutinize and control: Drivers, principles and experience BIBAFull-Text 24
  Judy Kay; Bob Kummerfeld
Widespread personalized computing systems play an already important and fast-growing role in diverse contexts, such as location-based services, recommenders, commercial Web-based services, and teaching systems. The personalization in these systems is driven by information about the user, a user model. Moreover, as computers become both ubiquitous and pervasive, personalization operates across the many devices and information stores that constitute the user's personal digital ecosystem. This enables personalization, and the user models driving it, to play an increasing role in people's everyday lives. This makes it critical to establish ways to address key problems of personalization related to privacy, invisibility of personalization, errors in user models, wasted user models, and the broad issue of enabling people to control their user models and associated personalization. We offer scrutable user models as a foundation for tackling these problems.
   This article argues for the importance of scrutable user modeling and personalization, illustrating key elements in case studies from our work. We then identify the broad roles for scrutable user models. The article describes how to tackle the technical and interface challenges of designing and building scrutable user modeling systems, presenting design principles and showing how they were established over our twenty years of work on the Personis software framework. Our contributions are the set of principles for scrutable personalization, linked to our experience from creating and evaluating frameworks and associated applications built upon them. These constitute a general approach to tackling problems of personalization by enabling users to scrutinize their user models as a basis for understanding and controlling personalization.