HCI Bibliography : Search Results
Database updated: 2016-05-10 Searches since 2006-12-01: 32,237,481
director@hcibib.org
Hosted by ACM SIGCHI
The HCI Bibliography was moved to a new server 2015-05-12 and again 2016-01-05, substantially degrading the environment for making updates.
There are no plans to add to the database.
Please send questions or comments to director@hcibib.org.
Query: Penn_G*  Results: 15  Sorted by: Date
Speech-based Interaction: Myths, Challenges, and Opportunities Course Overviews / Munteanu, Cosmin / Penn, Gerald Extended Abstracts of the ACM CHI'16 Conference on Human Factors in Computing Systems 2016-05-07 v.2 p.992-995
ACM Digital Library Link
Summary: HCI research has long been dedicated to facilitating more natural information transfer between humans and machines. Unfortunately, humans' most natural form of communication, speech, is also one of the most difficult modalities for machines to understand -- despite, and perhaps because, it is the highest-bandwidth communication channel we possess. While significant research efforts, from engineering to linguistics to the cognitive sciences, have been spent on improving machines' ability to understand speech, the CHI community (and the HCI field at large) has been relatively timid in embracing this modality as a central focus of research. This can be attributed in part to the relatively discouraging levels of accuracy in understanding speech, in contrast with often-unfounded claims of success from industry, but also to the intrinsic difficulty of designing and especially evaluating speech and natural language interfaces. As such, the development of interactive speech-based systems is driven mostly by engineering efforts to improve such systems with respect to largely arbitrary performance metrics. Such developments have often been devoid of any user-centered design principles or consideration for usability or usefulness. The goal of this course is to inform the CHI community of the current state of speech and natural language research, to dispel some of the myths surrounding speech-based interaction, and to provide an opportunity for researchers and practitioners to learn more about how speech recognition and speech synthesis work, what their limitations are, and how they can be used to enhance current interaction paradigms. Through this, we hope that HCI researchers and practitioners will learn how to combine recent advances in speech processing with user-centred principles in designing more usable and useful speech-based interactive systems.

Designing Speech and Multimodal Interactions for Mobile, Wearable, and Pervasive Applications Workshop Summaries / Munteanu, Cosmin / Irani, Pourang / Oviatt, Sharon / Aylett, Matthew / Penn, Gerald / Pan, Shimei / Sharma, Nikhil / Rudzicz, Frank / Gomez, Randy / Nakamura, Keisuke / Nakadai, Kazuhiro Extended Abstracts of the ACM CHI'16 Conference on Human Factors in Computing Systems 2016-05-07 v.2 p.3612-3619
ACM Digital Library Link
Summary: Traditional interfaces are continuously being replaced by mobile, wearable, or pervasive interfaces. Yet when it comes to the input and output modalities enabling our interactions, we have yet to fully embrace some of the most natural forms of communication and information processing that humans possess: speech, language, gestures, thoughts. Very little HCI attention has been dedicated to designing and developing spoken language and multimodal interaction techniques, especially for mobile and wearable devices. In addition to the enormous, recent engineering progress in processing such modalities, there is now sufficient evidence that many real-life applications do not require 100% accuracy of processing multimodal input to be useful, particularly if such modalities complement each other. This multidisciplinary, two-day workshop will bring together interaction designers, usability researchers, and general HCI practitioners to analyze the opportunities and directions to take in designing more natural interactions with mobile and wearable devices, and to look at how we can leverage recent advances in speech and multimodal processing.

Speech-based Interaction: Myths, Challenges, and Opportunities Course Overviews / Munteanu, Cosmin / Penn, Gerald Extended Abstracts of the ACM CHI'15 Conference on Human Factors in Computing Systems 2015-04-18 v.2 p.2483-2484
ACM Digital Library Link
Summary: HCI research has long been dedicated to facilitating more natural information transfer between humans and machines. Unfortunately, humans' most natural form of communication, speech, is also one of the most difficult modalities for machines to understand -- despite, and perhaps because, it is the highest-bandwidth communication channel we possess. While significant research efforts, from engineering to linguistics to the cognitive sciences, have been spent on improving machines' ability to understand speech, the HCI community has been relatively timid in embracing this modality as a central focus of research. This can be attributed in part to the relatively discouraging levels of accuracy in understanding speech, in contrast with often-unfounded claims of success from industry, but also to the intrinsic difficulty of designing and especially evaluating speech and natural language interfaces. The goal of this course is to inform the CHI community of the current state of speech and natural language research, to dispel some of the myths surrounding speech-based interaction, and to provide an opportunity for researchers and practitioners to learn more about how speech recognition and speech synthesis work, what their limitations are, and how they can be used to enhance current interaction paradigms. Through this, we hope that CHI researchers and general HCI, UI, and UX practitioners will learn how to combine recent advances in speech processing with user-centred principles in designing more usable and useful speech-based interactive systems.

Speech-based Interaction: Myths, Challenges, and Opportunities Tutorials / Munteanu, Cosmin / Penn, Gerald Proceedings of the 2015 International Conference on Intelligent User Interfaces 2015-03-29 v.1 p.437-438
ACM Digital Library Link
Summary: HCI research has long been dedicated to facilitating more natural information transfer between humans and machines. Unfortunately, humans' most natural form of communication, speech, is also one of the most difficult modalities for machines to understand -- despite, and perhaps because, it is the highest-bandwidth communication channel we possess. While significant research efforts, from engineering to linguistics to the cognitive sciences, have been spent on improving machines' ability to understand speech, the HCI community has been relatively timid in embracing this modality as a central focus of research. This can be attributed in part to the relatively discouraging levels of accuracy in understanding speech, in contrast with often-unfounded claims of success from industry, but also to the intrinsic difficulty of designing and especially evaluating speech and natural language interfaces.
    The goal of this course is to inform the IUI community of the current state of speech and natural language research, to dispel some of the myths surrounding speech-based interaction, and to provide an opportunity for researchers and practitioners to learn more about how speech recognition and speech synthesis work, what their limitations are, and how they can be used to enhance current interaction paradigms. Through this, we hope that IUI researchers and general HCI, UI, and UX practitioners will learn how to combine recent advances in speech processing with user-centred principles in designing more usable and useful speech-based interactive systems.

Speech-based interaction: myths, challenges, and opportunities Interactive tutorials / Munteanu, Cosmin / Penn, Gerald Proceedings of 2014 Conference on Human-Computer Interaction with Mobile Devices and Services 2014-09-23 p.567-568
ACM Digital Library Link
Summary: Human-Computer Interaction (HCI) research has long been dedicated to facilitating more natural information transfer between humans and machines. Unfortunately, humans' most natural form of communication, speech, is also one of the most difficult modalities for machines to understand. This is largely due to speech being the highest-bandwidth communication channel we possess. As such, significant research efforts, from engineering to linguistics to the cognitive sciences, have been spent during the past several decades on improving machines' ability to understand speech. Yet, the MobileHCI community (and HCI in general) has been relatively timid in embracing this modality as a central focus of research. This can be attributed in part to the relatively discouraging levels of accuracy in understanding speech, in contrast with often-unfounded claims of success from industry, but also to the intrinsic difficulty of designing and especially evaluating speech and natural language interfaces.
    The goal of this course is to inform the MobileHCI community of the current state of speech and natural language research, to dispel some of the myths surrounding speech-based interaction, and to provide an opportunity for researchers and practitioners to learn more about how speech recognition and speech synthesis work, what their limitations are, and how they can be used to enhance current interaction paradigms. Through this, we hope that MobileHCI researchers and practitioners will learn how to combine recent advances in speech processing with user-centred principles in designing more usable and useful speech-based interactive systems.

Designing speech and language interactions Workshop summaries / Munteanu, Cosmin / Jones, Matt / Whittaker, Steve / Oviatt, Sharon / Aylett, Matthew / Penn, Gerald / Brewster, Stephen / d'Alessandro, Nicolas Proceedings of ACM CHI 2014 Conference on Human Factors in Computing Systems 2014-04-26 v.2 p.75-78
ACM Digital Library Link
Summary: Speech and natural language remain our most natural forms of interaction; yet the HCI community has been very timid about focusing its attention on designing and developing spoken language interaction techniques. While significant effort is spent and progress made in speech recognition, synthesis, and natural language processing, there is now sufficient evidence that many real-life applications using speech technologies do not require 100% accuracy to be useful. This is particularly true if such systems are designed with complementary modalities that better support their users or enhance the systems' usability. Engaging the CHI community now is timely -- many recent commercial applications, especially in the mobile space, are already tapping the increased interest in and need for natural user interfaces (NUIs) by enabling speech interaction in their products. This multidisciplinary, one-day workshop will bring together interaction designers, usability researchers, and general HCI practitioners to analyze the opportunities and directions to take in designing more natural interactions based on spoken language, and to look at how we can leverage recent advances in speech processing in order to gain widespread acceptance of speech and natural language interaction.

The CBC newsworld holodeck Interactivity / Ladly, Martha / Penn, Gerald / Chen, Cathy Pin Chun / Chintraruck, Pavika / Ghaderi, Maziar / Ludlow, Bryn A. / Peter, Jessica / Tanyag, Ruzette / Zhou, Peggy / Kazemian, Siavash Proceedings of ACM CHI 2014 Conference on Human Factors in Computing Systems 2014-04-26 v.2 p.363-366
ACM Digital Library Link
Summary: For the past 73 years, the CBC has disseminated a unique Canadian perspective across the world, producing a phenomenally rich multimedia record of the country and our social, political and cultural heritage and news. This project utilizes visualization and sonification of portions of an enormous historical CBC Newsworld data corpus to enable an "on this day" experience for viewers. The digitized collection of 24-hour news videos spans a 24-year period (1989-2013) within an immersive multiscreen environment, to enable gesture-driven context-aware browsing, information seeking, and segment review. Employing natural language processing technologies, the interface displays keywords and key phrases identified in the transcripts, enabling serendipitous video search and display and offering a unique browsing opportunity within this rich "big data" corpus.

Speech-based interaction: myths, challenges, and opportunities Courses / Munteanu, Cosmin / Penn, Gerald Proceedings of ACM CHI 2014 Conference on Human Factors in Computing Systems 2014-04-26 v.2 p.1035-1036
ACM Digital Library Link
Summary: HCI research has long been dedicated to facilitating more natural information transfer between humans and machines. Unfortunately, humans' most natural form of communication, speech, is also one of the most difficult modalities for machines to understand -- despite, and perhaps because, it is the highest-bandwidth communication channel we possess. While significant research efforts, from engineering to linguistics to the cognitive sciences, have been spent on improving machines' ability to understand speech, the CHI community has been relatively timid in embracing this modality as a central focus of research. This can be attributed in part to the relatively discouraging levels of accuracy in understanding speech, in contrast with often-unfounded claims of success from industry, but also to the intrinsic difficulty of designing and especially evaluating speech and natural language interfaces. As such, the development of interactive speech-based systems is driven mostly by engineering efforts to improve such systems with respect to largely arbitrary performance metrics, often devoid of any user-centered design principles or consideration for usability or usefulness.
    The goal of this course is to inform the CHI community of the current state of speech and natural language research, to dispel some of the myths surrounding speech-based interaction, and to provide an opportunity for researchers and practitioners to learn more about how speech recognition and speech synthesis work, what their limitations are, and how they can be used to enhance current interaction paradigms. Through this, we hope that HCI researchers and practitioners will learn how to combine recent advances in speech processing with user-centered principles in designing more usable and useful speech-based interactive systems.

We need to talk: HCI and the delicate topic of spoken language interaction Panels / Munteanu, Cosmin / Jones, Matt / Oviatt, Sharon / Brewster, Stephen / Penn, Gerald / Whittaker, Steve / Rajput, Nitendra / Nanavati, Amit Extended Abstracts of ACM CHI'13 Conference on Human Factors in Computing Systems 2013-04-27 v.2 p.2459-2464
ACM Digital Library Link
Summary: Speech and natural language remain our most natural forms of interaction; yet the HCI community has been very timid about focusing its attention on designing and developing spoken language interaction techniques. This may be due to a widespread perception that perfect domain-independent speech recognition is an unattainable goal. Progress is continuously being made in the engineering and science of speech and natural language processing, however, and there is also recent research suggesting that many applications of speech require far less than 100% accuracy to be useful in many contexts. Engaging the CHI community now is timely -- many recent commercial applications, especially in the mobile space, are already tapping the increased interest in and need for natural user interfaces (NUIs) by enabling speech interaction in their products. As such, the goal of this panel is to bring together interaction designers, usability researchers, and general HCI practitioners to discuss the opportunities and directions to take in designing more natural interactions based on spoken language, and to look at how we can leverage recent advances in speech processing in order to gain widespread acceptance of speech and natural language interaction.

SeeSay and HearSay CAPTCHA for mobile interaction Papers: mobile interaction / Shirali-Shahreza, Sajad / Penn, Gerald / Balakrishnan, Ravin / Ganjali, Yashar Proceedings of ACM CHI 2013 Conference on Human Factors in Computing Systems 2013-04-27 v.1 p.2147-2156
ACM Digital Library Link
Summary: Speech certainly has advantages as an input modality for smartphone applications, especially in scenarios where using touch or keyboard entry is difficult, on increasingly miniaturized devices where usable keyboards are difficult to accommodate, or in scenarios where only small amounts of text need to be input, such as when entering SMS texts or responding to a CAPTCHA challenge. In this paper, we propose two new alternative ways to design CAPTCHAs in which the user says the answer instead of typing it, with output stimuli provided (a) visually (SeeSay) or (b) auditorily (HearSay). Our user study results show that SeeSay CAPTCHA requires less time to solve, and users prefer it over current text-based CAPTCHA methods.

An ecologically valid evaluation of speech summarization Work-in-progress / McCallum, Anthony / Munteanu, Cosmin / Penn, Gerald / Zhu, Xiaodan Extended Abstracts of ACM CHI'12 Conference on Human Factors in Computing Systems 2012-05-05 v.2 p.2219-2224
ACM Digital Library Citation
Summary: The past decade has witnessed an explosion in the size and availability of online audio-visual repositories, such as entertainment, news, or lectures. Summarization systems have the potential to provide significant assistance with navigating such repositories. Unfortunately, automatically-generated summaries often fall short of delivering the information needed by users. This is due, in no small part, to the fact that the natural language heuristics used to generate summaries are often optimized with respect to currently-used evaluation metrics. Such metrics simply score automatically-generated summaries against subjectively-classified gold standards, without taking into account the usefulness of a summary in assisting a user in achieving a certain goal, or even overall summary coherence. We have previously shown that an immediate consequence of this problem is that even the most linguistically-complex summarization systems perform no better than basic heuristics, such as picking the longest sentences from a general-topic, spontaneous dialog, or the first few sentences from a news recording. Our hypothesis is that complex systems are in fact better, if measured properly. What is thus needed instead are evaluation metrics (and consequently, automatic summarizers) that incorporate features such as user preferences and task-orientation. For this, we propose an ecologically valid evaluation metric that determines the value of a summary when embedded in a task, rather than how closely a summary matches a gold standard.

Collaborative editing for improved usefulness and usability of transcript-enhanced webcasts Collaborative User Interfaces / Munteanu, Cosmin / Baecker, Ron / Penn, Gerald Proceedings of ACM CHI 2008 Conference on Human Factors in Computing Systems 2008-04-05 v.1 p.373-382
ACM Digital Library Link
Summary: One challenge in facilitating skimming or browsing through archives of on-line recordings of webcast lectures is the lack of text transcripts of the recorded lecture. Ideally, transcripts would be obtainable through Automatic Speech Recognition (ASR). However, current ASR systems can only deliver, in realistic lecture conditions, a Word Error Rate of around 45% -- above the accepted threshold of 25%. In this paper, we present the iterative design of a webcast extension that engages users to collaborate in a wiki-like manner on editing the ASR-produced imperfect transcripts, and show that this is a feasible solution for improving the quality of lecture transcripts. We also present the findings of a field study carried out in a real lecture environment investigating how students use and edit the transcripts.

Automatic speech recognition for webcasts: how good is good enough and what to do when it isn't Poster Session 1 / Munteanu, Cosmin / Penn, Gerald / Baecker, Ron / Zhang, Yuecheng Proceedings of the 2006 International Conference on Multimodal Interfaces 2006-11-02 p.39-42
Keywords: automatic speech recognition, collaboration, webcasts
ACM Digital Library Link
Summary: The increased availability of broadband connections has recently led to an increase in the use of Internet broadcasting (webcasting). Most webcasts are archived and accessed numerous times retrospectively. One challenge to skimming and browsing through such archives is the lack of text transcripts of the webcast's audio channel. This paper describes a procedure for prototyping an Automatic Speech Recognition (ASR) system that generates realistic transcripts of any desired Word Error Rate (WER), thus overcoming the drawbacks of both prototype-based and Wizard of Oz simulations. We used such a system in a user study showing that transcripts with WERs less than 25% are acceptable for use in webcast archives. As current ASR systems can only deliver, in realistic conditions, Word Error Rates (WERs) of around 45%, we also describe a solution for reducing the WER of such transcripts by engaging users to collaborate in a "wiki" fashion on editing the imperfect transcripts obtained through ASR.
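The WER thresholds discussed in these webcast-transcription studies (45% typical, 25% acceptable) refer to the standard metric: the word-level edit distance between an ASR hypothesis and the reference transcript, normalized by the reference length. A minimal illustrative sketch of that computation, not the scoring system used in the papers:

```python
# Word Error Rate: Levenshtein distance over word sequences,
# divided by the number of words in the reference transcript.

def wer(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table: d[i][j] = minimum edits to turn the
    # first i reference words into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the cat sat on the mat", "the cat sat mat")` gives 2/6 (two deletions over a six-word reference); note that WER can exceed 100% when the hypothesis contains many insertions.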

The effect of speech recognition accuracy rates on the usefulness and usability of webcast archives Visualization and search / Munteanu, Cosmin / Baecker, Ronald / Penn, Gerald / Toms, Elaine / James, David Proceedings of ACM CHI 2006 Conference on Human Factors in Computing Systems 2006-04-22 v.1 p.493-502
Best paper nominee: The authors have conducted an important experiment that establishes minimum levels of accuracy that will make automatic speech recognition useful for navigating transcriptions of webcasts. This result is particularly timely given the growing availability and use of webcasts in research and education.
ACM Digital Library Link
Summary: The widespread availability of broadband connections has led to an increase in the use of Internet broadcasting (webcasting). Most webcasts are archived and accessed numerous times retrospectively. In the absence of transcripts of what was said, users have difficulty searching and scanning for specific topics. This research investigates user needs for transcription accuracy in webcast archives, and measures how the quality of transcripts affects user performance in a question-answering task, and how quality affects overall user experience. We tested 48 subjects in a within-subjects design under 4 conditions: perfect transcripts, transcripts with 25% Word Error Rate (WER), transcripts with 45% WER, and no transcript. Our data reveals that speech recognition accuracy linearly influences both user performance and experience, shows that transcripts with 45% WER are unsatisfactory, and suggests that transcripts having a WER of 25% or less would be useful and usable in webcast archives.

INTERNET Knowledge Media Design Institute / Alleyne, Joel / Baber, Zaheer / Baecker, Ronald / Balakrishnan, Ravin / Berry, Brent / Birnholtz, Jeremy / Boler, Megan / Brett, Clare / Buliung, Ron / Caidi, Nadia / Chan, Leslie / Chignell, Mark / Choo, Chun Wei / Clement, Andrew / Consens, Mariano / Danahy, John / Deibert, Ronald / de Kerckhove, Derrick / de Lara, Eyal / Dryer, Marc / Easterbrook, Steve / Eysenbach, Gunther / Fiume, Eugene / Fox, Mark / Garrett, Frances / Goldfarb, Avi / Gotlieb, Calvin / Hewitt, Jim / Hirst, Graeme / Hockema, Stephen / Hyman, Avi / Hoinkes, Rodney / Jacobsen, H.-Arno / Jadad, Alex / Jamieson, Gregory / Jenkinson, Jodie / Jones, Charles / Kaplan, Louis / Kolodny, Harvey / Koudas, Nick / Lancashire, Ian / Logan, Bob / Luke, Robert / Lyons, Kelly / Mann, Steve / Martimianakis, Tina / Marziali, Elsa / Milgram, Paul / Moller, Henry / Moore, Gale / Murty, Vijaya Kumar / Muter, Paul / Mylopoulos, John / Penn, Gerald / Pennefather, Peter / Phillips, David / Plataniotis, Kostas / Ratto, Matt / Ryan, David / Saroiu, Stefan / Scheffel-Dunand, Dominique / Shafrir, Uri / Singh, Karan / Slotta, Jim / Cantwell, Brian / Spence, Ian / Steele, Lisa / Timmerman, Peter / Treviranus, Jutta / Trifonas, Peter / Truong, Khai / Vicente, Kim / Wellman, Barry / Wensley, Anthony / Wilson-Pauwels, Linda / Wolfe, David / Woodruff, Earl / Woolridge, Nicholas / Wright, Robert / Yu, Eric 2001-01-01 Canada, Ontario, Toronto University of Toronto
Keywords: hci-sites:laboratories |  education:programs |  labs lab laboratory
www.kmdi.utoronto.ca/
Summary: Research Themes:
  • Knowledge media for learning - the application of computer, communications, and cognitive sciences to knowledge building, problem solving, planning, education, and training, especially to facilitate collaborative, distance and multimedia-based learning
  • Technologies for knowledge media - research and development of technologies and the technological infrastructure required to construct knowledge media, including interactive computer graphics, scientific visualization, hypertext, multimedia, databases, natural language processing, and artificial intelligence
  • Human-centred design - the design science of human-computer interaction and of the creation of innovative computer systems and interfaces appropriate for human use, and more generally in the human factors of complex real-world systems and technologies, as rooted in research from applied cognitive science, psychology, and sociology
  • Knowledge media, culture, and society - reflection and analysis of the social implications of the increasing reliance on new technologies. As information and new media technologies challenge fundamental beliefs, this area of research deals broadly with such issues as the nature of communities and institutions, work and employment, the balance of public and private good, privacy, copyright and intellectual property.