Towards Providing On-Demand Expert Support for Software Developers
Software and Programming Tools
/
Chen, Yan
/
Oney, Steve
/
Lasecki, Walter S.
Proceedings of the ACM CHI'16 Conference on Human Factors in Computing
Systems
2016-05-07
v.1
p.3192-3203
© Copyright 2016 ACM
Summary: Software development is an expert task that requires complex reasoning and
the ability to recall language or API-specific details. In practice, developers
often seek support from IDE tools, Web resources, or other developers to help
fill in gaps in their knowledge on-demand. In this paper, we present two
studies that seek to inform the design of future systems that use remote
experts to support developers on demand. The first explores what types of
questions developers would ask a hypothetical assistant capable of answering
any question they pose. The second study explores the interactions between
developers and remote experts in supporting roles. Our results suggest eight
key system features needed for on-demand remote developer assistants to be
effective, with implications for future human-powered development tools.
Coding Varied Behavior Types Using the Crowd
Demos
/
Yim, Jinyeong
/
Jasani, Jeel
/
Henderson, Aubrey
/
Koutra, Danai
/
Dow, Steven
/
Leung, Winnie
/
Lim, Ellen
/
Gordon, Mitchell
/
Bigham, Jeffrey
/
Lasecki, Walter
Companion Proceedings of ACM CSCW 2016 Conference on Computer-Supported
Cooperative Work and Social Computing
2016-02-27
v.2
p.114-117
© Copyright 2016 ACM
Summary: Social science researchers spend significant time annotating behavioral
events in video data in order to quantitatively assess interactions [2]. These
behavioral events may be instantaneous changes, continuous actions that span
unbounded periods of time, or behaviors that would be best described by
severity or other scalar ratings. The complexity of these judgments, coupled
with the time and effort required to meticulously assess video, results in a
training and evaluation process that can take days or weeks. Computational
analysis of video data is still limited due to the challenges introduced by
subjective interpretation and varied contexts. Glance [4] introduced a means of
leveraging human intelligence by recruiting crowds of paid online workers to
accurately analyze hours of video data in a matter of minutes. This approach
has been shown to expedite work in human-centered fields, as well as generate
training data for automated recognition systems. In this paper, we describe an
interactive demonstration of an improved, more expressive version of Glance
that expands the initial set of supported annotation formats (e.g. time range,
classification, etc.) from one to nine. Worker interfaces for each of these
options are dynamically generated, along with tutorials, based on the analyst's
question. These new features allow analysts to acquire more specific
information about events in video datasets.
Measuring text simplification with the crowd
Human computation
/
Lasecki, Walter S.
/
Rello, Luz
/
Bigham, Jeffrey P.
Proceedings of the 2015 International Cross-Disciplinary Conference on Web
Accessibility (W4A)
2015-05-18
p.4
© Copyright 2015 ACM
Summary: Text can often be complex and difficult to read, especially for people with
cognitive impairments or low literacy skills. Text simplification is a process
that reduces the complexity of both wording and structure in a sentence, while
retaining its meaning. However, this is currently a challenging task for
machines, and thus, providing effective on-demand text simplification to those
who need it remains an unsolved problem. Even evaluating the simplicity of text
remains a challenging problem for both computers, which cannot understand the
meaning of text, and humans, who often struggle to agree on what constitutes a
good simplification.
This paper focuses on the evaluation of English text simplification using
the crowd. We show that leveraging crowds can result in a collective decision
that is accurate and converges to a consensus rating. Our results from 2,500
crowd annotations show that the crowd can effectively rate levels of
simplicity. This may allow simplification systems and system builders to get
better feedback about how well content is being simplified, as compared to
standard measures which classify content into 'simplified' or 'not simplified'
categories. Our study provides evidence that the crowd could be used to
evaluate English text simplification, as well as to create simplified text in
future work.
The Effects of Sequence and Delay on Crowd Work
Evaluating Crowdsourcing
/
Lasecki, Walter S.
/
Rzeszotarski, Jeffrey M.
/
Marcus, Adam
/
Bigham, Jeffrey P.
Proceedings of the ACM CHI'15 Conference on Human Factors in Computing
Systems
2015-04-18
v.1
p.1375-1378
© Copyright 2015 ACM
Summary: A common approach in crowdsourcing is to break large tasks into small
microtasks so that they can be parallelized across many crowd workers and so
that redundant work can be more easily compared for quality control. In
practice, this can result in the microtasks being presented out of their
natural order and often introduces delays between individual microtasks. In
this paper, we demonstrate in a study of 338 crowd workers that non-sequential
microtasks and the introduction of delays significantly decrease worker
performance. We show that interruptions where a large delay occurs between two
related tasks can cause up to a 102% slowdown in completion time, and
interruptions where workers are asked to perform different tasks in sequence
can slow down completion time by 57%. We conclude with a set of design
guidelines to improve both worker performance and realized pay, and
instructions for implementing these changes in existing interfaces for crowd
work.
Apparition: Crowdsourced User Interfaces that Come to Life as You Sketch
Them
Understanding Crowdwork in Many Domains
/
Lasecki, Walter S.
/
Kim, Juho
/
Rafter, Nick
/
Sen, Onkur
/
Bigham, Jeffrey P.
/
Bernstein, Michael S.
Proceedings of the ACM CHI'15 Conference on Human Factors in Computing
Systems
2015-04-18
v.1
p.1925-1934
© Copyright 2015 ACM
Summary: Prototyping allows designers to quickly iterate and gather feedback, but the
time it takes to create even a Wizard-of-Oz prototype reduces the utility of
the process. In this paper, we introduce crowdsourcing techniques and tools for
prototyping interactive systems in the time it takes to describe the idea. Our
Apparition system uses paid microtask crowds to make even hard-to-automate
functions work immediately, allowing more fluid prototyping of interfaces that
contain interactive elements and complex behaviors. As users sketch their
interface and describe it aloud in natural language, crowd workers and sketch
recognition algorithms translate the input into user interface elements, add
animations, and provide Wizard-of-Oz functionality. We discuss how design teams
can use our approach to reflect on prototypes or begin user studies within
seconds, and how, over time, Apparition prototypes can become fully-implemented
versions of the systems they simulate. Powering Apparition is the first
self-coordinated, real-time crowdsourcing infrastructure. We anchor this
infrastructure on a new, lightweight write-locking mechanism that workers can
use to signal their intentions to each other.
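The abstract's "lightweight write-locking mechanism" suggests a worker claims an interface element before editing it so their intentions are visible to teammates. A minimal sketch of such an intent-signaling lock table, assuming claims expire so an abandoned lock never blocks the team (the class and parameter names here are hypothetical, not Apparition's actual API):

```python
import time

class ElementLockTable:
    """Hypothetical sketch of an intent-signaling write lock: a worker
    claims a UI element before editing it, and stale claims expire so an
    abandoned lock never blocks the rest of the team."""

    def __init__(self, timeout=10.0):
        self.timeout = timeout   # seconds before an unreleased claim goes stale
        self.claims = {}         # element_id -> (worker_id, claim_time)

    def try_claim(self, element_id, worker_id, now=None):
        now = time.monotonic() if now is None else now
        holder = self.claims.get(element_id)
        # Succeed if the element is free, already ours, or the claim is stale.
        if (holder is None or holder[0] == worker_id
                or now - holder[1] > self.timeout):
            self.claims[element_id] = (worker_id, now)
            return True
        return False

    def release(self, element_id, worker_id):
        if self.claims.get(element_id, (None, 0.0))[0] == worker_id:
            del self.claims[element_id]
```

The timeout keeps the mechanism lightweight: no central coordinator has to detect disconnected workers, since any teammate can reclaim an element once its claim goes stale.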
Zensors: Adaptive, Rapidly Deployable, Human-Intelligent Sensor Feeds
Understanding Crowdwork in Many Domains
/
Laput, Gierad
/
Lasecki, Walter S.
/
Wiese, Jason
/
Xiao, Robert
/
Bigham, Jeffrey P.
/
Harrison, Chris
Proceedings of the ACM CHI'15 Conference on Human Factors in Computing
Systems
2015-04-18
v.1
p.1935-1944
© Copyright 2015 ACM
Summary: The promise of "smart" homes, workplaces, schools, and other environments
has long been championed. Unattractive, however, has been the cost to run wires
and install sensors. More critically, raw sensor data tends not to align with
the types of questions humans wish to ask, e.g., do I need to restock my
pantry? Although techniques like computer vision can answer some of these
questions, they require significant effort to build and train appropriate
classifiers. Even then, these systems are often brittle, with limited ability
to handle new or unexpected situations, including repositioning and
environmental changes (e.g., lighting, furniture, seasons). We propose Zensors,
a new sensing approach that fuses real-time human intelligence from online
crowd workers with automatic approaches to provide robust, adaptive, and
readily deployable intelligent sensors. With Zensors, users can go from
question to live sensor feed in less than 60 seconds. Through our API, Zensors
can enable a variety of rich end-user applications and moves us closer to the
vision of responsive, intelligent environments.
Exploring Privacy and Accuracy Trade-Offs in Crowdsourced Behavioral Video
Coding
Understanding Crowdwork in Many Domains
/
Lasecki, Walter S.
/
Gordon, Mitchell
/
Leung, Winnie
/
Lim, Ellen
/
Bigham, Jeffrey P.
/
Dow, Steven P.
Proceedings of the ACM CHI'15 Conference on Human Factors in Computing
Systems
2015-04-18
v.1
p.1945-1954
© Copyright 2015 ACM
Summary: Coding behavioral video is an important method used by researchers to
understand social phenomena. Unfortunately, traditional hand-coding approaches
can take days or weeks of time to complete. Recent work has shown that these
tasks can be completed quickly by leveraging the parallelism of large online
crowds, but using the crowd introduces new concerns about accuracy,
reliability, privacy, and cost. To explore these issues, we conducted
interviews with 12 researchers who frequently code behavioral video, to
investigate common practices and challenges with video coding. We find accuracy
and privacy to be the researchers' primary concerns. To explore this more
concretely, we used sample videos to investigate whether crowds can accurately
recognize instances of commonly coded behaviors, and show that the crowd yields
accurate results. Then, we demonstrate a method for obfuscating participant
identity with a video blur filter, and find, as expected, that workers' ability
to identify participants decreases as blur level increases. The workers'
ability to accurately and reliably code behaviors also decreases, but not as
steeply as the identity test. This trade-off between coding quality and privacy
protection suggests that researchers can use online crowds to code for some key
behaviors in video without compromising participant identity. We conclude with
a discussion of how researchers can balance privacy and accuracy on their own
data using a system we introduce called Incognito.
RegionSpeak: Quick Comprehensive Spatial Descriptions of Complex Images for
Blind Users
Accessibility at Home & on The Go
/
Zhong, Yu
/
Lasecki, Walter S.
/
Brady, Erin
/
Bigham, Jeffrey P.
Proceedings of the ACM CHI'15 Conference on Human Factors in Computing
Systems
2015-04-18
v.1
p.2353-2362
© Copyright 2015 ACM
Summary: Blind people often seek answers to their visual questions from remote
sources; however, the commonly adopted single-image, single-response model does
not always guarantee enough bandwidth between users and sources. This is
especially true when questions concern large sets of information, or spatial
layout, e.g., where is there to sit in this area, what tools are on this work
bench, or what do the buttons on this machine do? Our RegionSpeak system
addresses this problem by providing an accessible way for blind users to (i)
combine visual information across multiple photographs via image stitching,
(ii) quickly collect labels from the crowd for all relevant objects contained
within the resulting large visual area in parallel, and (iii) then
interactively explore the spatial layout of the objects that were labeled. The
regions and descriptions are displayed on an accessible touchscreen interface,
which allows blind users to interactively explore their spatial layout. We
demonstrate that workers from Amazon Mechanical Turk are able to quickly and
accurately identify relevant regions, and that asking them to describe only one
region at a time results in more comprehensive descriptions of complex images.
RegionSpeak can be used to explore the spatial layout of the regions
identified. It also demonstrates broad potential for helping blind users to
answer difficult spatial layout questions.
Towards Integrating Real-Time Crowd Advice with Reinforcement Learning
Poster & Demo Session
/
de la Cruz, Gabriel V.
/
Peng, Bei
/
Lasecki, Walter S.
/
Taylor, Matthew E.
Companion Proceedings of the 2015 International Conference on Intelligent
User Interfaces
2015-03-29
v.2
p.17-20
© Copyright 2015 ACM
Summary: Reinforcement learning is a powerful machine learning paradigm that allows
agents to autonomously learn to maximize a scalar reward. However, it often
suffers from poor initial performance and long learning times. This paper
discusses how collecting on-line human feedback, both in real time and post
hoc, can potentially improve the performance of such learning systems. We use
the game Pac-Man to simulate a navigation setting and show that workers are
able to accurately identify both when a sub-optimal action is executed, and
what action should have been performed instead. Demonstrating that the crowd is
capable of generating this input, and discussing the types of errors that
occur, serves as a critical first step in designing systems that use this
real-time feedback to improve systems' learning performance on-the-fly.
Increasing the bandwidth of crowdsourced visual question answering to better
support blind users
Poster abstracts
/
Lasecki, Walter S.
/
Zhong, Yu
/
Bigham, Jeffrey P.
Sixteenth International ACM SIGACCESS Conference on Computers and
Accessibility
2014-10-20
p.263-264
© Copyright 2014 ACM
Summary: Many of the visual questions that blind people ask cannot be easily answered
with a single image or a short response, especially when questions are of an
exploratory nature, e.g. what is in this area, or what tools are available on
this work bench? We introduce RegionSpeak to allow blind users to capture large
areas of visual information, identify all of the objects within them, and
explore their spatial layout with fewer interactions. RegionSpeak helps blind
users capture all of the relevant visual information using an interface
designed to support stitching multiple images together. We use a parallel
crowdsourcing workflow that asks workers to define and describe regions of
interest, allowing even complex images to be described quickly. The regions and
descriptions are displayed on an auditory touchscreen interface, allowing users
to know what is in a scene and how it is laid out.
Legion scribe: real-time captioning by non-experts
Demonstration abstracts
/
Lasecki, Walter S.
/
Kushalnagar, Raja
/
Bigham, Jeffrey P.
Sixteenth International ACM SIGACCESS Conference on Computers and
Accessibility
2014-10-20
p.303-304
© Copyright 2014 ACM
Summary: The promise of affordable, automatic approaches to real-time captioning
imagines a future in which deaf and hard of hearing (DHH) users have immediate
access to speech in the world around them by simply picking up their phone or
other mobile device. While the challenges of processing highly variable natural
language have prevented automated approaches from completing this task reliably
enough for use in settings such as classrooms or workplaces [4], recent work in
crowd-powered approaches has allowed groups of non-expert captionists to
provide a similarly-flexible source of captions for DHH users. This is in
contrast to current human-powered approaches, which use highly-trained
professional captionists who can type up to 250 words per minute (WPM), but
also can cost over $100/hr. In this paper, we describe a real-time demo of
Legion:Scribe (or just "Scribe"), a crowd-powered captioning system that allows
untrained participants and volunteers to provide reliable captions with less
than 5 seconds of latency by computationally merging their input into a single
collective answer that is more accurate and more complete than any one worker
could have generated alone.
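The merging step can be pictured as combining partial, timestamped word streams into one transcript. A much-simplified sketch (Scribe's actual merging uses more robust alignment of worker input; this version only sorts by time and drops adjacent duplicates produced by overlapping coverage):

```python
def merge_partial_captions(streams):
    """Much-simplified sketch of caption merging: each worker contributes
    (timestamp, word) pairs covering only part of the speech. Sort all
    words by time and drop a word that repeats the immediately preceding
    one, since overlapping coverage produces adjacent duplicates."""
    words = sorted(pair for stream in streams for pair in stream)
    merged = []
    for _, word in words:
        if not merged or merged[-1] != word.lower():
            merged.append(word.lower())
    return " ".join(merged)
```

For example, two workers who each caught different halves of "the quick brown fox jumps", with "brown" typed by both, would still yield the complete sentence once.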
Expert crowdsourcing with flash teams
Working with crowds
/
Retelny, Daniela
/
Robaszkiewicz, Sébastien
/
To, Alexandra
/
Lasecki, Walter S.
/
Patel, Jay
/
Rahmati, Negar
/
Doshi, Tulsee
/
Valentine, Melissa
/
Bernstein, Michael S.
Proceedings of the 2014 ACM Symposium on User Interface Software and
Technology
2014-10-05
v.1
p.75-85
© Copyright 2014 ACM
Summary: We introduce flash teams, a framework for dynamically assembling and
managing paid experts from the crowd. Flash teams advance a vision of expert
crowd work that accomplishes complex, interdependent goals such as engineering
and design. These teams consist of sequences of linked modular tasks and
handoffs that can be computationally managed. Interactive systems reason about
and manipulate these teams' structures: for example, flash teams can be
recombined to form larger organizations and authored automatically in response
to a user's request. Flash teams can also hire more people elastically in
reaction to task needs, and pipeline intermediate output to accelerate
completion times. To enable flash teams, we present Foundry, an end-user
authoring platform and runtime manager. Foundry allows users to author modular
tasks, then manages teams through handoffs of intermediate work. We demonstrate
that Foundry and flash teams enable crowdsourcing of a broad class of goals
including design prototyping, course development, and film animation, in half
the work time of traditional self-managed teams.
Glance: rapidly coding behavioral video with the crowd
Video
/
Lasecki, Walter S.
/
Gordon, Mitchell
/
Koutra, Danai
/
Jung, Malte F.
/
Dow, Steven P.
/
Bigham, Jeffrey P.
Proceedings of the 2014 ACM Symposium on User Interface Software and
Technology
2014-10-05
v.1
p.551-562
© Copyright 2014 ACM
Summary: Behavioral researchers spend a considerable amount of time coding video data
to systematically extract meaning from subtle human actions and emotions. In
this paper, we present Glance, a tool that allows researchers to rapidly query,
sample, and analyze large video datasets for behavioral events that are hard to
detect automatically. Glance takes advantage of the parallelism available in
paid online crowds to interpret natural language queries and then aggregates
responses in a summary view of the video data. Glance provides analysts with
rapid responses when initially exploring a dataset, and reliable codings when
refining an analysis. Our experiments show that Glance can code nearly 50
minutes of video in 5 minutes by recruiting over 60 workers simultaneously, and
can get initial feedback to analysts in under 10 seconds for most clips. We
present and compare new methods for accurately aggregating the input of
multiple workers marking the spans of events in video data, and for measuring
the quality of their coding in real-time before a baseline is established by
measuring the variance between workers. Glance's rapid responses to natural
language queries, feedback regarding question ambiguity and anomalies in the
data, and ability to build on prior context in followup queries allow users to
have a conversation-like interaction with their data -- opening up new
possibilities for naturally exploring video data.
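One of the aggregation problems mentioned, combining the spans of events marked by multiple workers, can be approximated by majority voting over the timeline. A sketch under our own assumptions (not Glance's published method): sample the timeline at a fixed step and keep the intervals that enough workers marked as inside an event.

```python
def aggregate_spans(worker_spans, threshold=0.5, step=0.1):
    """Sketch of majority-vote aggregation of event spans (our assumption,
    not Glance's published method): sample the timeline at a fixed step and
    keep the intervals that at least `threshold` of workers marked."""
    if not worker_spans:
        return []
    n = len(worker_spans)
    end = max((e for spans in worker_spans for _, e in spans), default=0.0)
    agreed, run_start, t = [], None, 0.0
    while t <= end:
        votes = sum(any(s <= t <= e for s, e in spans) for spans in worker_spans)
        inside = votes / n >= threshold
        if inside and run_start is None:
            run_start = t            # an agreed event begins here
        elif not inside and run_start is not None:
            agreed.append((run_start, t))
            run_start = None
        t = round(t + step, 10)      # avoid floating-point drift
    if run_start is not None:
        agreed.append((run_start, t))
    return agreed
```

A span marked by only one of several workers is filtered out, while spans with substantial overlap across workers survive as a single merged interval.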
Powering interactive intelligent systems with the crowd
Doctoral symposium
/
Lasecki, Walter S.
Adjunct Proceedings of the 2014 ACM Symposium on User Interface Software and
Technology
2014-10-05
v.2
p.21-24
© Copyright 2014 ACM
Summary: Creating intelligent systems that are able to recognize a user's behavior,
understand unrestricted spoken natural language, complete complex tasks, and
respond fluently could change the way computers are used in daily life. But
fully-automated intelligent systems are a far-off goal -- currently, machines
struggle in many real-world settings because problems can be almost entirely
unconstrained and can vary greatly between instances. Human computation has
been shown to be effective in many of these settings, but is traditionally
applied in an offline, batch-processing fashion. My work focuses on a new model
of continuous, real-time crowdsourcing that enables interactive crowd-powered
systems.
Real-time captioning with the crowd
Features
/
Lasecki, Walter S.
/
Bigham, Jeffrey P.
interactions
2014-05
v.21
n.3
p.50-55
© Copyright 2014 ACM
Crowd storage: storing information on existing memories
Crowdfunding and crowd storage
/
Bigham, Jeffrey P.
/
Lasecki, Walter S.
Proceedings of ACM CHI 2014 Conference on Human Factors in Computing Systems
2014-04-26
v.1
p.601-604
© Copyright 2014 ACM
Summary: This paper introduces the concept of crowd storage, the idea that digital
files can be stored and retrieved later from the memories of people in the
crowd. Similar to human memory, crowd storage is ephemeral, which means that
storage is temporary and the quality of the stored information degrades over
time. Crowd storage may be preferred over storing information directly in the
cloud, or when it is desirable for information to degrade in line with normal
human memories. To explore and validate this idea, we created WeStore, a system
that stores and then later retrieves digital files in the existing memories of
crowd workers. WeStore does not store information directly, but rather encrypts
the files using details of the existing memories elicited from individuals
within the crowd as cryptographic keys. The fidelity of the retrieved
information is tied to how well the crowd remembers the details of the memories
they provided. We demonstrate that crowd storage is feasible using an existing
crowd marketplace (Amazon Mechanical Turk), explore design considerations
important for building systems that use crowd storage, and outline ideas for
future research in this area.
Finding dependencies between actions using the crowd
Decisions, recommendations, and machine learning
/
Lasecki, Walter S.
/
Weingard, Leon
/
Ferguson, George
/
Bigham, Jeffrey P.
Proceedings of ACM CHI 2014 Conference on Human Factors in Computing Systems
2014-04-26
v.1
p.3095-3098
© Copyright 2014 ACM
Summary: Activity recognition can provide computers with the context underlying user
inputs, enabling more relevant responses and more fluid interaction. However,
training these systems is difficult because it requires observing every
possible sequence of actions that comprise a given activity. Prior work has
enabled the crowd to provide labels in real-time to train automated systems
on-the-fly, but numerous examples are still needed before the system can
recognize an activity on its own. To reduce the need to collect this data by
observing users, we introduce ARchitect, a system that uses the crowd to
capture the dependency structure of the actions that make up activities. Our
tests show that over seven times as many examples can be collected using our
approach versus relying on direct observation alone, demonstrating that by
leveraging the understanding of the crowd, it is possible to more easily train
automated systems.
Glance: enabling rapid interactions with data using the crowd
Interactivity
/
Lasecki, Walter S.
/
Gordon, Mitchell
/
Dow, Steven P.
/
Bigham, Jeffrey P.
Proceedings of ACM CHI 2014 Conference on Human Factors in Computing Systems
2014-04-26
v.2
p.511-514
© Copyright 2014 ACM
Summary: Behavioral coding is a common technique in the social sciences and human
computer interaction for extracting meaning from video data [3]. Since computer
vision cannot yet reliably interpret human actions and emotions, video coding
remains a time-consuming manual process done by a small team of researchers. We
present Glance, a tool that allows researchers to rapidly analyze video
datasets for behavioral events that are difficult to detect automatically.
Glance uses the crowd to interpret natural language queries, and then
aggregates and summarizes the content of the video. We show that Glance can
accurately code events in video in a fraction of the time it would take a
single person. We also investigate speed improvements made possible by
recruiting large crowds, showing that Glance is able to code 80% of an
hour-long video in just 5 minutes. Rapid coding allows participants to have a
"conversation with their data" to rapidly develop and refine research
hypotheses in ways not previously possible.
Selfsourcing personal tasks
Works-in-progress
/
Teevan, Jaime
/
Liebling, Daniel J.
/
Lasecki, Walter S.
Proceedings of ACM CHI 2014 Conference on Human Factors in Computing Systems
2014-04-26
v.2
p.2527-2532
© Copyright 2014 ACM
Summary: Large tasks can be overwhelming. For example, many people have thousands of
digital photographs that languish in unorganized archives because it is
difficult and time consuming to gather them into meaningful collections. Such
tasks are hard to start because they seem to require long uninterrupted periods
of effort to make meaningful progress. We propose the idea of selfsourcing as a
way to help people to perform large personal information tasks by breaking them
into manageable microtasks. Using ideas from crowdsourcing and task management,
selfsourcing can help people take advantage of existing gaps in time and
recover quickly from interruptions. We present several achievable selfsourcing
scenarios and explore how they can facilitate information work in
interruption-driven environments.
Helping students keep up with real-time captions by pausing and highlighting
Education
/
Lasecki, Walter S.
/
Kushalnagar, Raja
/
Bigham, Jeffrey P.
Proceedings of the 2014 International Cross-Disciplinary Conference on Web
Accessibility (W4A)
2014-04-07
p.39
© Copyright 2014 ACM
Summary: We explore methods for improving the readability of real-time captions by
allowing users to more easily switch their gaze between multiple visual
information sources. Real-time captioning provides deaf and hard of hearing
(DHH) users with access to spoken content during live events, and the web has
allowed these services to be provided via remotely-located captioning services,
and for web content itself. However, despite caption benefits, spoken language
reading rates often result in DHH users falling behind spoken content,
especially when the audio is paired with visual references. This is
particularly true in classroom settings, where multi-modal content is the norm,
and captions are often poorly positioned in the room, relative to speakers.
Additionally, this accommodation can benefit other students who face temporary
or "situational" disabilities such as listening to unfamiliar speech accents,
or if a student is in a location with poor acoustics.
In this paper, we explore pausing and highlighting as a means of helping DHH
students keep up with live classroom content by helping them track their place
when reading text involving visual references. Our experiments show that by
providing users with a tool to more easily track their place in a transcript
while viewing live video, it is possible for them to follow visual content that
might otherwise have been missed. Both pausing and highlighting have a positive
impact on students' scores on comprehension tests, but highlighting is
preferred to pausing, and yields nearly twice as large of an improvement. We
then discuss several issues with captioning that we observed during our design
process and user study, and then suggest future work that builds on these
insights.
Information extraction and manipulation threats in crowd-powered systems
Performing crowd work
/
Lasecki, Walter S.
/
Teevan, Jaime
/
Kamar, Ece
Proceedings of ACM CSCW 2014 Conference on Computer-Supported Cooperative
Work and Social Computing
2014-02-15
v.1
p.248-256
© Copyright 2014 ACM
Summary: Crowd-powered systems have become a popular way to augment the capabilities
of automated systems in real-world settings. Many of these systems rely on
human workers to process potentially sensitive data or make important
decisions. This puts these systems at risk of unintentionally releasing
sensitive data or having their outcomes maliciously manipulated. While almost
all crowd-powered approaches account for errors made by individual workers, few
factor in active attacks on the system. In this paper, we analyze different
forms of threats from individuals and groups of workers extracting information
from crowd-powered systems or manipulating these systems' outcomes. Via a set
of studies performed on Amazon's Mechanical Turk platform and involving 1,140
unique workers, we demonstrate the viability of these threats. We show that the
current system is vulnerable to coordinated attacks on a task based on the
requests of another task and that a significant portion of Mechanical Turk
workers are willing to contribute to an attack. We propose several possible
approaches to mitigating these threats, including leveraging workers who are
willing to go above and beyond to help, automatically flagging sensitive
content, and using workflows that conceal information from each individual,
while still allowing the group to complete a task. Our findings enable the
crowd to continue to play an important part in automated systems, even as the
data they use and the decisions they support become increasingly important.
Accessibility Evaluation of Classroom Captions
/
Kushalnagar, Raja S.
/
Lasecki, Walter S.
/
Bigham, Jeffrey P.
ACM Transactions on Accessible Computing
2014-01
v.5
n.3
p.7
© Copyright 2014 ACM
Summary: Real-time captioning enables deaf and hard of hearing (DHH) people to follow
classroom lectures and other aural speech by converting it into visual text
with less than a five second delay. Keeping the delay short allows end-users to
follow and participate in conversations. This article focuses on the
fundamental problem that makes real-time captioning difficult: sequential
keyboard typing is much slower than speaking. We first surveyed the audio
characteristics of 240 one-hour-long captioned lectures on YouTube, such as
speed and duration of speaking bursts. We then analyzed how these
characteristics impact caption generation and readability, considering
specifically our human-powered collaborative captioning approach. We note that
most of these characteristics are also present in more general domains. For our
caption comparison evaluation, we transcribed a classroom lecture in real-time
using all three captioning approaches. We recruited 48 participants (24 DHH) to
watch these classroom transcripts in an eye-tracking laboratory. We presented
these captions in a randomized, balanced order. We show that both hearing and
DHH participants preferred and followed collaborative captions better than
those generated by automatic speech recognition (ASR) or professionals due to
the more consistent flow of the resulting captions. These results show the
potential to reliably capture speech even during sudden bursts of speed, and to
generate "enhanced" captions, unlike other human-powered captioning approaches.
Answering visual questions with conversational crowd assistants
Papers
/
Lasecki, Walter S.
/
Thiha, Phyo
/
Zhong, Yu
/
Brady, Erin
/
Bigham, Jeffrey P.
Fifteenth Annual ACM SIGACCESS Conference on Assistive Technologies
2013-10-21
p.18
© Copyright 2013 ACM
Summary: Blind people face a range of accessibility challenges in their everyday
lives, from reading the text on a package of food to traveling independently in
a new place. Answering general questions about one's visual surroundings
remains well beyond the capabilities of fully automated systems, but recent
systems are showing the potential of engaging on-demand human workers (the
crowd) to answer visual questions. The input to such systems has generally been
a single image, which can limit the interaction with a worker to one question;
or video streams where systems have paired the end user with a single worker,
limiting the benefits of the crowd. In this paper, we introduce Chorus:View, a
system that assists users over the course of longer interactions by engaging
workers in a continuous conversation with the user about a video stream from
the user's mobile device. We demonstrate the benefit of using multiple crowd
workers instead of just one in terms of both latency and accuracy, then conduct
a study with 10 blind users that shows Chorus:View answers common visual
questions more quickly and accurately than existing approaches. We conclude
with a discussion of users' feedback and potential future work on interactive
crowd support of blind users.
Real-time captioning by non-experts with legion scribe
Posters and demos
/
Lasecki, Walter S.
/
Miller, Christopher D.
/
Kushalnagar, Raja
/
Bigham, Jeffrey P.
Fifteenth Annual ACM SIGACCESS Conference on Assistive Technologies
2013-10-21
p.55
© Copyright 2013 ACM
Summary: Real-time captioning provides people who are deaf or hard of hearing access
to speech in settings such as classrooms and live events. The most reliable
approach to provide these captions is to recruit an expert stenographer who is
able to type at natural speaking rates, but stenographers charge more than $100 per
hour and must be scheduled in advance. We introduce Legion Scribe (Scribe), a
system that allows 3-5 ordinary people who can hear and type to jointly caption
speech in real-time. Each person is unable to type at natural speaking rates,
and so is asked only to type part of what they hear. Scribe automatically
stitches all of the partial captions together to form a complete caption
stream. We have shown that the accuracy of Scribe captions approaches that of a
professional stenographer, while its latency and cost are dramatically lower.
Chorus: a crowd-powered conversational assistant
Crowd & creativity
/
Lasecki, Walter S.
/
Wesley, Rachel
/
Nichols, Jeffrey
/
Kulkarni, Anand
/
Allen, James F.
/
Bigham, Jeffrey P.
Proceedings of the 2013 ACM Symposium on User Interface Software and
Technology
2013-10-08
v.1
p.151-162
© Copyright 2013 ACM
Summary: Despite decades of research attempting to establish conversational
interaction between humans and computers, the capabilities of automated
conversational systems are still limited. In this paper, we introduce Chorus, a
crowd-powered conversational assistant. When using Chorus, end users converse
continuously with what appears to be a single conversational partner. Behind
the scenes, Chorus leverages multiple crowd workers to propose and vote on
responses. A shared memory space helps the dynamic crowd workforce maintain
consistency, and a game-theoretic incentive mechanism helps to balance their
efforts between proposing and voting. Studies with 12 end users and 100 crowd
workers demonstrate that Chorus can provide accurate, topical responses,
answering nearly 93% of user queries appropriately, and staying on-topic in
over 95% of responses. We also observed that Chorus offers advantages in speed,
quality, and breadth of assistance over both pairing an end user with a single
crowd worker and having end users complete tasks themselves. Chorus demonstrates a
new future in which conversational assistants are made usable in the real world
by combining human and machine intelligence, and may enable a useful new way of
interacting with the crowds powering other systems.
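The propose-and-vote loop behind Chorus can be sketched as a simple filter: workers propose candidate responses and vote on them, and a response reaches the end user only once it clears an agreement threshold (the quorum value below is our assumption, not a figure from the paper):

```python
from collections import Counter

def select_response(proposals, votes, quorum=0.4):
    """Sketch of propose-and-vote filtering (the quorum value is our
    assumption): workers propose candidate responses and then vote; a
    response is forwarded to the end user only once it holds at least
    `quorum` of all votes cast."""
    if not votes:
        return None
    tally = Counter(v for v in votes if v in proposals)
    if not tally:
        return None
    best, count = tally.most_common(1)[0]
    return best if count / len(votes) >= quorum else None
```

Filtering on a vote share rather than a raw count lets the same logic work as the dynamic crowd grows or shrinks mid-conversation.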