[1]
Word Embedding based Generalized Language Model for Information Retrieval
Short Papers
/
Ganguly, Debasis
/
Roy, Dwaipayan
/
Mitra, Mandar
/
Jones, Gareth J. F.
Proceedings of the 2015 Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2015-08-09
p.795-798
© Copyright 2015 ACM
Summary: Word2vec, a state-of-the-art word embedding technique, has gained
considerable interest in the NLP community. The vector embeddings of words
make it possible to retrieve a list of words used in contexts similar to
those of a given word. In this paper, we focus on using word embeddings to
enhance retrieval effectiveness. In particular, we construct a generalized language
model, where the mutual independence between a pair of words (say t and t') no
longer holds. Instead, we make use of the vector embeddings of the words to
derive the transformation probabilities between words. Specifically, the
event of observing a query term t given a document d is modeled by two
distinct events: generating a different term t', either from the document
itself or from the collection, and then transforming it to the observed
query term t. The first event, generating an intermediate term from the
document, is intended to capture how well a term fits contextually within a
document, whereas the second, generating it from the collection,
aims to address the vocabulary mismatch problem by taking into account other
related terms in the collection. Our experiments, conducted on the standard
TREC collection, show that our proposed method yields significant improvements
over LM and LDA-smoothed LM baselines.
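A minimal sketch of the generation-and-transformation idea, in Python: toy
term-frequency tables and a toy embedding table stand in for a trained
word2vec model, and the function names and mixture weights (lam, alpha, beta)
are illustrative assumptions, not the authors' exact estimation.

```python
import numpy as np

# Toy unit-normalized embeddings; in practice these come from word2vec.
raw = {
    "ship":  np.array([0.9, 0.1]),
    "boat":  np.array([0.8, 0.3]),
    "ocean": np.array([0.5, 0.6]),
    "tax":   np.array([0.1, 0.9]),
}
emb = {w: v / np.linalg.norm(v) for w, v in raw.items()}
VOCAB = list(emb)

def p_transform(t_prime, t):
    """P(t | t'): cosine similarity between embeddings, normalized over the
    vocabulary so the transformation probabilities sum to one."""
    sims = {u: max(float(emb[t_prime] @ emb[u]), 0.0) for u in VOCAB}
    z = sum(sims.values()) or 1.0
    return sims[t] / z

def p_ml(t, tf):
    """Maximum-likelihood term probability from a term-frequency table."""
    total = sum(tf.values())
    return tf.get(t, 0) / total if total else 0.0

def glm_score(t, doc_tf, coll_tf, lam=0.5, alpha=0.3, beta=0.2):
    """Illustrative generalized-LM probability of query term t: a mixture of
    (i) the direct document model, (ii) generating some t' from the document
    and transforming it to t, and (iii) generating t' from the collection
    and transforming it. The weights are placeholders, not tuned values."""
    direct = p_ml(t, doc_tf)
    via_doc = sum(p_ml(tp, doc_tf) * p_transform(tp, t) for tp in VOCAB)
    via_coll = sum(p_ml(tp, coll_tf) * p_transform(tp, t) for tp in VOCAB)
    return lam * direct + alpha * via_doc + beta * via_coll

doc = {"boat": 3, "ocean": 2}          # document mentions boats, not ships
coll = {"ship": 10, "boat": 8, "ocean": 12, "tax": 20}
print(glm_score("ship", doc, coll))    # non-zero despite "ship" being absent
```

The property the sketch preserves is that a query term absent from the
document can still receive probability mass through embedding-based
transformation from related terms, which is how the model addresses
vocabulary mismatch.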
[2]
Multidisciplinary Team Dynamics in Service Design -- The Facilitating Role
of Pattern Language
Full Papers
/
Athavankar, Uday
/
Khambete, Pramod
/
Roy, Debjani
/
Chaudhary, Sujata
/
Kimbahune, Sanjay
/
Doke, Pankaj
/
Devkar, Sujit
Proceedings of the IndiaHCI 2014 International Conference on Human Computer
Interaction
2014-12-07
p.16-25
© Copyright 2014 ACM
Summary: Service design is an evolving discipline. Service value is co-created by
service providers and their customers. The complex nature of services requires
collaboration in a multidisciplinary team at the design stage itself to create
service systems that lead to a delightful customer experience. While working in
a multidisciplinary team for service design there is a need to effectively
capture the knowledge of participants from different disciplines and integrate
it in the design process. Team dynamics play an important role in this
context, as they are an unconscious psychological force that influences the
direction of a team's behavior and performance. Therefore, there needs to be
a language that serves as a lingua franca to improve communication and as a
medium to ensure effective collaboration within a team. In this paper we
share our study of the
team dynamics in a multidisciplinary team while designing for services, and
highlight the role of pattern language as an effective mediating entity.
[3]
Exploring Cards for Patterns to Support Pattern Language Comprehension and
Application in Service Design
Posters
/
Athavankar, Uday
/
Khambete, Pramod
/
Doke, Pankaj
/
Kimbahune, Sanjay
/
Devkar, Sujit
/
Roy, Debjani
/
Chaudhary, Sujata
Proceedings of the IndiaHCI 2014 International Conference on Human Computer
Interaction
2014-12-07
p.112-115
© Copyright 2014 ACM
Summary: Service Design is a complex activity that requires collaboration among
multiple stakeholders. Research indicates that pattern language can help a
multidisciplinary team overcome the complexity of service design. This
success hinges critically on the team's comprehension and use of the pattern
language. The literature shows that research has focused on the use of
pattern language, not on the means required for comprehending it.
To address this need, we explored the use of pattern cards as a tool to
support the comprehension of pattern language. In this paper, we share
experiences of using pattern cards in studies conducted to understand the
complex field of rural healthcare services. Participants from different
domains used the cards for easy reference while designing service
interventions. Analysis showed that the pattern cards helped the team
comprehend patterns easily and externalize thoughts when used with an
experience journey map. They enabled discussions within the team while
keeping pace with the design process.
[4]
Supporting treatment of people living with HIV / AIDS in resource limited
settings with IVRs
Personal health and wellbeing
/
Joshi, Anirudha
/
Rane, Mandar
/
Roy, Debjani
/
Emmadi, Nagraj
/
Srinivasan, Padma
/
Kumarasamy, N.
/
Pujari, Sanjay
/
Solomon, Davidson
/
Rodrigues, Rashmi
/
Saple, D. G.
/
Sen, Kamalika
/
Veldeman, Els
/
Rutten, Romain
Proceedings of ACM CHI 2014 Conference on Human Factors in Computing Systems
2014-04-26
v.1
p.1595-1604
© Copyright 2014 ACM
Summary: We developed an interactive voice response (IVR) system called TAMA
(Treatment Advice by Mobile Alerts) that provides treatment support to people
living with HIV / AIDS (PLHA) in developing countries, who are on
antiretroviral therapy (ART). We deployed TAMA with 54 PLHA in 5 HIV clinics in
India for a period of 12 weeks. During the study, we gathered feedback about
TAMA's design and usage. Additionally, we conducted detailed qualitative
interviews and analysed usage logs. We found that TAMA was usable and viable
in the real-life settings of PLHA and that it had many desirable effects on
their treatment adherence. We developed insights that inform the design of
TAMA, some of which can be generalised to the design of other long-term,
frequent-use IVR applications for users in developing countries, in the
healthcare domain and
beyond.
[5]
A portable audio/video recorder for longitudinal study of child development
Poster session
/
Vosoughi, Soroush
/
Goodwin, Matthew S.
/
Washabaugh, Bill
/
Roy, Deb
Proceedings of the 2012 International Conference on Multimodal Interfaces
2012-10-22
p.193-200
© Copyright 2012 ACM
Summary: Collection and analysis of ultra-dense, longitudinal observational data of
child behavior in natural, ecologically valid, non-laboratory settings holds
significant promise for advancing the understanding of child development and
developmental disorders such as autism. To this end, we created the Speechome
Recorder -- a portable version of the embedded audio/video recording technology
originally developed for the Human Speechome Project -- to facilitate swift,
cost-effective deployment in home environments. Recording child behavior daily
in these settings will enable detailed study of developmental trajectories in
children from infancy through early childhood, as well as typical and atypical
dynamics of communication and social interaction as they evolve over time. Its
portability makes possible potentially large-scale comparative study of
developmental milestones in both neurotypical and developmentally delayed
children. In brief, the Speechome Recorder was designed to reduce cost,
complexity, invasiveness and privacy issues associated with naturalistic,
longitudinal recordings of child development.
[6]
Design Opportunities for Supporting Treatment of People Living with HIV /
AIDS in India
Interaction Design for Developing Regions
/
Joshi, Anirudha
/
Rane, Mandar
/
Roy, Debjani
/
Sali, Shweta
/
Bharshankar, Neha
/
Kumarasamy, N.
/
Pujari, Sanjay
/
Solomon, Davidson
/
Sharma, H. Diamond
/
Saple, D. G.
/
Rutten, Romain
/
Ganju, Aakash
/
Van Dam, Joris
Proceedings of IFIP INTERACT'11: Human-Computer Interaction
2011-09-05
v.2
p.315-332
Keywords: HIV/AIDS; healthcare; adherence; user study; design for development
© Copyright 2011 IFIP
Summary: We describe a qualitative user study that we conducted with 64 people living
with HIV/AIDS (PLHA) in India recruited from private sector clinics. Our aim
was to investigate information gaps, problems, and opportunities for design of
relevant technology solutions to support HIV treatment. Our methodology
included clinic visits, observations, discussion with doctors and counsellors,
contextual interviews with PLHA, diary studies, technology tryouts, and home
visits. Analysis identified user statements, observations, breakdowns,
insights, and design ideas. We consolidated our findings across users with
an affinity diagram. We found that despite several efforts, PLHA have
limited access to
authentic information. Some know facts and procedures, but lack conceptual
understanding of HIV. Challenges include low education, no access to
technology, lack of socialisation, less time with doctors and counsellors, high
power-distance between PLHA and doctors and counsellors, and information
overload. Information solutions based on mobile phones can lead to better
communication and improve treatment adherence and effectiveness if they are
based on the following: repetition, visualisation, organisation, localisation,
and personalisation of information, improved socialisation, and complementing
current efforts in clinics.
[7]
Grounding spatial language for video search
Speech and language
/
Tellex, Stefanie
/
Kollar, Thomas
/
Shaw, George
/
Roy, Nicholas
/
Roy, Deb
Proceedings of the 2010 International Conference on Multimodal Interfaces
2010-11-08
p.31
© Copyright 2010 ACM
Summary: The ability to find a video clip that matches a natural language description
of an event would enable intuitive search of large databases of surveillance
video. We present a mechanism for connecting a spatial language query to a
video clip corresponding to the query. The system can retrieve video clips
matching millions of potential queries that describe complex events in video
such as "people walking from the hallway door, around the island, to the
kitchen sink." By breaking down the query into a sequence of independent
structured clauses and modeling the meaning of each component of the structure
separately, we are able to improve on previous approaches to video retrieval by
finding clips that match much longer and more complex queries using a rich set
of spatial relations such as "down" and "past." We present a rigorous analysis
of the system's performance, based on a large corpus of task-constrained
language collected from fourteen subjects. Using this corpus, we show that the
system effectively retrieves clips that match natural language descriptions:
58.3% were ranked in the top two of ten in a retrieval task. Furthermore, we
show that spatial relations play an important role in the system's performance.
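The clause-by-clause scoring this abstract describes can be illustrated with
a small ranking sketch, assuming toy 2-D person tracks, named landmark
points, and exponential distance scorers; only three relations are modeled,
and none of this is the paper's learned system.

```python
import math

# Toy landmarks and 2-D person tracks; stand-ins for detections in video.
LANDMARKS = {"hallway door": (0.0, 0.0), "island": (2.0, 1.0),
             "kitchen sink": (4.0, 0.0)}

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def score_from(track, lm):
    return math.exp(-dist(track[0], LANDMARKS[lm]))    # starts near landmark

def score_to(track, lm):
    return math.exp(-dist(track[-1], LANDMARKS[lm]))   # ends near landmark

def score_past(track, lm):
    # Some interior point of the track passes close to the landmark.
    return max(math.exp(-dist(p, LANDMARKS[lm])) for p in track[1:-1])

SCORERS = {"from": score_from, "to": score_to, "past": score_past}

def score_clip(track, clauses):
    # The abstract's independence assumption: a clip's score is the product
    # of the scores of the individual structured clauses.
    s = 1.0
    for relation, landmark in clauses:
        s *= SCORERS[relation](track, landmark)
    return s

query = [("from", "hallway door"), ("past", "island"), ("to", "kitchen sink")]
clips = {"A": [(0.1, 0.0), (1.9, 0.9), (3.8, 0.1)],    # door -> island -> sink
         "B": [(4.0, 0.1), (2.0, 2.5), (0.2, 0.2)]}    # roughly the reverse
ranked = sorted(clips, key=lambda c: score_clip(clips[c], query), reverse=True)
print(ranked)   # clip A should outrank clip B for this query
```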
[8]
Toward understanding natural language directions
Paper session 5: natural language interaction
/
Kollar, Thomas
/
Tellex, Stefanie
/
Roy, Deb
/
Roy, Nicholas
Proceedings of the 5th ACM/IEEE International Conference on Human Robot
Interaction
2010-03-02
p.259-266
Keywords: direction understanding, route instructions, spatial language
© Copyright 2010 ACM
Summary: Speaking using unconstrained natural language is an intuitive and flexible
way for humans to interact with robots. Understanding this kind of linguistic
input is challenging because diverse words and phrases must be mapped into
structures that the robot can understand, and elements in those structures must
be grounded in an uncertain environment. We present a system that follows
natural language directions by extracting a sequence of spatial description
clauses from the linguistic input and then inferring the most probable path
through the environment, given only information about the environmental
geometry
and detected visible objects. We use a probabilistic graphical model that
factors into three key components. The first component grounds landmark phrases
such as "the computers" in the perceptual frame of the robot by exploiting
co-occurrence statistics from a database of tagged images such as Flickr.
Second, a spatial reasoning component judges how well spatial relations such as
"past the computers" describe a path. Finally, verb phrases such as "turn
right" are modeled according to the amount of change in orientation in the
path. Our system follows 60% of the directions in our corpus to within 15
meters of the true destination, significantly outperforming other approaches.
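The three-factor decomposition can be written as a product of scores for one
spatial description clause. The co-occurrence table, geometric scorers, and
toy path below are assumptions for illustration, not the trained model from
the paper.

```python
import math

# Factor 1: landmark grounding via co-occurrence statistics (a toy table
# standing in for counts mined from tagged image collections like Flickr).
CO_OCCUR = {("computers", "office"): 0.8, ("computers", "kitchen"): 0.1}

def p_landmark(phrase, detected_tag):
    return CO_OCCUR.get((phrase, detected_tag), 0.05)

def p_relation(relation, path, landmark_xy):
    # Factor 2: how well a spatial relation such as "past" describes a path;
    # here, the closest approach of the path to the landmark.
    if relation == "past":
        return max(math.exp(-math.hypot(x - landmark_xy[0],
                                        y - landmark_xy[1]))
                   for x, y in path)
    return 0.5

def p_verb(verb, path):
    # Factor 3: "turn right" modeled by the change in heading along the path.
    if verb == "turn right":
        heading = lambda a, b: math.atan2(b[1] - a[1], b[0] - a[0])
        change = heading(path[-2], path[-1]) - heading(path[0], path[1])
        return math.exp(-abs(change + math.pi / 2))   # best near -90 degrees
    return 0.5

def clause_score(verb, relation, landmark, detected_tag, landmark_xy, path):
    # The factored model: the clause probability is the product of the
    # landmark, spatial-relation, and verb components.
    return (p_landmark(landmark, detected_tag)
            * p_relation(relation, path, landmark_xy)
            * p_verb(verb, path))

path = [(0, 0), (1, 0), (1, -1)]   # heads east, then turns right (south)
print(clause_score("turn right", "past", "computers", "office", (1, 0), path))
```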
[9]
Grounding spatial prepositions for video search
Doctoral spotlight oral session
/
Tellex, Stefanie
/
Roy, Deb
Proceedings of the 2009 International Conference on Multimodal Interfaces
2009-11-02
p.253-260
Keywords: spatial language, video retrieval
© Copyright 2009 ACM
Summary: Spatial language video retrieval is an important real-world problem that
forms a test bed for evaluating semantic structures for natural language
descriptions of motion on naturalistic data. Video search by natural language
query requires that linguistic input be converted into structures that operate
on video in order to find clips that match a query. This paper describes a
framework for grounding the meaning of spatial prepositions in video. We
present a library of features that can be used to automatically classify a
video clip based on whether it matches a natural language query. To evaluate
these features, we collected a corpus of natural language descriptions about
the motion of people in video clips. We characterize the language used in the
corpus, and use it to train and test models for the meanings of the spatial
prepositions "to," "across," "through," "out," "along," "towards," and
"around." The classifiers can be used to build a spatial language video
retrieval system that finds clips matching queries such as "across the
kitchen."
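In the spirit of the feature library described here, one can sketch a couple
of geometric features and a thresholded test for a single preposition; the
features, region representation, and thresholds are illustrative
assumptions rather than the paper's learned classifiers.

```python
import numpy as np

def across_features(track, region_min, region_max):
    """Two toy features for "across <region>": the fraction of track points
    inside an axis-aligned region, and how fully the track spans the region
    along its longer axis."""
    pts = np.asarray(track, dtype=float)
    lo, hi = np.asarray(region_min, float), np.asarray(region_max, float)
    frac_inside = np.all((pts >= lo) & (pts <= hi), axis=1).mean()
    axis = int(np.argmax(hi - lo))                  # region's longer axis
    span = np.ptp(pts[:, axis]) / (hi[axis] - lo[axis])
    return frac_inside, min(span, 1.0)

def matches_across(track, region_min, region_max, thresh=(0.5, 0.8)):
    # A real system would train a classifier on such features using the
    # annotated corpus; fixed thresholds keep the sketch self-contained.
    f = across_features(track, region_min, region_max)
    return f[0] >= thresh[0] and f[1] >= thresh[1]

# A track that crosses a 4x2 "kitchen" region end to end:
print(matches_across([(0.2, 1.0), (2.0, 1.1), (3.9, 0.9)], (0, 0), (4, 2)))
```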
[10]
Object schemas for responsive robotic language use
Technical papers
/
Hsiao, Kai-yuh
/
Vosoughi, Soroush
/
Tellex, Stefanie
/
Kubat, Rony
/
Roy, Deb
Proceedings of the 3rd ACM/IEEE International Conference on Human Robot
Interaction
2008-03-12
p.233-240
Keywords: affordances, behavior-based, language grounding, object schema, robot
© Copyright 2008 ACM
Summary: Natural language capabilities should be added to a robot system without
sacrificing its responsiveness to the environment. In this paper, we present a
robot that manipulates objects on a tabletop in response to verbal interaction.
Reactivity is maintained by using concurrent interaction processes, such as
visual trackers and collision detection processes. The interaction processes
and their associated data are organized into object schemas, each representing
a physical object in the environment, based on the target of each process. The
object schemas then serve as discrete structures of coordination between
reactivity, planning, and language use, permitting rapid integration of
information from multiple sources.
[11]
Spatial routines for a simulated speech-controlled vehicle
Assistive robotics
/
Tellex, Stefanie
/
Roy, Deb
Proceedings of the 1st ACM SIGCHI/SIGART Conference on Human-Robot
Interaction
2006-03-02
p.156-163
Keywords: language grounding, situated language processing, spatial language, spatial
routines, visual routines, wheelchair
© Copyright 2006 ACM
Summary: We have defined a lexicon of words in terms of spatial routines, and used
that lexicon to build a speech-controlled vehicle in a simulator. A spatial
routine is a script composed from a set of primitive operations on occupancy
grids, analogous to Ullman's visual routines. The vehicle understands the
meaning of context-dependent natural language commands such as "Go across the
room." When the system receives a command, it combines definitions from the
lexicon according to the parse structure of the command, creating a script that
selects a goal for the vehicle. Spatial routines may provide the basis for
interpreting spatial language in a broad range of physically situated language
understanding systems.
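A minimal sketch of a spatial routine as a script of primitive
occupancy-grid operations, for the command "Go across the room"; the
primitive set, the grid, and the lexicon entry are assumptions for
illustration, not the paper's actual definitions.

```python
from collections import deque
import numpy as np

def free(grid):
    """Primitive: the traversable (non-occupied) cells of an occupancy grid."""
    return grid == 0

def distance_from(mask, seed):
    """Primitive: breadth-first traversable distance from a seed cell."""
    d = np.full(mask.shape, np.inf)
    d[seed] = 0
    q = deque([seed])
    while q:
        r, c = q.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                    and mask[nr, nc] and d[nr, nc] == np.inf):
                d[nr, nc] = d[r, c] + 1
                q.append((nr, nc))
    return d

def across_the_room(grid, vehicle_cell):
    """Assumed lexicon entry for "go across the room": select the free cells
    farthest, in traversable distance, from the vehicle's current cell."""
    d = distance_from(free(grid), vehicle_cell)
    d[np.isinf(d)] = -1
    return np.argwhere(d == d.max())

room = np.zeros((5, 7), dtype=int)
room[2, 3] = 1                           # an obstacle mid-room
print(across_the_room(room, (2, 0)))     # goal cells along the far wall
```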
[12]
Probabilistic grounding of situated speech using plan recognition and
reference resolution
Semantics and dialog
/
Gorniak, Peter
/
Roy, Deb
Proceedings of the 2005 International Conference on Multimodal Interfaces
2005-10-04
p.138-143
Keywords: grounding, language, plan recognition, situated, speech, understanding
© Copyright 2005 ACM
Summary: Situated, spontaneous speech may be ambiguous along acoustic, lexical,
grammatical and semantic dimensions. To understand such a seemingly difficult
signal, we propose to model the ambiguity inherent in acoustic signals and in
lexical and grammatical choices using compact, probabilistic representations of
multiple hypotheses. To resolve semantic ambiguities we propose a situation
model that captures aspects of the physical context of an utterance as well as
the speaker's intentions, in our case represented by recognized plans. In a
single, coherent Framework for Understanding Situated Speech (FUSS) we show how
these two influences, acting on an ambiguous representation of the speech
signal, complement each other to disambiguate form and content of situated
speech. This method produces promising results in a game-playing environment
and leaves room for other types of situation models.
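How the two influences complement each other can be reduced to hypothesis
rescoring over a compact set of weighted alternatives. The hypotheses,
acoustic scores, and fixed prior table below are toy values; in FUSS the
situation model comes from the physical context and recognized plans.

```python
# Multiple weighted hypotheses from the recognizer (acoustic and lexical
# ambiguity), with toy scores:
speech_hypotheses = {
    "take the rook":  0.40,
    "take the brook": 0.35,
    "bake the rook":  0.25,
}

# Situation-model prior (toy): in the current game state, capturing a rook
# is a plausible recognized plan; the other readings are not.
situation_prior = {
    "take the rook":  0.70,
    "take the brook": 0.01,
    "bake the rook":  0.01,
}

# Combine the ambiguous speech representation with the situation model and
# renormalize; the situation disambiguates form and content together.
posterior = {h: a * situation_prior.get(h, 0.0)
             for h, a in speech_hypotheses.items()}
z = sum(posterior.values())
posterior = {h: p / z for h, p in posterior.items()}
print(max(posterior, key=posterior.get))   # -> "take the rook"
```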
[13]
Elvis: situated speech and gesture understanding for a robotic chandelier
Multimodal applications
/
Juster, Joshua
/
Roy, Deb
Proceedings of the 2004 International Conference on Multimodal Interfaces
2004-10-13
p.90-96
Keywords: gesture, grounded, input methods, lighting, multimodal, natural interaction,
situated, speech
© Copyright 2004 ACM
Summary: We describe a home lighting robot that uses directional spotlights to create
complex lighting scenes. The robot senses its visual environment using a
panoramic camera and attempts to maintain its target goal state by adjusting
the positions and intensities of its lights. Users can communicate desired
changes in the lighting environment through speech and gesture (e.g., "Make it
brighter over there"). Information obtained from these two modalities are
combined to form a goal, a desired change in the lighting of the scene. This
goal is then incorporated into the system's target goal state. When the target
goal state and the world are out of alignment, the system formulates a
sensorimotor plan that acts on the world to return the system to homeostasis.
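The goal-maintenance loop described here can be summarized in a few lines:
speech supplies a desired change, gesture supplies a location, and a
controller nudges the lights until the sensed scene matches the target. The
region names, fixed brightness delta, and proportional controller are
illustrative assumptions, not the robot's actual goal representation.

```python
def fuse(speech, gesture):
    """Combine modalities into a goal: a brightness change at a region."""
    delta = 0.2 if "brighter" in speech else -0.2
    return {"region": gesture["pointed_region"], "delta": delta}

def control_step(target, sensed, gain=0.5):
    """One homeostasis step: move each region's sensed intensity toward its
    target, as the sensorimotor plan acting on the world would."""
    return {r: sensed[r] + gain * (target[r] - sensed[r]) for r in target}

target = {"table": 0.5, "couch": 0.3}          # current target goal state
sensed = {"table": 0.5, "couch": 0.3}          # what the camera reports

goal = fuse("make it brighter over there", {"pointed_region": "couch"})
target[goal["region"]] += goal["delta"]        # fold the goal into the state

for _ in range(5):                             # world and target realign
    sensed = control_step(target, sensed)
print({r: round(v, 2) for r, v in sensed.items()})
```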
[14]
A self-paced approach to hypermedia design for patient education
Hypermedia documentation
/
Roy, Debopriyo
ACM 22nd International Conference on Computer Documentation
2004-10-10
p.27-32
© Copyright 2004 ACM
Summary: Traditional theories of multimedia design have given considerable weight
to the modality effect. This stress on the modality effect has often
de-emphasized what information architecture can do to control it when
information presentation is self-paced instead of system-paced. We consider
a patient education module as our case study and propose a conversational,
interactive patient education module that responds to individual reader
needs during hypermedia interaction. In this article, we take an initial
step towards this approach by testing patient education modules with and
without narration to support text and static graphics. Our results suggest
that modules with and without narration yield similar levels of reader
comprehension and accuracy. Readers showed a preference for using narration,
online text, and graphics depending on the individual task, provided the
system permits self-paced interaction. Thus, we argue that the modality
effect may be influenced by a self-paced system.
[15]
Augmenting user interfaces with adaptive speech commands
Speech and gaze
/
Gorniak, Peter
/
Roy, Deb
Proceedings of the 2003 International Conference on Multimodal Interfaces
2003-11-05
p.176-179
Keywords: machine learning, phoneme recognition, robust speech interfaces, user
modelling
© Copyright 2003 ACM
Summary: We present a system that augments any unmodified Java application with an
adaptive speech interface. The augmented system learns to associate spoken
words and utterances with interface actions such as button clicks. Speech
learning is constantly active and searches for correlations between what the
user says and does. Training the interface is seamlessly integrated with using
the interface. As the user performs normal actions, she may optionally verbally
describe what she is doing. By using a phoneme recognizer, the interface is
able to quickly learn new speech commands. Speech commands are chosen by the
user and can be recognized robustly due to accurate phonetic modelling of the
user's utterances and the small size of the vocabulary learned for a single
application. After only a few examples, speech commands can replace mouse
clicks. In effect, selected interface functions migrate from keyboard and mouse
to speech. We demonstrate the usefulness of this approach by augmenting jfig, a
drawing application, where speech commands save the user from the distraction
of having to use a tool palette.
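The learn-while-using behaviour can be sketched with a tiny associative
store keyed by phoneme strings. The phoneme transcriptions, the
string-similarity matcher, and the confidence threshold below are
illustrative assumptions; the real system works with a phoneme recognizer
and acoustic-level models.

```python
from collections import defaultdict
import difflib

class SpeechAdapter:
    """Tiny associative learner: pair what the user says (as a phoneme
    string) with the interface action it co-occurs with, then trigger the
    best-matching learned action for new utterances."""

    def __init__(self, threshold=0.75):
        self.examples = defaultdict(list)    # action -> phoneme strings
        self.threshold = threshold

    def observe(self, phonemes, action):
        """Called whenever an utterance co-occurs with a UI action."""
        self.examples[action].append(phonemes)

    def recognize(self, phonemes):
        """Return the learned action for this utterance, or None."""
        best_action, best_score = None, 0.0
        for action, utterances in self.examples.items():
            for u in utterances:
                score = difflib.SequenceMatcher(None, u, phonemes).ratio()
                if score > best_score:
                    best_action, best_score = action, score
        return best_action if best_score >= self.threshold else None

ui = SpeechAdapter()
ui.observe("d r ao s k w eh r", "draw_square")   # said while clicking the tool
ui.observe("d r ao s k w ah r", "draw_square")
print(ui.recognize("d r ao s k w eh r"))         # -> draw_square
```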
[16]
A visually grounded natural language interface for reference to spatial
scenes
Posters
/
Gorniak, Peter
/
Roy, Deb
Proceedings of the 2003 International Conference on Multimodal Interfaces
2003-11-05
p.219-226
Keywords: cognitive modelling, computational semantics, natural language
understanding, vision based semantics
© Copyright 2003 ACM
Summary: Many user interfaces, from graphic design programs to navigation aids in
cars, share a virtual space with the user. Such applications are often ideal
candidates for speech interfaces that allow the user to refer to objects in the
shared space. We present an analysis of how people describe objects in spatial
scenes using natural language. Based on this study, we describe a system that
uses synthetic vision to "see" such scenes from the person's point of view, and
that understands complex natural language descriptions referring to objects in
the scenes. This system is based on a rich notion of semantic compositionality
embedded in a grounded language understanding framework. We describe its
semantic elements, their compositional behaviour, and their grounding through
the synthetic vision system. To conclude, we evaluate the performance of the
system on unconstrained input.
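The notion of semantic compositionality here can be illustrated by treating
each word as a filter over scene objects and composing the filters. The toy
scene, lexicon, and purely x-coordinate "left of" test are assumptions; the
actual system grounds these meanings through its synthetic vision component.

```python
# Toy scene: objects with grounded perceptual attributes.
scene = [
    {"id": 1, "color": "green", "shape": "cone", "x": 1.0},
    {"id": 2, "color": "green", "shape": "cone", "x": 4.0},
    {"id": 3, "color": "red",   "shape": "cube", "x": 3.0},
]

def word_filter(word):
    """Each lexical item denotes a filter over candidate objects."""
    if word in ("green", "red"):
        return lambda objs: [o for o in objs if o["color"] == word]
    if word in ("cone", "cube"):
        return lambda objs: [o for o in objs if o["shape"] == word]
    raise KeyError(word)

def left_of(objs, landmarks):
    """A relational term composes two object sets (toy geometric test)."""
    return [o for o in objs if any(o["x"] < l["x"] for l in landmarks)]

# "the green cone to the left of the cube": composition narrows candidates.
candidates = word_filter("cone")(word_filter("green")(scene))
landmarks = word_filter("cube")(scene)
print([o["id"] for o in left_of(candidates, landmarks)])   # -> [1]
```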
[17]
Towards Visually-Grounded Spoken Language Acquisition
/
Roy, Deb
Proceedings of the 2002 International Conference on Multimodal Interfaces
2002-10-14
p.105
© Copyright 2002 IEEE Computer Society
Summary: A characteristic shared by most approaches to natural language understanding
and generation is the use of symbolic representations of word and sentence
meanings. Frames and semantic nets are examples of symbolic representations.
Symbolic methods are inappropriate for applications which require natural
language semantics to be linked to perception, as is the case in tasks such as
scene description or human-robot interaction. This paper presents two
implemented systems, one that learns to generate, and one that learns to
understand visually-grounded spoken language. These implementations are part of
our ongoing effort to develop a comprehensive model of perceptually-grounded
semantics.
[18]
Medical Device Requirements: A View from Canada
4: MULTIPLE-SESSION SYMPOSIA: Global Challenges in Science, Technology,
Design, and Regulation [Single-Session Symposium]
/
Roy, Denis
Proceedings of the Joint IEA 14th Triennial Congress and Human Factors and
Ergonomics Society 44th Annual Meeting
2000-07-30
v.44
n.4
p.530-532
© Copyright 2000 HFES
Summary: This paper provides an overview of the Canadian Medical Device Regulations
and focuses on those requirements that may impact human factors and
ergonomics issues. Additionally, the paper provides examples of specific
incidents with medical devices that occurred in Canada where ergonomic and
human factor issues were directly responsible for the problem.
[19]
Perceptual Intelligence: learning gestures and words for individualized,
adaptive interfaces
/
Pentland, A.
/
Roy, D.
/
Wren, C.
Proceedings of the Eighth International Conference on Human-Computer
Interaction
1999-08-22
v.1
p.286-290
© Copyright 1999 Lawrence Erlbaum Associates
[20]
Stimulating Research into Gestural Human Machine Interaction
Round Table
/
Panayi, Marilyn
/
Roy, David M.
/
Richardson, James
GW 1999: Gesture Workshop
1999-03-17
p.317-331
© Copyright 1999 Springer-Verlag
Summary: This is the summary report of the roundtable session held at the end of
Gesture Workshop '99. This first roundtable aimed to act as a forum for
discussing issues and concerns relating to the achievements, future
development, and potential of the field of gestural and sign-language-based
human-computer interaction.
[21]
A Phoneme Probability Display for Individuals with Hearing Disabilities
/
Roy, Deb
/
Pentland, Alex
Third Annual ACM SIGACCESS Conference on Assistive Technologies
1998-04-15
p.165-168
© Copyright 1998 ACM
Summary: We are building an aid for individuals with hearing impairments which
converts continuous speech into an animated visual display. A speech analysis
system continuously estimates phoneme probabilities from the input acoustic
stream. Phoneme symbols are displayed graphically, with brightness
proportional to the estimated phoneme probabilities. An automated layout
algorithm designs the display so that acoustically confusable phonemes are
grouped together.
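The brightness-proportional rendering can be mocked up in a few lines: a
terminal-shade sketch in which the phoneme set, the probabilities, and the
character "brightness" ramp are stand-ins for the real speech analyzer and
the animated graphical display.

```python
SHADES = " .:-=+*#%@"   # ten brightness levels, dim to bright

def render_frame(phoneme_probs):
    """Render one analysis frame: each phoneme symbol's brightness is
    proportional to its estimated probability."""
    cells = []
    for phoneme, p in sorted(phoneme_probs.items()):
        level = min(int(p * len(SHADES)), len(SHADES) - 1)
        cells.append(f"{phoneme}[{SHADES[level]}]")
    return " ".join(cells)

# One frame: "s" is most likely; "sh" and "z" are acoustically confusable
# with it and would be placed adjacently by the layout algorithm.
print(render_frame({"s": 0.7, "sh": 0.2, "z": 0.08, "m": 0.02}))
```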
[22]
NewsComm: A Hand-Held Interface for Interactive Access to Structured Audio
PAPERS: News and Mail
/
Roy, Deb K.
/
Schmandt, Chris
Proceedings of ACM CHI 96 Conference on Human Factors in Computing Systems
1996-04-14
v.1
p.173-180
Keywords: Audio interfaces, Hand-held computers, Structured audio
© Copyright 1996 ACM
Summary: The NewsComm system delivers personalized news and other program material as
audio to mobile users through a hand-held playback device. This paper focuses
on the iterative design and user testing of the hand-held interface. The
interface was first designed and tested in a software-only environment and then
ported to a custom hardware platform. The hand-held device enables navigation
through audio recordings based on structural information which is extracted
from the audio using digital signal processing techniques. The design
addresses the problems of building a hand-held, primarily non-visual
interface for accessing large collections of structured audio recordings.
[23]
Gestural Human-Machine Interaction for People with Severe Speech and Motor
Impairment Due to Cerebral Palsy
SHORT PAPERS: Enhancing Interaction
/
Roy, David M.
/
Panayi, Marilyn
/
Erenshteyn, Roman
/
Foulds, Richard
/
Fawcus, Robert
Proceedings of ACM CHI'94 Conference on Human Factors in Computing Systems
1994-04-24
v.2
p.313-314
Keywords: Gesture recognition, Disability, Cerebral palsy, Performance art,
Electromyogram, EMG, Artificial neural networks
© Copyright 1994 Association for Computing Machinery
Summary: The objective of the research is to develop a new method of human-machine
interaction that reflects and harnesses the abilities of people with severe
speech and motor impairment due to cerebral palsy (SSMICP). Human-human
interaction within the framework of drama and mime was used to elicit 120
gestures from twelve students with SSMICP. Twenty-seven dynamic arm gestures
were monitored using biomechanical and bioelectric sensors. Neural networks
are
being used to analyze the data and to realize the gestural human-machine
interface. Preliminary results show that two visually similar gestures can be
differentiated by neural networks.