HCI Bibliography : Search Results
Database updated: 2016-05-10 Searches since 2006-12-01: 32,346,990
Hosted by ACM SIGCHI
The HCI Bibliography was moved to a new server 2015-05-12 and again 2016-01-05, substantially degrading the environment for making updates.
There are no plans to add to the database.
Please send questions or comments to director@hcibib.org.
Query: Sarma_A* Results: 19 Sorted by: Date
Foraging Among an Overabundance of Similar Variants End-User Programming / Ragavan, Sruti Srinivasa / Kuttal, Sandeep Kaur / Hill, Charles / Sarma, Anita / Piorkowski, David / Burnett, Margaret Proceedings of the ACM CHI'16 Conference on Human Factors in Computing Systems 2016-05-07 v.1 p.3509-3521
ACM Digital Library Link
Summary: Foraging among too many variants of the same artifact can be problematic when many of these variants are similar. This situation, which is largely overlooked in the literature, is commonplace in several types of creative tasks, one of which is exploratory programming. In this paper, we investigate how novice programmers forage through similar variants. Based on our results, we propose a refinement to Information Foraging Theory (IFT) to include constructs about variation foraging behavior, and propose refinements to computational models of IFT to better account for foraging among variants.

Perceptions of answer quality in an online technical question and answer forum Short Papers / Hart, Kerry / Sarma, Anita Proceedings of the 2014 International Workshop on Cooperative and Human Aspects of Software Engineering 2014-06-02 p.103-106
ACM Digital Library Link
Summary: Software developers are used to seeking information from authoritative texts, such as technical manuals, or from experts with whom they are familiar. Increasingly, developers seek information in online question and answer forums, where the quality of the information is variable. To a novice, it may be challenging to filter good information from bad. Stack Overflow is a Q&A forum that introduces a social reputation element: users rate the quality of posted answers, and answerers can accrue points and rewards for writing answers that are rated highly by their peers. A user who consistently authors good answers will develop a good 'reputation' as recorded by these points. While this system was designed with the intent to incentivize high-quality answers, it has been suggested that information seekers -- and particularly technical novices -- may rely on the social reputation of the answerer as a proxy for answer quality. In this paper, we investigate the role that this social factor -- as well as other answer characteristics -- plays in the information filtering process of technical novices in the context of Stack Overflow. The results of our survey conducted on Amazon.com's Mechanical Turk indicate that technical novices assess information quality based on the intrinsic qualities of the answer, such as presentation and content, suggesting that novices are wary to rely on social cues in the Q&A context.

What makes an image popular? Content quality & popularity / Khosla, Aditya / Sarma, Atish Das / Hamid, Raffay Proceedings of the 2014 International Conference on the World Wide Web 2014-04-07 v.1 p.867-876
ACM Digital Library Link
Summary: Hundreds of thousands of photographs are uploaded to the internet every minute through various social networking and photo sharing platforms. While some images get millions of views, others are completely ignored. Even from the same users, different photographs receive different numbers of views. This begs the question: What makes a photograph popular? Can we predict the number of views a photograph will receive even before it is uploaded? These are some of the questions we address in this work. We investigate two key components of an image that affect its popularity, namely the image content and social context. Using a dataset of about 2.3 million images from Flickr, we demonstrate that we can reliably predict the normalized view count of images with a rank correlation of 0.81 using both image content and social cues. In this paper, we show the importance of image cues such as color, gradients, deep learning features and the set of objects present, as well as the importance of various social cues such as number of friends or number of photos uploaded that lead to high or low popularity of images.
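The rank correlation of 0.81 reported above measures how well the predicted ordering of images matches the actual ordering by view count. A minimal sketch of Spearman rank correlation, using hypothetical toy data rather than the authors' Flickr pipeline:

```python
# Spearman rank correlation: Pearson correlation of the ranks of two variables.
def ranks(xs):
    # rank 0 for the smallest value, n-1 for the largest (no tie handling in this sketch)
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(xs, ys):
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mean = (n - 1) / 2  # mean of ranks 0..n-1
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)  # rank variance is identical for both when tie-free
    return cov / var

# hypothetical example: predicted popularity scores vs. actual view counts
predicted = [0.9, 0.1, 0.5, 0.7]
actual = [1200, 30, 400, 800]
print(spearman(predicted, actual))  # 1.0: the predicted ordering matches exactly
```

A value of 0.81 thus means the predicted ranking largely, but not perfectly, preserves the true popularity ordering.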

E-commerce product search: personalization, diversification, and beyond WWW 2014 tutorials / Sarma, Atish Das / Parikh, Nish / Sundaresan, Neel Companion Proceedings of the 2014 International Conference on the World Wide Web 2014-04-07 v.2 p.189-190
ACM Digital Library Link
Summary: The focus of this tutorial is e-commerce product search. Several challenges appear in this context, both from a research standpoint as well as an application standpoint. We present various approaches adopted in the industry, review well-known research techniques developed over the last decade, draw parallels to traditional web search highlighting the new challenges in this setting, and dig deep into some of the algorithmic and technical approaches developed. One specific approach considered here, which advances theoretical techniques and illustrates practical impact, is identifying the best-suited results quickly from a large database. Settings span cold start users and advanced users for whom personalization is possible. In this context, top-k and skylines are discussed as they form a key approach that spans the web, data mining, and database communities. These present powerful tools for search across multi-dimensional items with clear preferences within each attribute, like product search as opposed to regular web search.
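The skyline queries mentioned above return the items not dominated by any other item across all attributes. A minimal sketch over hypothetical (price, rating) tuples, where lower price and higher rating are preferred (not the tutorial's algorithms):

```python
def dominates(a, b):
    # a dominates b if a is no worse in both dimensions and they differ
    # (lower price preferred in dimension 0, higher rating in dimension 1)
    return a[0] <= b[0] and a[1] >= b[1] and a != b

def skyline(items):
    # keep every item that no other item dominates (O(n^2) brute force)
    return [x for x in items if not any(dominates(y, x) for y in items)]

# hypothetical products as (price, rating)
products = [(10, 4.5), (12, 4.8), (9, 4.0), (15, 4.2)]
print(skyline(products))  # (15, 4.2) is dominated by (12, 4.8); the rest survive
```

Each surviving item represents a distinct price/quality trade-off, which is why skylines suit product search with clear per-attribute preferences.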

The "expression gap": do you like what you share? WWW 2014 posters / Sarma, Atish Das / Si, Si / Churchill, Elizabeth F. / Sundaresan, Neel Companion Proceedings of the 2014 International Conference on the World Wide Web 2014-04-07 v.2 p.247-248
ACM Digital Library Link
Summary: While recommendation profiles increasingly leverage social actions such as "shares", the predictive significance of such actions is unclear. To what extent do public shares correlate with other online behaviors such as searches, views and purchases? Based on an analysis of 950,000 users' behavioral, transactional, and social sharing data on a global online commerce platform, we show that social "shares", or publicly posted expressions of interest, do not correlate with non-public behaviors such as views and purchases. A key takeaway is that there is a "gap" between public and non-public actions online, suggesting that marketers and advertisers need to be cautious in their estimation of the significance of social sharing.

Beyond modeling private actions: predicting social shares WWW 2014 posters / Si, Si / Sarma, Atish Das / Churchill, Elizabeth F. / Sundaresan, Neel Companion Proceedings of the 2014 International Conference on the World Wide Web 2014-04-07 v.2 p.377-378
ACM Digital Library Link
Summary: We study the problem of predicting sharing behavior from e-commerce sites to friends on social networks via share widgets. The contextual variation in an action that is private (like rating a movie on Netflix), to one shared with friends online (like sharing an item on Facebook), to one that is completely public (like commenting on a YouTube video) introduces behavioral differences that pose interesting challenges. In this paper, we show that users' interests manifest in actions that spill across different types of channels such as sharing, browsing, and purchasing. This motivates leveraging all such signals available from the e-commerce platform. We show that carefully incorporating signals from these interactions significantly improves share prediction accuracy.

On the benefits of providing versioning support for end users: An empirical study / Kuttal, Sandeep K. / Sarma, Anita / Rothermel, Gregg ACM Transactions on Computer-Human Interaction 2014-02 v.21 n.2 p.9
ACM Digital Library Link
Summary: End users with little formal programming background are creating software in many different forms, including spreadsheets, web macros, and web mashups. Web mashups are particularly popular because they are relatively easy to create, and because many programming environments that support their creation are available. These programming environments, however, provide no support for tracking versions or provenance of mashups. We believe that versioning support can help end users create, understand, and debug mashups. To investigate this belief, we have added versioning support to a popular wire-oriented mashup environment, Yahoo! Pipes. Our enhanced environment, which we call "Pipes Plumber," automatically retains versions of pipes and provides an interface with which pipe programmers can browse histories of pipes and retrieve specific versions. We have conducted two studies of this environment: an exploratory study and a larger controlled experiment. Our results provide evidence that versioning helps pipe programmers create and debug mashups. Subsequent qualitative results provide further insights into the barriers faced by pipe programmers, the support for reuse provided by our approach, and the support for debugging provided.

Optimal hashing schemes for entity matching Research papers / Dalvi, Nilesh / Rastogi, Vibhor / Dasgupta, Anirban / Sarma, Anish Das / Sarlos, Tamas Proceedings of the 2013 International Conference on the World Wide Web 2013-05-13 v.1 p.295-306
ACM Digital Library Link
Summary: In this paper, we consider the problem of devising blocking schemes for entity matching. There is a lot of work on blocking techniques for supporting various kinds of predicates, e.g. exact matches, fuzzy string-similarity matches, and spatial matches. However, given a complex entity matching function in the form of a Boolean expression over several such predicates, we show that it is an important and non-trivial problem to combine the individual blocking techniques into an efficient blocking scheme for the entity matching function, a problem that has not been studied previously.
    In this paper, we make fundamental contributions to this problem. We consider an abstraction for modeling complex entity matching functions as well as blocking schemes. We present several results of theoretical and practical interest for the problem. We show that in general, the problem of computing the optimal blocking strategy is NP-hard in the size of the DNF formula describing the matching function. We also present several algorithms for computing the exact optimal strategies (with exponential complexity, but often feasible in practice) as well as fast approximation algorithms. We experimentally demonstrate over commercially used rule-based matching systems over real datasets at Yahoo!, as well as synthetic datasets, that our blocking strategies can be an order of magnitude faster than the baseline methods, and our algorithms can efficiently find good blocking strategies.
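The core idea of blocking, as discussed above, is to avoid comparing every pair of records by first grouping records under a cheap hash key and comparing pairs only within a group. A minimal sketch with hypothetical (name, zip) records, not the paper's DNF-driven blocking schemes:

```python
from collections import defaultdict
from itertools import combinations

# Blocking sketch: group records by a cheap hash key so that only records
# sharing a block are compared pairwise. This trades recall (true matches
# split across blocks are missed) for far fewer comparisons.
def block_by(records, key):
    blocks = defaultdict(list)
    for rec in records:
        blocks[key(rec)].append(rec)
    return blocks

def candidate_pairs(records, key):
    for block in block_by(records, key).values():
        yield from combinations(block, 2)

# hypothetical records: (name, zip code), blocked on zip code
records = [("Ann Smith", "68588"), ("A. Smith", "68588"), ("Bob Jones", "10001")]
pairs = list(candidate_pairs(records, key=lambda r: r[1]))
# 1 candidate pair instead of 3 exhaustive pairs
```

The paper's contribution is choosing how to combine such per-predicate blocking keys optimally for a complex matching function, which this sketch does not attempt.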

Debugging support for end user mashup programming Papers: novel programming / Kuttal, Sandeep Kaur / Sarma, Anita / Rothermel, Gregg Proceedings of ACM CHI 2013 Conference on Human Factors in Computing Systems 2013-04-27 v.1 p.1609-1618
ACM Digital Library Link
Summary: Programming for the web can be an intimidating task, particularly for non-professional ("end-user") programmers. Mashup programming environments attempt to remedy this by providing support for such programming. It is well known, however, that mashup programmers create applications that contain bugs. Furthermore, mashup programmers learn from examples and reuse other mashups, which causes bugs to propagate to other mashups. In this paper we classify the bugs that occur in a large corpus of Yahoo! Pipes mashups. We describe support we have implemented in the Yahoo! Pipes environment to provide automatic error detection techniques that help mashup programmers localize and correct these bugs. We present the results of a think-aloud study comparing the experiences of end-user mashup programmers using and not using our support. Our results show that our debugging enhancements do help these programmers localize and correct bugs more effectively and efficiently.

Dynamic covering for recommendation systems KM track: recommender systems / Antonellis, Ioannis / Sarma, Anish Das / Dughmi, Shaddin Proceedings of the 2012 ACM Conference on Information and Knowledge Management 2012-10-29 p.26-34
ACM Digital Library Link
Summary: In this paper, we identify a fundamental algorithmic problem that we term succinct dynamic covering (SDC), arising in many modern-day web applications, including ad-serving and online recommendation systems such as in eBay, Netflix, and Amazon. Roughly speaking, SDC applies two restrictions to the well-studied Max-Coverage problem [14]: Given an integer k, X = {1, 2, ..., n} and I = {S_1, ..., S_m}, S_i ⊆ X, find J ⊆ I such that |J| < k and |∪_{S ∈ J} S| is as large as possible. The two restrictions applied by SDC are: (1) Dynamic: At query time, we are given a query Q ⊆ X, and our goal is to find J such that |Q ∩ (∪_{S ∈ J} S)| is as large as possible; (2) Space-constrained: We don't have enough space to store (and process) the entire input; specifically, we have o(mn), and maybe as little as O((m+n) polylog(mn)), space. A solution to SDC maintains a small data structure, and uses this data structure to answer most dynamic queries with high accuracy. We call such a scheme a Coverage Oracle.
    We present algorithms and complexity results for coverage oracles. We present deterministic and probabilistic near-tight upper and lower bounds on the approximation ratio of SDC as a function of the amount of space available to the oracle. Our lower bound results show that to obtain constant-factor approximations we need Omega(mn) space. Fortunately, our upper bounds present an explicit tradeoff between space and approximation ratio, allowing us to determine the amount of space needed to guarantee certain accuracy.
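For reference, the classical Max-Coverage problem that SDC restricts admits a simple greedy (1 - 1/e)-approximation: repeatedly pick the set covering the most still-uncovered elements. A sketch of that baseline (not the paper's coverage-oracle construction):

```python
# Greedy Max-Coverage: choose up to k sets, each time taking the set with the
# largest marginal gain in newly covered elements. Yields a (1 - 1/e)-approximation.
def greedy_max_coverage(sets, k):
    covered, chosen = set(), []
    for _ in range(k):
        best = max(sets, key=lambda s: len(s - covered))
        gain = len(best - covered)
        if gain == 0:  # nothing new can be covered; stop early
            break
        chosen.append(best)
        covered |= best
    return chosen, covered

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}]
chosen, covered = greedy_max_coverage(sets, 2)
# picks {1, 2, 3} then {4, 5, 6}, covering all of {1, ..., 6}
```

SDC's difficulty lies in answering such queries restricted to an arbitrary Q at query time without storing the full input, which this offline sketch does not address.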

An automatic blocking mechanism for large-scale de-duplication tasks DB track: web data management / Sarma, Anish Das / Jain, Ankur / Machanavajjhala, Ashwin / Bohannon, Philip Proceedings of the 2012 ACM Conference on Information and Knowledge Management 2012-10-29 p.1055-1064
ACM Digital Library Link
Summary: De-duplication -- identification of distinct records referring to the same real-world entity -- is a well-known challenge in data integration. Since very large datasets prohibit the comparison of every pair of records, blocking has been identified as a technique of dividing the dataset for pairwise comparisons, thereby trading off recall of identified duplicates for efficiency. Traditional de-duplication tasks, while challenging, typically involved a fixed schema such as Census data or medical records. However, with the presence of large, diverse sets of structured data on the web and the need to organize it effectively on content portals, de-duplication systems need to scale in a new dimension to handle a large number of schemas, tasks and data sets, while handling ever larger problem sizes. In addition, when working in a map-reduce framework it is important that canopy formation be implemented as a hash function, making the canopy design problem more challenging. We present CBLOCK, a system that addresses these challenges.
CBLOCK learns hash functions automatically from attribute domains and a labeled dataset consisting of duplicates. Subsequently, CBLOCK expresses blocking functions using a hierarchical tree structure composed of atomic hash functions. The application may guide the automated blocking process based on architectural constraints, such as by specifying a maximum size of each block (based on memory requirements), imposing disjointness of blocks (in a grid environment), or specifying a particular objective function trading off recall for efficiency. As a post-processing step to automatically generated blocks, CBLOCK rolls up smaller blocks to increase recall. We present experimental results on two large-scale de-duplication datasets from a commercial search engine -- consisting of over 140K movies and 40K restaurants respectively -- and demonstrate the utility of CBLOCK.

Your two weeks of fame and your grandmother's Web mining / Cook, James / Sarma, Atish Das / Fabrikant, Alex / Tomkins, Andrew Proceedings of the 2012 International Conference on the World Wide Web 2012-04-16 v.1 p.919-928
ACM Digital Library Link
Summary: Did celebrity last longer in 1929, 1992 or 2009? We investigate the phenomenon of fame by mining a collection of news articles that spans the twentieth century, and also perform a side study on a collection of blog posts from the last 10 years. By analyzing mentions of personal names, we measure each person's time in the spotlight, and watch the distribution change from a century ago to a year ago. We expected to find a trend of decreasing durations of fame as news cycles accelerated and attention spans became shorter. Instead, we find a remarkable consistency through most of the period we study. Through a century of rapid technological and societal change, through the appearance of Twitter, communication satellites and the Internet, we do not observe a significant change in typical duration of celebrity. We also study the most famous of the famous, and find different results depending on our method for measuring duration of fame. With a method that may be thought of as measuring a spike of attention around a single narrow news story, we see the same result as before: stories last as long now as they did in 1930. A second method, which may be thought of as measuring the duration of public interest in a person, indicates that famous people's presence in the news is becoming longer rather than shorter, an effect most likely driven by the wider distribution and higher volume of media in modern times. Similar studies have been done with much shorter timescales, specifically in the context of information spreading on Twitter and similar social networking sites. However, to the best of our knowledge, this is the first massive-scale study of this nature that spans over a century of archived data, thereby allowing us to track changes across decades.

EDITED BOOK Search Computing: Broadening Web Search Lecture Notes in Computer Science 7538 / Ceri, Stefano / Brambilla, Marco 2012 n.16 p.254 Springer Berlin Heidelberg
DOI: 10.1007/978-3-642-34213-4
ISBN: 978-3-642-34212-7 (print), 978-3-642-34213-4 (online)
Link to Digital Content at Springer
== Extraction and Integration ==
Web Data Reconciliation: Models and Experiences (1-15)
	+ Blanco, Lorenzo
	+ Crescenzi, Valter
	+ Merialdo, Paolo
	+ Papotti, Paolo
A Domain Independent Framework for Extracting Linked Semantic Data from Tables (16-33)
	+ Mulwad, Varish
	+ Finin, Tim
	+ Joshi, Anupam
Knowledge Extraction from Structured Sources (34-52)
	+ Unbehauen, Jörg
	+ Hellmann, Sebastian
	+ Auer, Sören
	+ Stadler, Claus
Extracting Information from Google Fusion Tables (53-67)
	+ Brambilla, Marco
	+ Ceri, Stefano
	+ Cinefra, Nicola
	+ Sarma, Anish Das
	+ Forghieri, Fabio
	+ et al
Materialization of Web Data Sources (68-81)
	+ Bozzon, Alessandro
	+ Ceri, Stefano
	+ Zagorac, Srdan
== Query and Visualization Paradigms ==
Natural Language Interfaces to Data Services (82-97)
	+ Guerrisi, Vincenzo
	+ Torre, Pietro La
	+ Quarteroni, Silvia
Mobile Multi-domain Search over Structured Web Data (98-110)
	+ Aral, Atakan
	+ Akin, Ilker Zafer
	+ Brambilla, Marco
Clustering and Labeling of Multi-dimensional Mixed Structured Data (111-126)
	+ Brambilla, Marco
	+ Zanoni, Massimiliano
Visualizing Search Results: Engineering Visual Patterns Development for the Web (127-142)
	+ Morales-Chaparro, Rober
	+ Preciado, Juan Carlos
	+ Sánchez-Figueroa, Fernando
== Exploring Linked Data ==
Extending SPARQL Algebra to Support Efficient Evaluation of Top-K SPARQL Queries (143-156)
	+ Bozzon, Alessandro
	+ Valle, Emanuele Della
	+ Magliacane, Sara
Thematic Clustering and Exploration of Linked Data (157-175)
	+ Castano, Silvana
	+ Ferrara, Alfio
	+ Montanelli, Stefano
Support for Reusable Explorations of Linked Data in the Semantic Web (176-190)
	+ Cohen, Marcelo
	+ Schwabe, Daniel
== Games, Social Search and Economics ==
A Survey on Proximity Measures for Social Networks (191-206)
	+ Cohen, Sara
	+ Kimelfeld, Benny
	+ Koutrika, Georgia
Extending Search to Crowds: A Model-Driven Approach (207-222)
	+ Bozzon, Alessandro
	+ Brambilla, Marco
	+ Ceri, Stefano
	+ Mauri, Andrea
BetterRelations: Collecting Association Strengths for Linked Data Triples with a Game (223-239)
	+ Hees, Jörn
	+ Roth-Berghofer, Thomas
	+ Biedert, Ralf
	+ Adrian, Benjamin
	+ Dengel, Andreas
An Incentive-Compatible Revenue-Sharing Mechanism for the Economic Sustainability of Multi-domain Search Based on Advertising (240-254)
	+ Brambilla, Marco
	+ Ceppi, Sofia
	+ Gatti, Nicola
	+ Gerding, Enrico H.

Building a generic debugger for information extraction pipelines Poster session: knowledge management / Sarma, Anish Das / Jain, Alpa / Bohannon, Philip Proceedings of the 2011 ACM Conference on Information and Knowledge Management 2011-10-24 p.2229-2232
ACM Digital Library Link
Summary: Complex information extraction (IE) pipelines are becoming an integral component of most text processing frameworks. We introduce a first system to help IE users analyze extraction pipeline semantics and operator transformations interactively while debugging. This allows the effort to be proportional to the need, and to focus on the portions of the pipeline under the greatest suspicion. We present a generic debugger for running post-execution analysis of any IE pipeline consisting of arbitrary types of operators. For this, we propose an effective provenance model for IE pipelines which captures a variety of operator types, ranging from those for which full to no specifications are available. We have evaluated our proposed algorithms and provenance model on large-scale real-world extraction pipelines.

STCML: an extensible XML-based language for socio-technical modeling Short papers / Georgas, John C. / Sarma, Anita Proceedings of the 2011 International Workshop on Cooperative and Human Aspects of Software Engineering 2011-05-21 p.61-64
ACM Digital Library Link
Summary: Understanding the complex dependencies between the technical artifacts of software engineering and the social processes involved in their development has the potential to improve the processes we use to engineer software as well as the eventual quality of the systems we produce. A foundational capability in grounding this study of socio-technical concerns is the ability to explicitly model technical and social artifacts as well as the dependencies between them. This paper presents the STCML language, intended to support the modeling of core socio-technical aspects in software development in a highly extensible fashion. We present the basic structure of the language, discuss important language design principles, and offer an example of its application.

Which bug should I fix: helping new developers onboard a new project Short papers / Wang, Jianguo / Sarma, Anita Proceedings of the 2011 International Workshop on Cooperative and Human Aspects of Software Engineering 2011-05-21 p.76-79
ACM Digital Library Link
Summary: A typical entry point for new developers in an open source project is to contribute a bug fix. However, finding an appropriate bug and an appropriate fix for that bug requires a good understanding of the project, which is nontrivial. Here, we extend Tesseract -- an interactive project exploration environment -- to allow new developers to search over bug descriptions in a project to quickly identify and explore bugs of interest and their related resources. More specifically, we extended Tesseract with search capabilities that enable synonyms and similar-bugs search over bug descriptions in a bug repository. The goal is to enable users to identify bugs of interest, resources related to that bug, (e.g., related files, contributing developers, communication records), and visually explore the appropriate socio-technical dependencies for the selected bug in an interactive manner. Here we present our search extension to Tesseract.

Coordination in innovative design and engineering: observations from a lunar robotics project Designing for collaboration II / Dabbish, Laura A. / Wagstrom, Patrick / Sarma, Anita / Herbsleb, James D. GROUP'10: International Conference on Supporting Group Work 2010-11-06 p.225-234
ACM Digital Library Link
Summary: Coordinating activities across groups in systems engineering or product development projects is critical to project success, but substantially more difficult when the work is innovative and dynamic. It is not clear how technology should best support cross-group collaboration on these types of projects. Recent work on coordination in dynamic settings has identified cross-boundary knowledge exchange as a critical mechanism for aligning activities. In order to inform the design of collaboration technology for creative work settings, we examined the nature of cross-group knowledge exchange in an innovative engineering research project developing a lunar rover robot as part of the Google Lunar X-Prize competition. Our study extends the understanding of communication and coordination in creative design work, and contributes to theory on coordination. We introduce four types of cross-team knowledge exchange mechanisms we observed on this project and discuss challenges associated with each. We consider implications for the design of collaboration technology to support cross-team knowledge exchange in dynamic, creative work environments.

Continuous coordination within the context of cooperative and human aspects of software engineering / Al-Ani, Ban / Trainer, Erik / Ripley, Roger / Sarma, Anita / van der Hoek, André / Redmiles, David Proceedings of the 2008 International Workshop on Cooperative and Human Aspects of Software Engineering 2008-05-13 p.1-4
ACM Digital Library Link
Summary: We have developed software tools that aim to support the cooperative software engineering tasks and promote an awareness of social dependencies that is essential to successful coordination. The tools share common characteristics that can be traced back to the principles of the Continuous Coordination (CC) paradigm. However, the development of each sprang from carrying out a different set of activities during its development process. In this paper, we outline the principles of the CC paradigm, the tools that implement these principles and focus on the social aspects of software engineering. Finally, we discuss the socio-technical and human-centered processes we adopted to develop these tools. Our conclusion is that the cooperative dimension of our tools represents the cooperation between researchers, subjects, and field sites. Our conclusion suggests that the development processes adopted to develop similar tools need to reflect this cooperative dimension.

Detecting near-duplicates for web crawling Similarity search / Manku, Gurmeet Singh / Jain, Arvind / Sarma, Anish Das Proceedings of the 2007 International Conference on the World Wide Web 2007-05-08 p.141-150
ACM Digital Library Link
Summary: Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrelevant for web search. So the quality of a web crawler increases if it can assess whether a newly crawled web page is a near-duplicate of a previously crawled web page or not. In the course of developing a near-duplicate detection system for a multi-billion page repository, we make two research contributions. First, we demonstrate that Charikar's fingerprinting technique is appropriate for this goal. Second, we present an algorithmic technique for identifying existing f-bit fingerprints that differ from a given fingerprint in at most k bit-positions, for small k. Our technique is useful for both online queries (single fingerprints) and batch queries (multiple fingerprints). Experimental evaluation over real data confirms the practicality of our design.
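The second contribution above reduces to finding stored f-bit fingerprints within Hamming distance k of a query fingerprint. A minimal brute-force sketch of that check (the paper's actual contribution is an index that avoids scanning every fingerprint):

```python
# Hamming distance between two integer fingerprints: number of differing bits.
def hamming(a, b):
    return bin(a ^ b).count("1")

def near_duplicates(query, fingerprints, k):
    # brute-force scan; a real system indexes fingerprints to avoid this
    return [fp for fp in fingerprints if hamming(query, fp) <= k]

# hypothetical 6-bit fingerprints
fps = [0b101101, 0b101111, 0b010000]
matches = near_duplicates(0b101101, fps, k=1)
# 0b101101 (distance 0) and 0b101111 (distance 1) match; 0b010000 does not
```

Because simhash-style fingerprints of near-duplicate pages differ in only a few bits, a small k suffices to flag them.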