HCI Bibliography : Search Results
Database updated: 2016-05-10 Searches since 2006-12-01: 32,646,468
director@hcibib.org
Hosted by ACM SIGCHI
The HCI Bibliography was moved to a new server 2015-05-12 and again 2016-01-05, substantially degrading the environment for making updates.
There are no plans to add to the database.
Please send questions or comments to director@hcibib.org.
Query: Gupta_M* Results: 23 Sorted by: Date
[1] CricketLinking: Linking Event Mentions from Cricket Match Reports to Ball Entities in Commentaries Demonstrations / Gupta, Manish Proceedings of the 2015 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2015-08-09 p.1033-1034
ACM Digital Library Link
Summary: The 2011 Cricket World Cup final match was watched by around 135 million people. Such a huge viewership demands a great experience for users of online cricket portals. Many portals like espncricinfo.com host a variety of content related to recent matches, including match reports and ball-by-ball commentaries. When reading a match report, reader experience can be significantly improved by augmenting (on demand) the event mentions in the report with detailed commentaries. We build an event linking system, CricketLinking, which first identifies event mentions in the reports and then links them to a set of balls. Finding linkable mentions is challenging because, unlike in entity linking settings, we do not have a concrete set of event entities to link to. Further, depending on the event type, an event mention could be linked to a single ball or to a set of balls. Hence, identifying the mention type as well as performing the linking becomes challenging. We use a large number of domain-specific features to learn classifiers for mention and mention-type detection. Further, we leverage structured match information, context similarity, and sequential proximity to perform accurate linking. Finally, context-based summarization is performed to provide a concise briefing of the balls linked to each mention.

[2] Information Retrieval with Verbose Queries Tutorials / Gupta, Manish / Bendersky, Michael Proceedings of the 2015 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2015-08-09 p.1121-1124
ACM Digital Library Link
Summary: Recently, the focus of many novel search applications has shifted from short keyword queries to verbose natural language queries. Examples include question answering and dialogue systems, voice search on mobile devices, and entity search engines like Facebook's Graph Search or Google's Knowledge Graph. However, the performance of textbook information retrieval techniques on such verbose queries is not as good as on their shorter counterparts. Thus, effective handling of verbose queries has become a critical factor in the adoption of information retrieval techniques by this new breed of search applications. Over the past decade, the information retrieval community has deeply explored the problem of transforming natural language verbose queries into more effective structural representations using operations like reduction, weighting, expansion, reformulation and segmentation. However, thus far there has been no coherent and organized tutorial on this topic. In this tutorial, we aim to put together the various pieces of this research puzzle, provide a comprehensive and structured overview of the proposed methods, and list application scenarios where effective verbose query processing can make a significant difference.

[3] Characterizing Credit Card Black Markets on the Web WebQuality 2015 / Bulakh, Vlad / Gupta, Minaxi Companion Proceedings of the 2015 International Conference on the World Wide Web 2015-05-18 v.2 p.1435-1440
ACM Digital Library Link
Summary: We study carding shops that sell stolen credit and debit card information online. By bypassing the anti-scraping mechanisms they use, we find that the prices of cards depend heavily on factors such as the issuing bank, country of origin, and whether the card can be used in brick-and-mortar stores. Almost 70% of cards sold by these outfits are priced at or below the cost banks incur in re-issuing them. Ironically, this makes buying back their own cards more economical for the banks than re-issuing them. We also find that the monthly revenues of the carding shops we study are high enough to justify the risk fraudsters take. Further, inventory at carding outfits seems to follow data breaches, and the impact of the delayed deployment of smart chip technology is evident in the disproportionate share the U.S. commands in the underground card fraud economy.

[4] Ballet hero: building a garment for memetic embodiment in dance learning Design exhibition / Hallam, James / McKenna, Alison / Keen, Emily / Gupta, Mudit / Lee, Christa Adjunct Proceedings of the 2014 International Symposium on Wearable Computers 2014-09-13 v.2 p.49-54
ACM Digital Library Link
Summary: This paper describes the analysis and design of a wearable technology garment intended to aid with the instruction of ballet technique to adult beginners. A phenomenological framework is developed and used to assess physiological training tools. Following this, a garment is developed that incorporates visual feedback inspired by animation techniques that more directly convey the essential movements of ballet. The garment design is presented, and a discussion is provided on the challenges of constructing an e-textile garment using contemporary materials and techniques.

[5] Modeling the evolution of product entities Poster session (short papers) / Radhakrishnan, Priya / Gupta, Manish / Varma, Vasudeva Proceedings of the 2014 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2014-07-06 p.923-926
ACM Digital Library Link
Summary: A large number of web queries are related to product entities. Studying the evolution of product entities can help analysts understand the change in particular attribute values for these products. However, studying the evolution of a product requires the ability to link the various versions of a product together in temporal order. While it is easy to manually link recent versions of products temporally in a few domains, solving the problem in general is challenging. The ability to temporally order and link the various versions of a single product can also improve product search engines. In this paper, we tackle the problem of finding the previous version (predecessor) of a product entity. Given a repository of product entities, we first parse the product names using a CRF model. After identifying the entities corresponding to a single product, we solve the problem of finding the previous version of any given version of the product. For this second task, we leverage innovative features with a Naïve Bayes classifier. Our methods achieve a precision of 88% in identifying the product version from product entity names, and a precision of 53% in identifying the predecessor.

[6] CharBoxes: a system for automatic discovery of character infoboxes from books Demo session / Gupta, Manish / Bansal, Piyush / Varma, Vasudeva Proceedings of the 2014 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2014-07-06 p.1255-1256
ACM Digital Library Link
Summary: Entities are central to a large number of real-world applications. Wikipedia shows infoboxes for a large number of entities. However, little structured information is available about character entities in books. Automatic discovery of characters from books can help in effective summarization. A structured summary that not only introduces the characters in a book but also provides the high-level relationships between them can be of critical importance to buyers. This task involves the following challenging novel problems: (1) automatic discovery of the important characters in a given book; (2) automatic construction of a social graph relating the discovered characters; (3) automatic summarization of the text most related to each character; and (4) automatic infobox extraction from the summarized text for each character. As part of this demo, we design mechanisms to address these challenges and experiment with publicly available books.

[7] EDIUM: Improving Entity Disambiguation via User Modeling Short Paper Session 1 / Bansal, Romil / Panem, Sandeep / Gupta, Manish / Varma, Vasudeva Proceedings of ECIR'14, the 2014 European Conference on Information Retrieval 2014-04-13 p.418-423
Keywords: Entity Disambiguation; Knowledge Graph; User Modeling
Link to Digital Content at Springer
Summary: Entity disambiguation is the task of associating entity name mentions in text with the correct referent entities in a knowledge base, with the goal of understanding and extracting useful information from the document. Entity disambiguation is a critical component of systems designed to harness information shared by users on microblogging sites like Twitter. However, noise and the lack of context in tweets make disambiguation a difficult task. In this paper, we describe an entity disambiguation system, EDIUM, which uses user interest models to disambiguate the entities in a user's tweets. Our system jointly models the user's interest scores and the context disambiguation scores, thus compensating for the sparse context in a given user's tweets. We evaluated the system's entity linking capabilities on tweets from multiple users and showed that improvement can be achieved by combining the user models and the context-based models.

[8] Entity Tracking in Real-Time Using Sub-topic Detection on Twitter Short Paper Session 1 / Panem, Sandeep / Bansal, Romil / Gupta, Manish / Varma, Vasudeva Proceedings of ECIR'14, the 2014 European Conference on Information Retrieval 2014-04-13 p.528-533
Keywords: Sub-Topic Detection; Clustering; Entity Tracking; Text Mining
Link to Digital Content at Springer
Summary: The velocity, volume and variety with which Twitter generates text are increasing exponentially. It is critical to determine latent sub-topics from such tweet data at any given point in time in order to provide better topic-wise search results relevant to users' informational needs. The two main challenges in mining sub-topics from tweets in real time are (1) understanding the semantic and conceptual representation of the tweets, and (2) the ability to determine when a new sub-topic (or cluster) appears in the tweet stream. We address these challenges by proposing two unsupervised clustering approaches. In the first approach, we generate a semantic space representation for each tweet by keyword expansion and keyphrase identification. In the second approach, we transform each tweet into a conceptual space that represents the latent concepts of the tweet. We empirically show that the proposed methods outperform the state-of-the-art methods.

[9] Towards a social media analytics platform: event detection and user profiling for Twitter WWW 2014 tutorials / Gupta, Manish / Li, Rui / Chang, Kevin Chen-Chuan Companion Proceedings of the 2014 International Conference on the World Wide Web 2014-04-07 v.2 p.193-194
ACM Digital Library Link
Summary: Microblog data differs significantly from traditional text data along a variety of dimensions. Microblog documents are short, use SMS-style language, and are full of code mixing. Though a lot of it is mere social babble, it also contains fresh news coming from human sensors at a humongous rate. Given such interesting characteristics, the world wide web community has recently witnessed a large number of research tasks for microblogging platforms. Event detection on Twitter is one of the most popular such tasks, with a large number of applications. The proposed tutorial on social analytics for Twitter will contain three parts. In the first part, we will discuss research efforts towards detecting events on Twitter using both tweet content and other external sources. We will also discuss various applications for which event detection mechanisms have been put to use. Merely detecting events is not enough: applications require that the detector also provide a good description of each event. In the second part, we will focus on describing events using the best phrase, event type, event timespan, and credibility. In the third part, we will discuss user profiling for Twitter, with a special focus on user location prediction. We will conclude with a summary and thoughts on future directions.

[10] Cross market modeling for query-entity matching WWW 2014 posters / Gupta, Manish / Borole, Prashant / Hebbar, Praful / Mehta, Rupesh / Nayak, Niranjan Companion Proceedings of the 2014 International Conference on the World Wide Web 2014-04-07 v.2 p.285-286
ACM Digital Library Link
Summary: Given a query, the query-entity (QE) matching task involves identifying the best matching entity for the query. When modeling this task as a binary classification problem, two issues arise: (1) features in specific global markets (like de-at: German users in Austria) are quite sparse compared to markets like en-us, and (2) training data is expensive to obtain in multiple markets and hence limited. Can we leverage some form of cross-market data or features for effective query-entity matching in sparse markets? Our solution consists of three main modules: (1) Cross Market Training Data Leverage (CMTDL), (2) Cross Market Feature Leverage (CMFL), and (3) Cross Market Output Data Leverage (CMODL). Each of these modules performs "signal" sharing at a different point in the classification process. Using a combination of these strategies, we show significant improvements in query-impression-weighted coverage for the query-entity matching task.

[11] Identifying fraudulently promoted online videos WebQuality 2014 workshop / Bulakh, Vlad / Dunn, Christopher W. / Gupta, Minaxi Companion Proceedings of the 2014 International Conference on the World Wide Web 2014-04-07 v.2 p.1111-1116
ACM Digital Library Link
Summary: Fraudulent product promotion online, including online videos, is on the rise. In order to understand and defend against this ill, we engage in the fraudulent video economy for a popular video sharing website, YouTube, and collect a sample of over 3,300 fraudulently promoted videos and 500 bot profiles that promote them. We then characterize fraudulent videos and profiles and train supervised machine learning classifiers that can successfully differentiate fraudulent videos and profiles from legitimate ones.

[12] Modeling click and relevance relationship for sponsored search Posters: internet monetization and incentives / Zhang, Wei Vivian / Chen, Ye / Gupta, Mitali / Sett, Swaraj / Yan, Tak W. Companion Proceedings of the 2013 International Conference on the World Wide Web 2013-05-13 v.2 p.119-120
ACM Digital Library Link
Summary: Click-through rate (CTR) prediction and relevance ranking are two fundamental problems in web advertising. In this study, we address the problem of modeling the relationship between CTR and relevance for sponsored search. We used normalized relevance scores comparable across all queries to represent relevance when modeling with CTR, instead of directly using human judgment labels or relevance scores valid only within same query. We classified clicks by identifying their relevance quality using dwell time and session information, and compared all clicks versus selective clicks effects when modeling relevance.
    Our results showed that the cleaned click signal outperforms the raw click signal and the others we explored in terms of relevance score fitting. The cleaned clicks comprise clicks with dwell time greater than 5 seconds and last clicks in a session. While it is traditionally held that there is no linear relation between click and relevance, we showed that the cleaned-click-based CTR can be fitted well to the normalized relevance scores using a quadratic regression model. This relevance-click model could help train ranking models using processed click feedback to complement expensive human editorial relevance labels, or better leverage relevance signals in CTR prediction.

[13] Fast query evaluation for ad retrieval Poster presentations / Chen, Ye / Gupta, Mitali / Yan, Tak W. Proceedings of the 2012 International Conference on the World Wide Web 2012-04-16 v.2 p.479-480
ACM Digital Library Link
Summary: We describe a fast query evaluation method for ad document retrieval in online advertising, based upon the classic WAND algorithm. The key idea is to localize per-topic term upper bounds within homogeneous ad groups. Our approach is not only theoretically motivated by a topical mixture model, but also empirically justified by the characteristics of the ad domain, that is, short and semantically focused documents with a natural hierarchy. We report experimental results on artificial and real-world query-ad retrieval data, and show that the tighter-bound WAND outperforms the traditional approach with a 35.4% reduction in the number of full evaluations.

[14] Trust analysis with clustering Poster session / Gupta, Manish / Sun, Yizhou / Han, Jiawei Proceedings of the 2011 International Conference on the World Wide Web 2011-03-28 v.2 p.53-54
ACM Digital Library Link
Summary: The Web provides rich information about a variety of objects, and trustworthiness is a major concern. Truth establishment is an important task for providing the user with the right information from the most trustworthy source. The trustworthiness of an information provider and the confidence in the facts it provides are interdependent, and hence can be expressed iteratively in terms of each other. However, a single information provider may not be the most trustworthy for all kinds of information: every provider has its own area of competence where it can perform better than others. We derive a model that evaluates the trustworthiness of objects and information providers based on clusters (groups). We propose a method that groups together the objects for which similar sets of providers supply "good" facts, and that delivers better accuracy in addition to high-quality object clusters.

[15] Connecting the next billion web users Panel session / Rastogi, Rajeev / Cutrell, Ed / Gupta, Manish / Jhunjhunwala, Ashok / Narayan, Ramkumar / Sanghal, Rajeev Proceedings of the 2011 International Conference on the World Wide Web 2011-03-28 v.2 p.329-330
ACM Digital Library Link
Summary: With 2 billion users, the World Wide Web has indeed come a long way. However, of the 4.8 billion people living in Asia and Africa, only 1 in 5 has access to the Web. For instance, in India, the 100 million Web users constitute less than 10% of the total population of 1.2 billion. So it is universally accepted that the next billion users will come from emerging markets like Brazil, China, India, Indonesia and Russia. Emerging markets have a number of unique characteristics: large, dense populations with low incomes; a lack of infrastructure in terms of broadband, electricity, etc.; poor PC penetration due to limited affordability; high illiteracy rates and inability to read/write; a plethora of local languages and dialects; a general paucity of local content, especially in local languages; and explosive growth in the number of mobile phones. The panel will debate the various technical challenges in overcoming the digital divide, and potential approaches to bring the Web to the underserved populations of the developing world.

[16] Spoken Web: a mobile cloud based parallel web for the masses Keynote / Gupta, Manish Proceedings of the 2011 International Cross-Disciplinary Conference on Web Accessibility (W4A) 2011-03-28 v.2 p.1
ACM Digital Library Link
Summary: In India and several other countries, most notably in Africa, the penetration of the personal computer and the internet remains relatively low. However, there has been a huge surge in the adoption of simple mobile phones (there are over 700 million mobile phone numbers in India), and this penetration continues to grow at a fast pace. We will present Spoken Web, an attempt to create a new world wide web for the masses in these countries, accessible over the telephone network and hosted in a cloud. The Spoken Web platform facilitates easy creation of user-generated content that populates 'voice sites', and allows contextual traversal of voice sites interconnected via hyperlinks based on the Hyperspeech Transfer Protocol. We present our experience from pilots conducted in villages in Andhra Pradesh, Gujarat, and other states in India. These pilots demonstrate the ease with which a semi-literate and non-IT-savvy population can create voice sites with locally relevant content, including schedules of education/training classes, agricultural information, and entertainment-related content, and their strong interest in accessing this information over the telephone network. We describe several outstanding challenges and opportunities in creating and using a Spoken Web to facilitate the exchange of information and the conduct of business transactions.

[17] iCollaborate: harvesting value from enterprise web usage Demonstrations / Kale, Ajinkya / Burris, Thomas / Shah, Bhavesh / Venkatesan, T. L. Prasanna / Velusamy, Lakshmanan / Gupta, Manish / Degerattu, Melania Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2010-07-19 p.699
Keywords: enterprise social data, social browsing
ACM Digital Library Link
Summary: We are in a phase of the 'Participatory Web', in which users add value to the information on the web by publishing, tagging and sharing. The Participatory Web has enormous potential for an enterprise because, unlike the internet at large, an enterprise is a community that shares common goals, assumptions, vocabulary and interests, and has reliable user identification and mutual trust along with central governance and incentives to collaborate. Every day, the employees of an organization locate content relevant to their work on the web. Finding this information takes time, expertise and creativity, which costs an organization money. That is, the web pages employees find are knowledge assets owned by the enterprise. This investment in web-based knowledge assets is lost every time the enterprise fails to capture and reuse them. iCollaborate is tooled to capture users' web interactions, persist and analyze them, and feed those interactions back into the community -- the enterprise.

[18] LINKREC: a unified framework for link recommendation with user attributes and graph structure WWW posters / Yin, Zhijun / Gupta, Manish / Weninger, Tim / Han, Jiawei Proceedings of the 2010 International Conference on the World Wide Web 2010-04-26 v.1 p.1211-1212
Keywords: link recommendation, random walk
ACM Digital Library Link
Summary: With the phenomenal success of networking sites (e.g., Facebook, Twitter and LinkedIn), social networks have drawn substantial attention. On online social networking sites, link recommendation is a critical task that not only helps improve user experience but also plays an essential role in network growth. In this paper we propose several link recommendation criteria, based on both user attributes and graph structure. To discover the candidates that satisfy these criteria, link relevance is estimated using a random walk algorithm on an augmented social graph with both attribute and structure information. The global and local influence of the attributes is leveraged in the framework as well. Besides link recommendation, our framework can also rank attributes in a social network. Experiments on DBLP and IMDB data sets demonstrate that our method outperforms state-of-the-art methods based on network structure and node attribute information for link recommendation.

[19] Adding GPS-Control to Traditional Thermostats: An Exploration of Potential Energy Savings and Design Challenges At Home with Pervasive Applications / Gupta, Manu / Intille, Stephen S. / Larson, Kent Proceedings of Pervasive 2009: International Conference on Pervasive Computing 2009-05-11 p.95-114
Link to Digital Content at Springer
Summary: Although manual and programmable home thermostats can save energy when used properly, studies have shown that over 40% of U.S. homes may not use energy-saving temperature setbacks when homes are unoccupied. We propose a system for augmenting these thermostats using just-in-time heating and cooling based on travel-to-home distance obtained from location-aware mobile phones. Analyzing GPS travel data from 8 participants (8-12 weeks each) and heating and cooling characteristics from 5 homes, we report results of running computer simulations estimating potential energy savings from such a device. Using a GPS-enabled thermostat might lead to savings of as much as 7% for some households that do not regularly use the temperature setback afforded by manual and programmable thermostats. Significantly, these savings could be obtained without requiring any change in occupant behavior or comfort level, and the technology could be implemented affordably by exploiting the ubiquity of mobile phones. Additional savings may be possible with modest context-sensitive prompting. We report on design considerations identified during a pilot test of a fully-functional implementation of the system.

[20] Predicting click through rate for job listings Posters Wednesday, April 22, 2009 / Gupta, Manish Proceedings of the 2009 International Conference on the World Wide Web 2009-04-20 p.1053-1054
Keywords: CPC, CTR, GBDT, click through rate, gradient boosted decision trees, jobs, linear regression, prediction, treenet
ACM Digital Library Link
Summary: Click Through Rate (CTR) is an important metric for ad systems, job portals, and recommendation systems. CTR impacts publishers' revenue and advertisers' bid amounts in "pay for performance" business models. We learn regression models using features of the job, the job's click history (when available), and features of "related" jobs. We show that our models predict CTR much better than predicting the average CTR for all job listings, even in the absence of click history for the job listing.

[21] Detecting image spam using visual features and near duplicate detection Security I: misc / Mehta, Bhaskar / Nangia, Saurabh / Gupta, Manish / Nejdl, Wolfgang Proceedings of the 2008 International Conference on the World Wide Web 2008-04-21 p.497-506
Keywords: email spam, image analysis, machine learning
ACM Digital Library Link
Summary: Email spam is a much studied topic, but even though current email spam detection software has been gaining a competitive edge against text-based email spam, new advances in spam generation have posed a new challenge: image-based spam. Image-based spam is email that embeds the spam message in images, in binary format rather than as text. In this paper, we study the characteristics of image spam and propose two solutions for detecting image-based spam, drawing a comparison with existing techniques. The first solution, which uses visual features for classification, offers an accuracy of about 98%, i.e. an improvement of at least 6% over existing solutions. SVMs (Support Vector Machines) are used to train classifiers using judiciously chosen color, texture and shape features. The second solution offers a novel approach for near-duplicate detection in images. It involves clustering image GMMs (Gaussian Mixture Models) based on the Agglomerative Information Bottleneck (AIB) principle, using Jensen-Shannon (JS) divergence as the distance measure.

[22] Fast algorithms for top-k personalized PageRank queries Posters / Gupta, Manish / Pathak, Amit / Chakrabarti, Soumen Proceedings of the 2008 International Conference on the World Wide Web 2008-04-21 p.1225-1226
Keywords: hubrank, node-deletion, pagerank, personalized, top-k
ACM Digital Library Link
Summary: In entity-relation (ER) graphs (V,E), nodes V represent typed entities and edges E represent typed relations. For dynamic personalized PageRank queries, nodes are ranked by their steady-state probabilities obtained using the standard random surfer model. In this work, we propose a framework to answer top-k graph conductance queries. Our top-k ranking technique leads to a 4X speedup, and overall, our system executes queries 200-1600X faster than whole-graph PageRank. Some queries may contain hard predicates, i.e., predicates that must be satisfied by the answer nodes; for example, we may seek authoritative papers on public key cryptography, but only those written in 1997. We extend our system to handle hard predicates. Our system achieves these substantial query speedups while consuming only 10-20% of the space taken by a regular text index.

[23] Sonic Grid: an auditory interface for the visually impaired to navigate GUI-based environments Short papers / Jagdish, Deepak / Sawhney, Rahul / Gupta, Mohit / Nangia, Shreyas Proceedings of the 2008 International Conference on Intelligent User Interfaces 2008-01-13 p.337-340
ACM Digital Library Link
Summary: This paper explores the prototype design of an auditory interface enhancement called the Sonic Grid that helps visually impaired users navigate GUI-based environments. The Sonic Grid provides an auditory representation of GUI elements embedded in a two-dimensional interface, giving a 'global' spatial context for use of auditory icons, ear-cons and speech feedback. This paper introduces the Sonic Grid, discusses insights gained through participatory design with members of the visually impaired community, and suggests various applications of the technique, including its use to ease the learning curve for using computers by the visually impaired.