HCI Bibliography : Search Results
Database updated: 2016-05-10 Searches since 2006-12-01: 32,588,656
director@hcibib.org
Hosted by ACM SIGCHI
The HCI Bibliography was moved to a new server 2015-05-12 and again 2016-01-05, substantially degrading the environment for making updates.
There are no plans to add to the database.
Please send questions or comments to director@hcibib.org.
Query: Xiong_H* Results: 29 Sorted by: Date
Records: 1 to 25 of 29
[1] CCS-TA: quality-guaranteed online task allocation in compressive crowdsensing Sensing with crowd / Wang, Leye / Zhang, Daqing / Pathak, Animesh / Chen, Chao / Xiong, Haoyi / Yang, Dingqi / Wang, Yasha Proceedings of the 2015 International Conference on Ubiquitous Computing 2015-09-07 p.683-694
ACM Digital Library Link
Summary: Data quality and budget are two primary concerns in urban-scale mobile crowdsensing applications. In this paper, we leverage the spatial and temporal correlation among the data sensed in different sub-areas to significantly reduce the required number of sensing tasks allocated (corresponding to budget), yet ensuring the data quality. Specifically, we propose a novel framework called CCS-TA, combining the state-of-the-art compressive sensing, Bayesian inference, and active learning techniques, to dynamically select a minimum number of sub-areas for sensing task allocation in each sensing cycle, while deducing the missing data of unallocated sub-areas under a probabilistic data accuracy guarantee. Evaluations on real-life temperature and air quality monitoring datasets show the effectiveness of CCS-TA. In the case of temperature monitoring, CCS-TA allocates 18.0-26.5% fewer tasks than baseline approaches, allocating tasks to only 15.5% of the sub-areas on average while keeping overall sensing error below 0.25°C in 95% of the cycles.

[2] Multi-source Information Fusion for Personalized Restaurant Recommendation Short Papers / Sun, Jing / Xiong, Yun / Zhu, Yangyong / Liu, Junming / Guan, Chu / Xiong, Hui Proceedings of the 2015 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2015-08-09 p.983-986
ACM Digital Library Link
Summary: In this paper, we study the problem of personalized restaurant recommendations. Specifically, we develop a probabilistic factor analysis framework, named RMSQ-MF, which has the ability in exploiting multi-source information, such as the users' task, their friends' preferences, and human mobility patterns, for personalized restaurant recommendations. The rationale of this work is motivated by two observations. First, people's preferences can be affected by their friends. Second, human mobility patterns can reflect the popularity of restaurants to a certain degree. Finally, empirical studies on real-world data demonstrate that the proposed method outperforms benchmark methods with a significant margin.

[3] Remix in 3D Printing: What your Sources say About You WebSci Track Papers & Posters / Papadimitriou, Spiros / Papalexakis, Evangelos / Liu, Bin / Xiong, Hui Companion Proceedings of the 2015 International Conference on the World Wide Web 2015-05-18 v.2 p.367-368
ACM Digital Library Link
Summary: Concurrently with the recent, rapid adoption of 3D printing technologies, online sharing of 3D-printable designs is growing equally rapidly, even though it has received far less attention. We study remix relationships on Thingiverse, the dominant online repository and social network for 3D printing. We collected data of designs published over five years, and we find that remix ties exhibit both homophily and inverse-homophily across numerous key metrics, which is stronger compared to other kinds of social and content links. This may have implications on graph prediction tasks, as well as on the design of 3D-printable content repositories.

[4] Fused one-vs-all mid-level features for fine-grained visual categorization Multimedia Analysis and Mining / Zhang, Xiaopeng / Xiong, Hongkai / Zhou, Wengang / Tian, Qi Proceedings of the 2014 ACM International Conference on Multimedia 2014-11-03 p.287-296
ACM Digital Library Link
Summary: As an emerging research topic, fine-grained visual categorization has been attracting growing attentions in recent years. Due to the large inter-class similarity and intra-class variance, recognizing objects in fine-grained domains is extremely challenging, and sometimes even humans can not recognize them accurately. Traditional bag-of-words model could obtain desirable results for basic-level category classification by weak alignment using spatial pyramid matching model, but may easily fail in fine-grained domains since the discriminative features are not only subtle but also extremely localized. The fine differences often get swamped by those irrelevant features, and it is virtually impossible to distinguish them. To address the problems above, we propose a new framework for fine-grained visual categorization. We strengthen the spatial correspondence among parts by including foreground segmentation and part localization. Based on the part representations of the images, we learn a large set of mid-level features which are more suitable for fine-grained tasks. Comparing with the low level features directly extracted from the images, the learned one-vs-all mid-level features enjoy the following advantages. First, the dimension of the mid-level features is relatively small. In order to obtain high classification accuracy, the dimension of the low level features usually reaches several thousand to tens of thousand, and becomes even larger when introducing spatial pyramid model. However, the dimension of our mid-level features is related to the number of classes, which is far less. Second, each entry of the proposed mid-level features is meaningful, which forms a more compact representation of the image. Third, the mid-level features are more robust than the low level ones, which is helpful for classification. Fourth, the learning process of the mid-level features is independent and can be easily combined with other techniques to boost the performance. 
We evaluate the proposed approach on the extensive fine-grained datasets CUB 200-2011 and Stanford Dogs. By learning the mid-level features based on the popular Fisher vectors and convolutional neural networks, we boost the classification accuracy by a considerable margin and advance the state-of-the-art performance in fine-grained visual categorization.

[5] Influence Maximization over Large-Scale Social Networks: A Bounded Linear Approach KM Session 1: Social Networks & Social Media I / Liu, Qi / Xiang, Biao / Chen, Enhong / Xiong, Hui / Tang, Fangshuang / Yu, Jeffrey Xu Proceedings of the 2014 ACM Conference on Information and Knowledge Management 2014-11-03 p.171-180
ACM Digital Library Link
Summary: Information diffusion in social networks is emerging as a promising solution to successful viral marketing, which relies on the effective and efficient identification of a set of nodes with the maximal social influence. While there are tremendous efforts on the development of social influence models and algorithms for social influence maximization, limited progress has been made in terms of designing both efficient and effective algorithms for finding a set of nodes with the maximal social influence. To this end, in this paper, we provide a bounded linear approach for influence computation and influence maximization. Specifically, we first adopt a linear and tractable approach to describe the influence propagation. Then, we develop a quantitative metric, named Group-PageRank, to quickly estimate the upper bound of the social influence based on this linear approach. More importantly, we provide two algorithms Linear and Bound, which exploit the linear approach and Group-PageRank for social influence maximization. Finally, extensive experimental results demonstrate that (a) the adopted linear approach has a close relationship with traditional models and Group-PageRank provides a good estimation of social influence; (b) Linear and Bound can quickly find a set of the most influential nodes and both of them are scalable for large-scale social networks.

[6] Multi-task Multi-view Learning for Heterogeneous Tasks KM Session 5: Classification II / Jin, Xin / Zhuang, Fuzhen / Xiong, Hui / Du, Changying / Luo, Ping / He, Qing Proceedings of the 2014 ACM Conference on Information and Knowledge Management 2014-11-03 p.441-450
ACM Digital Library Link
Summary: Multi-task multi-view learning deals with the learning scenarios where multiple tasks are associated with each other through multiple shared feature views. All previous works for this problem assume that the tasks use the same set of class labels. However, in real world there exist quite a few applications where the tasks with several views correspond to different set of class labels. This new learning scenario is called Multi-task Multi-view Learning for Heterogeneous Tasks in this study. Then, we propose a Multi-tAsk MUlti-view Discriminant Analysis (MAMUDA) method to solve this problem. Specifically, this method collaboratively learns the feature transformations for different views in different tasks by exploring the shared task-specific and problem intrinsic structures. Additionally, MAMUDA method is convenient to solve the multi-class classification problems. Finally, the experiments on two real-world problems demonstrate the effectiveness of MAMUDA for heterogeneous tasks.

[7] Predicting the Popularity of Online Serials with Autoregressive Models KM Session 17: Web Data Mining / Chang, Biao / Zhu, Hengshu / Ge, Yong / Chen, Enhong / Xiong, Hui / Tan, Chang Proceedings of the 2014 ACM Conference on Information and Knowledge Management 2014-11-03 p.1339-1348
ACM Digital Library Link
Summary: Recent years have witnessed the rapid prevalence of online serials, which play an important role in our daily entertainment. A critical demand along this line is to predict the popularity of online serials, which can enable a wide range of applications, such as online advertising, and serial recommendation. However, compared with traditional online media such as user-generated content (UGC), online serials have unique characteristics of sequence dependence, release date dependence as well as unsynchronized update regularity. Therefore, the popularity prediction for online serials is a nontrivial task and still under-addressed. To this end, in this paper we present a comprehensive study for predicting the popularity of online serials with autoregressive models. Specifically, we first introduce a straightforward yet effective Naive Autoregressive (NAR) model based on the correlations of serial episodes. Furthermore, we develop a sophisticated model, namely Transfer Autoregressive (TAR) model, to capture the dynamic behaviors of audiences, which can achieve better prediction performance than the NAR model. Indeed, the two models can reveal the popularity generation from different perspectives. In addition, as a derivative of the TAR model, we also design a novel metric, namely favor, for evaluating the quality of online serials. Finally, extensive experiments on two real-world data sets clearly show that both models are effective and outperform baselines in terms of the popularity prediction for online serials. And the new metric performs better than other metrics for quality estimation.
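Illustrative sketch (not from the paper): the abstract above does not specify the NAR or TAR models, so the following only shows the general idea of autoregressive popularity prediction with a generic first-order least-squares fit. The function names `fit_ar1` and `predict_next` and the sample view counts are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] ~ a * x[t-1] + b over one popularity series."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def predict_next(series, a, b):
    """Predict the next value from the most recent observation."""
    return a * series[-1] + b


# Hypothetical per-episode view counts (in thousands), made up for illustration.
views = [100.0, 90.0, 82.0, 75.6, 70.48]
a, b = fit_ar1(views)
next_views = predict_next(views, a, b)
```

A generic AR(1) fit like this ignores the release-date dependence and unsynchronized update regularity that the abstract names as the distinguishing characteristics of online serials.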

[8] Eye Glance Behavior Associated with Cell-Phone Use: Examination with Naturalistic Driving Data Surface Transportation: ST4 -- Naturalistic Driving Research / Bao, Shan / Flannagan, Carol / Xiong, Huimin / Sayer, Jim Proceedings of the Human Factors and Ergonomics Society 2014 Annual Meeting 2014-10-27 p.2112-2116
doi 10.1177/1541931214581444
Link to HFES Digital Content
Summary: The purpose of this study is to examine eye-glance patterns of drivers engaged in cell phone related tasks. To observe eye-glance patterns, researchers used naturalistic driving data from the Integrated Vehicle-Based Safety Systems field operational test to construct and tabulate two datasets. One dataset included gaze data that were coded from cell phone conversation clips by fifty different drivers under different driving conditions. The second dataset was created in a similar way using video clips from twenty-four drivers who engaged in visual-manual tasks (e.g., texting and dialing). Mixed-model analyses were conducted. Results showed that drivers' on-road gazes were longer when they were engaged in a cell phone conversation than when they were not engaged. Off-road gaze length was the same, regardless of task involvement. In contrast, drivers who engaged in visual-manual tasks had substantially shorter on-road gaze length compared to when those same drivers were not involved in visual-manual tasks.

[9] CrowdRecruiter: selecting participants for piggyback crowdsensing under probabilistic coverage constraint Sensing the crowd / Zhang, Daqing / Xiong, Haoyi / Wang, Leye / Chen, Guanling Proceedings of the 2014 International Joint Conference on Pervasive and Ubiquitous Computing 2014-09-13 v.1 p.703-714
ACM Digital Library Link
Summary: This paper proposes a novel participant selection framework, named CrowdRecruiter, for mobile crowdsensing. CrowdRecruiter operates on top of energy-efficient Piggyback Crowdsensing (PCS) task model and minimizes incentive payments by selecting a small number of participants while still satisfying probabilistic coverage constraint. In order to achieve the objective when piggybacking crowdsensing tasks with phone calls, CrowdRecruiter first predicts the call and coverage probability of each mobile user based on historical records. It then efficiently computes the joint coverage probability of multiple users as a combined set and selects the near-minimal set of participants, which meets coverage ratio requirement in each sensing cycle of the PCS task. We evaluated CrowdRecruiter extensively using a large-scale real-world dataset and the results show that the proposed solution significantly outperforms three baseline algorithms by selecting 10.0% -- 73.5% fewer participants on average under the same probabilistic coverage constraint.
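Illustrative sketch (not from the paper): the abstract above mentions computing the joint coverage probability of multiple users and selecting a near-minimal participant set. Assuming users cover a cell independently, the joint probability that at least one of them covers it is 1 minus the product of their individual miss probabilities. The greedy rule below, the function names, and the sample probabilities are hypothetical stand-ins and may differ from CrowdRecruiter's actual selection procedure.

```python
from math import prod


def joint_coverage(probs):
    """Probability that at least one of several independent users covers a cell."""
    return 1.0 - prod(1.0 - p for p in probs)


def greedy_select(user_probs, n_cells, ratio, min_prob):
    """Greedily add users until `ratio` of the cells reach joint coverage
    probability >= min_prob. `user_probs` maps user -> {cell index: probability}."""
    selected = []
    miss = [1.0] * n_cells  # per-cell probability that no selected user covers it
    remaining = dict(user_probs)

    def covered_cells(miss_probs):
        return sum(1 for m in miss_probs if 1.0 - m >= min_prob)

    def miss_after(user, miss_probs):
        cells = remaining[user]
        return [m * (1.0 - cells.get(c, 0.0)) for c, m in enumerate(miss_probs)]

    while remaining and covered_cells(miss) < ratio * n_cells:
        # Pick the user whose addition yields the largest summed joint coverage;
        # sorting makes tie-breaking deterministic.
        best = max(
            sorted(remaining),
            key=lambda u: sum(1.0 - m for m in miss_after(u, miss)),
        )
        miss = miss_after(best, miss)
        selected.append(best)
        del remaining[best]
    return selected, [1.0 - m for m in miss]


# Hypothetical predicted coverage probabilities for three users over three cells.
users = {
    "a": {0: 0.9, 1: 0.5},
    "b": {1: 0.8, 2: 0.9},
    "c": {0: 0.3, 2: 0.2},
}
chosen, coverage = greedy_select(users, n_cells=3, ratio=1.0, min_prob=0.75)
```

Tracking the per-cell miss probability lets each candidate be scored with one multiplication per cell instead of recomputing the full product for the whole set.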

[10] Cost-Aware Collaborative Filtering for Travel Tour Recommendations / Ge, Yong / Xiong, Hui / Tuzhilin, Alexander / Liu, Qi ACM Transactions on Information Systems 2014-01 v.32 n.1 p.4
ACM Digital Library Link
Summary: Advances in tourism economics have enabled us to collect massive amounts of travel tour data. If properly analyzed, this data could be a source of rich intelligence for providing real-time decision making and for the provision of travel tour recommendations. However, tour recommendation is quite different from traditional recommendations, because the tourist's choice is affected directly by the travel costs, which includes both financial and time costs. To that end, in this article, we provide a focused study of cost-aware tour recommendation. Along this line, we first propose two ways to represent user cost preference. One way is to represent user cost preference by a two-dimensional vector. Another way is to consider the uncertainty about the cost that a user can afford and introduce a Gaussian prior to model user cost preference. With these two ways of representing user cost preference, we develop different cost-aware latent factor models by incorporating the cost information into the probabilistic matrix factorization (PMF) model, the logistic probabilistic matrix factorization (LPMF) model, and the maximum margin matrix factorization (MMMF) model, respectively. When applied to real-world travel tour data, all the cost-aware recommendation models consistently outperform existing latent factor models with a significant margin.

[11] Ranking fraud detection for mobile apps: a holistic view KM track: mobile and event mining / Zhu, Hengshu / Xiong, Hui / Ge, Yong / Chen, Enhong Proceedings of the 2013 ACM Conference on Information and Knowledge Management 2013-10-27 p.619-628
ACM Digital Library Link
Summary: Ranking fraud in the mobile App market refers to fraudulent or deceptive activities which have a purpose of bumping up the Apps in the popularity list. Indeed, it becomes more and more frequent for App developers to use shady means, such as inflating their Apps' sales or posting phony App ratings, to commit ranking fraud. While the importance of preventing ranking fraud has been widely recognized, there is limited understanding and research in this area. To this end, in this paper, we provide a holistic view of ranking fraud and propose a ranking fraud detection system for mobile Apps. Specifically, we investigate two types of evidences, ranking based evidences and rating based evidences, by modeling Apps' ranking and rating behaviors through statistical hypotheses tests. In addition, we propose an optimization based aggregation method to integrate all the evidences for fraud detection. Finally, we evaluate the proposed system with real-world App data collected from the Apple's App Store for a long time period. In the experiments, we validate the effectiveness of the proposed system, and show the scalability of the detection algorithm as well as some regularity of ranking fraud activities.

[12] Drivers' Selected Settings for Adaptive Cruise Control (ACC): Implications for Long-Term Use Surface Transportation: ST6 -- In-Vehicle Driver Support Systems / Xiong, Huimin / Boyle, Linda Ng Proceedings of the Human Factors and Ergonomics Society 2013 Annual Meeting 2013-09-30 p.1928-1932
doi 10.1177/1541931213571431
Link to HFES Digital Content
Summary: Adaptive Cruise Control (ACC) is a system that assists drivers on longitudinal control by automatically adjusting the throttle. Users can set the speed and gap setting based on their driving preferences. In this study, drivers' ACC use pattern and selection choices are examined based on their level of experience and geographical location. Experienced ACC users from urban settings in Washington are compared to less urbanized areas in Iowa. Information on novice ACC users was also collected in Washington and compared with experienced ACC users within the same area. The outcomes show that although similar use patterns do exist, there are differences in geographical locations and experience levels that impact drivers' choice of ACC settings. In Iowa, experienced ACC drivers select faster speed and closer time headway distance than drivers in Washington State. This suggests that use of ACC differs given environmental surroundings. Within Washington, experienced ACC users set faster speed, closer time headway distance, and intervened less compared with novice ACC users. This suggests that drivers' behavior may change with greater exposure to ACC, which can provide insights on drivers' automation reliance after extended use.

[13] effSense: energy-efficient and cost-effective data uploading in mobile crowdsensing Workshop: PUCAA: 1st international workshop on pervasive urban crowdsensing architecture and applications / Wang, Leye / Zhang, Daqing / Xiong, Haoyi Adjunct Proceedings of the 2013 International Joint Conference on Pervasive and Ubiquitous Computing 2013-09-08 v.2 p.1075-1086
ACM Digital Library Link
Summary: Energy consumption and mobile data cost are two key factors affecting users' willingness to participate in crowdsensing tasks. While data-plan users are mostly concerned about the energy consumption, non-data-plan users are more sensitive to data transmission cost incurred. Traditional ways of data collection in mobile crowdsensing often go to two extremes: either uploading the sensed data online in real-time or fully offline after the whole sensing task is finished. In this paper, we propose effSense -- a novel energy-efficient and cost-effective data uploading framework leveraging the delay-tolerant mechanisms. Specifically, effSense reduces the data cost of non-data-plan users by maximally offloading the data to Bluetooth/WiFi gateways or data-plan users encountered to relay the data to the server; it reduces energy consumption of data-plan users by uploading data in parallel with a call or using less-energy demand networks (e.g. Bluetooth). By leveraging the prediction of critical events such as user's future calls or encounters, effSense selects the optimal uploading scheme for both types of users. Our evaluation with MIT Reality Mining and Nodobo datasets show that effSense can save 55%~65% energy and 45%~50% data cost for the two types of users, respectively, compared with the traditional uploading schemes.

[14] Detecting and Tracking Topics and Events from Web Search Logs / Liu, Hongyan / He, Jun / Gu, Yingqin / Xiong, Hui / Du, Xiaoyong ACM Transactions on Information Systems 2012-11 v.30 n.4 p.21
ACM Digital Library Link
Summary: Recent years have witnessed increased efforts on detecting topics and events from Web search logs, since this kind of data not only capture web content but also reflect the users' activities. However, the majority of existing work is focused on exploiting clustering techniques for topic and event detection. Due to the huge size and the evolving nature of Web data, existing clustering approaches are limited to meet the real-time demand. To that end, in this article, we propose a method called LETD to detect evolving topics in a timely manner. Also, we design the techniques to extract events from topics and to infer the evolving relationship among the events. For topic detection, we first provide a measurement to select the important URLs, which are most likely to describe a real-life topic. Then, starting from these selected URLs, we exploit the local expansion method to find other topic-related URLs. Moreover, in the LETD framework, we design algorithms based on Random Walk and Markov Random Fields (MRF), respectively. Because the LETD method exploits a divide-and-conquer strategy to process the data, it is more efficient than existing methods based on clustering techniques. To better illustrate the LETD framework, we develop a demo system StoryTeller which can discover hot topics and events, infer the evolving relationships among events, and visualize information in a storytelling way. This demo system can provide a global view of the topic development and help users target the interesting events more conveniently. Finally, experimental results on real-world Microsoft click-through data have shown that StoryTeller can find real-life hot topics and meaningful evolving relationships among events, and has also demonstrated the efficiency and effectiveness of the LETD method.

[15] Exploiting enriched contextual information for mobile app classification Knowledge management short paper session / Zhu, Hengshu / Cao, Huanhuan / Chen, Enhong / Xiong, Hui / Tian, Jilei Proceedings of the 2012 ACM Conference on Information and Knowledge Management 2012-10-29 p.1617-1621
ACM Digital Library Link
Summary: A key step for the mobile app usage analysis is to classify apps into some predefined categories. However, it is a nontrivial task to effectively classify mobile apps due to the limited contextual information available for the analysis. To this end, in this paper, we propose an approach to first enrich the contextual information of mobile apps by exploiting the additional Web knowledge from the Web search engine. Then, inspired by the observation that different types of mobile apps may be relevant to different real-world contexts, we also extract some contextual features for mobile apps from the context-rich device logs of mobile users. Finally, we combine all the enriched contextual information into a Maximum Entropy model for training a mobile app classifier. The experimental results based on 443 mobile users' device logs clearly show that our approach outperforms two state-of-the-art benchmark methods with a significant margin.

[16] Influential seed items recommendation Short papers / Liu, Qi / Xiang, Biao / Chen, Enhong / Ge, Yong / Xiong, Hui / Bao, Tengfei / Zheng, Yi Proceedings of the 2012 ACM Conference on Recommender Systems 2012-09-09 p.245-248
ACM Digital Library Link
Summary: In this paper, we present a systematic perspective study on choosing and evaluating the initial seed items that will be recommended to the cold start users. We first construct an item consumption correlation network to capture the existing users' general consumption behaviors. Then, we formalize initial items recommendation as the influential seed set selection problem. Along this line, we present several methods, each of which selects seed items according to different rules. Finally, the experimental results on two real-world data sets verify that with different seed items, the users' consumption numbers will be quite different. Meanwhile, the results also provide many deep insights into these selection methods and their recommended seed items.

[17] Accelerate TV-L1 optical flow with edge-based image decomposition and its implementation on mobile phone / Wang, Botao / Zhu, Qingxiang / Xiong, Hongkai / Luo, Chuanfei Proceedings of the 2011 International Conference on Mobile and Ubiquitous Multimedia 2011-12-07 p.144-151
ACM Digital Library Link
Summary: Variational methods are among the most accurate techniques of optical flow computation. TV-L1 optical flow, which is based on L1-norm data fidelity term and total variation (TV) regularization term, preserves discontinuities in the flow field and also can deal with large displacements. However, the TV-L1 optical flow method is inaccurate near edges and computationally intensive. In this paper, we proposed a technique, called Edge-based Image Decomposition (EID), to improve the accuracy in the edge areas and also accelerate the original TV-L1 method. EID improves the performance by decomposing image into edge regions and flat regions, and also assigns computing power discriminatively. We evaluated our algorithm on Middlebury datasets and proved that by applying EID, 30% of run-time can be saved with no loss in accuracy, and with same run-time, 7% of accuracy can be promoted. In addition, we implemented our EID-enhanced TV-L1 optical flow algorithm on mobile phone with Android operating system. Our application calculates the optical flow field between two images and can be used to generate the disparity map and reconstruct 3D scenes.

[18] Towards expert finding by leveraging relevant categories in authority ranking Poster session: knowledge management / Zhu, Hengshu / Cao, Huanhuan / Xiong, Hui / Chen, Enhong / Tian, Jilei Proceedings of the 2011 ACM Conference on Information and Knowledge Management 2011-10-24 p.2221-2224
ACM Digital Library Link
Summary: How to improve authority ranking is a crucial research problem for expert finding. In this paper, we propose a novel framework for expert finding based on the authority information in the target category as well as the relevant categories. First, we develop a scalable method for measuring the relevancy between categories through topic models. Then, we provide a link analysis approach for ranking user authority by considering the information in both the target category and the relevant categories. Finally, the extensive experiments on two large-scale real-world Q&A data sets clearly show that the proposed method outperforms the baseline methods with a significant margin.

[19] Collaborative filtering with collective training Poster session 2 / Ge, Yong / Xiong, Hui / Tuzhilin, Alexander / Liu, Qi Proceedings of the 2011 ACM Conference on Recommender Systems 2011-10-23 p.281-284
ACM Digital Library Link
Summary: Rating sparsity is a critical issue for collaborative filtering. For example, the well-known Netflix Movie rating data contain ratings of only about 1% user-item pairs. One way to address this rating sparsity problem is to develop more effective methods for training rating prediction models. To this end, in this paper, we introduce a collective training paradigm to automatically and effectively augment the training ratings. Essentially, the collective training paradigm builds multiple different Collaborative Filtering (CF) models separately, and augments the training ratings of each CF model by using the partial predictions of other CF models for unknown ratings. Along this line, we develop two algorithms, Bi-CF and Tri-CF, based on collective training. For Bi-CF and Tri-CF, we collectively and iteratively train two and three different CF models via iteratively augmenting training ratings for individual CF model. We also design different criteria to guide the selection of augmented training ratings for Bi-CF and Tri-CF. Finally, the experimental results show that Bi-CF and Tri-CF algorithms can significantly outperform baseline methods, such as neighborhood-based and SVD-based models.

[20] PHD-THESIS Combining subject expert experimental data with standard data in Bayesian mixture modeling / Xiong, Hui / Allen, Theodore 2011 Columbus, Ohio Ohio State University, Industrial and Systems Engineering
oclcnum: 755555928
Keywords: Quality engineering
Keywords: Bayesian mixture model
Keywords: Topic model
Keywords: Unstructured data
Keywords: Freestyle text
Keywords: Collapsed Gibbs Sampling
Keywords: Text mining
Keywords: Data mining
Keywords: Human computer interaction
Keywords: Subject matter expert
rave.ohiolink.edu/etdc/view.cgi
Summary: Engineers face many quality-related datasets containing free-style text or images. For example, a database could include summaries of complaints filed by customers, or descriptions of the causes of rework or maintenance or of the associated actions taken, or a collection of quality inspection images of welded tubes. The goal of this dissertation is to enable engineers to input a database of free-style text or image data and then obtain a set of clusters or "topics" with intuitive definitions and information about the degree of commonality that together helps prioritize system improvement. The proposed methods generate Pareto charts of ranked clusters or topics with their interpretability improved by input from the analyst or method user. The combination of subject matter expert data with standard data is the novel feature of the methods considered. Prior to the methods proposed here, analysts applied Bayesian mixture models and had limited recourse if the cluster or topic definitions failed to be interpretable or are at odds with the knowledge of subject matter experts. The associated "Subject Matter Expert Refined Topic" (SMERT) model permits on-going knowledge elicitation and high-level human expert data integration to address the issues regarding: (1) unsupervised topic models often produce results to user, and (2) to provide a "Hierarchical Analysis Designed Latency Experiment" (HANDLE) for human expert to interact with the model results. If groupings are missing key elements, so-called "boosting" these elements is possible. If certain members of a cluster are nonsensical or nonphysical, so-called "zapping" these nonsensical elements is possible. We also describe a fast Collapsed Gibbs Sampling (CGS) algorithm for SMERT method, which offers the capacity to efficiently SMERT model large datasets but which is associated with approximations in certain cases. We use three case studies to illustrate the proposed methods. 
The first relates to scrap text reports for a Chinese manufacturer of stone products. The second relates to laser welding of tube joints and images characterizing bead shape. The third case study relates to consumer reports text user reviews of the Toyota Camry. The user reviews cover 10 years and the widely publicized acceleration issue. In all cases, the SMERT models help provide interpretable groupings of records in a way that could facilitate data-driven prioritization of improvement actions.

[21] Collaborative Dual-PLSA: mining distinction and commonality across multiple domains for text classification KM track: classification and clustering / Zhuang, Fuzhen / Luo, Ping / Shen, Zhiyong / He, Qing / Xiong, Yuhong / Shi, Zhongzhi / Xiong, Hui Proceedings of the 2010 ACM Conference on Information and Knowledge Management 2010-10-26 p.359-368
ACM Digital Library Link
Summary: The distribution difference among multiple data domains has been considered for the cross-domain text classification problem. In this study, we show two new observations along this line. First, the data distribution difference may come from the fact that different domains use different key words to express the same concept. Second, the association between this conceptual feature and the document class may be stable across domains. These two issues are actually the distinction and commonality across data domains.
    Inspired by the above observations, we propose a generative statistical model, named Collaborative Dual-PLSA (CD-PLSA), to simultaneously capture both the domain distinction and commonality among multiple domains. Different from Probabilistic Latent Semantic Analysis (PLSA) with only one latent variable, the proposed model has two latent factors y and z, corresponding to word concept and document class respectively. The shared commonality intertwines with the distinctions over multiple domains, and is also used as the bridge for knowledge transformation. We exploit an Expectation Maximization (EM) algorithm to learn this model, and also propose its distributed version to handle the situation where the data domains are geographically separated from each other. Finally, we conduct extensive experiments over hundreds of classification tasks with multiple source domains and multiple target domains to validate the superiority of the proposed CD-PLSA model over existing state-of-the-art methods of supervised and transfer learning. In particular, we show that CD-PLSA is more tolerant of distribution differences.

[22] Exploiting user interests for collaborative filtering: interests expansion via personalized ranking Poster session 3: KM track / Liu, Qi / Chen, Enhong / Xiong, Hui / Ding, Chris H. Q. Proceedings of the 2010 ACM Conference on Information and Knowledge Management 2010-10-26 p.1697-1700
ACM Digital Library Link
Summary: In real applications, a given user buys or rates an item based on his/her interests. Learning to leverage this interest information is often critical for recommender systems. However, in existing recommender systems, the information about latent user interests is largely under-explored. To that end, in this paper, we propose an interest expansion strategy via personalized ranking based on the topic model, named iExpand, for building an interest-oriented collaborative filtering framework. The iExpand method introduces a three-layer, user-interest-item representation scheme, which leads to more interpretable recommendation results and helps the understanding of the interactions among users, items, and user interests. Moreover, iExpand strategically deals with many issues, such as the overspecialization and cold-start problems. Finally, we evaluate iExpand on benchmark data sets, and experimental results show that iExpand outperforms state-of-the-art methods.

[23] Top-Eye: top-k evolving trajectory outlier detection Poster session 3: KM track / Ge, Yong / Xiong, Hui / Zhou, Zhi-hua / Ozdemir, Hasan / Yu, Jannite / Lee, K. C. Proceedings of the 2010 ACM Conference on Information and Knowledge Management 2010-10-26 p.1733-1736
ACM Digital Library Link
Summary: The increasing availability of large-scale location traces creates unprecedented opportunities to change the paradigm for identifying abnormal moving activities. Indeed, various aspects of abnormality of moving patterns have recently been exploited, such as wrong direction and wandering. However, there is no recognized way of combining different aspects into a unified evolving abnormality score which has the ability to capture the evolving nature of abnormal moving trajectories. To that end, in this paper, we provide an evolving trajectory outlier detection method, named TOP-EYE, which continuously computes the outlying score for each trajectory in an accumulating way. Specifically, in TOP-EYE, we introduce a decay function to mitigate the influence of the past trajectories on the evolving outlying score, which is defined based on the evolving moving direction and density of trajectories. This decay function enables the evolving computation of accumulated outlying scores along the trajectories. An advantage of TOP-EYE is its ability to identify evolving outliers at a very early stage with a relatively low false alarm rate. Finally, experimental results on real-world location traces show that TOP-EYE can effectively capture evolving abnormal trajectories.

[24] Enhancing recommender systems under volatile user interest drifts IR personalization and social search I / Cao, Huanhuan / Chen, Enhong / Yang, Jie / Xiong, Hui Proceedings of the 2009 ACM Conference on Information and Knowledge Management 2009-11-02 p.1257-1266
ACM Digital Library Link
Summary: This paper presents a systematic study of how to enhance recommender systems under volatile user interest drifts. A key development challenge along this line is how to track user interests dynamically. To this end, we first define four types of interest patterns to understand users' rating behaviors and analyze the properties of these patterns. We also propose a rating graph and rating chain based approach for detecting these interest patterns. For each user's rating series, a rating graph and a rating chain are constructed based on the similarities between rated items. The type of a given user's interest pattern is identified through the density of the corresponding rating graph and the continuity of the corresponding rating chain. In addition, we propose a general algorithm framework for improving recommender systems by exploiting these identified patterns. Finally, experimental results on a real-world data set show that the proposed rating graph based approach is effective for detecting user interest patterns, which in turn helps to improve the performance of recommender systems.

[25] What's behind topic formation and development: a perspective of community core groups Poster session 6: IR track / Qian, Tieyun / Li, Qing / Liu, Bing / Xiong, Hui / Srivastava, Jaideep / Sheu, Phillip Proceedings of the 2009 ACM Conference on Information and Knowledge Management 2009-11-02 p.1843-1846
ACM Digital Library Link
Summary: Over the past several years, there has been great interest in topic detection and tracking (TDT). Recently, analyzing general research trends from huge amounts of historical documents has also attracted considerable attention. However, existing work on TDT mainly focuses on overall trend analysis, and is unable to address questions such as "what determines the evolution of a topic?" and "when and how does a new topic get formed?".
    In this paper, we propose a core group model to explain the dynamics and further segment topic development. According to the division phase and interphase in the life cycle of a core group, a topic is separated into four states, i.e., the birth state, extending state, saturation state, and shrinkage state. Experimental results on a real dataset show that the division of a core group brings about the generation of a new topic, and the progress of an entire topic is closely correlated with the growth of a core group during its interphase.