HCI Bibliography : Search Results
Database updated: 2016-05-10 Searches since 2006-12-01: 32,646,435
Hosted by ACM SIGCHI
The HCI Bibliography was moved to a new server on 2015-05-12 and again on 2016-01-05, substantially degrading the environment for making updates.
There are no plans to add to the database.
Please send questions or comments to director@hcibib.org.
Query: Tian_Q* Results: 23 Sorted by: Date
[1] Perception-Guided Multimodal Feature Fusion for Photo Aesthetics Assessment Multimedia HCI and QoE / Zhang, Luming / Gao, Yue / Zhang, Chao / Zhang, Hanwang / Tian, Qi / Zimmermann, Roger Proceedings of the 2014 ACM International Conference on Multimedia 2014-11-03 p.237-246
ACM Digital Library Link
Summary: Photo aesthetic quality evaluation is a challenging task in the multimedia and computer vision fields. Conventional approaches suffer from the following three drawbacks: 1) the deemphasized role of semantic content, which is many times more important than low-level visual features in photo aesthetics; 2) the difficulty of optimally fusing low-level and high-level visual cues in photo aesthetics evaluation; and 3) the absence of a sequential viewing path in the existing models, whereas humans perceive visually salient regions sequentially when viewing a photo.
    To solve these problems, we propose a new aesthetic descriptor that mimics how humans sequentially perceive visually/semantically salient regions in a photo. In particular, a weakly supervised learning paradigm is developed to project the local aesthetic descriptors (graphlets in this work) into a low-dimensional semantic space. Thereafter, each graphlet can be described by multiple types of visual features, at both low and high levels. Since humans usually perceive only a few salient regions in a photo, a sparsity-constrained graphlet ranking algorithm is proposed that seamlessly integrates both the low-level and the high-level visual cues. Top-ranked graphlets are the visually/semantically prominent graphlets in a photo. They are sequentially linked into a path that simulates the process of active human viewing. Finally, we learn a probabilistic aesthetic measure based on such actively viewing paths (AVPs) from the training photos that are marked as aesthetically pleasing by multiple users. Experimental results show that: 1) the AVPs are 87.65% consistent with real human gaze shifting paths, as verified by eye-tracking data; and 2) our photo aesthetic measure outperforms many of its competitors.

[2] Fused one-vs-all mid-level features for fine-grained visual categorization Multimedia Analysis and Mining / Zhang, Xiaopeng / Xiong, Hongkai / Zhou, Wengang / Tian, Qi Proceedings of the 2014 ACM International Conference on Multimedia 2014-11-03 p.287-296
ACM Digital Library Link
Summary: As an emerging research topic, fine-grained visual categorization has been attracting growing attention in recent years. Due to the large inter-class similarity and intra-class variance, recognizing objects in fine-grained domains is extremely challenging, and sometimes even humans cannot recognize them accurately. The traditional bag-of-words model can obtain desirable results for basic-level category classification through weak alignment with the spatial pyramid matching model, but it may easily fail in fine-grained domains, since the discriminative features are not only subtle but also extremely localized. The fine differences often get swamped by irrelevant features, and it is virtually impossible to distinguish them. To address these problems, we propose a new framework for fine-grained visual categorization. We strengthen the spatial correspondence among parts by including foreground segmentation and part localization. Based on the part representations of the images, we learn a large set of mid-level features which are more suitable for fine-grained tasks. Compared with the low-level features directly extracted from the images, the learned one-vs-all mid-level features enjoy the following advantages. First, the dimension of the mid-level features is relatively small. To obtain high classification accuracy, the dimension of the low-level features usually reaches several thousand to tens of thousands, and becomes even larger when a spatial pyramid model is introduced; the dimension of our mid-level features, in contrast, is related to the number of classes, which is far smaller. Second, each entry of the proposed mid-level features is meaningful, which yields a more compact representation of the image. Third, the mid-level features are more robust than the low-level ones, which is helpful for classification. Fourth, the learning process of the mid-level features is independent and can easily be combined with other techniques to boost performance. We evaluate the proposed approach on the extensive fine-grained datasets CUB-200-2011 and Stanford Dogs; by learning the mid-level features based on the popular Fisher vectors and convolutional neural networks, we boost the classification accuracy by a considerable margin and advance the state-of-the-art performance in fine-grained visual categorization.

[3] Social Embedding Image Distance Learning Multimedia Recommendations / Liu, Shaowei / Cui, Peng / Zhu, Wenwu / Yang, Shiqiang / Tian, Qi Proceedings of the 2014 ACM International Conference on Multimedia 2014-11-03 p.617-626
ACM Digital Library Link
Summary: Image distance (similarity) is a fundamental and important problem in image processing. However, traditional image distance metrics based on visual features usually fail to capture human cognition. This paper presents a novel Social embedding Image Distance Learning (SIDL) approach to embed the similarity of collective social and behavioral information into the visual space. The social similarity is estimated according to multiple social factors. A metric learning method is then specially designed to learn the distance of visual features from the estimated social similarity. In this manner, we can evaluate the cognitive image distance based on the visual content of images. Comprehensive experiments are designed to investigate the effectiveness of SIDL, as well as its performance in image recommendation and reranking tasks. The experimental results show that the proposed approach makes a marked improvement over the state-of-the-art image distance metrics. An interesting observation is also given to show that the learned image distance can better reflect human cognition.

[4] Scalable Image Search with Reliable Binary Code Posters 1 / Ren, Guangxin / Cai, Junjie / Li, Shipeng / Yu, Nenghai / Tian, Qi Proceedings of the 2014 ACM International Conference on Multimedia 2014-11-03 p.769-772
ACM Digital Library Link
Summary: In many existing image retrieval algorithms, the Bag-of-Words (BoW) model has been widely adopted for image representation. To achieve accurate indexing and efficient retrieval, local features such as the SIFT descriptor are extracted and quantized to visual words. One of the most popular quantization schemes is scalar quantization, which generates a binary signature with an empirical threshold value. However, such a binarization strategy inevitably suffers from the quantization loss induced by each quantized bit and impairs search performance. In this paper, we investigate the reliability of each bit in scalar quantization and propose a novel reliable binary SIFT feature. We go one step further to incorporate the reliability in both index word expansion and feature similarity. Our proposed approach not only accelerates the search by narrowing the search space, but also improves retrieval accuracy by alleviating the impact of unreliable quantized bits. Experimental results demonstrate that the proposed approach achieves significant improvements in retrieval efficiency and accuracy.
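As a rough illustration of the scalar-quantization step described above, the following Python sketch binarizes a SIFT descriptor against an empirical threshold and flags bits near the threshold as unreliable. The median threshold, the reliability margin, and the reliability-masked Hamming distance are assumptions made here for illustration, not the authors' exact formulation.

```python
import numpy as np

def binary_sift(descriptor, threshold=None, margin=5.0):
    """Scalar-quantize a SIFT descriptor into a binary signature.

    Hypothetical sketch: each dimension is thresholded (here against the
    descriptor's own median), and a bit is flagged 'reliable' only when the
    component lies at least `margin` away from the threshold.
    """
    d = np.asarray(descriptor, dtype=float)
    if threshold is None:
        threshold = np.median(d)          # empirical per-descriptor threshold
    bits = (d > threshold).astype(np.uint8)
    reliable = np.abs(d - threshold) >= margin
    return bits, reliable

def weighted_hamming(bits_a, rel_a, bits_b, rel_b):
    """Hamming distance that ignores positions where either bit is unreliable."""
    mask = rel_a & rel_b
    return int(np.count_nonzero((bits_a ^ bits_b) & mask))

# toy usage with two random 128-D "SIFT" descriptors
rng = np.random.default_rng(0)
a, b = rng.integers(0, 256, 128), rng.integers(0, 256, 128)
ba, ra = binary_sift(a)
bb, rb = binary_sift(b)
print(weighted_hamming(ba, ra, bb, rb))
```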

[5] Personalized Visual Vocabulary Adaption for Social Image Retrieval Posters 2 / Niu, Zhenxing / Zhang, Shiliang / Gao, Xinbo / Tian, Qi Proceedings of the 2014 ACM International Conference on Multimedia 2014-11-03 p.993-996
ACM Digital Library Link
Summary: With the popularity of mobile devices and social networks, users can easily build personalized image sets. Thus, personalized image analysis, indexing, and retrieval have become important topics in social media analysis. Because of users' diverse preferences, their personalized image sets are usually related to specific topics and show a large feature-distribution bias from general Internet images. Therefore, a visual vocabulary trained on general Internet images may not fit users' personalized image sets very well. To improve image retrieval performance on personalized image sets, we propose personalized visual vocabulary adaption, which removes non-discriminative visual words and replaces them with more exact and discriminative ones, i.e., it adapts a general vocabulary toward a specific user's image set. The proposed algorithm updates the visual vocabulary during off-line feature quantization and operates on a limited number of visual words, and hence is efficient. Extensive image search experiments on public datasets demonstrate the efficiency and superior performance of our approach.

[6] Image Re-ranking with an Alternating Optimization Posters 3 / Pang, Shanmin / Xue, Jianru / Gao, Zhanning / Tian, Qi Proceedings of the 2014 ACM International Conference on Multimedia 2014-11-03 p.1141-1144
ACM Digital Library Link
Summary: In this work, we propose an efficient image re-ranking method, without additional memory cost compared with the baseline method [8], to re-rank all retrieved images. The motivation of the proposed method is that there are usually many visual words in the query image that only give votes to irrelevant images. With this observation, we propose to re-rank the retrieved images using only the visual words that help to find relevant images. To achieve this goal, we first find some images similar to the query by maximizing a quadratic function, given an initial ranking of the retrieved images. Then we select query visual words with an alternating optimization strategy: (1) at each iteration, select words based on the similar images that we have found, and (2) in turn, update the similar images with the selected words. These two steps are repeated until convergence. Experimental results on standard benchmark datasets show that the proposed method outperforms spatial-based re-ranking methods.

[7] Discriminative coupled dictionary hashing for fast cross-media retrieval Session 4c: more hashing / Yu, Zhou / Wu, Fei / Yang, Yi / Tian, Qi / Luo, Jiebo / Zhuang, Yueting Proceedings of the 2014 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2014-07-06 p.395-404
ACM Digital Library Link
Summary: Cross-media hashing, which conducts cross-media retrieval by embedding data from different modalities into a common low-dimensional Hamming space, has attracted intensive attention in recent years. Existing cross-media hashing approaches only aim at learning hash functions that preserve the intra-modality and inter-modality correlations, but do not directly capture the underlying semantic information of the multi-modal data. We propose a discriminative coupled dictionary hashing (DCDH) method in this paper. In DCDH, the coupled dictionary for each modality is learned with side information (e.g., categories). As a result, the coupled dictionaries not only preserve the intra-similarity and inter-correlation among multi-modal data, but also contain dictionary atoms that are semantically discriminative (i.e., data from the same category are reconstructed by similar dictionary atoms). To perform fast cross-media retrieval, we learn hash functions which map data from the dictionary space to a low-dimensional Hamming space. In addition, we conjecture that a balanced representation is crucial in cross-media retrieval. We therefore introduce multi-view features on the relatively "weak" modalities into DCDH and extend it to multi-view DCDH (MV-DCDH) in order to enhance their representation capability. Experiments on two real-world data sets show that our DCDH and MV-DCDH significantly outperform the state-of-the-art methods on cross-media retrieval.

[8] Topology preserving hashing for similarity search Similarity search / Zhang, Lei / Zhang, Yongdong / Tang, Jinhui / Gu, Xiaoguang / Li, Jintao / Tian, Qi Proceedings of the 2013 ACM International Conference on Multimedia 2013-10-21 p.123-132
ACM Digital Library Link
Summary: Binary hashing has been widely used for efficient similarity search. Learning efficient codes has become a research focus, yet it remains a challenge. In many cases, real-world data lie on a low-dimensional manifold, which should be taken into account to capture meaningful neighbors with hashing. The importance of a manifold lies in its topology, which represents the neighborhood relationships between its subregions and the relative proximities between the neighbors of each subregion, e.g., the relative ranking of the neighbors of each subregion. Most existing hashing methods try to preserve the neighborhood relationships by mapping similar points to close codes, while ignoring the neighborhood rankings. Moreover, most hashing methods fall short of providing a good ranking for query results since they use the Hamming distance as the similarity metric, and in practice many results often share the same distance to a query. In this paper, we propose a novel hashing method that addresses these two issues jointly. The proposed method is referred to as Topology Preserving Hashing (TPH). TPH is distinct from prior work in preserving the neighborhood rankings of data points in Hamming space. The learning stage of TPH is formulated as a generalized eigendecomposition problem with a closed-form solution. Experimental comparisons with other state-of-the-art methods on three noted image benchmarks demonstrate the efficacy of the proposed method.
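The abstract states only that TPH's learning stage reduces to a generalized eigendecomposition with a closed-form solution. As a hedged sketch of that generic step (the matrices S_t and S_d below are placeholders standing in for whatever topology- and distance-related matrices TPH actually builds from the data), one would solve the generalized eigenproblem and binarize the projections by sign:

```python
import numpy as np
from scipy.linalg import eigh

def learn_hash_projections(S_t, S_d, n_bits):
    """Solve the generalized eigenproblem S_t w = lambda S_d w and keep the
    top-`n_bits` eigenvectors as hashing projections.

    S_t, S_d: (d, d) symmetric matrices (placeholders, not TPH's exact construction).
    """
    vals, vecs = eigh(S_t, S_d)                        # eigenvalues in ascending order
    return vecs[:, np.argsort(vals)[::-1][:n_bits]]    # (d, n_bits), largest first

def hash_codes(X, W, mean=None):
    """Project centered data and binarize by sign."""
    if mean is None:
        mean = X.mean(axis=0)
    return ((X - mean) @ W > 0).astype(np.uint8)

# toy usage: random symmetric positive-definite placeholders
rng = np.random.default_rng(1)
A = rng.standard_normal((16, 16)); S_t = A @ A.T + np.eye(16)
B = rng.standard_normal((16, 16)); S_d = B @ B.T + np.eye(16)
W = learn_hash_projections(S_t, S_d, n_bits=8)
codes = hash_codes(rng.standard_normal((5, 16)), W)
print(codes)
```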

[9] Stereotime: a wireless 2D and 3D switchable video communication system Demos / Yang, You / Liu, Qiong / Gao, Yue / Xiong, Binbin / Yu, Li / Luan, Huanbo / Ji, Rongrong / Tian, Qi Proceedings of the 2013 ACM International Conference on Multimedia 2013-10-21 p.473-474
ACM Digital Library Link
Summary: Mobile 3D video communication, especially with 2D/3D compatibility, is a new paradigm for both video communication and 3D video processing. Current techniques face challenges on mobile devices when bundled constraints such as computational resources and compatibility must be considered. In this work, we present a wireless 2D- and 3D-switchable video communication system, named Stereotime, to address these challenges. The methods of Zig-Zag fast object segmentation, depth cue detection and merging, and texture-adaptive view generation are used for 3D scene reconstruction. We show the functionalities and compatibilities on 3D mobile devices in a WiFi network environment.

[10] Object coding on the semantic graph for scene classification Posters / Chen, Jingjing / Han, Yahong / Cao, Xiaochun / Tian, Qi Proceedings of the 2013 ACM International Conference on Multimedia 2013-10-21 p.493-496
ACM Digital Library Link
Summary: In scene classification, a scene can be considered a set of object cliques. Objects inside each clique have semantic correlations with each other, while two objects from different cliques are relatively independent. To utilize these correlations for better recognition performance, we propose a new method -- Object Coding on the Semantic Graph -- to address the scene classification problem. We first exploit prior knowledge by gathering statistics over a large number of labeled images and calculating the dependency degree between objects. Then, a graph is built to model the semantic correlations between objects. This semantic graph captures semantics by treating the objects as vertices and the object affinities as the weights of edges. By encoding this semantic knowledge into the graph, object coding is conducted to automatically select a set of object cliques with strong semantic correlations to represent a specific scene. The experimental results show that Object Coding on the Semantic Graph improves classification accuracy.

[11] Beyond bag of words: image representation in sub-semantic space Posters / Zhang, Chunjie / Wang, Shuhui / Liang, Chao / Liu, Jing / Huang, Qingming / Li, Haojie / Tian, Qi Proceedings of the 2013 ACM International Conference on Multimedia 2013-10-21 p.497-500
ACM Digital Library Link
Summary: Due to the semantic gap, low-level features are not able to represent images well semantically. Besides, traditional semantics-related image representations may not be able to cope with large inter-class variations and are not very robust to noise. To solve these problems, in this paper we propose a novel image representation method in the sub-semantic space. First, exemplar classifiers are trained by separating each training image from the others and serve as weak semantic similarity measurements. Then a graph is constructed by combining the visual similarity and weak semantic similarity of these training images. We partition this graph into visually and semantically similar sub-sets. Each sub-set of images is then used to train classifiers that separate this sub-set from the others. The learned sub-set classifiers are then used to construct a sub-semantic-space-based representation of images. This sub-semantic space is not only more semantically meaningful but also more reliable and resistant to noise. Finally, we categorize images using this sub-semantic-space-based representation on several public datasets to demonstrate the effectiveness of the proposed method.

[12] What are the distance metrics for local features? Posters / Mao, Zhendong / Zhang, Yongdong / Tian, Qi Proceedings of the 2013 ACM International Conference on Multimedia 2013-10-21 p.505-508
ACM Digital Library Link
Summary: Previous research has found that the distance metric for similarity estimation is determined by the underlying data noise distribution. The well-known Euclidean (L2) and Manhattan (L1) metrics are then justified when the additive noise is Gaussian and Exponential, respectively. However, finding a suitable distance metric for local features is still a challenge when the underlying noise distribution is unknown and could be neither Gaussian nor Exponential. To address this issue, we introduce a modeling framework for arbitrary noise distributions and propose a generalized distance metric for local features based on this framework. We prove that the proposed distance is equivalent to the L1 or the L2 distance when the noise is Exponential or Gaussian, respectively. Furthermore, we justify the Hamming metric when the noise meets certain conditions; in that case, the proposed distance is a linear mapping of the Hamming distance. The proposed metric has been extensively tested on a benchmark data set with five state-of-the-art local features: SIFT, SURF, BRIEF, ORB and BRISK. Experiments show that our framework better models the real noise distributions and that more robust results can be obtained by using the proposed distance metric.
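For reference, the three classical metrics the abstract relates (L1, L2, and Hamming) can be written down side by side. The snippet below implements only these standard distances, not the paper's generalized metric, whose form is not given in the abstract.

```python
import numpy as np

def l1(a, b):
    """Manhattan distance: optimal when the additive noise is Exponential."""
    return float(np.sum(np.abs(a - b)))

def l2(a, b):
    """Euclidean distance: optimal when the additive noise is Gaussian."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

def hamming(a, b):
    """Hamming distance for binary descriptors such as BRIEF, ORB or BRISK."""
    return int(np.count_nonzero(np.asarray(a, dtype=np.uint8) ^ np.asarray(b, dtype=np.uint8)))

rng = np.random.default_rng(0)
x, y = rng.standard_normal(128), rng.standard_normal(128)      # SIFT/SURF-like vectors
bx, by = rng.integers(0, 2, 256), rng.integers(0, 2, 256)       # BRIEF/ORB-like bit strings
print(l1(x, y), l2(x, y), hamming(bx, by))
```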

[13] Locality preserving verification for image search Posters / Pang, Shanmin / Xue, Jianru / Zheng, Nanning / Tian, Qi Proceedings of the 2013 ACM International Conference on Multimedia 2013-10-21 p.529-532
ACM Digital Library Link
Summary: Establishing correct correspondences between two images has a wide range of applications, such as 2D and 3D registration, structure from motion, and image retrieval. In this paper, we propose a new matching method based on spatial constraints. The proposed method has linear time complexity and is efficient when applied to image retrieval. The main assumption behind our method is that the local geometric structure formed by a feature point and its neighbors is not easily affected by geometric or photometric transformations, and thus should be preserved in the corresponding images. We model this local geometric structure by the linear coefficients that reconstruct the point from its neighbors. The method is flexible, as it can not only estimate the number of correct matches between two images efficiently, but also determine the correctness of each match accurately. Furthermore, it is simple and easy to implement. When applying the proposed method to re-ranking images in an image search engine, it outperforms state-of-the-art techniques.
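A minimal sketch of the "linear coefficients that reconstruct the point from its neighbors" idea is given below. The sum-to-one constraint (borrowed from locally linear embedding for translation invariance) is an assumption made here for illustration, not necessarily the authors' exact construction.

```python
import numpy as np

def reconstruction_coeffs(p, neighbors, reg=1e-6):
    """Sum-to-one coefficients that reconstruct point p from its spatial neighbors.

    Hypothetical sketch in the spirit of locally linear embedding; the abstract
    only says 'linear coefficients', so the constraint is an assumption here.
    """
    Z = np.asarray(neighbors, float) - np.asarray(p, float)   # (k, 2) offsets
    G = Z @ Z.T                                               # local Gram matrix
    G += reg * np.trace(G) * np.eye(len(G))                   # regularize (G may be singular)
    w = np.linalg.solve(G, np.ones(len(G)))
    return w / w.sum()

def match_residual(p2, neighbors2, w):
    """Residual when the first image's coefficients are reused on the matched point
    and neighbors in the second image; a small residual means geometry is preserved."""
    return float(np.linalg.norm(np.asarray(neighbors2, float).T @ w - np.asarray(p2, float)))

# toy usage: the second image is a translated copy, so the local structure is preserved
p1 = np.array([10.0, 20.0])
n1 = np.array([[8.0, 18.0], [12.0, 19.0], [10.0, 23.0]])
shift = np.array([5.0, -3.0])
w = reconstruction_coeffs(p1, n1)
print(match_residual(p1 + shift, n1 + shift, w))   # ~0: keep the match
```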

[14] Undo the codebook bias by linear transformation for visual applications Posters / Zhang, Chunjie / Zhang, Yifan / Wang, Shuhui / Pang, Junbiao / Liang, Chao / Huang, Qingming / Tian, Qi Proceedings of the 2013 ACM International Conference on Multimedia 2013-10-21 p.533-536
ACM Digital Library Link
Summary: The bag of visual words model (BoW) and its variants have demonstrated their effectiveness for visual applications and have been widely used by researchers. The BoW model first extracts local features and generates the corresponding codebook, whose elements are viewed as visual words. The local features within each image are then encoded to get the final histogram representation. However, the codebook is dataset dependent and has to be generated for each image dataset. This costs considerable computational time and weakens the generalization power of the BoW model. To solve these problems, in this paper we propose to undo the dataset bias by codebook linear transformation. To represent every point within the local feature space using the Euclidean distance, the number of bases should be no less than the space dimension. Hence, each codebook can be viewed as a linear transformation of these bases. In this way, we can transform pre-learned codebooks for a new dataset. However, not all visual words are equally important for the new dataset; it is more effective to select the most discriminative visual words for transformation using sparsity constraints. We propose an alternating optimization algorithm to jointly search for the optimal linear transformation matrices and the encoding parameters. Image classification experiments on several image datasets show the effectiveness of the proposed method.

[15] Static saliency vs. dynamic saliency: a comparative study Scene understanding / Nguyen, Tam V. / Xu, Mengdi / Gao, Guangyu / Kankanhalli, Mohan / Tian, Qi / Yan, Shuicheng Proceedings of the 2013 ACM International Conference on Multimedia 2013-10-21 p.987-996
ACM Digital Library Link
Summary: Visual saliency has recently attracted wide attention from researchers in the computer vision and multimedia fields. However, most visual saliency research has been conducted on still images, i.e., on static saliency. In this paper, we give the first comprehensive comparative study of dynamic saliency (video shots) and static saliency (key frames of the corresponding video shots), and two key observations are obtained: 1) video saliency is often different from, yet quite related to, image saliency, and 2) camera motions, such as tilting, panning or zooming, affect dynamic saliency significantly. Motivated by these observations, we propose a novel camera-motion- and image-saliency-aware model for dynamic saliency prediction. Extensive experiments on two static-vs-dynamic saliency datasets collected by us show that our proposed method outperforms the state-of-the-art methods for dynamic saliency prediction. Finally, we also introduce the application of dynamic saliency prediction to dynamic video captioning, assisting people with hearing impairments to better enjoy videos that contain only off-screen voices, e.g., documentary films, news videos and sports videos.

[16] Scale based region growing for scene text detection Scene understanding / Mao, Junhua / Li, Houqiang / Zhou, Wengang / Yan, Shuicheng / Tian, Qi Proceedings of the 2013 ACM International Conference on Multimedia 2013-10-21 p.1007-1016
ACM Digital Library Link
Summary: Scene text is widely observed in our daily life and has many important multimedia applications. Unlike document text, scene text usually exhibits large variations in font and language, and suffers from low resolution, occlusions and complex backgrounds. In this paper, we present a novel scale-based region growing algorithm for scene text detection. We first distinguish SIFT features in text regions from those in the background by exploring the inter- and intra-statistics of SIFT features. Scene text regions in images are then identified by scale-based region growing, which explores the geometric context of SIFT keypoints in local regions. Our algorithm is very effective at detecting multilingual text in various fonts and sizes and against complex backgrounds. In addition, it offers insights into efficiently deploying local features in numerous applications, such as visual search. We evaluate our algorithm on three datasets and achieve state-of-the-art performance.

[17] Extraction of Light Stripe Centerline Based on Self-adaptive Thresholding and Contour Polygonal Representation Ergonomics of Work with Computers / Tian, Qingguo / Yang, Yujie / Zhang, Xiangyu / Ge, Baozhen DHM 2013: 4th International Conference on Digital Human Modeling and Applications in Health, Safety, Ergonomics, and Risk Management, Part II: Human Body Modeling and Ergonomics 2013-07-21 v.2 p.292-301
Keywords: centerline extraction; light stripe; integral image thresholding; polygon representation; adaptive center of mass
Link to Digital Content at Springer
Summary: Extracting the light stripe centerline is the key step in a line-structured-light scanning visual measurement system; it directly determines the quality of the three-dimensional point clouds obtained from the images. Due to the reflectivity and/or color of the object surface, changes in illumination conditions and other factors, the gray value and curvature of the light stripe in the image vary greatly, which makes it very difficult to extract the sub-pixel centerline completely and precisely. This paper presents a novel method for efficient light stripe centerline extraction. It combines the integral image thresholding method, a polygonal representation of the light stripe contour and the adaptive center of mass method. It first locates the light stripe region and produces a binary image regardless of how the gray values of the light stripe change against the background. Then the contour of the light stripe is extracted and approximated by a polygon. Based on the local orthogonality between the direction of the light stripe cross-section and the corresponding polygon segment, the direction of the light stripe cross-section is calculated quickly. Along this direction, sub-pixel centerline coordinates are calculated using the adaptive center of mass method. 3D scanning experiments with a human model dressed in a colorful swimsuit are carried out on a self-designed line-laser 3D scanning system. Comparisons of light stripe segmentation using three thresholding methods, the time used and the smoothness are given, and the results show that the proposed method acquires satisfactory data. The mean time used for one image does not exceed 5 ms, and the completeness and smoothness of the point clouds acquired by the presented method are better than those of the other two methods. This demonstrates the effectiveness and practicability of the proposed method.
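A simplified sketch of the center-of-mass step is shown below. The "adaptive" cut-off used here (discarding pixels below the cross-section's mean intensity) is an assumption for illustration, since the paper's exact adaptation rule is not given in the abstract.

```python
import numpy as np

def subpixel_center(profile, positions=None):
    """Intensity-weighted center of mass of a light-stripe cross-section.

    Simplified sketch: pixels below the profile's own mean are discarded
    before the centroid is computed (a stand-in for the adaptive rule).
    """
    profile = np.asarray(profile, dtype=float)
    if positions is None:
        positions = np.arange(len(profile), dtype=float)
    mask = profile > profile.mean()          # adaptive cut-off for this cross-section
    weights = profile[mask]
    if weights.sum() == 0:
        return float("nan")
    return float(np.sum(positions[mask] * weights) / weights.sum())

# toy cross-section: a stripe roughly centered near index 5.4
profile = [3, 4, 6, 20, 90, 200, 180, 70, 15, 5]
print(subpixel_center(profile))
```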

[18] Automated description generation for indoor floor maps Posters and demonstrations / Paladugu, Devi A. / Maguluri, Hima Bindu / Tian, Qiongjie / Li, Baoxin Fourteenth Annual ACM SIGACCESS Conference on Assistive Technologies 2012-10-22 p.211-212
ACM Digital Library Link
Summary: People with visual impairment generally suffer from diminished freedom in navigating an environment. A practical need is to navigate through unfamiliar indoor environments such as school buildings, hotels, etc., for which commonly-used existing tools like canes, seeing-eye dogs and GPS devices cannot provide adequate support. We demonstrate a prototype system that aims at addressing this practical need. The input to the system is the name of the building/establishment supplied by a user, which is used by a web crawler to determine the availability of a floor map on the corresponding website. If available, the map is downloaded and used by the proposed system to generate a verbal description giving an overview of the locations of key landmarks inside the map with respect to one another. Our preliminary survey and experiments indicate that this is a promising direction to pursue in supporting indoor navigation for the visually impaired.

[19] Exploring tag relevance for image tag re-ranking Poster abstracts / Xiao, Jie / Zhou, Wengang / Tian, Qi Proceedings of the 35th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2012-08-12 p.1069-1070
ACM Digital Library Link
Summary: In this paper, we propose to explore the relevance between tags for image tag re-ranking. The key component is a global tag-tag similarity matrix, which is obtained by analysis of both semantic and visual aspects. The textual semantic relevance is explored with the Latent Semantic Indexing (LSI) model [1]. For the visual information, tag relevance is propagated by reconstructing exemplar images from visually and semantically consistent images. Based on our tag relevance matrix, a random-walk approach is leveraged to discover the significance of each tag. Finally, all tags in an image are re-ranked by their significance values. Extensive experiments show the method's effectiveness on an image dataset with a large tag vocabulary.
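A generic sketch of the random-walk step is shown below; the PageRank-style damping factor and the toy similarity matrix are assumptions, since the abstract does not specify how the walk is parameterized or how the LSI/visual similarity matrix is built.

```python
import numpy as np

def tag_significance(S, damping=0.85, iters=100, tol=1e-9):
    """Random-walk (PageRank-style) significance scores over a tag-tag
    similarity matrix S (n_tags x n_tags, non-negative)."""
    S = np.asarray(S, dtype=float)
    n = len(S)
    P = S / np.maximum(S.sum(axis=1, keepdims=True), 1e-12)   # row-stochastic transitions
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r_new = (1 - damping) / n + damping * (P.T @ r)
        if np.abs(r_new - r).sum() < tol:
            break
        r = r_new
    return r

# toy usage: three tags, the first two strongly related
S = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.2],
              [0.1, 0.2, 0.0]])
scores = tag_significance(S)
print(np.argsort(scores)[::-1])   # tags re-ranked by significance
```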

[20] Efficient lp-norm multiple feature metric learning for image categorization Poster session: information retrieval / Wang, Shuhui / Huang, Qingming / Jiang, Shuqiang / Tian, Qi Proceedings of the 2011 ACM Conference on Information and Knowledge Management 2011-10-24 p.2077-2080
ACM Digital Library Link
Summary: Previous metric learning approaches are only able to learn a metric based on a single concatenated multivariate feature representation. However, for many real-world problems with multiple feature representations, such as image categorization, models trained by previous approaches degrade because of the sparsity brought by significant dimension growth and the uncontrolled influence of each feature channel. In this paper, we propose an efficient model that adapts distance metric learning to multiple feature representations. The aim is to learn the Mahalanobis matrices for each independent feature and their non-sparse lp-norm weight coefficients simultaneously, by maximizing the margin, under the overall learned metric, between the distances of pairs from the same class and pairs from different classes. We further extend this method to nonlinear kernel learning and category-specific metric learning, which demonstrates the applicability of using many existing kernels for image data and of exploring the hierarchical semantic structures of large-scale image datasets. Experiments on various datasets demonstrate the promising power of our method.
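The form of the combined metric (a weighted sum of per-channel Mahalanobis distances with lp-normalized weights) can be sketched as follows. How the Mahalanobis matrices and weights are actually learned by margin maximization is not reproduced here, and the helper names are hypothetical.

```python
import numpy as np

def combined_distance(feats_x, feats_y, mahalanobis, weights):
    """Weighted sum of per-feature-channel Mahalanobis distances.

    feats_x, feats_y: lists of feature vectors, one per channel.
    mahalanobis: list of PSD matrices M_m, one per channel.
    weights: non-negative channel weights beta_m.
    Sketch of the metric's form only; the learning step is omitted.
    """
    total = 0.0
    for x, y, M, b in zip(feats_x, feats_y, mahalanobis, weights):
        diff = np.asarray(x, float) - np.asarray(y, float)
        total += b * float(diff @ M @ diff)
    return total

def normalize_lp(weights, p=2.0):
    """Rescale non-negative weights so their lp-norm is 1 (non-sparse for p > 1)."""
    w = np.clip(np.asarray(weights, float), 0.0, None)
    return w / np.linalg.norm(w, ord=p)

# toy usage: two feature channels of different dimensions
rng = np.random.default_rng(0)
x = [rng.standard_normal(4), rng.standard_normal(3)]
y = [rng.standard_normal(4), rng.standard_normal(3)]
M = [np.eye(4), np.eye(3)]
beta = normalize_lp([0.7, 0.3], p=1.5)
print(combined_distance(x, y, M, beta))
```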

[21] Auto-calibration of a Laser 3D Color Digitization System Advances in Digital Human Modeling / Li, Xiaojie / Ge, Bao-zhen / Zhao, Dan / Tian, Qing-guo / Young, K. David DHM 2009: 2nd International Conference on Digital Human Modeling 2009-07-19 p.691-699
Link to Digital Content at Springer
Summary: A typical 3D color digitization system is composed of 3D sensors to obtain 3D information and color sensors to obtain color information. Sensor calibration plays a key role in determining the correctness and accuracy of the 3D color digitization data. In order to carry out the calibration quickly and accurately, this paper introduces an automated calibration process which utilizes 3D dynamic precision fiducials, with which calibration dot pairs are extracted automatically and the corresponding data are processed by a calibration algorithm. This automated process was experimentally verified to be fast and effective. Both the 3D information and the color information are extracted, such that the 3D sensors and the color sensors are calibrated within one automated calibration process. We believe it is the first such calibration process for a 3D color digitization system.

[22] Color 3D Digital Human Modeling and Its Applications to Animation and Anthropometry Part I: Shape and Movement Modeling and Anthropometry / Ge, Bao-zhen / Tian, Qing-guo / Young, K. David / Sun, Yu-chen DHM 2007: 1st International Conference on Digital Human Modeling 2007-07-22 p.82-91
Link to Digital Content at Springer
Summary: With the rapid advancement of laser technology, computer vision, and embedded computing, the application of laser scanning to the digitization of three-dimensional physical realities has become increasingly widespread. In this paper, we focus on research results embodied in a 3D human body color digitization system developed at Tianjin University in collaboration with the Hong Kong University of Science and Technology. In digital human modeling, the first step is the acquisition of 3D human body data. Over the years we have developed laser scanning know-how from first principles to support our research activities on building the first 3D digital human database for ethnic Chinese. The disadvantage of conventional laser scanning is that surface color information is not contained in the point cloud data. By adding color imaging sensors to the developed multi-axis laser scanner, both the 3D human body coordinate data and the body surface color mapping are acquired. Our latest development focuses on skeleton extraction, which is the key step toward human body animation and applications to dynamic anthropometry. For dynamic anthropometric measurements, we first use an animation algorithm to adjust the 3D digital human to the required standard posture for measurement, and then fix the feature points and feature planes based on human body geometric characteristics. Utilizing the feature points, feature planes, and the extracted human body skeleton, we have measured 40 key dimensions for the standing posture and the squat posture. These experimental results are given, and the factors that affect the measurement precision are analyzed through qualitative and quantitative analyses.

[23] Content-based summarization for personal image library Posters / Lim, Joo-Hwee / Li, Jun / Mulhem, Philippe / Tian, Qi JCDL'03: Proceedings of the 3rd ACM/IEEE-CS Joint Conference on Digital Libraries 2003-05-27 p.393
ACM Digital Library Citation
Summary: As consumers' personal image libraries accumulate, the problem of managing, browsing, querying and presenting photos effectively and efficiently becomes critical. We propose a framework for the automatic organization of personal image libraries based on analysis of image creation time stamps and image contents, to facilitate browsing and summarization of images.
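A minimal sketch of the time-stamp analysis mentioned above is given below: photos are split into events whenever the gap between consecutive creation times exceeds a threshold. The two-hour gap is an assumption for illustration, and the content-analysis part of the framework is not modeled.

```python
from datetime import datetime, timedelta

def group_by_time(timestamps, max_gap=timedelta(hours=2)):
    """Split photo creation times (sorted chronologically) into 'events'
    whenever the gap between consecutive photos exceeds max_gap."""
    events, current = [], []
    previous = None
    for t in sorted(timestamps):
        if previous is not None and t - previous > max_gap:
            events.append(current)
            current = []
        current.append(t)
        previous = t
    if current:
        events.append(current)
    return events

# toy usage: two bursts of photos a day apart
times = [datetime(2003, 5, 27, 10, 0), datetime(2003, 5, 27, 10, 20),
         datetime(2003, 5, 28, 9, 0)]
print([len(e) for e in group_by_time(times)])   # -> [2, 1]
```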