
Proceedings of the 2014 International Workshop on Human Centered Event Understanding from Multimedia

Fullname:Proceedings of the 1st ACM International Workshop on Human Centered Event Understanding from Multimedia
Editors:Ansgar Scherp; Vasileios Mezaris; Bogdan Ionescu; Francesco De Natale
Location:Orlando, Florida
Dates:2014-Nov-07
Publisher:ACM
Standard No:ISBN: 978-1-4503-3120-3; ACM DL: Table of Contents; hcibib: HuEvent14
Papers:9
Pages:48
  1. Detection of Events in Video
  2. Personal and Social Events and User Interaction
  3. Position Paper Session

Detection of Events in Video

Entity Centric Feature Pooling for Complex Event Detection BIBA Full-Text 1-5
  Ishani Chakraborty; Hui Cheng; Omar Javed
In this paper, we propose an entity-centric region-of-interest detection and visual-semantic pooling scheme for complex event detection in YouTube-like videos. Our method is based on the hypothesis that many such videos involve people interacting with each other and with objects in their vicinity. Based on this hypothesis, we first discover an Area of Interest (AoI) map in image keyframes and then use the AoI map for localized pooling of features. The AoI map is derived from image-based saliency cues weighted by the actionable space of the person involved in the event, which we extract from the person's position and the gaze-based attention allocated to each region. Based on the AoI map, we divide the image into distinct regions, pool features separately from each region, and finally combine them into a single image signature. We show that the resulting semantically pooled image signature contains discriminative information and detects visual events favorably compared to state-of-the-art approaches.
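A minimal sketch of the localized pooling idea described above, in Python: a saliency map is blended with a person-centric attention map into an AoI map, the image is split into regions by AoI strength, and per-region pooled features are concatenated into one signature. The array shapes, the blending weight, and the quantile-based region split are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def aoi_map(saliency, person_attention, alpha=0.5):
        """Blend image saliency with a person-centric attention map."""
        m = alpha * saliency + (1.0 - alpha) * person_attention
        return m / (m.sum() + 1e-8)

    def pooled_signature(features, aoi, n_regions=2):
        """Assign pixels to regions by AoI strength, average-pool each
        region's features, and concatenate into one image signature."""
        h, w, d = features.shape
        flat_feat = features.reshape(-1, d)
        flat_aoi = aoi.reshape(-1)
        edges = np.quantile(flat_aoi, np.linspace(0, 1, n_regions + 1))
        sig = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (flat_aoi >= lo) & (flat_aoi <= hi)
            sig.append(flat_feat[mask].mean(axis=0) if mask.any()
                       else np.zeros(d))
        return np.concatenate(sig)

    # Toy usage with random maps standing in for real saliency/features.
    rng = np.random.default_rng(0)
    sal, att = rng.random((120, 160)), rng.random((120, 160))
    feats = rng.random((120, 160, 64))
    print(pooled_signature(feats, aoi_map(sal, att)).shape)  # (128,)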
Skeleton-augmented Human Action Understanding by Learning with Progressively Refined Data BIBA Full-Text 7-10
  Shih-En Wei; Nick C. Tang; Yen-yu Lin; Ming-Fang Weng; Hong-Yuan Mark Liao
With the aim of accurate action video retrieval, we first present an approach that infers the implicit skeleton structure of a query action, given as an RGB video, and then propose to expand this query with the inferred skeleton to improve retrieval performance. The approach is inspired by the observation that skeleton structures can compactly and effectively represent human actions and are helpful in bridging the semantic gap in action retrieval. The focus is hence on estimating action skeletons in RGB videos. Specifically, an iterative training procedure is developed to select relevant training data for inferring the skeleton of an input action, since corrupt training data not only degrade performance but also complicate the learning process. Through the iterations, relevant training data are gradually revealed, while more accurate skeletons are inferred with the refined training set. The proposed approach is evaluated on the ChaLearn 2013 dataset. Significant performance gains in action retrieval are achieved with the aid of the inferred skeletons.
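The iterative refinement described above can be caricatured in a few lines: each round keeps only the training samples that are both similar to the query and well explained by the current skeleton regressor, then retrains on the survivors. The Ridge regressor, the scoring rule, and all data here are stand-ins chosen for illustration.

    import numpy as np
    from sklearn.linear_model import Ridge

    def refine_and_train(X, Y, query, n_iters=5, keep_frac=0.8):
        """X: appearance features, Y: skeleton joint coordinates."""
        idx = np.arange(len(X))
        model = Ridge().fit(X, Y)
        for _ in range(n_iters):
            # Crude relevance score: similar to the query and well
            # explained by the current model (scales not calibrated).
            sim = -np.linalg.norm(X[idx] - query, axis=1)
            err = np.linalg.norm(model.predict(X[idx]) - Y[idx], axis=1)
            score = sim - err
            keep = idx[np.argsort(score)[-int(len(idx) * keep_frac):]]
            model = Ridge().fit(X[keep], Y[keep])
            idx = keep
        return model

    rng = np.random.default_rng(0)
    X, Y = rng.random((200, 16)), rng.random((200, 30))  # 15 joints x 2D
    model = refine_and_train(X, Y, query=rng.random(16))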
Using Minute-by-Minute Match Report for Semantic Event Annotation in Soccer Video BIBA Full-Text 11-16
  Zengkai Wang; Junqing Yu
In this work, we propose a soccer video annotation approach based on semantic matching with a coarse time constraint, where video events and external text information (the match report) are synchronized by their semantic correspondence along the temporal sequence. Unlike state-of-the-art soccer video analysis methods, which assume that the time of event occurrence is given precisely in seconds, this work addresses the problem of annotating soccer video using match reports with coarse-grained time information. Compared with previous approaches, our contributions are as follows. 1) The approach synchronizes video content and text descriptions by their high-level semantics under a coarse time constraint instead of an exact timestamp. In fact, most text descriptions on popular sports websites give event times in minutes rather than seconds, so we argue that our approach is more general. 2) We propose an attack-defense transition analysis (ADTA) based method for detecting soccer video event boundaries. Previous methods produce coarse boundaries that could be refined, or simply return clips of fixed duration, which may introduce larger bias; the results of our method follow the development of soccer events more closely. 3) Unlike existing whistle detection methods based on audio feature analysis, we propose a novel Hough-transform-based whistle detection algorithm that works from an image-processing perspective; combined with an ellipse detection algorithm, it facilitates detecting the game start time and further helps synchronize video and text events. Experimental results on a large number of soccer videos validate the effectiveness of the proposed approach.
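One way to read the image-processing view of whistle detection: in a spectrogram treated as an image, a sustained referee whistle shows up as a near-horizontal line segment, which a probabilistic Hough transform can find. The frequency band, binarization threshold, and line parameters below are guesses for the sketch, not values from the paper.

    import numpy as np
    from scipy.signal import spectrogram
    from skimage.transform import probabilistic_hough_line

    def detect_whistle(audio, sr, freq_band=(2000, 4500)):
        f, t, S = spectrogram(audio, fs=sr, nperseg=1024)
        band = (f >= freq_band[0]) & (f <= freq_band[1])
        img = S[band] > np.percentile(S[band], 99)  # keep loudest bins
        lines = probabilistic_hough_line(img, line_length=20, line_gap=3)
        # y is the frequency row: a whistle line is nearly horizontal.
        return any(abs(p0[1] - p1[1]) <= 2 for p0, p1 in lines)

    sr = 22050
    t = np.arange(2 * sr) / sr
    audio = np.sin(2 * np.pi * 3000 * t)  # 2 s synthetic whistle tone
    print(detect_whistle(audio, sr))      # -> True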
Event Understanding in Endoscopic Surgery Videos BIBA Full-Text 17-22
  Mario Guggenberger; Michael Riegler; Mathias Lux; Pål Halvorsen
Event detection and understanding is an important area in computer science and especially in multimedia. The term event is very broad, and we propose a novel event-based view of endoscopic surgeries. With this view, we aim to provide a better understanding of, and a possible way of segmenting, the surgery as a whole event as well as its sub-events. To achieve this goal, we present an annotation tool in combination with a thinking-aloud test conducted with an experienced surgeon.

Personal and Social Events and User Interaction

Concept-based Image Clustering and Summarization of Event-related Image Collections BIBA Full-Text 23-28
  Christina Papagiannopoulou; Vasileios Mezaris
In this work we deal with the problem of summarizing image collections, each corresponding to a single event. For this, we adopt a clustering-based approach and perform a comparative study of different clustering algorithms and image representations. As part of this study, we propose and examine the possibility of using trained concept detectors to represent each image with a vector of concept detector responses, which is then used as input to the clustering algorithms. We also introduce a technique that indicates which concepts are the most informative for clustering, allowing us to prune the employed concept detectors. Following the clustering, a summary of the collection (and thus also of the event) can be formed by selecting one or more images per cluster according to different possible criteria. The combination of clustering and concept-based image representation is experimentally shown to result in clusters and summaries that match human expectations well.
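A minimal sketch of the concept-based clustering and summarization pipeline, assuming hypothetical detector outputs: images are represented by concept-score vectors, the highest-variance concepts are kept as a simple stand-in for the paper's informativeness criterion, and the image nearest each cluster centroid forms the summary.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(1)
    concept_scores = rng.random((200, 50))  # 200 images x 50 concepts

    # Prune to the highest-variance concepts (illustrative proxy for
    # "most informative for clustering").
    var = concept_scores.var(axis=0)
    pruned = concept_scores[:, np.argsort(var)[-20:]]

    km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(pruned)
    dists = km.transform(pruned)     # image-to-centroid distances
    summary = dists.argmin(axis=0)   # one representative per cluster
    print("summary image indices:", summary)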
Sentiment Flow for Video Interestingness Prediction BIBA Full-Text 29-34
  Sejong Yoon; Vladimir Pavlovic
Computational analysis and prediction of digital media interestingness is a challenging task, largely due to the subjective nature of interestingness. Several attempts have been made to construct a reliable measure and to obtain a better understanding of interestingness based on results from psychological studies. However, most current work focuses on interestingness prediction for images. While video affective analysis has been studied for quite some time, few works explicitly try to predict the interestingness of videos. In this work, we extend a recent pilot study on video interestingness prediction using a mid-level representation of the sentiment (emotion) sequence. We evaluate the proposed framework on three datasets, including those used in the pilot study, and the results verify the promise of the approach.
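A toy rendering of the mid-level idea, under the assumption that each video already comes with a per-shot sentiment score sequence: the variable-length sequence is summarized into fixed-length "flow" statistics and fed to an off-the-shelf classifier. Features, data, and labels are synthetic illustrations.

    import numpy as np
    from sklearn.svm import SVC

    def flow_features(sentiment_seq):
        """Summarize a variable-length emotion-score sequence."""
        s = np.asarray(sentiment_seq, dtype=float)
        d = np.diff(s)  # how the sentiment changes over time
        return np.array([s.mean(), s.std(), s.max() - s.min(),
                         d.mean(), np.abs(d).mean()])

    rng = np.random.default_rng(2)
    X = np.array([flow_features(rng.random(rng.integers(10, 40)))
                  for _ in range(100)])
    y = rng.integers(0, 2, size=100)  # interesting vs. not (synthetic)
    clf = SVC().fit(X, y)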
User Emotion Sensing in Search Process based on Chromatic Sensation BIBA Full-Text 35-39
  Tomoko Kajiyama; Shin'ichi Satoh
Sensing user emotion on the Web is typically performed using user logs, e.g., the pages visited by users and/or the text they enter. These techniques essentially rely on text to identify the targets in which users showed interest; however, it is difficult to estimate emotion from text alone. On the other hand, color sensation is known to have a direct connection to human affective sensation and is thus suitable for emotion sensing. Based on this, we propose a model for sensing user emotion on the basis of a psychological principle, namely color sensation. The model infers the user's present feeling from the colors the user browses or selects. It has five elements: an algorithm for extracting feature colors that potentially represent user emotion, an emotion database describing the relationships between emotions and colors, an algorithm for extracting the user's emotion from the extracted feature colors, images symbolizing information so as to make it easier for users to find relevant information that matches their present feeling, and an interface for browsing information from sense-related viewpoints. As a first step toward implementing this model, we tested it with 50 people searching for an application using an intuitive interface. The results revealed that the colors they selected were potentially related to their feelings, especially for users with ambiguous information needs.
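The color-to-emotion lookup at the heart of the model might look like the sketch below: a selected color is matched to its nearest neighbor in a color-emotion table. The five table entries are invented for illustration; the paper's emotion database is far richer.

    import numpy as np

    COLOR_EMOTIONS = {          # RGB -> emotion label (assumed values)
        (220,  40,  40): "excitement",
        ( 40,  90, 200): "calm",
        (240, 220,  60): "cheerfulness",
        ( 50, 160,  70): "relaxation",
        ( 60,  60,  60): "gloom",
    }

    def emotion_for_color(rgb):
        keys = np.array(list(COLOR_EMOTIONS))
        i = np.linalg.norm(keys - np.array(rgb), axis=1).argmin()
        return COLOR_EMOTIONS[tuple(keys[i])]

    print(emotion_for_color((200, 60, 50)))  # -> "excitement"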
Investigating Human Factors in Image Forgery Detection BIBA Full-Text 41-44
  Parag Shridhar Chandakkar; Baoxin Li
In today's age of the internet and social media, one can find an enormous volume of forged images online. Such images have been used to convey falsified information and to achieve harmful intentions, and the reach and influence of social media only make this problem more severe. While software advancements have made it easier to create forged images, there is no automated algorithm that can reliably detect forgery.
   Image forgery detection can be seen as a subset of the image understanding problem. Human performance is still the gold standard for this type of problem compared to existing state-of-the-art automated algorithms. We conduct a subjective evaluation test, with the aid of an eye tracker, to investigate the human factors associated with this problem. We compare the performance of an automated algorithm and of humans on the forgery detection problem. We also develop an algorithm that uses the data from the evaluation test to predict the difficulty level of an image. The experimental results presented in this paper should facilitate the development of better algorithms in the future.
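A sketch of what difficulty-level prediction from eye-tracking data could look like: per-image fixations are reduced to simple features (fixation count, mean duration, scanpath length) and a regressor maps them to a difficulty score. The feature set and all data are assumptions, not the authors' design.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def gaze_features(fixations):
        """fixations: array of (x, y, duration) rows for one image."""
        fx = np.asarray(fixations, dtype=float)
        path = np.linalg.norm(np.diff(fx[:, :2], axis=0), axis=1).sum()
        return np.array([len(fx), fx[:, 2].mean(), path])

    rng = np.random.default_rng(3)
    X = np.array([gaze_features(rng.random((rng.integers(5, 30), 3)))
                  for _ in range(80)])
    y = rng.random(80)  # synthetic difficulty scores
    model = RandomForestRegressor(n_estimators=50,
                                  random_state=0).fit(X, y)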

Position Paper Session

On the Personalization of Event-Based Systems BIBA Full-Text 45-48
  Opher Etzion; Fabiana Fournier
In this paper we describe our position that personalization is a paradigm shift that will affect life in many areas. For Internet of Things applications, personalization is a critical success factor, especially for detecting situations in real time. We discuss the need for personalization and compare the Internet of Things with the traditional Internet to draw conclusions about the gaps. We also discuss "The Event Model" as a direction toward such personalization. Finally, we present multi-disciplinary research challenges for enabling personalization in Internet of Things applications.