
Proceedings of the 2014 Workshop on BEyond time and errors: novel evaLuation methods for Information Visualization

Fullname: BELIV'14 Proceedings of the 2014 AVI Workshop on BEyond time and errors: novel evaLuation methods for Information Visualization
Editors: Heidi Lam; Petra Isenberg; Tobias Isenberg; Michael Sedlmair
Location: Paris, France
Standard No: ISBN: 978-1-4503-3209-5; ACM DL: Table of Contents; hcibib: BELIV14
Links: Conference Website
  1. Rethinking evaluation level: abstracted task vs. in situ evaluation
  2. Cognitive processes & interaction
  3. New techniques I -- eye tracking
  4. New techniques II -- crowdsourcing
  5. Adopting methods from other fields
  6. Experience reports

Rethinking evaluation level: abstracted task vs. in situ evaluation

Visualizing dimensionally-reduced data: interviews with analysts and a characterization of task sequences, pp. 1-8
  Matthew Brehmer; Michael Sedlmair; Stephen Ingram; Tamara Munzner
We characterize five task sequences related to visualizing dimensionally-reduced data, drawing on interviews with ten data analysts spanning six application domains and on our understanding of the technique literature. Our characterization of visualization task sequences for dimensionally-reduced data fills a gap created by the abundance of proposed techniques and tools that combine high-dimensional data analysis, dimensionality reduction, and visualization, and is intended to be used in the design and evaluation of future techniques and tools. We discuss implications for the evaluation of existing work practices, for the design of controlled experiments, and for the analysis of post-deployment field observations.
User tasks for evaluation: untangling the terminology throughout visualization design and development, pp. 9-15
  Alexander Rind; Wolfgang Aigner; Markus Wagner; Silvia Miksch; Tim Lammarsch
User tasks play a pivotal role in evaluation throughout visualization design and development. However, the term 'task' is used ambiguously within the visualization community. In this position paper, we critically analyze the relevant literature and systematically compare definitions for 'task' and the usage of related terminology. In doing so, we identify a three-dimensional conceptual space of user tasks in visualization. Using these dimensions, visualization researchers can better formulate their contributions, which helps advance visualization as a whole.
Considerations for characterizing domain problems, pp. 16-22
  Kirsten M. Winters; Denise Lach; Judith B. Cushing
The nested blocks and guidelines model is a useful template for creating design and evaluation criteria, because it aligns design to need [17]. Characterizing the outermost block of the nested model -- the domain problem -- is challenging, mainly due to the nature of contemporary inquiries in various domains, which are dynamic and, by definition, difficult to problematize. We offer here our emerging conceptual framework, based on the central question in our research study -- what visualization works for whom and in which situation? -- to consider when characterizing the outermost block, the domain problem, of the nested model [18].
Navigating reductionism and holism in evaluation, pp. 23-26
  Michael Correll; Eric Alexander; Danielle Albers; Alper Sarikaya; Michael Gleicher
In this position paper, we enumerate two approaches to the evaluation of visualizations which are associated with two approaches to knowledge formation in science: reductionism, which holds that the understanding of complex phenomena is based on the understanding of simpler components; and holism, which states that complex phenomena have characteristics more than the sum of their parts and must be understood as complete, irreducible units. While we believe that each approach has benefits for evaluating visualizations, we claim that strict adherence to one perspective or the other can make it difficult to generate a full evaluative picture of visualization tools and techniques. We argue for movement between and among these perspectives in order to generate knowledge that is both grounded (i.e. its constituent parts work) and validated (i.e. the whole operates correctly). We conclude with examples of techniques which we believe represent movements of this sort from our own work, highlighting areas where we have both "built up" reductionist techniques into larger contexts, and "broken down" holistic techniques to create generalizable knowledge.

Cognitive processes & interaction

Evaluation methodology for comparing memory and communication of analytic processes in visual analytics, pp. 27-34
  Eric D. Ragan; John R. Goodall
Provenance tools can help capture and represent the history of analytic processes. In addition to supporting analytic performance, provenance tools can be used to support memory of the process and communication of the steps to others. Objective evaluation methods are needed to evaluate how well provenance tools support analysts' memory and communication of analytic processes. In this paper, we present several methods for the evaluation of process memory, and we discuss the advantages and limitations of each. We discuss methods for determining a baseline process for comparison, and we describe various methods that can be used to elicit memory of an analysis for evaluation. Additionally, we discuss methods for conducting quantitative and qualitative analyses of process memory. We discuss the methodology in the context of a case study in using the evaluation methods for a user study. By organizing possible memory evaluation methods and providing a meta-analysis of the potential benefits and drawbacks of different approaches, this paper can inform study design and encourage objective evaluation of process memory and communication.
Just the other side of the coin? From error- to insight-analysis, pp. 35-40
  Michael Smuc
To shed more light on data explorers dealing with complex information visualizations in real world scenarios, new methodologies and models are needed which overcome existing explanatory gaps. Therefore, a novel model to analyze users' errors and insights is outlined that is derived from Rasmussen's model on different levels of cognitive processing, and integrates explorers' skills, schemes, and knowledge. After locating this model in the landscape of theories for visual analytics, the main building blocks of the model, where three cognitive processing levels are interlinked, are described in detail. Finally, its applicability, challenges in measurement and future research options are discussed.
Evaluating user behavior and strategy during visual exploration, pp. 41-45
  Khairi Reda; Andrew E. Johnson; Jason Leigh; Michael E. Papka
Visualization practitioners have traditionally focused on evaluating the outcome of the visual analytic process, as opposed to studying how that process unfolds. Since user strategy would likely influence the outcome of visual analysis and the nature of insights acquired, it is important to understand how the analytic behavior of users is shaped by variations in the design of the visualization interface. This paper presents a technique for evaluating user behavior in exploratory visual analysis scenarios. We characterize visual exploration as a fluid activity involving transitions between mental and interaction states. We show how micro-patterns in these transitions can be captured and analyzed quantitatively to reveal differences in the exploratory behavior of users, given variations in the visualization interface.
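The quantitative treatment of transition micro-patterns that this abstract describes can be illustrated with a small sketch (not the authors' implementation; the state names and the simple first-order counting are illustrative assumptions): given a logged sequence of analysis states, tally every consecutive state-to-state transition and normalize per source state.

```python
from collections import Counter

def transition_counts(states):
    """Count each consecutive state-to-state transition in a session log."""
    return Counter(zip(states, states[1:]))

def transition_probabilities(states):
    """Normalize counts into per-source-state transition probabilities."""
    counts = transition_counts(states)
    totals = Counter()
    for (src, _dst), n in counts.items():
        totals[src] += n
    return {(src, dst): n / totals[src] for (src, dst), n in counts.items()}

# Hypothetical log of mental/interaction states for one exploratory session:
log = ["orient", "explore", "explore", "insight", "explore", "insight"]
probs = transition_probabilities(log)
```

Comparing such transition matrices across interface conditions is one way the abstract's "micro-patterns" could be contrasted quantitatively.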
Value-driven evaluation of visualizations, pp. 46-53
  John Stasko
Existing evaluations of data visualizations often employ a series of low-level, detailed questions to be answered or benchmark tasks to be performed. While that methodology can be helpful to determine a visualization's usability, such evaluations overlook the key benefits that visualization uniquely provides over other data analysis methods. I propose a value-driven evaluation of visualizations in which a person illustrates a system's value through four important capabilities: minimizing the time to answer diverse questions, spurring the generation of insights and insightful questions, conveying the essence of the data, and generating confidence and knowledge about the data's domain and context. Additionally, I explain how interaction is instrumental in creating much of the value that can be found in visualizations.

New techniques I -- eye tracking

Benchmark data for evaluating visualization and analysis techniques for eye tracking for video stimuli, pp. 54-60
  Kuno Kurzhals; Cyrill Fabian Bopp; Jochen Bässler; Felix Ebinger; Daniel Weiskopf
For the analysis of eye movement data, an increasing number of methods have emerged to examine different aspects of the data. In particular, due to the complex spatio-temporal nature of gaze data for dynamic stimuli, there has been a need and recent trend toward the development of visualization and visual analytics techniques for such data. With this paper, we provide benchmark data to test visualization and visual analytics methods, but also other analysis techniques for gaze processing. In particular, for eye tracking data from video stimuli, existing datasets often provide little information about recorded eye movement patterns and, therefore, are not comprehensive enough to allow for a faithful assessment of the analysis methods. Our benchmark data consists of three ingredients: the dynamic stimuli in the form of video, the eye tracking data, and annotated areas of interest. We designed the video stimuli and the tasks for the participants of the eye tracking experiments to trigger typical viewing patterns, including attentional synchrony, smooth pursuit, and switching of the focus of attention. In total, we created 11 videos with eye tracking data acquired from 25 participants.
Evaluating visual analytics with eye tracking, pp. 61-69
  Kuno Kurzhals; Brian Fisher; Michael Burch; Daniel Weiskopf
The application of eye tracking for the evaluation of humans' viewing behavior is a common approach in psychological research. So far, the use of this technique for the evaluation of visual analytics and visualization is less prominent. We investigate recent scientific publications from the main visualization and visual analytics conferences and journals that include an evaluation by eye tracking. Furthermore, we provide an overview of evaluation goals that can be achieved by eye tracking and state-of-the-art analysis techniques for eye tracking data. Ideally, visual analytics leads to a mixed-initiative cognitive system where the mechanism of distribution is the interaction of the user with visualization environments. Therefore, we also include a discussion of cognitive approaches and models to include the user in the evaluation process. Based on our review of the current use of eye tracking evaluation in our field and the cognitive theory, we propose directions of future research on evaluation methodology, leading to the grand challenge of developing an evaluation approach to the mixed-initiative cognitive system of visual analytics.
Towards analyzing eye tracking data for evaluating interactive visualization systems, pp. 70-77
  Tanja Blascheck; Thomas Ertl
Eye tracking can be a suitable evaluation method for determining which regions and objects of a stimulus a human viewer perceived. Analysts can use eye tracking as a complement to other evaluation methods for a more holistic assessment of novel visualization techniques beyond time and error measures. Up to now, most stimuli in eye tracking studies have been either static images or videos. Since interaction is an integral part of visualization, an evaluation should include interaction. In this paper, we present an extensive literature review on evaluation methods for interactive visualizations. Based on the literature review, we propose ideas for analyzing eye movement data from interactive stimuli. This requires looking critically at challenges induced by interactive stimuli. The first step is to collect data using different study methods; in our case, eye tracking, interaction logs, and thinking-aloud protocols. This also requires thorough synchronization of the mentioned study methods. To analyze the collected data, new analysis techniques have to be developed. We investigate existing approaches, consider how we can adapt them to new data types, and sketch ideas for what new approaches could look like.

New techniques II -- crowdsourcing

Gamification as a paradigm for the evaluation of visual analytics systems, pp. 78-86
  Nafees Ahmed; Klaus Mueller
The widespread web-based connectivity of people all over the world has yielded new opportunities to recruit humans for visual analytics evaluation and for an abundance of other tasks. In this approach, known as crowdsourcing, humans typically receive monetary incentives to participate. However, while these payments are small per evaluation, the cost can add up for realistically-sized studies. Furthermore, since the reward is money, the quality of the evaluation can suffer. Our approach uses radically different incentives, namely entertainment, pleasure, and the feeling of success. We propose a theory, methodology, and framework that allow any visual analytics researcher to turn his/her evaluation task into an entertaining online game. First experiences with a prototype have shown that such an approach allows tens of thousands of evaluations to be completed in a matter of days at no cost, which is unthinkable with conventional methods.
Crowdster: enabling social navigation in web-based visualization using crowdsourced evaluation, pp. 87-94
  Yuet Ling Wong; Niklas Elmqvist
Evaluation is typically seen as a validation tool for visualization, but the proliferation of web-based visualization is enabling a radical new approach that uses crowdsourced evaluation for emergent collaboration where one user's efforts facilitate a crowd of future users. The idea is simple: instead of using clickstreams, keyboard input, and interaction logs to collect performance metrics for individual participants in a user study, the interaction data is aggregated from the running visualization, integrated back into the visual representation, and then the new interaction data is collected and evaluated with the old data. Known as social navigation, this enables users to build on the work of previous users, for example by seeing collective annotations, the most commonly selected data points, and the most popular locations on the visual space. However, while web-based visualizations by definition are distributed using a web server, most do not maintain the server-side database connections and aggregation mechanisms to achieve this. To bridge this gap between social navigation, its evaluation and visualization, we present Crowdster, a framework that supports capturing, aggregating, and visualizing user interaction data. We give three examples to showcase the Crowdster framework: a Google Maps app that shows the navigation trails of previous users, a scatterplot matrix that visualizes a density distribution of the most selected data points, and a node-link visualization that supports collective graph layout.
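As a hedged sketch of the aggregation idea described above (the in-memory store, event shape, and method names are assumptions for illustration; Crowdster's actual server-side design is not reproduced here), interaction events from individual users can be folded into a shared selection density that future users see:

```python
from collections import Counter

class InteractionStore:
    """Minimal in-memory stand-in for a server-side aggregation store."""

    def __init__(self):
        self.selections = Counter()  # data point id -> times selected

    def log_selection(self, user_id, point_id):
        # A real deployment would persist this event to a database.
        self.selections[point_id] += 1

    def density(self):
        """Normalized selection frequencies, suitable for overlaying on
        the visualization as a social navigation cue (e.g., a heat layer
        of the most commonly selected data points)."""
        total = sum(self.selections.values())
        if total == 0:
            return {}
        return {p: n / total for p, n in self.selections.items()}

store = InteractionStore()
for user, point in [("u1", "p1"), ("u1", "p2"), ("u2", "p1"), ("u3", "p1")]:
    store.log_selection(user, point)
```

The design point is the feedback loop: the same store that records one user's clickstream also produces the aggregate layer shown to the next user.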
Repeated measures design in crowdsourcing-based experiments for visualization, pp. 95-102
  Alfie Abdul-Rahman; Karl J. Proctor; Brian Duffy; Min Chen
Crowdsourcing platforms, such as Amazon's Mechanical Turk (MTurk), are providing visualization researchers with a new avenue for conducting empirical studies. While such platforms offer several advantages over lab-based studies, they also feature some "unknown" or "uncontrolled" variables, which could potentially introduce serious confounding effects in the resultant measurement data. In this paper, we present our experience of using repeated measures in three empirical studies using MTurk. Each study presented participants with a set of stimuli, each featuring a condition of an independent variable. Participants were exposed to stimuli repeatedly in a pseudo-random order through four trials and their responses were measured digitally. Only a small portion of the participants were able to perform with absolute consistency for all stimuli throughout each experiment. This suggests that a repeated measures design is highly desirable (if not essential) when designing empirical studies for crowdsourcing platforms. Additionally, the majority of participants performed their tasks with reasonable consistency when all stimuli in an experiment are considered collectively. In other words, to most participants, inconsistency occurred occasionally. This suggests that crowdsourcing remains a valid experimental environment, provided that one can integrate the means to observe and alleviate the potential confounding effects of "unknown" or "uncontrolled" variables in the design of the experiment.
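A minimal sketch of the repeated-measures setup described above, under stated assumptions (the stimulus labels, the four-block structure, and the simple per-stimulus consistency measure are illustrative, not the study's actual materials or analysis): every stimulus appears once per trial block in an independent pseudo-random order, and a participant's consistency is the fraction of stimuli answered identically across blocks.

```python
import random

def repeated_measures_schedule(stimuli, n_trials, seed=0):
    """Presentation schedule: each stimulus appears exactly once per
    trial block, with an independent pseudo-random order per block."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_trials):
        block = list(stimuli)
        rng.shuffle(block)
        schedule.append(block)
    return schedule

def consistency(responses_by_trial):
    """Fraction of stimuli for which a participant gave the same
    response in every trial block."""
    stimuli = responses_by_trial[0].keys()
    consistent = sum(
        1 for s in stimuli
        if len({trial[s] for trial in responses_by_trial}) == 1
    )
    return consistent / len(responses_by_trial[0])

schedule = repeated_measures_schedule(["A", "B", "C", "D"], n_trials=4)
```

Repeating each condition this way is what lets occasional inconsistency be observed at all, which is the abstract's argument for the design.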

Adopting methods from other fields

Evaluation of information visualization techniques: analysing user experience with reaction cards, pp. 103-109
  Tanja Mercun
The paper originates from the idea that in the field of information visualization, positive user experience is extremely important if we wish to see users adopt and engage with novel information visualization tools. Suggesting the use of the product reaction cards method to evaluate user experience, the paper uses the example of the FrbrVis prototype to demonstrate how the results of this method can be analysed and used for comparing different designs. The author also proposes five dimensions of user experience (UX) that can be gathered from reaction cards and concludes that the results from reaction cards mirror and add to other performance and preference indicators.
Toward visualization-specific heuristic evaluation, pp. 110-117
  Alvin Tarrell; Ann Fruhling; Rita Borgo; Camilla Forsell; Georges Grinstein; Jean Scholtz
This position paper describes heuristic evaluation as it relates to visualization and visual analytics. We review heuristic evaluation in general, then comment on previous process-based, performance-based, and framework-based efforts to adapt the method to visualization-specific needs. We postulate that the framework-based approach holds the most promise for future progress in development of visualization-specific heuristics, and propose a specific framework as a starting point. We then recommend a method for community involvement and input into the further development of the heuristic framework and more detailed design and evaluation guidelines.
Experiences and challenges with evaluation methods in practice: a case study, pp. 118-125
  Simone Kriglstein; Margit Pohl; Nikolaus Suchy; Johannes Gärtner; Theresia Gschwandtner; Silvia Miksch
The development of information visualizations for companies poses specific challenges, especially for evaluation processes. It is advisable to test these visualizations under realistic circumstances, but because of various constraints, this can be quite difficult. In this paper, we discuss three different methods which can be used to conduct evaluations in companies. These methods are appropriate for different stages in the software life cycle (design, development, deployment) and reflect an iterative approach to evaluation. Based on an overview of available evaluation methods, we argue that this combination of fairly lightweight methods is especially appropriate for evaluations of information visualizations in companies. These methods complement each other and emphasize different aspects of the evaluation. Based on this case study, we generalize the lessons learned from conducting evaluations in this context.
More bang for your research buck: toward recommender systems for visual analytics, pp. 126-133
  Leslie M. Blaha; Dustin L. Arendt; Fairul Mohd-Zaid
We propose a set of common sense steps required to develop a recommender system for visual analytics. Such a system is an essential way to get additional mileage out of costly user studies, which are typically archived post publication. Crucially, we propose conducting user studies in a manner that allows machine learning techniques to elucidate relationships between experimental data (i.e., user performance) and metrics about the data being visualized and candidate visual representations. We execute a case study within our framework to extract simple rules of thumb that relate different data metrics and visualization characteristics to patterns of user errors on several network analysis tasks. Our case study suggests a research agenda supporting the development of general, robust visualization recommender systems.
Sanity check for class-coloring-based evaluation of dimension reduction techniques, pp. 134-141
  Michaël Aupetit
Dimension reduction (DR) techniques used to visualize multidimensional data provide a scatterplot spatialization of data similarities. A widespread way to evaluate the quality of such DR techniques is to use labeled data as a ground truth and to call upon the reader as a witness who assesses the visualization by looking at class-cluster correlations within the scatterplot. We expose the pitfalls of this evaluation process and propose a principled solution to guide researchers in improving the way they use this visual evaluation of DR techniques.

Experience reports

Oopsy-daisy: failure stories in quantitative evaluation studies for visualizations, pp. 142-146
  Sung-Hee Kim; Ji Soo Yi; Niklas Elmqvist
Designing, conducting, and interpreting evaluation studies with human participants is challenging. While researchers in cognitive psychology, social science, and human-computer interaction view competence in evaluation study methodology as a key job skill, it is only recently that visualization researchers have begun to feel the need to learn this skill as well. Acquiring such competence is a lengthy and difficult process fraught with much trial and error. Recent work on patterns for visualization evaluation is now providing much-needed best practices for how to evaluate a visualization technique with human participants. However, negative examples of evaluation methods that fail, yield no usable results, or simply do not work are still missing, mainly because of the difficulty and lack of incentive for publishing negative results or failed research. In this paper, we take the position that there are many good ideas, conceived with the best intentions, for how to evaluate a visualization tool that simply do not work. We call upon the community to help collect these negative examples in order to show the other side of the coin: what not to do when trying to evaluate visualization.
Pre-design empiricism for information visualization: scenarios, methods, and challenges, pp. 147-151
  Matthew Brehmer; Sheelagh Carpendale; Bongshin Lee; Melanie Tory
Empirical study can inform visualization design, both directly and indirectly. Pre-design empirical methods can be used to characterize work practices and their associated problems in a specific domain, directly motivating design choices during the subsequent development of a specific application or technique. They can also be used to understand how individuals, existing tools, data, and contextual factors interact, indirectly informing later research in our community. Contexts for empirical study vary and practitioners should carefully consider finding the most appropriate methods for any given situation. This paper discusses some of the challenges associated with conducting pre-design studies by way of four illustrative scenarios, highlighting the methods as well as the challenges unique to the visualization domain. We encourage researchers and practitioners to conduct more pre-design empirical studies and describe in greater detail their use of empirical methods for informing design.
Field experiment methodology for pair analytics, pp. 152-159
  Linda T. Kaastra; Brian Fisher
This paper describes a qualitative research methodology developed for experimental studies of collaborative visual analysis. In much of this work, we build upon Herbert H. Clark's Joint Activity Theory to infer cognitive processes from field experiments testing collaborative decision making over data. As is true of any methodology, it provides the underlying conceptual structure and analytic processes that can be adapted by other researchers to devise their own studies and analyze their results. Our focus is on collaborative use of visual information systems for aircraft safety analysis; however, the methods can be, and have been, extended to other tasks and analysts.
Utility evaluation of models, pp. 160-167
  Jean Scholtz; Oriana Love; Mark Whiting; Duncan Hodges; Lia Emanuel; Danaë Stanton Fraser
In this paper, we present three case studies of utility evaluations of underlying models in software systems: a user model; technical and social models, both singly and in combination; and a research-based model for user identification. Each of the three cases used a different approach to evaluating the model, and each had challenges to overcome in designing and implementing the evaluation. We describe the methods we used and the challenges faced in designing the evaluation procedures, summarize the lessons learned, enumerate considerations for those undertaking such evaluations, and present directions for future work.