
Proceedings of the 2006 Symposium on Eye Tracking Research & Applications

Fullname: Proceedings of the 2006 Symposium on Eye Tracking Research & Applications
Editors: Kari-Jouko Räihä; Andrew T. Duchowski
Location: San Diego, California
Dates: 2006-Mar-27 to 2006-Mar-29
Publisher: ACM
Standard No: ISBN 1-59593-305-0
Papers: 44
Pages: 175
  1. Keynote speaker
  2. Visual attention & eye movement control
  3. Late breaking results: oral presentations
  4. Late breaking results: poster presentations
  5. Assistive/user interfaces
  6. Advances in eye tracking technology
  7. Gaze-contingent display/video
  8. Comprehension and cognition
  9. Performance analysis

Keynote speaker

Communication through eye-gaze: where we have been, where we are now and where we can go from here BIBAFull-Text 9
  Howell Istance
Throughout the history of gaze tracking, there have been several dimensions along which the evolution of gaze-based communication can be viewed. Perhaps the most important of these dimensions is the ease of use, or usability, of systems incorporating eye tracking. Usable communication through eye-gaze has been a goal for many years and offers the prospect of effortless and fast communication for able-bodied and disabled users alike. To date such communication has been hampered by a number of problems limiting its widespread uptake. Systems have evolved over time and can provide effective means of communication within restricted bounds, but these are typically incompatible and limited to a few application areas, and each has suffered from particular usability problems. As a consequence, uptake remains low and the cost of individual eye-tracking systems remains high. However, more is being understood and published about the usability requirements for eye-gaze communication systems, particularly for users with different types of disability. With the advance of research and technology we are now seeing genuinely usable systems which can be used for a broad range of applications and, with this, the prospect of much wider acceptance of gaze as a means of communication.
   A second dimension is how we can utilise our communication through eye gaze. Much work has been undertaken addressing the nature of control and the concepts of active and passive control, or command-based and non-command-based interaction. Active control, and the giving of commands by eye to on-screen keyboards and other control interfaces, is now well known and has led to greatly improved usability via compensation for the limitations of eye trackers as a data source, and by providing predictive and corrective capabilities in the user interface. Passive monitoring of gaze position leads to the notion of gaze-aware objects, which are capable of responding to user attention in a way appropriate to the specific task context. Early work by Starker and Bolt [1990], for example, assigned objects in a virtual world gaze-based indices of interest, with control mediated by system evaluation of user interest, without the need for active user control. By employing these concepts, current gaze control systems have achieved acceptable ease of use by making on-screen objects gaze-aware, allowing compensation for tracking and manipulation inaccuracies. Gaze-aware interaction is now migrating from the confines of the desktop to the user's task space in the real world, within a ubiquitous computing context. Instead of attempting to track gaze position in world space relative to the user, with the many difficulties this presents in inaccuracies and encumbering equipment, gaze tracking can be moved to many ubiquitous objects in the real world. Visual contact and manipulation with gaze-aware instrumented objects is now possible by equipping objects with eye-contact sensors detecting infra-red corneal reflection from users looking at these objects. Alternatively, objects can be equipped with infra-red emitters, and the detection of their corneal reflection can be moved to low-cost head-mounted cameras worn by the user. These two approaches to visual contact detection parallel desk-mounted and head-mounted eye-tracking systems.
   In the future, we can expect very real benefits from gaze-based communication in a wider set of task domains as ubiquitous systems become more able to make informed decisions about the intent of a user. Such systems will finally liberate eye gaze communication from the confines of laboratory and desktop, to the real world, enabling low-cost communication through gaze to be available for all.
   This paper gratefully acknowledges the support of COGAIN (Communication by Gaze Interaction), a European Commission FP6 Network of Excellence project.

Visual attention & eye movement control

Causal saliency effects during natural vision BIBAKFull-Text 11-18
  Ran Carmi; Laurent Itti
Salient stimuli, such as color or motion contrasts, attract human attention, thus providing a fast heuristic for focusing limited neural resources on behaviorally relevant sensory inputs. Here we address the following questions: What types of saliency attract attention, and how do they compare to each other during natural vision? We asked human participants to inspect scene-shuffled video clips, tracked their instantaneous eye position, and quantified how well a battery of computational saliency models predicted overt attentional selections (saccades). Saliency effects were measured as a function of total viewing time, proximity to abrupt scene transitions (jump cuts), and inter-participant consistency. All saliency models predicted overall attentional selection well above chance, with dynamic models roughly equal to one another in predictive power and up to 3.6 times more predictive than static models. The prediction accuracy of all dynamic models was twice as high as their average for saccades initiated immediately after jump cuts, which also led to maximal inter-participant consistency. Static models showed mixed results in these circumstances, with some models having weaker prediction accuracy than their average. These results demonstrate that dynamic visual cues play a dominant causal role in attracting attention, while static visual cues correlate with attentional selection mostly due to top-down causes.
Keywords: attention, computational modeling, eye-movements, natural vision, saliency
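The central measurement in studies like this one is whether model saliency sampled at human saccade targets exceeds saliency at control locations. A minimal sketch of that comparison in Python, assuming a precomputed saliency map and a list of saccade endpoints (the function name and the simple mean ratio are illustrative, not the authors' exact metric):

```python
import numpy as np

def saliency_prediction_ratio(saliency_map, saccade_targets, n_random=1000, seed=0):
    """Ratio of mean saliency at saccade targets to mean saliency at random
    control locations; values well above 1.0 indicate the map predicts
    attentional selection above chance."""
    rng = np.random.default_rng(seed)
    h, w = saliency_map.shape
    target_vals = np.array([saliency_map[r, c] for r, c in saccade_targets])
    control_vals = saliency_map[rng.integers(0, h, n_random),
                                rng.integers(0, w, n_random)]
    return target_vals.mean() / (control_vals.mean() + 1e-12)

# Toy example: saccades land inside a salient blob.
smap = np.zeros((120, 160))
smap[50:70, 70:90] = 1.0
print(saliency_prediction_ratio(smap, [(60, 80), (55, 75)]))
```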
Performance of the two-stage, dual-mode oculomotor servo system BIBAKFull-Text 19-26
  James T. Fulton
Creating an optimum man-machine interface in the visual domain requires detailed knowledge of the human Precision Optical Servo system (POS), with particular focus on the oculomotor servosystem. The physiology of the human oculomotor system is more advanced than that of even our closest animal relatives. To satisfy human needs for defensive safety, as well as to provide the analytical capability supporting our inquisitiveness, two primary modes of information analysis are provided within the POS. Within the analytical mode, humans employ a multidimensional, fundamentally luminance-based, narrow-field-of-view associative correlator for interp and percept extraction. This associative correlation process relies upon the orthogonal phase-coherent character of the "tremor" associated with the fine motions of the eyes. Simultaneously, a lower-resolution, fundamentally luminance-based, spatial-change-detection mechanism is used to maintain awareness of the larger external environment. To support the above modes, a two-stage servomechanism is used. The angular performances of these two stages are quite different, and these differences have a profound impact on good interface design.
   This paper provides a schematic of the complete POS, a more detailed description of the oculomotor servosystem, and the numerics describing its performance parameters. These parameters lead to the minimum recognition interval required for symbolic displays. Color is shown to play an ancillary, though important, role in the capability of the POS of the visual system. A New Chromaticity Diagram is offered that makes it easier to understand the role of color in POS operation and in color perception in general. All of the above descriptions are supported by a larger scale schematic of the overall sensory/cognitive system.
Keywords: chromaticity diagram, human physiology, servomechanisms, vision temporal response
Computational mechanisms for gaze direction in interactive visual environments BIBAKFull-Text 27-32
  Robert J. Peters; Laurent Itti
Next-generation immersive virtual environments and video games will require virtual agents with human-like visual attention and gaze behaviors. A critical step is to devise efficient visual processing heuristics to select locations that would attract human gaze in complex dynamic environments. One promising approach to designing such heuristics draws on ideas from computational neuroscience. We compared several such heuristics with eye movement recordings from five observers playing video games, and found that heuristics which detect outliers from the global distribution of visual features were better predictors of human gaze than were purely local heuristics. Heuristics sensitive to dynamic events performed best overall. Further, heuristic prediction power differed more between games than between different human observers. Our findings suggest simple neurally-inspired algorithmic methods to predict where humans look while playing video games.
Keywords: active vision, computational modeling, eye movements, immersive environments, video games, visual attention
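One way to read the paper's distinction is between scoring a location by how unusual its feature value is image-wide versus how much it differs from its immediate neighborhood. A sketch of both families of heuristic on a single feature map (the function names and z-score formulation are ours, under that reading, not the authors' implementations):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def global_outlier_map(feature_map):
    """Global heuristic: score each pixel by its deviation (z-score) from the
    image-wide feature distribution, so values that are rare anywhere stand out."""
    mu, sigma = feature_map.mean(), feature_map.std() + 1e-9
    return np.abs(feature_map - mu) / sigma

def local_contrast_map(feature_map, k=7):
    """Purely local heuristic: deviation from the mean of a k x k neighborhood."""
    return np.abs(feature_map - uniform_filter(feature_map, size=k))

# A small bright patch is an outlier both globally and locally.
fmap = np.zeros((60, 80))
fmap[30:33, 40:43] = 1.0
print(global_outlier_map(fmap).max(), local_contrast_map(fmap).max())
```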

Late breaking results: oral presentations

An environmental investigation of wayfinding in a nursing home BIBAFull-Text 33
  R. A. Schuchard; B. R. Connell; P. Griffiths
The objective of this pilot study was to obtain preliminary information on the placement of wayfinding information for NH (nursing home) residents by finding where older adults with mild dementia look during a wayfinding task. Wayfinding problems (e.g., an inability to find or recognize a destination) are common among NH residents. These behaviors have been related to falls and fractures [Buchner and Larson 1987] and are a significant source of staff stress [Bright 1986]. Wayfinding problems are person-environment problems arising from deficits in spatial orientation that make it difficult to maintain a cognitive map of the route to a desired location, as well as from deficits that impair the ability to plan and carry out goal-directed travel and to ignore irrelevant and distracting stimuli [Liu et al. 1991; Passini et al. 1995]. Design best practices advocate supporting preserved wayfinding abilities. However, little attention has been given to information placement in the NH environment to ensure that it is likely to be seen. A case report states that NH residents with dementia look down as they walk around a NH unit, looking at hallway floors and the lower parts of walls (below waist level), where wayfinding information is almost never placed [Adachi 1999]. Optimal information placement has the potential to positively impact the independence, safety, and QOL of residents as well as improve staff satisfaction.
One-point calibration gaze tracking method BIBAFull-Text 34
  Takehiko Ohno
A novel gaze tracking method that requires only one calibration marker for personal calibration is proposed. Personal calibration is generally known to be a troublesome task, requiring the user to look at nine to twenty calibration markers in succession. Unlike traditional methods, the proposed method, called the One-Point Calibration (OPC) method, drastically reduces the cost of personal calibration by requiring only one calibration marker. While the user looks at the calibration marker, the difference between the user's eyeball shape and the eyeball model used to calculate the user's gaze direction is estimated, and the residual error is compensated for using parameters derived from the calibration.
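The practical payoff of the OPC method is that the per-user correction reduces to what can be estimated from a single fixation. A deliberately simplified sketch, treating the residual error as a constant screen-space offset (Ohno's actual compensation adjusts eyeball-model parameters, so this illustrates only the calibration step, not the published model):

```python
def one_point_calibration(model_gaze_at_marker, marker_position):
    """Estimate a correction from one fixation on a known marker.

    model_gaze_at_marker: (x, y) gaze predicted by the generic eyeball model
                          while the user looks at the marker.
    marker_position:      (x, y) true screen position of the marker.
    Returns a function that corrects subsequent gaze estimates.
    """
    dx = marker_position[0] - model_gaze_at_marker[0]
    dy = marker_position[1] - model_gaze_at_marker[1]
    return lambda gaze: (gaze[0] + dx, gaze[1] + dy)

correct = one_point_calibration((512, 410), (500, 384))
print(correct((650, 300)))  # -> (638, 274)
```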
Secure graphical password system for high traffic public areas BIBAFull-Text 35
  Bogdan Hoanca; Kenrick Mock
Graphical passwords are expected to be easier to recall, less likely to be written down and have the potential to provide a richer symbol space than text based passwords. For example, a user might authenticate by clicking a series of points on an image, selecting a series of tiles, or by drawing a series of lines on the screen [Davis et al. 2004]. An example of the tiled approach is the Real User Corporation's PassFaces™ system [Real User, 2005] illustrated in Figure 1. For both text and graphical password entry systems the user needs to carefully enter the password in case a malicious user is observing the session via "shoulder surfing." Although some authors assume that graphical passwords will be entered on a small screen with a reduced observation angle [Jansen 2004], and thus dismiss the likelihood of shoulder surfing, this assumption is not always true.
Using the eyes to encode and recognize social scenes BIBAFull-Text 36
  Elina Birmingham; Walter F. Bischof; Alan Kingstone
In a previous study, we found that observers look mostly at the eyes when viewing natural scenes containing one or more people (Birmingham et al. submitted). This prioritization of eye regions occurred regardless of the type of scene being viewed (e.g. scenes with one person vs. scenes with several people, see Figure 1). The finding that observers attend preferentially to the eyes when freely viewing scenes suggests that they are the most informative regions of the scene. As a consequence, one might also expect that observers encode and/or recognize scenes through information from the eyes. This prediction is in line with the finding that when viewing object scenes in preparation for a later memory test, observers tend to fixate more informative objects more frequently than less informative objects (Henderson et al. 1999).
Use of eye-tracking in ergonomics: a field study of lift truck operators' work activity BIBAFull-Text 37
  Denis Giguère; Nicolas Gagné; Steve Vezeau
Besides the work of Hella [1991], few studies exist on the visual behavior and visual needs of operators of powered industrial vehicles. Lift trucks, as an example, serve a useful purpose in industry but are also involved in occupational injuries and deaths. An analysis of work inspectors' reports [CSST 2005] from 1974 to 2004 identified 84 cases where the accident was fatal either for the lift truck operator or a co-worker. In 28 of these cases, the restricted visibility caused by the lift truck itself or by the load carried, or the visual demands on the operator at the time of the accident, were cited or presumed as factors. Collision and overturning are the main types of fatal accidents. The aim of this paper is to present preliminary results on the use of an eye tracker to study vision-related issues, and to discuss the feasibility of using such an instrument in field studies.
Gaze alignment of interlocutors in conversational dialogues BIBAFull-Text 38
  Kerstin Hadelich; Matthew W. Crocker
In the area of Psycholinguistics, eye-tracking has been a successful and valuable tool for the investigation of on-line processes of language comprehension and language production (e.g., [Griffin and Bock 2000], [Tanenhaus et al. 1995]). However, the application of eye-tracking to the mechanisms underlying more naturalistic language use, e.g. dialogue, has so far been limited to examining the eye movements of either the speaker or the listener in isolation (e.g., [Brown-Schmidt et al. 2005]; [Richardson and Dale 2004]). Even offline dialogue experiments investigating, e.g., priming effects usually involve only one "real" participant, whose interlocutor is a confederate of the experimenter. In order to test predictions coming from dialogue models (e.g., [Pickering and Garrod 2004]) and to provide the kinds of evidence necessary for their further development, experimental methods that directly examine the behaviour of participants actually engaged in a conversation are needed. Additionally, eye-tracking measures established in psycholinguistic monologue research need to be compared with their dialogue-processing counterparts. Furthermore, new measures describing the relation between speaker and listener eye movements in communication are needed, as they can shed light on the language mechanisms underlying conversational interaction.
Improving web usability for the hard-of-hearing BIBAFull-Text 39
  Miki Namatame; Muneo Kitajima
Recently, with continued advances in information technology, there is an ever-growing amount of information accumulated on the World Wide Web. At the same time, the need to make this information accessible to any person who needs it becomes a serious issue. This paper focuses on web content accessibility for the hard-of-hearing. This is motivated by the fact that the first author, who is engaged in educating hard-of-hearing persons through daily classes, felt that hard-of-hearing persons would interact with web pages differently than hearing persons. Using web-based interactive course materials seems effective since they allow the creator to control the presentation of the content. However, the issue of how the hard-of-hearing interact with the web has not been adequately studied.
An eye tracking interface for image search BIBAFull-Text 40
  Oyewole Oyekoya; Fred Stentiford
Eye tracking presents an adaptive approach that can capture the user's current needs and tailor the retrieval accordingly. Applying eye tracking to image retrieval requires that new strategies be devised that can use visual and algorithmic data to obtain natural and rapid retrieval of images. Recent work showed that the eye is faster than the mouse as a source of visual input in a target image identification task [Oyekoya and Stentiford 2005]. We explore the viability of using the eye to drive an image retrieval interface. In a visual search task, users are asked to find a target image in a database, and the number of steps to the target image is counted. It is reasonable to believe that users will look at the objects in which they are interested during a search [Oyekoya and Stentiford 2004], and this provides the machine with the necessary information to retrieve a succession of plausible candidate images for the user.
Averaging scan patterns and what they can tell us BIBAFull-Text 41
  Helene Hembrooke; Matt Feusner; Geri Gay
While fixation data and saccadic indices have been employed as individual measures, less frequent analysis has been attempted on the sequence or pattern data that these two indices together create. The patterns of fixations and saccades are rich and complex, and hence, they are typically studied as individual paths or parts of paths that are isolated. Analysis is usually a visual comparison, although more recently there have been studies that have compared scan patterns statistically [Josephson and Holmes 2002; Pan et al. 2002; Yarbus 1967]. In the current work we explore a method for deriving an "average" scan pattern aggregated from many users viewing the same visual stimulus (web site).

Late breaking results: poster presentations

Visual attention in 3D video games BIBAFull-Text 42
  Su Yan; Magy Seif El-Nasr
Visual attention has long been an important topic in psychology and cognitive science. Recently, results from visual attention research (Haber et al. 2001; Myszkowski et al. 2001) are being adopted by computer graphics research. Due to speed limitations, there has been a movement to use a perception-based rendering approach where the rendering process itself takes into account where the user is most likely looking (Haber et al. 2001). Examples include trying to achieve real-time global illumination by concentrating the global illumination calculation only in parts of the scene that are salient (Myszkowski 2002). Video games have achieved a high degree of popularity because of such advances in computer graphics. These techniques are also important because they have enabled game environments to be used in applications such as health therapy and training.
Location location location: viewing patterns on WWW pages BIBAFull-Text 43
  Laura Granka; Helene Hembrooke; Geri Gay
This study investigates which components of a web page are most likely to both attract and maintain a viewer's attention. We measure these two aspects of viewing behavior -- attention onset and maintenance -- through an analysis of eye movements on three Web page components -- page location, element size, and information density. More specifically, the present research addresses how the overall composition and structure of a Web page influences an individual's ability to perceive content.
Simulation of effects of horizontal camera slippage on corneal reflections BIBAFull-Text 44
  T. Haslwanter; C. Kitzmueller; M. Scheubmayr
While the video-based measurement of eye movements, also referred to as "video-oculography" (VOG), has many advantages, it also suffers from a serious disadvantage which has not been solved yet: using images of the eye, how can we distinguish between a movement of the eye-in-the-head on the one hand, and a movement of the camera with respect to the head on the other? To distinguish between the two, we need additional information about the orientation and position of the camera with respect to the head.
Perspective error compensation of pupillography using glint images BIBAFull-Text 45
  InBum Lee; KwangSuk Park
This paper suggests a new method to compensate for perspective error when measuring the real pupil size in pupillography of the pupillary light response. To obtain the real pupil size, the distance between the cornea and the camera must be calculated. Glints in the eye image were used to estimate this distance. The suggested method was validated using a telecentric lens.
Evaluation of a multimedia learning exercise using oculo-motors BIBAFull-Text 46
  Minoru Nakayama; Yasutaka Shimizu
Many multimedia learning materials and software packages have been developed as learning tools; their usability for learning, however, has not often been discussed. The issue of system usability is often considered for various other processes, but learning materials should also be evaluated. Oculo-motor indices can be used for usability tests [Nakayama et al. 2002], so they may also indicate the usability of learning materials. To examine the feasibility of using oculo-motor indices to evaluate multimedia learning materials, an exercise was tested under experimental conditions.
The influence of web browsing experience on web-viewing behavior BIBAFull-Text 47
  Yoshiko Habuchi; Haruhiko Takeuchi; Muneo Kitajima
The World Wide Web has become an important source of information, as much as traditional media like books, newspapers, and television. While there have been many studies on Web searching, research into Web-viewing behavior using eye-tracking systems has only recently begun [Pan et al., 2004]. Josephson and Holmes [2002] studied Web-viewing behavior focusing on the category of Web page visual design. They suggested that eye movements were affected by the following two factors: (1) visual design of Web pages and (2) habitually preferred path across the visual stimuli. However, these previous studies did not sufficiently consider the user's experience. The purpose of this study is to investigate how past Web-browsing experience influences Web-viewing behavior. We used a detailed questionnaire to measure a user's Web-browsing experience and analyzed the eye-tracking data based on the user's prior Web experience.
Eye movements and pupil dilation during event perception BIBAFull-Text 48
  Tim J. Smith; Martyn Whitwell; John Lee
Human observers segment ongoing activities into events in a manner that is reliable across observers [Newtson and Engquist 1976]. Segments can be small ("fine") or large ("coarse"), with clusters of fine-grained segments relating hierarchically to coarse segments. Segmentation behaviour occurs even without instruction, as indicated by neural activity in the Medial Temporal complex (MT+) and Frontal Eye Field (FEF); similar activation is observed during active segmentation [Zacks et al. 2001]. These two brain regions are known to be active during the processing of visual motion (MT+) and the guidance of saccadic eye movements (FEF). This, along with behavioural evidence [Zacks 2004], indicates that visual motion may play an important role in identifying events.
Pupil brightness variation as a function of gaze direction BIBAFull-Text 49
  Javier San Agustin; Arantxa Villanueva; Rafael Cabeza
Pupil detection represents one of the most critical aspects of eye tracking systems based on video oculography. A robust segmentation of this feature determines to a large extent the performance of the system. However, a question remains unsolved: why does the pupil gray level change in the image? Apart from possible room lighting variation, can the eyeball physiology by itself influence the pupil's final level in the image? The answer is yes. This paper takes a further step beyond the work of Nguyen et al. [Nguyen et al. 2002], in which this eyeball characteristic was noticed but not explained, and offers a physiological explanation for the effect. From the results it is clear that pupil brightness can be a valid image feature and can contribute, together with alternative features, to improving tracking [Hammoud 2005]; a deeper knowledge of its behavior is therefore of considerable interest. The question is how the retina reacts to light, and how this influences the pupil's final level in the image. The retina is not a uniform surface: in the fovea there is a higher density of cones, and the ganglion cells are densely packed. When a beam of light enters the eye, it can be reflected and absorbed at the various layers of the retina. Near-infrared lighting is normally used because it is invisible to humans. In this wavelength range the reflected light is dominated by light scattered back from the choroid, the last layer before the sclera that supports the retina, and it is precisely at these wavelengths that the retina presents its highest reflectance. Bright-pupil tracking is conducted following the same method as in the works of Nguyen et al. [Nguyen et al. 2002] and Miller [Miller et al. 1995], but more exhaustive experiments are conducted. A ray of light directed at the fovea must cross a thicker layer to reach the choroid, which decreases the effective light intensity; a stronger reflection, and consequently a brighter pupil, can be expected when a more eccentric part of the retina is reached. Vertical rotations of the eyeball about its center are sketched in figure 1. From the figure it is clear that the pupil will appear brighter when the subject is looking at the upper part of the screen than when points in the lower part are fixated. Regarding left and right eye rotations, the fovea is displaced horizontally and temporally from the back pole of the eyeball. This means that the visual axis and the fovea present an angular offset with respect to the symmetry axis of the eye, with opposite sign depending on the eye. Following the same reasoning as for vertical rotations, a brighter pupil can be expected for points on the right part of the screen for the left eye; symmetrical behavior appears for the right eye, with points on the left part of the screen producing the higher pupil levels.
The relation of eye fixation patterns with emotional content and episodic memory BIBAFull-Text 50
  Luiz Henrique M. do Canto-Pereira; Breno Santos; Edgard Morya; Carlos H. Morimoto; Ronald Ranvaud
The focus of visual attention is closely related to eye movements and fixations, while episodic memory has been defined as the ability to be consciously aware of an earlier experience [Bond, 2005]. Emotional content plays a crucial role in the ability to recall a previous event. Here we investigate eye fixation patterns and their spatial distribution using ordinary kriging, a geostatistical interpolation method [Canto-Pereira et al., 2005], in a task where emotion and episodic memory were assessed.
Optical eye models for gaze tracking BIBAFull-Text 51
  Jeffrey B. Mulligan
The traditional "bottom-up" approach to video gaze tracking consists of measuring image features, such as the position of the pupil, corneal reflex, limbus, etc. These measurements are mapped to gaze angles using coefficients obtained from calibration data, collected as a cooperative subject voluntarily fixates a series of known targets. This may be contrasted with a "top-down" approach in which the pose parameters of a model of the eye are adjusted in conjunction with a camera model to obtain a match to image data. One advantage of the model-based approach is provided by robustness to changes in geometry, in particular the disambiguation of translation and rotation. A second advantage is that the pose estimates obtained are in absolute angular units (e.g., degrees); traditional calibration serves only to determine the relation between the visual and optical axes, and provide a check for the model. While traditional grid calibration methods may not need to be applied, a set of views of the eye in a variety of poses is needed to determine the model parameters for an individual. When relative motion between the head and the camera is eliminated (as with a head-mounted camera), the model parameters can be determined from as few as two images. A single point calibration is required to determine the angular offset between the line-of-sight and the observed optical axis.
Effect of letter spacing on eye movements and reading performance BIBAFull-Text 52
  Yu-Chi Tai; James E. Sheedy; John Hayes
Previous studies have shown that, when text is presented in a rapid stream (i.e., the RSVP paradigm), word recognition speed for strings of three-letter words increases approximately 10% with large letter spacing, both in the fovea and in the periphery (up to 10° eccentricity). A possible explanation is that small spacing causes features of individual characters to overlap with one another, reducing text legibility, impeding letter and word recognition, and slowing the reading process. Conversely, increasing letter spacing reduces the crowding effect until the spacing is so wide that word-shape information is disrupted, or the word extends beyond the visual span, which again slows reading.
A widget library for gaze-based interaction elements BIBAFull-Text 53
  Wolfgang Beinhauer
Eye-control as an interaction mechanism for desktop computers was proposed long ago (e.g., [Bolt 1982]). However, few systematic approaches to universal design guidelines and interaction elements for gaze-controlled user interfaces have been made. Graphical user interfaces comprise two components: first, a pointing device, and second, the graphical interface itself, on which the pointing device slides and triggers actions. Whereas much research has focused on the pointing device, such as eye tracking hardware, improvements to the calibration process, and measurement accuracy, the design of an appropriate user interface optimized for gaze control has not been covered extensively so far.
Eye movements and motor programming in a Time-To-Contact task BIBAFull-Text 54
  Edgard Morya; Marco Bertolassi; Adhemar Pettri Filho; Carlos H. Morimoto; Ronald Ranvaud
In previous experiments investigating motor control in a Time-To-Contact task [Morya et al., 2003], events occurring 400-600 ms prior to contact (but not earlier or later) caused volunteers to anticipate their estimate of when contact occurred. Many such mislocalization or mistiming effects have been discussed in the literature [Nijhawan, 1994; van Beers et al. 2001]. In preliminary eye-tracking experiments [Morya et al. 2004], with a simplified version of the task, involuntary shifts in gaze suggested the presence of attentional shifts as volunteers prepared to respond, which might be associated with their anticipations. To better understand the factors involved in these observations, gaze was systematically recorded while changing the speed of the moving target, and with different instructions as to where the volunteers should look as they performed the Time-To-Contact task.
Eye typing with common cameras BIBAFull-Text 55
  Dan Witzner Hansen; John Paulin Hansen
Low-cost eye tracking has received increased attention due to rapid developments in tracking hardware (video boards, digital cameras, and CPUs) [Hansen and Pece 2005; OpenEyes 2005]. We present a gaze typing system based on components that can be bought in most consumer hardware stores around the world, such as cameras and graphics cards made in large quantities. This kind of hardware differs from what is often claimed to be "off-the-shelf components" but is in fact available only from particular vendors.
   Institutions that supply citizens with communication aids may be reluctant to invest large amounts of money in new equipment that they are unfamiliar with. Recent investigations estimate that fewer than 2000 systems are actually in use by Europeans, even though more than half a million disabled people in Europe could potentially benefit from gaze communication. The main group of present users consists of people with motor neuron disease (MND) and amyotrophic lateral sclerosis (ALS). If the price of gaze communication systems can be lowered, they could become a preferred means of control for a large group of people [Jordansen et al. 2005]. Present commercial gaze trackers, e.g. [Tobii 2005; LC-Technologies 2004], are easy to use, robust, and sufficiently accurate for many screen-based applications, but their cost exceeds the budget of most people.
   We use a standard uncalibrated $400 Sony consumer camera (Sony Handycam DCR-HC14E) to obtain the image data. The camera is stationary, placed on a tripod close to the monitor (the exact distance varies), and the geometry of user, monitor, and camera varies among sequences; however, the users sit about 50-60 cm away from a 17" screen. A typical example of the setup is shown in figure 1. We use the standard Sony 'night vision' video option to create a glint with the built-in IR light emitter.
   Eye tracking based on common components is subject to several unknown factors, as various system parameters (i.e. camera parameters and geometry) are unknown. Algorithms that employ robust statistical principles to accommodate uncertainties in the image data, as well as in the gaze estimates used in the typing process, are therefore needed. We propose to use the RANSAC algorithm [Fischler and Bolles 1981] both for robust maximum likelihood estimation of iris observations [Hansen and Pece 2005] and for handling outliers in the calibration procedure [Morimoto et al. 2000]. Our low-resolution gaze tracker can be calibrated in less than 3 minutes by looking at 9 predefined positions on the screen. The users sit on a standard office chair without headrests or other physical constraints. Under these conditions we have succeeded in tracking the gaze of people, obtaining accuracies of about 160 pixels on screen. This is still lower than the accuracies claimed by the best current off-the-shelf eye-tracking systems (i.e. 30-60 pixels); however, a direct comparison would not be fair, as the systems are based on different hardware and image data.
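   The RANSAC step mentioned above can be illustrated on the calibration problem: fit a mapping from eye features to the nine screen positions while tolerating samples corrupted by blinks or tracking noise. A minimal sketch using an affine mapping (the mapping family and inlier threshold are our illustrative assumptions, not the authors' exact formulation):

```python
import numpy as np

def ransac_affine_calibration(eye_pts, screen_pts, n_iter=200, thresh=40.0, seed=0):
    """RANSAC [Fischler and Bolles 1981]: repeatedly fit an affine map
    screen = [ex, ey, 1] @ A to a minimal sample of 3 calibration points,
    keep the fit with the most inliers, then refit on those inliers."""
    rng = np.random.default_rng(seed)
    X = np.hstack([eye_pts, np.ones((len(eye_pts), 1))])  # homogeneous features
    best = np.zeros(len(X), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(X), size=3, replace=False)   # minimal sample
        A, *_ = np.linalg.lstsq(X[idx], screen_pts[idx], rcond=None)
        inliers = np.linalg.norm(X @ A - screen_pts, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    A, *_ = np.linalg.lstsq(X[best], screen_pts[best], rcond=None)
    return A, best

# Nine-point calibration with one corrupted sample (e.g., a blink).
eye = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [1, 1],
                [2, 1], [0, 2], [1, 2], [2, 2]], dtype=float)
scr = eye * 300 + 100
scr[4] += 500                              # the corrupted sample
A, inliers = ransac_affine_calibration(eye, scr)
print(inliers)                             # the corrupted point is rejected
```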
   Low-cost gaze trackers do not need to be as accurate and robust as the commercial systems, if they are used together with applications designed to tolerate noisy inputs.
   We use components of the GazeTalk [COGAIN 2005] typing communication system and have, through proper design of the typing interface, reduced the need for high accuracy. We have observed typing speeds in the range of 3-5 words per minute for untrained subjects using large on-screen buttons and a new noise-tolerant dwell-time principle. We modify the traditional dwell-time activation to one that maintains a full distribution over all hypothetical button selections and activates a button when the evidence becomes high enough.
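   The noise-tolerant dwell principle described above can be sketched as a decaying evidence score per button that fires only when one button dominates the distribution (the parameter values and dominance rule are our illustration of the idea, not GazeTalk's exact algorithm):

```python
def dwell_select(gaze_samples, buttons, share=0.8, decay=0.95, min_evidence=10.0):
    """Accumulate evidence for every on-screen button from noisy gaze samples.

    buttons: name -> (x0, y0, x1, y1) rectangle.
    Fires when one button holds at least `share` of the total evidence,
    so stray samples on neighboring buttons delay activation but do not misfire."""
    scores = {name: 0.0 for name in buttons}
    for x, y in gaze_samples:
        for name, (x0, y0, x1, y1) in buttons.items():
            scores[name] *= decay                       # forget old evidence
            if x0 <= x <= x1 and y0 <= y <= y1:
                scores[name] += 1.0                     # evidence for this button
        total = sum(scores.values())
        if total >= min_evidence and max(scores.values()) / total >= share:
            return max(scores, key=scores.get)
    return None

buttons = {"A": (0, 0, 100, 100), "B": (110, 0, 210, 100)}
samples = [(150, 50)] * 2 + [(50, 50)] * 30   # two stray samples on B, then A
print(dwell_select(samples, buttons))         # 'A' despite the stray samples
```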
Mobile eye tracking as a basis for real-time control of a gaze driven head-mounted video camera BIBAFull-Text 56
  Guido Boening; Klaus Bartl; Thomas Dera; Stanislavs Bardins; Erich Schneider; Thomas Brandt
Eye trackers based on video-oculographic (VOG) methods are a convenient means for oculomotor research. This work focused on the development of a VOG device that allows mobile eye tracking. It was especially designed to support a head-mounted gaze-driven camera system presented in a companion paper [Wagner et al. 2006] (see Figure 1). Target applications of such a device include surgery, the medical and behavioral sciences, and the documentation and teaching of manual tasks. One major aim was the design of a lightweight head mount at low cost. Since the actuators of the gaze camera require close-to-real-time control, the software was implemented on standard PC hardware using well-known VOG algorithms optimized for short latencies.
Eye drawing with gaze estimation model BIBAFull-Text 57
  Chiu Po Chan; Alvin W. Yeo
Much research on gaze-controlled interfaces has focused on target selection with eye gaze. For example, eye typing builds on gaze patterns and language models. However, only parts of these target-selection methods transfer to other domains. Unlike targets in eye typing, targets in eye drawing are very small, with target surfaces that can be less than 100 pixels.

Assistive/user interfaces

A comparative usability study of two Japanese gaze typing systems BIBAKFull-Text 59-66
  Kenji Itoh; Hirotaka Aoki; John Paulin Hansen
The complex interplay between gaze tracker accuracy and interface design is the focus of this paper. Two slightly different variants of GazeTalk, a hierarchical typing interface, were contrasted with a novel interface, Dasher, in which text entry is done by continuous navigation. All of the interfaces were tested with a good and a deliberately bad calibration of the tracker. The purpose was to investigate whether performance indices normally used for the evaluation of typing systems, such as characters per minute (CPM) and error rate, could differentiate between the conditions, and thus guide an iterative system development of both trackers and interfaces. Gaze typing with one version of the static, hierarchical menu systems was slightly faster than with the others. Error measures, in terms of the rate of backspacing, were also significantly different across the systems, while the deliberately bad tracker calibrations did not have any measurable effect. Learning effects were evident under all conditions. Power-law-of-practice learning models suggested that Dasher might be more efficient than GazeTalk in the long run.
Keywords: Japanese text typing, alternative communication, assistive technology, gaze interaction, usability
Speech-augmented eye gaze interaction with small closely spaced targets BIBAKFull-Text 67-72
  Darius Miniotas; Oleg Spakov; Ivan Tugoy; I. Scott MacKenzie
Eye trackers have been used as pointing devices for a number of years. Due to inherent limitations in the accuracy of eye gaze, however, interaction is limited to objects spanning at least one degree of visual angle. Consequently, targets in gaze-based interfaces have sizes and layouts quite distant from "natural settings". To accommodate accuracy constraints, we developed a multimodal pointing technique combining eye gaze and speech inputs. The technique was tested in a user study on pointing at multiple targets. Results suggest that in terms of a footprint-accuracy tradeoff, pointing performance is best (~93%) for targets subtending 0.85 degrees with 0.3-degree gaps between them. User performance is thus shown to approach the limit of practical pointing. Effectively, developing a user interface that supports hands-free interaction and has a design similar to today's common interfaces is feasible.
Keywords: eye tracking, eye-based interaction, human performance, pointing
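The target sizes reported above translate directly into screen pixels via the visual-angle formula s = 2d tan(θ/2). A quick worked check under assumed viewing conditions (the 50 cm viewing distance and 96 dpi display are our assumptions, not parameters reported in the paper):

```python
import math

def visual_angle_to_pixels(deg, distance_mm=500.0, px_per_mm=96 / 25.4):
    """On-screen extent of a visual angle: s = 2 * d * tan(theta / 2)."""
    return 2 * distance_mm * math.tan(math.radians(deg) / 2) * px_per_mm

print(round(visual_angle_to_pixels(0.85)))  # ~28 px target width
print(round(visual_angle_to_pixels(0.30)))  # ~10 px gap between targets
```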
Empathic tutoring software agents using real-time eye tracking BIBAKFull-Text 73-78
  Hua Wang; Mark Chignell; Mitsuru Ishizuka
This paper describes an empathic software agent (ESA) interface that uses eye movement information to facilitate empathy-relevant reasoning and behavior. Eye movement tracking is used to monitor the user's attention and interests, and to personalize the agent's behaviors. The system reacts to the user's eye information in real time, recording eye gaze and pupil dilation data during the learning process. Based on these measures, the ESA infers the focus of attention and motivational status of the learner and responds accordingly with affective (display of emotion) and instructional behaviors. In addition to describing the design and implementation of empathic software agents, this paper reports some preliminary usability test results concerning how users respond to the empathic functions provided.
Keywords: character agent, e-learning, educational interface, eye movements, eye tracking, eye-aware interfaces, tracing, tutoring

Advances in eye tracking technology

Compensating for eye tracker camera movement BIBAKFull-Text 79-85
  Susan M. Kolakowski; Jeff B. Pelz
An algorithm was developed to improve the prediction of eye position from video-based eye tracker data. Eye trackers that determine eye position from images of the pupil and corneal reflection typically differentiate poorly between changes in eye position and movements of the camera relative to the subject's head. The common method employed by video-based eye trackers involves calculating the vector difference between the center of the pupil and the center of the corneal reflection, under the assumption that the two centers change in unison when the camera moves with respect to the head. We tested this assumption and show that relying on it increases prediction error. Moreover, predicting the corneal reflection center is inherently less precise than predicting the pupil center, due to the reflection's small size; typical approaches thus generate eye positions that can only be as robust as the relatively noisy corneal reflection data. Our algorithm more effectively accounts for camera movements with respect to the head and reduces the noise in the final eye position prediction. It was tested and shown to be particularly robust in the common situation where sharp eye movements occur intermixed with smooth head-to-camera changes.
Keywords: algorithm, camera compensation, eye tracking, noise
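The pupil-minus-corneal-reflection feature at issue here is easy to state concretely: under the equal-shift assumption the paper challenges, a camera translation moves both centers together and cancels in the difference. A short sketch of the feature and that cancellation (the numeric values are illustrative):

```python
import numpy as np

def pcr_vector(pupil_center, cr_center):
    """The standard video-based gaze feature: pupil center minus corneal
    reflection center. Under the equal-shift assumption, camera translation
    moves both centers together and drops out of the difference."""
    return np.asarray(pupil_center, float) - np.asarray(cr_center, float)

print(pcr_vector((100, 80), (104, 83)))  # [-4. -3.]
print(pcr_vector((105, 80), (109, 83)))  # same vector after a 5 px camera slip
```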
A single camera eye-gaze tracking system with free head motion BIBAKFull-Text 87-94
  Craig Hennessey; Borna Noureddin; Peter Lawrence
Eye-gaze as a form of human machine interface holds great promise for improving the way we interact with machines. Eye-gaze tracking devices that are non-contact, non-restrictive, accurate and easy to use will increase the appeal for including eye-gaze information in future applications. The system we have developed and which we describe in this paper achieves these goals using a single high resolution camera with a fixed field of view. The single camera system has no moving parts which results in rapid reacquisition of the eye after loss of tracking. Free head motion is achieved using multiple glints and 3D modeling techniques. Accuracies of under 1° of visual angle are achieved over a field of view of 14x12x20 cm and over various hardware configurations, camera resolutions and frame rates.
Keywords: eye model, eye-gaze tracking, fast reacquisition, free head motion, human computer interface, human machine interface, single camera
openEyes: a low-cost head-mounted eye-tracking solution BIBAKFull-Text 95-100
  Dongheng Li; Jason Babcock; Derrick J. Parkhurst
Eye tracking has long held the promise of being a useful methodology for human computer interaction. However, a number of barriers have stood in the way of the integration of eye tracking into everyday applications, including the intrusiveness, robustness, availability, and price of eye-tracking systems. To lower these barriers, we have developed the openEyes system. The system consists of an open-hardware design for a digital eye tracker that can be built from low-cost off-the-shelf components, and a set of open-source software tools for digital image capture, manipulation, and analysis in eye-tracking applications. We expect that the availability of this system will facilitate the development of eye-tracking applications and the eventual integration of eye tracking into the next generation of everyday human computer interfaces. We discuss the methods and technical challenges of low-cost eye tracking as well as the design decisions that produced our current system.
Keywords: consumer-grade off-the-shelf parts, human computer interaction, video-based eye-tracking

Gaze-contingent display/video

Perceptual attention focus prediction for multiple viewers in case of multimedia perceptual compression with feedback delay BIBAKFull-Text 101-108
  Oleg Komogortsev; Javed Khan
Human eyes have limited perceptual capabilities: only 2 degrees of our 180-degree visual field provide the highest quality of perception. This fact gave rise to the idea of perceptual attention focus, which allows visual content to be encoded so that only the part of the visual field where the viewer's attention is directed receives high quality. The image quality in the periphery can be reduced without the viewer noticing it. This compression approach allows a significant decrease in bit rate for a video stream and, in the case of 3D stream rendering, decreases the computational burden. A number of previous researchers have investigated real-time perceptual attention focus, but only for a single viewer. In this paper we investigate a dynamically changing multi-viewer scenario, in which a number of people watch the same visual content at the same time, each using eye-tracking equipment. The visual content (video, 3D stream) is sent through a network with a large transmission delay. The area of perceptual attention focus is predicted for the viewers to compensate for the delay and to identify the area of the image that requires the highest-quality coding.
Keywords: compression, media adaptation, perceptual attention prediction
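As a concrete (and deliberately naive) stand-in for the attention-focus predictor, the transmission delay can be bridged by extrapolating each viewer's gaze forward by the delay; the paper's actual multi-viewer predictor is more sophisticated, so this only illustrates the problem setup:

```python
def extrapolate_gaze(samples, delay_s):
    """Predict a gaze position delay_s seconds ahead by linear extrapolation
    of the last two (t, x, y) samples; the predicted region is then the
    candidate for high-quality encoding."""
    (t0, x0, y0), (t1, x1, y1) = samples[-2], samples[-1]
    dt = t1 - t0
    return (x1 + (x1 - x0) / dt * delay_s,
            y1 + (y1 - y0) / dt * delay_s)

# 200 ms network delay; gaze moving right at ~400 px/s.
print(extrapolate_gaze([(0.00, 100.0, 100.0), (0.03, 112.0, 103.0)], 0.2))
```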
Gaze-contingent temporal filtering of video BIBAKFull-Text 109-115
  Martin Böhme; Michael Dorr; Thomas Martinetz; Erhardt Barth
We describe an algorithm for manipulating the temporal resolution of a video in real time, contingent upon the viewer's direction of gaze. The purpose of this work is to study the effect that a controlled manipulation of the temporal frequency content in real-world scenes has on eye movements. We build on the work of Perry and Geisler [1998; 2002], who manipulate spatial resolution as a function of gaze direction, allowing them to mimic the resolution distribution of the human retina or to simulate the effect of various diseases (e.g. glaucoma). Our temporal filtering algorithm is similar to that of Perry and Geisler in that we interpolate between the levels of a multiresolution pyramid. However, in our case, the pyramid is built along the temporal dimension, and this requires careful management of the buffering of video frames and of the order in which the filtering operations are performed. On a standard personal computer, the algorithm achieves real-time performance (30 frames per second) on high-resolution videos (960 by 540 pixels). We present experimental results showing that the manipulation performed by the algorithm reduces the number of high-amplitude saccades and can remain unnoticed by the observer.
Keywords: foveation, gaze-contingent display, temporal multiresolution pyramid
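The shape of the algorithm can be conveyed with a two-level version: blend the current frame (full temporal resolution) with a temporally low-passed frame, weighting by distance from the gaze point. The published method interpolates between the levels of a temporal multiresolution pyramid; this two-level sketch with a Gaussian falloff is our simplification:

```python
import numpy as np

def temporal_blend(frame_buffer, gaze_xy, sigma=120.0):
    """Gaze-contingent temporal filtering, two-level version: keep full
    temporal resolution near the gaze point, fade to the temporal mean of
    the buffered frames (a crude low-pass) in the periphery."""
    fine = frame_buffer[-1].astype(float)      # newest frame, full resolution
    coarse = np.mean(frame_buffer, axis=0)     # temporally low-passed frame
    h, w = fine.shape
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (xs - gaze_xy[0]) ** 2 + (ys - gaze_xy[1]) ** 2
    weight = np.exp(-d2 / (2 * sigma ** 2))    # 1 at gaze, -> 0 peripherally
    return weight * fine + (1 - weight) * coarse

buf = [np.random.rand(54, 96) for _ in range(8)]
print(temporal_blend(buf, gaze_xy=(48, 27)).shape)
```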
A pivotable head mounted camera system that is aligned by three-dimensional eye movements BIBAKFull-Text 117-124
  Philipp Wagner; Klaus Bartl; Wolfgang Günthner; Erich Schneider; Thomas Brandt; Heinz Ulbrich
The first proof of concept of an eye-movement-driven head camera system was recently presented. This innovative device utilized voluntary and reflexive eye movements, registered by video-oculography and computed online, as signals to drive servo motors that aligned the camera along the user's gaze direction. However, with just two degrees of freedom, this camera motion device could not compensate for roll motions around the optical axis of the system. Therefore a new three-degree-of-freedom camera motion device that is able to reproduce the whole range of possible eye movements has now been implemented. It allows a freely mobile user to aim the optical axis of the head-mounted camera system at the target(s) in the visual field at which he/she is looking, while the ocular reflexes minimize image shaking by naturally counter-rolling the "gaze in space" of the camera during head and visual scene movements as well as during locomotion. A camera guided in this way mimics the natural exploration of a visual scene and acquires video sequences from the perspective of a mobile user, while the oculomotor reflexes naturally stabilize the camera on target during head and target movements. Various documentation and teaching applications in health care, industry, and research are conceivable. This work presents the implementation of the new camera motion device and its integration into a head camera setup, including the eye tracking device.
Keywords: calibration, camera motion device, eye tracking, parallel mechanism, vestibulo-ocular reflex

Comprehension and cognition

An eye-tracking methodology for characterizing program comprehension processes BIBAKFull-Text 125-132
  Roman Bednarik; Markku Tukiainen
Program comprehension processes have previously been studied using methodologies such as think-aloud protocols or comprehension summary analysis. Eye tracking, however, has not previously been widely applied to studies of behavioral aspects of programming. We present a study in which program comprehension was investigated with the help of a remote eye tracker. Novice and intermediate programmers used a program visualization tool to aid their comprehension while the locations of fixations, fixation durations, and attention switching between areas of interest were recorded.
   In this paper we 1) propose an approach for investigating trends in sparse repeated-measures eye-tracking data from a small number of cases, and 2) use this technique to characterize the development of program comprehension strategies during dynamic program visualization with the help of eye-movement data.
Keywords: eye-movement tracking methodology, program comprehension, program visualization, psychology of programming
Analyzing individual performance of source code review using reviewers' eye movement BIBAKFull-Text 133-140
  Hidetake Uwano; Masahide Nakamura; Akito Monden; Ken-ichi Matsumoto
This paper proposes using eye movements to characterize the performance of individuals reviewing the source code of computer programs. We first present an integrated environment to measure and record the eye movements of code reviewers. Based on the fixation data, the environment computes the line number of the source code that the reviewer is currently looking at. The environment can also record and play back how the eyes moved during the review process. We conducted an experiment to analyze 30 review processes (6 programs, 5 subjects) using the environment. As a result, we identified a particular pattern, called the scan, in the subjects' eye movements. Quantitative analysis showed that reviewers who did not spend enough time on the scan tended to take more time to find defects.
Keywords: computer program, eye movement, human factor, source code review
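For a fixed-height text rendering, the line-level mapping such an environment computes reduces to simple arithmetic on the fixation's vertical coordinate (the margin and line height below are illustrative values, not the authors' setup):

```python
def fixation_to_line(fix_y, top_margin=40, line_height=16):
    """Map a fixation's vertical screen coordinate to a 1-based source line."""
    return max(1, int((fix_y - top_margin) // line_height) + 1)

# A 'scan' appears as a near-monotonic sweep through the line numbers.
print([fixation_to_line(y) for y in (50, 70, 90, 110, 200)])  # [1, 2, 4, 5, 11]
```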
Eye tracking insights into cognitive modeling BIBAKFull-Text 141-147
  Mike Bartels; Sandra P. Marshall
The original 2000 AMBR project sought to evaluate how well four human performance models simulated behavior of human participants. Participants and models completed a modified version of an air traffic control task and were compared on the dimensions of performance, reaction time and subjective workload ratings. The current study replicated the human performance findings of the previous phase of AMBR and added eye tracking analysis to enhance understanding of participants' behavior. Examination of gaze position and patterns of eye movement provided evidence that participants adopted different visual strategies to complete the task in different display conditions and at different levels of demand. Applicability of eye tracking analyses to cognitive models is discussed.
Keywords: AMBR, eye tracking, human performance modeling

Performance analysis

eyePatterns: software for identifying patterns and similarities across fixation sequences BIBAKFull-Text 149-154
  Julia M. West; Anne R. Haake; Evelyn P. Rozanski; Keith S. Karn
Fixation sequence analysis can reveal the cognitive strategies that drive eye movements. Unfortunately this type of analysis is not as common as other popular eye movement measures, such as fixation duration and trace length, because the proper tools for fixation sequence analysis are not incorporated into most popular eye movement software. This paper describes eyePatterns, a new tool for discovering similarities in fixation sequences and identifying the experimental variables that may influence their characteristics.
Keywords: data analysis, eye tracking, sequence analysis
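The sequence comparison underlying tools like eyePatterns (and the string-edit analysis in the next paper) is the Levenshtein edit distance over AOI-coded fixation strings. A self-contained sketch (the AOI letter coding is illustrative):

```python
def levenshtein(a, b):
    """Edit distance between two AOI-coded fixation sequences: the minimum
    number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

# Fixation sequences coded by AOI (H = headline, C = crawler, B = body).
print(levenshtein("HHCCB", "HCCBB"))  # 2
```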
Clutter or content?: how on-screen enhancements affect how TV viewers scan and what they learn BIBAKFull-Text 155-162
  Sheree Josephson; Michael E. Holmes
The influence of "on-screen enhancements" such as headline bars and bottom-of-the-screen crawlers on TV viewing was tested using television news stories. Eye-movement data were recorded for participants' viewings of three news stories in three design levels: a standard screen, a screen with a crawler, and a screen with a headline bar and crawler. The influence of screen design on the distribution of fixations was measured in a 3x3 MANOVA (screen design x story topic), with the ratios of fixation time in defined areas of interest (AOIs) to overall story length as dependent variables. The influence of screen design on the sequential resemblance of eye-path sequences was examined with a string-edit method; multidimensional scaling and cluster analysis were used to group sequences according to their inter-sequence resemblances as measured by Levenshtein distance. The influence of screen design on the processing of story content was measured by story content recall.
   Results indicated that screen design influenced the distribution of fixation time across AOIs. Fixation sequence length was unrelated to screen design but was a strong influence on sequence resemblances; however, fixations in the headline and crawler AOIs also contributed to eye-path groupings. Screen design's influence on story content recall was limited to enhanced recall of key story points when headlines related to those points were present, although recall of other story points diminished.
Keywords: design, eye tracking, eye-path comparison, recall, television, visual attention
Gaze behavior of spotters during an air-to-ground search BIBAKFull-Text 163-179
  James L. Croft; Daniel J. Pittman; Charles (Chip) T. Scialfa
Crashed aircraft must be located quickly to minimize loss of life, often requiring visual search from the air. This study was designed to develop methods for evaluating the gaze behaviors of spotters during air-to-ground search and to compare field-derived measures with similar lab measures reported in the literature. A secondary aim was to assess adherence to a prescribed scan path, evaluate search effectiveness, and determine the predictors of task success. Eye movements were measured in 10 volunteer spotters while they searched from the air for ground targets. Static visual acuity at several eccentricities and contrast levels, and performance on a lab-based search task, were also measured. Gaze relative to the head was transformed to gaze relative to the ground using information from the scene. Coverage and task success were similar to literature values from a lab-based study of air-to-ground search. Air search task success was best predicted by a combination of gaze and laboratory variables and, as in previous lab-based research, experience was not among the predictors. Results from this field study provide some support for the generalizability of lab research, although performance is quite poor in both lab and field. Future improvements in air search and rescue success will depend upon improvements in training, the refinement of scan tactics, changes to the task methods or environment, or modifications to the parameters of the search exercise.
Keywords: air-to-ground search, detection, eye movements, visual search