
Empirical Studies of Programmers: Fifth Workshop

Fullname:Empirical Studies of Programmers: Fifth Workshop
Editors:Curtis R. Cook; Jean C. Scholtz; James C. Spohrer
Location:Palo Alto, California
Dates:1993-Dec-03 to 1993-Dec-05
Publisher:Ablex Publishing Corporation
Standard No:ISBN 0-56750-088-9 (cloth) 0-56750-089-7 (paper); hcibib: ESP93
  1. Keynote
  2. Panels
  3. Papers
  4. Posters

Keynote

What is Software Interaction Design?
  Terry Winograd

Panels

End-User Programming (pp. 1-2)
  Wayne D. Gray; Bonnie E. John; Bonnie A. Nardi; Marian Petre; James C. Spohrer; Althea A. Turner
End-user programming involves the end user building new tools, not simply using an application. Hence, word processing is not an example of end-user programming, while building style sheets for a word processor is. Using communication software is not end-user programming; writing a script for the communication software is. Using someone else's spreadsheet is not; building your own spreadsheet is. Using someone else's HyperCard stack is not; building your own is. Running someone else's cognitive model is not; building a cognitive model that fits your theory is.
   This definition includes specialized software for experts (for example, Edmonds, O'Brien, & Bayley, 1993), semi-domain-specialized software such as spreadsheets, and intendedly general-purpose (but specialized anyway) software such as HyperCard. The two defining characteristics are the activity, building software tools (what the end-user programming language, EPL, is used for), and the characteristics of the user, whose main interest is in building a tool of which they, among possibly others, will be a user. Hence, LISP could be considered an EPL for C programmers who use EMACS.
Has ESP Research Improved Programming Instruction? (pp. 3-5)
  Marcia C. Linn; Michael J. Clancy; Lydia Mann; Philip Miller; Elliot Soloway
This panel discussion will address:
  • (a) how instructors use current research on programming; and
  • (b) how future research might contribute to better teaching.
   We hope to help participants at the ESP V conference synthesize current studies and identify future directions for empirical work. In addition, we hope to alert researchers to unanticipated consequences of current empirical studies for programming courses.

Papers

    Beyond Program Understanding: A Look at Programming Expertise in Industry (pp. 6-25)
      Lucy M. Berlin
    In the computer industry, expert programmers must often relearn parts of their craft as they retool themselves to new computer languages, programming environments, software frameworks and systems. Our study of consulting interactions between these apprentices and experts has given insights into this collaborative work practice and into the knowledge gaps of programmers in a new environment.
       In this paper we characterize the apprenticeship interactions we observed, the skills experts use in collaborative problem solving, the hard-to-find information they emphasize, and the tutoring skills they exhibit. The observations also indirectly suggest the multi-faceted knowledge required for real-life programming expertise, and the knowledge and skills that make experts so much more effective in their daily work.
    The Collaboration Thread: A Formative Evaluation of Object-Oriented Education (pp. 26-41)
      John M. Carroll; Mary Beth Rosson; Mark K. Singley
    We are exploring a type of critical incident analysis that groups together sets of causally related user episodes; we refer to these as "critical threads." Viewed in isolation, the episodes of a critical thread may seem less than critical. This can be a problem in formative evaluation, because taken together, the same episodes can expose major underlying usability issues. We use psychological design rationale to construct a unifying description of the set of user episodes comprising a critical thread (i.e., a sort of abstract and distributed user scenario). Such a description guides the recognition of pieces of a critical thread in data and the articulation of underlying usability themes embodied across the various constituent episodes.
    Externalising Information During Coding Activities: Effects of Expertise, Environment and Task (pp. 42-61)
      Simon P. Davies
    This paper presents empirical evidence for differences in the nature of problem solvers' information externalisation strategies. Two experiments concerned with programming behaviour are reported which suggest that experts tend to rely much more upon external memory sources in situations where the device they use to construct the program hinders the use of a display in the service of performance. Experts and novices also appear to externalise different kinds of information during problem solving: experts tend to externalise low-level information, mainly to aid simulation, whereas novices develop higher-level representations which might be characterised as transformations or re-representations of the problem state. Moreover, in the case of experts, the nature of externalised information appears to depend upon whether they are generating a program as opposed to comprehending it. These results provide support for a display-based view of problem solving. The studies also address strategic differences in the externalisation of information, which until now have remained unexplored in accounts of display-based behaviour. Finally, the paper suggests a number of implications for the design of tools intended to support the programming process and for systems aimed at teaching programming skills.
    Mind Your Ps and Qs: Using Parentheses and Quotes in LISP (pp. 62-85)
      Elizabeth A. Davis; Marcia C. Linn; Lydia M. Mann; Michael J. Clancy
    Learning the Ps and Qs of LISP turns out to be more challenging than most textbooks and many instructors anticipate. By "minding your Ps and Qs" we refer to appropriate use of parentheses and quotes. We investigate (a) the aspects of Ps and Qs that are most likely to confuse novice programmers, and (b) the course of knowledge development that novice programmers follow in making sense of Ps and Qs in LISP.
       We devised a 23-problem LISP Ps and Qs assessment and administered it to 36 students in individual interviews. We identified seven rules that accounted for most of the inappropriate conjectures made by students about Ps and Qs. Students applied each conjecture or "rule" either consistently or intermittently. Based on their patterns of rule use, students could be categorized as rule refiners, rule users, or rule seekers.
       These results suggest that students struggle to make sense of information about LISP syntax and about the nature of rules in programming. In making sense of programming rules, for example, students might draw on their knowledge of rules for English grammar or rules for algebra symbol manipulation. It is tempting to conclude that students need more explicit instruction in LISP, but our observations suggest, instead, that students need encouragement and support as they construct personal views of programming. Learning the Ps and Qs of LISP requires the same process of knowledge integration and refinement characteristic of more complex learning.
    Tales of Debugging From the Front Lines (pp. 86-112)
      Marc Eisenstadt
    A world-wide trawl for debugging anecdotes elicited replies from 78 respondents, including a number of implementors of well-known commercial software. The stories included descriptions of bugs, bug-fixing strategies, discourses on the philosophy of programming, and several highly amusing and informative reminiscences. Experiences included using a steel ruler to debug a COBOL line printer listing, browsing through a punched card deck to debug an early FORTRAN compiler, and struggling in vain to find intermittent bugs on popular commercial products. An analysis of the anecdotes reveals three primary dimensions of interest: why the bugs were difficult to find, how the bugs were found, and root causes of bugs. Half of the difficulties arose from just two sources: (i) large temporal or spatial chasms between the root cause and the symptom, and (ii) bugs that rendered debugging tools inapplicable. Techniques for bug-finding were dominated by reports of data-gathering (e.g. print statements) and hand-simulation, which together accounted for almost 80% of the reported techniques. The two biggest causes of bugs were (i) memory overwrites and (ii) vendor-supplied hardware or software faults, which together accounted for more than 40% of the reported bugs. The paper discusses the implications of these findings for the design of program debuggers, and explores the possible role of a large repository/data base of debugging anecdotes.
    Learning Computer Programming: A Route to General Reasoning Skills? (pp. 113-136)
      Adrienne Y. Lee; Nancy Pennington
    The learning of computer programming in schools has often been promoted as a basis for the learning of general thinking skills. Thus, a fundamental question about computer programming skill is whether it "transfers" to reasoning in other domains. Our research investigates whether expert diagnostic strategies will transfer spontaneously from established programming skill to another, unfamiliar domain. We then examine whether diagnostic reasoning can be taught to novices, in the context of learning to program, in a way that liberates the strategy from its content. The first experiment examined the performance of experienced subjects (extensive programming but no electronics) and inexperienced subjects (no programming or electronics) in two domains (programming and electronics) when domain-specific information was provided. Results suggest that practicing a component of programming skill (debugging) will produce a general diagnostic skill that can transfer spontaneously across domains. The second experiment examined the training of inexperienced subjects for transfer. Experimental subjects learned more than controls, but did not show more transfer. More training may be necessary for subjects to reach advanced levels of the skill and thereby show transfer.
    Comparing the Comprehensibility of Textual and Graphical Programs: The Case of Petri Nets (pp. 137-161)
      Thomas G. Moher; David C. Mak; Brad Blumenthal; Laura M. Leventhal
    In an experiment inspired by Green, Petre, and Bellamy (1991), three forms of Petri net representations were tested against two textual program representations for comprehensibility. Two tasks were employed: question-answering and matching. The results reaffirmed the textual match-mismatch phenomenon frequently reported for circumstantial vs. sequential programs, but failed to find a match-mismatch for alternative net representations. Petri nets appeared in general to be better suited to backward questions, but performance was strongly dependent on the layout of the Petri nets. In general, the results indicate that the efficacy of a graphical program representation is not only task-specific, but also highly sensitive to seemingly ancillary issues such as layout and the degree of factoring.
    Does Programming Knowledge or Design Strategy Determine Shifts of Focus in Prolog Programming? (pp. 162-186)
      Thomas C. Ormerod; Linden J. Ball
    In this paper we examine the nature of expertise in program writing, in particular the factors which underlie the order in which code is generated by Prolog programmers. Verbal and keystroke recordings were taken from five expert subjects coding solutions to a problem requiring a recursive list-processing solution. A quantitative analysis of transcripts revealed a wide variation between subjects in the presence of non-linearities in code generation, with one subject demonstrating almost perfect linear development of code whilst others showed varying degrees of non-linearity. On the other hand, there was little evidence of deviation from a structured approach to code development, even by experts producing code in a non-linear fashion. Qualitative analysis of verbal protocols revealed two key factors which determined the sequence of code generation: 1) switches between different views of the programming problem during solution development; and 2) the operation of problem scheduling strategies which created agendas for tackling coding sub-problems. We discuss our findings in terms of current theories of programming expertise, and propose that the notion of programming 'plans' is neither necessary nor sufficient to account for the shifts of focus in the coding of our expert subjects. Plans may be a component of programming expertise, but they cannot alone account for the different coding orders observed in the construction of similar programs. Instead, we argue that a theory of programming expertise must account for the role of design strategies, such as structured problem decomposition and problem scheduling, that are employed by experts in developing code.
    An Analysis of Novice Programmers Learning a Second Language (pp. 187-205)
      Jean Scholtz; Susan Wiedenbeck
    This research studied novice programmers with some Pascal knowledge during their initial attempts at learning another programming language. We wanted to identify the programming knowledge they had previously acquired and determine whether they were able to use this knowledge in learning a second language. We found that plan structure differences could be used to predict problems programmers encountered. Additionally, we discovered that novices were hampered in transferring to a new language not only by features of the new language, but also by inadequate or missing knowledge of both programming constructs from their first language and programming concepts in general.
    Positive Test Bias in Software Testing by Professionals: What's Right and What's Wrong (pp. 206-221)
      Barbee Teasley; Laura Marie Leventhal; Diane S. Rohlman
    Software testing, which consumes substantial effort in software development, is a virtually unexplored area in human-computer interaction. At Bowling Green State University, we have a program of research which is looking at the application of judgment and decision-making theory to software testing, focusing on the role of positive test bias in software testing. Studies of naturalistic testing tasks, as well as ones which follow common laboratory models in this area, have found ample evidence that testers have a positive test bias. This bias is manifest as a tendency to execute about four times as many positive tests, designed to show that "the program works" (i.e., using valid data), as tests which challenge the program (i.e., using invalid data). While positive tests do uncover errors in a program and need to be done, failure to do negative tests leaves much of the program unvalidated.
       Our studies have also shown that the expertise of the subjects, the completeness of the software specifications, and the presence or absence of program errors may reduce positive test bias. Talk-aloud data suggest that advanced computer science students and professional programmers do invent specifications to test against in the absence of actual specifications, but still exhibit positive test bias.

Posters

    Program Comprehension of Literate Programs by Novice Programmers (p. 222)
      Christopher F. Bertholf; Jeanne Scholtz
    This study compares comprehension of Lit style literate programs with that of traditional modular programs with both internal and external documentation. Literate programming (Knuth, 1984)* enhances a computer program by incorporating program text into a comprehensive design document. Although the concept has not previously been well defined, we believe it has great intuitive appeal, fits well with a multi-disciplinary approach to automating portions of the software engineering process, and can easily be adapted to incorporate empirically derived principles of program comprehension.
       The Lit system developed by Chris Bertholf employs many of Knuth's principles for literate style programs as well as several others; the program text is incorporated into a comprehensive design document which uses typographic cues and a book style presentation paradigm. A program description and information about design history, the task domain, and implementation are included in the program document. The table of contents provides information about the overall structure of the program. In addition, algorithms are documented in pseudo-code and documentation of anticipated modifications is included. Extensive documentation of the usage of variables, procedures, and functions is also included.
       Does this increased amount of documentation and the unique presentation format hinder or facilitate program comprehension? This study compared the comprehension results of 20 novice programmers randomly divided into two groups and given either a traditional modular FORTRAN program or an equivalent Lit style literate program to modify. Subjects performed the task of completing an incomplete program; all program modifications were made on paper, so syntax errors were expected. The elapsed time to produce a solution was recorded, and several measures of comprehension were collected and analyzed. Completed programs were judged as completely correct, functionally correct with syntax errors, or incorrect. Overall, subjects given the literate programs found a solution more often than did subjects using the traditional modular programs; none of the subjects given the modular programs produced even a functionally correct solution. In addition, none of the subjects given Lit style literate programs modified sections of code unrelated to the modification specification, while all of the subjects given traditional modular programs did. Similar results have also been obtained with advanced programmers in a related study.
       Although this study did not attempt to isolate the factors which aided in comprehension, it did show that the Lit style programs are useful for program maintenance tasks. Future research in this area should concentrate on isolating the factors that produced such a marked distinction in performance between the Lit style literate program group and the traditional program group.
       * Knuth, D. (1984). Literate Programming. The Computer Journal, 27(2), 97-112.
    The Dynamic Construction of Work Organizations During Team Programming: Elements of a Process of Dynamic Organization (p. 223)
      Nick V. Flor
    The complexity and enormity of most computer programming tasks suggest that their successful completion by software teams requires not only careful design but also careful planning of both the work organizations and the coordination of results for the various programming subtasks. The goal of this paper is to show that useful work can be accomplished by individual-centered work organizations acting in the best interests of their own tasks and reacting opportunistically to useful information in other work organizations. The activities of a pair of programmers working on a software maintenance task will be analyzed in detail. It will be argued that what looks on the surface like planned collaboration between the programmers is actually the consequence of a dynamic organizational process. Alongside this analysis, the elements of this process will be identified and their potential role in shaping the construction of new work organizations will be discussed.
    How Programmers Visualize Programs (p. 224)
      Lindsey Ford
    How does a programmer visualize a computer language? How does a programmer visualize the execution of a program? We have explored these questions with learners of object-oriented programming. We provided them with a set of graphic and animation creation tools and assigned them a practical project to design and implement programs that would animate features of the language C++. They developed programs that interfaced with the tools and thus produced animations, of their own design, of C++ features of their own choosing. For example, some learners provided animations that visualized how loop, choice, and assignment constructs worked; other animations focused on visualizing class hierarchy, inheritance and overloading; yet others visualized dynamic memory operations. At several stages during their designs and implementations we interviewed the learners to determine what aspects of C++ they wanted to visualize and why they wanted to visualize them in a certain way. Finally, we examined their animations and the programs they had developed to generate the animations.
       From these results we conclude that: (1) learners use various abstractions when visualizing; (2) a study of programmers' visualizations provides a complementary view to textual-based empirical studies of programmers; (3) programmers frequently represent the same textual programming construct in different visual forms; (4) visualization provides a framework for studying learners' misconceptions; and (5) visualization exercises for learners appear to foster programming skills.
    Analysis of Experiences with Modifying Computer Programs (p. 225)
      Arun Lakhotia
    The paper analyzes the author's experience with modifying large, real-world programs written by other programmers. It finds that Brooks' hypothesis-test-refine paradigm, based on domain and programming knowledge, explains the author's approach to understanding programs and the differences in performance in comparison with his students. Zvegintzov's 9-step process of change is found to be a good first-level decomposition of the (physical) tasks performed when making corrective changes to a software system.
       The paper also makes some new observations. Besides modularity and levels of abstraction, the organization of source code into a hierarchy of directories also influences the ease of locating code segments relevant to a change request. The functionality of a program is understood not only from its documentation but also by executing it and inferring relations between its inputs and outputs, an approach analogous to concept identification. When introducing a new function into an existing program, a programmer attempts to find subproblems that have been solved by other parts of the program so as to mimic their solutions. Quite often this means copying large code segments. However, when a function is deleted, the code implementing it is not destroyed; only the execution paths leading to it are disconnected, leaving behind dead code. The replicated and dead code segments are major contributors to the difficulty of understanding and modifying programs.
    Very High-Level Debugging: How Novice Ada Concurrent Programmers Respond to ADAT (p. 226)
      Arthur V. Lopes; Rachelle S. Heller; Michael B. Feldman; Dianne C. Martin
    This paper describes a study carried out to evaluate how novice concurrent Ada programmers respond to an Automated Debugger for Ada Tasks (ADAT). ADAT is a programming tool that implements a debugging concept in which non-syntactic errors are detected and the user is guided to correct them. The process of identifying and correcting a non-syntactic error is named Very High-Level Debugging. Traditional static analysis was extended through the use of a rule-based system (CLIPS): the source code of a SmallAda (a student compiler for an Ada subset) program is searched for likely execution-time anomalies in task activation and communication. Some race conditions and deadlocks are among the anomalies dealt with by ADAT. Each anomaly is associated with a corrective procedure. ADAT was implemented to test the idea of Very High-Level Debugging. A two-stage experiment was performed using two groups of 20 subjects each, an experimental group and a control group. In Stage One, the subjects in the experimental group used the SmallAda system with the ADAT tool available, and the subjects in the control group used the SmallAda system without it. Subjects from both groups were asked to find and correct one bug in each of two SmallAda programs. The SAPM (SmallAda Parallel Monitoring) tool was available to both groups. In Stage Two, both groups were asked to use the SmallAda system to extend a SmallAda concurrent program; at this stage, the conditions under which the subjects worked were identical. The goal of the experiment was to test two hypotheses: (a) the use of ADAT improves the performance of the debugging activity; and (b) the use of ADAT improves the understanding of concurrency. Analysis of the experimental results showed that ADAT improves the performance of the debugging activity as well as the learning process. ADAT also shows promise as an intelligent trainer.
    Keywords: Software testing, Intelligent computer aided training, CLIPS, Ada, Expert systems, Debugging, Programming training, Concurrent programming
    Programmer Managed Using Lean Techniques (p. 227)
      Peter Middleton
    This paper is concerned with the dynamics of how programmers interact with other team members. It examines the management of the tasks of design, coding and maintenance. It also contributes to the area of learning and knowledge transfer. This research describes how lessons learned in lean production, as opposed to mass production, might be applied to software construction.
       Attempts to raise the productivity of information systems development often involve adding more technology, for example CASE tools, 4th Generation Languages and Relational Databases. The evidence from other industries suggests that higher quality and productivity can be obtained with less technology.
       This paper reports initial observations from an empirical pilot study of two small teams of programmers managed using lean or Just-In-Time (JIT) techniques for constructing software. It concludes that JIT approaches do significantly alter the dynamics of the groups: the work is of higher quality and learning happens more quickly. The problem is that these approaches clearly expose people with weak performance, and therefore an organisation needs to be particularly willing and able to assist these members of staff.
    A Scoring System for Software Designs (p. 228)
      Bob Rehder; Nancy Pennington; Adrienne Y. Lee
    A system for scoring software designs produced in experimental settings is proposed and described. The system allows for a complete and multifaceted expression of a software design, making it ideal for comparing designs generated in different languages, paradigms, and methodologies. The system is able to characterize the different strengths (and weaknesses) that each design possesses, and does so in a way that is "paradigm neutral", that is, not unfairly biased towards any one language, paradigm, or methodology. Because the scoring system is thorough, a completeness score may be computed for a design which reflects the completeness of the design in an absolute sense. In addition, the scoring system characterizes each design component as being specified at a certain level of abstraction. Two different notions of level of abstraction, "level of refinement" and "level of decomposition", are compared. The scoring system allows for the representation of design alternatives and optional features, recognizing that software design problems are not sufficiently constrained to identify a unique solution. Techniques for scoring designs and generating dependent measures are described.
    The Recognition of Concurrent Programming Plans by Novice and Expert Programmers: Implications for the Parsimony of the Plan Theory of Programming Expertise (p. 229)
      Vincent Shah; Ray Waddington; Tom Carey; Peter Buhr
    The concept of programming plans has generated much discussion as to whether it adequately explains behavioural differences between novice and expert programmers. Experimental tools, such as PROUST, Bridge and UNIVERSE, have applied programming plans in different roles. However, most of the research in this area has been centered on the sequential programming paradigm. As a result, one can only speculate about the extent to which plan theory applies across different paradigms.
       This study provides some insight into this matter by examining plans in concurrent programming. Rist's (1986) methodology was adapted to confirm the existence of a well-established set of plans that expert concurrent programmers had accumulated from their wealth of experience. The novice subjects were expected to acquire these plans slowly as they gained expertise over time. No evidence could be found to support these expectations, but a significant correlation was observed between concurrent plan recognition and academic performance. The findings from this study raise a number of questions about the extent and completeness of plan theory. The study also provides a starting point for further research on concurrent programming behaviour that is aimed towards designing and developing effective concurrent programming tools and environments.
    Essential Competencies of Software Engineers Derived from Critical Incident Interviews (p. 230)
      Richard T. Turley; James M. Bieman
    We present the results of a two phase study designed to determine the competencies that separate exceptional from non-exceptional performance of professional software engineers. In Phase 1, we use the Critical Incident Interview technique in an in-depth review of 20 professional software engineers employed by a major computer firm. The Critical Incident Interview technique is a rigorous method for determining critical job requirements from structured interviews with workers. We find that one biographical factor, Years at Company in Software, is significantly related to exceptional performance. We also analyze competencies identified by software managers. By combining the data obtained through the interviews and by the managers, we identify 38 essential competencies of software engineers.
       In Phase 2 of the study, we perform a quantitative study to differentially relate these competencies to the performance of the engineers. Phase 2 uses a "Q-Sort" survey instrument on a sample of 129 software engineers, including 41 exceptional and 88 non-exceptional engineers. Five competencies have a significantly higher mean for exceptional engineers (Helps Others; Proactively Attempts to Influence Project Direction by Influencing Management; Exhibits and Articulates Strong Beliefs and Convictions; Mastery of Skills and Techniques; Maintains "Big Picture" View), while four competencies have a significantly higher mean for non-exceptional engineers (Seeks Help From Others; Responds to Schedule Pressure by Sacrificing Parts of Design Process; Driven by Desire to Contribute; Willingness to Confront Others).
       In addition to identifying essential competencies, our results demonstrate the effectiveness of the Critical Incident Interview technique and the Q-Sort instrument for collecting software engineering process data.
    Concurrent Microlanguages: Demonstration of an Experimental Method for the Empirical Study of Concurrent Programming (p. 231)
      Ray Waddington
    Empirical studies of computer programming help our understanding of one of the most complex human cognitive skills. That understanding contributes to the design of software tools. The long-term hopes of the latest Computer Aided Software Engineering tools include the automation of coding. However, even if this hope is ever realized, it is likely to be some time before programmers become obsolete. For the foreseeable future, then, we can usefully apply our understanding of programming to assist human programmers.
       Currently there is a significant gap in the empirical study of programming: no work has been done in the domain of concurrent programming. This poster discusses the design and application of microlanguages as an experimental method in the empirical study of concurrent programming. Two concurrent microlanguages are presented, designed to support a program of empirical research in the domain of concurrent programming. One microlanguage uses semaphores, the other uses rendezvous as the inter-process communication primitive (although the method could be applied to any inter-process communication primitive). The results of one experiment are presented, which evaluates the use of these primitives in a program comprehension task using expert programmers. The result favors the rendezvous construct at a reduced level of significance.