
ACM Transactions on Information Systems 26

Editors: Gary Marchionini
Dates: 2008
Volume: 26
Publisher: ACM
Standard No: ISSN 1046-8188; HF S548.125 A33
Papers: 25
Links: Table of Contents
  1. TOIS 2008 Volume 26 Issue 1
  2. TOIS 2008 Volume 26 Issue 2
  3. TOIS 2008 Volume 26 Issue 3
  4. TOIS 2008 Volume 26 Issue 4

TOIS 2008 Volume 26 Issue 1

Repeatable evaluation of search services in dynamic environments BIBAFull-Text 1
  Eric C. Jensen; Steven M. Beitzel; Abdur Chowdhury; Ophir Frieder
In dynamic environments, such as the World Wide Web, a changing document collection, query population, and set of search services demand frequent repetition of search effectiveness (relevance) evaluations. Reconstructing static test collections, such as in TREC, requires considerable human effort, as large collection sizes demand judgments deep into retrieved pools. In practice it is common to perform shallow evaluations over small numbers of live engines (often pairwise, engine A vs. engine B) without system pooling. Although these evaluations are not intended to construct reusable test collections, their utility depends on their conclusions generalizing to the query population as a whole. We leverage the bootstrap estimate of the reproducibility probability of hypothesis tests to determine the query sample sizes required to ensure this generalization, finding that they are much larger than those required for static collections. We propose a semiautomatic evaluation framework to reduce this effort. We validate this framework against a manual evaluation of the top ten results of ten Web search engines across 896 queries in navigational and informational tasks. Augmenting manual judgments with pseudo-relevance judgments mined from Web taxonomies reduces both the chances of missing a correct pairwise conclusion and those of finding an errant conclusion by approximately 50%.
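   A minimal sketch of the bootstrap step described above, in Python. The normal-approximation significance test and all names are our illustrative assumptions, not the authors' code; per-query effectiveness scores for the two engines are taken as given.

      import numpy as np

      def reproducibility_probability(scores_a, scores_b, n_boot=10000, rng=None):
          """Estimate the probability that a paired comparison of engines A and B
          would reach a significant conclusion again on a fresh query sample,
          by bootstrap resampling of the per-query score differences."""
          rng = rng or np.random.default_rng(0)
          diffs = np.asarray(scores_a) - np.asarray(scores_b)
          n = len(diffs)
          hits = 0
          for _ in range(n_boot):
              sample = rng.choice(diffs, size=n, replace=True)
              # one-sample t statistic on the resampled differences
              t = sample.mean() / (sample.std(ddof=1) / np.sqrt(n) + 1e-12)
              if abs(t) > 1.96:  # two-sided test, normal approximation
                  hits += 1
          return hits / n_boot

   Under this reading, a query sample is large enough once the estimate exceeds a chosen target (say 0.95); the abstract's finding is that dynamic settings need far more queries than static collections to get there.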
Frequency-based identification of correct translation equivalents (FITE) obtained through transformation rules BIBAFull-Text 2
  Ari Pirkola; Jarmo Toivonen; Heikki Keskustalo; Kalervo Järvelin
We devised a novel statistical technique for the identification of the translation equivalents of source words obtained by transformation-rule-based translation (TRT). The effectiveness of the technique, called frequency-based identification of translation equivalents (FITE), was tested using biological and medical cross-lingual spelling variants and out-of-vocabulary (OOV) words in Spanish-English and Finnish-English TRT. The results showed that, depending on the source language and frequency corpus, FITE-TRT (the identification of translation equivalents from TRT's translation set by means of the FITE technique) may achieve high translation recall. With the Web as the frequency corpus, translation recall was 89.2%-91.0% for Spanish-English FITE-TRT. For both language pairs FITE-TRT achieved high translation precision: 95.0%-98.8%. The technique also reliably identified native source-language words: source words that cannot be correctly translated by TRT. Dictionary-based CLIR augmented with FITE-TRT performed substantially better than basic dictionary-based CLIR where OOV keys were kept intact. FITE-TRT with Web document frequencies was the best technique among several fuzzy translation/matching approaches tested in cross-language retrieval experiments. We also discuss the application of FITE-TRT in the automatic construction of multilingual dictionaries.
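   A minimal sketch of the FITE selection step, in Python. The function name, the frequency threshold, and the example words are our illustrative assumptions; the transformation-rule translation itself is taken as given.

      def fite_select(candidates, corpus_freq, min_freq=1):
          """Rank TRT candidate translations by target-language corpus (or Web)
          frequency and keep only those seen at least min_freq times.  An empty
          result suggests a 'native' source word that TRT cannot translate."""
          scored = [(corpus_freq.get(c, 0), c) for c in candidates]
          scored = [(f, c) for f, c in scored if f >= min_freq]
          return [c for f, c in sorted(scored, reverse=True)]

      # hypothetical example: candidates generated for Spanish 'microscopio'
      print(fite_select(["microscopi", "microscopy", "microscope"],
                        {"microscope": 120000, "microscopy": 45000}))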
A formal model of annotations of digital content BIBAFull-Text 3
  Maristella Agosti; Nicola Ferro
This article is a study of the themes and issues concerning the annotation of digital contents, such as textual documents, images, and multimedia documents in general. These digital contents are automatically managed by different kinds of digital library management systems and more generally by different kinds of information management systems.
   Even though this topic has already been partially studied by other researchers, the previous research work on annotations has left many open issues. These issues concern the lack of clarity about what an annotation is, what its features are, and how it is used. They arise mainly because models and systems for annotations have been developed only for specific purposes. As a result, there is only a fragmentary picture of the annotation and its management, one tied to specific contexts of use and lacking general validity.
   The aim of the article is to provide a unified and integrated picture of the annotation, ranging from defining what an annotation is to providing a formal model. The key ideas of the model are: the distinction between the meaning and the sign of the annotation, which represent the semantics and the materialization of an annotation, respectively; the clear formalization of the temporal dimension involved with annotations; and the introduction of a distributed hypertext between digital contents and annotations. Therefore, the proposed formal model captures both syntactic and semantic aspects of the annotations. Furthermore, it is built on previously existing models and may be seen as an extension of them.
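   To make the model's key distinctions concrete, a small Python data structure (entirely illustrative; the article's formal model is mathematical, and these field names are our assumptions):

      from dataclasses import dataclass
      from datetime import datetime

      @dataclass
      class Annotation:
          """An annotation pairs a meaning (its semantics) with a sign (its
          materialization), carries a timestamp for the temporal dimension,
          and anchors to a digital content or to another annotation, so that
          annotations over annotations form a distributed hypertext."""
          meaning: str    # e.g. 'comment', 'question', 'correction'
          sign: str       # e.g. the annotation's text or ink strokes
          created: datetime
          annotates: str  # handle of the annotated object

      a1 = Annotation("comment", "Check this claim.", datetime(2008, 1, 10), "doc-42")
      a2 = Annotation("reply", "Verified against the source.", datetime(2008, 1, 11), "a1")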
Does a one-size recommendation system fit all? The effectiveness of collaborative filtering based recommendation systems across different domains and search modes BIBAFull-Text 4
  Il Im; Alexander Hars
Collaborative filtering (CF) is a personalization technology that generates recommendations for users based on others' evaluations. CF is used by numerous e-commerce Web sites for providing personalized recommendations. Although much research has focused on refining collaborative filtering algorithms, little is known about the effects of user and domain characteristics on the accuracy of collaborative filtering systems. In this study, the effects of two factors -- product domain and users' search mode -- on the accuracy of CF are investigated. The effects of those factors are tested using data collected from two experiments in two different product domains, and from two large CF datasets, EachMovie and Book-Crossing. The study shows that the search mode of the users strongly influences the accuracy of the recommendations. CF works better when users look for specific information than when they search for general information. The accuracy drops significantly when data from different modes are mixed. The study also shows that CF is more accurate for knowledge domains than for consumer product domains. The results of this study imply that for more accurate recommendations, collaborative filtering systems should be able to identify and handle users' mode of search, even within the same domain and user group.
Error correction vs. query garbling for Arabic OCR document retrieval BIBAFull-Text 5
  Kareem Darwish; Walid Magdy
Due to the existence of large numbers of legacy documents (such as old books and newspapers), improving retrieval effectiveness for OCR'ed documents continues to be an important problem. This article compares the effect of OCR error correction with and without language modeling and the effect of query garbling with weighted structured queries on the retrieval of OCR degraded Arabic documents. The results suggest that moderate error correction does not yield statistically significant improvement in retrieval effectiveness when indexing and searching using n-grams. Also, reversing error correction models to perform query garbling in conjunction with weighted structured queries yields improved retrieval effectiveness. Lastly, using very good error correction that utilizes language modeling yields the best improvement in retrieval effectiveness.
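   A minimal sketch of the query-garbling direction, in Python. The single-substitution channel model and all names are our illustrative assumptions; in practice the weighted variants would feed a weighted structured query (for example, an Indri-style weighted-synonym operator).

      def garble_term(term, channel, max_variants=5):
          """Apply a character-confusion model in the reverse (clean -> OCR)
          direction, producing garbled variants of a clean query term, each
          weighted by its channel probability.  channel maps a clean character
          to a list of (garbled_character, probability) pairs."""
          variants = {term: 1.0}
          for i, ch in enumerate(term):
              for g, p in channel.get(ch, []):
                  v = term[:i] + g + term[i + 1:]
                  variants[v] = max(variants.get(v, 0.0), p)
          return sorted(variants.items(), key=lambda kv: -kv[1])[:max_variants]

      # hypothetical confusion: the OCR misreads 'o' as '0' with probability 0.15
      print(garble_term("book", {"o": [("0", 0.15)]}))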

TOIS 2008 Volume 26 Issue 2

Classification-aware hidden-web text database selection BIBAFull-Text 6
  Panagiotis G. Ipeirotis; Luis Gravano
Many valuable text databases on the web have noncrawlable contents that are "hidden" behind search interfaces. Metasearchers are helpful tools for searching over multiple such "hidden-web" text databases at once through a unified query interface. An important step in the metasearching process is database selection, or determining which databases are the most relevant for a given user query. The state-of-the-art database selection techniques rely on statistical summaries of the database contents, generally including the database vocabulary and associated word frequencies. Unfortunately, hidden-web text databases typically do not export such summaries, so previous research has developed algorithms for constructing approximate content summaries from document samples extracted from the databases via querying. We present a novel "focused-probing" sampling algorithm that detects the topics covered in a database and adaptively extracts documents that are representative of the topic coverage of the database. Our algorithm is the first to construct content summaries that include the frequencies of the words in the database. Unfortunately, Zipf's law practically guarantees that for any relatively large database, content summaries built from moderately sized document samples will fail to cover many low-frequency words; in turn, incomplete content summaries might negatively affect the database selection process, especially for short queries with infrequent words. To enhance the sparse document samples and improve the database selection decisions, we exploit the fact that topically similar databases tend to have similar vocabularies, so samples extracted from databases with a similar topical focus can complement each other. We have developed two database selection algorithms that exploit this observation. The first algorithm proceeds hierarchically and selects the best categories for a query, and then sends the query to the appropriate databases in the chosen categories. The second algorithm uses "shrinkage," a statistical technique for improving parameter estimation in the face of sparse data, to enhance the database content summaries with category-specific words. We describe how to modify existing database selection algorithms to adaptively decide (at runtime) whether shrinkage is beneficial for a query. A thorough evaluation over a variety of databases, including 315 real web databases as well as TREC data, suggests that the proposed sampling methods generate high-quality content summaries and that the database selection algorithms produce significantly more relevant database selection decisions and overall search results than existing algorithms.
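   The shrinkage step lends itself to a short sketch in Python (the fixed mixing weight and the names are our assumptions; the article decides adaptively, at runtime, whether to apply shrinkage for a given query):

      def shrunk_summary(sample_freq, category_freq, lam=0.7):
          """Smooth a content summary estimated from a small document sample
          with the word statistics of the database's topical category, so that
          low-frequency words missing from the sample get nonzero weight."""
          total_s = sum(sample_freq.values()) or 1
          total_c = sum(category_freq.values()) or 1
          words = set(sample_freq) | set(category_freq)
          return {w: lam * sample_freq.get(w, 0) / total_s
                     + (1 - lam) * category_freq.get(w, 0) / total_c
                  for w in words}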
Writeprints: A stylometric approach to identity-level identification and similarity detection in cyberspace BIBAFull-Text 7
  Ahmed Abbasi; Hsinchun Chen
One of the problems often associated with online anonymity is that it hinders social accountability, as substantiated by the high levels of cybercrime. Although identity cues are scarce in cyberspace, individuals often leave behind textual identity traces. In this study we proposed the use of stylometric analysis techniques to help identify individuals based on writing style. We incorporated a rich set of stylistic features, including lexical, syntactic, structural, content-specific, and idiosyncratic attributes. We also developed the Writeprints technique for identification and similarity detection of anonymous identities. Writeprints is a Karhunen-Loève-transform-based technique that uses a sliding window and pattern disruption algorithm with individual-author-level feature sets. The Writeprints technique and extended feature set were evaluated on a testbed encompassing four online datasets spanning different domains: email, instant messaging, feedback comments, and program code. Writeprints outperformed benchmark techniques, including SVM, Ensemble SVM, PCA, and standard Karhunen-Loève transforms, on the identification and similarity detection tasks, with accuracy as high as 94% when differentiating between 100 authors. The extended feature set also significantly outperformed a baseline set of features commonly used in previous research. Furthermore, individual-author-level feature sets generally outperformed use of a single group of attributes.
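   A hedged sketch of the sliding-window Karhunen-Loève idea, in Python with NumPy. The window sizes, the per-block feature vectors, and the use of plain PCA as the KL transform are our illustrative assumptions; the pattern-disruption component of Writeprints is omitted.

      import numpy as np

      def writeprint_trajectory(features, window=20, step=5, k=2):
          """Slide a window over a text's per-block stylometric feature vectors
          and project each window onto the top-k eigenvectors (the Karhunen-
          Loeve basis) of its covariance, yielding a low-dimensional pattern
          that can be compared across candidate identities."""
          X = np.asarray(features, dtype=float)
          trajectory = []
          for s in range(0, len(X) - window + 1, step):
              W = X[s:s + window] - X[s:s + window].mean(axis=0)
              _, vecs = np.linalg.eigh(np.cov(W.T))
              trajectory.append(W @ vecs[:, -k:])   # top-k principal directions
          return trajectory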
Towards a belief-revision-based adaptive and context-sensitive information retrieval system BIBAFull-Text 8
  Raymond Y. K. Lau; Peter D. Bruza; Dawei Song
In an adaptive information retrieval (IR) setting, the information seekers' beliefs about which terms are relevant or nonrelevant will naturally fluctuate. This article investigates how the theory of belief revision can be used to model adaptive IR. More specifically, belief revision logic provides a rich representation scheme to formalize retrieval contexts so as to disambiguate vague user queries. In addition, belief revision theory underpins the development of an effective mechanism to revise user profiles in accordance with information seekers' changing information needs. It is argued that information retrieval contexts can be extracted by means of the information-flow text mining method so as to realize a highly autonomous adaptive IR system. An added benefit of a belief-based IR model is that its retrieval behavior is more predictable and explanatory. Our initial experiments show that the belief-based adaptive IR system is as effective as a classical adaptive IR system. To the best of our knowledge, this is the first successful implementation and evaluation of a logic-based adaptive IR model that can efficiently process large IR collections.
Locality-based pruning methods for web search BIBAFull-Text 9
  Edleno Silva de Moura; Celia Francisca dos Santos; Bruno dos Santos de Araujo; Altigran Soares da Silva; Pavel Calado; Mario A. Nascimento
This article discusses a novel approach developed for static index pruning that takes into account the locality of occurrences of words in the text. We use this new approach to propose and experiment with simple and effective pruning methods that allow fast construction of the pruned index. The methods proposed here are especially useful for pruning in environments where the document database changes continuously, such as large-scale web search engines. Extensive experiments are presented showing that the proposed methods can achieve high compression rates while maintaining the quality of results for the most common query types present in modern search engines, namely, conjunctive and phrase queries. In the experiments, our locality-based pruning approach reduced search engine indices to 30% of their original size with almost no reduction in precision at the top answers. Furthermore, we conclude that even an extremely simple locality-based pruning method can be competitive when compared to complex methods that do not rely on locality information.
DirichletRank: Solving the zero-one gap problem of PageRank BIBAFull-Text 10
  Xuanhui Wang; Tao Tao; Jian-Tao Sun; Azadeh Shakery; Chengxiang Zhai
Link-based ranking algorithms are among the most important techniques to improve web search. In particular, the PageRank algorithm has been successfully used in the Google search engine and has been attracting much attention recently. However, we find that PageRank has a "zero-one gap" problem which, to the best of our knowledge, has not been addressed in any previous work. This problem can potentially be exploited to spam PageRank results and make state-of-the-art link-based antispamming techniques ineffective. The zero-one gap problem arises from the current ad hoc way of computing transition probabilities in the random-surfing model. We therefore propose a novel DirichletRank algorithm which calculates these probabilities using Bayesian estimation with a Dirichlet prior. DirichletRank is a variant of PageRank, but it does not have the zero-one gap problem and can be shown analytically to be substantially more resistant to some forms of link spam than PageRank. Experimental results on TREC data show that DirichletRank can achieve better retrieval accuracy than PageRank due to its more reasonable allocation of transition probabilities. More importantly, experiments on the TREC dataset and another real web dataset from the Webgraph project show that, compared with the original PageRank, DirichletRank is more stable under link perturbation and is significantly more robust against both manually identified web spam and several simulated link-spam attacks. DirichletRank can be computed as efficiently as PageRank, and thus is scalable to large-scale web applications.
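   The contrast at the heart of the abstract can be made concrete with the transition probabilities of the random-surfing model (our notation, a hedged reconstruction rather than the paper's exact symbols): with c(i -> j) in {0, 1} indicating a link, |O(i)| the outdegree of page i, N the number of pages, lambda the damping factor, and mu the Dirichlet prior parameter,

      p_{PR}(j \mid i) = \lambda \, \frac{c(i \to j)}{|O(i)|} + (1 - \lambda) \, \frac{1}{N},
      \qquad
      p_{DR}(j \mid i) = \frac{c(i \to j) + \mu / N}{|O(i)| + \mu}.

   Under PageRank, giving a page its first outlink jumps that link's probability from 1/N to roughly lambda + (1 - lambda)/N; this discontinuity is the zero-one gap that link spammers can exploit. The Dirichlet-smoothed estimate instead grows gradually with the outdegree, which is what removes the gap.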
On ranking techniques for desktop search BIBAFull-Text 11
  Sara Cohen; Carmel Domshlak; Naama Zwerdling
Users tend to store huge amounts of files, of various formats, on their personal computers. As a result, finding a specific, desired file within the file system is a challenging task. This article addresses the desktop search problem by considering various techniques for ranking results of a search query over the file system. First, basic ranking techniques, which are based on various file features (e.g., file name, access date, file size, etc.), are considered and their effectiveness is empirically analyzed. Next, two learning-based ranking schemes are presented, and are shown to be significantly more effective than the basic ranking methods. Finally, a novel ranking technique, based on query selectiveness, is considered for use during the cold-start period of the system. This method is also shown to be empirically effective, even though it does not involve any learning.

TOIS 2008 Volume 26 Issue 3

Sentiment analysis in multiple languages: Feature selection for opinion classification in Web forums BIBAFull-Text 12
  Ahmed Abbasi; Hsinchun Chen; Arab Salem
The Internet is frequently used as a medium for exchange of information and opinions, as well as propaganda dissemination. In this study the use of sentiment analysis methodologies is proposed for classification of Web forum opinions in multiple languages. The utility of stylistic and syntactic features is evaluated for sentiment classification of English and Arabic content. Specific feature extraction components are integrated to account for the linguistic characteristics of Arabic. The entropy weighted genetic algorithm (EWGA) is also developed, which is a hybridized genetic algorithm that incorporates the information-gain heuristic for feature selection. EWGA is designed to improve performance and get a better assessment of key features. The proposed features and techniques are evaluated on a benchmark movie review dataset and U.S. and Middle Eastern Web forum postings. The experimental results using EWGA with SVM indicate high performance levels, with accuracies of over 91% on the benchmark dataset as well as the U.S. and Middle Eastern forums. Stylistic features significantly enhanced performance across all testbeds while EWGA also outperformed other feature selection methods, indicating the utility of these features and techniques for document-level classification of sentiments.
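   A compact sketch of the hybridization the abstract describes, in Python. Truncation selection, the mutation rate, and the way information gain biases initialization and mutation are our illustrative choices, not the authors' exact operators; fitness would typically be cross-validated classifier accuracy on a feature mask.

      import random

      def ewga(features, info_gain, fitness, pop=30, gens=50, seed=0):
          """Entropy-weighted GA for feature selection (sketch): information
          gain seeds the initial population and damps mutation of high-gain
          features; fitness(mask) scores a boolean feature mask."""
          rng = random.Random(seed)
          n = len(features)
          mx = max(info_gain[f] for f in features) or 1.0
          P = [[rng.random() < info_gain[f] / mx for f in features]
               for _ in range(pop)]
          for _ in range(gens):
              P.sort(key=fitness, reverse=True)
              P = P[:pop // 2]                               # truncation selection
              while len(P) < pop:
                  a, b = rng.sample(P[:10], 2)
                  cut = rng.randrange(1, n)
                  child = a[:cut] + b[cut:]                  # one-point crossover
                  i = rng.randrange(n)
                  if rng.random() < 0.2 * (1 - info_gain[features[i]] / mx):
                      child[i] = not child[i]                # gain-damped mutation
                  P.append(child)
          return max(P, key=fitness)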
Interpreting TF-IDF term weights as making relevance decisions BIBAFull-Text 13
  Ho Chung Wu; Robert Wing Pong Luk; Kam Fai Wong; Kui Lam Kwok
A novel probabilistic retrieval model is presented. It forms a basis to interpret the TF-IDF term weights as making relevance decisions. It simulates the local relevance decision-making for every location of a document, and combines all of these "local" relevance decisions as the "document-wide" relevance decision for the document. The significance of interpreting TF-IDF in this way is the potential to: (1) establish a unifying perspective about information retrieval as relevance decision-making; and (2) develop advanced TF-IDF-related term weights for future elaborate retrieval models. Our novel retrieval model is simplified to a basic ranking formula that directly corresponds to the TF-IDF term weights. In general, we show that the term-frequency factor of the ranking formula can be rendered into different term-frequency factors of existing retrieval systems. In the basic ranking formula, the remaining quantity −log p(r̄ | t ∈ d) is interpreted in terms of the probability of randomly picking a nonrelevant usage (denoted by r̄) of term t. Mathematically, we show that this quantity can be approximated by the inverse document frequency (IDF). Empirically, we show that this quantity is related to IDF, using four reference TREC ad hoc retrieval data collections.
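   The approximation named in the abstract can be written out (our notation; a hedged reconstruction of the basic ranking formula): score a document d against query q by

      w(t, d) = \mathrm{TF}(t, d) \cdot \bigl( -\log p(\bar{r} \mid t \in d) \bigr),
      \qquad
      \mathrm{score}(d, q) = \sum_{t \in q} w(t, d).

   Estimating the probability of a nonrelevant usage of t from its document frequency, p(\bar{r} \mid t \in d) \approx df(t)/N, gives -\log p(\bar{r} \mid t \in d) \approx \log(N / df(t)), which is exactly the classical IDF factor.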
A basis for information retrieval in context BIBAFull-Text 14
  Massimo Melucci
Information retrieval (IR) models based on vector spaces have been investigated for a long time. Nevertheless, they have recently attracted much research interest. In parallel, context has been rediscovered as a crucial issue in information retrieval. This article presents a principled approach to modeling context and its role in ranking information objects using vector spaces. First, the article outlines how a basis of a vector space naturally represents context, both its properties and factors. Second, a ranking function computes the probability of context in the objects represented in a vector space, namely, the probability that a contextual factor has affected the preparation of an object.
Incremental cluster-based retrieval using compressed cluster-skipping inverted files BIBAFull-Text 15
  Ismail Sengor Altingovde; Engin Demir; Fazli Can; Özgür Ulusoy
We propose a unique cluster-based retrieval (CBR) strategy using a new cluster-skipping inverted file for improving query processing efficiency. The new inverted file incorporates cluster membership and centroid information along with the usual document information into a single structure. In our incremental-CBR strategy, during query evaluation, both best(-matching) clusters and the best(-matching) documents of such clusters are computed together with a single posting-list access per query term. As we switch from term to term, the best clusters are recomputed and can dynamically change. During query-document matching, only relevant portions of the posting lists corresponding to the best clusters are considered and the rest are skipped. The proposed approach is essentially tailored for environments where inverted files are compressed, and provides substantial efficiency improvement while yielding comparable, or sometimes better, effectiveness figures. Our experiments with various collections show that the incremental-CBR strategy using a compressed cluster-skipping inverted file significantly improves CPU time efficiency, regardless of query length. The new compressed inverted file imposes an acceptable storage overhead in comparison to a typical inverted file. We also show that our approach scales well with the collection size.
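   A toy sketch of the skipping step during query evaluation, in Python. The in-memory layout stands in for the compressed on-disk posting list, and the names are our assumptions.

      def score_with_skips(posting, best_clusters, acc):
          """Process one term's cluster-skipping posting list: each group is
          (cluster_id, [(doc_id, tf), ...]) with a skip pointer to the next
          group, so groups for non-best clusters are passed over without
          decompressing their document entries."""
          for cluster_id, docs in posting:
              if cluster_id not in best_clusters:
                  continue                    # follow the skip pointer
              for doc_id, tf in docs:
                  acc[doc_id] = acc.get(doc_id, 0.0) + tf

      # hypothetical posting list for one query term, grouped by cluster
      posting = [(0, [(1, 2), (4, 1)]), (1, [(7, 3)]), (2, [(9, 1)])]
      acc = {}
      score_with_skips(posting, best_clusters={0, 2}, acc=acc)
      print(acc)   # documents in cluster 1 were skipped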
Unified relevance models for rating prediction in collaborative filtering BIBAFull-Text 16
  Jun Wang; Arjen P. de Vries; Marcel J. T. Reinders
Collaborative filtering aims at predicting a user's interest for a given item based on a collection of user profiles. This article views collaborative filtering as a problem highly related to information retrieval, drawing an analogy between the concepts of users and items in recommender systems and queries and documents in text retrieval.
   We present a probabilistic user-to-item relevance framework that introduces the concept of relevance into the related problem of collaborative filtering. Three different models are derived, namely, a user-based, an item-based, and a unified relevance model, and we estimate their rating predictions from three sources: the user's own ratings for different items, other users' ratings for the same item, and ratings from different but similar users for other but similar items.
   To reduce the data sparsity encountered when estimating the probability density function of the relevance variable, we apply the nonparametric (data-driven) density estimation technique known as the Parzen-window method (or kernel-based density estimation). Using a Gaussian window function, the similarity between users and/or items would, however, be based on Euclidean distance. Because the collaborative filtering literature has reported improved prediction accuracy when using cosine similarity, we generalize the Parzen-window method by introducing a projection kernel.
   Existing user-based and item-based approaches correspond to two simplified instantiations of our framework. User-based and item-based collaborative filtering each represent only a partial view of the prediction problem; the unified relevance model brings these partial views together under the same umbrella. Experimental results complement the theoretical insights with improved recommendation accuracy. The unified model is more robust to data sparsity because the different types of ratings are used in concert.
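   A hedged sketch of the kernel-weighted prediction, in Python with NumPy. The Gaussian window and the normalize-then-Gaussian rendering of the projection (cosine) kernel are our illustrative simplifications of the article's estimator.

      import numpy as np

      def parzen_predict(target, neighbors, ratings, h=1.0, cosine=True):
          """Predict a rating as a Parzen-window (kernel) weighted average of
          neighbor ratings: a Gaussian window on Euclidean distance is the
          textbook choice; normalizing profiles first makes the window depend
          on direction, mimicking a cosine-style projection kernel."""
          t = np.asarray(target, float)
          weights = []
          for v in np.asarray(neighbors, float):
              a, b = t, v
              if cosine:
                  a = a / (np.linalg.norm(a) + 1e-12)
                  b = b / (np.linalg.norm(b) + 1e-12)
              weights.append(np.exp(-np.sum((a - b) ** 2) / (2 * h * h)))
          weights = np.asarray(weights)
          return float(weights @ np.asarray(ratings, float) / (weights.sum() + 1e-12))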
Assessing multivariate Bernoulli models for information retrieval BIBAFull-Text 17
  David E. Losada; Leif Azzopardi
Although the seminal proposal to introduce language modeling in information retrieval was based on a multivariate Bernoulli model, the predominant modeling approach is now centered on multinomial models. Language modeling for retrieval based on multivariate Bernoulli distributions is seen as inefficient and believed to be less effective than the multinomial model. In this article, we examine the multivariate Bernoulli model with respect to its successor and examine its role in future retrieval systems. In the context of Bayesian learning, these two modeling approaches are described, contrasted, and compared both theoretically and computationally. We show that the query likelihood following a multivariate Bernoulli distribution introduces interesting retrieval features which may be useful for specific retrieval tasks such as sentence retrieval. Then, we address the efficiency aspect and show that algorithms can be designed to perform retrieval efficiently for multivariate Bernoulli models, before performing an empirical comparison to study the behavioral aspects of the models. A series of comparisons is then conducted on a number of test collections and retrieval tasks to determine the empirical and practical differences between the models. Our results indicate that for sentence retrieval the multivariate Bernoulli model can significantly outperform the multinomial model. However, for the other tasks the multinomial model provides consistently better performance (and in most cases significantly so). An analysis of the various retrieval characteristics reveals that the multivariate Bernoulli model tends to promote long documents whose nonquery terms are informative. While this is detrimental to the task of document retrieval (documents tend to contain considerable nonquery content), it is valuable for other tasks such as sentence retrieval, where the retrieved elements are very short and focused.
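   The two query likelihoods being compared can be written side by side (our notation, a standard textbook rendering rather than the article's exact formulation): with document model \theta_d, vocabulary V, and c(t, q) the count of term t in query q,

      p_{\mathrm{multinomial}}(q \mid d) \propto \prod_{t \in q} p(t \mid \theta_d)^{c(t,q)},
      \qquad
      p_{\mathrm{Bernoulli}}(q \mid d) = \prod_{t \in q} p(t \mid \theta_d) \prod_{t \in V \setminus q} \bigl(1 - p(t \mid \theta_d)\bigr).

   The second product, over every vocabulary term absent from the query, is what makes the multivariate Bernoulli model sensitive to a document's nonquery terms (and what makes naive evaluation expensive), consistent with the behavior the abstract reports for document versus sentence retrieval.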

TOIS 2008 Volume 26 Issue 4

Introduction to keeping, refinding and sharing personal information BIBFull-Text 18
  Deborah Barreau; Robert Capra; Susan Dumais; William Jones; Manuel Pérez-Quiñones
How people recall, recognize, and reuse search results BIBAFull-Text 19
  Jaime Teevan
When a person issues a query, that person has expectations about the search results that will be returned. These expectations can be based on the current information need, but are also influenced by how the searcher believes the search engine works, where relevant results are expected to be ranked, and any previous searches the individual has run on the topic. This paper looks in depth at how the expectations people develop about search result lists during an initial query affect their perceptions of and interactions with future repeat search result lists. Three studies are presented that give insight into how people recall, recognize, and reuse results. The first study (a study of recall) explores what people recall about previously viewed search result lists. The second study (a study of recognition) builds on the first to reveal that people often recognize a result list as one they have seen before even when it is quite different. As long as those aspects that the searcher remembers about the initial list remain the same, other aspects can change significantly. This is advantageous because, as the third study (a study of reuse) shows, when a result list appears to have changed, people have trouble reusing the previously viewed content in the list. They are less likely to find what they are looking for, less happy with the result quality, more likely to find the task hard, and more likely to take a long time searching. Although apparent consistency is important for reuse, people's inability to recognize change makes consistency without stagnation possible: new relevant results can be presented where old results have been forgotten, making both old and new content easy to find.
Improved search engines and navigation preference in personal information management BIBAFull-Text 20
  Ofer Bergman; Ruth Beyth-Marom; Rafi Nachmias; Noa Gradovitch; Steve Whittaker
Traditionally, users access their personal files mainly by folder navigation. We evaluate whether recent improvements in desktop search have changed this fundamental aspect of Personal Information Management (PIM). We tested this in two studies using the same questionnaire: (a) the Windows Study, a longitudinal comparison of Google Desktop and Windows XP Search Companion, and (b) the Mac Study, a large-scale comparison of Mac Spotlight and Sherlock. There were few effects of improved search. First, regardless of search engine, there was a strong navigation preference: on average, users estimated that they used navigation for 56-68% of file retrieval events but used search for only 4-15% of events. Second, the effect of improving the quality of the search engine on search usage was limited and inconsistent. Third, search was used mainly as a last resort, when users could not remember a file's location. Finally, there was no evidence that using improved desktop search engines leads people to change their filing habits and become less reliant on hierarchical file organization. We conclude by offering theoretical explanations for the navigation preference, relating to differences between PIM and Internet retrieval, and suggest alternative design directions for PIM systems.
Exploring memory in email refinding BIBAFull-Text 21
  David Elsweiler; Mark Baillie; Ian Ruthven
Human memory plays an important role in personal information management (PIM). Several scholars have noted that people refind information based on what they remember, and it has been shown that people adapt their management strategies to compensate for the limitations of memory. Nevertheless, little is known about what people tend to remember about their personal information and how they use their memories to refind it. The aim of this article is to increase our understanding of the role that memory plays in the process of refinding personal information. Concentrating on email refinding, we report on a user study that investigates what attributes of email messages participants remember when trying to refind. We look at how the attributes change in different scenarios and examine the factors that affect what is remembered.
Meta methods for model sharing in personal information systems BIBAFull-Text 22
  Stefan Siersdorfer; Sergej Sizov
This article introduces a methodology for automatically organizing document collections into thematic categories for Personal Information Management (PIM) through collaborative sharing of machine learning models in an efficient and privacy-preserving way. Our objective is to combine multiple independently learned models from several users to construct an advanced ensemble-based decision model, taking the knowledge of multiple users into account in a decentralized manner, for example, in a peer-to-peer overlay network. High accuracy of the corresponding supervised (classification) and unsupervised (clustering) methods is achieved by restrictively leaving out uncertain documents rather than assigning them to inappropriate topics or clusters with low confidence. We introduce a formal probabilistic model for the resulting ensemble-based meta methods and explain how it can be used for constructing estimators and for goal-oriented tuning. Comprehensive evaluation results on different reference data sets illustrate the viability of our approach.
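   A minimal sketch of the restrictive ensemble decision, in Python. Majority voting with a confidence threshold is our illustrative stand-in for the article's probabilistic meta model.

      def ensemble_classify(doc, models, threshold=0.8):
          """Let each user's shared model vote on a document's topic and assign
          a label only when the winning vote share reaches the threshold;
          otherwise abstain (leave the document out) rather than risk a
          low-confidence assignment."""
          votes = {}
          for m in models:                   # each m maps a document to a label
              label = m(doc)
              votes[label] = votes.get(label, 0) + 1
          label, count = max(votes.items(), key=lambda kv: kv[1])
          return label if count / len(models) >= threshold else None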
Organizing and managing personal electronic files: A mechanical engineer's perspective BIBAFull-Text 23
  B. J. Hicks; A. Dong; R. Palmer; H. C. McAlpine
This article deals with the organization and management of the computer files handled by mechanical engineers on their personal computers. In engineering organizations, a wide variety of electronic files (documents) are necessary to support both business processes and the activities of design and manufacture. Whilst a large number of files, and hence much information, is formally archived, a significant amount of additional information and knowledge resides in electronic files on personal computers. The widespread use of these personal information stores means that all information is retained; however, its reuse is problematic for all but the individual, as a result of the naming and organization of the files. To begin to address this issue, a study of the use and current practices for managing personal electronic files is described. The study considers the fundamental classes of files handled by engineers and analyses the organization of these files across the personal computers of 40 participants. The study involves a questionnaire and an electronic audit. The results of these qualitative and quantitative elements are used to elicit an understanding of the practices and requirements of engineers for managing personal electronic files. A potential scheme for naming and organizing personal electronic files is discussed as one possible way to satisfy these requirements. The aim of the scheme is to balance the personal nature of data storage with the need for personal records to be shared with others to support knowledge reuse in engineering organizations. Although this article is concerned with mechanical engineers, the issues dealt with are relevant to knowledge-based industries and, in particular, teams of knowledge workers.
Information scraps: How and why information eludes our personal information management tools BIBAFull-Text 24
  Michael Bernstein; Max Van Kleek; David Karger; M. C. Schraefel
In this article we investigate information scraps -- personal information whose content has been scribbled on Post-it notes, scrawled on the corners of sheets of paper, stuck in our pockets, sent in email messages to ourselves, and stashed in miscellaneous digital text files. Information scraps encode information ranging from ideas and sketches to notes, reminders, shipment tracking numbers, driving directions, and even poetry. Although information scraps are ubiquitous, we still have much to learn about these loose forms of information practice. Why do we keep information scraps outside of our traditional PIM applications? What role do information scraps play in our overall information practice? How might PIM applications be better designed to accommodate and support information scraps' creation, manipulation, and retrieval?
   We pursued these questions by studying the information scrap practices of 27 knowledge workers at five organizations. Our observations shed light on information scraps' content, form, media, and location. From this data, we elaborate on the typical information scrap lifecycle, and identify common roles that information scraps play: temporary storage, archiving, work-in-progress, reminding, and management of unusual data. These roles suggest a set of unmet design needs in current PIM tools: lightweight entry, unconstrained content, flexible use and adaptability, visibility, and mobility.
Editorial: Reviewer merits and review control in an age of electronic manuscript management systems BIBAFull-Text 25
  Gary Marchionini
Peer review is an important resource of scholarly communities and must be managed and nurtured carefully. Electronic manuscript management systems have begun to improve some aspects of workflow for conferences and journals but also raise issues related to reviewer roles and reputations and the control of reviews over time. Professional societies should make their policies related to reviews and reviewer histories clear to authors and reviewers, develop strategies and tools to facilitate good and timely reviews, and facilitate the training of new reviewers.