
ACM Transactions on Information Systems 27

Editors: Jamie Callan
Standard No: ISSN 1046-8188; HF S548.125 A33
Links: Table of Contents
  1. TOIS 2009 Volume 27 Issue 1
  2. TOIS 2009 Volume 27 Issue 2
  3. TOIS 2009 Volume 27 Issue 3
  4. TOIS 2009 Volume 27 Issue 4

TOIS 2009 Volume 27 Issue 1

Sound and complete relevance assessment for XML retrieval BIBAKFull-Text 1
  Benjamin Piwowarski; Andrew Trotman; Mounia Lalmas
In information retrieval research, comparing retrieval approaches requires test collections consisting of documents, user requests and relevance assessments. Obtaining relevance assessments that are as sound and complete as possible is crucial for the comparison of retrieval approaches. In XML retrieval, the problem of obtaining sound and complete relevance assessments is further complicated by the structural relationships between retrieval results.
   A major difference between XML retrieval and flat document retrieval is that the relevance of elements (the retrievable units) is not independent of that of related elements. This has major consequences for the gathering of relevance assessments. This article describes investigations into the creation of sound and complete relevance assessments for the evaluation of content-oriented XML retrieval as carried out at INEX, the evaluation campaign for XML retrieval. The campaign, now in its seventh year, has had three substantially different approaches to gather assessments and has finally settled on a highlighting method for marking relevant passages within documents -- even though the objective is to collect assessments at element level. The different methods of gathering assessments at INEX are discussed and contrasted. The highlighting method is shown to be the most reliable of the methods.
Keywords: INEX, XML, XML retrieval, evaluation, passage retrieval, relevance assessment
Rank-biased precision for measurement of retrieval effectiveness BIBAKFull-Text 2
  Alistair Moffat; Justin Zobel
A range of methods for measuring the effectiveness of information retrieval systems has been proposed. These are typically intended to provide a quantitative single-value summary of a document ranking relative to a query. However, many of these measures have failings. For example, recall is not well founded as a measure of satisfaction, since the user of an actual system cannot judge recall. Average precision is derived from recall, and suffers from the same problem. In addition, average precision lacks key stability properties that are needed for robust experiments. In this article, we introduce a new effectiveness metric, rank-biased precision, that avoids these problems. Rank-biased precision is derived from a simple model of user behavior, is robust if answer rankings are extended to greater depths, and allows accurate quantification of experimental uncertainty, even when only partial relevance judgments are available.
Keywords: Recall, average precision, pooling, precision, relevance
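For readers unfamiliar with the metric, the following is a minimal Python sketch of rank-biased precision as it is commonly stated: a user moves from one rank to the next with persistence probability p, so the document at rank i (counting from 1) carries weight (1 - p) * p^(i - 1). The function name, the use of None to mark unjudged documents, and the default p = 0.8 are illustrative assumptions, not details taken from the abstract.

```python
def rbp_with_residual(judgments, p=0.8):
    """Return (base, residual) for a ranking.

    judgments: relevance values in rank order; 1 or 0 for judged
               documents, None for unjudged ones (an assumed convention).
    p:         persistence, the assumed probability that the user
               continues from one rank to the next.
    """
    base = 0.0      # score contributed by judged documents
    unjudged = 0.0  # weight held by unjudged documents inside the ranking
    for i, rel in enumerate(judgments):
        weight = (1 - p) * p ** i
        if rel is None:
            unjudged += weight
        else:
            base += weight * rel
    # All ranks beyond the evaluated depth together carry weight p ** depth.
    tail = p ** len(judgments)
    return base, unjudged + tail


base, residual = rbp_with_residual([1, 0, None, 1], p=0.5)
print(base, base + residual)  # lower and upper bound on the true score
```

The residual is the total weight of documents whose relevance is unknown, so the true score lies between base and base + residual. This is one way to read the abstract's claim that uncertainty can be quantified under partial judgments.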
Trusting spam reporters: A reporter-based reputation system for email filtering BIBAKFull-Text 3
  Elena Zheleva; Aleksander Kolcz; Lise Getoor
Spam is a growing problem; it interferes with valid email and burdens both email users and service providers. In this work, we propose a reactive spam-filtering system based on reporter reputation for use in conjunction with existing spam-filtering techniques. The system has a trust-maintenance component for users, based on their spam-reporting behavior. The challenge that we consider is that of maintaining a reliable system, not vulnerable to malicious users, that will provide early spam-campaign detection to reduce the costs incurred by users and systems. We report on the utility of a reputation system for spam filtering that makes use of the feedback of trustworthy users. We evaluate our proposed framework, using actual complaint feedback from a large population of users, and validate its spam-filtering performance on a collection of real email traffic over several weeks. To test the broader implication of the system, we create a model of the behavior of malicious reporters, and we simulate the system under various assumptions using a synthetic dataset.
Keywords: Spam filtering, reputation systems, trust
Extended probabilistic HAL with close temporal association for psychiatric query document retrieval BIBAKFull-Text 4
  Jui-Feng Yeh; Chung-Hsien Wu; Liang-Chih Yu; Yu-Sheng Lai
Psychiatric query document retrieval can assist individuals to locate query documents relevant to their depression-related problems efficiently and effectively. By referring to relevant documents, individuals can understand how to alleviate their depression-related symptoms according to recommendations from health professionals. This work presents an extended probabilistic Hyperspace Analog to Language (epHAL) model to achieve this aim. The epHAL incorporates the close temporal associations between words in query documents to represent word cooccurrence relationships in a high-dimensional context space. The information flow mechanism further combines the query words in the epHAL space to infer related words for effective information retrieval. The language model perplexity is considered as the criterion for model optimization. Finally, the epHAL is adopted for psychiatric query document retrieval, and indicates its superiority in information retrieval over traditional approaches.
Keywords: Hyperspace Analog to Language (HAL) model, Information retrieval, information flow, query documents
combinFormation: Mixed-initiative composition of image and text surrogates promotes information discovery BIBAKFull-Text 5
  Andruid Kerne; Eunyee Koh; Steven M. Smith; Andrew Webb; Blake Dworaczyk
combinFormation is a mixed-initiative creativity support tool for searching, browsing, organizing, and integrating information. Images and text are connected to represent surrogates (enhanced bookmarks), optimizing the use of human cognitive facilities. Composition, an alternative to lists and spatial hypertext, is used to represent a collection of surrogates as a connected whole, using principles from art and design. This facilitates the creative process of information discovery, in which humans develop new ideas while finding and collecting information. To provoke the user to think about the large space of potentially relevant information resources, a generative agent proactively engages in collecting information resources, forming image and text surrogates, and composing them visually. The agent develops the collection and its visual representation over time, enabling the user to see ideas and relationships. To keep the human in control, we develop interactive mechanisms for authoring the composition and directing the agent. In a field study in an interdisciplinary course on The Design Process, over a hundred students alternated using combinFormation and Google+Word to collect prior work on information discovery invention assignments. The students that used combinFormation's mixed-initiative composition of image and text surrogates performed better.
Keywords: Creativity support tools, clustering, collections, creative cognition, exploratory search, field study, focused crawler, information discovery, mixed-initiative systems, relevance feedback, semantics, software agents
Toward automatic facet analysis and need negotiation: Lessons from mediated search BIBAKFull-Text 6
  Jimmy Lin; Philip Wu; Eileen Abels
This work explores the hypothesis that interactions between a trained human search intermediary and an information seeker can inform the design of interactive IR systems. We discuss results from a controlled Wizard-of-Oz case study, set in the context of the TREC 2005 HARD track evaluation, in which a trained intermediary executed an integrated search and interaction strategy based on conceptual facet analysis and informed by need negotiation techniques common in reference interviews. Having a human "in the loop" yielded large improvements over fully automated systems as measured by standard ranked-retrieval metrics, demonstrating the value of mediated search. We present a detailed analysis of the intermediary's actions to gain a deeper understanding of what worked and why. One contribution is a taxonomy of clarification types informed both by empirical results and existing theories in library and information science. We discuss how these findings can guide the development of future systems. Overall, this work illustrates how studying human information-seeking processes can lead to better information retrieval applications.
Keywords: Reference interview, interactive information retrieval

TOIS 2009 Volume 27 Issue 2

Automatic metadata generation using associative networks BIBAKFull-Text 7
  Marko A. Rodriguez; Johan Bollen; Herbert Van De Sompel
In spite of its tremendous value, metadata is generally sparse and incomplete, thereby hampering the effectiveness of digital information services. Many of the existing mechanisms for the automated creation of metadata rely primarily on content analysis which can be costly and inefficient. The automatic metadata generation system proposed in this article leverages resource relationships generated from existing metadata as a medium for propagation from metadata-rich to metadata-poor resources. Because of its independence from content analysis, it can be applied to a wide variety of resource media types and is shown to be computationally inexpensive. The proposed method operates through two distinct phases. Occurrence and cooccurrence algorithms first generate an associative network of repository resources leveraging existing repository metadata. Second, using the associative network as a substrate, metadata associated with metadata-rich resources is propagated to metadata-poor resources by means of a discrete-form spreading activation algorithm. This article discusses the general framework for building associative networks, an algorithm for disseminating metadata through such networks, and the results of an experiment and validation of the proposed method using a standard bibliographic dataset.
Keywords: Associative networks, metadata generation, particle-swarms
An analysis of latent semantic term self-correlation BIBAKFull-Text 8
  Laurence A. F. Park; Kotagiri Ramamohanarao
Latent semantic analysis (LSA) is a generalized vector space method that uses dimension reduction to generate term correlations for use during the information retrieval process. We hypothesized that even though the dimension reduction establishes correlations between terms, the dimension reduction is causing a degradation in the correlation of a term to itself (self-correlation). In this article, we have proven that there is a direct relationship between the size of the LSA dimension reduction and the LSA self-correlation. We have also shown that by altering the LSA term self-correlations we gain a substantial increase in precision, while also reducing the computation required during the information retrieval process.
Keywords: Latent semantic analysis, term correlation
An adaptive threshold framework for event detection using HMM-based life profiles BIBAKFull-Text 9
  Chien Chin Chen; Meng Chang Chen; Ming-Syan Chen
When an event occurs, it attracts the attention of information sources, which publish related documents throughout its lifespan. The task of event detection is to automatically identify events and their related documents from a document stream, which is a set of chronologically ordered documents collected from various information sources. Generally, each event has a distinct activeness development so that its status changes continuously during its lifespan. When an event is active, there are a lot of related documents from various information sources. In contrast, when it is inactive, there are very few documents, but they are focused. Previous work on event detection did not consider the characteristics of the event's activeness, and used rigid thresholds for event detection. We propose a concept called life profile, modeled by a hidden Markov model, to model the activeness trends of events. In addition, a general event detection framework, LIPED, which utilizes the learned life profiles and the burst-and-diverse characteristic to adjust the event detection thresholds adaptively, can be incorporated into existing event detection methods. Based on the official TDT corpus and contest rules, the evaluation results show that existing detection methods that incorporate LIPED achieve better performance in the cost and F1 metrics than those that do not.
Keywords: Event detection, TDT, clustering, hidden Markov models, life profiles, topic detection
Information filtering and query indexing for an information retrieval model BIBAKFull-Text 10
  Christos Tryfonopoulos; Manolis Koubarakis; Yannis Drougas
In the information filtering paradigm, clients subscribe to a server with continuous queries or profiles that express their information needs. Clients can also publish documents to servers. Whenever a document is published, the continuous queries satisfying this document are found and notifications are sent to appropriate clients. This article deals with the filtering problem that needs to be solved efficiently by each server: Given a database of continuous queries db and a document d, find all queries q ∈ db that match d. We present data structures and indexing algorithms that enable us to solve the filtering problem efficiently for large databases of queries expressed in the model AWP. AWP is based on named attributes with values of type text, and its query language includes Boolean and word proximity operators.
Keywords: Information filtering, performance evaluation, query indexing algorithms, selective dissemination of information
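The filtering problem stated in the abstract above (given a database of continuous queries db and a document d, find all q in db that match d) can be illustrated with a small Python sketch. It is a deliberate simplification: queries here are plain conjunctions of words, whereas the AWP model also has named attributes and word proximity operators. The class QueryIndex and its methods are hypothetical names, and the counting scheme is a generic query-indexing idea, not the article's algorithms.

```python
from collections import defaultdict


class QueryIndex:
    """Index of conjunctive word queries (a simplification of AWP)."""

    def __init__(self):
        self._postings = defaultdict(set)  # word -> ids of queries containing it
        self._sizes = {}                   # query id -> number of distinct words

    def add(self, query_id, words):
        terms = {w.lower() for w in words}
        if not terms:
            raise ValueError("a query needs at least one word")
        self._sizes[query_id] = len(terms)
        for term in terms:
            self._postings[term].add(query_id)

    def match(self, document_text):
        """Return ids of queries whose words all occur in the document."""
        hits = defaultdict(int)
        for term in set(document_text.lower().split()):
            for query_id in self._postings.get(term, ()):
                hits[query_id] += 1
        # A query matches when every one of its words was seen.
        return sorted(q for q, n in hits.items() if n == self._sizes[q])


index = QueryIndex()
index.add("q1", ["xml", "retrieval"])
index.add("q2", ["spam"])
index.add("q3", ["xml", "spam"])
print(index.match("XML retrieval at INEX"))
```

Indexing the queries rather than the documents means each published document touches only the queries that share at least one word with it, instead of scanning the whole query database.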
User language model for collaborative personalized search BIBAKFull-Text 11
  Gui-Rong Xue; Jie Han; Yong Yu; Qiang Yang
Traditional personalized search approaches rely solely on individual profiles to construct a user model. They are often confronted by two major problems: data sparseness and cold-start for new individuals. Data sparseness refers to the fact that most users only visit a small portion of Web pages and hence a very sparse user-term relationship matrix is generated, while cold-start for new individuals means that the system cannot conduct any personalization without previous browsing history. Recently, community-based approaches were proposed to use the group's social behaviors as a supplement to personalization. However, these approaches only consider the commonality of a group of users and still cannot satisfy the diverse information needs of different users. In this article, we present a new approach, called collaborative personalized search. It considers not only the commonality factor among users for defining group user profiles and global user profiles, but also the specialties of individuals. Then, a statistical user language model is proposed to integrate the individual model, group user model and global user model together. In this way, the probability that a user will like a Web page is calculated through a two-step smoothing mechanism. First, a global user model is used to smooth the probability of unseen terms in the individual profiles and provide aggregated behavior of global users. Then, in order to precisely describe individual interests by looking at the behaviors of similar users, users are clustered into groups and group-user models are constructed. The group-user models are integrated into an overall model through a cluster-based language model. The behaviors of the group users can be utilized to enhance the performance of personalized search. This model can alleviate the two aforementioned problems and provide a more effective personalized search than previous approaches. 
   Large-scale experimental evaluations are conducted to show that the proposed approach substantially improves the relevance of a search over several competitive methods.
Keywords: Collaborative personalized search, clustering, cold-start, data sparseness, smoothing, user language model
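The two-step smoothing described in the abstract above can be sketched in Python as follows. The abstract does not give the formula, so this sketch assumes simple linear interpolation at both steps. The function names, the term-count dictionaries, and the weights mu and lam are illustrative assumptions rather than the article's model.

```python
def mle(term, counts):
    """Maximum-likelihood estimate of a term's probability from raw counts."""
    total = sum(counts.values())
    return counts.get(term, 0) / total if total else 0.0


def smoothed_term_prob(term, individual, group, global_, mu=0.5, lam=0.5):
    """Two-step smoothing of an individual's term probability.

    Step 1 (assumed form): mix the sparse individual profile with the
    global user model so that unseen terms get non-zero probability.
    Step 2 (assumed form): mix the result with the model of the group
    (cluster) of similar users.
    """
    step1 = (1 - mu) * mle(term, individual) + mu * mle(term, global_)
    return (1 - lam) * step1 + lam * mle(term, group)


individual = {"xml": 1, "ir": 1}            # sparse profile of one user
group = {"xml": 1, "spam": 3}               # profile of similar users
global_ = {"xml": 2, "spam": 2, "ir": 4}    # profile of all users

# "spam" never occurs in the individual profile but still gets probability mass.
print(smoothed_term_prob("spam", individual, group, global_))
```

A user with no history at all (the cold-start case) would have an empty individual profile, in which case the estimate falls back entirely on the group and global models.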
Textual analysis of stock market prediction using breaking financial news: The AZFin text system BIBAKFull-Text 12
  Robert P. Schumaker; Hsinchun Chen
Our research examines a predictive machine learning approach for financial news articles analysis using several different textual representations: bag of words, noun phrases, and named entities. Through this approach, we investigated 9,211 financial news articles and 10,259,042 stock quotes covering the S&P 500 stocks during a five week period. We applied our analysis to estimate a discrete stock price twenty minutes after a news article was released. Using a support vector machine (SVM) derivative specially tailored for discrete numeric prediction and models containing different stock-specific variables, we show that the model containing both article terms and stock price at the time of article release had the best performance in closeness to the actual future stock price (MSE 0.04261), the same direction of price movement as the future price (57.1% directional accuracy) and the highest return using a simulated trading engine (2.06% return). We further investigated the different textual representations and found that a Proper Noun scheme performs better than the de facto standard of Bag of Words in all three metrics.
Keywords: SVM, prediction, stock market

TOIS 2009 Volume 27 Issue 3

Clusters, language models, and ad hoc information retrieval BIBAKFull-Text 13
  Oren Kurland; Lillian Lee
The language-modeling approach to information retrieval provides an effective statistical framework for tackling various problems and often achieves impressive empirical performance. However, most previous work on language models for information retrieval focused on document-specific characteristics, and therefore did not take into account the structure of the surrounding corpus, a potentially rich source of additional information. We propose a novel algorithmic framework in which information provided by document-based language models is enhanced by the incorporation of information drawn from clusters of similar documents. Using this framework, we develop a suite of new algorithms. Even the simplest typically outperforms the standard language-modeling approach in terms of mean average precision (MAP) and recall, and our new interpolation algorithm posts statistically significant performance improvements for both metrics over all six corpora tested. An important aspect of our work is the way we model corpus structure. In contrast to most previous work on cluster-based retrieval that partitions the corpus, we demonstrate the effectiveness of a simple strategy based on a nearest-neighbors approach that produces overlapping clusters.
Keywords: Language modeling, aspect models, cluster hypothesis, cluster-based language models, clustering, interpolation model, smoothing
Robust result merging using sample-based score estimates BIBAKFull-Text 14
  Milad Shokouhi; Justin Zobel
In federated information retrieval, a query is routed to multiple collections and a single answer list is constructed by combining the results. Such metasearch provides a mechanism for locating documents on the hidden Web and, by use of sampling, can proceed even when the collections are uncooperative. However, the similarity scores for documents returned from different collections are not comparable, and, in uncooperative environments, document scores are unlikely to be reported. We introduce a new merging method for uncooperative environments, in which similarity scores for the sampled documents held for each collection are used to estimate global scores for the documents returned per query. This method requires no assumptions about properties such as the retrieval models used. Using experiments on a wide range of collections, we show that in many cases our merging methods are significantly more effective than previous techniques.
Keywords: Result merging, distributed information retrieval, result fusion, uncooperative collections
SEA: Segment-enrich-annotate paradigm for adapting dialog-based content for improved accessibility BIBAKFull-Text 15
  K. Selçuk Candan; Mehmet E. Dönderler; Terri Hedgpeth; Jong Wook Kim; Qing Li; Maria Luisa Sapino
While navigation within complex information spaces is a problem for all users, the problem is most evident for individuals who are blind, who cannot simply locate, point, and click on a link in hypertext documents with a mouse. Users who are blind have to listen while searching for the link in the document using only the keyboard and a screen reader program, which may be particularly inefficient in large documents with many links or deep hierarchies that are hard to navigate. Consequently, they are especially penalized when the information being searched for is hidden under multiple layers of indirection. In this article, we introduce a segment-enrich-annotate (SEA) paradigm for adapting digital content with deep structures for improved accessibility. In particular, we instantiate and evaluate this paradigm through the iCare-Assistant, an assistive system for helping students who are blind in accessing Web and electronic course materials. Our evaluations, involving the participation of students who are blind, showed that the iCare-Assistant system, built based on the SEA paradigm, reduces the navigational overhead significantly and enables users who are blind to access complex online course servers effectively.
Keywords: Web navigational aids, annotation, assistive technology for blind users, educational discussion boards and Web sites, segmentation
Semisupervised SVM batch mode active learning with applications to image retrieval BIBAKFull-Text 16
  Steven C. H. Hoi; Rong Jin; Jianke Zhu; Michael R. Lyu
Support vector machine (SVM) active learning is one popular and successful technique for relevance feedback in content-based image retrieval (CBIR). Despite the success, conventional SVM active learning has two main drawbacks. First, the performance of SVM is usually limited by the number of labeled examples. It often suffers a poor performance for the small-sized labeled examples, which is the case in relevance feedback. Second, conventional approaches do not take into account the redundancy among examples, and could select multiple examples that are similar (or even identical). In this work, we propose a novel scheme for explicitly addressing the drawbacks. It first learns a kernel function from a mixture of labeled and unlabeled data, and therefore alleviates the problem of small-sized training data. The kernel will then be used for a batch mode active learning method to identify the most informative and diverse examples via a min-max framework. Two novel algorithms are proposed to solve the related combinatorial optimization: the first approach approximates the problem into a quadratic program, and the second solves the combinatorial optimization approximately by a greedy algorithm that exploits the merits of submodular functions. Extensive experiments with image retrieval using both natural photo images and medical images show that the proposed algorithms are significantly more effective than the state-of-the-art approaches. A demo is available at http://msm.cais.ntu.edu.sg/LSCBIR/.
Keywords: Content-based image retrieval, active learning, batch mode active learning, human-computer interaction, semisupervised learning, support vector machines
Bounded coordinate system indexing for real-time video clip search BIBAKFull-Text 17
  Zi Huang; Heng Tao Shen; Jie Shao; Xiaofang Zhou; Bin Cui
Recently, video clips have become very popular online. The massive influx of video clips has created an urgent need for video search engines to facilitate retrieving relevant clips. Different from traditional long videos, a video clip is a short video often expressing a moment of significance. Due to the high complexity of video data, efficient video clip search from large databases turns out to be very challenging. We propose a novel video clip representation model called the Bounded Coordinate System (BCS), which is the first single representative capturing the dominating content and content-changing trends of a video clip. It summarizes a video clip by a coordinate system, where each of its coordinate axes is identified by principal component analysis (PCA) and bounded by the range of data projections along the axis. The similarity measure of BCS considers the operations of translation, rotation, and scaling for coordinate system matching. Particularly, rotation and scaling reflect the difference of content tendencies. Compared with the quadratic time complexity of existing methods, the time complexity of measuring BCS similarity is linear. The compact video representation together with its linear similarity measure makes real-time search from video clip collections feasible. To further improve the retrieval efficiency for large video databases, a two-dimensional transformation method called Bidistance Transformation (BDT) is introduced to utilize a pair of optimal reference points with respect to bidirectional axes in BCS. Our extensive performance study on a large database of more than 30,000 video clips demonstrates that BCS achieves very high search accuracy according to human judgment. This indicates that content tendencies are important in determining the meanings of video clips and confirms that BCS can capture the inherent moment of a video clip to some extent that better resembles human perception.
   In addition, BDT outperforms existing indexing methods greatly. Integration of the BCS model and BDT indexing can achieve real-time search from large video clip databases.
Keywords: Video search, indexing, query processing, summarization
A novel framework for efficient automated singer identification in large music databases BIBAKFull-Text 18
  Jialie Shen; John Shepherd; Bin Cui; Kian-Lee Tan
Over the past decade, there has been explosive growth in the availability of multimedia data, particularly image, video, and music. Because of this, content-based music retrieval has attracted attention from the multimedia database and information retrieval communities. Content-based music retrieval requires us to be able to automatically identify particular characteristics of music data. One such characteristic, useful in a range of applications, is the identification of the singer in a musical piece. Unfortunately, existing approaches to this problem suffer from either low accuracy or poor scalability. In this article, we propose a novel scheme, called Hybrid Singer Identifier (HSI), for efficient automated singer recognition. HSI uses multiple low-level features extracted from both vocal and nonvocal music segments to enhance the identification process; it achieves this via a hybrid architecture that builds profiles of individual singer characteristics based on statistical mixture models. An extensive experimental study on a large music database demonstrates the superiority of our method over state-of-the-art approaches in terms of effectiveness, efficiency, scalability, and robustness.
Keywords: EM algorithm, Gaussian mixture models, Music retrieval, classification, evaluation, singer identification, statistical modeling

TOIS 2009 Volume 27 Issue 4

PageRank: Functional dependencies BIBKFull-Text 19
  Paolo Boldi; Massimo Santini; Sebastiano Vigna
Keywords: PageRank, damping factor, power method
Building a framework for the probability ranking principle by a family of expected weighted rank BIBAKFull-Text 20
  Edward Kai Fung Dang; Ho Chung Wu; Robert Wing Pong Luk; Kam Fai Wong
A new principles framework is presented for retrieval evaluation of ranked outputs. It applies decision theory to model relevance decision preferences and shows that the Probability Ranking Principle (PRP) specifies optimal ranking. It has two new components, namely a probabilistic evaluation model and a general measure of retrieval effectiveness. Its probabilities may be interpreted as subjective or objective ones. Its performance measure is the expected weighted rank which is the weighted average rank of a retrieval list. Starting from this measure, the expected forward rank and some existing retrieval effectiveness measures (e.g., top n precision and discounted cumulative gain) are instantiated using suitable weighting schemes after making certain assumptions. The significance of these instantiations is that the ranking prescribed by PRP is shown to be optimal simultaneously for all these existing performance measures. In addition, the optimal expected weighted rank may be used to normalize the expected weighted rank of retrieval systems for (summary) performance comparison (across different topics) between systems. The framework also extends PRP and our evaluation model to handle graded relevance, thereby generalizing the discussed, existing measures (e.g., top n precision) and probabilistic retrieval models for graded relevance.
Keywords: Probability ranking principle, optimization
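The claim that one ranking can be optimal for several measures at once can be illustrated with a small Python sketch. It treats top n precision and discounted cumulative gain as weighted sums over rank positions, assumes binary relevance so that the expected gain at a rank equals the document's probability of relevance, and checks by brute force that ordering by decreasing probability is never beaten. The function names and the log2 discount are illustrative assumptions; this is not the article's expected weighted rank measure.

```python
import math
from itertools import permutations


def expected_score(probs_in_rank_order, weights):
    """Expected value of a rank-weighted measure under binary relevance."""
    return sum(w * p for w, p in zip(weights, probs_in_rank_order))


def top_n_weights(length, n):
    """Weights that turn the weighted sum into expected precision at n."""
    return [1.0 / n if i < n else 0.0 for i in range(length)]


def dcg_weights(length):
    """Common log2 discount for DCG: rank i (from 1) gets 1 / log2(i + 1)."""
    return [1.0 / math.log2(i + 2) for i in range(length)]


def prp_is_optimal(probs, weights):
    """Check every ordering against the decreasing-probability ordering."""
    best = expected_score(sorted(probs, reverse=True), weights)
    return all(best >= expected_score(perm, weights) - 1e-12
               for perm in permutations(probs))


probs = [0.2, 0.9, 0.5, 0.7]  # assumed probabilities of relevance
print(prp_is_optimal(probs, top_n_weights(len(probs), 2)))
print(prp_is_optimal(probs, dcg_weights(len(probs))))
```

Both checks succeed for the same reason: the weights never increase with rank, so placing the most probably relevant documents first cannot lower the weighted sum.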
A few good topics: Experiments in topic set reduction for retrieval evaluation BIBAKFull-Text 21
  John Guiver; Stefano Mizzaro; Stephen Robertson
We consider the issue of evaluating information retrieval systems on the basis of a limited number of topics. In contrast to statistically-based work on sample sizes, we hypothesize that some topics or topic sets are better than others at predicting true system effectiveness, and that with the right choice of topics, accurate predictions can be obtained from small topic sets. Using a variety of effectiveness metrics and measures of goodness of prediction, a study of a set of TREC and NTCIR results confirms this hypothesis, and provides evidence that the value of a topic set for this purpose does generalize.
Keywords: Search effectiveness, evaluation experiments, test corpora, topic selection
A distributed, service-based framework for knowledge applications with multimedia BIBAKFull-Text 22
  David Dupplaw; Srinandan Dasmahapatra; Bo Hu; Paul Lewis; Nigel Shadbolt
The current trend in distributed systems is towards service-based integration. This article describes an ontology-driven framework implemented to provide knowledge management for data of different modalities, with multimedia processing, annotation, and reasoning provided by remote services. The framework was developed in, and is presented in the context of, the Medical Imaging and Advanced Knowledge Technologies (MIAKT) project that sought to support the Multidisciplinary Meetings (MDMs) that take place during breast cancer screening for diagnosing the patient. However, the architecture is entirely independent of the specific application domain and can be quickly prototyped into new domains. An Enterprise server provides resource access to a client-side presentation application which, in turn, provides knowledge visualization and markup of any supported media, as defined by a domain-dependent ontology-supported language.
Keywords: Semantic Web, breast cancer, decision support, health, ontologies, services
Cyberchondria: Studies of the escalation of medical concerns in Web search BIBAKFull-Text 23
  Ryen W. White; Eric Horvitz
The World Wide Web provides an abundant source of medical information. This information can assist people who are not healthcare professionals to better understand health and illness, and to provide them with feasible explanations for symptoms. However, the Web has the potential to increase the anxieties of people who have little or no medical training, especially when Web search is employed as a diagnostic procedure. We use the term cyberchondria to refer to the unfounded escalation of concerns about common symptomatology, based on the review of search results and literature on the Web. We performed a large-scale, longitudinal, log-based study of how people search for medical information online, supported by a survey of 515 individuals' health-related search experiences. We focused on the extent to which common, likely innocuous symptoms can escalate into the review of content on serious, rare conditions that are linked to the common symptoms. Our results show that Web search engines have the potential to escalate medical concerns. We show that escalation is associated with the amount and distribution of medical content viewed by users, the presence of escalatory terminology in pages visited, and a user's predisposition to escalate versus to seek more reasonable explanations for ailments. We also demonstrate the persistence of postsession anxiety following escalations and the effect that such anxieties can have on interrupting users' activities across multiple sessions. Our findings underscore the potential costs and challenges of cyberchondria and suggest actionable design implications that hold opportunity for improving the search and navigation experience for people turning to the Web to interpret common symptoms.
Keywords: Cyberchondria
MUADDIB: A distributed recommender system supporting device adaptivity BIBAKFull-Text 24
  Domenico Rosaci; Giuseppe M. L. Sarné; Salvatore Garruzzo
Web recommender systems are Web applications capable of generating useful suggestions for visitors of Internet sites. However, in the case of large user communities and in the presence of a high number of Web sites, these tasks are computationally onerous, even more so if the client software runs on devices with limited resources. Moreover, the quality of the recommendations strictly depends on how the recommendation algorithm takes into account the currently used device. Some approaches proposed in the literature provide multidimensional recommendations considering, besides items and users, also the exploited device. However, these systems do not perform efficiently, since they assign to either the client or the server the arduous cost of computing recommendations. In this article, we argue that a fully distributed organization is a suitable solution to improve the efficiency of multidimensional recommender systems. In order to address these issues, we propose a novel distributed architecture, called MUADDIB, where each user's device is provided with a device assistant that autonomously retrieves information about the user's behavior. Moreover, a single profiler, associated with the user, periodically collects information coming from the user's different device assistants to construct a global user's profile. In order to generate recommendations, a recommender precomputes data provided by the profilers. This way, the site manager has only the task of suitably presenting the content of the site, while the computation of the recommendations is assigned to the other distributed components. Some experiments conducted on real data and using some well-known metrics show that the system works more effectively and efficiently than other device-based distributed recommenders.
Keywords: Recommender systems, adaptivity, personalization