| Sound and complete relevance assessment for XML retrieval | | BIBAK | Full-Text | 1 | |
| Benjamin Piwowarski; Andrew Trotman; Mounia Lalmas | |||
| In information retrieval research, comparing retrieval approaches requires
test collections consisting of documents, user requests and relevance
assessments. Obtaining relevance assessments that are as sound and complete as
possible is crucial for the comparison of retrieval approaches. In XML
retrieval, the problem of obtaining sound and complete relevance assessments is
further complicated by the structural relationships between retrieval results.
A major difference between XML retrieval and flat document retrieval is that
the relevance of elements (the retrievable units) is not independent of that
of related elements. This has major consequences for the gathering of
relevance assessments. This article describes investigations into the
creation of sound and complete relevance assessments for the evaluation of
content-oriented XML retrieval as carried out at INEX, the evaluation
campaign for XML retrieval. The campaign, now in its seventh year, has had
three substantially different approaches to gathering assessments and has
finally settled on a highlighting method for marking relevant passages within
documents -- even though the objective is to collect assessments at the
element level. The different methods of gathering assessments at INEX are
discussed and contrasted. The highlighting method is shown to be the most
reliable of the three. Keywords: INEX, XML, XML retrieval, evaluation, passage retrieval, relevance
assessment | |||
| Rank-biased precision for measurement of retrieval effectiveness | | BIBAK | Full-Text | 2 | |
| Alistair Moffat; Justin Zobel | |||
| A range of methods for measuring the effectiveness of information retrieval
systems has been proposed. These are typically intended to provide a
quantitative single-value summary of a document ranking relative to a query.
However, many of these measures have failings. For example, recall is not well
founded as a measure of satisfaction, since the user of an actual system cannot
judge recall. Average precision is derived from recall, and suffers from the
same problem. In addition, average precision lacks key stability properties
that are needed for robust experiments. In this article, we introduce a new
effectiveness metric, rank-biased precision, that avoids these problems.
Rank-biased precision is derived from a simple model of user behavior, is
robust if answer rankings are extended to greater depths, and allows accurate
quantification of experimental uncertainty, even when only partial relevance
judgments are available. Keywords: Recall, average precision, pooling, precision, relevance | |||
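The RBP formula is compact enough to state directly. A minimal sketch,
assuming the usual formulation with persistence parameter p (the example
ranking is invented):

```python
# Rank-biased precision: RBP = (1 - p) * sum_i r_i * p^(i-1), where r_i is
# the relevance of the document at rank i and p models the probability that
# the user persists from one rank to the next.
def rank_biased_precision(relevances, p=0.8):
    return (1 - p) * sum(r * p ** i for i, r in enumerate(relevances))

# Unjudged documents simply contribute nothing, which is why RBP degrades
# gracefully under partial relevance judgments.
print(rank_biased_precision([1, 0, 1, 1], p=0.8))  # 0.4304
```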
| Trusting spam reporters: A reporter-based reputation system for email filtering | | BIBAK | Full-Text | 3 | |
| Elena Zheleva; Aleksander Kolcz; Lise Getoor | |||
| Spam is a growing problem; it interferes with valid email and burdens both
email users and service providers. In this work, we propose a reactive
spam-filtering system based on reporter reputation for use in conjunction with
existing spam-filtering techniques. The system has a trust-maintenance
component for users, based on their spam-reporting behavior. The challenge that
we consider is that of maintaining a reliable system, not vulnerable to
malicious users, that will provide early spam-campaign detection to reduce the
costs incurred by users and systems. We report on the utility of a reputation
system for spam filtering that makes use of the feedback of trustworthy users.
We evaluate our proposed framework, using actual complaint feedback from a
large population of users, and validate its spam-filtering performance on a
collection of real email traffic over several weeks. To test the broader
implication of the system, we create a model of the behavior of malicious
reporters, and we simulate the system under various assumptions using a
synthetic dataset. Keywords: Spam filtering, reputation systems, trust | |||
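The abstract does not give the trust-maintenance rule, so as a generic
illustration only, here is a Beta-reputation-style update (a standard
technique, not necessarily the authors' scheme): a reporter's trust is the
posterior mean over their confirmed versus mistaken spam reports.

```python
# Generic Beta-reputation sketch (NOT the paper's algorithm): trust is the
# mean of a Beta(correct + 1, incorrect + 1) posterior.
class ReporterTrust:
    def __init__(self):
        self.correct = 0    # reports later confirmed as spam
        self.incorrect = 0  # reports later found to be mistaken or malicious

    def update(self, was_correct):
        if was_correct:
            self.correct += 1
        else:
            self.incorrect += 1

    @property
    def trust(self):
        return (self.correct + 1) / (self.correct + self.incorrect + 2)

# A spam-campaign alert could fire once the trust-weighted mass of reports on
# a message crosses a threshold, so malicious reporters carry little weight.
```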
| Extended probabilistic HAL with close temporal association for psychiatric query document retrieval | | BIBAK | Full-Text | 4 | |
| Jui-Feng Yeh; Chung-Hsien Wu; Liang-Chih Yu; Yu-Sheng Lai | |||
| Psychiatric query document retrieval can assist individuals to locate query
documents relevant to their depression-related problems efficiently and
effectively. By referring to relevant documents, individuals can understand how
to alleviate their depression-related symptoms according to recommendations
from health professionals. This work presents an extended probabilistic
Hyperspace Analog to Language (epHAL) model to achieve this aim. The epHAL
incorporates the close temporal associations between words in query documents
to represent word cooccurrence relationships in a high-dimensional context
space. The information flow mechanism further combines the query words in the
epHAL space to infer related words for effective information retrieval. The
language model perplexity is considered as the criterion for model
optimization. Finally, the epHAL is adopted for psychiatric query document
retrieval, where experimental results indicate its superiority over
traditional approaches. Keywords: Hyperspace Analog to Language (HAL) model, information retrieval,
information flow, query documents | |||
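The base HAL construction (Lund and Burgess) that epHAL extends is standard:
slide a fixed window over the text and accumulate distance-weighted
cooccurrence counts. A minimal sketch of that base model only; the
temporal-association and information-flow extensions are not reproduced here:

```python
from collections import defaultdict

def build_hal(tokens, window=5):
    """Basic HAL matrix: a cooccurrence at distance d within the window
    contributes weight window - d + 1, so nearer words count more."""
    hal = defaultdict(lambda: defaultdict(float))
    for i, focus in enumerate(tokens):
        for d in range(1, window + 1):
            if i + d < len(tokens):
                hal[focus][tokens[i + d]] += window - d + 1
    return hal

m = build_hal("i feel sad and i cannot sleep at night".split())
print(dict(m["sad"]))  # distance-weighted right-context counts for "sad"
```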
| combinFormation: Mixed-initiative composition of image and text surrogates promotes information discovery | | BIBAK | Full-Text | 5 | |
| Andruid Kerne; Eunyee Koh; Steven M. Smith; Andrew Webb; Blake Dworaczyk | |||
| combinFormation is a mixed-initiative creativity support tool for searching,
browsing, organizing, and integrating information. Images and text are
connected to represent surrogates (enhanced bookmarks), optimizing the use of
human cognitive facilities. Composition, an alternative to lists and spatial
hypertext, is used to represent a collection of surrogates as a connected
whole, using principles from art and design. This facilitates the creative
process of information discovery, in which humans develop new ideas while
finding and collecting information. To provoke the user to think about the
large space of potentially relevant information resources, a generative agent
proactively engages in collecting information resources, forming image and text
surrogates, and composing them visually. The agent develops the collection and
its visual representation over time, enabling the user to see ideas and
relationships. To keep the human in control, we develop interactive mechanisms
for authoring the composition and directing the agent. In a field study in an
interdisciplinary course on The Design Process, over a hundred students
alternated using combinFormation and Google+Word to collect prior work on
information discovery invention assignments. The students who used
combinFormation's mixed-initiative composition of image and text surrogates
performed better. Keywords: Creativity support tools, clustering, collections, creative cognition,
exploratory search, field study, focused crawler, information discovery,
mixed-initiative systems, relevance feedback, semantics, software agents | |||
| Toward automatic facet analysis and need negotiation: Lessons from mediated search | | BIBAK | Full-Text | 6 | |
| Jimmy Lin; Philip Wu; Eileen Abels | |||
| This work explores the hypothesis that interactions between a trained human
search intermediary and an information seeker can inform the design of
interactive IR systems. We discuss results from a controlled Wizard-of-Oz case
study, set in the context of the TREC 2005 HARD track evaluation, in which a
trained intermediary executed an integrated search and interaction strategy
based on conceptual facet analysis and informed by need negotiation techniques
common in reference interviews. Having a human "in the loop" yielded large
improvements over fully automated systems as measured by standard
ranked-retrieval metrics, demonstrating the value of mediated search. We
present a detailed analysis of the intermediary's actions to gain a deeper
understanding of what worked and why. One contribution is a taxonomy of
clarification types informed both by empirical results and existing theories in
library and information science. We discuss how these findings can guide the
development of future systems. Overall, this work illustrates how studying
human information-seeking processes can lead to better information retrieval
applications. Keywords: Reference interview, interactive information retrieval | |||
| Automatic metadata generation using associative networks | | BIBAK | Full-Text | 7 | |
| Marko A. Rodriguez; Johan Bollen; Herbert Van De Sompel | |||
| In spite of its tremendous value, metadata is generally sparse and
incomplete, thereby hampering the effectiveness of digital information
services. Many of the existing mechanisms for the automated creation of
metadata rely primarily on content analysis which can be costly and
inefficient. The automatic metadata generation system proposed in this article
leverages resource relationships generated from existing metadata as a medium
for propagation from metadata-rich to metadata-poor resources. Because of its
independence from content analysis, it can be applied to a wide variety of
resource media types and is shown to be computationally inexpensive. The
proposed method operates through two distinct phases. Occurrence and
cooccurrence algorithms first generate an associative network of repository
resources leveraging existing repository metadata. Second, using the
associative network as a substrate, metadata associated with metadata-rich
resources is propagated to metadata-poor resources by means of a discrete-form
spreading activation algorithm. This article discusses the general framework
for building associative networks, an algorithm for disseminating metadata
through such networks, and the results of an experiment and validation of the
proposed method using a standard bibliographic dataset. Keywords: Associative networks, metadata generation, particle-swarms | |||
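The propagation phase lends itself to a short sketch. Assuming a generic
discrete spreading-activation rule (the article's exact update may differ),
metadata-rich resources are seeded with activation, which decays as it flows
over the associative network:

```python
import numpy as np

def spread(W, seed, steps=3, decay=0.5):
    """Discrete spreading activation over an associative network W.
    Rows are normalized so each node distributes its activation."""
    W = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    a = seed.astype(float)
    for _ in range(steps):
        a = decay * (W.T @ a) + seed  # propagate, keep re-injecting seeds
    return a

W = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # toy 3-node network
print(spread(W, np.array([1, 0, 0])))  # activation reaching nodes 1 and 2
```

Metadata attached to strongly activated neighbors would then be copied to the
metadata-poor node.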
| An analysis of latent semantic term self-correlation | | BIBAK | Full-Text | 8 | |
| Laurence A. F. Park; Kotagiri Ramamohanarao | |||
| Latent semantic analysis (LSA) is a generalized vector space method that
uses dimension reduction to generate term correlations for use during the
information retrieval process. We hypothesized that even though the dimension
reduction establishes correlations between terms, it also causes a
degradation in the correlation of a term to itself
(self-correlation). In this article, we have proven that there is a direct
relationship between the size of the LSA dimension reduction and the LSA
self-correlation. We have also shown that by altering the LSA term
self-correlations we gain a substantial increase in precision, while also
reducing the computation required during the information retrieval process. Keywords: Latent semantic analysis, term correlation | |||
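The effect is easy to reproduce with a truncated SVD: with A ≈ U_k Σ_k V_kᵀ
(terms by documents), term vectors are the rows of U_k Σ_k, and a term's
self-correlation is its squared row norm, which shrinks as k decreases. A
hedged numpy sketch on an invented matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((20, 50))                 # toy term-by-document matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

for k in (20, 10, 3):                    # decreasing LSA dimensionality
    Tk = U[:, :k] * s[:k]                # term vectors in the k-dim space
    self_corr = (Tk ** 2).sum(axis=1)    # diagonal of Tk @ Tk.T
    print(k, self_corr.mean())           # mean self-correlation drops with k
```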
| An adaptive threshold framework for event detection using HMM-based life profiles | | BIBAK | Full-Text | 9 | |
| Chien Chin Chen; Meng Chang Chen; Ming-Syan Chen | |||
| When an event occurs, it attracts the attention of information sources,
which publish related documents throughout its lifespan. The task of event detection is to
automatically identify events and their related documents from a document
stream, which is a set of chronologically ordered documents collected from
various information sources. Generally, each event has a distinct activeness
development so that its status changes continuously during its lifespan. When
an event is active, there are a lot of related documents from various
information sources. In contrast, when it is inactive, there are very few
documents, but they are focused. Previous work on event detection did not
consider the characteristics of the event's activeness, and used rigid
thresholds for event detection. We propose a concept called the life profile,
represented by a hidden Markov model, to capture the activeness trends of events. In
addition, a general event detection framework, LIPED, which utilizes the
learned life profiles and the burst-and-diverse characteristic to adjust the
event detection thresholds adaptively, can be incorporated into existing event
detection methods. Based on the official TDT corpus and contest rules, the
evaluation results show that existing detection methods that incorporate LIPED
achieve better performance in the cost and F1 metrics than without it. Keywords: Event detection, TDT, clustering, hidden Markov models, life profiles, topic
detection | |||
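Scoring how well an observed activeness sequence fits a learned life profile
is a job for the standard HMM forward algorithm; a minimal numpy sketch with
invented states (inactive/active) and an invented low/medium/high
document-volume alphabet:

```python
import numpy as np

def forward_loglik(pi, A, B, obs):
    """log P(obs | HMM) via the scaled forward algorithm."""
    alpha = pi * B[:, obs[0]]
    logp = 0.0
    for t, o in enumerate(obs):
        if t > 0:
            alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        logp += np.log(s)
        alpha = alpha / s                # rescale to avoid underflow
    return logp

pi = np.array([0.8, 0.2])                        # start mostly inactive
A = np.array([[0.7, 0.3], [0.4, 0.6]])           # activeness transitions
B = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]) # emit low/med/high volume
print(forward_loglik(pi, A, B, [0, 1, 2, 2, 1, 0]))  # burst then fade
```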
| Information filtering and query indexing for an information retrieval model | | BIBAK | Full-Text | 10 | |
| Christos Tryfonopoulos; Manolis Koubarakis; Yannis Drougas | |||
| In the information filtering paradigm, clients subscribe to a server with
continuous queries or profiles that express their information needs. Clients
can also publish documents to servers. Whenever a document is published, the
continuous queries satisfied by this document are found and notifications are
sent to appropriate clients. This article deals with the filtering problem that
needs to be solved efficiently by each server: Given a database of continuous
queries db and a document d, find all queries q ∈ db that match d. We
present data structures and indexing algorithms that enable us to solve the
filtering problem efficiently for large databases of queries expressed in the
model AWP. AWP is based on named attributes with values of type text, and its
query language includes Boolean and word proximity operators. Keywords: Information filtering, performance evaluation, query indexing algorithms,
selective dissemination of information | |||
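The standard trick for this setting is to invert the queries rather than the
documents: index each continuous query under its words and, per incoming
document, count how many of a query's words matched. A sketch for conjunctive
word queries only (AWP's attributes and proximity operators omitted):

```python
from collections import defaultdict

queries = {1: {"breast", "cancer"}, 2: {"stock", "market"}, 3: {"cancer"}}

index = defaultdict(set)            # word -> ids of queries containing it
for qid, words in queries.items():
    for w in words:
        index[w].add(qid)

def matching_queries(doc_words):
    hits = defaultdict(int)
    for w in set(doc_words):
        for qid in index[w]:
            hits[qid] += 1
    return sorted(q for q, n in hits.items() if n == len(queries[q]))

print(matching_queries("new breast cancer screening study".split()))  # [1, 3]
```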
| User language model for collaborative personalized search | | BIBAK | Full-Text | 11 | |
| Gui-Rong Xue; Jie Han; Yong Yu; Qiang Yang | |||
| Traditional personalized search approaches rely solely on individual
profiles to construct a user model. They are often confronted by two major
problems: data sparseness and cold-start for new individuals. Data sparseness
refers to the fact that most users only visit a small portion of Web pages and
hence a very sparse user-term relationship matrix is generated, while
cold-start for new individuals means that the system cannot conduct any
personalization without previous browsing history. Recently, community-based
approaches have been proposed that use a group's social behaviors as a supplement to
personalization. However, these approaches only consider the commonality of a
group of users and still cannot satisfy the diverse information needs of
different users. In this article, we present a new approach, called
collaborative personalized search. It considers not only the commonality factor
among users for defining group user profiles and global user profiles, but also
the specialties of individuals. Then, a statistical user language model is
proposed to integrate the individual model, group user model and global user
model together. In this way, the probability that a user will like a Web page
is calculated through a two-step smoothing mechanism. First, a global user
model is used to smooth the probability of unseen terms in the individual
profiles and provide aggregated behavior of global users. Then, in order to
precisely describe individual interests by looking at the behaviors of similar
users, users are clustered into groups and group-user models are constructed.
The group-user models are integrated into an overall model through a
cluster-based language model. The behaviors of the group users can be utilized
to enhance the performance of personalized search. This model can alleviate the
two aforementioned problems and provide a more effective personalized search
than previous approaches. Large-scale experimental evaluations are conducted to
show that the proposed approach substantially improves the relevance of a
search over several competitive methods. Keywords: Collaborative personalized search, clustering, cold-start, data sparseness,
smoothing, user language model | |||
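The layered smoothing can be caricatured as a linear interpolation of three
unigram models; the weights and probabilities below are invented, and the
article's actual estimator (a cluster-based language model) is richer than
this:

```python
def p_user(term, p_ind, p_group, p_global, lam=(0.5, 0.3, 0.2)):
    """Individual profile backed off to the group model, then to global usage."""
    l1, l2, l3 = lam
    return (l1 * p_ind.get(term, 0.0)
            + l2 * p_group.get(term, 0.0)
            + l3 * p_global.get(term, 0.0))

# A term unseen in a sparse individual profile (cold-start) still receives
# mass from similar users (group) and from aggregate behavior (global).
print(p_user("mortgage", {}, {"mortgage": 0.02}, {"mortgage": 0.005}))  # 0.007
```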
| Textual analysis of stock market prediction using breaking financial news: The AZFin text system | | BIBAK | Full-Text | 12 | |
| Robert P. Schumaker; Hsinchun Chen | |||
| Our research examines a predictive machine learning approach for financial
news article analysis using several different textual representations: bag of
words, noun phrases, and named entities. Through this approach, we investigated
9,211 financial news articles and 10,259,042 stock quotes covering the S&P 500
stocks during a five-week period. We applied our analysis to estimate a
discrete stock price twenty minutes after a news article was released. Using a
support vector machine (SVM) derivative specially tailored for discrete numeric
prediction and models containing different stock-specific variables, we show
that the model containing both article terms and stock price at the time of
article release had the best performance in closeness to the actual future
stock price (MSE 0.04261), the same direction of price movement as the future
price (57.1% directional accuracy), and the highest return using a simulated
trading engine (2.06% return). We further investigated the different textual
representations and found that a Proper Noun scheme performs better than the de
facto standard of Bag of Words in all three metrics. Keywords: SVM, prediction, stock market | |||
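The regression setup can be approximated with scikit-learn's SVR; the feature
encoding below (binary term indicators plus the quote at article release)
guesses at the spirit of the model and is not the AZFin text pipeline:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X_terms = rng.integers(0, 2, size=(200, 50)).astype(float)  # toy term features
price_now = rng.uniform(20, 80, size=(200, 1))              # price at release
X = np.hstack([X_terms, price_now])
y = price_now.ravel() + rng.normal(0, 0.5, 200)  # price 20 minutes later (toy)

model = SVR(kernel="linear", C=1.0).fit(X, y)
print(model.predict(X[:3]), y[:3])               # in-sample sanity check only
```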
| Clusters, language models, and ad hoc information retrieval | | BIBAK | Full-Text | 13 | |
| Oren Kurland; Lillian Lee | |||
| The language-modeling approach to information retrieval provides an
effective statistical framework for tackling various problems and often
achieves impressive empirical performance. However, most previous work on
language models for information retrieval focused on document-specific
characteristics, and therefore did not take into account the structure of the
surrounding corpus, a potentially rich source of additional information. We
propose a novel algorithmic framework in which information provided by
document-based language models is enhanced by the incorporation of information
drawn from clusters of similar documents. Using this framework, we develop a
suite of new algorithms. Even the simplest typically outperforms the standard
language-modeling approach in terms of mean average precision (MAP) and recall,
and our new interpolation algorithm posts statistically significant performance
improvements for both metrics over all six corpora tested. An important aspect
of our work is the way we model corpus structure. In contrast to most previous
work on cluster-based retrieval that partitions the corpus, we demonstrate the
effectiveness of a simple strategy based on a nearest-neighbors approach that
produces overlapping clusters. Keywords: Language modeling, aspect models, cluster hypothesis, cluster-based language
models, clustering, interpolation model, smoothing | |||
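The core scoring idea reduces to interpolating a document's model with the
model of its overlapping nearest-neighbor cluster; lambda and the toy counts
below are invented:

```python
import math

def interp_score(query, doc_tf, cluster_tf, lam=0.7):
    """log p(query | doc), with p(w|d) smoothed by the doc's cluster:
    p(w|d) = lam * p_ml(w|doc) + (1 - lam) * p_ml(w|cluster)."""
    d_len, c_len = sum(doc_tf.values()), sum(cluster_tf.values())
    score = 0.0
    for w in query:
        p = (lam * doc_tf.get(w, 0) / d_len
             + (1 - lam) * cluster_tf.get(w, 0) / c_len)
        score += math.log(p) if p > 0 else float("-inf")
    return score

# The cluster is built from the document's nearest neighbors, so clusters
# overlap rather than partition the corpus.
doc = {"language": 3, "model": 2}
cluster = {"language": 10, "model": 8, "retrieval": 5}
print(interp_score(["model", "retrieval"], doc, cluster))
```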
| Robust result merging using sample-based score estimates | | BIBAK | Full-Text | 14 | |
| Milad Shokouhi; Justin Zobel | |||
| In federated information retrieval, a query is routed to multiple
collections and a single answer list is constructed by combining the results.
Such metasearch provides a mechanism for locating documents on the hidden Web
and, by use of sampling, can proceed even when the collections are
uncooperative. However, the similarity scores for documents returned from
different collections are not comparable, and, in uncooperative environments,
document scores are unlikely to be reported. We introduce a new merging method
for uncooperative environments, in which similarity scores for the sampled
documents held for each collection are used to estimate global scores for the
documents returned per query. This method requires no assumptions about
properties such as the retrieval models used. Using experiments on a wide range
of collections, we show that in many cases our merging methods are
significantly more effective than previous techniques. Keywords: Result merging, distributed information retrieval, result fusion,
uncooperative collections | |||
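In the spirit of sample-based merging (not necessarily the exact estimator):
the sampled documents from each collection also live in a central sample
index, so a per-collection regression from collection-reported scores to
central scores can map new results onto one comparable scale. A numpy sketch
with invented scores:

```python
import numpy as np

# Scores of the same sampled documents as reported by one collection and as
# computed over the central sample index (invented numbers).
coll_scores = np.array([0.9, 0.7, 0.5, 0.3])
central_scores = np.array([12.1, 9.8, 6.5, 4.0])

slope, intercept = np.polyfit(coll_scores, central_scores, 1)

def to_global(score):
    return slope * score + intercept  # now comparable across collections

print([round(to_global(s), 2) for s in (0.8, 0.4)])
```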
| SEA: Segment-enrich-annotate paradigm for adapting dialog-based content for improved accessibility | | BIBAK | Full-Text | 15 | |
| K. Selçuk Candan; Mehmet E. Dönderler; Terri Hedgpeth; Jong Wook Kim; Qing Li; Maria Luisa Sapino | |||
| While navigation within complex information spaces is a problem for all
users, the problem is most acute for individuals who are blind, who cannot
simply locate, point at, and click on a link in hypertext documents with a mouse.
Users who are blind have to listen while searching for the link in the document using
only the keyboard and a screen reader program, which may be particularly
inefficient in large documents with many links or deep hierarchies that are
hard to navigate. Consequently, they are especially penalized when the
information being searched is hidden under multiple layers of indirection. In
this article, we introduce a segment-enrich-annotate (SEA) paradigm for
adapting digital content with deep structures for improved accessibility. In
particular, we instantiate and evaluate this paradigm through the
iCare-Assistant, an assistive system for helping students who are blind in
accessing Web and electronic course materials. Our evaluations, involving the
participation of students who are blind, showed that the iCare-Assistant
system, built based on the SEA paradigm, reduces the navigational overhead
significantly and enables users who are blind to access complex online course
servers effectively. Keywords: Web navigational aids, annotation, assistive technology for blind users,
educational discussion boards and Web sites, segmentation | |||
| Semisupervised SVM batch mode active learning with applications to image retrieval | | BIBAK | Full-Text | 16 | |
| Steven C. H. Hoi; Rong Jin; Jianke Zhu; Michael R. Lyu | |||
| Support vector machine (SVM) active learning is one popular and successful
technique for relevance feedback in content-based image retrieval (CBIR).
Despite the success, conventional SVM active learning has two main drawbacks.
First, the performance of SVM is usually limited by the number of labeled
examples. It often performs poorly when the set of labeled examples is
small, as is the case in relevance feedback. Second, conventional
approaches do not take into account the redundancy among examples, and could
select multiple examples that are similar (or even identical). In this work, we
propose a novel scheme for explicitly addressing the drawbacks. It first learns
a kernel function from a mixture of labeled and unlabeled data, and therefore
alleviates the problem of small-sized training data. The kernel will then be
used for a batch mode active learning method to identify the most informative
and diverse examples via a min-max framework. Two novel algorithms are proposed
to solve the related combinatorial optimization: the first approach
approximates the problem into a quadratic program, and the second solves the
combinatorial optimization approximately by a greedy algorithm that exploits
the merits of submodular functions. Extensive experiments with image retrieval
using both natural photo images and medical images show that the proposed
algorithms are significantly more effective than the state-of-the-art
approaches. A demo is available at http://msm.cais.ntu.edu.sg/LSCBIR/. Keywords: Content-based image retrieval, active learning, batch mode active learning,
human-computer interaction, semisupervised learning, support vector machines | |||
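The greedy variant's flavor, picking informative yet mutually diverse
examples, can be sketched generically (uncertainty as distance to the SVM
boundary; the learned kernel and the exact min-max objective are omitted):

```python
import numpy as np

def greedy_batch(margins, K, batch_size):
    """Select examples with small |margin| (informative) while penalizing
    kernel similarity K to already-selected examples (diverse)."""
    selected = []
    for _ in range(batch_size):
        best, best_score = None, -np.inf
        for i in range(len(margins)):
            if i in selected:
                continue
            redundancy = max((K[i, j] for j in selected), default=0.0)
            score = -abs(margins[i]) - redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected

K = np.eye(4); K[0, 1] = K[1, 0] = 0.9  # items 0 and 1 are near-duplicates
print(greedy_batch(np.array([0.1, 0.12, 0.5, 0.2]), K, 2))  # [0, 3], not [0, 1]
```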
| Bounded coordinate system indexing for real-time video clip search | | BIBAK | Full-Text | 17 | |
| Zi Huang; Heng Tao Shen; Jie Shao; Xiaofang Zhou; Bin Cui | |||
| Recently, video clips have become very popular online. The massive influx of
video clips has created an urgent need for video search engines to facilitate
retrieving relevant clips. Different from traditional long videos, a video clip
is a short video often expressing a moment of significance. Due to the high
complexity of video data, efficient video clip search from large databases
turns out to be very challenging. We propose a novel video clip representation
model called the Bounded Coordinate System (BCS), which is the first single
representative capturing the dominating content and content-changing trends
of a video clip. It summarizes a video clip by a coordinate system, where each
of its coordinate axes is identified by principal component analysis (PCA) and
bounded by the range of data projections along the axis. The similarity measure
of BCS considers the operations of translation, rotation, and scaling for
coordinate system matching. Particularly, rotation and scaling reflect the
difference of content tendencies. Compared with the quadratic time complexity
of existing methods, the time complexity of measuring BCS similarity is linear.
The compact video representation together with its linear similarity measure
makes real-time search from video clip collections feasible. To further improve
the retrieval efficiency for large video databases, a two-dimensional
transformation method called Bidistance Transformation (BDT) is introduced to
utilize a pair of optimal reference points with respect to bidirectional axes
in BCS. Our extensive performance study on a large database of more than 30,000
video clips demonstrates that BCS achieves very high search accuracy according
to human judgment. This indicates that content tendencies are important in
determining the meanings of video clips and confirms that BCS can capture the
inherent moment of a video clip to an extent that better resembles human
perception. In addition, BDT outperforms existing indexing methods greatly.
Integration of the BCS model and BDT indexing can achieve real-time search from
large video clip databases. Keywords: Video search, indexing, query processing, summarization | |||
| A novel framework for efficient automated singer identification in large music databases | | BIBAK | Full-Text | 18 | |
| Jialie Shen; John Shepherd; Bin Cui; Kian-Lee Tan | |||
| Over the past decade, there has been explosive growth in the availability of
multimedia data, particularly image, video, and music. Because of this,
content-based music retrieval has attracted attention from the multimedia
database and information retrieval communities. Content-based music retrieval
requires us to be able to automatically identify particular characteristics of
music data. One such characteristic, useful in a range of applications, is the
identification of the singer in a musical piece. Unfortunately, existing
approaches to this problem suffer from either low accuracy or poor scalability.
In this article, we propose a novel scheme, called Hybrid Singer Identifier
(HSI), for efficient automated singer recognition. HSI uses multiple low-level
features extracted from both vocal and nonvocal music segments to enhance the
identification process; it achieves this via a hybrid architecture that builds
profiles of individual singer characteristics based on statistical mixture
models. An extensive experimental study on a large music database demonstrates
the superiority of our method over state-of-the-art approaches in terms of
effectiveness, efficiency, scalability, and robustness. Keywords: EM algorithm, Gaussian mixture models, Music retrieval, classification,
evaluation, singer identification, statistical modeling | |||
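The mixture-model profiling step can be sketched with scikit-learn: fit one
Gaussian mixture per singer on (here, invented) vocal-segment features, then
label a test track by average log-likelihood. This shows only the statistical
core, not HSI's hybrid vocal/nonvocal architecture:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
train = {  # singer -> (n_frames, dim) vocal features (toy data)
    "singer_a": rng.normal(0.0, 1.0, (300, 12)),
    "singer_b": rng.normal(1.5, 1.0, (300, 12)),
}
models = {s: GaussianMixture(n_components=4, random_state=0).fit(X)
          for s, X in train.items()}

test = rng.normal(1.5, 1.0, (80, 12))  # frames from an unknown track
print(max(models, key=lambda s: models[s].score(test)))  # "singer_b"
```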
| PageRank: Functional dependencies | | BIBK | Full-Text | 19 | |
| Paolo Boldi; Massimo Santini; Sebastiano Vigna | |||
Keywords: PageRank, damping factor, power method | |||
| Building a framework for the probability ranking principle by a family of expected weighted rank | | BIBAK | Full-Text | 20 | |
| Edward Kai Fung Dang; Ho Chung Wu; Robert Wing Pong Luk; Kam Fai Wong | |||
| A new principled framework is presented for the retrieval evaluation of ranked
outputs. It applies decision theory to model relevance decision preferences and
shows that the Probability Ranking Principle (PRP) specifies optimal ranking.
It has two new components, namely a probabilistic evaluation model and a
general measure of retrieval effectiveness. Its probabilities may be
interpreted as subjective or objective ones. Its performance measure is the
expected weighted rank, which is the weighted average rank of a retrieval list.
Starting from this measure, the expected forward rank and some existing
retrieval effectiveness measures (e.g., top n precision and discounted
cumulative gain) are instantiated using suitable weighting schemes after making
certain assumptions. The significance of these instantiations is that the
ranking prescribed by PRP is shown to be optimal simultaneously for all these
existing performance measures. In addition, the optimal expected weighted rank
may be used to normalize the expected weighted rank of retrieval systems for
(summary) performance comparison (across different topics) between systems. The
framework also extends PRP and our evaluation model to handle graded relevance,
thereby generalizing the discussed, existing measures (e.g., top n precision)
and probabilistic retrieval models for graded relevance. Keywords: Probability ranking principle, optimization | |||
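On one hedged reading of the abstract (the paper's exact definitions may
differ), with $p_i$ the probability that the document at rank $i$ is relevant
and $w(i)$ a weighting scheme, the family has the shape

```latex
\mathrm{EWR} = \sum_{i=1}^{N} w(i)\, p_i,
\qquad
w(i) = \frac{\mathbf{1}[\, i \le n \,]}{n} \ \Rightarrow\ \text{top-}n\ \text{precision},
\qquad
w(i) = \frac{1}{\log_2(i+1)} \ \Rightarrow\ \text{DCG-style discounting}.
```

For any nonincreasing $w$, ranking by decreasing $p_i$ (the PRP ordering)
maximizes the sum, by the rearrangement inequality, which is the sense in
which PRP is simultaneously optimal for all such instantiations.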
| A few good topics: Experiments in topic set reduction for retrieval evaluation | | BIBAK | Full-Text | 21 | |
| John Guiver; Stefano Mizzaro; Stephen Robertson | |||
| We consider the issue of evaluating information retrieval systems on the
basis of a limited number of topics. In contrast to statistically-based work on
sample sizes, we hypothesize that some topics or topic sets are better than
others at predicting true system effectiveness, and that with the right choice
of topics, accurate predictions can be obtained from small topic sets. Using a
variety of effectiveness metrics and measures of goodness of prediction, a
study of a set of TREC and NTCIR results confirms this hypothesis, and provides
evidence that the value of a topic set for this purpose does generalize. Keywords: Search effectiveness, evaluation experiments, test corpora, topic selection | |||
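The usual way to quantify how well a topic subset predicts true effectiveness
is rank correlation between the system orderings it induces and the ordering
under the full set; a sketch with invented per-topic scores using scipy's
Kendall's tau:

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(4)
scores = rng.random((10, 50))       # 10 systems x 50 topics (toy effectiveness)
subset = [0, 3, 7, 12, 19]          # one candidate small topic set

full_order = scores.mean(axis=1)            # "true" effectiveness, all topics
sub_order = scores[:, subset].mean(axis=1)  # effectiveness on the subset

tau, _ = kendalltau(full_order, sub_order)
print(round(tau, 3))  # higher tau: the subset better predicts the full ranking
```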
| A distributed, service-based framework for knowledge applications with multimedia | | BIBAK | Full-Text | 22 | |
| David Dupplaw; Srinandan Dasmahapatra; Bo Hu; Paul Lewis; Nigel Shadbolt | |||
| The current trend in distributed systems is towards service-based
integration. This article describes an ontology-driven framework implemented to
provide knowledge management for data of different modalities, with multimedia
processing, annotation, and reasoning provided by remote services. The
framework was developed in, and is presented in the context of, the Medical
Imaging and Advanced Knowledge Technologies (MIAKT) project that sought to
support the Multidisciplinary Meetings (MDMs) that take place during breast
cancer screening for patient diagnosis. However, the architecture is
entirely independent of the specific application domain and can be quickly
prototyped into new domains. An Enterprise server provides resource access to a
client-side presentation application which, in turn, provides knowledge
visualization and markup of any supported media, as defined by a
domain-dependent ontology-supported language. Keywords: Semantic Web, breast cancer, decision support, health, ontologies, services | |||
| Cyberchondria: Studies of the escalation of medical concerns in Web search | | BIBAK | Full-Text | 23 | |
| Ryen W. White; Eric Horvitz | |||
| The World Wide Web provides an abundant source of medical information. This
information can assist people who are not healthcare professionals to better
understand health and illness, and to provide them with feasible explanations
for symptoms. However, the Web has the potential to increase the anxieties of
people who have little or no medical training, especially when Web search is
employed as a diagnostic procedure. We use the term cyberchondria to refer to
the unfounded escalation of concerns about common symptomatology, based on the
review of search results and literature on the Web. We performed a large-scale,
longitudinal, log-based study of how people search for medical information
online, supported by a survey of 515 individuals' health-related search
experiences. We focused on the extent to which common, likely innocuous
symptoms can escalate into the review of content on serious, rare conditions
that are linked to the common symptoms. Our results show that Web search
engines have the potential to escalate medical concerns. We show that
escalation is associated with the amount and distribution of medical content
viewed by users, the presence of escalatory terminology in pages visited, and a
user's predisposition to escalate versus to seek more reasonable explanations
for ailments. We also demonstrate the persistence of postsession anxiety
following escalations and the effect that such anxieties can have on
interrupting users' activities across multiple sessions. Our findings
underscore the potential costs and challenges of cyberchondria and suggest
actionable design implications that hold opportunity for improving the search
and navigation experience for people turning to the Web to interpret common
symptoms. Keywords: Cyberchondria | |||
| MUADDIB: A distributed recommender system supporting device adaptivity | | BIBAK | Full-Text | 24 | |
| Domenico Rosaci; Giuseppe M. L. Sarné; Salvatore Garruzzo | |||
| Web recommender systems are Web applications capable of generating useful
suggestions for visitors of Internet sites. However, in the case of large user
communities and in presence of a high number of Web sites, these tasks are
computationally onerous, even more if the client software runs on devices with
limited resources. Moreover, the quality of the recommendations strictly
depends on how the recommendation algorithm takes into account the currently
used device. Some approaches proposed in the literature provide
multidimensional recommendations considering, besides items and users, also the
exploited device. However, these systems do not perform efficiently, since they
assign to either the client or the server the arduous cost of computing
recommendations. In this article, we argue that a fully distributed
organization is a suitable solution to improve the efficiency of
multidimensional recommender systems. In order to address these issues, we
propose a novel distributed architecture, called MUADDIB, where each user's
device is provided with a device assistant that autonomously retrieves
information about the user's behavior. Moreover, a single profiler, associated
with the user, periodically collects information coming from the user's
different device assistants to construct a global user profile. In order to
generate recommendations, a recommender precomputes data provided by the
profilers. This way, the site manager has only the task of suitably presenting
the content of the site, while the computation of the recommendations is
assigned to the other distributed components. Some experiments conducted on
real data and using some well-known metrics show that the system works more
effectively and efficiently than other device-based distributed recommenders. Keywords: Recommender systems, adaptivity, personalization | |||