ACM Transactions on The Web 6

Editors: Marc Najork
Dates: 2012
Volume: 6
Publisher: ACM
Standard No: ISSN 1559-1131; EISSN 1559-114X
Papers: 19
  1. TWEB 2012-03 Volume 6 Issue 1
  2. TWEB 2012-05 Volume 6 Issue 2
  3. TWEB 2012-09 Volume 6 Issue 3
  4. TWEB 2012-11 Volume 6 Issue 4

TWEB 2012-03 Volume 6 Issue 1

A Practical Architecture for an Anycast CDN BIBAFull-Text 1
  Hussein A. Alzoubi; Seungjoon Lee; Michael Rabinovich; Oliver Spatscheck; Jacobus Van Der Merwe
IP Anycast has many attractive features for any service that involves the replication of multiple instances across the Internet. IP Anycast allows multiple instances of the same service to be "naturally" discovered, and requests for this service to be delivered to the closest instance. However, while briefly considered as an enabler for content delivery networks (CDNs) when they first emerged, IP Anycast was deemed infeasible in that environment. The main reasons for this decision were the lack of load awareness of IP Anycast and unwanted side effects of Internet routing changes on the IP Anycast mechanism.
   In this article we re-evaluate IP Anycast for CDNs by proposing a load-aware IP Anycast CDN architecture. Our architecture is prompted by recent developments in route control technology, as well as better understanding of the behavior of IP Anycast in operational settings. Our architecture makes use of route control mechanisms to take server and network load into account to realize load-aware Anycast. We show that the resulting redirection requirements can be formulated as a Generalized Assignment Problem and present practical algorithms that address these requirements while at the same time limiting connection disruptions that plague regular IP Anycast. We evaluate our algorithms through trace-based simulation using traces obtained from a production CDN.
Efficient Search Engine Measurements BIBAFull-Text 2
  Ziv Bar-Yossef; Maxim Gurevich
We address the problem of externally measuring aggregate functions over documents indexed by search engines, like corpus size, index freshness, and density of duplicates in the corpus. State-of-the-art estimators for such quantities [Bar-Yossef and Gurevich 2008b; Broder et al. 2006] are biased due to inaccurate approximation of the so-called "document degrees". In addition, the estimators in Bar-Yossef and Gurevich [2008b] are quite costly, due to their reliance on rejection sampling.
   We present new estimators that are able to overcome the bias introduced by approximate degrees. Our estimators are based on a careful implementation of an approximate importance sampling procedure. Comprehensive theoretical and empirical analysis of the estimators demonstrates that they have essentially no bias even in situations where document degrees are poorly approximated.
   By avoiding the costly rejection sampling approach, our new importance sampling estimators are significantly more efficient than the estimators proposed in Bar-Yossef and Gurevich [2008b]. Furthermore, building on an idea from Broder et al. [2006], we discuss Rao-Blackwellization as a generic method for reducing variance in search engine estimators. We show that Rao-Blackwellizing our estimators results in performance improvements, without compromising accuracy.
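As an illustration of the importance sampling idea named in the abstract above, the following minimal Python sketch estimates a corpus size from non-uniform samples. It shows only textbook importance sampling with exactly known sampling probabilities; it is not the authors' estimator, which is designed to cope with approximate document degrees. The function name, the toy corpus, and the skewed sampling distribution are assumptions made for the example.

```python
import random


def importance_sampling_sum(items, f, p, n_samples, rng):
    """Estimate sum(f(x) for x in items) from samples drawn with probability p(x).

    Assumes p(x) > 0 for every item and that the p(x) values sum to 1.
    """
    weights = [p(x) for x in items]
    samples = rng.choices(items, weights=weights, k=n_samples)
    # Each sample is reweighted by 1 / p(x) to undo the sampling bias.
    return sum(f(x) / p(x) for x in samples) / n_samples


# Hypothetical toy corpus of 10 documents, sampled with a skewed distribution.
docs = list(range(10))
total_weight = sum(d + 1 for d in docs)


def skewed(d):
    """Sampling probability that favors documents with a larger index."""
    return (d + 1) / total_weight


# With f(x) = 1 the target sum is simply the corpus size.
estimate = importance_sampling_sum(
    docs, lambda d: 1.0, skewed, 20000, random.Random(0)
)
print(round(estimate, 2))
```

The estimate should land close to the true corpus size of 10, even though the samples are heavily skewed toward some documents.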
Characterizing Organizational Use of Web-Based Services: Methodology, Challenges, Observations, and Insights BIBAFull-Text 3
  Phillipa Gill; Martin Arlitt; Niklas Carlsson; Anirban Mahanti; Carey Williamson
Today's Web provides many different functionalities, including communication, entertainment, social networking, and information retrieval. In this article, we analyze traces of HTTP activity from a large enterprise and from a large university to identify and characterize Web-based service usage. Our work provides an initial methodology for the analysis of Web-based services. While it is nontrivial to identify the classes, instances, and providers for each transaction, our results show that most of the traffic comes from a small subset of providers, which can be classified manually. Furthermore, we assess both qualitatively and quantitatively how the Web has evolved over the past decade, and discuss the implications of these changes.
Camera Brand Congruence and Camera Model Propagation in the Flickr Social Graph BIBAFull-Text 4
  Adish Singla; Ingmar Weber
Given that my friends on Flickr use cameras of brand X, am I more likely to also use a camera of brand X? Given that one of these friends changes her brand, am I likely to do the same? Do new camera models pop up uniformly in the friendship graph? Or do early adopters then "convert" their friends? Which factors influence the conversion probability of a user? These are the kinds of questions addressed in this work. Direct applications involve personalized advertising in social networks.
   For our study, we crawled a complete connected component of the Flickr friendship graph with a total of 67M edges and 3.9M users. 1.2M of these users had at least one public photograph with valid model metadata, which allowed us to assign camera brands and models to users and time slots. Similarly, we used, where provided in a user's profile, information about a user's geographic location and the groups joined on Flickr.
   Concerning brand congruence, our main findings are the following. First, a pair of friends on Flickr has a higher probability of being congruent, that is, using the same brand, compared to two random users (27% vs. 19%). Second, the degree of congruence goes up for pairs of friends (i) in the same country (29%), (ii) who both only have very few friends (30%), and (iii) with a very high cliqueness (38%). Third, given that a user changes her camera model between March-May 2007 and March-May 2008, high cliqueness friends are more likely than random users to do the same (54% vs. 48%). Fourth, users using high-end cameras are far more loyal to their brand than users using point-and-shoot cameras, with a probability of staying with the same brand of 60% vs. 33%, given that a new camera is bought. Fifth, these "expert" users' brand congruence reaches 66% for high cliqueness friends. All these differences are statistically significant at 1%.
   As for the propagation of new models in the friendship graph, we observe the following. First, the growth of connected components of users converted to a particular, new camera model differs distinctly from random growth. Second, the decline of dissemination of a particular model is close to random decline. This illustrates that users influence their friends to change to a particular new model, rather than from a particular old model. Third, having many converted friends increases the probability of the user to convert herself. Here differences between friends from the same or from different countries are more pronounced for point-and-shoot than for digital single-lens reflex users. Fourth, there was again a distinct difference between arbitrary friends and high cliqueness friends in terms of prediction quality for conversion.
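The brand congruence comparison described in the abstract above (friend pairs versus random pairs) can be sketched in a few lines of Python. This is only a toy illustration of the measurement, not the authors' pipeline. The function names, the user-to-brand mapping, and the edge list are hypothetical.

```python
from collections import Counter


def friend_congruence(edges, brand):
    """Fraction of friendship edges whose two endpoints use the same brand."""
    same = sum(1 for u, v in edges if brand[u] == brand[v])
    return same / len(edges)


def random_pair_congruence(brand):
    """Probability that two distinct users drawn at random share a brand."""
    n = len(brand)
    counts = Counter(brand.values())
    # Ordered same-brand pairs divided by all ordered pairs of distinct users.
    same_pairs = sum(c * (c - 1) for c in counts.values())
    return same_pairs / (n * (n - 1))


# Hypothetical toy data: user -> camera brand, plus a friendship edge list.
brand = {"a": "X", "b": "X", "c": "Y", "d": "Y", "e": "Z"}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]

print(friend_congruence(edges, brand))  # 0.5
print(random_pair_congruence(brand))    # 0.2
```

In this toy graph friends are congruent more often than random pairs, which is the kind of gap the abstract reports (27% vs. 19%) on the real data.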
A Specialized Search Assistant for Learning Objects BIBAFull-Text 5
  Cecilia Curlango-Rosas; Gregorio A. Ponce; Gabriel A. Lopez-Morteo
The Web holds a great quantity of material that can be used to enhance classroom instruction. However, it is not easy to retrieve this material with the search engines currently available. This study produced a specialized search assistant based on Google that significantly increases the number of instances in which teachers find the desired learning objects as compared to using this popular public search engine directly. Success in finding learning objects by study participants went from 80% using Google alone to 96% when using our search assistant in one scenario and, in another scenario, from a 40% success rate with Google alone to 66% with our assistant. This specialized search assistant implements features such as bilingual search and term suggestion which were requested by teacher participants to help improve their searches. Study participants evaluated the specialized search assistant and found it significantly easier to use and more useful than the popular search engine for the purpose of finding learning objects.

TWEB 2012-05 Volume 6 Issue 2

Editorial BIBFull-Text 6
  Helen Ashman; Arun Iyengar; Marc Najork
Integrating trust management and access control in data-intensive Web applications BIBAFull-Text 7
  Sabrina De Capitani Di Vimercati; Sara Foresti; Sushil Jajodia; Stefano Paraboschi; Giuseppe Psaila; Pierangela Samarati
The widespread diffusion of Web-based services provided by public and private organizations emphasizes the need for a flexible solution for protecting the information accessible through Web applications. A promising approach is represented by credential-based access control and trust management. However, although much research has been done and several proposals exist, a clear obstacle to the realization of their benefits in data-intensive Web applications is represented by the lack of adequate support in the DBMSs. As a matter of fact, DBMSs are often responsible for the management of most of the information that is accessed using a Web browser or a Web service invocation.
   In this article, we aim at eliminating this gap, and present an approach integrating trust management with the access control of the DBMS. We propose a trust model with a SQL syntax and illustrate an algorithm for the efficient verification of a delegation path for certificates. Our solution nicely complements current trust management proposals allowing the efficient realization of the services of an advanced trust management model within current relational DBMSs. An important benefit of our approach lies in its potential for a robust end-to-end design of security for personal data in Web scenarios, where vulnerabilities of Web applications cannot be used to violate the protection of the data residing on the database server. We also illustrate the implementation of our approach within an open-source DBMS discussing design choices and performance impact.
A hybrid approach for efficient Web service composition with end-to-end QoS constraints BIBAFull-Text 8
  Mohammad Alrifai; Thomas Risse; Wolfgang Nejdl
Dynamic selection of Web services at runtime is important for building flexible and loosely-coupled service-oriented applications. An abstract description of the required services is provided at design-time, and matching service offers are located at runtime. With the growing number of Web services that provide the same functionality but differ in quality parameters (e.g., availability, response time), a decision needs to be made on which services should be selected such that the user's end-to-end QoS requirements are satisfied. Although very efficient, the local selection strategy falls short in handling global QoS requirements. Solutions based on global optimization, on the other hand, can handle global constraints, but their poor performance renders them inappropriate for applications with dynamic and realtime requirements. In this article we address this problem and propose a hybrid solution that combines global optimization with local selection techniques to benefit from the advantages of both worlds. The proposed solution consists of two steps: first, we use mixed integer programming (MIP) to find the optimal decomposition of global QoS constraints into local constraints. Second, we use distributed local selection to find the best Web services that satisfy these local constraints. The results of experimental evaluation indicate that our approach significantly outperforms existing solutions in terms of computation time while achieving close-to-optimal results.
Modellus: Automated modeling of complex internet data center applications BIBAFull-Text 9
  Peter Desnoyers; Timothy Wood; Prashant Shenoy; Rahul Singh; Sangameshwar Patil; Harrick Vin
The rising complexity of distributed server applications in Internet data centers has made the tasks of modeling and analyzing their behavior increasingly difficult. This article presents Modellus, a novel system for automated modeling of complex web-based data center applications using methods from queuing theory, data mining, and machine learning. Modellus uses queuing theory and statistical methods to automatically derive models to predict the resource usage of an application and the workload it triggers; these models can be composed to capture multiple dependencies between interacting applications.
   Model accuracy is maintained by fast, distributed testing, automated relearning of models when they change, and methods to bound prediction errors in composite models. We have implemented a prototype of Modellus, deployed it on a data center testbed, and evaluated its efficacy for modeling and analysis of several distributed multitier web applications. Our results show that this feature-based modeling technique is able to make predictions across several data center tiers, and maintain predictive accuracy (typically 95% or better) in the face of significant shifts in workload composition; we also demonstrate practical applications of the Modellus system to prediction and provisioning of real-world data center applications.
Friendship prediction and homophily in social media BIBAFull-Text 10
  Luca Maria Aiello; Alain Barrat; Rossano Schifanella; Ciro Cattuto; Benjamin Markines; Filippo Menczer
Social media have attracted considerable attention because their open-ended nature allows users to create lightweight semantic scaffolding to organize and share content. To date, the interplay of the social and topical components of social media has been only partially explored. Here, we study the presence of homophily in three systems that combine tagging social media with online social networks. We find a substantial level of topical similarity among users who are close to each other in the social network. We introduce a null model that preserves user activity while removing local correlations, allowing us to disentangle the actual local similarity between users from statistical effects due to the assortative mixing of user activity and centrality in the social network. This analysis suggests that users with similar interests are more likely to be friends, and therefore topical similarity measures among users based solely on their annotation metadata should be predictive of social links. We test this hypothesis on several datasets, confirming that social networks constructed from topical similarity capture actual friendship accurately. When combined with topological features, topical similarity achieves a link prediction accuracy of about 92%.
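To make the notion of topical similarity from annotation metadata concrete, here is a minimal Python sketch that scores user pairs by the cosine similarity of their tag counts. Cosine similarity is one common choice and is an assumption here; the abstract does not say which measures the authors use. The user names and tag lists are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tags_a, tags_b):
    """Cosine similarity between two users' tag-count vectors."""
    a, b = Counter(tags_a), Counter(tags_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical users and the tags they have applied to content.
user_tags = {
    "alice": ["jazz", "jazz", "vinyl"],
    "bob": ["jazz", "vinyl", "vinyl"],
    "carol": ["hiking"],
}

# Rank candidate pairs: a higher score suggests a more likely social link.
pairs = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]
ranked = sorted(
    pairs,
    key=lambda p: cosine_similarity(user_tags[p[0]], user_tags[p[1]]),
    reverse=True,
)
print(ranked[0])
```

Here the pair with overlapping tags ranks first, which is the intuition behind predicting links from topical similarity alone.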

TWEB 2012-09 Volume 6 Issue 3

A model-driven methodology to the content layout problem in web applications BIBAFull-Text 11
  Sara Comai; Davide Mazza
This article presents a model-driven approach for the design of the layout in a complex Web application, where large amounts of data are accessed. The aim of this work is to reduce, as much as possible, repetitive tasks and to factor out common aspects into different kinds of rules that can be reused across different applications. In particular, exploiting the conceptual elements of the typical models used for the design of a Web application, it defines presentation and layout rules at different levels of abstraction and granularity. A procedure for the automatic layout of the content of a page is proposed and evaluated, and the layout of advanced Web applications is discussed.
Extracting information networks from the blogosphere BIBAFull-Text 12
  Yuval Merhav; Filipe Mesquita; Denilson Barbosa; Wai Gen Yee; Ophir Frieder
We study the problem of automatically extracting information networks formed by recognizable entities as well as relations among them from social media sites. Our approach consists of using state-of-the-art natural language processing tools to identify entities and extract sentences that relate such entities, followed by using text-clustering algorithms to identify the relations within the information network. We propose a new term-weighting scheme that significantly improves on the state-of-the-art in the task of relation extraction, both when used in conjunction with the standard tf*idf scheme and also when used as a pruning filter. We describe an effective method for identifying benchmarks for open information extraction that relies on a curated online database that is comparable to the hand-crafted evaluation datasets in the literature. From this benchmark, we derive a much larger dataset which mimics realistic conditions for the task of open information extraction. We report on extensive experiments on both datasets, which not only shed light on the accuracy levels achieved by state-of-the-art open information extraction tools, but also on how to tune such tools for better results.
FoXtrot: Distributed structural and value XML filtering BIBAFull-Text 13
  Iris Miliaraki; Manolis Koubarakis
Publish/subscribe systems have emerged in recent years as a promising paradigm for offering various popular notification services. In this context, many XML filtering systems have been proposed to efficiently identify XML data that matches user interests expressed as queries in an XML query language like XPath. However, in order to offer XML filtering functionality at Internet scale, we need to deploy such a service in a distributed environment, avoiding bottlenecks that can deteriorate performance. In this work, we design and implement FoXtrot, a system for filtering XML data that combines the strengths of automata for efficient filtering and distributed hash tables for building a fully distributed system. Apart from structural-matching, performed using automata, we also discuss different methods for evaluating value-based predicates. We perform an extensive experimental evaluation of our system, FoXtrot, on a local cluster and on the PlanetLab network and demonstrate that it can index millions of user queries, achieving a high indexing and filtering throughput. At the same time, FoXtrot exhibits very good load-balancing properties and improves its performance as we increase the size of the network.
Navigating tomorrow's web: From searching and browsing to visual exploration BIBAFull-Text 14
  Marian Dörk; Carey Williamson; Sheelagh Carpendale
We propose a new way of navigating the Web using interactive information visualizations, and present encouraging results from a large-scale Web study of a visual exploration system. While the Web has become an immense, diverse information space, it has also evolved into a powerful software platform. We believe that the established interaction techniques of searching and browsing do not sufficiently utilize these advances, since information seekers have to transform their information needs into specific, text-based search queries resulting in mostly text-based lists of resources. In contrast, we foresee a new type of information seeking that is high-level and more engaging, by providing the information seeker with interactive visualizations that give graphical overviews and enable query formulation. Building on recent work on faceted navigation, information visualization, and exploratory search, we conceptualize this type of information navigation as visual exploration and evaluate a prototype Web-based system that implements it. We discuss the results of a large-scale, mixed-method Web study that provides a better understanding of the potential benefits of visual exploration on the Web, and its particular performance challenges.

TWEB 2012-11 Volume 6 Issue 4

Cache-Based Query Processing for Search Engines BIBAFull-Text 15
  B. Barla Cambazoglu; Ismail Sengor Altingovde; Rifat Ozcan; Özgür Ulusoy
In practice, a search engine may fail to serve a query due to various reasons such as hardware/network failures, excessive query load, lack of matching documents, or service contract limitations (e.g., the query rate limits for third-party users of a search service). In such scenarios, where the backend search system is unable to generate answers to queries, approximate answers can be generated by exploiting the previously computed query results available in the result cache of the search engine. In this work, we propose two alternative strategies to implement this cache-based query processing idea. The first strategy aggregates the results of similar queries that are previously cached in order to create synthetic results for new queries. The second strategy forms an inverted index over the textual information (i.e., query terms and result snippets) present in the result cache and uses this index to answer new queries. Both approaches achieve reasonable result qualities compared to processing queries with an inverted index built on the collection.
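The first strategy in the abstract above, aggregating the cached results of similar queries, might look roughly like the following Python sketch. The similarity measure (Jaccard overlap of query terms), the reciprocal-rank scoring, the threshold, and all names and cache contents are assumptions for illustration; the abstract does not specify these details.

```python
from collections import defaultdict


def jaccard(query_a, query_b):
    """Jaccard overlap between the term sets of two queries."""
    a, b = set(query_a.split()), set(query_b.split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def synthesize_results(query, cache, k=3, min_sim=0.3):
    """Build a synthetic top-k list for `query` from similar cached queries."""
    scores = defaultdict(float)
    for cached_query, results in cache.items():
        sim = jaccard(query, cached_query)
        if sim < min_sim:
            continue  # ignore cached queries that are too dissimilar
        for rank, doc in enumerate(results, start=1):
            # Documents ranked high for similar queries score higher.
            scores[doc] += sim / rank
    # Sort by descending score, breaking ties by document id.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:k]


# Hypothetical result cache: query -> ranked document ids.
cache = {
    "cheap flights paris": ["d1", "d2", "d3"],
    "flights paris": ["d2", "d4"],
    "python tutorial": ["d9"],
}
print(synthesize_results("cheap paris flights deals", cache))
# ['d2', 'd1', 'd3']
```

Document d2 ranks first because it appears in the cached results of both similar queries, while the unrelated cached query contributes nothing.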
A Methodology for SIP and SOAP Integration Using Application-Specific Protocol Conversion BIBAFull-Text 16
  Goran Delac; Ivan Budiselic; Ivan Zuzak; Ivan Skuliber; Tomislav Stefanec
In recent years, the ubiquitous demands for cross-protocol application access are driving the need for deeper integration between SIP and SOAP. In this article we present a novel methodology for integrating these two protocols. Through an analysis of properties of SIP and SOAP we show that integration between these protocols should be based on application-specific converters. We describe a generic SIP/SOAP gateway that implements message handling and network and storage management while relying on application-specific converters to define session management and message mapping for a specific set of SIP and SOAP communication nodes. In order to ease development of these converters, we introduce an XML-based domain-specific language for describing application-specific conversion processes. We show how conversion processes can be easily specified in the language using message sequence diagrams of the desired interaction. We evaluate the presented methodology through performance analysis of the developed prototype gateway and high-level comparison with other solutions.
Workload Characterization and Performance Implications of Large-Scale Blog Servers BIBAFull-Text 17
  Myeongjae Jeon; Youngjae Kim; Jeaho Hwang; Joonwon Lee; Euiseong Seo
With the ever-increasing popularity of Social Network Services (SNSs), an understanding of the characteristics of these services and their effects on the behavior of their host servers is critical. However, there has been a lack of research on the workload characterization of servers running SNS applications such as blog services. To fill this void, we empirically characterized real-world Web server logs collected from one of the largest South Korean blog hosting sites for 12 consecutive days. The logs consist of more than 96 million HTTP requests and 4.7TB of network traffic. Our analysis reveals the following: (i) The transfer size of nonmultimedia files and blog articles can be modeled using a truncated Pareto distribution and a log-normal distribution, respectively; (ii) user access for blog articles does not show temporal locality, but is strongly biased towards those posted with image or audio files. We additionally discuss the potential performance improvement through clustering of small files on a blog page into contiguous disk blocks, which benefits from the observed file access patterns. Trace-driven simulations show that, on average, the suggested approach achieves 60.6% better system throughput and reduces the processing time for file access by 30.8% compared to the best performance of the Ext4 filesystem.
Beyond Social Graphs: User Interactions in Online Social Networks and their Implications BIBAFull-Text 18
  Christo Wilson; Alessandra Sala; Krishna P. N. Puttaswamy; Ben Y. Zhao
Social networks are popular platforms for interaction, communication, and collaboration between friends. Researchers have recently proposed an emerging class of applications that leverage relationships from social networks to improve security and performance in applications such as email, Web browsing, and overlay routing. While these applications often cite social network connectivity statistics to support their designs, researchers in psychology and sociology have repeatedly cast doubt on the practice of inferring meaningful relationships from social network connections alone. This leads to the question: "Are social links valid indicators of real user interaction? If not, then how can we quantify these factors to form a more accurate model for evaluating socially enhanced applications?" In this article, we address this question through a detailed study of user interactions in the Facebook social network. We propose the use of "interaction graphs" to impart meaning to online social links by quantifying user interactions. We analyze interaction graphs derived from Facebook user traces and show that they exhibit significantly lower levels of the "small-world" properties present in their social graph counterparts. This means that these graphs have fewer "supernodes" with extremely high degree, and overall graph diameter increases significantly as a result. To quantify the impact of our observations, we use both types of graphs to validate several well-known social-based applications that rely on graph properties to infuse new functionality into Internet applications, including Reliable Email (RE), SybilGuard, and the weighted cascade influence maximization algorithm. The results reveal new insights into each of these systems, and confirm our hypothesis that to obtain realistic and accurate results, ongoing research on social network applications should use real indicators of user interactions in lieu of social graphs.
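One simple way to picture an interaction graph is as the subset of social links whose endpoints have actually interacted. The Python sketch below assumes a plain interaction-count threshold, which is an illustrative simplification and not necessarily the authors' definition. The function names, the threshold, and the toy data are hypothetical.

```python
def interaction_graph(social_edges, interactions, min_interactions=1):
    """Keep only social links with at least `min_interactions` interactions."""
    kept = set()
    for u, v in social_edges:
        key = frozenset((u, v))
        if interactions.get(key, 0) >= min_interactions:
            kept.add(key)
    return kept


def degrees(edges):
    """Node degree for an iterable of undirected two-node edges."""
    deg = {}
    for edge in edges:
        for node in edge:
            deg[node] = deg.get(node, 0) + 1
    return deg


# Hypothetical social graph: "hub" has many friends but talks to only one.
social_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("a", "b")]
interactions = {frozenset(("hub", "a")): 5, frozenset(("a", "b")): 2}

graph = interaction_graph(social_edges, interactions)
print(degrees(social_edges)["hub"])  # 4
print(degrees(graph)["hub"])         # 1
```

The high-degree node in the social graph loses most of its links in the interaction graph, mirroring the abstract's observation that interaction graphs have fewer supernodes.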
Exploiting External Collections for Query Expansion BIBAFull-Text 19
  Wouter Weerkamp; Krisztian Balog; Maarten de Rijke
A persisting challenge in the field of information retrieval is the vocabulary mismatch between a user's information need and the relevant documents. One way of addressing this issue is to apply query modeling: to add terms to the original query and reweight the terms. In social media, where documents usually contain creative and noisy language (e.g., spelling and grammatical errors), query modeling proves difficult. To address this, attempts to use external sources for query modeling have been made and seem to be successful. In this article we propose a general generative query expansion model that uses external document collections for term generation: the External Expansion Model (EEM). The main rationale behind our model is our hypothesis that each query requires its own mixture of external collections for expansion and that an expansion model should account for this. For some queries we expect, for example, a news collection to be most beneficial, while for other queries we could benefit more by selecting terms from a general encyclopedia. EEM allows for query-dependent weighting of the external collections.
   We put our model to the test on the task of blog post retrieval and we use four external collections in our experiments: (i) a news collection, (ii) a Web collection, (iii) Wikipedia, and (iv) a blog post collection. Experiments show that EEM outperforms query expansion on the individual collections, as well as the Mixture of Relevance Models that was previously proposed by Diaz and Metzler [2006]. Extensive analysis of the results shows that our naive approach to estimating query-dependent collection importance works reasonably well and that, when we use "oracle" settings, we see the full potential of our model. We also find that the query-dependent collection importance has more impact on retrieval performance than the independent collection importance (i.e., a collection prior).