
ACM Transactions on The Web 1

Editors: Helen Ashman; Arun Iyengar
Dates: 2007
Volume: 1
Publisher: ACM
Standard No: ISSN 1559-1131; EISSN 1559-114X
Papers: 14
  1. TWEB 2007-05 Volume 1 Issue 1
  2. TWEB 2007-08 Volume 1 Issue 2
  3. TWEB 2007-09 Volume 1 Issue 3

TWEB 2007-05 Volume 1 Issue 1

Introduction (Article 1)
  Helen Ashman; Arun Iyengar
Analytic modeling of multitier Internet applications (Article 2)
  Bhuvan Urgaonkar; Giovanni Pacifici; Prashant Shenoy; Mike Spreitzer; Asser Tantawi
Since many Internet applications employ a multitier architecture, in this article, we focus on the problem of analytically modeling the behavior of such applications. We present a model based on a network of queues where the queues represent different tiers of the application. Our model is sufficiently general to capture (i) the behavior of tiers with significantly different performance characteristics and (ii) application idiosyncrasies such as session-based workloads, tier replication, load imbalances across replicas, and caching at intermediate tiers. We validate our model using real multitier applications running on a Linux server cluster. Our experiments indicate that our model faithfully captures the performance of these applications for a number of workloads and configurations. Furthermore, our model successfully handles a comprehensive range of resource utilization -- from 0 to near saturation for the CPU -- for two separate tiers. For a variety of scenarios, including those with caching at one of the application tiers, the average response times predicted by our model were within the 95% confidence intervals of the observed average response times. Our experiments also demonstrate the utility of the model for dynamic capacity provisioning, performance prediction, bottleneck identification, and session policing. In one scenario, where the request arrival rate increased from less than 1500 to nearly 4200 requests/minute, a dynamic provisioning technique employing our model was able to maintain response time targets by increasing the capacity of two of the tiers by factors of 2 and 3.5, respectively.
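The queueing-network idea behind this model can be illustrated with a toy mean-value analysis (MVA) loop, a standard solution technique for closed networks of this kind. This is a generic sketch, not the authors' model: the three-tier service demands and think time are invented, and the paper's model layers sessions, replication, load imbalance, and caching on top of this basic recursion.

    # Toy mean-value analysis (MVA) for a closed network of queueing tiers.
    def mva(service_demands, think_time, num_sessions):
        """service_demands: mean seconds of service per request at each tier;
        think_time: mean user think time in seconds;
        num_sessions: concurrent sessions circulating through the tiers."""
        queue_len = [0.0] * len(service_demands)
        throughput, resp = 0.0, list(service_demands)
        for n in range(1, num_sessions + 1):
            # A request's response time at a tier grows with the queue it finds.
            resp = [d * (1 + q) for d, q in zip(service_demands, queue_len)]
            throughput = n / (think_time + sum(resp))
            queue_len = [throughput * r for r in resp]
        return throughput, resp

    # Hypothetical 3-tier application: web, application, and database tiers.
    demands = [0.002, 0.010, 0.005]
    for sessions in (10, 100, 400):
        x, resp = mva(demands, think_time=1.0, num_sessions=sessions)
        print(f"{sessions:4d} sessions: {x * 60:8.0f} req/min, "
              f"mean response {sum(resp) * 1000:6.1f} ms")

As load approaches the bottleneck tier's saturation, throughput flattens and response times climb, which is exactly the regime in which a provisioning controller would add capacity to a tier.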
The comparative effectiveness of sponsored and nonsponsored links for Web e-commerce queries (Article 3)
  Bernard J. Jansen
The predominant business model for Web search engines is sponsored search, which generates billions in yearly revenue. But are sponsored links providing online consumers with relevant choices for products and services? We address this and related issues by investigating the relevance of sponsored and nonsponsored links for e-commerce queries on the major search engines. The results show that average relevance ratings for sponsored and nonsponsored links are practically the same, although the relevance ratings for sponsored links are statistically higher. We used 108 e-commerce queries and 8,256 retrieved links for these queries from three major Web search engines: Yahoo!, Google, and MSN. In addition to relevance measures, we qualitatively analyzed the e-commerce queries, deriving five categorizations of underlying information needs. Product-specific queries are the most prevalent (48%). Title (62%) and summary (33%) are the primary basis for evaluating sponsored links, with URL a distant third (2%). To gauge the effectiveness of sponsored search campaigns, we analyzed the sponsored links from various viewpoints. It appears that links from organizations with large sponsored search campaigns are more relevant than the average sponsored link. We discuss the implications for Web search engines and sponsored search as a long-term business model and as a mechanism for finding relevant information for searchers.
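The headline finding, ratings practically the same yet statistically higher for sponsored links, is the kind of result a large-sample significance test surfaces. A minimal sketch with synthetic 1-to-5 ratings (not the study's judged links):

    # Comparing mean relevance ratings with a two-sample Welch t-test.
    from statistics import mean
    from scipy import stats

    sponsored    = [3, 4, 4, 5, 3, 4, 4, 3, 5, 4] * 80   # synthetic ratings
    nonsponsored = [3, 4, 3, 5, 3, 4, 4, 3, 4, 4] * 80

    t, p = stats.ttest_ind(sponsored, nonsponsored, equal_var=False)
    print(f"means: sponsored {mean(sponsored):.2f} vs "
          f"nonsponsored {mean(nonsponsored):.2f}")
    print(f"Welch t = {t:.2f}, p = {p:.2g}")
    # With thousands of judgments, a small mean difference can be
    # statistically significant yet practically negligible.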
Mobile information access: A study of emerging search behavior on the mobile Internet (Article 4)
  Karen Church; Barry Smyth; Paul Cotter; Keith Bradley
It is likely that mobile phones will soon come to rival more traditional devices as the primary platform for information access. Consequently, it is important to understand the emerging information access behavior of mobile Internet (MI) users, especially in relation to their use of mobile handsets for information browsing and query-based search. In this article, we describe the results of a recent analysis of the MI habits of more than 600,000 European MI users, with a particular emphasis on the emerging interest in mobile search. We consider a range of factors, including whether there are key differences between browsing and search behavior on the MI compared to the Web. We highlight how browsing continues to dominate mobile information access, but go on to show how search is becoming an increasingly popular information access alternative, especially in relation to certain types of mobile handsets and information needs. Moreover, we show that sessions involving search tend to be longer and more data-rich than those that do not involve search. We also look at the type of queries used during mobile search and the way that these queries tend to be modified during the course of a mobile search session. Finally, we examine the overlap among mobile search queries and the different topics mobile users are interested in.
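Log studies of this kind start by cutting request streams into sessions. A minimal sessionization sketch; the 30-minute inactivity gap and the (user, timestamp, action) log format are assumptions, not the paper's:

    # Group a user's timestamped requests into sessions separated by an
    # inactivity gap, then tag each session as search-involving or not.
    SESSION_GAP = 30 * 60  # seconds; an assumed timeout

    def sessionize(events):
        """events: list of (user, unix_time, action), sorted by user, time."""
        sessions, current, prev_user = [], [], None
        for user, ts, action in events:
            if current and (user != prev_user
                            or ts - current[-1][1] > SESSION_GAP):
                sessions.append(current)
                current = []
            current.append((user, ts, action))
            prev_user = user
        if current:
            sessions.append(current)
        return sessions

    log = [("u1", 0, "browse"), ("u1", 60, "search"),
           ("u1", 4000, "browse"), ("u2", 10, "browse")]
    for s in sessionize(log):
        kinds = {a for _, _, a in s}
        print(len(s), "events,",
              "with search" if "search" in kinds else "browse only")

Comparing session length and data volume across the two buckets is then a straightforward aggregation, which is how a finding like "search sessions are longer and more data-rich" is typically computed.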
The dynamics of viral marketing (Article 5)
  Jure Leskovec; Lada A. Adamic; Bernardo A. Huberman
We present an analysis of a person-to-person recommendation network, consisting of 4 million people who made 16 million recommendations on half a million products. We observe the propagation of recommendations and the cascade sizes, which we explain by a simple stochastic model. We analyze how user behavior varies within user communities defined by the recommendation network. Product purchases follow a "long tail" where a significant share of purchases belongs to rarely sold items. We establish how the recommendation network grows over time and how effective it is from the viewpoint of the sender and receiver of the recommendations. While on average recommendations are not very effective at inducing purchases and do not spread very far, we present a model that successfully identifies communities, product categories, and pricing categories for which viral marketing seems to be very effective.
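A feel for why recommendations "do not spread very far" comes from a subcritical branching process. A toy simulation with invented acceptance probability and fan-out; the paper fits a richer stochastic model to the observed cascades:

    # Toy recommendation cascade as a Galton-Watson-style branching process.
    import random

    def cascade_size(p_accept=0.05, fanout=10, max_nodes=10_000):
        """One buyer recommends to `fanout` friends; each friend buys
        (and recommends onward) with probability p_accept."""
        size, frontier = 1, 1
        while frontier and size < max_nodes:
            new = sum(1 for _ in range(frontier * fanout)
                      if random.random() < p_accept)
            size += new
            frontier = new
        return size

    random.seed(42)
    sizes = [cascade_size() for _ in range(10_000)]
    spread = sum(s > 1 for s in sizes)
    print(f"mean cascade size {sum(sizes) / len(sizes):.2f}; "
          f"{spread / len(sizes):.1%} of cascades spread beyond the seed")
    # With p_accept * fanout < 1 the process is subcritical: most cascades
    # die out quickly, matching the observation that recommendations
    # rarely spread far.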
Efficient algorithms for Web services selection with end-to-end QoS constraints (Article 6)
  Tao Yu; Yue Zhang; Kwei-Jay Lin
Service-Oriented Architecture (SOA) provides a flexible framework for service composition. Using standards-based protocols (such as SOAP and WSDL), composite services can be constructed by integrating atomic services developed independently. Algorithms are needed to select service components with various QoS levels according to some application-dependent performance requirements. We design a broker-based architecture to facilitate the selection of QoS-based services. The objective of service selection is to maximize an application-specific utility function under the end-to-end QoS constraints. The problem is modeled in two ways: the combinatorial model and the graph model. The combinatorial model defines the problem as a multidimensional multichoice 0-1 knapsack problem (MMKP). The graph model defines the problem as a multiconstraint optimal path (MCOP) problem. Efficient heuristic algorithms for service processes of different composition structures are presented in this article, and their performance is studied through simulations. We also compare the pros and cons of the two models.
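The selection problem is easy to state: pick one candidate per task so as to maximize total utility under an end-to-end constraint. A brute-force sketch with invented candidates makes the formulation concrete; the paper's contribution is heuristics that avoid exactly this exponential enumeration:

    # QoS-aware selection for a sequential composition: one candidate per
    # task, maximize utility subject to an end-to-end latency budget.
    from itertools import product

    # Hypothetical candidates per task: (utility, latency_ms).
    candidates = [
        [(5, 80), (8, 150), (3, 40)],    # task 1
        [(6, 100), (9, 220)],            # task 2
        [(4, 60), (7, 130), (10, 300)],  # task 3
    ]
    BUDGET_MS = 400

    best_utility, best_choice = -1, None
    for choice in product(*candidates):
        latency = sum(c[1] for c in choice)
        utility = sum(c[0] for c in choice)
        if latency <= BUDGET_MS and utility > best_utility:
            best_utility, best_choice = utility, choice

    print("best utility:", best_utility, "choice:", best_choice,
          "latency:", sum(c[1] for c in best_choice), "ms")

With m tasks and k candidates each, this enumeration costs k^m, which is why the MMKP and MCOP heuristics in the article matter for realistic compositions.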

TWEB 2007-08 Volume 1 Issue 2

Visualizing tags over time (Article 7)
  Micah Dubinko; Ravi Kumar; Joseph Magnani; Jasmine Novak; Prabhakar Raghavan; Andrew Tomkins
We consider the problem of visualizing the evolution of tags within the Flickr (flickr.com) online image sharing community. Any user of the Flickr service may append a tag to any photo in the system. Over the past year, users have on average added over a million tags each week. Understanding the evolution of these tags over time is therefore a challenging task. We present a new approach based on a characterization of the most interesting tags associated with a sliding interval of time. An animation provided via Flash in a Web browser allows the user to observe and interact with the interesting tags as they evolve over time.
   New algorithms and data structures are required to support the efficient generation of this visualization. We combine a novel solution to an interval covering problem with extensions to previous work on score aggregation in order to create an efficient backend system capable of producing visualizations at arbitrary scales on this large dataset in real time.
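One generic way to score "interesting" tags in a time window is to compare each tag's in-window rate with its overall rate. A sketch with invented data; the lift-style score here is a stand-in for the paper's interval-covering and score-aggregation machinery:

    # Score tags in a sliding window by how much their in-window frequency
    # exceeds their overall frequency (a lift-style score).
    from collections import Counter

    def interesting_tags(events, window, top_k=3):
        """events: list of (timestamp, tag); window: (start, end)."""
        overall = Counter(tag for _, tag in events)
        total = sum(overall.values())
        in_win = Counter(tag for ts, tag in events
                         if window[0] <= ts < window[1])
        win_total = sum(in_win.values()) or 1
        score = {t: (c / win_total) / (overall[t] / total)
                 for t, c in in_win.items()}
        return sorted(score, key=score.get, reverse=True)[:top_k]

    events = [(1, "sunset"), (2, "cat"), (3, "cat"), (10, "worldcup"),
              (11, "worldcup"), (12, "worldcup"), (13, "cat")]
    print(interesting_tags(events, window=(10, 14)))  # 'worldcup' spikes here

Sliding the window and recomputing naively is quadratic in the number of windows times tags, which motivates the incremental backend data structures the article develops.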
Scouts, promoters, and connectors: The roles of ratings in nearest-neighbor collaborative filtering (Article 8)
  Bharath Kumar Mohan; Benjamin J. Keller; Naren Ramakrishnan
Recommender systems aggregate individual user ratings into predictions of products or services that might interest visitors. The quality of this aggregation process crucially affects the user experience and hence the effectiveness of recommenders in e-commerce. We present a characterization of nearest-neighbor collaborative filtering that allows us to disaggregate global recommender performance measures into contributions made by each individual rating. In particular, we formulate three roles -- scouts, promoters, and connectors -- that capture how users receive recommendations, how items get recommended, and how ratings of these two types are themselves connected, respectively. These roles find direct uses in improving recommendations for users, in better targeting of items and, most importantly, in helping monitor the health of the system as a whole. For instance, they can be used to track the evolution of neighborhoods, to identify rating subspaces that do not contribute (or contribute negatively) to system performance, to enumerate users who are in danger of leaving, and to assess the susceptibility of the system to attacks such as shilling. We argue that the three rating roles presented here provide broad primitives to manage a recommender system and its community.
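The roles are defined over ordinary nearest-neighbor prediction, in which each rating either links a user to neighbors or supplies a neighbor's opinion. A minimal user-based sketch with invented ratings shows the computation the roles disaggregate; the similarity choice is illustrative:

    # User-based nearest-neighbor CF: predict a rating as a
    # similarity-weighted average of neighbors' ratings.
    from math import sqrt

    ratings = {  # user -> {item: rating}; invented data
        "alice": {"a": 5, "b": 3, "c": 4},
        "bob":   {"a": 4, "b": 3, "c": 5, "d": 4},
        "carol": {"a": 2, "b": 5, "d": 1},
    }

    def cosine(u, v):
        common = set(u) & set(v)
        if not common:
            return 0.0
        num = sum(u[i] * v[i] for i in common)
        return num / (sqrt(sum(x * x for x in u.values()))
                      * sqrt(sum(x * x for x in v.values())))

    def predict(user, item):
        sims = [(cosine(ratings[user], r), r[item])
                for other, r in ratings.items()
                if other != user and item in r]
        norm = sum(abs(s) for s, _ in sims)
        return sum(s * x for s, x in sims) / norm if norm else None

    print(f"alice's predicted rating for 'd': {predict('alice', 'd'):.2f}")
    # In the paper's terms, alice's own ratings that connected her to a
    # useful neighbor act as scouts; a neighbor's rating of 'd' that
    # feeds the prediction acts as a promoter.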
The effects of proxy bidding and minimum bid increments within eBay auctions (Article 9)
  Alex Rogers; Esther David; Nicholas R. Jennings; Jeremy Schiff
We present a mathematical model of the eBay auction protocol and perform a detailed analysis of the effects that the eBay proxy bidding system and the minimum bid increment have on the auction properties. We first consider the revenue of the auction, and we show analytically that when two bidders with independent private valuations use the eBay proxy bidding system, there exists an optimal value for the minimum bid increment at which the auctioneer's revenue is maximized. We then consider the sequential way in which bids are placed within the auction, and we show analytically that, independent of assumptions regarding the bidders' valuation distribution or bidding strategy, the number of visible bids placed is related to the logarithm of the number of potential bidders. Thus, in many cases, only a minority of the potential bidders are able to submit bids and be visible in the auction bid history (despite the fact that the other hidden bidders are still effectively competing for the item). Furthermore, we show through simulation that the minimum bid increment also introduces an inefficiency to the auction, whereby a bidder who enters the auction late may find that its valuation is insufficient to allow it to advance the current bid by the minimum bid increment, despite actually having the highest valuation for the item. Finally, we use these results to consider appropriate strategies for bidders in real-world eBay auctions. We show that while last-minute bidding (sniping) is an effective strategy against bidders engaging in incremental bidding (and against those with common values), in general, delaying bidding is disadvantageous even if delayed bids are sure to be received before the auction closes. Thus, when several bidders submit last-minute bids, we show that rather than seeking to bid as late as possible, a bidder should try to be the first sniper to bid (i.e., it should "snipe before the snipers").
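The interior-optimal increment can be reproduced in a toy sequential proxy-bidding simulation: small increments extract little beyond the runner-up's limit, while large ones lock the second bidder out. The uniform valuations, arrival order, and zero start price below are invented simplifications; the paper's analysis is exact:

    # Toy simulation of sequential proxy bidding with a minimum increment.
    import random

    def final_price(v_first, v_second, inc, start=0.0):
        price = start  # the first arrival leads at the start price
        if v_second >= price + inc:  # the challenger can place a valid bid
            lo, hi = sorted((v_first, v_second))
            price = min(lo + inc, hi)  # the proxy war stops here
        return price

    random.seed(7)
    TRIALS = 200_000
    for inc in (0.25, 1.0, 2.5, 5.0, 8.0):
        total = sum(final_price(random.uniform(0, 10),
                                random.uniform(0, 10), inc)
                    for _ in range(TRIALS))
        print(f"increment {inc:4.2f}: mean revenue {total / TRIALS:.3f}")
    # Revenue peaks at an intermediate increment: tiny increments add
    # little above the runner-up's limit, while large increments exclude
    # late bidders entirely.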
Decoding the structure of the WWW: A comparative analysis of Web crawls (Article 10)
  M. Ángeles Serrano; Ana Maguitman; Marián Boguñá; Santo Fortunato; Alessandro Vespignani
The understanding of the immense and intricate topological structure of the World Wide Web (WWW) is a major scientific and technological challenge. This has recently been tackled by characterizing the properties of its representative graphs, in which vertices and directed edges are identified with Web pages and hyperlinks, respectively. Data gathered in large-scale crawls have been analyzed by several groups, resulting in a general picture of the WWW that encompasses many of the complex properties typical of rapidly evolving networks. In this article, we report a detailed statistical analysis of the topological properties of four different WWW graphs obtained with different crawlers. We find that, despite the very large size of the samples, the statistical measures characterizing these graphs differ quantitatively, and in some cases qualitatively, depending on the domain analyzed and the crawl used for gathering the data. This raises the issue of sampling biases and structural differences among Web crawls that might induce properties not representative of the actual global underlying graph. In short, the stability of the widely accepted statistical description of the Web is called into question. In order to provide a more accurate characterization of the Web graph, we study statistical measures beyond the degree distribution, such as degree-degree correlation functions and the statistics of reciprocal connections. The latter appears to capture the relevant correlations of the WWW graph and to carry most of the topological information of the Web. The analysis of this quantity is also of major interest in relation to the navigability and searchability of the Web.
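Reciprocity, one of the beyond-degree statistics the article emphasizes, is simple to compute on an edge list. A toy sketch with an invented six-edge graph; real crawls need streaming or external-memory variants:

    # Reciprocity: the fraction of directed links that are returned.
    edges = {("a", "b"), ("b", "a"), ("a", "c"),
             ("c", "d"), ("d", "c"), ("d", "a")}

    reciprocated = sum(1 for (u, v) in edges if (v, u) in edges)
    print(f"reciprocity: {reciprocated / len(edges):.2f}")  # 4 of 6 returned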

TWEB 2007-09 Volume 1 Issue 3

BrowserShield: Vulnerability-driven filtering of dynamic HTML (Article 11)
  Charles Reis; John Dunagan; Helen J. Wang; Opher Dubrovsky; Saher Esmeir
Vulnerability-driven filtering of network data can offer a fast and easy-to-deploy alternative or intermediary to software patching, as exemplified in Shield [Wang et al. 2004]. In this article, we take Shield's vision to a new domain, inspecting and cleansing not just static content, but also dynamic content. The dynamic content we target is the dynamic HTML in Web pages, which has become a popular vector for attacks. The key challenge in filtering dynamic HTML is that it is undecidable to statically determine whether an embedded script will exploit the browser at runtime. We avoid this undecidability problem by rewriting Web pages and any embedded scripts into safe equivalents, inserting checks so that the filtering is done at runtime. The rewritten pages contain logic for recursively applying runtime checks to dynamically generated or modified Web content, based on known vulnerabilities. We have built and evaluated BrowserShield, a general framework that performs this dynamic instrumentation of embedded scripts and that admits policies for customized runtime actions like vulnerability-driven filtering. We also explore other applications on top of BrowserShield.
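The rewriting idea, interposing a runtime policy check on calls that might trigger a known vulnerability, can be conveyed with a drastically simplified toy. BrowserShield itself instruments JavaScript and HTML with a full interposition layer; the render_html sink and the size-based policy below are invented for illustration:

    # Toy vulnerability-driven rewriting: route a risky call through a
    # runtime policy check before it executes.
    import re

    def policy_check(func_name, arg):
        # Hypothetical vulnerability signature: block oversized inputs
        # to a known-vulnerable sink.
        if func_name == "render_html" and len(arg) > 64:
            raise ValueError(f"blocked: oversized input to {func_name}")
        return arg

    def rewrite(script):
        # Rewrite every render_html(x) call to check its argument first.
        return re.sub(r"render_html\((.+?)\)",
                      r"render_html(policy_check('render_html', \1))",
                      script)

    def render_html(s):  # stand-in for a vulnerable browser API
        print("rendering:", s[:40])

    safe = rewrite("render_html(payload)")
    print(safe)
    payload = "<b>hi</b>"
    exec(safe)              # small input passes the runtime check
    payload = "A" * 1000
    try:
        exec(safe)          # oversized input is filtered at runtime
    except ValueError as e:
        print(e)

The point of doing the check at runtime rather than statically is visible here: the same rewritten script is safe or blocked depending on data that only exists when it runs.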
Model-directed Web transactions under constrained modalities (Article 12)
  Zan Sun; Jalal Mahmud; I. V. Ramakrishnan; Saikat Mukherjee
Online transactions (e.g., buying a book on the Web) typically involve a number of steps spanning several pages. Conducting such transactions under constrained interaction modalities, as exemplified by small-screen handhelds or interactive speech interfaces -- the primary mode of communication for visually impaired individuals -- is a strenuous, fatigue-inducing activity. But usually one needs to browse only a small fragment of a Web page to perform a transactional step such as a form fillout, selecting an item from a search results list, and so on. We exploit this observation to develop an automata-based process model that delivers only the "relevant" page fragments at each transactional step, thereby reducing information overload on such narrow interaction bandwidths. We realize this model by coupling techniques from content analysis of Web documents, automata learning, and statistical classification. The process model and associated techniques have been incorporated into Guide-O, a prototype system that facilitates online transactions using a speech/keyboard interface (Guide-O-Speech) or limited-display handhelds (Guide-O-Mobile). The performance of Guide-O and its user experience are reported.
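The process model can be pictured as a small automaton in which each state names the one page fragment worth presenting at that step. The states, labels, and actions below are invented stand-ins for what Guide-O learns and classifies automatically:

    # A transaction process model as a finite automaton: each state maps
    # to the single relevant page fragment and the allowed next steps.
    PROCESS = {  # state -> (relevant fragment, {action: next state})
        "search":   ("search form",        {"submit": "results"}),
        "results":  ("result list",        {"select": "item",
                                            "refine": "search"}),
        "item":     ("add-to-cart button", {"add": "checkout"}),
        "checkout": ("payment form",       {}),
    }

    state = "search"
    for action in ("submit", "select", "add"):
        fragment, transitions = PROCESS[state]
        print(f"[{state}] present only: {fragment}")
        state = transitions[action]
    print(f"[{state}] present only: {PROCESS[state][0]}")

Delivering only the fragment named by the current state is what shrinks a multi-page transaction down to something tractable over speech or a small screen.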
Cache architecture for on-demand streaming on the Web (Article 13)
  Raj Sharman; Shiva Shankar Ramanna; Ram Ramesh; Ram Gopal
On-demand streaming from a remote server through the best-effort Internet poses several challenges because of network losses and variable delays. The primary technique used to improve the quality of distributed content service is replication; in the context of the Internet, Web caching is the traditional mechanism. In this article, we develop a new staged delivery model for a distributed architecture in which video is streamed from remote servers to edge caches, where it is buffered and then streamed to the client over a last-mile connection. The model uses a novel revolving indexed cache buffer management mechanism at the edge cache and employs selective retransmission of lost packets between the remote server and the edge cache for best-effort recovery of losses. The new Web cache buffer management scheme includes dynamic adjustment of cache buffer parameters based on network conditions. In addition, the performance of buffer management and retransmission policies at the edge cache is modeled and assessed using a probabilistic analysis of the streaming process as well as system simulations. The influence of different endogenous control parameters on the quality of the stream received by the client is studied. Calibration curves for the QoS metrics under different network conditions are obtained through simulation; ISPs can use these curves to manage edge caches and to set the endogenous control parameters for specific QoS targets in real-time streaming operations. A methodology to benchmark transmission characteristics using real-time traffic data is developed to enable effective decision making on edge cache buffer allocation and management strategies.
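The benefit of buffered selective retransmission is easy to see in a toy loss model. The 5% network loss rate and the 90% chance that a retransmitted packet returns before its buffer slot revolves to playout are invented parameters, unlike the paper's calibrated probabilistic analysis:

    # Toy model: an edge-cache buffer gives each lost packet one
    # retransmission chance before the client plays its slot.
    import random

    def simulate(n_packets=100_000, loss=0.05, p_in_time=0.9):
        missing = 0
        for _ in range(n_packets):
            arrived = random.random() >= loss
            if not arrived:
                # Selective retransmission succeeds only if the copy both
                # survives the network and returns before playout.
                arrived = random.random() < p_in_time and \
                          random.random() >= loss
            missing += not arrived
        return missing / n_packets

    random.seed(3)
    print(f"residual loss seen by client: {simulate():.4%}")
    # Without retransmission the client would see the raw 5% network
    # loss; one buffered retransmission round cuts the residual loss to
    # roughly loss * (1 - p_in_time * (1 - loss)), under 1% here.

Sweeping loss and p_in_time in such a model is the toy analogue of the calibration curves the article derives for setting edge-cache parameters.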
Modeling process-driven and service-oriented architectures using patterns and pattern primitives (Article 14)
  Uwe Zdun; Carsten Hentrich; Schahram Dustdar
Service-oriented architectures are increasingly used in the context of business processes. However, the proven practices for process-oriented integration of services are not yet well documented. In addition, modeling approaches for the integration of processes and services are not yet mature, nor do they exactly reflect the proven practices. In this article, we propose a pattern language for process-oriented integration of services to describe these proven practices. Our main contribution is a modeling concept based on pattern primitives for these patterns. A pattern primitive is a fundamental, precisely specified modeling element that represents a pattern. We present a catalog of pattern primitives that are precisely modeled using OCL constraints, and we map these primitives to the patterns in the pattern language for process-oriented integration of services. We also present a model validation tool that we have developed to support modeling the process-oriented integration of services, and an industrial case study in which we have applied our results.
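The validation step boils down to checking constraints over model elements. A toy sketch of one such check; the real tool evaluates OCL constraints on UML models, and the model and constraint below are invented:

    # Toy model validation: every activity that invokes a service must
    # name a service declared in the model (a stand-in for an OCL rule).
    model = {
        "services": {"OrderService", "BillingService"},
        "activities": [
            {"name": "placeOrder", "invokes": "OrderService"},
            {"name": "charge", "invokes": "PaymentService"},  # undeclared
        ],
    }

    violations = [a["name"] for a in model["activities"]
                  if a["invokes"] not in model["services"]]
    print("constraint violations:", violations or "none")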