Accounting for Taste: Ranking Curators and Content in Social Networks
Curation and Algorithms
/
Yu, Haizi
/
Deka, Biplab
/
Talton, Jerry O.
/
Kumar, Ranjitha
Proceedings of the ACM CHI'16 Conference on Human Factors in Computing
Systems
2016-05-07
v.1
p.2383-2389
© Copyright 2016 ACM
Summary: Ranking users in social networks is a well-studied problem, typically solved
by algorithms that leverage network structure to identify influential users and
recommend people to follow. In the last decade, however, curation -- users
sharing and promoting content in a network -- has become a central social
activity, as platforms like Facebook, Twitter, Pinterest, and GitHub drive
growth and engagement by connecting users through content and content to users.
While existing algorithms reward users that are highly active with higher
rankings, they fail to account for users' curatorial taste. This paper
introduces CuRank, an algorithm for ranking users and content in social
networks by explicitly modeling three characteristics of a good curator:
discerning taste, high activity, and timeliness. We evaluate CuRank on datasets
from two popular social networks -- GitHub and Vine -- and demonstrate its
efficacy at ranking content and identifying good curators.
What Makes a Brand Look Expensive?
Late-Breaking Works: Usable, Useful, and Desirable
/
Zhang, Jingxian
/
Kothari, Neel
/
Butt, Asad Imtiaz
/
Kumar, Ranjitha
Extended Abstracts of the ACM CHI'16 Conference on Human Factors in
Computing Systems
2016-05-07
v.2
p.3263-3268
© Copyright 2016 ACM
Summary: Branding is a powerful tool that companies use to control the perception of
their products' quality and price. A company's website is a digital vehicle for
conveying this brand information. The look and feel of a website often
influence a customer's impression of a brand's price category. To understand
what makes a brand look expensive, we evaluate the website designs of two
industries -- watches and cars. We ran a crowdsourced study to collect ratings
of perceived cost based on web page screenshots. By training a random forest
regression model over these ratings, we learned which visual features of
website design are predictive of perceived cost.
Easy Navigation through Instructional Videos using Automatically Generated
Table of Content
Demos
/
Gandhi, Ankit
/
Biswas, Arijit
/
Shrivastava, Kundan
/
Kumar, Ranjeet
/
Loomba, Sahil
/
Deshmukh, Om
Companion Proceedings of the 2016 International Conference on Intelligent
User Interfaces
2016-03-07
v.2
p.92-96
© Copyright 2016 ACM
Summary: The volume of instructional videos available online, already in the tens of
thousands of hours, is growing steadily. A major bottleneck in their widespread
usage is the lack of tools for easy consumption of these videos. In this
demonstration, we present MMToC: Multimodal Method for Table of Content, a
technique that automatically generates a table of content for a given
instructional video and enables text-book-like efficient navigation through the
video. MMToC quantifies word saliency for visual words extracted from the
slides and spoken words obtained from the lecture transcript. These saliency
scores are combined using a dynamic programming based segmentation algorithm to
identify likely points in the video where the topic has changed. MMToC is a
web-based modular solution that can be used as a standalone video navigation
solution or can be integrated with any e-platform for multimedia content
management. MMToC can be seen in action on a sample video at
104.130.241.45:8080/TopicTransitionV2/index.html.
Implementing the SimpleC Companion: Lessons Learned from In-Home
Intervention Studies
Home and Work Support
/
Kerssens, Chantal
/
Kumar, Renu
/
Adams, Anne Edith
/
Knott, Camilla C.
/
Rogers, Wendy A.
ITAP 2015: First International Conference on Human Aspects of IT for the
Aged Population, Part II: Design for Everyday Life
2015-08-02
v.2
p.278-289
Keywords: Assistive technology; Caregivers; Dementia; Seniors; Disease management;
Caregiver burden; Recruitment; Retention; Applied research; Field test;
mHealth; Healthcare technology
© Copyright 2015 Springer International Publishing Switzerland
Summary: This paper provides insights from our experiences that would guide the
implementation of home- and community-based intervention studies, in particular
field tests of technology in older adults with varying degrees of cognitive
impairment and their informal (family) caregivers. Critical issues include
recruitment in a vulnerable and frail population, intervention and protocol
design, environmental and technology-specific barriers to implementation, and
facilitators of success. Our experiences and recommendations should be relevant
to a broad range of longitudinal field tests, particularly those with older
adult populations.
Technology Transfer of HCI Research Innovations: Challenges and
Opportunities
Panels
/
Chilana, Parmit K.
/
Czerwinski, Mary P.
/
Grossman, Tovi
/
Harrison, Chris
/
Kumar, Ranjitha
/
Parikh, Tapan S.
/
Zhai, Shumin
Extended Abstracts of the ACM CHI'15 Conference on Human Factors in
Computing Systems
2015-04-18
v.2
p.823-828
© Copyright 2015 ACM
Summary: There has been a longstanding concern within HCI that even though we are
accumulating great innovations in the field, we rarely see these innovations
develop into products. Our panel brings together HCI researchers from academia
and industry who have been directly involved in technology transfer of one or
more HCI innovations. They will share their experiences around what it takes to
transition an HCI innovation from the lab to the market, including issues
around time commitment, funding, resources, and business expertise. More
importantly, our panelists will discuss and debate the tensions that we
(researchers) face in choosing design and evaluation methods that help us make
an HCI research contribution versus what actually matters when we go to market.
Ranking Designs and Users in Online Social Networks
WIP Theme: Social Computing
/
Deka, Biplab
/
Yu, Haizi
/
Ho, Devin
/
Huang, Zifeng
/
Talton, Jerry O.
/
Kumar, Ranjitha
Extended Abstracts of the ACM CHI'15 Conference on Human Factors in
Computing Systems
2015-04-18
v.2
p.1887-1892
© Copyright 2015 ACM
Summary: This work-in-progress presents a new algorithm that leverages social network
structure to rank designs and users in online design communities. The algorithm
is based on the intuition that the importance of a design should depend on the
rank of the users that created and promoted it, and the importance of a user
should depend on the rank of the designs he creates and promotes in turn. The
algorithm produces design rankings that are positively correlated with existing
social metrics such as number of likes, but also allows designs with
second-order social import to rise through the ranks. We demonstrate that the
algorithm converges, and analyze the rankings it produces on both simulated and
scraped social design networks.
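
The mutual-reinforcement intuition in this summary can be illustrated with a
short sketch. This is not the authors' algorithm, whose update rule the summary
does not give; it is a generic HITS-style iteration written under that
assumption. The function name, the (user, design) input format, and the
per-round normalization are all hypothetical choices made for illustration.

```python
def rank_designs_and_users(promotions, iterations=50):
    """Illustrative mutual-reinforcement ranking (assumed, HITS-style).

    promotions: non-empty list of (user, design) pairs, one per
    create/promote action. Returns (user_score, design_score) dicts,
    each normalized to sum to 1.
    """
    users = sorted({u for u, _ in promotions})
    designs = sorted({d for _, d in promotions})
    user_score = {u: 1.0 for u in users}
    design_score = {d: 1.0 for d in designs}

    for _ in range(iterations):
        # A design's importance is the sum of the scores of its promoters.
        new_design = {d: 0.0 for d in designs}
        for u, d in promotions:
            new_design[d] += user_score[u]
        total = sum(new_design.values())
        design_score = {d: s / total for d, s in new_design.items()}

        # A user's importance is the sum of the scores of the designs
        # that user created or promoted.
        new_user = {u: 0.0 for u in users}
        for u, d in promotions:
            new_user[u] += design_score[d]
        total = sum(new_user.values())
        user_score = {u: s / total for u, s in new_user.items()}

    return user_score, design_score


user_score, design_score = rank_designs_and_users(
    [("a", "x"), ("b", "x"), ("b", "y")]
)
```

In this toy input, design "x" has two promoters and "y" has one, so "x" ranks
higher; user "b" promotes both designs and so ranks above user "a".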
Content-driven Multi-modal Techniques for Non-linear Video Navigation
Visualization / Video / Augmented Reality
/
Yadav, Kuldeep
/
Shrivastava, Kundan
/
Prasad, S. Mohana
/
Arsikere, Harish
/
Patil, Sonal
/
Kumar, Ranjeet
/
Deshmukh, Om
Proceedings of the 2015 International Conference on Intelligent User
Interfaces
2015-03-29
v.1
p.333-344
© Copyright 2015 ACM
Summary: The growth of Massive Open Online Courses (MOOCs) has been remarkable in the
last few years. A significant amount of MOOC content is in the form of videos
and participants often use non-linear navigation to browse through a video.
This paper proposes the design of a system that provides non-linear navigation
in educational videos using features derived from a combination of audio and
visual content of a video. It provides multiple dimensions for quickly
navigating to a given point of interest in a video, i.e., a customized dynamic
time-aware word-cloud, video pages, and a 2-D timeline. In the word-cloud, the
relative placement of the words indicates their temporal ordering in the video
whereas color codes are used to represent acoustic stress. The 2-D timeline is
used to present multiple occurrences of a keyword/concept in the video in
response to a user click in the word-cloud. Additionally, visual content is
analyzed to identify frames with "maximum written content", known as video
pages. We conducted a user study with 20 users to evaluate the proposed system
and compared it with transcription-based interfaces used by major MOOC
providers. Our findings suggest that the proposed system leads to statistically
significant navigation time savings, especially on multimodal navigation tasks.
The dynamics of repeat consumption
User behavior
/
Anderson, Ashton
/
Kumar, Ravi
/
Tomkins, Andrew
/
Vassilvitskii, Sergei
Proceedings of the 2014 International Conference on the World Wide Web
2014-04-07
v.1
p.419-430
© Copyright 2014 ACM
Summary: We study the patterns by which a user consumes the same item repeatedly over
time, in a wide variety of domains ranging from check-ins at the same business
location to re-watches of the same video. We find that recency of consumption
is the strongest predictor of repeat consumption. Based on this, we develop a
model by which the item from t timesteps ago is reconsumed with a probability
proportional to a function of t. We study theoretical properties of this model,
develop algorithms to learn reconsumption likelihood as a function of t, and
show a strong fit of the resulting inferred function via a power law with
exponential cutoff. We then introduce a notion of item quality, show that it
alone underperforms our recency-based model, and develop a hybrid model that
predicts user choice based on a combination of recency and quality. We show how
the parameters of this model may be jointly estimated, and show that the
resulting scheme outperforms other alternatives.
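
The recency-based model described in this summary can be sketched as follows.
This is a minimal illustration, assuming the weight function takes the common
"power law with exponential cutoff" form t^(-alpha) * exp(-t / cutoff). The
summary does not give the exact parameterization, so the function names and
the values of alpha and cutoff are hypothetical placeholders, not fitted
values from the paper.

```python
import math


def recency_weight(t, alpha=1.0, cutoff=50.0):
    """Assumed weight for an item consumed t >= 1 timesteps ago:
    a power law t^(-alpha) multiplied by an exponential cutoff."""
    return t ** (-alpha) * math.exp(-t / cutoff)


def reconsumption_probabilities(history, alpha=1.0, cutoff=50.0):
    """history: list of consumed items, oldest first.

    Each past position t timesteps ago contributes recency_weight(t) to
    its item; the weights are then normalized into probabilities.
    """
    n = len(history)
    weights = {}
    for position, item in enumerate(history):
        t = n - position  # the most recent item has t = 1
        weights[item] = weights.get(item, 0.0) + recency_weight(t, alpha, cutoff)
    total = sum(weights.values())
    return {item: w / total for item, w in weights.items()}


probs = reconsumption_probabilities(["cafe", "gym", "cafe", "park"])
```

With these placeholder parameters the most recently consumed item ("park")
receives the highest probability, and "cafe", which appears twice, ranks
above "gym".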
On estimating the average degree
Web mining 3
/
Dasgupta, Anirban
/
Kumar, Ravi
/
Sarlos, Tamas
Proceedings of the 2014 International Conference on the World Wide Web
2014-04-07
v.1
p.795-806
© Copyright 2014 ACM
Summary: Networks are characterized by nodes and edges. While there has been a spate
of recent work on estimating the number of nodes in a network, the
edge-estimation question appears to be largely unaddressed. In this work we
consider the problem of estimating the average degree of a large network using
efficient random sampling, where the number of nodes is not known to the
algorithm. We propose a new estimator for this problem that relies on access to
node samples under a prescribed distribution. Next, we show how to efficiently
realize this ideal estimator in a random walk setting. Our estimator has a
natural and simple implementation using random walks; we bound its performance
in terms of the mixing time of the underlying graph. We then show that our
estimators are both provably and practically better than many natural
estimators for the problem. Our work contrasts with existing theoretical work
on estimating average degree, which assumes that a uniform random sample of
nodes is available and the number of nodes is known.
Triggering effective social support for online groups
/
Kumar, Rohit
/
Rosé, Carolyn P.
ACM Transactions on Interactive Intelligent Systems
2014-01
v.3
n.4
p.24
© Copyright 2014 ACM
Summary: Conversational agent technology is an emerging paradigm for creating a
social environment in online groups that is conducive to effective teamwork.
Prior work has demonstrated advantages in terms of learning gains and
satisfaction scores when groups learning together online have been supported by
conversational agents that employ Balesian social strategies. This prior work
raises two important questions that are addressed in this article. The first
question is one of generality. Specifically, are the positive effects of the
designed support specific to learning contexts? Or are they in evidence in
other collaborative task domains as well? We present a study conducted within a
collaborative decision-making task where we see that the positive effects of
the Balesian social strategies extend to this new context. The second question
is whether it is possible to increase the effectiveness of the Balesian social
strategies by increasing the context sensitivity with which the social
strategies are triggered. To this end, we present technical work that increases
the sensitivity of the triggering. Next, we present a user study that
demonstrates an improvement in performance of the support agent with the new,
more sensitive triggering policy over the baseline approach from prior work.
The technical contribution of this article is that we extend prior work
where such support agents were modeled using a composition of conversational
behaviors integrated within an event-driven framework. Within the present
approach, conversation is orchestrated through context-sensitive triggering of
the composed behaviors. The core effort involved in applying this approach
involves building a set of triggering policies that achieve this orchestration
in a time-sensitive and coherent manner. In line with recent developments in
data-driven approaches for building dialog systems, we present a novel
technique for learning behavior-specific triggering policies, deploying it as
part of our efforts to improve a socially capable conversational tutor agent
that supports collaborative learning.
A Hierarchical Behavior Analysis Approach for Automated Trainee Performance
Evaluation in Training Ranges
Augmented Cognition in Training and Education
/
Khan, Saad
/
Cheng, Hui
/
Kumar, Rakesh
FAC 2013: 7th International Conference on Foundations of Augmented Cognition
2013-07-21
p.60-69
© Copyright 2013 Springer-Verlag
Summary: In this paper we present a closed loop mixed reality training system that
provides automatic assessment of trainee performance during kinetic military
exercises. At the core of our system is a hierarchical behavior analysis
approach that integrates a number of data sensor modalities including
Audio/Video, RFID and IMUs to automatically capture trainee actions in a
comprehensive manner. Our behavior analysis and performance evaluation
framework uses a finite state machine (FSM) model in which trainee behaviors
are the states of the training scenario and the transitions of states are
caused by stimuli that we refer to as trigger events. The goal of behavior
analysis is to estimate the states of the trainees with respect to the training
scenario and quantify trainee performance. To robustly detect each state, we
build classifiers for each behavioral state and trigger event. At a given time,
based on the state estimation, a set of related classifiers are activated for
detecting trigger events and states that can be transitioned to and from the
current states. The overall structure of the FSM and trigger events is
determined by a Training Ontology that is specific to the training scenario.
Aggregating crowdsourced binary ratings
Research papers
/
Dalvi, Nilesh
/
Dasgupta, Anirban
/
Kumar, Ravi
/
Rastogi, Vibhor
Proceedings of the 2013 International Conference on the World Wide Web
2013-05-13
v.1
p.285-294
© Copyright 2013 ACM
Summary: In this paper we analyze a crowdsourcing system consisting of a set of users
and a set of binary choice questions. Each user has an unknown, fixed
reliability that determines the user's error rate in answering questions. The
problem is to determine the truth values of the questions solely based on the
user answers. Although this problem has been studied extensively, theoretical
error bounds have been shown only for restricted settings: when the graph
between users and questions is either random or complete. In this paper we
consider a general setting of the problem where the user-question graph can
be arbitrary. We obtain bounds on the error rate of our algorithm and show it
is governed by the expansion of the graph. We demonstrate, using several
synthetic and real datasets, that our algorithm outperforms the state of the
art.
Webzeitgeist: design mining the web
Papers: design for developers
/
Kumar, Ranjitha
/
Satyanarayan, Arvind
/
Torres, Cesar
/
Lim, Maxine
/
Ahmad, Salman
/
Klemmer, Scott R.
/
Talton, Jerry O.
Proceedings of ACM CHI 2013 Conference on Human Factors in Computing Systems
2013-04-27
v.1
p.3083-3092
© Copyright 2013 ACM
Summary: Advances in data mining and knowledge discovery have transformed the way Web
sites are designed. However, while visual presentation is an intrinsic part of
the Web, traditional data mining techniques ignore render-time page structures
and their attributes. This paper introduces design mining for the Web: using
knowledge discovery techniques to understand design demographics, automate
design curation, and support data-driven design tools. This idea is manifest in
Webzeitgeist, a platform for large-scale design mining comprising a repository
of over 100,000 Web pages and 100 million design elements. This paper describes
the principles driving design mining, the implementation of the Webzeitgeist
architecture, and the new class of data-driven design applications it enables.
Learning design patterns with Bayesian grammar induction
Tutorials & learning
/
Talton, Jerry
/
Yang, Lingfeng
/
Kumar, Ranjitha
/
Lim, Maxine
/
Goodman, Noah
/
Mech, Radomír
Proceedings of the 2012 ACM Symposium on User Interface Software and
Technology
2012-10-07
v.1
p.63-74
© Copyright 2012 ACM
Summary: Design patterns have proven useful in many creative fields, providing
content creators with archetypal, reusable guidelines to leverage in projects.
Creating such patterns, however, is a time-consuming, manual process, typically
relegated to a few experts in any given domain. In this paper, we describe an
algorithmic method for learning design patterns directly from data using
techniques from natural language processing and structured concept learning.
Given a set of labeled, hierarchical designs as input, we induce a
probabilistic formal grammar over these exemplars. Once learned, this grammar
encodes a set of generative rules for the class of designs, which can be
sampled to synthesize novel artifacts. We demonstrate the method on geometric
models and Web pages, and discuss how the learned patterns can drive new
interaction mechanisms for content creators.
Data-driven interactions for web design
Doctoral symposium
/
Kumar, Ranjitha
Adjunct Proceedings of the 2012 ACM Symposium on User Interface Software and
Technology
2012-10-07
v.2
p.51-54
© Copyright 2012 ACM
Summary: This thesis describes how data-driven approaches to Web design problems can
enable useful interactions for designers. It presents three machine learning
applications which enable new interaction mechanisms for Web design: rapid
retargeting between page designs, scalable design search, and generative
probabilistic model induction to support design interactions cast as
probabilistic inference. It also presents a scalable architecture for efficient
data-mining on Web designs, which supports these three applications.
Attention and Selection in Online Choice Tasks
Long Papers
/
Navalpakkam, Vidhya
/
Kumar, Ravi
/
Li, Lihong
/
Sivakumar, D.
Proceedings of the 2012 Conference on User Modeling, Adaptation and
Personalization
2012-07-16
p.200-211
© Copyright 2012 Springer-Verlag
Summary: The task of selecting one among several items in a visual display is
extremely common in daily life and is executed billions of times every day on
the Web. Attention is vital for selection, but the end-to-end process of what
draws and sustains attention, and how that influences selection, remains poorly
understood. We study this in a complex multi-item selection setting, where
participants selected one among eight news articles presented in a grid layout
on a screen. By varying the position, saliency, and topic of the news items, we
identify the relative importance of these visual and semantic factors in
attention and selection. We present a simple model of attention that predicts
many key features such as attention shifts and dwell time per item. Potential
applications of our findings include optimizing visual displays to drive user
attention.
Are web users really Markovian?
Web user behavioral analysis and modeling
/
Chierichetti, Flavio
/
Kumar, Ravi
/
Raghavan, Prabhakar
/
Sarlos, Tamas
Proceedings of the 2012 International Conference on the World Wide Web
2012-04-16
v.1
p.609-618
© Copyright 2012 ACM
Summary: User modeling on the Web has rested on the fundamental assumption of
Markovian behavior -- a user's next action depends only on her current state,
and not the history leading up to the current state. This forms the
underpinning of PageRank web ranking, as well as a number of techniques for
targeting advertising to users. In this work we examine the validity of this
assumption, using data from a number of Web settings. Our main result invokes
statistical order estimation tests for Markov chains to establish that Web
users are not, in fact, Markovian. We study the extent to which the Markovian
assumption is invalid, and derive a number of avenues for further research.
Bricolage: example-based retargeting for web design
Website & application design
/
Kumar, Ranjitha
/
Talton, Jerry O.
/
Ahmad, Salman
/
Klemmer, Scott R.
Proceedings of ACM CHI 2011 Conference on Human Factors in Computing Systems
2011-05-07
v.1
p.2197-2206
© Copyright 2011 ACM
Summary: The Web provides a corpus of design examples unparalleled in human history.
However, leveraging existing designs to produce new pages is often difficult.
This paper introduces the Bricolage algorithm for transferring design and
content between Web pages. Bricolage employs a novel, structured-prediction
technique that learns to create coherent mappings between pages by training on
human-generated exemplars. The produced mappings are then used to automatically
transfer the content from one page into the style and layout of another. We
show that Bricolage can learn to accurately reproduce human page mappings, and
that it provides a general, efficient, and automatic technique for retargeting
content between a variety of real Web pages.
A politeness recognition tool for Hindi: with special emphasis on online
texts
PhD symposium
/
Kumar, Ritesh
Proceedings of the 2011 International Conference on the World Wide Web
2011-03-28
v.2
p.367-372
© Copyright 2011 ACM
Summary: This paper gives an overview of a politeness recognition tool (PoRT) for
Hindi that is currently under preparation. It describes the kinds of
problems that need to be tackled before developing the tool, the approach
and the methodology that will be adopted for the development and testing of the
tool, the current progress and the future plan to achieve this goal.
Patent classification of the new invention using PLSA
/
Kumar, Ranjeet
/
Math, Shrishail
/
Tripathi, R. C.
/
Tiwari, M. D.
Proceedings of the 2010 International Conference on Intelligent Interactive
Technologies and Multimedia
2010-12-28
p.222-225
© Copyright 2010 ACM
Summary: In the current landscape of Research and Development leading to
patenting, classifying content according to the subject areas to which it
belongs is a challenging task. This is because today's R&D draws its novelty
not from one technical area but from a unique combination of different
technical areas. For example, a typical ICT patent may advance knowledge in
some combination of control engineering, electronic components, database
technology, information retrieval methodology, Internet and wireless
technology, and speech, signal, and image processing. This paper reports work
on content classification of a newly drafted patent document using the
Probabilistic Latent Semantic Analysis (PLSA) technique. PLSA is used for
automated indexing of the document by creating an indexer that tokenizes the
documents and creates a generative model. A singular value decomposition model
is used to reduce the size of the term-document matrix and the co-occurrences
it records. The objective is to use the large document corpora generated from
past patent documents to categorize documents based on the concept-generated
model. The approach is illustrated and tested on an example classification of
content for two typical US Patent Classes, and has been found to work well for
them.
A writer-independent off-line signature verification system based on
signature morphology
/
Kumar, Rajesh
/
Kundu, Lopamudra
/
Chanda, Bhabatosh
/
Sharma, J. D.
Proceedings of the 2010 International Conference on Intelligent Interactive
Technologies and Multimedia
2010-12-28
p.261-265
© Copyright 2010 ACM
Summary: In this work, we address off-line signature verification as a
writer-independent system. We propose a set of morphological features,
extracted from off-line signature images. To examine the effectiveness of the
features, a publicly available signature database, namely the CEDAR signature
database, is used. A pair of signatures is fed to the system to give an
inference for their (dis)similarity. To get a compact set of features, a
multilayer perceptron based feature analysis technique is utilized. A 10-fold
cross-validation framework based on support vector machine is used for
verification. Receiver operating characteristic (ROC) analysis gives an equal
error rate (EER) of 11.59%, which is comparable to the state of the art
reported on this database.
Translating politeness across cultures: case of Hindi and English
Poster session 1: intercultural communication, virtual teams, and technology
/
Kumar, Ritesh
/
Jha, Girish Nath
Proceedings of the 2010 International Conference on Intercultural
Collaboration
2010-08-19
p.175-178
© Copyright 2010 ACM
Summary: In this paper, we present a corpus based study of politeness across two
languages -- English and Hindi. It studies politeness in a translated parallel
corpus of Hindi and English and sees how politeness in a Hindi text is
translated into English. We provide a detailed theoretical background in which
the comparison is carried out, followed by a brief description of the
translated data within this theoretical model. Since politeness may become one
of the major causes of conflict and misunderstanding, it is a very important
phenomenon to be studied and understood cross-culturally, particularly for such
purposes as machine translation.
Max-cover in map-reduce
Full papers
/
Chierichetti, Flavio
/
Kumar, Ravi
/
Tomkins, Andrew
Proceedings of the 2010 International Conference on the World Wide Web
2010-04-26
v.1
p.231-240
Keywords: greedy algorithm, map-reduce, maximum cover
© Copyright 2010 ACM
Summary: The NP-hard Max-k-cover problem requires selecting k sets from a collection
so as to maximize the size of the union. This classic problem occurs commonly
in many settings in web search and advertising. For moderately-sized instances,
a greedy algorithm gives an approximation of (1-1/e). However, the greedy
algorithm requires updating scores of arbitrary elements after each step, and
hence becomes intractable for large datasets.
We give the first max cover algorithm designed for today's large-scale
commodity clusters. Our algorithm has provably almost the same approximation as
greedy, but runs much faster. Furthermore, it can be easily expressed in the
MapReduce programming paradigm, and requires only polylogarithmically many
passes over the data. Our experiments on five large problem instances show that
our algorithm is practical and can achieve good speedups compared to the
sequential greedy algorithm.
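
The sequential greedy baseline that this summary compares against can be
sketched as below. This illustrates only the classic greedy rule (repeatedly
pick the set that covers the most still-uncovered elements), not the paper's
MapReduce algorithm. The function name and input format are hypothetical.

```python
def greedy_max_cover(sets, k):
    """Classic sequential greedy for Max-k-cover.

    sets: list of iterables of hashable elements.
    Returns (chosen_indices, covered_elements).
    """
    remaining = {i: set(s) for i, s in enumerate(sets)}
    covered = set()
    chosen = []
    for _ in range(min(k, len(sets))):
        # Marginal gain of a set = number of elements not yet covered.
        best = max(remaining, key=lambda i: len(remaining[i] - covered))
        if not remaining[best] - covered:
            break  # no remaining set adds anything new
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


chosen, covered = greedy_max_cover([{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}], k=2)
```

Every step rescans the marginal gain of each remaining set, because gains
change whenever a set is chosen. This is the per-step score updating that the
summary identifies as the obstacle on large datasets.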
Stochastic models for tabbed browsing
Full papers
/
Chierichetti, Flavio
/
Kumar, Ravi
/
Tomkins, Andrew
Proceedings of the 2010 International Conference on the World Wide Web
2010-04-26
v.1
p.241-250
Keywords: branching process, convergence, random walks, stationary distribution,
tabbed browsing
© Copyright 2010 ACM
Summary: We present a model of tabbed browsing that represents a hybrid between a
Markov process capturing the graph of hyperlinks, and a branching process
capturing the birth and death of tabs. We present a mathematical criterion to
characterize whether the process has a steady state independent of initial
conditions, and we show how to characterize the limiting behavior in both
cases. We perform a series of experiments to compare our tabbed browsing model
with PageRank, and show that tabbed browsing is able to explain 15-25% of the
deviation between actual measured browsing behavior and the behavior predicted
by the simple PageRank model. We find this to be a surprising result, as the
tabbed browsing model does not make use of any notion of site popularity, but
simply captures deviations in user likelihood to open and close tabs from a
particular node in the graph.
A characterization of online browsing behavior
Full papers
/
Kumar, Ravi
/
Tomkins, Andrew
Proceedings of the 2010 International Conference on the World Wide Web
2010-04-26
v.1
p.561-570
Keywords: browsing, pageviews, toolbar analysis
© Copyright 2010 ACM
Summary: In this paper, we undertake a large-scale study of online user behavior
based on search and toolbar logs. We propose a new CCS taxonomy of pageviews
consisting of Content (news, portals, games, verticals, multimedia),
Communication (email, social networking, forums, blogs, chat), and Search (Web
search, item search, multimedia search). We show that roughly half of all
pageviews online are content, one-third are communications, and the remaining
one-sixth are search. We then give further breakdowns to characterize the
pageviews within each high-level category.
We then study the extent to which pages of certain types are revisited by
the same user over time, and the mechanisms by which users move from page to
page, within and across hosts, and within and across page types. We consider
robust schemes for assigning responsibility for a pageview to ancestors along
the chain of referrals. We show that mail, news, and social networking
pageviews are insular in nature, appearing primarily in homogeneous sessions of
one type. Search pageviews, on the other hand, appear on the path to a
disproportionate number of pageviews, but cannot be viewed as the principal
mechanism by which those pageviews were reached.
Finally, we study the burstiness of pageviews associated with a URL, and
show that by and large, online browsing behavior is not significantly affected
by "breaking" material with non-uniform visit frequency.