Proceedings of the 2015 ACM Conference on Hypertext and Social Media

Fullname: HT'15: 26th ACM Conference on Hypertext & Social Media
Editors: Yeliz Yesilada; Rosta Farzan; Geert-Jan Houben
Location: Guzelyurt, TRNC, Cyprus
Dates: 2015-Sep-01 to 2015-Sep-04
Publisher: ACM
Standard No: ISBN: 978-1-4503-3395-5; ACM DL: Table of Contents; hcibib: HYPER15
Papers: 48
Pages: 344
  1. Keynote 1
  2. Session 1
  3. Session 2
  4. Session 3
  5. Keynote 2
  6. Session 4
  7. Session 5
  8. Session 6
  9. Session 7
  10. Session 8
  11. Session 9
  12. Doctoral Consortium Abstracts
  13. Demonstrations
  14. Poster Abstracts
  15. Late-Breaking Abstracts

Keynote 1

The Near Future is Hybrid (p. 1)
  Yiannis Laouris
The exponential growth of the web, in connection with all its derivatives and all scientific, social, economic and other consequences, created the widely accepted notion that the future(s) is (are) digital. This could not be more wrong and more misleading. Indeed, at least for the next couple of decades, the futures are hybrid in all aspects and in practically all domains. For example, we are still far away from an educational system that operates only in virtual worlds. Friends' circles on social networking sites turn out to serve primarily the sustainability of existing real-world friendships. While simulations and software (virtual instrument) solutions were trendy during the last few decades, we now witness a rapid development in robotics. Flying robots (e.g., drones), ground robots (e.g., robot dogs), and microcontrollers fully equipped with sensors and actuators are about to massively populate every natural or man-made environment. This talk will discuss how underlying principles of hybrid futures evade and dictate developments in every aspect of IT, ranging from education and HCI to visualizations and digital humanities, all grounded in the intelligent linking between software algorithms and physical infrastructures. The implications for the Anthropocene will be highlighted and discussed.

Session 1

Small-Scale Incident Detection based on Microposts (pp. 3-12)
  Axel Schulz; Benedikt Schmidt; Thorsten Strufe
Detecting large-scale incidents based on microposts has successfully been proposed and shown. However, the detection of small-scale incidents has so far not been satisfactorily possible, although the information that is shared during such local events could improve the situational awareness of citizens and decision makers alike.
   In this paper, we propose an approach for small-scale incident detection based on spatial-temporal-type clustering. In contrast to existing work, (1) we employ three distinct properties that define an incident, (2) we use a hybrid approach to reduce the computational overhead, and (3) we extract generalized features to increase robustness towards previously unseen data. Our evaluation in the domain of emergency first response shows that our approach identifies 32.14% of all real-world incidents recorded for the city of Seattle using only tweets. This result greatly outperforms the state of the art, which detects only about 6% of the real-world incidents. Also, a precision of 77% shows that we effectively discard irrelevant information.
Did You Expect Your Users to Say This?: Distilling Unexpected Micro-reviews for Venue Owners (pp. 13-22)
  Wen-Haw Chong; Bing Tian Dai; Ee-Peng Lim
With social media platforms such as Foursquare, users can now generate concise reviews, i.e. micro-reviews, about entities such as venues (or products). From the venue owner's perspective, analysing these micro-reviews will offer interesting insights, useful for event detection and customer relationship management. However, not all micro-reviews are equally important, especially since a venue owner should already be familiar with his venue's primary aspects. Instead, we envisage that a venue owner will be interested in micro-reviews that are unexpected to him. These can arise in many ways, such as users focusing on aspects easily overlooked by the venue owner, making comparisons with competitors, using unusual language or mentioning rare venue-related events, e.g. a dish being contaminated with bugs. Hence, in this study, we propose to discover unexpected information in micro-reviews, primarily to serve the needs of venue owners.
   Our proposed solution is to score and rank micro-reviews, for which we design a novel topic model, Sparse Additive Micro-Review (SAMR). Our model surfaces micro-review topics related to the venues. By properly offsetting these topics, we then derive unexpected micro-reviews. Qualitatively, we observed reasonable results for many venues. We then evaluate ranking accuracy using both human annotation and an automated approach with synthesized data. Both sets of evaluation indicate that our novel topic model, Sparse Additive Micro-Review (SAMR), has the best ranking accuracy, outperforming baselines using chi-square statistics and the vector space model.
Sentiment-based User Profiles in Microblogging Platforms (pp. 23-32)
  Francisco J. Gutierrez; Barbara Poblete
Twitter has become one of the major platforms for self-expression in the Social Web, mostly due to its adoption by mobile users and its short message format. This presents endless possibilities for social behavior researchers who, for the first time, have access to massive amounts of data generated by humans. Nevertheless, most of the current research on emotions in social platforms focuses on reactions to particular events, or crowd behavior. In this article we present our research in the identification and characterization of user sentiment profiles in online social media. By analyzing a dataset of more than 36,000 users, we identify several distinctive groups, according to similarities in their sentiment behavior. We study differences and similarities between these profile clusters and present detailed statistics. We found that a large number of Twitter users can be grouped into nine distinct profiles according to the strength and polarity of their sentiment. Researchers and practitioners can benefit from our approach to characterize Twitter users in several scenarios, such as social recommendation and mood estimation.
Breaking Bad: Understanding Behavior of Crowd Workers in Categorization Microtasks (pp. 33-38)
  Ujwal Gadiraju; Patrick Siehndel; Besnik Fetahu; Ricardo Kawase
Crowdsourcing systems are being widely used to overcome several challenges that require human intervention. While there is an increase in the adoption of the crowdsourcing paradigm as a solution, there are no established guidelines or tangible recommendations for task design with respect to key parameters such as task length, monetary incentive and time required for task completion. In this paper, we propose the tuning of these parameters based on our findings from extensive experiments and analysis of categorization tasks. We delve into the behavior of workers that consume categorization tasks to determine measures that can make task design more effective.

Session 2

Content Virality on Online Social Networks: Empirical Evidence from Twitter, Facebook, and Google+ on German News Websites (pp. 39-47)
  Irina Heimbach; Benjamin Schiller; Thorsten Strufe; Oliver Hinz
The virality of content describes its likelihood to be shared with peers. In this work, we investigate how content characteristics impact the sharing likelihood of news articles on Twitter, Facebook, and Google+. We examine a random sample of 4,278 articles from the most popular news websites in Germany categorized by human classifiers and text mining tools. Our analysis reveals commonalities and subtle differences between the three networks indicating different sharing patterns of their users.
A Dynamical Model of Twitter Activity Profiles (pp. 49-57)
  Hoai Nguyen Huynh; Erika Fille Legara; Christopher Monterola
The advent of the era of Big Data has allowed many researchers to dig into various socio-technical systems, including social media platforms. In particular, these systems have provided them with verifiable means to look into certain aspects of human behavior. In this work, we are specifically interested in the behavior of individuals on social media platforms -- how they handle the information they get, and how they share it. We look into Twitter to understand the dynamics behind the users' posting activities -- tweets and retweets -- zooming in on topics that peaked in popularity. Three mechanisms are considered: endogenous stimuli, exogenous stimuli, and a mechanism that dictates the decay of interest of the population in a topic. We propose a model involving two parameters η* and λ describing the tweeting behaviour of users, which allows us to reconstruct the findings of Lehmann et al. (2012) on the temporal profiles of popular Twitter hashtags. With this model, we are able to accurately reproduce the temporal profile of user engagements on Twitter. Furthermore, we introduce an alternative way of classifying collective activities on the socio-technical system based on the model.
The Role of Structural Information for Designing Navigational User Interfaces (pp. 59-68)
  Dimitar Dimitrov; Philipp Singer; Denis Helic; Markus Strohmaier
Today, a variety of user interfaces exists for navigating information spaces, including, for example, tag clouds, breadcrumbs, subcategories and others. However, such navigational user interfaces are only useful to the extent that they expose the underlying topology -- or network structure -- of the information space. Yet, little is known about which topological clues should be integrated in navigational user interfaces. Specifically, the aim of this paper is to identify what kind of and how much topological information needs to be included in user interfaces to facilitate efficient navigation. We model navigation as a variation of a decentralized search process with partial information and study its sensitivity to the quality and amount of the structural information used for navigation. We experiment with two strategies for node selection (quality of structural information provided to the user) and different amounts of information (amount of structural information provided to the user). Our experiments on four datasets from different domains show that efficient navigation depends on the kind of structural information utilized. Additionally, node properties differ in their quality for augmenting navigation, and intelligent pre-selection of which nodes to present to the user in the interface can improve navigational efficiency. This suggests that only a limited amount of high-quality structural information needs to be exposed through the navigational user interface.
Wisdom of the Crowd or Wisdom of a Few?: An Analysis of Users' Content Generation (pp. 69-74)
  Ricardo Baeza-Yates; Diego Saez-Trumper
In this paper we analyze how user-generated content (UGC) is created, challenging the well-known wisdom of crowds concept. Although it is known that user activity in most settings follows a power law, that is, few people do a lot while most do nothing, there are few studies that characterize this activity well. In our analysis of datasets from two different social networks, Facebook and Twitter, we find that a small percentage of active users, and a much smaller percentage of all users, account for 50% of the UGC. We also analyze the dynamic behavior of the generation of this content and find that the set of most active users is quite stable over time. Moreover, we study the social graph, finding that those active users are highly connected among themselves. This implies that most of the wisdom comes from a few users, challenging the independence assumption needed to have a wisdom of crowds. We also address the content that is never seen by anyone (the digital desert), which challenges the assumption that the content of every person should be taken into account in the collective decision. In the end this is not surprising, as the Web is a reflection of our own society, where economic or political power is also in the hands of minorities.

Session 3

Machine Classification and Analysis of Suicide-Related Communication on Twitter (pp. 75-84)
  Pete Burnap; Walter Colombo; Jonathan Scourfield
The World Wide Web, and online social networks in particular, have increased connectivity between people such that information can spread to millions of people in a matter of minutes. This form of online collective contagion has provided many benefits to society, such as providing reassurance and emergency management in the immediate aftermath of natural disasters. However, it also poses a potential risk to vulnerable Web users who receive this information and could subsequently come to harm. One example of this would be the spread of suicidal ideation in online social networks, about which concerns have been raised. In this paper we report the results of a number of machine classifiers built with the aim of classifying text relating to suicide on Twitter. The classifier distinguishes between the more worrying content, such as suicidal ideation, and other suicide-related topics such as reporting of a suicide, memorial, campaigning and support. It also aims to identify flippant references to suicide. We built a set of baseline classifiers using lexical, structural, emotive and psychological features extracted from Twitter posts. We then improved on the baseline classifiers by building an ensemble classifier using the Rotation Forest algorithm and a Maximum Probability voting classification decision method, based on the outcome of base classifiers. This achieved an F-measure of 0.728 overall (for 7 classes, including suicidal ideation) and 0.69 for the suicidal ideation class. We summarise the results by reflecting on the most significant predictive principal components of the suicidal ideation class to provide insight into the language used on Twitter to express suicidal ideation.
Detecting Changes in Suicide Content Manifested in Social Media Following Celebrity Suicides (pp. 85-94)
  Mrinal Kumar; Mark Dredze; Glen Coppersmith; Munmun De Choudhury
The Werther effect describes the increased rate of completed or attempted suicides following the depiction of an individual's suicide in the media, typically a celebrity. We present findings on the prevalence of this effect in an online platform: r/SuicideWatch on Reddit. We examine both the posting activity and post content after ten high-profile celebrity suicides. Posting activity increases following reports of celebrity suicides, and post content exhibits considerable changes that indicate increased suicidal ideation. Specifically, we observe that post-celebrity suicide content is more likely to be inward-focused, to manifest decreased social concerns, and to be laden with greater anxiety, anger, and negative emotion. Topic model analysis further reveals that content in this period switches to a more derogatory tone that bears evidence of self-harm and suicidal tendencies. We discuss the implications of our findings in enabling better community support to psychologically vulnerable populations, and the potential of building suicide prevention interventions following high-profile suicides.
A Human-annotated Dataset for Evaluating Tweet Ranking Algorithms (pp. 95-99)
  Dominic Rout; Kalina Bontcheva
Social media monitoring is now an essential part of brand management, political science, and news production. Automatic tweet ranking and content recommendation methods are required, in order to support human analysts in deriving useful insights from large-scale social media data. To facilitate the development and comparative evaluation of tweet ranking methods, a task for which re-tweets do not form a reliable gold standard, a new, openly available Twitter corpus has been created. A number of results for several popular recommendation algorithms are presented for this corpus.

Keynote 2

From Small Sensors to Big Data (p. 101)
  Barry Smyth
In our increasingly digitized world almost everything we do creates a record that is stored somewhere, whether we are purchasing a book, calling a friend, ordering a meal, or renting a movie. And in today's world of sensors and internet-enabled devices, smartphones and wearables, this is no longer just limited to our online activities. Exercising in the park, shopping for groceries, falling asleep, or even taking a shower, are just some of the everyday real-world activities that are likely to generate data. This is the big data world of the so-called Sensor Web. It is enabled by the widescale availability of high-performance computing, always-on communications, and mobile computing devices that come equipped with a variety of powerful sensors. This provides for a powerful computing and sensing ecosystem with important applications across all aspects of how we live, work, and play.
   The primary challenge for us now is to understand how we can (and whether we should) use this information. On the one hand, the promise of big data analytics is better decisions: better decisions about where we might live or where to send our kids to school; better decisions about the food we eat and the exercise we should take; and better decisions about some of the biggest choices facing modern societies when it comes to health, education, energy, and climate. On the other hand, this potential has a darker side, in the form of a gradual erosion of personal privacy as businesses and even governments seek to exploit our personal data for their own purposes, often without our informed consent.
   What is certain is that the combination of mobile computation, cheap but powerful sensors, and big data analytics points to new ways of thinking about some of society's toughest challenges. But to take advantage of these benefits we must reconcile the promise of big data with the pitfalls of privacy. Only then can these technologies have a meaningful impact on how we can all benefit from the big data revolution as part of a healthier, safer, fairer world.

Session 4

Examining Personalization in Academic Web Search (pp. 103-111)
  Sara Salehi; Jia Tina Du; Helen Ashman
Personalization promises to improve the accuracy of Web search and has been drawing much research attention recently. Some evidence indicates that for educational purposes, the disadvantages of personalized search are not justified by its benefits. The potential issues with search personalization, especially in an educational context, include loss of serendipity and capability, commercialization of education and the "Filter Bubble" effect, where users are denied information if search engine algorithms decide it is irrelevant to them. The majority of students in higher education make use of general-purpose search engines to find academic information; however, we have little knowledge about the effects of personalization on learners' experience and achievements. This observation motivates the research in this paper. First, we surveyed 120 university students to investigate which research sources, including search engines, they predominantly use and how much they depend on each for educational purposes. We learned that the majority of students prefer Google to other search engines; indeed, sometimes it is their primary or only information-seeking tool. Additionally, about 80% of them use search engines for educational purposes on a daily basis. Second, we measured the difference between personalized and non-personalized search results for 120 academic search queries divided equally into four categories: Education, IT, Health sciences and Business. Our results showed that on average only 53% of links appear, not necessarily in the same order, in both personalized and non-personalized search results. Interestingly, we observed only slight differences in the extent of personalization based on academic topics.
An Interactive Method for Inferring Demographic Attributes in Twitter (pp. 113-122)
  Valentina Beretta; Daniele Maccagnola; Timothy Cribbin; Enza Messina
Twitter data offers an unprecedented opportunity to study demographic differences in public opinion across a virtually unlimited range of subjects. Whilst demographic attributes are often implied within user data, they are not always easily identified using computational methods. In this paper, we present a semi-automatic solution that combines automatic classification methods with a user interface designed to enable rapid resolution of ambiguous cases. TweetClass employs a two-step, interactive process to support the determination of gender and age attributes. At each step, the user is presented with feedback on the confidence levels of the automated analysis and can choose to refine ambiguous cases by examining key profile and content data. We describe how a user-centered design approach was used to optimise the interface and present the results of an evaluation which suggests that TweetClass can be used to rapidly boost demographic sample sizes in situations where high accuracy is required.
Text, Topics, and Turkers: A Consensus Measure for Statistical Topics (pp. 123-131)
  Fred Morstatter; Jürgen Pfeffer; Katja Mayer; Huan Liu
Topic modeling is an important tool in social media analysis, allowing researchers to quickly understand large text corpora by investigating the topics underlying them. One of the fundamental problems of topic models lies in how to assess the quality of the topics from the perspective of human interpretability. How well can humans understand the meaning of topics generated by statistical topic modeling algorithms? In this work we advance the study of this question by introducing Topic Consensus: a new measure that calculates the quality of a topic by investigating its consensus with some known topics underlying the data. We view the quality of the topics from three perspectives: 1) topic interpretability, 2) how documents relate to the underlying topics, and 3) how interpretable the topics are when the corpus has an underlying categorization. We provide insights into how well the results of Mechanical Turk match automated methods for calculating topic quality. The probability distribution of the words in the topic best fits the Topic Coherence measure, in terms of both correlation and finding the best topics.
Media Bias in German Online Newspapers (pp. 133-137)
  Alexander Dallmann; Florian Lemmerich; Daniel Zoller; Andreas Hotho
Online newspapers have been established as a crucial information source, at least partially replacing traditional media like television or print media. Like all other media, online newspapers are potentially affected by media bias. This describes non-neutral reporting by journalists and other news producers, e.g. with respect to specific opinions or political parties. Analysis of media bias has a long tradition in political science. However, traditional techniques rely heavily on manual annotation and are thus often limited to the analysis of small sets of articles. In this paper, we investigate a dataset that covers all political and economic news from four leading German online newspapers over a timespan of four years. In order to analyze this large document set and compare the political orientation of different newspapers, we propose a variety of automatically computable measures that can indicate media bias. As a result, statistically significant differences in the reporting about specific parties can be detected between the analyzed online newspapers.

Session 5

Characterizing Smoking and Drinking Abstinence from Social Media (pp. 139-148)
  Acar Tamersoy; Munmun De Choudhury; Duen Horng Chau
Social media has been established to bear signals relating to health and well-being states. In this paper, we investigate the potential of social media in characterizing and understanding abstinence from tobacco or alcohol use. While the link between behavior and addiction has been explored in psychology literature, the lack of longitudinal self-reported data on long-term abstinence has challenged addiction research. We leverage the activity spanning almost eight years on two prominent communities on Reddit: StopSmoking and StopDrinking. We use the self-reported "badge" information of nearly a thousand users as gold standard information on their abstinence status to characterize long-term abstinence. We build supervised learning based statistical models that use the linguistic features of the content shared by the users as well as the network structure of their social interactions. Our findings indicate that long-term abstinence from smoking or drinking (~one year) can be distinguished from short-term abstinence (~40 days) with 85% accuracy. We further show that language and interaction on social media offer powerful cues towards characterizing these addiction-related health outcomes. We discuss the implications of our findings in social media and health research, and in the role of social media as a platform for positive behavior change and therapy.
Twitter-based Election Prediction in the Developing World (pp. 149-158)
  Nugroho Dwi Prasetyo; Claudia Hauff
Elections are the main instrument of democracy. Citizens decide which entity or entities (a political party or a particular politician) should represent them. Traditionally, pre-election polls have been used to learn about trends and likely election outcomes. Predicting an election outcome based on user activity on Twitter has been shown to be a cheap alternative. While past research has focused on election prediction in the developed world (where its use is debatable), in this paper we provide a comprehensive argument for the use of Twitter-based election forecasting in the developing world. For our use case of Indonesia's 2014 presidential election, the most basic Twitter-predictor outperforms the majority of traditional polls, while the best performing predictor outperforms all traditional polls on the national level.
Language, Twitter and Academic Conferences (pp. 159-163)
  Ruth Olimpia G. Gavilanes; Diego Gomez; Denis Parra Santander; Christoph Trattner; Andreas Kaltenbrunner; Eduardo Graells
Using Twitter during academic conferences is a way of engaging and connecting an audience that is inherently multicultural by the nature of scientific collaboration. English is expected to be the lingua franca bridging the communication and integration between native speakers of different mother tongues. However, little research has been done to support this assumption. In this paper we analyze how integrated language communities are by examining scholars' tweets posted during 26 Computer Science conferences over a time span of five years. We found that although English is the most popular language used to tweet during conferences, a significant proportion of people also tweet in other languages. In addition, people who tweet solely in English interact mostly within the same group (English monolinguals), while people who speak other languages interact more with different language groups. Finally, we also found higher interaction between people tweeting in different languages. These results suggest a relation between the number of languages a user speaks and their interaction dynamics in online communities.

Session 6

First Women, Second Sex: Gender Bias in Wikipedia (pp. 165-174)
  Eduardo Graells-Garrido; Mounia Lalmas; Filippo Menczer
Contributing to the writing of history has never been as easy as it is today. Anyone with access to the Web is able to play a part on Wikipedia, an open and free encyclopedia, and arguably one of the primary sources of knowledge on the Web. In this paper, we study gender bias in Wikipedia in terms of how women and men are characterized in their biographies. To do so, we analyze biographical content in three aspects: meta-data, language, and network structure. Our results show that, indeed, there are differences in characterization and structure. Some of these differences reflect the off-line world documented by Wikipedia, but other differences can be attributed to gender bias in Wikipedia content. We contextualize these differences in social theory and discuss their implications for Wikipedia policy.
Cultures in Community Question Answering (pp. 175-184)
  Imrul Kayes; Nicolas Kourtellis; Daniele Quercia; Adriana Iamnitchi; Francesco Bonchi
Community Question Answering (CQA) services are collaborative platforms where users ask and answer questions. We investigate the influence of national culture on people's online questioning and answering behavior. For this, we analyzed a sample of 200 thousand users in Yahoo Answers from 67 countries. We empirically measure a set of cultural metrics defined in Geert Hofstede's cultural dimensions and Robert Levine's Pace of Life and show that behavioral cultural differences exist in community question answering platforms. We find that national cultures differ in Yahoo Answers along a number of dimensions such as temporal predictability of activities, contribution-related behavioral patterns, privacy concerns, and power inequality.
Mining Affective Context in Short Films for Emotion-Aware Recommendation (pp. 185-194)
  Claudia Orellana-Rodriguez; Ernesto Diaz-Aviles; Wolfgang Nejdl
Emotion is fundamental to human experience and impacts our daily activities and decision-making processes where, e.g., the affective state of a user influences whether or not she decides to consume a recommended item -- movie, book, product or service. However, information retrieval and recommendation tasks have largely ignored emotion as a source of user context, in part because emotion is difficult to measure and easy to misunderstand. In this paper we explore the role of emotions in short films and propose an approach that automatically extracts affective context from user comments associated with short films available on YouTube, as an alternative to explicit human annotations. We go beyond traditional polarity detection (i.e., positive/negative) and extract for each film four opposing pairs of primary emotions: joy-sadness, anger-fear, trust-disgust, and anticipation-surprise. Finally, in our empirical evaluation, we show how the automatically extracted affective context can be leveraged for emotion-aware film recommendation.
An Investigation into the Use of Logical and Rhetorical Tactics within Eristic Argumentation on the Social Web (pp. 195-199)
  Tom Blount; David E. Millard; Mark J. Weal
Argumentation is a key aspect of communications and can broadly be broken down into problem solving (dialectic) and quarrelling (eristic). Techniques used within argumentation can likewise be classified as fact-based (logical), or emotion/audience-based (rhetorical). Modelling arguments on the social web is a challenge for those studying computational argumentation as formal models of argumentation tend to assume a logical argument, whereas argumentation on the social web is often largely rhetorical. To investigate the application of logical versus rhetorical techniques on the social web, we bring together two ontologies used for modelling argumentation and online communities respectively, the Argument Interchange Format and the Semantic Interlinked Online Communities project. We augment these with our own ontology for modelling rhetorical argument, the Argumentation on the Social Web Ontology, and trial our additions by examining three case studies following argumentation on different categories of social media. Finally, we present examples of how rhetorical argumentation is used in the context of the social web and show that there are clear markers present that can allow for a rudimentary estimate for the classification of a social media post with regards to its contribution to a discussion.

Session 7

Predicting Answering Behaviour in Online Question Answering Communities BIBAFull-Text 201-210
  Grégoire Burel; Paul Mulholland; Yulan He; Harith Alani
The value of Question Answering (Q&A) communities depends on members of the community finding the questions they are most willing and able to answer. This can be difficult in communities with a high volume of questions. Much previous work has attempted to address this problem by recommending questions similar to those already answered. However, this approach disregards the question selection behaviour of the answerers and how it is affected by factors such as question recency and reputation. In this paper, we identify the parameters that correlate with such behaviour by analysing the users' answering patterns in a Q&A community. We then generate a model to predict which question a user is most likely to answer next. We train Learning to Rank (LTR) models to predict question selections using various user, question and thread feature sets. We show that answering behaviour can be predicted with a high level of success, and highlight the particular features that influence users' question selections.
A Long-Term Study of a Crowdfunding Platform: Predicting Project Success and Fundraising Amount BIBAFull-Text 211-220
  Jinwook Chung; Kyumin Lee
Crowdfunding platforms have become important sites where people can create projects to seek funds toward turning their ideas into products, and can back other people's projects. As news media have reported on successfully funded projects (e.g., Pebble Time, Coolest Cooler), more people have joined crowdfunding platforms and launched projects. But in spite of the rapid growth in the number of users and projects, the overall project success rate has been decreasing because projects are launched without enough preparation and experience. To address this problem, in this paper we (i) collect the largest datasets from Kickstarter, consisting of all project profiles, corresponding user profiles, projects' temporal data and users' social media information; (ii) analyze the characteristics of successful projects and the behaviors of users, and characterize the dynamics of the crowdfunding platform; (iii) propose novel statistical approaches to predict whether a project will be successful and the range of money it can expect to be pledged; and (iv) develop predictive models and evaluate their performance. Our experimental results show that the predictive models can effectively predict project success and the range of expected pledged money.
Analyzing Book-Related Features to Recommend Books for Emergent Readers BIBAFull-Text 221-230
  Maria Soledad Pera; Yiu-Kai Ng
We recognize that emergent literacy forms a foundation upon which children will gauge their future reading. It is imperative to motivate young readers by offering them appealing books, so that they enjoy reading and gradually establish a good reading habit during their formative years. However, with the huge volume of existing and newly published books, it is a challenge for parents/educators (and young readers, respectively) to find the right books that match children's interests and their reading-ability levels. In response to this need, we have developed K3Rec, a recommender that applies a multi-dimensional approach to suggest books that simultaneously match the interests/preferences and reading abilities of emergent (i.e., K-3) readers. K3Rec considers grade levels, contents, illustrations, and topics, besides using special properties, such as length and writing style, to distinguish K-3 books from other books targeting more mature readers. K3Rec is novel, since it adopts an unsupervised strategy to suggest books for K-3 readers that does not rely on the existence of personal social media data, such as personal tags and ratings, which are seldom, if ever, created by emergent readers. Furthermore, unlike existing book recommenders, K3Rec explicitly analyzes book illustrations, which are of special significance for emergent readers, since illustrations assist these readers in understanding the contents of books. K3Rec focuses on a niche group of readers that has not been explicitly targeted by existing book recommenders. Empirical studies conducted using data from BiblioNasium.com and Amazon's Mechanical Turk have verified the effectiveness of K3Rec in making book recommendations for emergent readers.
Pairwise Preferences Elicitation and Exploitation for Conversational Collaborative Filtering BIBAFull-Text 231-236
  Laura Blédaité; Francesco Ricci
The research and development of recommender systems is dominated by models of users' preferences learned from ratings for items. However, ratings have several disadvantages, which we discuss; to address these issues we analyse another way to articulate preferences, namely pairwise comparisons: item A is preferred to item B. We have developed a recommendation technology that, by combining ratings and pairwise preferences, can generate better recommendations than a state-of-the-art solution based solely on ratings.

Session 8

Surpassing the Limit: Keyword Clustering to Improve Twitter Sample Coverage BIBAFull-Text 237-245
  Justin Sampson; Fred Morstatter; Ross Maciejewski; Huan Liu
Social media services have become a prominent source of research data for both academia and corporate applications. Data from social media services is easy to obtain, highly structured, and comprises opinions from a large number of extremely diverse groups. The microblogging site Twitter has garnered a particularly large following from researchers by offering a high volume of data streamed in real time. Unfortunately, the methods by which Twitter selects data to disseminate through the stream are either vague or unpublished. Since Twitter maintains sole control of the sampling process, we have no knowledge of how the data we collect for research is selected. Additionally, past research has shown that there are sources of bias in Twitter's dissemination process. Such bias introduces noise into the data that can reduce the accuracy of learning models and lead to bad inferences. In this work, we take an initial look at the efficiency of Twitter's limit track as a sample population estimator. We then provide methods to mitigate bias by improving sample population coverage using clustering techniques.
Other Times, Other Values: Leveraging Attribute History to Link User Profiles across Online Social Networks BIBAFull-Text 247-255
  Paridhi Jain; Ponnurangam Kumaraguru; Anupam Joshi
Profile linking is the ability to connect profiles of a user on different social networks. Linked profiles can help companies like Disney to build psychographics of potential customers and segment them for targeted marketing in a cost-effective way. Existing methods link profiles by observing high similarity between the most recent (current) values of attributes like name and username. However, for a section of users who are observed to evolve their attributes over time and choose dissimilar values across their profiles, these current values have low similarity. Existing methods then falsely conclude that the profiles refer to different users. To reduce such false conclusions, we propose gathering the rich history of values assigned to an attribute over time and comparing attribute histories to link user profiles across networks. We believe that attribute history highlights user preferences for creating attribute values on a social network. The co-existence of these preferences across profiles on different social networks results in similar attribute histories, which suggests that the profiles potentially refer to a single user. Through a focused study on username, we quantify the importance of username history for profile linking on a dataset of real-world users with profiles on Twitter, Facebook, Instagram and Tumblr. We show that username history correctly links 44% more profile pairs with non-matching current values, pairs that existing methods incorrectly leave unlinked. We further explore whether factors such as the longevity and availability of username history on either profile affect linking performance. To the best of our knowledge, this is the first study that explores the viability of using attribute history to link profiles on social networks.
Only One Out of Five Archived Web Pages Existed as Presented BIBAFull-Text 257-266
  Scott G. Ainsworth; Michael L. Nelson; Herbert Van de Sompel
When a user retrieves a page from a web archive, the page is marked with the acquisition datetime of the root resource, which effectively asserts "this is how the page looked at that datetime." However, embedded resources, such as images, are often archived at different datetimes than the main page. The presentation appears temporally coherent, but is composed of resources acquired over a wide range of datetimes. We examine the completeness and temporal coherence of composite archived resources (composite mementos) under two selection heuristics. The completeness and temporal coherence achieved using a single archive was compared to the results achieved using multiple archives. We found that at most 38.7% of composite mementos are temporally coherent and that at most only 17.9% (roughly 1 in 5) are both temporally coherent and 100% complete. Using multiple archives increases mean completeness by 3.1-4.1% but also reduces temporal coherence.
Opportunistic Layered Hypernarrative BIBAFull-Text 267-272
  Harold T. Goranson
We are designing a system to model narrative structures as expressed in lightly formalized, text-centric chunks. The system is novel in shifting much of the organizational complexity to a categoric reasoning system in a two-sorted logic. We optimize for two goals. The first goal is to grow a pool of stored insights concerning complex narrative constructions, learned by aggregating crowd-sourced insights. These emphasize qualities that escape capture by existing methods, including deliberate ambiguities, poetic allusions, irony, self-reference, dynamic reinterpretation and cinematic devices. The second goal, described here, is to present suggested, machine-generated narrative paths for an unskilled user, generated on the fly and informed by developing insights of multiple narrative situations. The narrative paths may follow the target video's narrative, machine-constructed annotative essays, or some synthesis of these.

Session 9

No Reciprocity in "Liking" Photos: Analyzing Like Activities in Instagram BIBAFull-Text 273-282
  Jin Yea Jang; Kyungsik Han; Dongwon Lee
In social media, people often press a "Like" button to indicate their shared interest in particular content or to acknowledge the user who posted the content. Such activities form relationships and networks among people, raising interesting questions about their unique characteristics and implications. However, little research has made such Likes its main focus. To address this lack of understanding, based on a theoretical framework, we present an analysis of the structural, influential, and contextual aspects of Like activities from the test datasets of 20 million users and their 2 billion Like activities on Instagram. Our results first highlight that Like activities and networks increase exponentially, and are formed and developed by one's friends and many random users. Second, we observe that five other essential Instagram elements influence the number of Likes to different extents, but following others will not necessarily increase the number of Likes that one receives. Third, we explore the relationship between LDA-based topics and Likes, characterize two user groups (specialists and generalists), and show that specialists tend to receive more Likes and promote themselves more than generalists. We finally discuss theoretical and practical implications and future research directions.
Build Emotion Lexicon from Microblogs by Combining Effects of Seed Words and Emoticons in a Heterogeneous Graph BIBAFull-Text 283-292
  Kaisong Song; Shi Feng; Wei Gao; Daling Wang; Ling Chen; Chengqi Zhang
As an indispensable resource for emotion analysis, emotion lexicons have attracted increasing attention in recent years. Most existing methods focus on capturing the single emotional effect of words rather than the emotion distributions which are helpful to model multiple complex emotions in a subjective text. Meanwhile, automatic lexicon building methods are overly dependent on seed words but neglect the effect of emoticons which are natural graphical labels of fine-grained emotion. In this paper, we propose a novel emotion lexicon building framework that leverages both seed words and emoticons simultaneously to capture emotion distributions of candidate words more accurately. Our method overcomes the weakness of existing methods by combining the effects of both seed words and emoticons in a unified three-layer heterogeneous graph, in which a multi-label random walk (MLRW) algorithm is performed to strengthen the emotion distribution estimation. Experimental results on real-world data reveal that our constructed emotion lexicon achieves promising results for emotion classification compared to the state-of-the-art lexicons.
Random Voting Effects in Social-Digital Spaces: A Case Study of Reddit Post Submissions BIBAFull-Text 293-297
  Tim Weninger; Thomas James Johnston; Maria Glenski
At a time when information seekers first turn to digital sources for news and opinion, it is critical that we understand the role that social media plays in human behavior. This is especially true when information consumers also act as information producers and editors through their online activity. In order to better understand the effects that editorial ratings have on online human behavior, we report the results of a large-scale in vivo experiment in social media. We find that small, random rating manipulations on social media submissions created significant changes in downstream ratings, resulting in significantly different final outcomes. Positive treatment resulted in a positive effect that increased the final rating by 11.02% on average. Compared to the control group, positive treatment also increased the probability of reaching a high rating (at least 2000) by 24.6%. Contrary to the results of related work, we also find that negative treatment resulted in a negative effect that decreased the final rating by 5.15% on average.
Tag Me Maybe: Perceptions of Public Targeted Sharing on Facebook BIBAFull-Text 299-303
  Saiph Savage; Andres Monroy-Hernandez; Kasturi Bhattacharjee; Tobias Höllerer
Social network sites allow users to publicly tag people in their posts. These tagged posts allow users to share to both the general public and a targeted audience, dynamically assembled via notifications that alert the people mentioned. We investigate people's perceptions of this mixed sharing mode through a qualitative study with 120 participants. We found that individuals like this sharing modality as they believe it strengthens their relationships. Individuals also report using tags to have more control of Facebook's ranking algorithm, and to expose one another to novel information and people. This work helps us understand people's complex relationships with the algorithms that mediate their interactions with each other. We conclude by discussing the design implications of these findings.

Doctoral Consortium Abstracts

The Hypertext 2015 Doctoral Consortium session is held on the first day of the 26th ACM Conference on Hypertext and Social Media. It offers Ph.D. students an opportunity to present their ongoing research towards a Ph.D. degree in disciplines related to the conference, mostly in Computer and Information Sciences but not limited to them. The themes of the submissions are strongly connected to the research tracks of this year's conference: Digital Connectivity, Data Connectivity and Digital Humanities. The Digital Connectivity track targets developing insights into the mechanisms of information generation and dissemination, characterization of evolutionary processes on online social networks, and studies of models and systems that support these processes. The second track, Data Connectivity, deals with the methods, techniques and technologies that can be used to make data available on the Web, with a special focus on how heterogeneous data sources can be connected to each other. Finally, the Digital Humanities track seeks to attract work from an interdisciplinary perspective, at the intersection of computer science on one hand and the humanities and social sciences on the other.
   During the DC session, students receive constructive feedback from their peers and from a panel of mentors to support them in envisioning added-value contributions to state-of-the-art research in Hypertext and Social Media. The consortium session is open to all doctoral students by application. Each submission was reviewed by three senior researchers in the topic of the submission. The DC session is aimed in particular at students who have defined a dissertation topic but are still more than one year from graduating at the time of application, so that they can still benefit from the feedback.
   This year, the committee accepted three contributions for presentation during the Doctoral Consortium session:
   Automated Methods for Identity Resolution in Heterogeneous Social Platforms by Paridhi Jain. In this work, connected to the digital and data connectivity tracks of the conference, the author proposes novel methods to search and link user identities scattered across heterogeneous social networks. The methods carefully consider users' privacy, so they are designed to access only public and historic data. The evaluation is proposed on large datasets over multiple platforms to demonstrate the methods' significance in resolving the identity of an online user.
   Language Innovation and Change in On-line Social Networks by Daniel Kershaw. Since language is a fundamental aspect of human communication, this research focuses on forecasting online language change through the use of predictive and descriptive methodologies. This work is framed within structuration theory, which helps the researcher structure the analysis of the dynamics of language (re)production, i.e., by the agent (user), the social structure and their interplay.
   A Framework to Provide Customized Reuse of Open Corpus Content for Adaptive Systems by Mostafa Bayomi. This work deals with issues of content reusability in adaptive systems. Adaptive systems tailor content to users' needs, but they face two big problems: (a) they use closed-corpus content that has been prepared for them a priori, and (b) the content is tightly coupled with other parts of the system, which hinders its reusability. This work proposes leveraging the Semantic Web by extending an existing content provision system, Slicepedia.

Automated Methods for Identity Resolution across Heterogeneous Social Platforms BIBAFull-Text 307-310
  Paridhi Jain
Users create identities on multiple social platforms for various purposes but often do not link them. Unlinked identities raise concerns for enterprises and security practitioners. To address these concerns, we propose novel methods to search and link user identities scattered across heterogeneous social networks. Our methods are automated and access only a user's public current and historic data. Evaluation on fairly large datasets from multiple platforms demonstrates the efficiency of our methods in resolving the identity of an online user.
Language Innovation and Change in On-line Social Networks BIBAFull-Text 311-314
  Daniel Kershaw; Matthew Rowe; Patrick Stacey
Language is fundamental to human communication, and throughout the course of history language has constantly evolved. This can currently be seen in the changing forms of colloquial language in various on-line social networks (OSNs). These innovations in language are even appearing in everyday life, with the recent induction of 'lol' and 'rofl' into modern dictionaries. Changes and varying forms of language pose challenges to both academics and people in business when attempting to assess and communicate with different communities.
   In this Ph.D., we aim to forecast online language change through the use of predictive and descriptive methodologies. Using data sets mined from a number of OSNs, we aim to develop generalizable models and theories for assessing and predicting such language changes. We philosophically frame this work by drawing on structuration theory, which helps us structure our analysis of the dynamics of language (re)production, i.e., by the agent (user), the social structure and their interplay. We draw on state-of-the-art work and methods, including the development of neural nets to analyse language usage, along with network and community classification to uncover social structures within language. Preliminary results, obtained by operationalizing known linguistic models of innovation acceptance, have identified statistically significant usage of innovations across communities in a number of OSNs.
A Framework to Provide Customized Reuse of Open Corpus Content for Adaptive Systems BIBAFull-Text 315-318
  Mostafa Bayomi
One of the main services that Adaptive Systems offer to their users is the provision of content that is tailored to individual users' needs. Some Adaptive Systems use closed-corpus content that has been prepared for them a priori; hence, they accept only a narrow field of content. Furthermore, the content is tightly coupled with other parts of the system, which also hinders its reusability. To address these limitations, recent systems have started to make use of open Web content to provide a wider variety of content. Previous approaches have attempted to harness the information available on the web by providing adaptive systems with customizable information objects. Since adaptive systems are evolving towards the Semantic Web and the use of ontologies, existing systems are limited in that they can service these documents only through keyword-based queries. In this research we propose a novel framework that extends an existing content provision system, Slicepedia. Our framework uses the conceptual representation of content to segment it in a semantic manner. The framework removes unnecessary content from web pages, such as navigation bars, and then semantically reveals the structural representation of the text to build a tree-like hierarchy. This tree can be traversed to obtain different levels of content granularity that facilitate content discoverability and adaptivity.

Demonstrations

VizTrails: An Information Visualization Tool for Exploring Geographic Movement Trajectories BIBAFull-Text 319-320
  Martin Becker; Philipp Singer; Florian Lemmerich; Andreas Hotho; Denis Helic; Markus Strohmaier
Understanding the way people move through urban areas represents an important problem that has implications for a range of societal challenges such as city planning, public transportation, or crime analysis. In this paper, we present an interactive visualization tool called VizTrails for exploring and understanding such human movement. It features visualizations that show aggregated statistics of trails for geographic areas that correspond to grid cells on a map, e.g., the number of users passing through or the cells commonly visited next. Among other features, the system allows users to overlay the map with the results of SPARQL queries in order to relate the observed trajectory statistics to their geo-spatial context, e.g., a city's points of interest. The system's functionality is demonstrated using trajectory examples extracted from the social photo sharing platform Flickr. Overall, VizTrails facilitates deeper insights into geo-spatial trajectory data by enabling interactive exploration of aggregated statistics and providing geo-spatial context.
"I like ISIS, but I want to watch Chris Nolan's new movie": Exploring ISIS Supporters on Twitter BIBAFull-Text 321-322
  Walid Magdy; Kareem Darwish; Ingmar Weber
The recent rise of the "Islamic State of Iraq and Syria" (ISIS) has sparked significant interest in the group. We explored the tweets of a large number of Twitter users who frequently comment on this subject, either showing support or opposition. ISIS supporters dedicate on average 20% of their tweets to ISIS-related content, compared to 4.5% for those who oppose ISIS. Thus, the vast majority of tweets for both groups are on general topics, covering many aspects of life, including politics, religion, and even jokes and funny photos. Our demo allows users to search and explore 123 million tweets of 57 thousand Twitter users who have declared explicit positions towards ISIS. Given a query, our system displays a comparative report that shows the difference in views between supporters and opponents of ISIS on the search topic. The report includes a timeline of per-day mentions of query terms in the tweets of each group, the top retweeted tweets, images, videos, and a tag cloud of top terms in the results for each group. Time navigation allows the exploration of content shared by both groups on specific dates, going back in time to the period before ISIS appeared.
Sentiment Analysis with Incremental Human-in-the-Loop Learning and Lexical Resource Customization BIBAFull-Text 323-325
  Shubhanshu Mishra; Jana Diesner; Jason Byrne; Elizabeth Surbeck
The adjustment of probabilistic models for sentiment analysis to changes in language use and the perception of products can be realized via incremental learning techniques. We provide a free, open and GUI-based sentiment analysis tool that allows for a) relabeling predictions and/or adding labeled instances to retrain the weights of a given model, and b) customizing lexical resources to account for false positives and false negatives in sentiment lexicons. Our results show that incrementally updating a model with information from new and labeled instances can substantially increase accuracy. The provided solution can be particularly helpful for gradually refining or enhancing models in an easily accessible fashion while avoiding a) the costs for training a new model from scratch and b) the deterioration of prediction accuracy over time.

Poster Abstracts

Everything is Filed under 'File': Conceptual Challenges in Applying Semantic Search to Network Shares for Collaborative Work BIBAFull-Text 327-328
  Dirk Ahlers; Mahsa Mehrpoor
Much professional collaborative work relies on shared networked file systems for easy collaboration, documentation, and as a joint workspace. We have found that in an engineering setting with tens of thousands of files, usual desktop search does not work well, especially if the project space is huge, contains a large number of non-textual files that are difficult to search for, and is partially unknown to the users because their information needs reach into previous years or projects.
   We therefore propose an approach that joins content and metadata analysis, link derivation, grouping, and other measures to arrive at high-level features suitable for semantic similarity and retrieval to improve information access for this case of professional search.
On Recommending Newly Published Academic Papers BIBAFull-Text 329-330
  Jiwoon Ha; Soon-Hyoung Kwon; Sang-Wook Kim
To recommend newly published papers that have not yet received any citations, this paper proposes a novel method that uses the authors' interest in the papers cited in the newly published paper. Compared to citation-network-based methods, the empirical validation shows that our method significantly improves accuracy, by 11%-41% in precision and 8%-34% in recall.
On Recommending Job Openings BIBAFull-Text 331-332
  Yeon-Chang Lee; Jiwon Hong; Sang-Wook Kim; Sheng Gao; Ji-Yong Hwang
AskStory is a company providing an e-recruitment service where job seekers find a variety of job openings. This paper discusses an approach to recommending job openings attractive to job seekers.

Late-Breaking Abstracts

Collaborative Learning in the Cloud: A Cross-Cultural Perspective of Collaboration BIBAFull-Text 333-336
  Kathrin Kirchner; Liana Razmerita
This study aims to investigate how students perceive collaboration and to identify the associated technologies they use to collaborate. In particular, we aim to address the following research questions: What are the factors that impact satisfaction with collaboration? How do these factors differ in different collaborative settings? Based on data from 75 students from Denmark and Germany, the article identifies collaborative practices and factors that positively and negatively impact satisfaction with collaboration.
Longitudinal Analysis of Low-Level Web Interaction through Micro Behaviours BIBAFull-Text 337-340
  Aitor Apaolaza; Simon Harper; Caroline Jay
To truly understand how people learn to navigate and use a Web site or application, we need to collect real usage data over extended periods of time. Detailed Web interaction data gathered in the wild (from URLs visited, to keystrokes and mouse movements) has the potential to provide an in-depth, ecologically valid view of interaction, and enable an understanding of how behaviour evolves over time. Interpreting such data is extremely challenging, however. We present a longitudinal data-driven analysis of fine-grained interaction data captured from 14,000 recurrent users over 12 months. At the core of our approach is the aggregation of low-level interaction data into micro behaviours. By analysing changes in these behaviours as a function of users' accumulated interaction time, we were able to demonstrate how users' interaction evolves as they become more familiar with a Web page. The results demonstrate that monitoring micro behaviours offers a simple and easily extensible post hoc means of understanding how Web-based behaviour evolves over time.
User-Adapted Web of Things for Accessibility BIBAFull-Text 341-344
  Ilaria Torre; Ilknur Celik
This paper describes a new wave of the Web: the user-adapted Web of Things. This is a new step in the evolution of the Web of Things and of adaptive web-based systems. Current proposals for the Web of Things focus on augmenting physical objects in order to provide enhanced services. However, in our view, the Web of Things can also be a means to make physical objects accessible or more usable for people with special needs by exploiting adaptive and semantic techniques. The architecture presented in the paper describes the specific modules and components underlying this approach.