Workshop - "Online Social Networks: Emerging Trends", 7-8 October 2014, Cyprus
Date: October 7-8, 2014
Location: Nicosia, Cyprus
Organised by: University of Cyprus (UCY)
The Laboratory for Internet Computing (LINC) of the Department of Computer Science, University of Cyprus, is hosting the open workshop "Online Social Networks: Emerging Trends in Research and Innovation", which will take place on 7-8 October 2014 at the University of Cyprus, New Campus.
The workshop on "Online Social Networks: Emerging Trends" is organised by the Marie Curie Initial Training Network (ITN) iSocial. The workshop is dedicated to the current status of online social networks research and to practices for taking research to the market. The second day of the workshop is dedicated to industry, featuring talks from Google, Yahoo!, Telefonica, and USEUM; a poster session with iSocial research; and a presentation of the European Commission's initiative on Web entrepreneurship.
Participation in the "Online Social Networks: Emerging Trends" workshop is free of charge; however, all participants are kindly requested to complete the registration.
|Day 1: Oct. 7, 10:00 - 17:30
Room B108, Leventis Building (ΚΑΛ)
|Professor Krishna Gummadi
Head Networked Systems Research Group
Max Planck Institute for Software Systems
|Dr. Anastasios Noulas
University of Cambridge
|Dr. Demetris Antoniades
Laboratory for Internet Computing, UCY
|Day 2: Oct. 8, 9:00 - 17:00
Room 010, Social Facilities Building 7 (ΚΟΔ2)
|Dr. Maciej Kurant
Research Group, Google Zurich
|Dr. Nikolaos Laoutaris
Senior Researcher, Internet Group
Telefonica Research, Barcelona
|Dr. Gianmarco De Francisci Morales
Yahoo Labs Barcelona
|European Commission
DG Content and Technology
|Foteini Valeonti
Founder and CEO, USEUM
Tuesday, 07 October 2014 (Room B108, Leventis Building (ΚΑΛ))
Session I: Online Social Networks
|10:00 - 11:00||On the Strength of Weak Identities in Social Computing Systems
Speaker: Prof. Krishna Gummadi (Max Planck Institute for Software Systems (MPI-SWS))
|11:00 - 11:30||Coffee Break|
|11:30 - 12:30||Co-evolutionary dynamics in social networks: A case study of Twitter
Speaker: Dr. Demetris Antoniades (Laboratory for Internet Computing, UCY)
|12:30 - 14:00||Lunch Break
Session II: Tutorial
|14:00 - 15:30||Location-based Services – Part A
Speaker: Dr. Anastasios Noulas (University of Cambridge)
|15:30 - 16:00||Coffee Break|
|16:00 - 17:30||Location-based Services – Part B
Speaker: Dr. Anastasios Noulas (University of Cambridge)
Wednesday, 08 October 2014 - Industrial Day (Room 010, Social Facilities Building 7 (ΚΟΔ2))
Session I: Web Search and Data Mining
|09:00 - 10:00||Knowledge in Search Engines
Speaker: Dr. Maciej Kurant (Google Zurich)
|10:00 - 11:00||
Speaker: Dr. Nikolaos Laoutaris (Telefonica Research, Barcelona)
|11:00 - 11:30||Coffee Break|
|11:30 - 12:30||SAMOA: A Platform for Mining Big Data Streams
Speaker: Dr. Gianmarco De Francisci Morales (Yahoo Labs Barcelona)
|12:30 - 14:00||Poster presentations and Lunch
Session II: Entrepreneurship
|14:00 - 14:45||The European Commission's initiative on Web entrepreneurship
Speaker: European Commission, DG Content and Technology
|14:45 - 15:30||USEUM: A PhD turned Start-up
Speaker: Foteini Valeonti (Founder and CEO, USEUM)
|15:30 - 16:30||
Personal Experiences and Best Practices in Taking Research to the Market
|16:30 - 16:50||Networking
Today's social computing systems like Facebook and Twitter have an Achilles heel: users of these systems operate behind "weak identities", i.e., identities that can be forged without much effort. Attackers can easily create multiple fake identities and manipulate the functionality of the system. There is mounting evidence that such fake identities are being used to introduce or promote spam content, or to manipulate the real popularity of existing users and content on these systems. In this talk, I will first discuss existing defense strategies, which largely focus on detecting and suspending fake identities in social computing systems. I will then propose new approaches to reason about the "strength" or "trustworthiness" of weak identities.
Complex networks often exhibit co-evolutionary dynamics, meaning that the network topology and the state of nodes or links are coupled, affecting each other on overlapping time scales. We focus on the co-evolutionary dynamics of online social networks, and on Twitter in particular. Monitoring the activity of thousands of Twitter users in real time, and tracking their followers and tweets/retweets, we propose a method to infer new retweet-driven follower relations. The formation of such relations is much more likely than the exogenous creation of new followers in the absence of any retweets. We identify the most significant factors (reciprocity and the number of retweets that a potential new follower receives) and propose a simple probabilistic model of this effect. We also discuss the implications of such co-evolutionary dynamics on the topology and function of a social network. Finally, we briefly consider a second instance of co-evolutionary dynamics on Twitter, namely the possibility that a user removes a follower link after receiving a tweet or retweet from the corresponding followee.
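The retweet-driven link-formation effect described above can be sketched with a minimal, hypothetical example (toy data and names, not the study's actual measurements or pipeline): compare the empirical probability that a user follows someone after being retweeted by them against the exogenous baseline without a retweet.

```python
from collections import namedtuple

# Hypothetical observation records: did user B receive a retweet from A,
# and did B subsequently follow A within the monitoring window?
Obs = namedtuple("Obs", ["retweeted", "followed"])

def follow_rate(observations, retweeted):
    """Empirical P(new follow | retweet status)."""
    relevant = [o for o in observations if o.retweeted == retweeted]
    if not relevant:
        return 0.0
    return sum(o.followed for o in relevant) / len(relevant)

def retweet_lift(observations):
    """Ratio of the follow rate after a retweet to the exogenous baseline."""
    base = follow_rate(observations, retweeted=False)
    boosted = follow_rate(observations, retweeted=True)
    return boosted / base if base > 0 else float("inf")

# Toy data in which retweets make a new follow far more likely:
data = (
    [Obs(True, True)] * 8 + [Obs(True, False)] * 12      # 40% after a retweet
    + [Obs(False, True)] * 2 + [Obs(False, False)] * 98  # 2% baseline
)
print(retweet_lift(data))  # ≈ 20x more likely after a retweet
```

A real analysis would additionally condition on the factors the abstract names (reciprocity, number of retweets received), but the comparison of conditional rates is the same.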
The wide adoption of smartphones by mobile users a few years ago signalled a massive transition of the web ecosystem towards the integration of information about regions, locations and places. Foursquare, Google Places and Yelp are only a few of the services that have risen by attracting the attention of millions of mobile web users around the globe. In the meantime, technologies that are disrupting traditional industries are being introduced by applications such as Uber and Airbnb, and promise the arrival of new datasets that describe activity in urban environments. In this tutorial, we will explore the opportunities brought forward by the new datasets that emerge from location-based services to analyse and model human mobility in cities. Our approach will stretch from the presentation of techniques inspired by research in complex systems to data mining methodologies for modern local search and mobile recommendation frameworks. Subsequently, we will go through a number of scenarios for new applications in the area, including event detection and recommendation, location-based retail analytics and the modelling of neighbourhoods in cities.
Over the last few years, search engines have evolved from complex keyword matching and scoring algorithms towards systems that can understand users' questions and answer them directly. Important factors in this evolution are the ability to represent knowledge in a way that computers can manipulate, and the ability to connect structured knowledge to natural language corpora. This talk will discuss how Google search has been going through a radical transformation in this direction. We will talk about some of the key technologies behind the Google Knowledge Graph and its applications in search.
In this talk I'll go over some initial results from our measurement study aiming to identify signs of discriminatory practices in e-commerce. I'll present the tool we used to collect our data, analyze an initial dataset from some 300+ beta testers of the tool, and focus on instances of dynamic pricing observed in conjunction with different locations, retailers, and products. I'll then discuss the important challenges that remain in going from this initial study to a much more concrete and thorough understanding of the issue, at a scale that is more representative of the actual practices of retailers at "Internet scale". Finally, I'll try to connect this particular measurement study with other related important questions that remain unanswered in the general area of privacy economics, Internet advertising and e-commerce. Most of the material I will present can be found in the following two articles: J. Mikians, L. Gyarmati, V. Erramilli, N. Laoutaris, "Crowd-assisted Search for Price Discrimination in E-Commerce: First results," ACM CoNEXT'13; J. Mikians, L. Gyarmati, V. Erramilli, N. Laoutaris, "Detecting price and search discrimination on the Internet," in Proc. of ACM HotNets'12.
Social media and user-generated content are causing an ever-growing data deluge. The rate at which we produce data is growing steadily, creating larger and larger streams of continuously evolving data. Online news, micro-blogs and search queries are just a few examples of these continuous streams of user activities. The value of these streams lies in their freshness and relatedness to ongoing events. However, current (de facto standard) solutions for big data analysis are not designed to deal with evolving streams. This talk introduces SAMOA (Scalable Advanced Massive Online Analysis), an open-source platform for mining big data streams. SAMOA provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks, such as classification and clustering, as well as programming abstractions to develop new algorithms. It features a pluggable architecture that allows it to run on several distributed stream processing engines such as Storm, S4, and Samza. It is written in Java and is available at http://samoa-project.net under the Apache Software License version 2.0.
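SAMOA's own API is Java-based and not reproduced here; as a minimal illustration of the stream-mining setting it targets — learning one instance at a time with test-then-train (prequential) evaluation, never storing the stream — here is a sketch of an online perceptron (all names and the simulated stream are illustrative):

```python
class OnlinePerceptron:
    """Minimal online binary classifier: one pass, one instance at a time,
    as in the stream-mining setting SAMOA targets (not SAMOA's actual API)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        s = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if s >= 0 else -1

    def update(self, x, y):
        # Learn only from mistakes; the instance is then discarded.
        if self.predict(x) != y:
            for i, xi in enumerate(x):
                self.w[i] += self.lr * y * xi
            self.b += self.lr * y

# Simulated stream: the label is the sign of the first feature.
stream = [([1.0, 0.2], 1), ([-1.0, 0.5], -1),
          ([0.8, -0.1], 1), ([-0.7, 0.3], -1)] * 50

model = OnlinePerceptron(n_features=2)
correct = 0
for x, y in stream:
    correct += model.predict(x) == y   # test-then-train (prequential) evaluation
    model.update(x, y)
print(correct / len(stream))
```

Distributed engines such as Storm or Samza parallelize exactly this kind of per-instance update across many workers; the algorithm itself never sees the whole dataset at once.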
USEUM, the first ever crowd-sourced museum of art, started two years ago as a part-time PhD project that aimed to develop "The Wikipedia of Art"; it later went on to become an award-winning funded company. Over the last couple of years USEUM has faced not only the academic challenges of a doctorate but also all the challenges of being a high-risk technology start-up. USEUM today is a fully developed online museum exhibiting 13,000 paintings by 1,200 artists and various museums from all around the world, and it has already launched its innovative Augmented Reality mobile application for iOS. In this presentation USEUM's founder, Foteini Valeonti, will cover how USEUM started off and grew over time, overcoming the aforementioned challenges while balancing a doctorate and a fast-paced start-up.
Professor Krishna Gummadi is a tenured faculty member and head of the Networked Systems research group at the Max Planck Institute for Software Systems (MPI-SWS) in Germany. He received his Ph.D. (2005) and M.S. (2002) degrees in Computer Science and Engineering from the University of Washington. He also holds a B.Tech (2000) degree in Computer Science and Engineering from the Indian Institute of Technology, Madras. Krishna's research interests are in the measurement, analysis, design, and evaluation of complex Internet-scale systems. His current projects focus on understanding and building social Web systems. Specifically, they tackle the challenges associated with protecting the privacy of users sharing personal data, understanding and leveraging word-of-mouth exchanges to spread information virally, and finding relevant and trustworthy sources of information in crowds. Krishna's work on online social networks, Internet access networks, and peer-to-peer systems has led to a number of widely cited papers and best-paper awards at ACM/Usenix's SOUPS, AAAI's ICWSM, Usenix's OSDI, ACM's SIGCOMM IMW, and SPIE's MMCN conferences.
Dr. Demetris Antoniades holds a BSc (2005), MSc (2007) and PhD (2011) in Computer Science from the University of Crete. His PhD studies were funded by the Hrakleitos II PhD Research Scholarship fund. He interned with Telefonica I+D in the academic year 2010-2011 and was a research assistant with FORTH from 2004 to 2011. His current research interests lie in the areas of Internet measurement, Internet monitoring, Internet and networking systems, network monitoring, online social networks, complex networks and cloud computing. He has published in ACM IMC, ACM WWW, PAM and IEEE Network, and has been a reviewer for IEEE/ACM Transactions on Networking, Computer Networks and more.
Dr. Anastasios Noulas is a Research Associate in the Computer Laboratory at the University of Cambridge. He has over 6 years of interdisciplinary scientific research experience on spatial data mining, dynamically evolving social networks, human mobility and recommender systems, with publications in top-tier conferences that include ICWSM, KDD and ICDM, alongside journals of multi-disciplinary interest. After completing his MEng in Computer Science at UCL, Anastasios went to Cambridge for his PhD, which focused on data analysis and modelling in location-based online social networks and related systems and applications, with broad interest in social networks, human mobility and urban activity modelling. After finishing his PhD in 2013, Anastasios moved to Foursquare's Data Science Team as a Visiting Research Scientist and subsequently returned to the Computer Laboratory as a Research Associate, working on the EPSRC GALE Project (Providing Global Accessibility to Local Experience). During 2014 Anastasios was a Visiting Professor in the Department of Mathematics at the University of Namur, Belgium, where he taught a Master's course on Location-based Services. Anastasios also works as a Data Scientist & Research Consultant at Piinpoint, a start-up that provides an online analytics platform for retail store localisation.
Dr. Maciej Kurant received a Ph.D. degree from EPFL, Lausanne, Switzerland, in 2009. He then worked at the University of California, Irvine, and at ETH Zurich, Switzerland. Since 2012 he has been at Google Zurich. Maciej's pre-Google research interests included graph sampling in online social networks and the inference and study of large-scale complex networks (such as the human brain). Currently, he is working on the Google Knowledge Graph.
Dr. Nikolaos Laoutaris is a senior researcher at the Internet research group of Telefonica Research in Barcelona. Prior to joining the Barcelona lab he was a postdoc fellow at Harvard University and a Marie Curie postdoc fellow at Boston University. He received his PhD in computer science from the University of Athens in 2004. In his latest work, Nikolaos has been focusing on the economics of ISP interconnection (including issues of network neutrality) and the economics of privacy on the web and e-commerce. His recent work on detecting price discrimination in e-commerce has received coverage by the international press and has led to collaboration and consultation with several legislative and regulatory bodies, including the European Commission's Directorate General on Internal Market (DG MARKT), the Federal Trade Commission in the US, and the Office of Fair Trading in the UK. Prior to that, Nikolaos worked on various system, algorithmic, and performance evaluation aspects of computer networks and distributed systems, including: efficient inter-datacenter bulk transfers, energy-efficient distributed system design, content distribution for long-tail content, transparent scaling of social networks, pricing of broadband services and ISP interconnection economics. He has published more than 60 papers in top-tier peer-reviewed venues, including ACM SIGCOMM, IEEE INFOCOM, ACM CoNEXT, ACM IMC, ACM SIGMETRICS, ACM PODC, WWW, ACM HotNets, ACM HotMobile, and IEEE/ACM Transactions on Networking (h-index of 24 according to Google Scholar). He has served multiple times on the technical program committees of most of the above-mentioned venues, and he has been an associate editor for ACM Computer Communications Review and Computer Networks Journal. He has filed more than 10 patents and has been granted 3.
Dr. Gianmarco De Francisci Morales is a Research Scientist at Yahoo Labs Barcelona. He received his Ph.D. in Computer Science and Engineering from the IMT Institute for Advanced Studies of Lucca in 2012. His research focuses on large scale data mining and big data, with a particular emphasis on Web mining and Data Intensive Scalable Computing systems. He is an active member of the open source community of the Apache Software Foundation working on the Hadoop ecosystem (Giraph, S4), and a committer for the Apache Pig project. He is a co-organizer of the workshop series on Social News on the Web (SNOW) co-located with the WWW conference. He is one of the lead developers of SAMOA, an open-source platform for mining big data streams.
Foteini Valeonti is the founder of USEUM and a PhD Candidate at UCL. USEUM is the first ever Crowdsourced Museum of Art and started off from Foteini's part-time PhD studies at UCL, the title of which is "Making Art more Accessible with Crowdsourcing and Augmented Reality". USEUM has received £110,000 in seed funding and in 2012 it was the 1st prize winner of the Athens Startup Weekend competition hosted by the Microsoft Innovation Center. Foteini's background is a BSc (Hons) in Computer Science from the University of Piraeus and a 2-year MFA in Interactive Digital Media at Ravensbourne College of Design and Communication (part of City University), which she attended on scholarship and earned with an A+. Her MFA thesis is an Augmented Reality iPad game, "Shakespeare's Hunt", developed in collaboration with Shakespeare's Globe Theatre in London. During her Master's studies, Foteini worked as lead developer of mobile applications for the BMW Museum. Foteini has been invited to discuss her research and work at various conferences, and as an undergraduate she was a Microsoft Student Partner, contributing to numerous projects and events.
6.1 Joint Posters:
|#||Poster Title||Poster Presenters:||Partner Institutions:|
|1||Information propagation in evolving multi-functional multiplex Online Social Networking platforms (abstract)||Hariton Efstathiades
|2||WebDHT: A DOSN data overlay for the Web (abstract)||Mikael Högqvist
|3||Location-based access control for P2P video sharing (abstract)||Giovanni Simoni||INSUB, PEER|
|4||CoMPAC: Collaborative Multi-Party Access Control for OSN (abstract)||Panagiotis Ilia||FORTH, INSUB|
|5||Structural evolution of information based online social networked systems (abstract)||Kolja Kleineberg||UB, FORTH|
|6||Distributed Activity based Sybil Identification over Decentralized Social Networks (abstract)||Naeimeh Laleh
|7||DIVa - Decentralized Identity Validation for Social Networks (abstract)||Amira Soliman
|8||Data and Access Control Management on DOSNs using Bitcoin’s like Chained Public Control Policies (abstract)||Amira Soliman
|9||Navigable Overlay for Information Dissemination in Decentralized Online Social Networks (abstract)||Anis Nasir||KTH, IBM, UCY|
6.2 Individual Posters:
|#||Poster Title||Poster Presenter:||Partner Institution:|
|1||Think before RT: An Experimental Study of Abusing Twitter Trends (abstract)||Despoina Antonakaki||FORTH|
|2||Identification of Real-world Daily Patterns based on Interactions in Online Social Networks (abstract)||Hariton Efstathiades||UCY|
|3||Hive.js: Browser-Based Distributed Caching for Adaptive Video Streaming (abstract)||Mikael Högqvist||PEER|
|4||RankSlicing: A decentralized protocol for supernode selection (abstract)||Giovanni Simoni||PEER|
|5||Fine-grained Access Control and Photo-based Social Authentication for OSN (abstract)||Panagiotis Ilia||FORTH|
|6||Risk Assessment in Social Networks based on User Anomalous Behavior (abstract)||Naeimeh Laleh||INSUB|
|7||Evolution of the Digital Society Reveals Balance between Viral and Mass Media Influence (abstract)||Kolja Kleineberg||UB|
|8||Semi-Supervised Multiple Cross-Document Coreference Resolution (abstract)||Kambiz Ghoorchian||KTH|
|9||Ensemble Learning Strategy for Decentralized Online Social Networks (abstract)||Leila Bahri||INSUB|
|10||Community based Identity Validation in Online Social Networks (abstract)||Leila Bahri||INSUB|
|11||Distributed and scalable Application Store for social networks (abstract)||Andrés García García||IBM|
|12||The Power of Both Choices: Practical Load Balancing for Stream Processing Engines (abstract)||Anis Nasir||KTH|
|13||Messaging Coordination Platform: Federated Publish/Subscribe Systems for Internet of Things (abstract)||Chen Chen||IBM|
7. Poster Abstracts:
7.1 Joint Posters
Nowadays individuals are connected through various social networking platforms such as Facebook, Twitter and LinkedIn. It is well established in the literature that the same person often maintains a profile on more than one online social networking platform, of different types. Furthermore, the structure of users' communities and their interaction patterns are influenced by the type of the platform. For instance, Twitter is primarily used to obtain and publish daily information (a "social" OSN), whereas LinkedIn is mainly used to maintain professional contacts and publish career-related information (a "professional" OSN). We plan to investigate the correlation between the difference in functionality and the topological properties of the multiplex network (a system which consists of multiple network layers). The overlap of links will be of special interest in this scenario: the overlap denotes the correlation between the existence of a link between certain nodes in different layers. The different functionalities could lead to increased information diffusion in the whole system. Moreover, data retrieved from different OSN types provide us with a more complete set of information about a person: their interests, friends, colleagues, curriculum vitae, etc. We aim to analyze this information in order to investigate the influence of different life-fields on users' ego-network structure and online social networking interactions.
Current Distributed Online Social Networks (DOSNs) are implemented as server software, desktop clients or smartphone apps. Having to download and install a separate application or server software raises the barrier to the spread of DOSNs. We propose to build a DOSN using recent web standards such as WebRTC, in order to let users contribute their resources without extensive technical knowledge. As an initial step, we introduce WebDHT, a browser-based P2P application implementing a Distributed Hash Table (DHT) used for discovery of social connections and simple data storage. Using the browser to run a P2P application leads to additional technical challenges, such as how to ensure reliability and availability of data in this new environment. Initial experiments show that it is feasible to use WebRTC to implement a P2P DHT using algorithms such as Kademlia and Pastry. Further work is necessary to study the effects of WebDHT at larger scale. The source code is made available as open source to foster collaboration both within and outside the project.
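As a rough illustration of how a DHT such as Kademlia assigns keys to peers (this is not the WebDHT code; the names and the 32-bit ID width are assumptions for brevity), nodes and keys share one ID space and a key is stored on the nodes whose IDs are XOR-closest to it:

```python
import hashlib

def node_id(name, bits=32):
    # Derive a fixed-width ID from a name (toy stand-in for Kademlia's 160-bit IDs).
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % (1 << bits)

def xor_distance(a, b):
    # Kademlia's metric: the distance between two IDs is their bitwise XOR.
    return a ^ b

def closest_nodes(key, nodes, k=2):
    """Return the k nodes whose IDs are XOR-closest to the key --
    the nodes responsible for storing/serving that key."""
    return sorted(nodes, key=lambda n: xor_distance(node_id(n), key))[:k]

peers = ["alice", "bob", "carol", "dave", "erin"]
key = node_id("profile:frank")   # where frank's profile record would live
print(closest_nodes(key, peers))
```

In a real deployment the lookup proceeds iteratively, each hop asking peers for nodes ever closer to the key; running it in the browser over WebRTC data channels is what raises the reliability and availability questions the abstract mentions.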
Portable technologies, like smartphones and tablets, enable agile work paradigms such as Bring Your Own Device, but also introduce new challenges for defending the confidentiality of content: it becomes easier for unauthorized individuals to access confidential information displayed by aware or unaware authorized users. Location-Based Access Control mitigates this problem by allowing policies to be defined over the physical position of the devices. In this work we study the applicability of location-based policies in the context of a distributed Video on Demand platform. As use cases we consider confidentiality domains within the same corporate building and the home-sourcing scenario.
The widespread adoption of Online Social Networks has raised many privacy concerns among the user community, as users cannot effectively control access to shared data objects. The current design of OSNs allows only the data owner to define the access control policy for uploaded data objects, and therefore the associated users cannot restrict or control data published by others. In this work we present a collaborative multi-party access control model that takes into consideration the concerns of all the users associated with an uploaded data object. This model allows all the related users to participate in the specification of the object's access control policy, and the enforcement of the collaboratively defined policy is performed entirely by the trusted friends of the related users. In our approach the social network provider does not take any part in either the specification or the enforcement of the access policy. Moreover, our mechanism employs selective encryption to allow the related users to control specific regions of the data object. Thus, users can keep specific regions of interest private, even if the data object is provided to the requesting user.
Information-based online social networked systems like Twitter are reshaping the flow of information in today's world. The functionality of these systems is essentially different from friendship-based networks like Facebook. We plan to investigate the structural evolution of information-based social networked systems. To this end, we will gather data from Twitter and extract the topological features of the graph. Based on our findings, we will develop a model to identify the main mechanisms responsible for the emergence of the observed topological features. The comparison of the topological evolution of information-based networks and friendship-based networks is expected to yield insights into the relationship between functionality and the mechanisms underlying the evolution of the respective system.
Sybil attacks are among the most prevalent and practical attacks against online social networks (OSNs). Researchers have observed Sybils forwarding spam and malware on OSNs. Since Sybils exhibit atypical behavior patterns in the network, our goal is to analyze the behavior of users in an OSN to detect Sybils; identifying misbehavior is thus an important challenge for monitoring and anomaly detection in OSNs. The basic idea is that the more a user's behavior diverges from "normal" behavior, the more anomalous the user is. We therefore use a clustering algorithm to compute the probability of each user's behavior with respect to the other users in the same community, in each cluster, in order to predict anomalous behavior in the network. However, measuring these probabilities is problematic when working with big data, given its heterogeneity and volume. A key problem is how to minimise the communication overhead and energy consumption in the network while identifying misbehavior. Our approach is based on a distributed, cluster-based anomaly detection algorithm: we minimise the communication overhead by clustering users to determine their community before anomaly detection and sending a description of the clusters to the other nodes. To evaluate our distributed scheme, we implement our algorithm on a Facebook dataset. We demonstrate that our scheme achieves accuracy comparable to that of a centralized scheme, with a significant reduction in communication overhead.
Online Social Networks (OSNs) have attracted millions of users and have made online virtual interactions as important as offline ones. Yet, identity validation for users in current OSNs is still a challenging open problem. Contrary to face-to-face interactions, online we cannot reliably determine the authenticity of others' claimed identities. Although OSN providers such as Facebook and Twitter acknowledge the existence of fake profiles among their users, they cannot decline their services to new users just because their information cannot be validated as truthful. Therefore, there is a crucial need to empower OSN users with tools that help them decide the level of trust they can assign to whomever they communicate with. In this work we introduce a model that learns correlations between OSN profile attributes. Identity validation patterns are deduced from these correlations and used as guidelines that OSN users can apply to evaluate the reliability of new profiles they wish to interact with. The model operates in a decentralized way with a node-centric approach, and does not require any global knowledge of the OSN topology. Therefore, our model fits Decentralized Online Social Networks (DOSNs), the new trend in OSNs.
One of the main challenges for DOSNs in competing with existing OSNs is providing a platform for fine-grained access control. Major social network providers (e.g., Facebook, Twitter) already offer access control at the level of individual data items: a user can specify a different access control policy for every picture, post, or piece of data they share. Most users do not make full use of this flexibility, but the underlying technology is available. In decentralized social networks, the common approach to access control relies on data encryption. Encryption is undeniably effective for securely managing access control, but it does not scale efficiently to the amounts of data produced on social networks and to their access-granularity needs. Our research concern is how to provide an underlying platform for access control in decentralized social networks that minimizes the need for encryption and makes use of community collaboration. Along that line, we aim to exploit the concepts used in the Bitcoin network, in which the correctness of transactions is ensured by collaborative proof-of-work on a public ledger of chained transactions. The Bitcoin network has proven effective and efficient in establishing a peer-to-peer e-currency. However, some argue that Bitcoin would suffer from major issues should the public chain grow too large to be processed by the available computing power. The idea of our research is to build nano-Bitcoin networks that operate independently of each other, each corresponding to a social community in the OSN. The challenge is to build these communities such that each node is engaged only in the transactions that concern it. Once these communities are formed, access control within each of them is managed by node consensus on the integrity of access policies, publicly chained in a structure similar to Bitcoin's public ledger.
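The chained-policy idea can be sketched with a minimal hash chain (a hypothetical illustration only, omitting proof-of-work and consensus): each policy block commits to the hash of its predecessor, so tampering with any earlier policy invalidates every later link and is detectable by any community member.

```python
import hashlib
import json

def block_hash(block):
    # Hash the block's canonical JSON form, including the previous hash,
    # so changing any earlier policy breaks every subsequent link.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_policy(chain, policy):
    prev = chain[-1]["hash"] if chain else "0" * 64
    block = {"policy": policy, "prev": prev}
    block["hash"] = block_hash({"policy": policy, "prev": prev})
    chain.append(block)
    return chain

def verify(chain):
    """Any community member can independently re-check the whole chain."""
    prev = "0" * 64
    for block in chain:
        if block["prev"] != prev:
            return False
        if block["hash"] != block_hash({"policy": block["policy"], "prev": prev}):
            return False
        prev = block["hash"]
    return True

chain = []
append_policy(chain, {"object": "photo42", "allow": ["alice", "bob"]})
append_policy(chain, {"object": "photo42", "allow": ["alice"]})
print(verify(chain))                        # True
chain[0]["policy"]["allow"] = ["mallory"]   # tamper with history
print(verify(chain))                        # False
```

The proposal's nano-networks would keep one such chain per community, with node consensus (rather than a single verifier) deciding which blocks may be appended.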
With the increase in awareness of privacy issues in centralized social networks, many decentralized online social networks have been proposed. Such networks transfer processing and storage functionality from the service providers towards the end users. Decentralization of online social networks comes with two different types of challenges: those related to distributed systems, such as heterogeneity, openness, scalability, fault tolerance and transparency; and social challenges, such as locality, privacy and network growth. In this work, we provide an extensive study of the existing literature and propose a novel scheme for data dissemination in a bounded friend-to-friend network.
7.2 Individual Poster Abstracts:
Twitter is one of the most influential Online Social Networks (OSNs), adopted not only by hundreds of millions of users but also by public figures, organizations, news media, and social authorities. One of the factors contributing to this success is the inherent capability of the platform for spreading news - encapsulated in short messages that are tweeted from one user to another - across the globe. Today, it is sufficient to just inspect the trending topics in Twitter to figure out what is happening around the world. Unfortunately, the capabilities of the platform can also be abused and exploited for distributing illicit content or boosting false information, and the consequences of such actions can be severe: one false tweet was enough to make the stock market crash for a short period of time in 2013. In this paper, we analyze a large collection of tweets and explore the dynamics of popular trends and other Twitter features with regard to deliberate misuse. We identify a specific class of trend-exploiting campaigns that exhibits stealthy behavior and hides spam URLs within Google search-result links. We build a spam classifier for both users and tweets, and demonstrate its simplicity and efficiency. Finally, we visualize these spam campaigns and reveal their inner structure.
Ubiquitous Internet connectivity enables users to update their Online Social Network profile from any location and at any point in time. These often geo-tagged data can be used to provide valuable information to closely located users, both in real time and in aggregated form. In this work we use openly available data from Twitter, originating from the Netherlands, to identify work, residential and leisure areas of the inhabitants. Additionally, we analyze users' ego-networks and identify the influence of key location areas on the Twitter social graph. Furthermore, we investigate whether a user's geographic location influences their profile construction and tweeting behavior. Using this dataset we proceed to identify real-world patterns that inhabitants of an area exhibit during working and non-working hours, workdays and weekends. Moreover, we reveal relationships between residential and working areas and real-world phenomena, such as transportation patterns.
Peer-to-peer (P2P) technology has long been considered a natural complement to standard CDN infrastructure for video distribution, since it greatly reduces costs and improves quality of user experience. However, P2P solutions have traditionally required the installation of additional software or plugins, which significantly hinders adoption. In this paper, we present Hive.js, a browser-based, plugin-less distributed caching platform for video streaming. Hive.js is layered over WebRTC, a new set of HTML5 APIs for direct browser-to-browser communication, and is designed to transport adaptive HTTP streaming protocols, specifically MPEG-DASH. Initial results obtained by evaluating Hive.js in a controlled test environment show that our approach significantly reduces the load on CDN infrastructure without sacrificing quality of user experience.
In peer-to-peer applications deployed on the Internet, it is common to assign greater responsibility to supernodes, which are usually peers with high computational power, a large amount of memory, or high network bandwidth capacity. In this paper, we describe a practical solution to the problem of supernode selection, that is, the process of discovering the best peers in the network by some application-specific metric. We provide a distributed heuristic that identifies the best K nodes in the P2P overlay while taking into consideration the realities of actual deployments, such as the presence of NATs. Our approach consists of an epidemic protocol which does not require new connections to be established, but rather relies on existing ones, such as those provided by a NAT-resilient peer sampling framework. We support our claims with a thorough evaluation of our solution in simulation and in a real deployment on thousands of consumer machines.
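A minimal sketch of such an epidemic top-K scheme is shown below. The synchronous rounds and static capacity metric are simplifying assumptions; the protocol described above runs asynchronously over existing, NAT-resilient connections.

```python
def merge_topk(view_a, view_b, k):
    """Merge two views of (node_id, capacity) pairs, keeping the K best."""
    combined = {**dict(view_a), **dict(view_b)}
    return sorted(combined.items(), key=lambda kv: -kv[1])[:k]

def epidemic_topk(capacities, neighbors, k, rounds):
    """Every node repeatedly exchanges its top-K view with its neighbors."""
    views = {n: [(n, c)] for n, c in capacities.items()}
    for _ in range(rounds):
        for node in capacities:
            for peer in neighbors[node]:
                merged = merge_topk(views[node], views[peer], k)
                views[node] = views[peer] = merged  # symmetric exchange
    return views
```

On a ring overlay, every node's view converges to the global top-K after a number of rounds on the order of the network diameter.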
The widespread use of Online Social Networks (OSNs) has raised many concerns regarding user privacy. Under the current design of OSNs, the content publisher is considered to be the content owner and, thus, content published by others cannot be controlled by the users associated with it. In this work we present an approach that changes the access control granularity from the level of the photo to that of users' faces. The OSN automatically identifies the users depicted in each uploaded image and prompts those users to define the privacy setting for their face. This enables the OSN to enforce each user's desired setting: anyone accessing the photo views a processed version in which faces with restricted access are blurred out. Moreover, Facebook has recently launched a form of two-factor authentication where users are required to identify photos of their friends to complete a log-in attempt. However, this mechanism can be bypassed if the attacker is able to collect the available photos of the victim and his friends. We demonstrate an attack that employs image comparison techniques to identify the presented photos within an offline collection of the users' photos. We then design a system with a novel photo selection and transformation process, which generates photo-challenges that are robust against these attacks. Our approach generates photos that defeat software-based face recognition and image matching techniques, while remaining recognizable to humans who are familiar with the depicted people.
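The enforcement step described above can be sketched as a per-face access decision. The policy labels and data shapes here are hypothetical; a real OSN would apply this server-side before serving the processed image.

```python
def faces_to_blur(photo_faces, viewer, friendships):
    """Decide which faces must be blurred before a photo is shown to a viewer.
    photo_faces: list of (owner, policy) pairs, where policy is one of
    "public", "friends", "private" (hypothetical labels)."""
    blurred = []
    for owner, policy in photo_faces:
        if owner == viewer or policy == "public":
            continue  # owners always see their own face
        if policy == "friends" and viewer in friendships.get(owner, set()):
            continue  # viewer is a friend of the depicted user
        blurred.append(owner)  # "private", or "friends" but not a friend
    return blurred
```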
The dramatic growth of online social networks (OSNs) has given rise to many security concerns. Some of the information posted on these sites may contain malicious links and can lead to security risks such as identity theft and cyber stalking. Moreover, there is no immediate way for users to verify the authenticity of a sender, as a user can see only the sender's public features. Although there has been extensive research on protecting trust schemes from various types of attacks, accurately and efficiently detecting the different kinds of attacks, the victims of attacks, and colluding users - when these are caused by real users rather than automated programs - remains a hard problem. A mechanism to detect risky users on OSNs is therefore urgently needed. We characterize and analyze several kinds of risky behavior to derive a measure of risk for each user in an OSN. Since the majority of users closely follow a pattern (say, a power law), we can confidently consider as anomalous the few users that deviate from it. Our approach therefore uses anomaly detection techniques to identify anomalous users based on their activity patterns and their network structure in the OSN. We carefully choose features and design our anomaly detection models so that they are scalable and work unsupervised, and we report experiments on real OSNs.
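The deviate-from-the-pattern idea can be sketched as follows. A least-squares fit in log-log space is one simple stand-in (an assumption, not necessarily the models used in this work) for the power-law-like pattern the majority follows, with the residual serving as the anomaly score.

```python
import math
import statistics

def anomaly_scores(users):
    """users: list of (activity_count, degree) pairs, all values >= 1.
    Fit log(degree) ~ log(activity) and score each user by its residual."""
    logs = [(math.log(a), math.log(d)) for a, d in users]
    xs = [x for x, _ in logs]
    ys = [y for _, y in logs]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    # least-squares slope of log(degree) on log(activity)
    slope = sum((x - mx) * (y - my) for x, y in logs) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    # users far from the fitted power-law line get high scores
    return [abs(y - (slope * x + intercept)) for x, y in logs]
```

Here the fourth user, whose degree is wildly out of proportion to their activity, receives the highest score.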
Online social networks (OSNs) enable researchers to study the social universe at a previously unattainable scale. The worldwide impact and the necessity to sustain the rapid growth of OSNs emphasize the importance of unraveling the laws governing their evolution. Empirical results show that, unlike many real-world growing networked systems, OSNs follow an intricate path that includes a dynamical percolation transition. In light of these results, we present a quantitative two-parameter model that reproduces the entire topological evolution of a quasi-isolated OSN with unprecedented precision from the birth of the network. This allows us to precisely gauge the fundamental macroscopic and microscopic mechanisms involved. Our findings suggest that the coupling between the real preexisting underlying social structure, a viral spreading mechanism, and mass media influence govern the evolution of OSNs. The empirical validation of our model, on a macroscopic scale, reveals that virality is 4–5 times stronger than mass media influence and, on a microscopic scale, individuals have a higher subscription probability if invited by weaker social contacts, in agreement with the "strength of weak ties" paradigm.
Cross-Document Coreference Resolution is the task of categorizing a set of documents into groups corresponding to distinct referential entities. It is often framed as the disambiguation of a single mention across multiple documents, but this is not always the case. Real problems often require the disambiguation of streams of text (such as news feeds or blog posts) containing multiple mentions that refer to multiple entities. In general, there are two main reasons why multiple disambiguation is needed: (i) the lack of enough documents containing a specific mention, and (ii) even with enough documents, applying a separate single disambiguation for each mention is inefficient with respect to the preprocessing effort. In this research we focus on the multiple disambiguation of mentions across documents. Our model is based on a previous algorithm developed in our group that uses graph-based document modeling and diffusion-based clustering. By applying slight changes, our model creates a higher number of homogeneous clusters, at the cost of a lower F-score, which can be compensated for by concatenating homogeneous clusters using supervised tagging of a small, randomly sampled fraction of the documents. The experiments show that our model achieves a 9% higher F-score using only 4% sampling compared to previous solutions.
Ensemble-based systems have become increasingly important with the rapid growth of available data, where the amount of data to be analyzed is too large to be effectively handled by a single classifier. The key goal of such systems is to combine several intermediate/base classifiers into a better composite global model that produces more accurate and reliable estimates. Decentralized Online Social Networks (DOSNs) represent one of the main domains in which ensemble learning can be applied to remarkable effect. In this paper, we introduce an ensemble learning framework for DOSNs that can be used for different analytic applications such as ranking, filtering, and spam detection. In our framework, we create a random overlay on top of the DOSN by connecting every node with randomly selected nodes using a gossip-based sampling algorithm. By disseminating the base classifiers over the random overlay, every node updates its local base classifier using a mixture of learning models generated from different sources of information. Experiments on different synthetic datasets show that the ensemble classifiers achieve 18.25% higher accuracy than the base classifiers, and at most 5% lower accuracy than a centralized model.
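One common building block for such a framework is gossip averaging: in each round, random pairs of nodes average their model parameters, so every local model drifts toward the global mean. The minimal sketch below (a single scalar weight per node, synchronous rounds) is an illustrative assumption, not the paper's exact update rule.

```python
import random

def gossip_round(weights, rng):
    """One synchronous round: pair nodes at random, average each pair."""
    nodes = list(weights)
    rng.shuffle(nodes)
    for a, b in zip(nodes[::2], nodes[1::2]):
        avg = (weights[a] + weights[b]) / 2
        weights[a] = weights[b] = avg  # pairwise averaging preserves the mean
    return weights

rng = random.Random(42)
weights = {"n0": 0.0, "n1": 4.0, "n2": 8.0, "n3": 12.0}
for _ in range(100):
    gossip_round(weights, rng)
# every node's weight approaches the global mean, 6.0
```

Because each exchange preserves the sum of the two weights, the network-wide mean is invariant, and repeated random pairings shrink the spread geometrically.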
Identity validation of OSN peers is critical to ensuring safe and secure online socializing environments. Work on identity, from a computer science perspective, has mainly focused on studying, understanding, and detecting identity-related frauds and breaches. Starting from the vision of empowering users to determine the validity of the identities of the peers they interact with on OSNs, we propose a method that leverages community sourcing to estimate the trustworthiness of online social profiles based only on the information presented on those profiles. Our system guarantees utility and operability under the dynamics and continuous evolution of OSNs, as well as user anonymity and impartiality in rating.
An App Store is a tool-supported infrastructure for the provisioning, consumption, purchase, and re-use of IT solutions for seamless business collaboration. The Store supports the financial management of Apps (pricing, payment, revenue sharing) for both End-Users and App Developers in a distributed, B2B social network environment, and provides the core elements for the monetization of Apps throughout the FIspace ecosystem. App Developers publish and manage the financial aspects of Apps, while End-Users search for and consume Apps. App Developers are provided with an App Interface discovery system, publication support tools, and financial management tools. The App Interface discovery system is used to reuse functionality and build mashup applications. The publication support tool provides an integrated compliance check that eases the publishing of new Apps. The financial management tools enable developers to run statistics, define revenue models, and share revenue with any involved partner. End-Users are provided with a search, purchase, and execution support system. The search system allows End-Users to find applications according to a set of parameters. The purchase support system guides End-Users through the purchase process for Apps. A simple rights management system enables End-Users to customize Apps after purchasing.
Load balancing in Distributed Stream Processing Engines (DSPEs) is a significant problem, as it directly affects the hardware utilization and throughput of the system. However, current solutions, e.g., key grouping and shuffle grouping, are unable to provide sufficient load balancing guarantees for DSPEs. Therefore, we introduce Partial Key Grouping (PKG), a new stream partitioning strategy that adapts the classical "power of two choices" to a distributed streaming setting by leveraging two novel techniques: key splitting and local load estimation. In so doing, it achieves better load balancing than key grouping while being more scalable than shuffle grouping. Key splitting leverages both choices by relaxing the atomicity constraints of key grouping. Local load estimation solves the problem of gauging the load of downstream servers without any communication overhead. We test PKG on several large datasets, both real-world and synthetic. Compared to standard hashing, PKG reduces the load imbalance by up to seven orders of magnitude, and often achieves nearly perfect load balance. This result translates into an improvement of up to 60% in throughput and up to 45% in latency when deployed on a real Storm cluster.
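The core routing decision can be sketched in a few lines. The hash construction and the single shared load counter below are simplifications: in the scheme described above, each upstream source maintains its own local load estimates.

```python
import hashlib

class PKGPartitioner:
    """Sketch of Partial Key Grouping: each key has two candidate workers
    ("power of two choices"); each tuple goes to the locally less-loaded one."""
    def __init__(self, num_workers):
        self.load = [0] * num_workers
        self.n = num_workers

    def _hash(self, key, seed):
        digest = hashlib.md5(f"{seed}:{key}".encode()).hexdigest()
        return int(digest, 16) % self.n

    def route(self, key):
        a, b = self._hash(key, 0), self._hash(key, 1)  # key splitting
        w = a if self.load[a] <= self.load[b] else b   # local load estimation
        self.load[w] += 1
        return w
```

Because every key may be served by two workers, downstream operators must merge partial per-key state, which is why PKG fits associative aggregations naturally.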
We design a federated publish/subscribe system for the Internet of Things (IoT), named the Messaging Coordination Platform (MCP). The MCP system is composed of a cluster of servers in the cloud, which collaborate to support a large number of IoT devices (e.g., sensors, smart phones, etc.). Our design emphasizes the coordination among many MCP servers for efficient publish/subscribe functionality. The MCP system has two key components: first, a scalable membership service that provides local view maintenance, attribute replication, and group communication; second, an overlay that is designed for the optimization of publish/subscribe protocols. Our empirical evaluation demonstrates the performance of the MCP system.
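A single-server starting point for such a publish/subscribe layer can be sketched as below. This is a toy broker for illustration only; MCP's actual contribution is the coordination among many such servers, which this sketch does not attempt.

```python
from collections import defaultdict

class Broker:
    """Toy topic-based publish/subscribe broker (single server)."""
    def __init__(self):
        self.subs = defaultdict(list)  # topic -> subscriber callbacks

    def subscribe(self, topic, callback):
        self.subs[topic].append(callback)

    def publish(self, topic, message):
        # copy the list so callbacks may themselves subscribe safely
        for cb in list(self.subs[topic]):
            cb(message)

broker = Broker()
readings = []
broker.subscribe("sensors/temp", readings.append)
broker.publish("sensors/temp", 21.5)
broker.publish("sensors/humidity", 0.4)  # no subscriber for this topic
```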