Social Graph Anonymization Huu-Hiep Nguyen To cite this version: Huu-Hiep Nguyen. Social Graph Anonymization. Cryptography and Security [cs.CR]. Université de Lorraine, 2016. English. NNT: 2016LORR0168. tel-01403474v2 HAL Id: tel-01403474 https://hal.inria.fr/tel-01403474v2 Submitted on 7 Apr 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. AVERTISSEMENT Ce document est le fruit d'un long travail approuvé par le jury de soutenance et mis à disposition de l'ensemble de la communauté universitaire élargie. Il est soumis à la propriété intellectuelle de l'auteur. Ceci implique une obligation de citation et de référencement lors de l’utilisation de ce document. D'autre part, toute contrefaçon, plagiat, reproduction illicite encourt une poursuite pénale. Contact : [email protected] LIENS Code de la Propriété Intellectuelle. articles L 122. 4 Code de la Propriété Intellectuelle. articles L 335.2- L 335.10 http://www.cfcopies.com/V2/leg/leg_droi.php http://www.culture.gouv.fr/culture/infos-pratiques/droits/protection.htm E´cole doctorale IAEM Lorraine Anonymisation de Graphes Sociaux ∴ ∵ ∴ Social Graph Anonymization ` THESE pr´esent´ee et soutenue publiquement le date pour l’obtention du Doctorat de l’Universit´e de Lorraine (mention informatique) par NGUYEN Huu-Hiep Composition du jury Rapporteurs : Cl´emence Magnien CNRS, LIP6 Catuscia Palamidessi INRIA, LIX Examinateurs : Isabelle Chrisment Universit´e de Lorraine, LORIA Sylvain Peyronnet ix-labs Emmanuel Viennet Universit´e Paris 13, L2TI Directeurs de thèse : Abdessamad Imine Universit´e de Lorraine, LORIA Micha¨el Rusinowitch INRIA, LORIA Institut National de Recherche en Informatique et en Automatique Laboratoire Lorrain de Recherche en Informatique et ses Applications — UMR 7503 Mis en page avec la classe thesul. Acknowledgements I am deeply indebted to my PhD thesis advisors, Michaël Rusinowitch and Abdessamad Imine, for giving me an excellent opportunity to conduct research on privacy for social networks, for supporting me throughout the three-year PhD period. Their great inspiration, precise guidance and constant encouragement have helped me a lot to improve important skills of a researcher. I have learnt a great deal of problem exploration and solving from them. They have taught me how to develop from half-baked ideas to complete and practical contributions which, in the end, need to be presented in convincing arguments. I am very grateful to the members of my dissertation committee, Dr. Clémence Magnien, Dr. Catuscia Palamidessi, Professor Isabelle Christment, Dr. Sylvain Peyronnet and Professor Emmanuel Viennet for having accepted to assess my thesis and for their reviewing efforts and insightful remarks. TobeamemberofPESTOteamisalsoanhonourforme. Iwouldliketothankallthepeople of PESTO team, including current and former people that I have had chance to interact with in a wonderful environment, Christophe Ringeissen, Mathieu Turuani, Véronique Cortier, Steve Kremer, Laurent Vigneron, Vincent Cheval, Jannik Dreier, Walid Belkhir, Houari Mahfoud, Éric Le Morvan, Ludovic Robin, Ivan Gazeau, Alicia Filipiak and Joseph Lallemand. Additionally, I am greatly thankful the INRIA-CORDI fellowship that helped my PhD work possible. Many thanks also go to my Vietnamese friends in Nancy, who have been with me to share memorable moments. Last, andmostimportant, Iwouldliketothankmyparentsfortheirloveandsupportduring my stay away from home. It is to them that I dedicate this thesis. i ii To my parents. iii iv Abstract Privacy is a serious concern of users in daily usage of social networks, especially when online social networks offer free services in exchange for large collection of user information. The motto “If you’re not paying for it; you’re the product” demands effective measures for user privacy protection. At the same time, social networks are a valuable data source for large-scale studies on social organization and evolution. Sanitized social network information is therefore occasionally shared with third parties by service providers. On the other side, by participating to social networks, users keep their own data and they may use the very infrastructure of service providers to gather local view of the network to some extent not only restricted to 1-hop friends, for example by exchanging noisy links. To this end, the problems of privacy protection for social network data are still calling for effective and efficient approaches both in centralized and decentralized settings. This thesis addresses three privacy problems of social networks: graph anonymization, pri- vatecommunitydetectionandprivatelinkexchange. Themaingoalistoprovidenewparadigms for publication of social graphs in noisy forms, private community detection over graphs as well as distributed aggregation of graphs via noisy link exchange processes. We start the thesis by giving the big picture of data privacy in social networks and clarifying the categories to which our work belongs. Then we present our three contributions as follows. First, wetackletheproblemofgraphanonymizationviatwodifferentsemantics: uncertainty semantics and differential privacy. These are two main categories of graph anonymization in the literature. Asforuncertaintysemantics,weproposeageneralobfuscationmodelcalledUncertain Adjacency Matrix (UAM) that keeps expected node degrees equal to those in the unanonymized graph. Weanalyzetworecentlyproposedschemesandshowtheirfittingintothemodel. Wealso point out disadvantages in each method and present our scheme Maximum Variance (MaxVar) to fill the gap between them. Moreover, to support fair comparisons, we develop a new tradeoff quantifying framework by leveraging the concept of incorrectness in location privacy research. Experiments on large social graphs demonstrate the effectiveness of MaxVar. Apartfromprivacynotionviauncertaintysemantics,wecontributeanewalgorithmforgraph anonymization under differential privacy, also known as (cid:15)-DP graph publication. However, the problem is very challenging because of the huge output space of noisy graphs, up to 2n(n−1)/2. In addition, a large body of existing schemes on differentially private release of graphs are not consistent with increasing privacy budgets as well as do not clarify the upper bounds of privacy budgets. In this thesis, we categorize the state-of-the-art of (cid:15)-DP graph publication in two main groups: direct publication schemes and model-based publication schemes. On the one hand, we explain why model-based publication schemes are not consistent and are suitable only in scarce regimes of privacy budget. On the other hand, we prove that with a privacy budget of O(lnn), there exist direct publication schemes capable of releasing noisy output graphs with edge edit distance of O(1) against the true graph. We introduce the new linear scheme Top-m-Filter (TmF) and improve the existing technique EdgeFlip. Both of them exhibit consistent behaviour with increasing privacy budgets while the model-based publication schemes do not. As for better scalability, we also introduce HRG-FixedTree, a fast permutation sampling, to learn the Hierarchical Random Graph (HRG) model. Thorough comparative evaluation on a wide range of graphs provides a panorama of the state-of-the-art’s performance as well as validates our proposed schemes. Second, we present the problem of community detection under differential privacy. Complex networks usually expose community structure with groups of nodes sharing many links with the other nodes in the same group and relatively few with the nodes of the rest. This feature captures valuable information about the organization and even the evolution of the network. Over the last decade, a great number of algorithms for community detection have been pro- posed to deal with the increasingly complex networks. However, the problem of doing this in a private manner is rarely considered. We analyze the major challenges behind the problem and propose several schemes to tackle them from two perspectives: input perturbation and al- gorithm perturbation. We choose Louvain method as the back-end community detection for input perturbation schemes and propose the method LouvainDP which runs Louvain algorithm on a noisy super-graph. For algorithm perturbation, we design ModDivisive using exponential mechanism with the modularity as the score. We have thoroughly evaluated our techniques on real graphs of different sizes and verified their outperformance over the state-of-the-art. Finally, we propose protocols for private link exchange over social graphs. It is motivated by the fact that current online social networks (OSN) keep their data secret and in centralized manner. Researchers are allowed to crawl the underlying social graphs (and data) but with limitedrates,leadingtoonlypartialviewsofthetruesocialgraphs. Toovercomethisconstraint, we may start from user perspective, the contributors of the OSNs. More precisely, if users cautiouslycollaboratewithoneanother,theycanexchangenoisyfriendlistswiththeirneighbors in several rounds to get better views of the true social graph. The problem is unique in the sense that the disseminated data over the links are the links themselves. However, there exist fundamental questions about the feasibility of this model. The first question is how to define simpleandeffectiveprivacyconceptsforthelinkexchangeprocesses. Thesecondquestioncomes from the high volume of link lists in exchange which may increase exponentially round after round. While storage and computation complexity may not be big problems, communication costs are non-trivial. We address both the questions by a simple (α,β)-exchange using Bloom filters. Weevaluateourproposedschemesonvarioussyntheticgraphmodelsanddrawanumber of critical findings.
Description: