Anonymous or not? Understanding the Factors Affecting Personal Mobile Data Disclosure
CHRISTOS PERENTIS, Telecom Italia - SKIL & Fondazione Bruno Kessler
MICHELE VESCOVI, Telecom Italia - Semantics & Knowledge Innovation Lab (SKIL)
CHIARA LEONARDI, Fondazione Bruno Kessler (FBK)
CORRADO MOISO, Telecom Italia - Future Center
MIRCO MUSOLESI, University College London
FABIO PIANESI, Fondazione Bruno Kessler
BRUNO LEPRI, Fondazione Bruno Kessler

The wide adoption of mobile devices and social media platforms has dramatically increased the collection and sharing of personal information. More and more frequently, users are called upon to make decisions concerning the disclosure of their personal information. In this study, we investigate the factors affecting users' choices regarding the disclosure of their personal data. These include not only their demographic and self-reported individual characteristics, but also their social interactions and mobility patterns inferred from months of mobile phone data activity. We report the findings of a field study conducted with a community of 63 subjects provided with (i) a smartphone and (ii) a Personal Data Store (PDS) enabling them to control the disclosure of their data. We monitor the sharing behavior of our participants through the PDS, and evaluate the contribution of different factors affecting their choices to disclose location and social interaction data. Our analysis shows that social interaction inferred from mobile phones is an important factor revealing willingness to share, regardless of the data type. In addition, we provide further insights on the individual traits relevant to the prediction of sharing behavior.

CCS Concepts: • Human-centered computing → Ubiquitous and mobile computing systems and tools; • Security and privacy → Social aspects of security and privacy;

Additional Key Words and Phrases: Human Factors, Privacy, Personal Mobile Data, Mobile Sensing, Social Computing, Living Labs.
ACM Reference Format:
Christos Perentis, Michele Vescovi, Chiara Leonardi, Corrado Moiso, Mirco Musolesi, Fabio Pianesi and Bruno Lepri, YYYY. Anonymous or not? Understanding the Factors Affecting Personal Mobile Data Disclosure. ACM Trans. Internet Technol. V, N, Article A (January YYYY), 19 pages.
DOI: http://dx.doi.org/10.1145/0000000.0000000

Authors' addresses: C. Perentis, Telecom Italia - Semantics & Knowledge Innovation Lab (SKIL) & Fondazione Bruno Kessler, Via Sommarive 18, 38123 Trento, Italy; M. Vescovi, Telecom Italia - SKIL, Via Sommarive 18, 38123 Trento, Italy; C. Moiso, Telecom Italia - Future Center, via Reiss Romoli 274, 10148 Torino, Italy; M. Musolesi, Department of Geography, University College London, Gower Street WC1E 6BT, London, United Kingdom; C. Leonardi, F. Pianesi and B. Lepri, Fondazione Bruno Kessler, Via Sommarive 18, 38123 Trento, Italy.

ACM Transactions on Internet Technology, Vol. V, No. N, Article A, Publication date: January YYYY.

1. INTRODUCTION
The wide adoption of mobile phones, Internet services and social media platforms, together with the proliferation of wearable devices and connected objects (Internet of Things), have resulted in a massive production of personal data. These data characterize many aspects of daily life at extremely fine temporal and spatial granularities [Lane et al. 2010; Madan et al. 2012; Bettini and Riboni 2015].

The availability of such a huge amount of data is an invaluable resource for designing and building systems able to understand the needs and activities of people and communities, so as to provide tailored feedback and services [Lathia et al. 2013]. At the same time, an increasing number of applications make it easier for people to share their personal information (e.g., current location, activities in which they are involved and other contextual information) across many social networking applications and mobile apps [Hsieh et al. 2007; Miluzzo et al. 2008; Tang et al. 2006]. These scenarios, however, raise unprecedented privacy challenges and concerns. Users are continuously called upon to make decisions about the disclosure of their personal information on the basis of a difficult trade-off. On one side is data protection, given the potential for user identification [de Montjoye et al. 2013; de Montjoye et al. 2015; Rossi and Musolesi 2014; Rossi and Musolesi 2015]; on the other are the advantages stemming from data sharing [Acquisti et al. 2015].

Several researchers have therefore started investigating the role of various factors in influencing the attitude towards data disclosure. Examples include interpersonal relationships [Consolvo et al. 2005; Wiese et al. 2011]; user characteristics such as gender [Hoy and Milne 2010], age [Christofides et al. 2012] or personality traits [Quercia et al. 2012; Schrammel et al. 2009]; and the type of the shared data [Knijnenburg et al. 2013].

Our study takes a further step in this direction. Beyond demographics, self-reported personality traits and privacy dispositions, our work takes into account the role played by behavioral information about social interactions and mobility patterns extracted from the user's mobile phone. We focus in particular on the sharing of two data types: locations and social interactions.
In order to investigate all these factors, we ran a field study with a community of 63 subjects. They were provided with (i) a smartphone running sensing software explicitly designed for collecting mobile phone data; and (ii) a Personal Data Store (PDS). The PDS is a system meant both to raise subjects' awareness of their data and to let them control its disclosure to the other members of the community, while also keeping track of their actual sharing behavior. A relevant aspect of our approach is that we observe the actual sharing behavior on real user data rather than attitudes expressed through questionnaires.

Personal Data Stores (PDS) are systems designed to give users control over their choices to disclose personal data to third parties (e.g., online apps and services). More specifically, such systems enable services to access personal data and metadata through mechanisms that preserve users' privacy [Mun et al. 2010; Moiso et al. 2012; de Montjoye et al. 2014]. By design, they are meant to create a trusted environment where other mobile/web services interact with the user, e.g., services using communication, location or sensor data. In addition, users can actively see their data being fed to the online services and the potential benefit they receive from them.

We may think about a scenario where the personal information derived from Internet services and from the PDS is used to design and enhance privacy-preserving systems. For instance, a designer could use the most informative behavioral features to personalize default privacy settings or to recommend sharing policies adaptively.

Our results show that it is possible to identify behavioral routines related to information disclosure by extracting features, for example, from call and SMS data or from the usage of a PDS Internet service.
In other words, we can single out key factors that help explain users' privacy-related behaviors. Moreover, we can highlight meaningful combinations of factors that maximize the understanding of the issues related to the disclosure of personal data. These factors can be derived from mobile data, from behavioral patterns of PDS usage or from individual characteristics. Such insights could encourage the development of more transparent Internet services.

The main contributions of this work can be summarized as follows. First, we run a field study within a living lab where people continuously share their real data. In this experimental setting we capture the dynamic sharing behavior of users concerning personal information, and not just a static choice. Second, we compute several families of features. These cover self-reported demographics, personality traits and privacy attitudes, but also communication and mobility behavior captured by mobile phones, as well as usage patterns extracted from a PDS. Finally, we experimentally evaluate and highlight the effects of those factors on the choices users make when selecting their privacy settings for two particular types of personal data: location and social interactions.

2. RELATED WORK
Previous research has considered a number of factors that can explain individual attitudes and preferences toward disclosing personal information. Demographic characteristics, such as gender and age, have been found to affect disclosure attitudes and behavior. Several studies have identified gender differences in privacy concerns and the resulting information disclosure behaviors. For example, women are generally more protective of their online privacy with regard to the amount of data disclosed on social networking platforms [Hoy and Milne 2010].
Similarly, in a study on Facebook usage, Fogel and Nehmad [2009] found that women are less likely than men to share personal data on their profile page, such as their instant messenger address, home place or phone number. Age also plays a role in information disclosure behavior. For example, in a study of Facebook usage with 288 adolescents and 285 adults, Christofides et al. [2012] found that adolescents disclose more information than adults.

Prior work also emphasizes the role of personality traits, i.e., stable individual psychological attributes, in explaining risk perception and the resulting information disclosure behavior. Korzaan et al. [2009] explored the role of the Big-5 personality traits [Costa and McCrae 2008]. They found that Agreeableness, defined as being sympathetic, straightforward and selfless, has a significant influence on individual concerns for information privacy. Junglas et al. [2008] and Amichai-Hamburger and Vinitzky [2010] also used the Big-5 personality traits and found that Agreeableness, Conscientiousness, and Openness to Experience affect the concern for privacy. However, other studies targeting the influence of personality traits did not find significant correlations [Schrammel et al. 2009; Massa et al. 2015]. An interesting and extensive study was conducted by Quercia et al. [2012] with 1,313 Facebook users in the US. The authors investigated the role of the Big-5 personality traits and found weak correlations between disclosure attitudes on Facebook and Openness to Experience and, to a lesser extent, Extraversion.
In 2010, Lo [2010] suggested that Locus of Control [Rotter 1966] could affect the individual perception of risk in disclosing personal information. Internals are people who believe that their own actions mostly determine their life events, while externals believe that mostly external factors determine them. According to Lo, internals are more likely than externals to feel that they can control the risk of becoming privacy victims, and hence are more willing to disclose/share their personal information. Additional work has also shown a positive association between users' sociability, captured by their personal network size, and their information disclosure behavior. Subjects characterized by high sociability tend to share more information and to have fewer privacy concerns [Young and Quan-Haase 2009].

Building on these findings and following the suggestions by Jensen et al. [2005], our work connects demographic factors, individual traits and dispositions to the actual sharing behavior of people rather than to attitudes expressed through questionnaires. Moreover, we also consider behaviors directly measured (i.e., inferred) from the data themselves, e.g., number of calls, diversity in interactions and physical distance traveled.

3. FIELD STUDY
In this section we describe the methodology followed during our 15-week study.

3.1. The Living Laboratory
We conducted our field study within the Mobile Territorial Lab (MTL) project [2012], a long-term living lab launched in November 2012 as a joint effort between industrial and academic research institutions [Centellegher et al. 2016]. It consists of a group of volunteers who carry an instrumented smartphone in their daily life in exchange for a monthly credit bonus of voice, SMS, and data access.
Specifically, participants are provided with:

(i) an Android-based smartphone running sensing software that continuously collects different types of mobile phone data (e.g., communication events, location, app usage, etc.) [Aharony et al. 2011];

(ii) a tool called Personal Data Store (PDS) [de Montjoye et al. 2014], which stores the participant's information and gives her/him full control over the management of her/his own data [Vescovi et al. 2014].

By using the PDS, subjects can decide at any time whether and how to disclose their data to the other participants. One of the most important characteristics of MTL is its ecological validity. Participants' behaviors are sensed in the real world, as people live their everyday life, and not under artificial laboratory conditions.

All volunteers were recruited within the target group of young families with children using a snowball sampling approach, in which study subjects recruit future subjects from among their acquaintances [Goodman et al. 1961]. Upon agreeing to the terms of participation, the volunteers granted researchers legal access to the behavioral data collected by their smartphones. However, volunteers retain full rights over their personal data and can ask for the collected information to be deleted from the secure storage servers. Moreover, participants can choose whether or not to take part in any specific study.

In the current paper, we report a study conducted with 63 individuals (20 males and 43 females) from the MTL community. Participants' ages ranged from 28 to 46 years (mean = 38.67, standard deviation = 3.34). They held a variety of occupations and education levels, ranging from high school diplomas to PhD degrees. All were savvy Android users who had been using the smartphones provided by the living lab for 8 months before the study. All participants lived in Italy and the vast majority were of Italian nationality.
The sample is characterized by medium-low social connectivity. On average, subjects declared that they knew 7.94 other subjects (out-degree) and were reported as known by 7.84 subjects (in-degree). In the following subsections, we outline the procedure adopted for the current study. We then describe in more detail the mobile sensing platform, the PDS, and the collected data about participants' demographic characteristics and individual traits.

3.2. Experimental Setup
The study lasted 15 weeks, from July to November 2013. Before the official beginning of the study, participants were asked to fill in a survey including scales targeting: (i) Big-5 personality traits [Perugini and Di Blas 2002], (ii) Locus of Control [Farma and Cortivonis 2000], (iii) Dispositional Trust [Mayer and Davis 1999], (iv) Self-Disclosure [Cozby 1973], and (v) privacy concerns [Smith et al. 1996].

Fig. 1: Individual Views: example of a PDS individual view for call interactions.

On the first day of the study, participants were asked to set their initial disclosure preferences in the privacy settings area provided by the PDS. From then on, subjects were free to change their settings at any time. A week later, we started providing subjects with the social views (see Figure 2) built from the data disclosed in the community. Both the individual and the social views are generated by the PDS. At the end of the study, subjects were asked to set their final sharing preferences on the PDS.

3.3. Mobile Sensing Platform
The sensing software runs passively and does not interfere with the normal usage of the phone. It is configured so that battery-intensive actions (e.g., GPS and Bluetooth scans) are performed at intervals that keep the data useful while minimizing battery consumption.
The data collected consisted of:

i) call logs;
ii) SMS logs;
iii) proximity data obtained by scanning nearby phones and other Bluetooth devices;
iv) location data obtained using GPS or localized WiFi.

Bluetooth and GPS scans were performed every 5 minutes. Note that in this study we use 5 months (February to June 2013) of collected data to compute several behavioral features.

3.4. Personal Data Store
The PDS is a digital space, owned and controlled by the user through a Web interface. It acts as a repository for the personal information collected during the study and lets every user view, control and disclose her/his own data. Data were organized in "regions" by grouping data with a similar meaning. For example, location data were organized in the same "region", independently of whether they were collected through GPS or a WiFi hit.

One section of the PDS was designed to provide users with visualizations of their (always up-to-date) personal data. Two types of Individual Views were provided for each kind of owned data:
- a detailed view (in tables or maps), where every available piece of raw data is represented in detail;
- aggregated views (see Figure 1), which aggregate the personal data at different levels (e.g., charts, pies, clusters of frequent locations, quantity of contacts, etc.).

Fig. 2: Social Views: example of a PDS social view for call interactions.

The PDS also features a Sharing Area [Vescovi et al. 2014] (see Figure 3), where subjects set the desired disclosure level of their data, chosen among: (i) Do Not Share; (ii) Share Anonymously; (iii) Share Non-Anonymously (i.e., labeling the data with some personal demographic information). Finally, subjects' choices are directly reflected in the Social Views, shown in Figure 2, which are built out of the personal data disclosed by the participants.
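The three Sharing Area levels can be read as an ordered scale that gates what a subject sees in the Social Views. The Python sketch below is a minimal illustration, not the PDS implementation. The names `Disclosure` and `visible_entry` and the record fields are hypothetical. The rule that an owner's demographic label appears only when both owner and viewer share non-anonymously is an assumption modeled on the reciprocal view-access rules a) to c) stated in the following paragraph.

```python
from enum import IntEnum


class Disclosure(IntEnum):
    """Sharing Area levels, ordered from least to most open (hypothetical encoding)."""
    DO_NOT_SHARE = 0
    SHARE_ANONYMOUSLY = 1
    SHARE_NON_ANONYMOUSLY = 2


def visible_entry(owner_level, viewer_level, owner_record):
    """Return what a viewer sees of one owner's record in a social view, or None.

    Assumed reciprocal rules (illustrative only):
    - if either party chose Do Not Share, the record is not shown;
    - the owner's demographic label is shown only when both owner and viewer
      share non-anonymously;
    - otherwise the record appears as an anonymous entry.
    """
    if Disclosure.DO_NOT_SHARE in (owner_level, viewer_level):
        return None
    both_open = owner_level == viewer_level == Disclosure.SHARE_NON_ANONYMOUSLY
    label = owner_record["demographics"] if both_open else "anonymous"
    return {"label": label, "value": owner_record["value"]}


# Usage: a viewer sharing anonymously sees every column labeled "anonymous".
record = {"demographics": "F, 35-39", "value": 42}
print(visible_entry(Disclosure.SHARE_NON_ANONYMOUSLY, Disclosure.SHARE_ANONYMOUSLY, record))
# {'label': 'anonymous', 'value': 42}
```

Using `IntEnum` keeps the levels comparable, which matches the idea that a higher disclosure level unlocks more detailed social comparison.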
Participants could access the social views on their PDS at any time, and any change to the sharing/disclosure settings had an immediate effect on the material displayed in them. Thus, the comparisons a subject could make between her/his behavior and that of the others depended on the subject's current sharing settings. In more detail, for a given person, data type and time:

a) if the setting was Do Not Share, the corresponding social views did not use the corresponding data, and the user was prevented from accessing any of them;
b) if the setting was Share Anonymously, only the subject's aggregated and anonymous data were made available in social views, and she/he could access the others' data only in the same format;
c) with a Share Non-Anonymously setting, the relevant data were presented with information about the subject, and the subject could access all the information similarly disclosed by the other users.

In summary, the level of disclosure and the social views worked in full synchrony. The higher the chosen disclosure level, the more detailed the information made available about the others, and the richer the social comparison. Examples of such views are "How social am I?" and "How long have I been on the phone w.r.t. others?". Figure 2 presents the latter view for a user sharing her/his data non-anonymously. The red column represents the user, while the horizontal axis reports the information related to the other users sharing "non-anonymously". If the user were sharing anonymously, all the columns would be labeled as "anonymous".

Fig. 3: PDS Sharing Area: allows users to set their personal data disclosure preferences.

3.5. Demographics, Personality and Other Individual Characteristics
We collected different types of information from our subjects, including demographics, self-reported personality traits and attitudes towards privacy. Descriptive statistics for the following scale scores are provided in Table II.

Demographic Information. As pointed out in Section 2, there have been several attempts to associate privacy concerns and sharing behavior with demographic information. In our case, we used participants' age and gender.

Personality and Individual Traits. In our study, Big-5 personality traits are measured by means of the BFMS questionnaire [Perugini and Di Blas 2002]. This scale is validated for Italian and covers the traditional dimensions of Extraversion, Neuroticism, Agreeableness, Conscientiousness and Openness to Experience. The scale consists of 10 adjectives per personality trait, each rated from 1 to 7. The score for each Big-5 trait is obtained by summing the points of its 10 adjectives.

We also used the Locus of Control (LoC) [Rotter 1966], a psychological construct measuring whether causal attribution for one's behavior or beliefs is made to oneself or to external events or circumstances. The corresponding scale consists of a set of beliefs about whether the outcomes of one's actions depend on what the subject does (internal orientation) or on events outside of her/his control (external orientation). Locus of Control was measured by asking subjects to fill in the Italian version of Craig's Locus of Control scale [Farma and Cortivonis 2000]. This scale is composed of 17 questions, each rated from 0 to 5. Each participant's Locus of Control score is computed by summing up the points of each item.

Another construct we take into account is Dispositional Trust. Rotter [1967] was among the first to discuss trust as a form of personality trait, defining interpersonal trust as a generalized expectancy that the words or promises of others can be relied on. In our study, we use the Trust Propensity Scale of Mayer and Davis [1999].
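The scale scores in Section 3.5 are plain sums of item ratings. The sketch below shows that arithmetic using the item counts and rating ranges stated in the text: 10 adjectives rated 1-7 per Big-5 trait, 17 Locus of Control items rated 0-5, 8 trust items rated 1-7 and 15 CFIP items rated 1-7. It also covers the $(32 - \mathit{factor}_a) + \mathit{factor}_b$ construct that the text uses for three self-disclosure dimensions. All function names are hypothetical, and the questionnaire items assigned to factor $a$ or $b$ are not specified here. Factor $a$ always has 4 items rated 1-7, so $32 - \mathit{factor}_a$ equals the sum of the reversed ratings $8 - x$. This suggests, though the text does not say so, that factor $a$ collects reverse-keyed items.

```python
def sum_score(ratings, n_items, low, high):
    """Sum a respondent's ratings after checking item count and rating range."""
    if len(ratings) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(ratings)}")
    if not all(low <= r <= high for r in ratings):
        raise ValueError(f"ratings must lie in [{low}, {high}]")
    return sum(ratings)


def big5_trait(ratings):
    # 10 adjectives per trait, rated 1-7: possible range 10-70.
    return sum_score(ratings, 10, 1, 7)


def locus_of_control(ratings):
    # 17 items rated 0-5: possible range 0-85.
    return sum_score(ratings, 17, 0, 5)


def dispositional_trust(ratings):
    # 8 items rated 1-7: possible range 8-56.
    return sum_score(ratings, 8, 1, 7)


def privacy_concerns(ratings):
    # CFIP: 15 items rated 1-7: possible range 15-105.
    return sum_score(ratings, 15, 1, 7)


def self_disclosure_combined(factor_a_ratings, factor_b_ratings):
    """Dimensions (i), (ii), (iv): (32 - factor_a) + factor_b.

    factor_a always has 4 items rated 1-7, so 32 - factor_a equals the sum of
    the reversed ratings (8 - x). factor_b has 3 items for (i) and (ii), 4 for (iv).
    """
    if len(factor_b_ratings) not in (3, 4):
        raise ValueError("factor_b has 3 or 4 items")
    factor_a = sum_score(factor_a_ratings, 4, 1, 7)
    factor_b = sum_score(factor_b_ratings, len(factor_b_ratings), 1, 7)
    return (32 - factor_a) + factor_b


print(dispositional_trust([3, 4, 3, 3, 2, 4, 3, 4]))  # 26
print(self_disclosure_combined([2, 3, 4, 5], [6, 6, 6]))  # 36
```

The range checks are a defensive choice. They make a mis-keyed questionnaire export fail loudly instead of silently producing an out-of-range score.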
The Dispositional Trust scale has 8 items, each rated on a 1 to 7 point scale. To obtain the final trust score for each subject, we sum up the points of each item.

Finally, we targeted the self-disclosure attitudes of our subjects. Self-disclosure has been defined as any message about the self that an individual communicates to another [Cozby 1973]. We use Wheeless's scale, which has been used to measure self-disclosure in online communication and in interpersonal relationships [Wheeless and Grotz 1976]. Specifically, we measure five dimensions of self-disclosure, each rated on a 1-7 scale:

(i) amount of disclosure (7 items);
(ii) positive-negative nature of disclosure (7 items);
(iii) consciously intended disclosure (4 items);
(iv) honesty & accuracy of disclosure (8 items);
(v) general depth or intimacy of disclosure (5 items).

The final score for dimensions (iii) and (v) is the sum of the points collected from the corresponding items. In contrast, for dimensions (i) and (ii), 4 items sum up to factor $a$, while 3 items sum up to factor $b$. For dimension (iv), 4 items sum up to factor $a$ and 4 items sum up to factor $b$. In each case, the final score for dimensions (i), (ii) and (iv) is given by $(32 - \mathit{factor}_a) + \mathit{factor}_b$.

Table I: Frequency Table for Initial Location & Interactions Privacy Setting.

                          Dependent Variables         Transformed Dependent Variables
Privacy Preference        Location  Interactions      Location  Interactions
Do Not Share                     2             1             -             -
Share Anonymously               21            22            22            23
Share Non-Anonymously           39            39            39            39
Total                           63            63            61            62

Privacy Concerns. Information about privacy concerns was collected using the Concern for Information Privacy (CFIP) scale developed by Smith et al. [1996]. This scale addresses the individual's concerns about organizational information privacy practices along four data-related dimensions: collection, unauthorized secondary use, errors and improper access to personal information. The concerns are measured with 15 items, each rated on a 1 to 7 point scale. The final score is computed by summing all the responses.

Social Relationships within the Community. Each user was asked to indicate the people she/he knew within the community.

4. METHODOLOGY
Our goal is to understand the effect of a wide range of variables on the disclosure decisions people make about their personal mobile data. To do this, we take two concrete steps.

First, we fit Binary Logistic Regression (BLR) models. These test separately how the sharing choices (dependent variables) are affected by each of the following families of independent variables, listed in Table II:
(i) demographic information;
(ii) psychological traits and other individual dispositions (Big-5 personality traits, Locus of Control, dispositional trust, privacy concerns, self-disclosure);
(iii) social relationships within the community;
(iv) dynamic behavior (communication and mobility);
(v) PDS access usage information.
Because we use backward elimination, testing each group separately also acts as a feature selection step.

Second, we keep exactly those features that showed a significant effect and construct two BLR classification models per data type to predict the sharing choices: an Overall model and a combined Mobile+PDS model. The Overall models combine the most effective predictors from the different families. The combined models use only behavioral mobile data and PDS usage features, which a PDS service could actually collect in a real-life scenario.

4.1. Dependent Variables: Sharing Choices
To model the disclosure of personal information, we construct one dependent variable for each data type: Sharing Location and Sharing Interactions (calls & SMS). Both are based on the final disclosure choices subjects set in the PDS. As mentioned, users were able to choose among three levels of sharing for each data type: Do Not Share, Share Anonymously and Share Non-Anonymously. Table I shows that the Do Not Share choice occurs rarely for both the location and the social interaction data. For this reason, we discarded the data instances with the sharing choice Do Not Share for both data types.

4.2. Independent Variables
4.2.1. Demographics, Personality and Other Individual Characteristics. In this paper, we take into account several characteristics of our study participants. Specifically, we focus on demographic data (age and gender), Big-5 personality traits, Locus of Control (LoC), dispositional trust, a measure of privacy concerns and the five variables describing self-disclosure (see Table II).

Furthermore, we extracted features describing participants' social network from their self-reports of which other people in the community they knew. More specifically, we computed the following variables:
(i) out-degree, i.e., the number of people that a person reports knowing;
(ii) in-degree, i.e., the number of people that reported knowing a specific person.
All variables describing individuals' characteristics are normalized scalar variables. The exception is gender (female/male), which is a dichotomous categorical variable.

Notice also that we could not determine how the choice of a "friend" affects one's disclosure option, because subjects did not know which privacy setting belonged to which user.
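The data preparation described in Sections 4.1 and 4.2.1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's pipeline:
- choices are hypothetical string labels, and Share Non-Anonymously is coded as the positive class;
- acquaintance reports are sets of community member IDs, with no self-reports;
- z-scoring stands in for the unspecified normalization of the scalar variables.

```python
from statistics import mean, pstdev

# Hypothetical coding of the two retained choices (Section 4.1).
TARGET = {"Share Anonymously": 0, "Share Non-Anonymously": 1}


def binary_targets(final_choices):
    """Drop 'Do Not Share' and map the remaining final choices to 0/1."""
    return {s: TARGET[c] for s, c in final_choices.items() if c != "Do Not Share"}


def degrees(knows):
    """Out-/in-degree from self-reports.

    knows[s] is the set of community members that subject s reports knowing
    (assumed to contain no self-reports).
    """
    out_degree = {s: len(known) for s, known in knows.items()}
    in_degree = dict.fromkeys(knows, 0)
    for known in knows.values():
        for t in known:
            if t in in_degree:  # count only reports about community members
                in_degree[t] += 1
    return out_degree, in_degree


def zscore(values):
    """Standardize a {subject: value} mapping; constant features map to 0.0."""
    xs = list(values.values())
    mu, sd = mean(xs), pstdev(xs)
    return {s: (v - mu) / sd if sd > 0 else 0.0 for s, v in values.items()}


knows = {"s1": {"s2", "s3"}, "s2": {"s1"}, "s3": set()}
out_deg, in_deg = degrees(knows)
print(out_deg, in_deg)  # {'s1': 2, 's2': 1, 's3': 0} {'s1': 1, 's2': 1, 's3': 1}
```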
Even in the best case, two acquaintances who both shared their data non-anonymously could see each other's demographic information (see Figure 2) but not each other's name. Therefore, we focused our analysis on features that characterize the size of subjects' social network, i.e., in/out-degree.

4.2.2. Dynamic Behavioral Data. We computed a number of features from participants' mobile phone usage behavior, to examine whether they are associated with personal information disclosure decisions. Table II lists all the behavioral features, computed over the aforementioned 5-month period, together with descriptive statistics. Firstly, we consider location and social interaction (calls & SMS) information, collected passively from the mobile phone.

For both social interaction data types (i.e., calls & SMS), we compute five features, adapted to each data type, as shown in Table II. The first three concern the whole period of the study (i.e., 5 months):
- the total number of calls (outgoing/incoming) and SMS (sent/received);
- the number (#) of unique call contacts and SMS contacts;
- the call and SMS diversity.
The last two quantify daily behavior, taking into account only the days on which users were actively communicating. Note that our community is very active, so for each participant the number of days of active communication is almost equal to the number of days of the study.

The diversity measure [Eagle et al. 2010] quantifies how individuals spread their time among their contacts. More precisely, it is given by the following formula:

$$D(i) = \frac{-\sum_{j=1}^{k} p_{ij} \log p_{ij}}{\log k} \qquad (1)$$

Here $p_{ij}$ is the volume of communication interactions (calls or SMS) between subjects $i$ and $j$, normalized by the total number of $i$'s calls or SMS. $k$ is the number of distinct individuals contacted by calls or SMS, respectively. High values of the diversity measure indicate that participants distribute their time more evenly among their contacts. Finally, we extract the daily average and standard deviation of the call and SMS events, using the days when users were active.
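The sketch below illustrates the communication features above and the mobility features introduced in the next paragraph:
- `diversity` implements Eq. (1) directly, with the convention (an assumption) that a subject with one contact has diversity 0.
- `daily_stats` counts events only on active days and uses the population standard deviation (also an assumption).
- For mobility, the haversine great-circle formula on a spherical Earth (radius 6371 km) is swapped in as an approximation of the geodesic distance. The filter for coordinates outside Italy is omitted.

All names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev

EARTH_RADIUS_KM = 6371.0  # mean Earth radius used by the spherical approximation


def diversity(contact_counts):
    """Eq. (1): entropy of a subject's interactions over contacts, divided by log k.

    contact_counts maps each contact j to the number of calls (or SMS) with j.
    Returns 0.0 when k <= 1, where log k = 0 (a convention assumed here).
    """
    counts = [n for n in contact_counts.values() if n > 0]
    k, total = len(counts), sum(counts)
    if k <= 1:
        return 0.0
    entropy = -sum((n / total) * math.log(n / total) for n in counts)
    return entropy / math.log(k)


def daily_stats(events):
    """Mean and population std of events per active day; events are (day, contact) pairs."""
    per_day = list(Counter(day for day, _ in events).values())
    return mean(per_day), pstdev(per_day)


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mobility_features(track):
    """Total distance and std of displacements for time-ordered (lat, lon) fixes."""
    displacements = [haversine_km(p, q) for p, q in zip(track, track[1:])]
    if not displacements:
        return 0.0, 0.0
    return sum(displacements), pstdev(displacements)


print(round(diversity({"a": 3, "b": 1}), 4))  # 0.8113
```

Because Eq. (1) divides by $\log k$, the logarithm base cancels. Natural logarithms therefore give the same $D(i)$ as any other base.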
To characterize participants' mobility behavior, we extract metrics quantifying the amount of movement and its deviation, which were recently used by Canzian and Musolesi [2015].

Regarding the amount, we compute the total distance covered by the subject, i.e., the sum of the geodesic distances between subsequent latitude and longitude coordinate pairs during the 5-month period. In addition, based on the days on which the user was found active, we compute the daily average distance covered. Note that we exclude coordinates outside Italy's territory for two reasons: (i) to capture everyday life behavior and (ii) to avoid outliers generated by very long distances between countries when traveling (e.g., by airplane). These features capture the amount of mobility of a subject.

Next, we measure the standard deviation of displacements, where a displacement is the distance between one visited pair of coordinates and the subsequent one. This measure quantifies how much each location transition deviates from the subject's overall movement. We also include the daily average of the standard deviation of displacements, which quantifies the deviation of the visited locations from the average daily movement.

Table II: All features included in the analysis, extracted from self-reported information, mobile data and PDS usage.

Data source          Category         Feature                       Mean       SD       Min       Max
Self-reported data   Demographics     Age                          38.67     3.34        28        46
(surveys)                             Gender                           -        -         -         -
                     Personality      Extraversion                 39.78    10.06        16        59
                                      Neuroticism                  32.25     7.29        13        47
                                      Agreeableness                49.78    6.893        35        63
                                      Conscientiousness            45.94     9.84        21        61
                                      Openness                     44.52     6.75        28        56
                     Other traits     Locus of Control                27    10.34         9        59
                                      Trust                        25.79     5.92        13        46
                                      Privacy Concerns             80.11    12.55        53       102
                     Self-disclosure  In(Un)tentional              21.35     4.36         9        28
                                      Disclosure Amount            26.48     8.74         8        46
                                      Positive-Negative            34.19     6.34        18        46
                                      Depth-Intimacy               15.94     6.71         5        34
                                      Honesty-Accuracy             39.84     7.89        20        52
                     Community SN     Out-degree                    7.94     4.45         2        22
                                      In-degree                     7.84     4.74         2        25
Mobile phone data    Calls            # Total Calls              1713.71   526.84       695      2876
                                      # Unique Call Contacts       163.4    50.59        70       339
                                      Call Diversity                0.71     0.06      0.53      0.85
                                      avg. Calls (daily)           12.53     3.74      5.39     22.29
                                      std. Calls (daily)            8.19     2.24      4.07     16.16
                     SMS              # Total SMS                1027.76   401.29       112      2036
                                      # Unique SMS Contacts        92.37    38.53        32       258
                                      SMS Diversity                 0.73     0.06      0.58      0.91
                                      avg. SMS (daily)              7.90     2.43      3.20     14.24
                                      std. SMS (daily)              5.80     1.83      2.08     10.09
                     Location         Total Distance             5604.34  2338.14   2305.94     11549
                                      std. Displacements          336.57   190.16     77.94   1106.25
                                      avg. Distance (daily)        40.82    16.11     16.83     90.94
                                      avg. std. Displ. (daily)      2.54     1.41      0.57      7.42
PDS usage data       Location         Individual Views              1.78     1.56         0         9
                                      Social Views                  1.33     1.32         0         5
                     Interactions     Individual Views              1.83     1.49         0         8
                                      Social Views                  1.37     1.46         0         6

4.2.3. Personal Data Store Usage. We also investigated the role played by Personal Data Store usage. For both location and interaction data types, we computed the total number of distinct days on which participants accessed the individual views and the social views. These metrics provide insights into how users used the tool and which kind of feedback (i.e., individual or social) they visited more often per data type.

4.3. Logistic Regression Analysis and Classification
As previously mentioned, we first investigate the predictive role played by the different groups of independent variables. Then, using for each group only the factors showing a significant effect, we build a combined Mobile+PDS and an Overall model for our