
Introduction to Social Sensing

61 Pages·2012·0.37 MB·English


Chapter 9

SOCIAL SENSING

Charu C. Aggarwal
IBM T. J. Watson Research Center
Yorktown Heights, NY
[email protected]

Tarek Abdelzaher
University of Illinois at Urbana Champaign
Urbana, IL
[email protected]

Abstract A number of sensor applications in recent years collect data which can be directly associated with human interactions. Some examples of such applications include GPS applications on mobile devices, accelerometers, or location sensors designed to track human and vehicular traffic. Such data lends itself to a variety of rich applications in which one can use the sensor data in order to model the underlying relationships and interactions. This requires the development of trajectory mining techniques, which can mine the GPS data for interesting social patterns. It also leads to a number of challenges, since such data may often be private, and it is important to be able to perform the mining process without violating the privacy of the users. Given the open nature of the information contributed by users in social sensing applications, this also leads to issues of trust in making inferences from the underlying data. In this chapter, we provide a broad survey of the work in this important and rapidly emerging field. We also discuss the key problems which arise in the context of this field and the corresponding solutions.

Keywords: Sensor Networks, Social Sensors, Cyber-physical Networks

238 MANAGING AND MINING SENSOR DATA

1. Introduction

The proliferation of numerous online social networks such as Facebook, LinkedIn and Google+ has led to an increased awareness of the power of incorporating social elements into a variety of data-centric applications. Such networks are typically data rich, and contain heterogeneous data along with linkage structure, which can be mined for a variety of purposes [39, 98, 108]. In particular, it has been observed that the use of a combination of social structure and different kinds of data can be a very powerful tool for mining purposes [136, 175, 182].
A natural way to enhance the power of such social applications is to embed sensors within such platforms in order to continuously collect large amounts of data for prediction and monitoring applications. This has led to the creation of numerous social sensing systems such as Biketastic [142], BikeNet [55], CarTel [88] and Pier [148], which use social sensors for a variety of transportation and personal applications. The fusion of mobile, social, and sensor data is now increasingly being seen as a tool to fully enable context-aware computing [20].

A number of recent hardware platforms have extended the data-centric capabilities of social networks, by providing the ability to embed sensor data collection directly into the social network. Therefore, it is natural to explore whether sensor data processing can be tightly integrated with social network construction and analysis. For example, methods such as crowd-sourcing are a natural approach for improving the accuracy of many socially-aware search applications [168]. Some of the aforementioned data types on a conventional social network are static and change slowly over time. On the other hand, sensors collect vast amounts of data which need to be stored and processed in real time. There are a couple of important drivers for integrating sensor and social networks:

One driver for integrating sensors and social networks is to allow the actors in the social network to both publish their data and subscribe to each other’s data either directly, or indirectly after discovery of useful information from such data. The idea is that such collaborative sharing on a social network can increase real-time awareness of different users about each other, and provide unprecedented information and understanding about global behavior of different actors in the social network. The vision of integrating sensor processing with the real world was first proposed in [177].
A second driver for integrating sensors and social networks is to provide a better understanding and measurement of the aggregate behavior of self-selected communities or the external environment in which these communities function. Examples may include understanding traffic conditions in a city, understanding environmental pollution levels, or measuring obesity trends. Sensors in the possession of large numbers of individuals enable exploiting the crowd for massively distributed data collection and processing. Recent literature reports on several efforts that exploit individuals for data collection and processing purposes, such as collection of vehicular GPS trajectories as a way of developing street maps [78], collectively locating items of interest using cell-phone reports, such as mapping speed traps using the Trapster application [190], use of massive human input to translate documents [145], and the development of protein folding games that use competition among players to implement the equivalent of global optimization algorithms [21].

The above trends are enabled by the emergence of large-scale data collection opportunities, brought about by the proliferation of sensing devices of everyday use such as cell-phones, pedometers, smart energy meters, fuel consumption sensors (standardized in modern vehicles), and GPS navigators. The proliferation of many sensors in the possession of the common individual creates an unprecedented potential for building services that leverage massive amounts of data collected from willing participants, or involving such participants as elements of distributed computing applications. Social networks, in a sensor-rich world, have become inherently multi-modal data sources, because of the richness of the data collection process in the context of the network structure. In recent years, sensor data collection techniques and services have been integrated into many kinds of social networks.
These services have caused a computational paradigm shift, known as crowd-sourcing [23, 47], referring to the involvement of the general population in data collection and processing. Crowd-sourcing, arguably pioneered by programs such as SETI, has become remarkably successful recently due to increased networking, mobile connectivity and geo-tagging [1]. We note that the phenomenon of crowd-sourcing is not exclusive to sensor data, but is also applied to other tagging and annotation processes, in which the knowledge is sourced from a social network of users. A classic example of a crowd-sourcing application is the Amazon Mechanical Turk [192], which allows users to submit data records for annotation in exchange for a fee. Thus, the Amazon Mechanical Turk serves as an intermediary for crowd-sourcing of annotations for data records.

In the case of social sensing, which is also often referred to as people-centric sensing [6, 26, 123] or participatory sensing [24], this crowd-sourcing is generally achieved through sensors which are closely attached to humans, either in wearable form, or in their mobile phones. Some examples of integration of social and sensor networks are as follows:

A variety of applications can be created to collect real time information from large groups of individuals in order to harness the wisdom of crowds in a variety of decision processes. For example, the Google Latitude application [184] collects mobile position data of users, and uses this in order to detect the proximity of users to their friends. This can lead to significant events of interest. For example, proximity alerts may be triggered when two linked users are within geographical proximity of one another. This may itself trigger changes in the user-behavior patterns, and therefore the corresponding sensor values. This is generally true of many applications: the data on one sensor can influence data in the other sensors.
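The proximity-alert idea described above can be illustrated with a small sketch. It is not how any particular application is implemented; the function names, the sample coordinates, the friendship list, and the 1 km radius are all assumptions made for illustration. It computes great-circle distances with the haversine formula and reports linked users who are close to one another.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def proximity_alerts(positions, friendships, radius_km=1.0):
    """Return the sorted list of linked user pairs whose positions are within radius_km.

    positions:   dict mapping user -> (lat, lon), hypothetical latest GPS fixes
    friendships: iterable of (user, user) links from the social network
    """
    alerts = set()
    for u, v in friendships:
        # Only linked users with a known position can trigger an alert.
        if u in positions and v in positions:
            if haversine_km(*positions[u], *positions[v]) <= radius_km:
                alerts.add(tuple(sorted((u, v))))
    return sorted(alerts)


# Hypothetical sample data: two users a few hundred metres apart, one far away.
positions = {
    "alice": (40.7128, -74.0060),
    "bob": (40.7138, -74.0050),
    "carol": (41.8781, -87.6298),
}
friendships = [("alice", "bob"), ("alice", "carol")]
alerts = proximity_alerts(positions, friendships, radius_km=1.0)
print(alerts)
```

Only linked pairs are checked, which keeps the work proportional to the number of social links rather than to all pairs of users. A deployed system would additionally need to handle stale positions and the privacy concerns discussed later in the chapter.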
Numerous other GPS-enabled applications such as Citysense, Macrosense, and Wikitude [185, 195, 191] serve as GPS-based social aggregators for making a variety of personalized recommendations. The approach has even been used for real-time grocery bargain hunting with the LiveCompare system [46].

Vehicle Tracking Applications: A number of real-time automotive tracking applications determine the important points of congestion in the city by pooling GPS data from the vehicles in the city. This can be used by other drivers in order to avoid points of congestion in the city. In many applications, such objects may have implicit links among them. For example, in a military application, the different vehicles may have links depending upon their unit membership or other related data. Two classic examples of vehicular applications in the context of participatory sensing are the CarTel [88] and GreenGPS [64] systems.

Trajectory Tracking: In its most general interpretation, an actor in a social network need not necessarily be a person, but can be any living entity such as an animal. Recently, animal tracking data has been collected with the use of radio-frequency identifiers. A number of social links may exist between the different animals such as group membership, or family membership. It is extremely useful to utilize the sensor information in order to predict linkage information and vice-versa. A recent project called MoveBank [186] has made tremendous advances in collecting such data sets. We note that a similar approach may be used for commercial product-tracking applications, though social networking applications are generally relevant to living entities, which are most typically people.

Applications to Healthcare: In recent years, numerous medical sensor devices have become available which can be used in order to track the personal health of individuals, or make other predictions about their lifestyle [41, 65, 84, 119, 121, 122, 150].
This can be used for emergency response, long term predictions about diseases such as dementia, or other lifestyle influence analysis of factors such as eating habits and obesity.

Social sensing applications provide numerous research challenges from the perspective of analysis. We list some of these challenges below:

Since the collected data typically contains sensitive personal data (e.g., location data), it is extremely important to use privacy-sensitive techniques [61, 133] in order to perform the analysis. A recent technique called PoolView [61] designs privacy-sensitive techniques for collecting and using mobile sensor data.

Sensors, whether wearable or embedded in mobile devices, are typically operated with the use of batteries, which have limited life. Certain kinds of sensor data collection can drain the battery more quickly than others (e.g., GPS vs. cell tower/WiFi location tracking in a mobile phone). Therefore, it is critical to design the applications with a careful understanding of the underlying tradeoffs, so that the battery life is maximized without significantly compromising the goals of the application.

The volume of data collected can be very large. For example, in a mobile application, one may track the location information of millions of users simultaneously. Therefore, it is useful to be able to design techniques which can compress and efficiently process the large amounts of collected data.

Since the data are often collected through sensors which are error-prone, or may be input by individuals without any verification, this leads to numerous challenges about the trustworthiness of the data collected. Furthermore, the goals of privacy and trust tend to be at odds with one another, because most privacy-preservation schemes reduce the fidelity of the data, whereas trust is based on high fidelity of the data.

Many of the applications require dynamic and real time responses.
For example, applications which trigger alerts are typically time-sensitive and the responses may be real-time. The real-time aspects of such applications may create significant challenges, considering the large number of sensors which are tracked at a given time.

This chapter is organized as follows. Section 2 briefly discusses some key technological advances which have occurred in recent years, which have enabled the design of such dynamic and embedded applications. Section 3 provides a broad overview of the key system design questions which arise in these different contexts. One of the important issues discussed in this section is privacy, which is discussed in even greater detail in a later section. Section 4 discusses some important privacy issues which arise in the context of social networks with embedded sensors. Section 5 discusses the trustworthiness issues which arise in such crowd-sourcing systems. Section 6 introduces techniques for social network modeling from dynamic links which are naturally created by the sensor-based scenario. Since such dynamic modeling often requires trajectory mining techniques, we present methods for trajectory mining in section 7. Section 8 introduces some of the key applications associated with social sensing. Section 9 discusses the conclusions and research directions.

2. Technological Enablers of Social Sensing

A number of recent technological advances in hardware and software have enabled the integration of sensors and social networks. One such key technological advance is the development of small mobile sensors which can collect a variety of user-specific information such as audio or video. Many of the applications discussed are based on user-location. Such location can easily be computed with the use of mobile GPS-enabled devices. For example, most recent smartphones have such GPS technology embedded inside them.
Some examples of such mobile sensor devices may be found in [117, 100].

Sensors typically collect large amounts of data, which must be continuously stored and processed. Furthermore, since the number of users in a social network can be very large, this leads to natural scalability challenges for the storage and processing of the underlying streams. For example, many naive solutions such as the centralized storage and processing of the raw streams are not very practical, because of the large number of streams which are continuously received. In order to deal with this issue, a number of recent hardware and software advances have turned out to be very useful.

Development of Miniaturized Sensor Technology: The development of miniaturized (wearable) sensors and batteries has allowed their use and deployment in a number of different social settings. For example, the development of miniaturized sensors, which can be embedded within individual attire, can be helpful in a wide variety of scenarios [42, 100, 63, 33, 34]. A classic example is the spec mote, which is an extremely small sensor device that can be embedded in the clothing of a user, while remaining quite unobtrusive.

Advancement of smartphone technology: In recent years, there has been considerable advancement in smartphone technology. Smartphones are now fairly sophisticated devices containing a wide array of sensors such as GPS, compass, accelerometers, and Bluetooth capabilities. In addition, these are convergent devices, with considerable computational capabilities, internet connectivity, and different modes of user interaction and content upload, such as social tweets and the ability to record pictures and videos. All of these capabilities create a rich content-based and sensing environment for a wide variety of applications.
Increased Bandwidth: Since sensor transmission typically requires large wireless bandwidth, especially when the data is in the form of audio or video streams, it is critical to be able to transmit large amounts of data in real time. The increases in available bandwidth in recent years have made such real time applications a reality.

Increased Storage: In spite of the recently designed techniques for compressing the data, storage for stream processing continues to be a challenge. Recent years have seen tremendous advances in hardware, which allow much greater storage than was previously possible.

Development of Fast Stream Processing Platforms: A number of fast stream processing platforms, such as the IBM System S platform [187], have been developed in recent years, which are capable of storing and processing large volumes of streams in real time. This is a very useful capability from the perspective of typical cyber-physical applications which need a high level of scalability for real-time processing.

Development of Stream Synopsis Algorithms and Software: Since the volume of the data collected is very large, it often cannot be collected explicitly. This leads to the need for designing algorithms and methods for stream synopsis construction [7]. A detailed discussion of a variety of methods (such as sketches, wavelets and histograms) which are used for stream synopsis construction and analysis is provided in [7].

The sensing abilities of miniaturized devices and smartphones have also increased considerably in recent years. For example, in one of the earliest systems, which is referred to as a sociometer [33, 34], a small wearable device is constructed, which can detect people nearby, provide motion information with accelerometers, and also has microphones for detection of speech information. In addition, the device has the flexibility to allow for the addition of other kinds of sensors such as GPS sensors and light sensors.
These sensors can be used in order to detect implicit links between people, and the corresponding community behavior. The aim of collecting a large number of such interactive behaviors is to be able to effectively model interactions between different users, and then model the dynamics of the interaction with the use of the collected information.

Since the work in [33], many of these sensing capabilities are now available in commodity hardware such as mobile phones. For example, the Virtual Compass system [18] uses the sensors available in mobile phones in order to sense the interactions between different actors. Virtual Compass is a peer-based relative positioning system that uses multiple radios to detect nearby mobile devices and places them in a two-dimensional plane. It uses different kinds of scanning and out-of-band coordination to explore tradeoffs between energy consumption and the latency in detecting movement. Methods are designed for using different kinds of sensor signals in Virtual Compass in order to reduce the energy footprint. More details may be found in [18].

3. Data Collection, Architectural and System Design Challenges

The aforementioned monitoring and social computing opportunities present a need for a new architecture that encourages data sharing and efficiently utilizes data contributed by users. The architecture should allow individuals, organizations, research institutions, and policy makers to deploy applications that monitor, investigate, or clarify aspects of socio-physical phenomena: processes that interact with the physical world, whose state depends on the behavior of humans in the loop. An architecture for social data collection should facilitate distillation of concise actionable information from significant amounts of raw data contributed by a variety of sources, to inform high-level user decisions.
Such an architecture would typically consist of components that support (i) privacy-preserving sensor data collection, (ii) data model construction, (iii) real-time decision services, (iv) effective methods for recruitment, and (v) energy efficient design. For example, in an application that helps drivers improve their vehicular fuel-efficiency, data collection might involve upload of fuel consumption data and context from the vehicle’s on-board diagnostics (OBD-II) interface and related sensors; a model might relate the total fuel consumption for a vehicle on a road segment as a function of readily available parameters (such as average road speed, degree of congestion, incline, and vehicle weight); the decision support service might provide navigation assistance to find the most fuel-efficient route to a given destination (as opposed to a fastest or shortest route). Of course, none of these can be effectively implemented without energy-efficient data collection and participant recruitment. Below, we elaborate on the above functions.

3.1 Privacy-Preserving Data Collection

In a grassroots application that is not managed by a globally trusted authority, an interesting challenge becomes ensuring the privacy of data shared. Anonymity is not a sufficient solution because the data themselves (such as GPS traces) may reveal the identity of the owner even if shared anonymously. One interesting direction is to allow individuals to “lie” about their data in a way that protects their privacy, but without degrading application quality. For example, in a traffic speed monitoring application, reconstruction of community statistics of interest (such as average traffic speed on different streets) should remain accurate, despite use of perturbed data (“lies” about actual speed of individual vehicles) as input to the reconstruction process.
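The traffic-speed example above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any cited system: each vehicle adds independent Gaussian noise with a publicly known mean and standard deviation to its speed, and the aggregator recovers only the first two moments of the true speeds rather than the full distribution. The function names, the noise parameters, and the synthetic speeds are hypothetical.

```python
import random
import statistics


def perturb(speeds, noise_mean, noise_std, rng):
    """Each participant reports speed + noise; the noise distribution is public."""
    return [s + rng.gauss(noise_mean, noise_std) for s in speeds]


def reconstruct_stats(perturbed, noise_mean, noise_std):
    """Estimate mean and variance of the true speeds from perturbed reports.

    For independent additive noise: E[Y] = E[X] + E[N] and Var[Y] = Var[X] + Var[N],
    so the community statistics follow by subtracting the known noise moments.
    """
    mean = statistics.fmean(perturbed) - noise_mean
    var = max(statistics.pvariance(perturbed) - noise_std ** 2, 0.0)
    return mean, var


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Synthetic "true" speeds (km/h) for one street; assumed data for illustration.
true_speeds = [rng.gauss(50.0, 5.0) for _ in range(20000)]
# The noise is deliberately much larger than the spread of the real speeds.
shared = perturb(true_speeds, noise_mean=10.0, noise_std=20.0, rng=rng)

est_mean, est_var = reconstruct_stats(shared, noise_mean=10.0, noise_std=20.0)
print(round(statistics.fmean(true_speeds), 2), round(est_mean, 2))
```

Any single reported value says little about the vehicle that sent it, yet the estimated community mean stays close to the true one because the noise averages out over many participants. This independent-noise scheme ignores correlations between successive samples from the same vehicle.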
This is possible thanks to deconvolution techniques that recover the statistical distribution of the original signals, given the statistical distribution of perturbed data and the statistical distribution of noise. Solutions to this and related problems can be found in literature on privacy-preserving statistics [9]. Recently, special emphasis was given to perturbing time-series data [61], since sensor data typically comprise a correlated series of samples of some continuous phenomenon. Perturbing time-series data is challenging because correlations among nearby samples can be exploited to breach privacy. Recent results demonstrate that the frequency spectrum of the perturbation signal must substantially overlap with the frequency spectrum of the original data time-series for the latter to be effectively concealed [61]. Generalizations to perturbation of correlated multi-dimensional time-series data were proposed in [133]. The main challenge addressed in this work was to account for the fact that data shared by different sensors are usually not independent. For example, temperature and location data can be correlated, allowing an attacker to make inferences that breach privacy by exploiting cross-sensor correlations.

A related interesting problem is that of perturbation (i.e., noise) energy allocation. Given a perturbation signal of a particular energy budget (dictated perhaps by reconstruction accuracy requirements), how should this energy budget be allocated across the frequency spectrum to optimally conceal an original data signal? A recent technique defines privacy as the amount of mutual information between the original and perturbed signals. Optimality is defined as perturbation that minimizes the upper bound on such (leaked) mutual information.
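The mutual-information notion of leakage mentioned above can be made concrete with standard textbook definitions. The notation here is assumed for illustration ($X$ for the original signal, $N$ for independent additive noise, $Y$ for the shared signal, $\sigma_X^2$ and $\sigma_N^2$ for their variances) and is not taken from the cited technique, whose derivation may differ.

```latex
\begin{align}
  Y &= X + N, \qquad N \text{ independent of } X \\
  I(X;Y) &= h(Y) - h(Y \mid X) = h(Y) - h(N) \\
  % Gaussian special case: X ~ N(0, \sigma_X^2), N ~ N(0, \sigma_N^2)
  I(X;Y) &= \tfrac{1}{2} \log\!\left(1 + \frac{\sigma_X^2}{\sigma_N^2}\right)
\end{align}
```

In the Gaussian special case, the leaked information shrinks as the noise energy $\sigma_N^2$ grows, while a larger $\sigma_N^2$ also makes the shared signal a noisier basis for reconstruction.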
The technique describes how optimal perturbation is computed, and demonstrates the fundamental trade-off between the bound on information leak (privacy) and the bound on reconstruction accuracy [132]. We note that the privacy protection issues for social sensing data arise both during trajectory data collection, and trajectory data management [38]. Since this section is focused only on the data collection and system design issues, we will discuss this issue in a more holistic and algorithmic way in a later section of this chapter.

3.2 Generalized Model Construction

Many initial participatory sensing applications, such as those giving rise to the above privacy concerns, were concerned with computing community statistics out of individual private measurements. The approach inherently assumes richly-sampled, low-dimensional data, where many low-dimensional measurements (e.g., measurements of velocity) are redundantly obtained by individuals measuring the same variable (e.g., speed of traffic on the same street). Only then can good statistics be computed. Many systems, however, do not adhere to the above model. Instead, data are often high-dimensional, and hence sampling of the high-dimensional space is often sparse. The more interesting question becomes how to generalize from high-dimensional, sparsely-sampled data to cover the entire input data space. For instance, consider a fuel-efficient navigation example, where it is desired to compute the most fuel-efficient route between arbitrary source and destination points, for an arbitrary vehicle and driver. What are the most important generalizable predictors of fuel efficiency of current car models driven on modern streets? A large number of predictors may exist that pertain to parameters of the cars, the streets and the drivers. These inputs may be static (e.g., car weight and frontal area) or dynamic (e.g., traveled road speed and degree of congestion).
In many cases, the space is only sparsely sampled, especially in conditions of sparse deployment of the participatory sensing service. It is very difficult to predict a priori which parameters will be more telling. More importantly, the key predictors
