A5: Automated Analysis of Adversarial Android Applications

Timothy Vidas, Jiaqi Tan, Jay Nahata, Chaur Lih Tan, Nicolas Christin, Patrick Tague
ECE/CyLab, Carnegie Mellon University

Working paper
Technical Report CMU-CyLab-13-009
First version: February 21, 2013
This version: June 3, 2014

Abstract

Mobile malware is growing, both in overall volume and in number of existing variants, at a pace rapid enough that systematic manual, human analysis is becoming increasingly difficult. As a result, there is a pressing need for techniques and tools that provide automated analysis of mobile malware samples.

We present A5, an automated system to process Android malware. A5 is a hybrid system combining static and dynamic malware analysis techniques. Android’s architecture permits many different paths for malware to react to system events, any of which may result in malicious behavior. Key innovations in A5 consist in novel methods of interacting with mobile malware to better coerce malicious behavior, and in combining both virtual and physical pools of Android platforms to capture behavior that could otherwise be missed. The primary output of A5 is a set of network threat indicators and intrusion detection system signatures that can be used to detect and prevent malicious network activity. We detail A5’s distributed design and demonstrate applicability of our interaction techniques using examples from real malware. Additionally, we compare A5 with other automated systems and provide performance measurements of an implementation, using a published dataset of 1,260 unique malware samples, showing that A5 can quickly process large amounts of malware. We provide a public web interface to our implementation of A5 that allows third parties to use A5 as a web service.

1 Introduction

The number of applications available for mobile phones and tablets has surged dramatically over the past couple of years; this trend has been particularly pronounced for Android devices, which now represent 73% of all mobile devices [17].
Concomitant with this rise in the number of applications available, malware targeting mobile platforms, and specifically Android, has also started to appear [16,32,39]. Even though industry reports of “exponential growth” in mobile malware [13,23] must be taken with a grain of salt [22], there is little doubt that the overall volume of mobile malware is increasing at a pace that makes it difficult to sustain systematic manual analysis. It is, therefore, important to develop automated analysis capabilities for mobile malware.

Detecting, analyzing and combating mobile malware presents a number of unique challenges. First, different from the situation with personal computers, users generally do not have full administrative access to their mobile device, which makes it much more challenging to develop effective anti-virus tools. Second, carriers and network operators, who can fairly tightly control the network, may have only limited capabilities to control individual devices. Third, techniques useful to “sandbox” potentially harmful applications, such as virtualization, are much less mature on mobile devices than they are on PCs.

These unique challenges suggest that traditional malware analysis and detection methods need to be rethought in the context of mobile devices. For mobile devices, network-based identifiers (e.g., network traffic patterns) are considerably more actionable than host-based identifiers (e.g., writing a specific file). Indeed, a carrier or operator could easily disconnect, and potentially reset, a mobile device that produces suspicious network traffic. On the other hand, detecting, on the device itself, that an application is malicious is much more complex without elevated privileges. In other words, given the current administrative models, network-based intrusion detection systems appear considerably more useful to mobile devices than their host-based counterparts.

We use these insights to propose “A5,” short for Automated Analysis of Adversarial Android Applications. A5 is a system that draws conceptual design from existing dynamic analysis (or “sandbox”) systems.
At a high level, A5 executes malware in a sandbox environment that consists of a combination of physical devices and virtual Android systems hosted on a PC. A5 allows malware to connect to the Internet, in order to record network threat indicators and create network intrusion detection system (IDS) signatures. These signatures can in turn be used by an enterprise to protect mobile devices that connect to the Internet through a corporate network or to protect all corporate devices by forcing mobile device traffic through a network proxy. Similarly, cellular providers could use these signatures to protect devices connected to carrier networks. We provide a web-based interface to our current implementation of A5 at http://dogo.ece.cmu.edu/a5.

The key novelty in A5 is to use a combination of static and dynamic analysis to coerce the application into triggering its malicious behavior. Indeed, in mobile applications, activity can be triggered by a wide assortment of system events, for instance, receiving a phone call, or having the screen go into lock mode. A5 attempts to exhaustively determine all possible paths that can trigger malicious behavior, before separately evaluating them. Doing so, A5 can capture activity that would be missed by naïvely executing the malware (i.e., simply “clicking on the icon”). Furthermore, by combining physical devices with virtual Android images, A5 can capture a wider range of malicious behavior than a sandbox solely based on emulation would, and can correctly process malware that employs certain types of sandbox evasion techniques. Likewise, A5 can accommodate a wide range of different hardware and software (e.g., SDK) configurations.

In the remainder of this paper, we first introduce background on static and dynamic analysis in section 2, where we also differentiate A5 from the relatively large body of related work on Android security. We then describe the design and architecture of A5 in section 3.
We present a performance evaluation of our current implementation of A5 in section 4, notably showing that, using parallelism, A5 is able to analyze 1,260 unique malware samples in just over 10 hours. We discuss A5’s limitations in section 5, and draw conclusions in section 6.

2 Background and Related Work

Without access to source code for analysis, inspection and understanding, one must resort to other techniques when analyzing compiled software. In the context of malware analysis, dynamic analysis involves executing the malware samples to observe their behavior [25]. Conversely, static analysis refers to techniques that inspect or process a sample, but never execute the malware [14]. Manual static analysis, colloquially known as “reverse engineering,” can be very effective, but often requires highly trained individuals and is time consuming. Thus, it is difficult to scale manual analysis at the pace that mobile malware is growing, both in terms of volume and in number of existing variants [13,16,23,32,39].

A dynamic analysis technique often used in vulnerability discovery can be automated to process input to samples automatically. Fuzzing is the process of sending data as input to a program, possibly intentionally invalid data, in order to coerce a desired condition or behavior. The input can be created programmatically to cover a range of inputs, and in this way can be thought of as a brute-force attack against the software. This technique may be considered inelegant, but fuzzing implementations are often straightforward and effective. Fuzzing is used in automated vulnerability discovery to find software vulnerabilities that are not feasible to audit in any other way [29].

Malware sandboxes. Malware sandboxes automate dynamic analysis techniques to inspect large volumes of malware automatically. The general operation of a sandbox system is to execute each input sample much like a user would, but in a controlled environment instrumented to monitor host and network activity.
The sheer volume of unique malware samples on traditional computers makes the use of automated sandboxes appealing. Numerous commercial products, such as CWSandbox [37], and academic projects, such as ANUBIS [6], have appeared over the past several years. Automated sandboxes often scale linearly with computational power. A sandbox addressing computer malware may boot a virtual machine, copy the samples to the virtual machine, then execute the sample. The sandbox can monitor and report on changes to the host (e.g., registry keys, files) and network communications. For instance, Rossow et al. present a dynamic analysis system called Sandnet [25], which is used to collect network traffic from PC malware samples. Sandnet is used to process 100,000 samples and the authors find that DNS and HTTP have novel trends in malware use.

Malware analysis systems for Android. The work most related to A5 is a dynamic analysis system called Andrubis [2]. Andrubis is an extension to the automated PC malware analysis project ANUBIS, but is designed for processing Android packages. The inner workings of Andrubis are not publicly known, but the creators allow anyone to interact with a public interface via a website. Blasing et al. [7] describe another dynamic analysis system for Android. Their system focuses on classifying input applications as malicious (or not). The system instruments Linux features and scans applications for the use of potentially dangerous criteria. Like Andrubis, this system interacts with the malware by starting the application’s primary Activity.

A5 differs from these two systems primarily in the way A5 interacts with the malware: it uses multiple techniques to coerce the execution of the malicious code. However, there are several other differences, such as the parallel implementation of A5, support for every Android API version, and the ability to use virtual instances, physical devices or both.

DroidBox [21] is a generic app monitoring tool for Android apps.
It monitors an Android app for various activities at runtime, such as incoming and outgoing network data, file read and write operations, services started, etc. It then provides a timeline view of the monitored activity from the app. DroidBox is useful for manually identifying malware by viewing its observed behavior. Compared to A5, DroidBox does not automatically coerce the app into undertaking particular behaviors, and A5 specifically captures network traffic for finding malicious network indicators. In addition, A5 combines static analysis with dynamic monitoring of the app to find coercion points automatically.

Similar to A5’s bytecode static analysis, ComDroid [8] performs static analysis of decompiled bytecode of Android applications. ComDroid performs flow-sensitive, intra-procedural analysis to find Android “Intents” sent with weak or no permissions, but contrary to A5, ComDroid does not perform any dynamic analysis.

A5 currently only captures network traffic to aid in finding malicious network indicators. It may make sense to pair A5 with taint tracking systems such as TaintDroid [10] in order to track host-based malware indicators. For instance, Andrubis employs TaintDroid. However, it may take significant effort to extend TaintDroid to support all SDK target versions and to work with a range of physical devices, as A5 does right now.

Automated signature creation. Automating the tedious and error-prone process of creating network IDS signatures is a well researched topic but remains an open problem. As a representative example, Kim and Karp create an automated system called Autograph that generates signatures for TCP-based Internet worms [19]. Like many efforts at automatic signature creation, Autograph’s detection mechanisms are particularly designed to address one type of malware, in this case worms. As such, Autograph’s pre-filtering step, which discerns unsuccessful TCP connections, is not particularly useful for identifying malicious Android application traffic.
A different system called Honeycomb presents similarities to A5’s desired goal of automatically creating IDS signatures. Kreibich and Crowcroft describe the system, which collects traffic from a honeypot and subsequently creates network signatures [20]. Since the network traffic is captured from a honeypot, the traffic is assumed to be malicious (or at least suspicious). A5 similarly assumes that all input is malware, but due to the repackaging common in Android malware, malicious network traffic is likely to be mixed with benign traffic.

3 A5 architecture

The immediate need for a system like A5 is driven by increasing volumes of mobile malware. However, the design of A5 is also directed by several criteria borrowing from the more mature field of PC-based dynamic analysis and the unique nature of today’s mobile device ecosystem. Here we enumerate a list of desired features for such a system, and describe an implementation designed to meet these goals.

3.1 Objectives and Design

Autonomy and scalability. The system must be able to handle volumes of malware without user interaction. As with PC malware, mobile malware is now growing at a rate that makes manual, human analysis unfeasible.

Evasion resistance. The system must be able to adapt to evasion advances in malware. As seen on the PC, mobile malware is increasing in sophistication. With the advent of automated malware processing, malware authors have already begun to include minor attempts to evade sandbox systems.

Mobile-specific interaction. The system must interact with malware in Android-specific ways. It is indeed more difficult to solicit malicious behavior from current mobile malware than from traditional malware. Simply executing the malware (i.e., “clicking on the icon”) may not exhibit any malicious behavior. Indeed, current mobile operating systems permit applications to register a software handler for a wide range of system events; for instance, receiving an SMS, the screen going into lock mode, and so forth. Any such event may trigger some application code.
Traditional computer programs may receive input along with execution; mobile applications may receive input along with a myriad of system events. In either case, the behavior may depend upon the input to the application.

Figure 1: A5 architecture. Malware is first ingested (1) into a shared job queue. Independently, an overall controller is started (2) which starts one or more Worker processes (3). Each Worker retrieves jobs from the queue (4) and either postpones work or reserves a device instance (5) for dynamic analysis (6). Once analysis is complete, the device is returned to the ready pool (7). There may be many Workers and Device Pools on a single host. The controller and job queue may service many hosts (each with many Workers).

Network-level indicator collection. The system should primarily collect network threat indicators. Host-based indicators, such as the modification of a file found on the device, are of limited value on Android. Indeed, since Android’s architecture does not permit file system hooks and, more generally, does not even permit privileged access to most components of the system, it is not possible to implement controls similar to anti-virus products found on the PC. Even if Android’s architecture were adapted to permit such products (e.g., by systematically “rooting” devices), network indicators are particularly useful to cellular carriers and/or wireless network operators to protect the device even without the ability to install controls on the device itself.

Modularity. The system should have a modular, expandable design. Mature analysis systems need to have interfaces allowing for the system to interact with other software systems such as intrusion-detection systems (IDS) or firewall management tools. This requirement is generally driven by entities that have larger research and analysis environments of which A5 may become a component. Additionally, the system must be generally able to adapt to unforeseen circumstances, such as malware that exhibits some new behavior or technology that was not yet imagined when the system was designed.
Based on these objectives, we made the following design choices for the A5 architecture. To process as much malware as possible, A5 is highly parallel and distributed. The basic steps involved are shown in Figure 1 and detailed in the following sections. A5 consists of a queue, a main controller, and a set of workers which interact with a pool of device instances; these device instances are a combination of hardware resources (e.g., a specific phone model) and Android images running as virtual machines on a traditional PC.

First, malware is moved into the system and two stages of static analysis are performed to determine methods of interacting with the malware. Once static analysis is complete, an entry is created in a job queue for subsequent dynamic analysis. Later, one of many worker processes retrieves the job from the shared queue and executes the malware using an available device from the device pool. The dynamic analysis is informed by the static analysis; this combination of static and dynamic analysis allows our system to better coerce malware to execute nefarious behavior.

The remainder of this section details the implementation of A5. In particular, stage 1 static analysis, stage 2 static analysis, and dynamic analysis are each detailed. Then, the concept of device pools consisting of virtual and physical devices is described, followed by a discussion of some Android-specific interaction techniques.

3.2 Malware Ingestion

A5 assumes all input samples are malware. The primary functions of the ingestion process are to create a shared job queue entry (we use beanstalkd¹), to calculate several pieces of meta-data (such as cryptographic hash values), and to initiate the static analysis.

A5’s ingest process is designed to run on each individual sample. This allows for on-demand submission, such as what one may expect of a web service, as well as a batched process running periodically, consuming samples that are placed in a particular system path. This allows all of A5 to run perpetually with no interaction from a user.
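The ingestion step can be sketched as follows. This is a minimal illustration under stated assumptions, not A5’s actual code: the function name `ingest_sample`, the job fields, and the in-memory `queue.Queue` standing in for beanstalkd are all hypothetical.

```python
import hashlib
import json
import queue


def ingest_sample(apk_bytes: bytes, job_queue: "queue.Queue[str]") -> dict:
    """Compute sample meta-data and enqueue a job (hypothetical job format)."""
    job = {
        "sha1": hashlib.sha1(apk_bytes).hexdigest(),
        "sha256": hashlib.sha256(apk_bytes).hexdigest(),
        "size": len(apk_bytes),
        # A real ingest step would also attach the static analysis output here.
    }
    # Serialize the job so any worker on any host could decode it.
    job_queue.put(json.dumps(job, sort_keys=True))
    return job


jobs = queue.Queue()  # stand-in for the shared beanstalkd queue
meta = ingest_sample(b"not a real APK", jobs)
```

Keying each job on a cryptographic hash makes it straightforward to recognize a sample that is submitted more than once, whether it arrives on demand or through a batch directory.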
Many security companies receive thousands of samples daily from sources such as VirusTotal [35] or MWCollect [36]. These incoming samples can easily be sorted, for example, to collect all Android samples in one location for input into A5.

3.3 Static Analysis

A5 first resorts to static analysis to try to detect potentially malicious actions. In Android, applications are usually written in Java (less than 5% have “native” C components [40]), and are distributed as APK (Android package) files. These APK files are in fact Zip archives, which contain compiled Java classes (in Dalvik DEX format), application resources, and an AndroidManifest.xml binary XML file containing application meta-data. The structure of Android applications and the Android security mechanisms have been well documented [12,27] and many tools exist for creating and manipulating APKs [3,5].

Typically, Android applications that have a user interface specify at least one Android Activity and those that do not have a user interface specify at least one Service. These are classes that typically contain the core functionality of the mobile application, and are the primary method for executing application code. Much of the interaction with an Activity will be through the Graphical User Interface (GUI). However, a Service may exhibit no GUI components at all, requiring different interaction during later dynamic analysis.

Android Inter-Process Communication (IPC) typically occurs in the form of an Android event known as an Intent. For instance, Intents are used to transfer information between applications and to notify applications when a particular system event, such as the receipt of a text message, has occurred. Since Android Services have no GUI, it is precisely these types of events that initiate a Service.

The chief output of static analysis is an enumeration of “interaction points” (e.g., Activities) and a set of “receivable intents” (e.g., BOOT_COMPLETED). Any of these may cause the application to take actions that would not normally occur if the application was simply launched using the graphical interface.
As such, A5 will use these sets in order to coerce behavior from the malicious application. Many of these meet the need for better mobile-specific interaction.

¹ beanstalkd can be found at http://kr.github.com/beanstalkd/ and is described as “a simple, fast work queue” originally designed to reduce latency on high-volume web sites.

    <receiver
        android:name=".message.SmsReceiver"
        android:enabled="true"
        android:exported="true" >
        <intent-filter
            android:priority="214783648" >
            <action
                android:name="android.provider.Telephony.SMS_RECEIVED" >
            </action>
        </intent-filter>
    </receiver>

Figure 2: Receiver from ANDROID-DOS malware. A5 notes the action for this receiver. Receipt of a text message may be the only way this method is executed. In other words, the .message.SmsReceiver code may never be invoked if the instance never receives a text message.

3.3.1 Stage 1 Static Analysis: AndroidManifest

Much of the stage 1 analysis in A5 revolves around the AndroidManifest.xml file. This file dictates much of how an application may interact with the device. Each application must advertise the desire to receive particular Intents by declaring permissions in the AndroidManifest. Similarly, through documentation [34] and source code analysis [15], use of certain API functions implies the ability to receive certain Intents. Even though the manifest is stored in binary XML form, tools are readily available for parsing key components such as requested permissions, broadcast receivers, background services, and activities. Each of these components defines key interaction points for the application, and is cataloged for later use in dynamic analysis.

For instance, an Android BroadcastReceiver or “receiver” is a way for an application to register the desire to receive an Intent from the system or another application. A receiver from recent Android malware is shown in Figure 2. A5 parses and saves the action from this portion of the manifest, in this case receipt of an SMS message.
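To illustrate the kind of extraction performed in stage 1, the following sketch pulls receiver actions out of a manifest that has already been decoded from binary XML to text. It uses only Python’s standard library and is not A5’s implementation; the helper name `receiver_actions` and the sample manifest (modeled on Figure 2) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"
NAME = f"{{{ANDROID_NS}}}name"  # ElementTree spells android:name this way

MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android">
  <application>
    <receiver android:name=".message.SmsReceiver" android:exported="true">
      <intent-filter android:priority="214783648">
        <action android:name="android.provider.Telephony.SMS_RECEIVED" />
      </intent-filter>
    </receiver>
  </application>
</manifest>"""


def receiver_actions(manifest_xml: str) -> dict:
    """Map each declared receiver name to the Intent actions it filters on."""
    root = ET.fromstring(manifest_xml)
    result = {}
    for receiver in root.iter("receiver"):
        actions = [action.get(NAME) for action in receiver.iter("action")]
        result[receiver.get(NAME)] = actions
    return result


actions = receiver_actions(MANIFEST)
```

The resulting mapping is the kind of “receivable intents” catalog that dynamic analysis can later replay against the installed sample.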
During dynamic analysis, the receipt of a text message may be the only action that invokes the .message.SmsReceiver method.

Instead of creating yet another tool to extract pertinent information, we elected to leverage an existing open source tool known as Androguard [1]. If Androguard did not support a particular function that A5 required, we implemented the feature and submitted patches back to the Androguard developers.

3.3.2 Stage 2 Static Analysis: Bytecode

In addition to the relatively naïve stage 1 analysis of the Android application manifest, we also analyze the Java bytecode of the application binaries. The goal of this stage 2 static analysis is to identify additional interaction points which enable users or the system to interact with the application. While many interaction points are declared in the application manifest, some may be created dynamically by the application, thus being missed by naïve analysis.

An example of an interaction point that may be missed during stage 1 is shown in Figure 3. The application in Figure 3 performs a registerReceiver call registering the desire to be notified when either the user begins interacting with the device or the device screen turns off. Neither of these Intents is found in the AndroidManifest.xml.

    1 public static void h(Context paramContext)
    2 {
    3   IntentFilter localIntentFilter = new IntentFilter();
    4   localIntentFilter.addAction("android.intent.action.USER_PRESENT");
    5   localIntentFilter.addAction("android.intent.action.SCREEN_OFF");
    6   paramContext.registerReceiver(new UserActivityReceiver(), localIntentFilter);
    7   return;
    8 }

Figure 3: Code section reverse engineered from a GoldDream malware sample. Here, the desire to receive USER_PRESENT and SCREEN_OFF Intents is registered dynamically; these Intents do not appear in the AndroidManifest.xml and would be missed by analysis techniques that do not incorporate bytecode level analysis.

    1 Calendar c = Calendar.getInstance();
    2 String bootc = "android.intent.action.BOOT_COMPLETED";
    3 int seconds = c.get(Calendar.SECOND);
    4 IntentFilter bootcif = new IntentFilter(bootc);
    5 registerReceiver(bootcif);

Figure 4: Code section demonstrating the need to resolve variables. In this case the CFG is recursively traversed in order to find the value of bootcif at the time registerReceiver is called in line 5. A5 concludes that the program dynamically registered the desire to be notified when the system has finished booting.

The stage 2 static analysis algorithm is fairly intuitive. First, A5 invokes the DED [11] decompiler to create Java classes from the Android application code. Next, A5 uses Soot [30] to obtain an Intermediate Representation (IR) and a Control Flow Graph (CFG). This abstract IR is known as Jimple [31] and is useful because it eases the burden of dealing with the more complex Java bytecode.

Each node in the CFG represents one Java statement, and the graph edges correspond to the relationship between the statements in the malware. A5 traverses the CFG in order to find nodes that represent a known Android interaction point.

Each CFG node is further decomposed into an Abstract Syntax Tree (AST) representing individual components of the statement. Specifically, A5 looks for calls to android.content.Context.registerReceiver() and android.app.Activity.startActivityForResult(). Calls to android.content.Context.registerReceiver(), as shown in line 6 of Figure 3, result in the application becoming eligible to receive Intents with a specified Action (i.e., a particular string). Similarly, calls to android.app.Activity.startActivityForResult() result in the application making a call to another application, but with an embedded Intent for the callee to make a callback to the target application. When A5 discovers one of these calls, the CFG is recursively traversed in order to resolve variable definitions. These definitions must be resolved in order to capture Intents that represent interaction points. For example, Figure 4 shows a call to registerReceiver at line 5; however, the AST node only contains the component for variable bootcif. A5 recursively traverses the CFG to determine the variable definition at line 2.
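The backward resolution step can be illustrated with a toy model. The sketch below does not use Soot or Jimple: it represents the straight-line code of Figure 4 as a list of (target, kind, operand) tuples, a simplification assumed here purely for illustration, and walks backwards from the registerReceiver call to the string constant that reaches it.

```python
# Toy statements modeled on Figure 4: (assigned variable, kind, operand).
STATEMENTS = [
    ("c", "call", "Calendar.getInstance"),
    ("bootc", "const", "android.intent.action.BOOT_COMPLETED"),
    ("seconds", "call", "c.get"),
    ("bootcif", "new_filter", "bootc"),  # new IntentFilter(bootc)
    (None, "register", "bootcif"),       # registerReceiver(bootcif)
]


def resolve(statements, var, before):
    """Walk backwards from index `before` to resolve `var` to a string constant."""
    for i in range(before - 1, -1, -1):
        target, kind, operand = statements[i]
        if target != var:
            continue  # this statement does not define the variable we want
        if kind == "const":
            return operand
        if kind == "new_filter":
            # The filter's action is whatever its constructor argument holds.
            return resolve(statements, operand, i)
        return None  # defined by a call that cannot be evaluated statically
    return None


def registered_actions(statements):
    """Resolve the action behind every registerReceiver-style statement."""
    return [
        resolve(statements, operand, i)
        for i, (_, kind, operand) in enumerate(statements)
        if kind == "register"
    ]


found = registered_actions(STATEMENTS)
```

A real CFG has branches and loops, so a full implementation would need to follow every predecessor of a node rather than a single linear history; the recursion above only conveys the idea of chasing definitions back to a constant.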
Much like the stage 1 static analysis, the output of the bytecode static analysis is a set of receivable Intents for use during the dynamic analysis.

3.4 Dynamic Analysis

The ingestion process and static analysis components execute relatively quickly, but the dynamic analysis portion is more time-consuming. Fortunately, it also lends itself to parallel execution. Figure 1 depicts many workers on a single A5 host. Worker processes on each A5 host retrieve jobs from the shared job queue for processing. Once a new job has been reserved from the queue, the Worker inspects a pool of candidate Android instances available to that particular host, attempting to reserve a compatible instance. A compatible instance is one in which the malware sample is expected to run. For example, a mobile application that declares a minimum SDK (Android API) level of 8 will not run on a level 4 device. Even if the application were to be modified by A5 to specify level 4 prior to instance selection, the application may actually rely upon a feature not available until level 8. Assuming a compatible instance is available, the Worker continues with dynamic analysis. If no compatible device is available, the job is placed back into the queue with a delay. The delay reduces the chances that Workers repeatedly reserve the same job while no compatible instance is available, which would effectively de-prioritize other pending samples.

Figure 5: Worker process flow. Communication with the device instance is performed using the Android Debug Bridge (ADB), and output from the static analysis is utilized in dynamic analysis interaction.

The Worker then boots the instance and follows the process depicted in Figure 5. Communication with a running instance is performed with the SDK debugging tool known as the Android Debug Bridge (ADB)². Since the boot may take some time, the Worker initiates the boot process, then, using ADB, blocks until the device is fully booted. Once booted, A5 uses ADB to install the malware sample into the instance.
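The instance selection and installation steps can be sketched as follows. This is an illustration under assumptions rather than A5’s Worker code: the `Instance` class, the helper names, and the simple "API level at least the declared minimum" compatibility rule are hypothetical, and the functions only build ADB command lines (using the standard `-s`, `wait-for-device` and `install` options) without executing them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    serial: str      # ADB serial, e.g. "emulator-5554"
    api_level: int   # Android API level provided by this instance
    busy: bool = False


def reserve_compatible(pool: List[Instance], min_sdk: int) -> Optional[Instance]:
    """Reserve the first idle instance whose API level satisfies min_sdk."""
    for inst in pool:
        if not inst.busy and inst.api_level >= min_sdk:
            inst.busy = True
            return inst
    return None  # the caller would put the job back on the queue with a delay


def adb_install_commands(inst: Instance, apk_path: str) -> List[List[str]]:
    """Build the ADB command lines that wait for the device and install the sample."""
    adb = ["adb", "-s", inst.serial]
    return [adb + ["wait-for-device"], adb + ["install", apk_path]]


pool = [Instance("emulator-5554", 4), Instance("emulator-5556", 10)]
chosen = reserve_compatible(pool, min_sdk=8)
commands = adb_install_commands(chosen, "sample.apk")
```

Note that `wait-for-device` only blocks until the device is visible to ADB; a Worker that needs a fully booted system would likely also poll a boot-completion indicator before installing. With several Workers per host, the `busy` flag would additionally have to be updated under a lock or in shared state.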
Once the sample is installed, the Worker coerces malicious behavior from the instance, again using ADB. After a set period of time the Worker terminates the instance and returns it to a known state.

3.5 Instance Pools

When retrieving a new job, the Worker must locate a device instance compatible with the malware sample. For this reason, A5 maintains pools of devices on each host. Each instance in the pool may be a physical or virtual instance, as detailed below. Workers synchronize the use of instances by maintaining instance state in a data structure shared among Workers on each host.

Virtual instances have the benefits of being low-cost, easy to automate and generally flexible. On the other hand, virtual devices are not typically used in everyday computing, so malware that can detect the virtual environment may elect to exhibit alternate behavior. In this light, the use of physical devices may be warranted.

² http://developer.android.com/tools/help/adb.html

3.5.1 Virtual Instances

Virtual instances in A5 are realized with modified versions of the emulator distributed with the Android SDK. These instances are all stored and executed on commodity computer hardware. Using virtual instances allows A5 to scale easily by simply creating larger instance pools as needed. Resetting a virtual instance to a known state is as simple as starting the instance with a “wipe data” flag or deleting and recreating the instance image from scratch.

However, the emulator offers only a subset of the features found on a real device. The lack of features can be coarsely grouped into two classes: features not implemented by the emulation system and software that is not present in the emulated device image. An example of the former is Bluetooth, which is not implemented in the emulator, but is present on most devices. An example of the latter is the Google Play application, which is pre-installed in nearly every Android device sold, but is not present in the default emulated device image.
The lack of features presents a fundamental problem to a dynamic analysis system: if malware makes use of one of these missing features, the malware will not run properly. This change in behavior may be explicit (malware employs “virtualization detection”) or implicit (the malware happens to try to use a missing feature). In either case, the desired behavior from the sample will not be realized.

On traditional computers, virtualization detection was initially not found at all. It was later employed more frequently to evade dynamic analysis systems: if the malware detected that it was running on a virtual machine, it would demonstrate benign behavior. The increased use of virtualization detection by malware in turn led to the creation of dynamic analysis systems employing physical machines [28]. However, as virtual machines and, more generally, virtualization are increasingly employed on laptops and desktop computers, running in a virtual machine is no longer a giveaway that the platform is attempting to analyze a piece of malware. In other words, new forms of malware may paradoxically employ virtualization detection less frequently.

In today’s mobile computing paradigm, the devices physically move with the user and data bandwidth at any given time varies greatly. Contrary to traditional computing platforms, there is not yet a clear use-case for virtualization in typical end-user environments. Resources such as bandwidth or power are far more constrained and devices are typically not shared among multiple users. However, if systems like A5 are increasingly deployed, as we believe they will be, virtualization detection in mobile malware will become a reality.

Manual analysis of malware families in 2012 [39] revealed that the current generation of mobile malware does not yet employ virtualization detection. Even so, Vidas and Christin explored how such detection might be implemented and provided a taxonomy of several methods [33]. Following this work, recent malware is starting to employ such detection “in the wild”.
For instance, a recent Android malware sample (Jan 21, 2014), Android.hehe [9], implements two checks: (1) the nonexistence of an IMSI, a unique cellular subscriber number, and (2) the existence of Build strings that are exactly “sdk” or “google_sdk”, as shown in Figure 6. Similarly, the Android malware “Oldboot” (Apr 2, 2014) identifies the running location of the malware instance (/sbin/meta_chk) and exits if the path is not as expected or if there is no SIM card present.

In A5, the emulator software is built from source, and the resulting emulator is therefore similar to the emulator distributed with the binary Android SDKs. However, we enhanced the emulator to evade some virtualization detection features. For example, an unmodified emulator will always return the same values for API calls such as TelephonyManager.getDeviceId() (all zeros) or Settings.Secure.ANDROID_ID (null). By modifying the virtual instances to return values indicating that a physical rather than a virtual device is in use, A5 becomes less detectable by malware seeking to deter-