Root ORAM: A Tunable Differentially Private Oblivious RAM Sameer Wagh Paul Cuff Prateek Mittal PrincetonUniversity PrincetonUniversity PrincetonUniversity [email protected] [email protected] [email protected] 6 ABSTRACT 1. INTRODUCTION 1 0 State-of-the-art mechanisms for oblivious RAM (ORAM) Cloudstorageandcomputingareimportanttoolstoout- 2 suffer from significant bandwidth overheads (greater than source data but have given rise to significant privacy con- 100x) that impact the throughput and latency of memory cerns due to the non-local nature of data storage. Though y accesses. Thisrenderstheirdeploymentinhigh-performance encryption goes a long way in assuring data confidentiality, a and bandwidth-constrained applications difficult, motivat- recent work [5,12] has shown that encryption is not suffi- M ingthedesignoflow-overheadapproachesformemoryaccess cient. Encryption does not hide memory access patterns; obfuscation. an untrusted storage server can thus perform traffic analy- 4 In this work, we introduce and formalize the notion of sisofmemoryaccesspatternstocompromiseclientprivacy. 2 a differentially private ORAM that provides statistical pri- The work of Islam et al. has shown the leakage of sensitive vacyguarantees, andwhichtothe extentof our knowledge, keywordinformationbyperformingtrafficanalysisofaccess ] R isthefirstofitskind. Theformalizationofdifferentiallypri- patterns over encrypted email [12]. Similarly, Dautrich et vateORAMopensupalargedesignspaceoflow-bandwidth al. have shown that access patterns over database tuples C ORAM protocols that can be deployed in bandwidth con- can leak ordering information [5]. s. strained applications. Oblivious RAM (ORAM), first introduced by Goldreich c We present Root ORAM, a family of practical ORAMs and Ostrovsky [10,11], is a cryptographic primitive which [ thatprovideatunable,multi-dimensionaltrade-offbetween allows a client to protect its data access pattern from an thedesiredbandwidthoverhead,outsourcing ratio1 andthe untrusted server storing the data. Since its introduction, 3 v system security, and that provide rigorous privacy guaran- substantialprogresshasbeenmadebytheresearchcommu- 8 tees of differentially private ORAMs. The Root ORAM nity in developing novel and efficient ORAM schemes [4,9, 7 protocolscanbetunedtoachieveapplication-specificband- 16,17,20–22]. Recent work has also shown the promise of 3 widthconstraints,enablingpracticaldeployment,atthecost using ORAMs as a critical component in developing proto- 3 of statistical privacy guarantees quantified under the differ- colsforcryptographicprimitivessuchasSecureMulti-Party 0 ential privacy framework and lower outsourcing ratios. Computation [9]. . We demonstrate the practicality of Root ORAM using However,ORAMschemesincuralargeoverheadinterms 1 theoretical analysis, simulations, as well as experiments on of bandwidth that renders them impractical. For exam- 0 AmazonEC2. Ourtheoreticalanalysisrigorouslyquantifies ple, even the most efficient ORAM protocols [17,21,22] in- 6 the privacy offered by Root ORAM, and provably bounds curalogarithmicoverheadcomparedtoconventionalRAMs 1 the information leaked from observing memory access pat- (greater than 100x including constants). This significantly : v terns. Our experimental analysis shows the feasibility of impactsthethroughputandlatencyofmemoryaccesses,and i theseprotocolsusingrealisticsimulations. Thesimplestpro- presents a bottleneck for real-world deployment of ORAMs X tocol in the Root ORAM family requires a bandwidth of inhigh-performanceandbandwidthconstrainedapplications. r mere 10 blocks, at the cost of rigorously quantified security Thelackoflow-bandwidthORAMs,despiteconsiderableef- a lossandalowoutsourcingratio. Thisisanorderofmagni- forts from the security community, is an undeniable indica- tude improvement over the existing state-of-the-art ORAM tor for the need of a fundamentally new approach. schemes which incur logarithmic bandwidth overheads. Hence,weproposeanovelapproachfordevelopingpracti- calORAMprotocolsthatcansupportevenaconstantband- 1Outsourcingratio isdefinedastheamountofdatathatcan width overhead compared to conventional RAMs. Our key be outsourced for a certain amount of local storage. Refer approachistoprovidestatisticalprivacyguaranteesandpro- to Sec. 7 for more details. posetunableprotocoldesigns. Wefirstformalizethenotion of a differentially private ORAM that provides statistical Permissiontomakedigitalorhardcopiesofallorpartofthisworkforpersonalor classroomuseisgrantedwithoutfeeprovidedthatcopiesarenotmadeordistributed privacy guarantees, and which to the extent of our knowl- forprofitorcommercialadvantageandthatcopiesbearthisnoticeandthefullcita- edge, is the first of its kind. As the name suggests, we use tiononthefirstpage. Copyrightsforcomponentsofthisworkownedbyothersthan the differential privacy framework developed by Dwork et ACMmustbehonored.Abstractingwithcreditispermitted.Tocopyotherwise,orre- publish,topostonserversortoredistributetolists,requirespriorspecificpermission al. [6] with its ((cid:15),δ)-differential privacy modification [7]. In and/[email protected]. the current formulation of an ORAM, the output is com- CCS’16October24–28,2016,HofburgPalace,Vienna,Austria putationally indistinguishable for any two input sequences. (cid:13)c 2016ACM.ISBN978-1-4503-2138-9...$15.00 InadifferentiallyprivateORAM,wecharacterizetheeffect DOI:10.1145/1235 of a small change in the ORAM input to the change in the arigorouslyquantifiedsecuritylossaswellasareductionin probability distribution at the output. outsourcing ratio. At the same time, the server-side stor- WealsopresentRootORAM2,afamilyofORAMschemes age efficiency can be as high as 1:2 i.e., one dummy block thatprovidetunableORAMprotocolsallowingvariableband- perrealblockoutsourced(comparedtoPathORAMwhich widthoverheads,systemsecurityandoutsourcingratiosand usesaround1:10). WeimplementRootORAManddemon- including a design point that supports constant bandwidth strateitspracticalityusingsimulationsaswellasrealworld constructionandproviderigorousprivacyguaranteesofdif- experiments on Amazon EC2. These results are covered in ferentially private ORAMs. The low bandwidth protocols, Section 7. achievedatthecostofstatisticalprivacyandloweroutsourc- These contributions are order of magnitude improvement ingratios,areanorderofmagnitudeimprovementoverpre- over state-of-the-art protocols, though we would like to re- vious work in which the protocols still incur a logarithmic mind the reader that these come at the cost of a rigorously bandwidth [9,21,22]. quantified privacy loss and a reduction in outsourcing ra- TheformalizationofadifferentiallyprivateORAMopens tio. Finally, Root ORAM does not assume any server-side upalargeunderlyingdesignspacecurrentlynotconsidered computation and uses low, practical amounts of client-side bythecommunity. Withrigorouslyquantifiedprivacyguar- storage at the same time being extremely simple to imple- antees, we propose Root ORAM as the first step in the di- ment at both the client and the server side. rection of statistically private ORAMs. 1.1 OurContributions 2. PRELIMINARIES:STATISTICALPRIVACY Root ORAM introduces a number of paradigm shifts in thedesignofORAMprotocolswhileatthesametimebuild- ing on the prevailing ideas of contemporary ORAM con- structions3. Our main contributions can be summarized as follows: The notion of a differentially private ORAM: We formalizethenotionofadifferentiallyprivateORAM,which to the extent of our knowledge is the first of its kind. For- mallydiscussedinSection3,adifferentiallyprivateORAM bounds the information leakage from memory access pat- terns of an ORAM protocol. Tunable/parametric protocol family: In bandwidth Figure 1: A representation of an ORAM. The proto- constraint applications, large bandwidth overhead (>100x) col box translates an input access sequence into an of conventional ORAM schemes is a significant bottleneck. output access sequence. We propose the tunable reduction of bandwidth with out- sourcing ratio and a rigorously quantified privacy loss. We proposeanewfamilyofORAMschemescalledRootORAM ThesignificantbandwidthoverheadinconventionalORAM whichcanbetailoredaspertheneedsandconstraintsofthe schemes despite considerable research efforts necessitates a applicationtoachievedesiredsecurityandbandwidth. This paradigm shift in our approach to protocol design. To this serves as a key enabler for practical deployment and is dis- extent, we formulate the concept of statistical privacy in cussed in more detail in Section 4,5. ORAMs. Security: Weanalyzeandprovidetheoreticalguarantees Herewepresentabriefoverviewofthenotionofstatistical forthesecurityofferedbyRootORAMschemesinthenew privacy in the ORAM context. A perfect ORAM roughly5 differentially private ORAM framework. As a consequence leaks no information about the input access sequence. In ofouranalysis,wealsogiveanewproofofthePathORAM other words, we can consider an ORAM to be a black-box protocol security in the new framework. Our approach is with an input sequence as X and an output sequence as Y generalandwillbeusefulforrigorouslyreasoningaboutthe as shown in Fig. 1. An ORAM with perfect privacy would security of alternative statistically private ORAM schemes guaranteetheindependenceofX andY. Aslightlystronger in the future. This is presented in Section 6. conditioncouldbetosaythatthedistributionoftheoutput Performance: Root ORAM introduces a new design sequence is uniform over its space for any given input6. pointwithconstantbandwidthoverhead. Thesimplestpro- The most natural way to extend the latter condition for tocol of the family has bandwidth usage per access as low designingstatisticallyprivateORAMsistoconsiderORAM asaconstantaround10datablocks4 comparedto10·logN schemes that give non-uniform distributions of the output blocksinthecaseofPathORAM.Butthiscomesatthecost sequences Y (for a given input X) and use security metrics thatquantifythe“non-uniformity”ofthisdistribution. This 2The protocol family is called Root ORAM because in the isgraphicallyillustratedinFig.2,whereanattackeraimsto lowestbandwidthregime,thedatastructurereducestojust guess the original access pattern after observing the output therootandtheleaves(treeofdepth1). RefertoSec.5for details. access pattern o. 3This work is inspired by the Path ORAM paper [22] and wewouldliketogivetheauthorsofthepaperallduecredit. 5We say roughly because ORAMs could leak information At the same time, we would like to highlight the fact that such as timing of accesses/access pattern size. thereareconsiderabledifferencesbetweenthetwoprotocols 6This is stronger because conventional definitions of per- (as given in section 4). fect ORAMs involve outputs being computationally indis- 4Achieved for λ = 4 and Z = 2. Refer to Section 5 for tinguishablewhichhidesasmalldetailwhichweshallseein details. Sec. 6. Hence, we introduce and formalize the following statistical notion of an ORAM: differentially private ORAM.7 3.1 FormalizingDP-ORAM TheintuitionbehindadifferentiallyprivateORAMisthat given any two input sequences that differ in a single access, thedistributionsoftheiroutputsequencesshouldbe“close”. In other words, similar access sequences lead to similar dis- tributions. We formally define it as follows: Definition 3. Differentially Private ORAM : Let −→ y, as defined in Eq. 1, denote the input to an ORAM. Let −→ ORAM(y) be the resulting randomized data request sequence of an ORAM algorithm. We say that a ORAM protocol is ((cid:15),δ)-differentially private if for all input access sequences −→ −→ y and y , which differ in at most one access, the following Figure 2: This figure shows the intuition behind sta- 1 2 condition is satisfied by the ORAM protocol, tistically private ORAMs. In particular, differential privacy based statistical ORAM. Pr[ORAM(−→y )∈S]≤e(cid:15)Pr[ORAM(−→y )∈S]+δ (2) 1 2 3. DIFFERENTIALLYPRIVATEORAM where S is any set of output sequences of the ORAM. Thenotionofstatisticalprivacyiscertainlynotnewacross We note that the formalism does not make any assump- security/privacyapplicationsincurrentliterature[1,13];but tion about the size of the output sequences in S. Thus, ithasneverbeenpreviouslyexploredinthecontextofORAMs. if the input to the ORAM is changed by a single access We believe formulating such a framework would greatly ex- pandtheabilityoftheresearchcommunitytodevelopnovel tuple (opi,addri,datai), the output distribution does not change significantly. Fig. 2 graphically represents this intu- ORAM protocols with low-bandwidth overhead, serving as ition. Given two sequences r and r , the two distributions an enabler for real-world deployment of this technology. 1 2 generated (the red and the blue) are close to each other in Overtheyears,anumberofpapershavebeenpublishedin the differential privacy sense. theORAMdomainwhichadoptquiteafewdifferentdefini- It is important to note that the differential privacy guar- tions to quantify ORAM bandwidth overhead. We will use antees when two access patterns differ in multiple elements the original and straightforward definition of bandwidth as directlyfollowsfromthecomposabilitypropertyofdifferen- the average number of blocks transferred for one access [4]. tial privacy. Since this property is extremely important for Definition 1. Thebandwidthcostofastorageschemeis theutilityofthemechanism,wesummarizethisintheform givenbytheaveragenumberofblockstransferredinorderto of a theorem. read or write a single block. Theorem 1 (Composability of DP-ORAM). Given Formally, an ORAM is defined as a mechanism (possi- two access sequences s and s that differ in m ac- −→ 1 2 blyrandomized)whichtakesaninputaccesssequence y as cesses, a ((cid:15),δ)-differentially private ORAM mecha- given below, nism guarantees, −→ y =((opM,addrM,dataM),...,(op1,addr1,data1)) (1) Pr[ORAM(s1)∈S]≤em(cid:15)Pr[ORAM(s2)∈S]+mδ (3) −→ andoutputsaresultingoutputsequencedenotedbyORAM(y). The proof of the theorem directly follows from the compos- Here, M is the length of the access sequence, op denotes i ability property of the differential privacy mechanism [15]. th whetherthei operationisareadorawrite,addr denotes i In other words, Root ORAM guarantees can be extended to the address for that access, and data denotes the data (if i sequences which differ in multiple accesses and hence can −→ op is a write). Denoting by |y| the length of the access i be used to give rigorous guarantees for arbitrary access se- −→ sequence y, the currently accepted security definition for quences. ORAM security can be summarized as follows [22]: Definition 2. (CurrentlyacceptedORAMSecurity) 4. ROOTORAMOVERVIEW −→ : Let y asgiveninEq.1,denoteaninputaccesssequence. In this section, we briefly describe our key design goals −→ Let ORAM(y) be the resulting randomized data request se- andgiveahighleveloverviewoftheRootORAMprotocol. quence of an ORAM algorithm. The ORAM protocol guar- The notation used is given in Table 1. antees that for any −→y and −→y(cid:48), ORAM(−→y) and ORAM(−→y(cid:48)) are computationally indistinguishable if |−→y| = |−→y(cid:48)|, and also 4.1 DesignGoals −→ that for any y the data returned to the client by ORAM is Tunable ORAM scheme: We target a tunable archi- −→ consistentwith y (i.etheORAMbehaveslikeavalidRAM) tecture with explicit low bandwidth regimes which can be with high probability. used to design ORAM protocols for bandwidth constrained applications. ThisframeworkforORAMsisconstructedwithcomplete security at its core [4,9,17,22] and there is no natural way 7Fortherestofthepaper,weshalluseDP-ORAMtomean to extend this to incorporate a statistical privacy notion. a differentially private ORAM Symbol Description we would like to give the authors all due credit. At the N =2L Numberofrealdatablocksoutsourced sametime,inthissubsection,wewouldliketohighlightthe k≥1 Modelparameter(totunebandwidth) critical differences between the two papers. p∈(0,1−1/N] Modelparameter(totunesecurity) DifferentiallyPrivateORAM:RootORAMintroduces Z Numberofblocksineachbucket a new rigorous metric to quantify ORAM security, which B Sizeofeachblock(inbits) extends current formalism to include the notion of a sta- P(x) Pathfromleafxtotheroot tistically private ORAM. We rigorously bound the privacy P(x,i) NodeatleveliinP(x) offered by the Root ORAM (as well as Path ORAM) using x:=position[a] Datablockaiscurrentlymappedtoleaf this metric. xi.e. aresidesinsomebucketinP(x) Storage structure: Root ORAM uses a partial binary tree as the storage structure at the server where the height Table 1: Notation for Root ORAM of the tree is a model parameter k. This is represented in Fig 3. The parameter k governs the bandwidth of the Secure framework: We target protocols that provide protocol. The Path ORAM protocol on the contrary has a rigorousprivacyguaranteesviz. thatofdifferentiallyprivate fixed height binary tree (complete binary tree). ORAMs formalized in Section 3. Tunability: The ability to tune the protocol as per the Low Storage and Computation: The design should systemconstraintsisastarkdifferencebetweenRootORAM use as low storage as possible both on the client as well andPathORAM.There is no way to optimize Path ORAM as the server side. Server side computation is not always when the bandwidth is constrained and statistical security practical and hence we would like to avoid assuming any is acceptable. Root ORAM introduces the novel notion of such capability. non-uniform mapping and gives rigorous statistical privacy 4.2 ApproachOverview guarantees. Path ORAM’s update mapping scheme then turns out to be a special case of this mapping. RootORAMprotocolcanbesplitintothreecomponents, Simply by tuning the parameters, Root ORAM matches the access, the new mapping and the eviction. These are orexceedstheperformanceofPathORAM.Weprovidethe briefly described below. As Path ORAM is an instantia- ability to operate in the low bandwidth regime which Path tion of Root ORAM, the protocols are very similar in their ORAM cannot support. The eviction scheme allows Root structure but have important subtle differences. ORAM to achieve perfect security ((cid:15) = 0) at even lower Since we demonstrate the feasibility of having a tunable bandwidth than the Path ORAM protocol.8 ORAM architecture, we just need to show the existence of Eviction scheme: Theevictionschemesofthetwopro- such a trade-off. In particular, we do not claim this con- tocols have subtle differences. Path ORAM relies on a suf- struction to be the optimal way of achieving trade-offs. We ficiently large bucket size to achieve its goals. In contrast, choose the server-side storage to be stored in a partial bi- RootORAMusesanevictionschemeoffakeaccesses. Root narytreewhereeachnodeisabucketwhichcanholdupto ORAMparameterscanbetunedtoachievePathORAMpro- Z datablocks. Astashattheclientisusedtostoreasmall tocol, but the latter is not the lowest bandwidth full security amount of data. Data elements are mapped to leaves and ((cid:15)=0) protocol in the Root ORAM family. this mapping is stored locally. Multi-dimensionaldesignspace: RootORAMcanbe Access: The main invariant (same as Path ORAM) is tunedaspertheuser’srequirementsintermsofthesecurity that any data block is along the path from the root to the desired,thebandwidthavailableandlocalstoragerequired. leafitismappedorisinthestash. Toaccessadataelement, Thus, Root ORAM offers attractive design points that can the client looks up the local mapping to find the leaf that supportawiderangeofoperatingconditionsjustbytuning the data element is mapped onto. its parameters. New Mapping: Therelevantdatablockisthenreador writtenwiththenewdataandanewmappingisgenerated. 5. ROOTORAMDETAILS It is important to note that this new mapping is not uni- form among the leaves. There is tremendous flexibility in Inthissection,weprovidethedetailsofRootORAM.We choosing this distribution; for our purposes, we choose the beginbydescribingthebasicsoftheprotocol. Therequired distribution to be given by Eq. 4, i.e., the new mapping is notation is tabulated in Table 1. slightly more likely to be be the same as the old mapping 5.1 ServerStorage than any other random leaf. Finally,newrandomizedencryptionsaregeneratedandall Server Storage: Theserverstoresdataintheformofa the data is written back with elements being pushed down partialbinarytreeconsistingofbuckets9 asnodes. Inother further in the tree if possible (towards the leaf) and if new words,givenanintegerk,wefirstconstructabinarytreeof elements can be written back to the tree. depthki.e.,rootatlevel0andthelowestlevelatlevelk−1 Stash Recursion and Eviction: Root ORAM uses the (thiswillhave2k−1 leaves). Theneachofthe2k−1 leavesof recursion technique on the local stash. It also uses fake ac- this tree has 2L−k+1 children each (From here on, we shall cesses to keep a low stash value. The client machine inde- refer to these N =2L nodes as the leaves of the tree). This pendentlysendsfakeaccessqueriestotheserver,completely set-up is illustrated in Fig. 3. indistinguishable from normal requests, through a Poisson 8For (cid:15) = 0, Path ORAM uses a bandwidth of ∼ 10logN process with parameter λ. The eviction process and the data blocks per access whereas Root ORAM can perform recursion together ensure low stash size. the same with around ∼ 8logN for the following choice of parameters: Z =2,λ=1,p=1−2−k. 4.3 ComparisonwithPathORAM[22] 9A bucket contains multiple blocks of data storage which RootORAMisinspiredbythePathORAMprotocoland can be either real or dummy. Figure 3: Root ORAM server storage : The figure illustrates the server side storage. The level 0 to k−1 form a binary tree and the last level of the tree contains N = 2L leaves evenly distributed over the binary tree leaves. Bucketstructure: EachnodeisabucketconsistingofZ blocks,eachblockcaneitherberealordummy(encryptions of 0). Path structure: The leaves are numbered in the set {0,1,...,2L − 1}. P(x) denotes the path (set of buckets along the way) from leaf x to the root and P(x,i) denotes the bucket in P(x) at level i. It is important to emphasize here that the path length in Root ORAM is (k+1) blocks compared to the (logN)+1 in Path ORAM. Dummy blocks and randomized encryption: We usethestandardpaddingtechnique(fillbucketswithdummy blocks when needed) along with randomized encryption to ensure indistinguishability of real and dummy blocks. 5.2 Invariantsofthescheme Main Invariant (same as Path ORAM) : The main invariant in Root ORAM is that each real data block a is Figure 4: The new position of a data block is in gen- mapped to a leaf x := position[a],x ∈ {0,1,2,...2L −1} eralnon-uniformaccordingtothisdistribution. The and at any point in the execution of the ORAM, the real distribution reduces to uniform when p=1−2−L block will be somewhere in a bucket ∈P(x) or in the local Stash.10 Stash: Theclientmaintainsalocalstash,whichisasmall Secondary Invariant : We maintain the secondary in- amountofstoragelocallyattheclientwhichisusedtostore variantthataftereachaccesstoanelement,itsnewmapping overflown data blocks locally. is governed by a constant but non-uniform distribution D. Recursion: The client position map and the stash are Thereistremendousflexibilityinchoosingthisdistribution; both stored recursively. The recursion technique has been for our purposes, we consider the distribution D given by usedinpreviousworks[19,21,22]. Theideaistouseanother the following equation and shown graphically in Fig. 4. ORAMtostorethepositionmapattheserversideinsteadof storing directly at the client size and recurse. The position Pz,x =p2+(p1−p2)δzx (4) map for the final ORAM is stored locally. WhereP istheprobabilitythatanelementaccessedfrom z,x 5.4 Mainidea leaf x is mapped to a leaf z, δ is the Kronecker11 delta, ij p = (1−p) and p = p/(N −1) where p is the model Themainideaoftheprotocolisverysimple,wereaddata 1 2 parameter as defined in Table. 1. alongapath,trytowritedatabacktothesamepath(with some modifications and new encryptions) and if there is in- 5.3 ClientStorage sufficient storage, we retain those overflown data elements Position Map: The client side stores a position map back in the local Stash. which maps real data blocks to leaves of the server tree. Along with this, there is an independent access process of fake accesses12. These accesses are made by the user to 10Itisimportanttonotethattheinvariantdoesnotsaythat theserverandareindistinguishablefromrealaccesses. Fake thepositioneachdatablockisuniformoverthesetofleaves, accessesaredrawnfromaPoissonprocesswithaparameter as shall be clarified by the second invariant. 11Kronecker delta is defined as λ. It is important to note that in Root ORAM, the real (cid:26) δ = 0 if i(cid:54)=j 12Thisisthesimilartotheevictionschemedescribedin[18, ij 1 if i=j 19]withthecrucialdifferencethatourfakeaccessesarecom- pletely indistinguishable from real accesses. accessbytheuserandthefakeaccessesbytheclientmachine update mapping(a) : are exactly the same and hence are indistinguishable from the server’s perspective. 1: x←position[a] 2: if Bernoulli(p) = 0 then 5.5 Detailsoftheprotocol 3: return x An access is defined as a 3-tuple 4: else 5: return UniformRandom(cid:0){0,1,2,..2L−1}\{x}(cid:1) Access =(data ,element ,operation ) i i i i 6: end if For a real access, given a particular access 3-tuple, the user push down(position[a]): When any path is accessed, finds the mapping of the data block needed using his local this function tries to place any data blocks along the path position map. He then requests the whole path of that leaf P(position[a]) or in the Stash to lower positions on the from the server tree. After processing the data and gener- same path if possible. atingnewrandomizedencryptions,theuserwritesthedata write(a) : Once the mapping is updated, say initially backtothetreewiththeelementthatwasaccessedatanew x := position[a] and after updating the mapping z := location along the path. But the key idea here is that the position[a], we try to write the data block back into the element that was accessed has a non-uniform distribution13 bucket which is the lowest intersection of the two paths in of it being mapped to other leaves. It is more likely to be (cid:84) consideration i.e. lowest bucket in P(x) P(z) (with the mappedtothesameleafthantoothersandtheprobabilities conventionthatbucketwiththehighestlevelnumberisthe involved are decided by the security parameter p. root at level 0) which has an empty/dummy block. The broader picture of the protocol is as follows. The fake access(): A fake access is issued to push back ele- client systems makes real as well as fake accesses to the ments from the stash to the tree. One data block, say a(cid:48), is server. Therealaccessisasdescribedinthepreviouspara- chosen at random from the stash15 and a normal access is graph. There is a parameter λ which controls the amount performedona(cid:48),i.e.,read(a(cid:48))followedbypush down(position[a(cid:48)]) of fake accesses. One way of implementing the protocol is followed by an update mapping(a(cid:48)) followed by a write(a(cid:48)). in the following way.14 normal access(a): A normal access consists of the fol- 6. THEORETICALEVALUATION lowingfunctionsinorder: read(a),push down(position[a]), update mapping(a) and finally a write(a). In this section, we shall state our main theorems, their proofs and a few interesting special cases. Access(op,a,data∗) : Theorem 2 (Main theorem). Given a stash size 1: while ORAM is under use do C, the Root ORAM protocol with parameters k,p,Z 2: α← Poisson(λ) andλis((cid:15),δ)-differentiallyprivatefor(cid:15)=2log(cid:16)(N−1)(1−p)(cid:17) 3: for i=1:α do p 4: normal access(a) and δ=(1−p)Mk where Mk =(C+Z(k+1)+1) 5: end for Proof: First, we remark that due to the conservative 6: fake access() security analysis (Refer Sec. 6.1), λ does not appear in the 7: end while aboveexpressions. Thetheoremhastwoparts,the(cid:15)bound read(a): The reading phase is the same as that in Path andtheδbound. Firstly,wegiveabriefinsightintothetwo ORAM;Usingthelocalclientsidemapping,theclientfinds security parameters (cid:15) and δ. The proof is then structured outtheleaftowhichthedataelementaiscurrentlymapped as follows: i.efindxsuchthatx:=position[a]. Wethenrequestallthe The (cid:15) bound: data blocks in the buckets along path P(x). The invariant ensuresthattheclientcanretrievea,itsdataelement,from • Set up the differential privacy framework for ORAM. these. This completes the reading phase of the protocol. • Then we set up the probability evaluation model to find theprobabilityofthataparticularrealsequenceleadsto read(a) : a particular output sequence by the ORAM. 1: x←position[a] • Then we compute the maximum change that one access in the input sequence can have on the probability of the 2: for i∈{0,1,...,k} do output sequence (over all output sequences). 3: S ←S ∪ ReadBucket(P(x,i)) • Finally we complete the (cid:15) bound. 4: end for The δ bound: update mapping(a): After reading a data block, we modifyitsmappingusingthedistributionmentionedinEq.4 • We first show the need for δ in the security. i.eupdate mappingkeepsthemappingsamewithprobabil- • We then conservatively evaluate a bound on δ. ity(1−p)andwiththeremainingprobabilitychangesitto a uniformly random leaf among the remaining leaves. (cid:15), δ interpretation: Given an ORAM scheme with an unboundedamountoflocalstash,weshowthatsuchascheme 13Thedistributionbecomesuniformifp=1−2−L =1−1/N. is (cid:15)-differentially private. But the moment we introduce a 14Itshouldbenotedthatthecodehasbeenstructuredinthe followingwayforclarityofunderstandingandhencecanbe 15The security analysis takes care of the correlation of the optimized in a number of ways. fake access with stash elements by a worst case analysis. Observed seq. a b a c a a b d Symbol Description Real seq. x y x z y y z x M Access pattern size C Stash size Probabilities 1 1 p 1 p p p p p (1−p) N N 1 N 2 1 2 2 1 p p/(N −1) 2 Table 3: An example of writing probabilities given M M =Z(k+1)+C k k the real and observed access patterns r and o. Dif- ferent symbols are used for real and observed access patternsmerelyfortheclarityofthedemonstration. Table 2: Additional notation for security analysis p and p are as defined in Eq. 4 and Table 2. 1 2 finite amount of stash, this is no longer true as is shown Observed seq. a b a c a a b d in Sec. 6.4. And the privacy loss under such a situation is Real seq. x y x z y y z x precisely the quantity that is bounded by δ. In the context of Path ORAM, δ characterizes the pri- Probabilities 1 1 p 1 p p p p vacy loss if the stash size exceeds its bounds. Similarly, in N N 1 N 2 1 2 2 Root ORAM, δ quantifies the privacy loss if the stash size Table4: Only the blue symbols affect the probability is exceeded. that will be written under the data element shown by an enclosing box. The red elements show the The (cid:15) bound previous and next access of the boxed data element. 6.1 Frameworkset-up on whether the observed locations were same or different The additional notation used is specified in Table 2. We respectively. conservatively assume that the adversary knows the real se- quences (in other words we assume λ = ∞). Essentially, • A background check is maintained, if at any time there among the M access of which some are real and some are are more than (k+1)×Z +C data blocks mapped to fake, we conservatively assume that these can be distin- the same location, the probability becomes 0, where the guished. Inpractice,thesecurityofferedbyourapproachis symbols can be found from the notations in Table 1,3. highersincetheuntrustedserverstoragecannotdifferentiate Refer to Subsection 6.4 for details. fake accesses from real accesses in practice. • Finally,wemultiplyallthewrittenprobabilitiestogetthe More formally, let f denote the set of fake accesses and i probability Pr[ORAM(R)=o]. r denotes the real set of accesses made by the ORAM. We i denote by Ri, the complete set of accesses made (ri along This is demonstrated in the Table 3. with f ). Thus we have that: i 6.3 Maximumchange Pr[ORAM(r )=o] Pr[ORAM(R )=o] max 1 ≤ max 1 Next, we find the maximum change in the probabilities |r1r−1r,r22|=1Pr[ORAM(r2)=o] |R1R−1R,R22|=1Pr[ORAM(R2)=o] that can occur as a result of changing one access. (5) First we note that in the probability model, the prob- where ORAM(R ) denotes the ORAM protocol output on se- ability written under each data element depends only its i quence R without any additional fake accesses. But to current observed location and its previous observed loca- i prove the bounds of differential privacy in the theorem, we tion and nothing else (and is governed by the distribution need to bound the following term: D given by Eq. 4). Hence, if one data access is changed, the maximum change that can occur in the probability is Pr[ORAM(r )=o] max 1 ≤e(cid:15) at most in two places viz, the location which was modified |r1r−1r,r22|=1Pr[ORAM(r2)=o] and the next accessed location of that data element. With this,wecanenumerateallthepossiblecasesthatcanoccur Hence, it suffices to bound the latter term in Eq. 5 by e(cid:15). and find the maximum change in probabilities. To do this 6.2 Probabilitymodel efficiently, we develop some more notation. Let the accessed data element be changed from a to b. Next,weevaluatetheratiooftheprobabilitiesbyinvoking Let the previous location of access of data element a be thesecondaryinvariant. Recallthatoursecondaryinvariant l (leaf pa) and the next location be l . Similarly, the is: after each access(real/fake) for an element, the position pa na previous location of access of b is l and the next location map of that element (and none other) changes randomly pb asl . Ifanyofthese4donotexisti.ethesymbolwasnever according to the distribution D given in Eq. 4. nb accessed before or was never accessed afterwards, we define With this invariant, we can compute the probability of a that leaf to be 0 for simplification of the equations16. Let l particular real sequence R leading to a particular observed bethelocationoftheaccessinconsiderationi.ethelocation sequenceo. Forourcomputation,wewritetherealsequence of data access which was changed in r and r . Note that (including fake accesses) below the observed sequence and 1 2 in the DP-ORAM calculations, we have the same observed calculate the probabilities according to the following rules: sequence for both the sequences r and r , the location of 1 2 • Thefirsttimeadatablockisaccessed,itslocationisran- accessl isthesameinboththesequences. Thisisshownin dom. Hence, we write a 1/N below this access. the Fig. 5. • Whenanelementthatwasaccessedbeforeisaccessed,we 16In other words, if data element a was never accessed after write a p or p in the probability calculation depending the location of access change, then l =0. 1 2 na 6.4 Theneedforδ In this subsection, we show the need for δ in quantifying thesecurity. WeusethePathORAMpapertodemonstrate thisshort-coming. WeassumethattheStashsizeisbounded by C and let M denote ZlogN +C +1. For demonstra- L tionpurpose,weconstructaminimalworkingexample. Let: −→ y =((r,1,·),(r,1,·),...,(r,1,·)) and (6) −→y(cid:48) =((r,1,·),(r,2,·),...,(r,M ,·)) (7) L Figure 5: The sequences r1 and r2 differ by one ele- whererdenotesthereadoperationand·denotesdatawhich ment (boxed). The previous accessed location and isnotimportantforthedemonstration. Inwords,oneaccess the next accessed location are as shown, dots are ir- sequence consists of M accesses to the same element and L relevantaccesses(notaorb). Theobservedsequence thesecondaccesssequenceconsistsofM differentaccesses L o is the same for both (DP-ORAM condition). to elements 1,2,...,M . L −→ Now, of all the possible sequences ORAM(y) can produce, Now, the probabilities can differ in at most 3 places v.i.z wecanseethatthesequence1,1,...,1canbeoneofthem.18 l, lna and lnb. Let r1 be the sequence with symbol a and But, its not hard to see that the same sequence 1,1,...,1 r2 be the sequence with symbol b. To make the equations can never occur as ORAM(−→y(cid:48)). The reason for this is simply crisp, we define the following extension to the Kronecker because we cannot ever map more than M elements to L delta function, the same path (else the Path ORAM invariant is broken i.e. stashoverflows)andhencetheM accessestothesame 0 if i(cid:54)=j L location cannot all be different elements. δ = 1 if i=j ij We demonstrate this as an attack on the Path ORAM 1/N−p2 if j =0 protocol. Considerasituationwhereaprogramisusingthe p1−p2 PathORAMprotocoltohideitsaccesspatternandweknow ThismodificationoftheKroneckerdeltaisforthesimplicity that the program has the following traits, of the equations. Specifically, the modification ensures that ifasymbolisaccessedforthefirsttime,thenitsprobability (cid:40) 1,1,1,...,1 if Secret = 1 given by P = p +(p −p )δ evaluates to 1/N, as it Access Pattern= z,x 2 1 2 zx 1,2,3,..,M if Secret = 0 should. Now if Pr[ORAM(R ) = o] > 0 and Pr[ORAM(R ) = 1 2 o]>0i.e.,boththeratiosarewell-defined,wecancalculate If y is the real access pattern, if ever we see a sequence of the ratio of the probabilities as: −→ M or more access made to the same location in ORAM(y), L Pr[ORAM(R1)=o] = Pl,lpa ·Plna,l·Plnb,lpb we can immediately infer that Secret = 1!!19 Pr[ORAM(R2)=o] Plna,lpa ·Pl,lpb ·Plnb,l 6.5 δ bound After observing that 1/N ≥ p2, we can see that this ComingbacktotheRootORAMprotocol,nowwecansee p1 p1 maximum value of the ratio of probabilities occurs when that the probability of an observed sequence can suddenly lna = l = lpa and lpb = lnb (cid:54)= l. In this case, the ratio is jump from 0 to a non-zero value after one data access has given by, been changed. And this is what is captured by the δ in the p ·p ·p (cid:18)p (cid:19)2 ((cid:15),δ)-differential privacy framework for ORAMs. 1 1 1 = 1 LetM denotethenumber(C+Z(k+1)+1). Itiseasy p ·p ·p p k 1 2 2 2 toseethatthereisasuddenjumpintheprobabilityfrom0 Evaluating this in terms of our parameters, p = (1−p) 1 to a non-zero value when the real access is changed at one andp = p andpluggingthisintothedifferentialprivacy 2 N−1 location when we look at any such sequence. In particular equation, we get we choose the following two sequences: Pr[ORAM(r )=o] Pr[ORAM(R )=o] max 1 ≤ max 1 r =(1,2,3,...,M ) |r1r−1r,r22|=1Pr[ORAM(r2)=o] |R1R−1R,R22|=1Pr[ORAM(R1)=o] r12 =(1,2,3,...,Mkk−1,1) (cid:18)p (cid:19)2 ≤ 1 If Pr[ORAM(ri) = o] > 0 for i = 1,2, then we have already p2 shown the (cid:15) bound and hence δ = 0. So it remains to find (cid:18)(N −1)(1−p)(cid:19)2 = thesmallestprobabilities. Thesameproofgoesthroughfor p other probability distributions leading to (cid:15)=2log(cid:16)pmax(cid:17). Itisimportanttonotethattheaboveequationholdsforall 18For that matter so can any sequence a,a,...,a for apnmyina∈ observed access sequences o. And hence, we can see that {1,2,3,...N} (cid:16) (cid:17) Root ORAM guarantees (cid:15)=2log (N−1)(1−p) . This com- 19Thereasonforthisisthatthereisanotherconstraintinthe p system which is that no leaf can have more than M data pletes the (cid:15) bound.17 blocks mapped to it. This is because each path P(Lx) has ZlogN bucketsandalongwiththemaininvariantthateach The δ bound block is stored somewhere along the path from the mapped 17Another important point to note here is that the above leaftotherootorintheStash. Weassumethatthisattack analysisisaworstcaseanalysisandhenceitonlydependson is covered in the failure probability of the ORAM because two probabilities in the distribution D viz., the largest and the probability of this occurring is very very low. Symbol Description p 1−2−i, i={1..L} L From 10 to 21 k Runs from 1 to L Z Z ∈{2,3,4,5} λ λ∈{0.25,0.5,0.75,1,2,∞} Table 5: Simulations limits the maximum δ when one of these terms is 0. WLOG, Pr[ORAM(r ) = o] = 0. Hence δ is the maximum value of 1 Pr[ORAM(r )=o]. Now,onesimpleupperboundonδcanbe 2 found by noting the following: Since the probabilities used to compute for each access are at most p (they are either 1 p ,p or1/N andp isthelargest),wecangetaquickupper 1 2 1 bound on δ as Figure 6: This figure illustrates the maximum stash δ≤pMk =(1−p)Mk (8) usage as a function of N. Different lines correspond 1 to different values of λ. To put this in perspective, Where Mk = (C +Z(k +1)+1). This completes the δ the maximum stash usage for 10 GB of outsourced bound.20 (cid:4) data is roughly 40 MB (using a 4 KB block size). 6.6 Bandwidth We use random access patterns for the simulations and themaximumstashsizeiscalculatedexcludingthetransient Theorem 3. ThebandwidthoftheRootORAMpro- storageforonepath. Unlikecurrentwork,weindependently tocol with parameters k,p,Z and λ is 2×Z(k+1)×(1+ studytheeffectofincreasingthenumberofaccesses(M)on 1/λ) per real access. the max stash size. This is equivalent to giving bounds on thefailureprobabilityofthestash. Nextwebrieflydescribe Proof : The number of blocks in any path of the tree is the aims of our evaluations before showing its results. equaltoZ(k+1)andhencetwicethenumberofblocksare We study the effect of various parameters on the perfor- transferredperreadandwrite. Also,theywaytheparame- mance. In particular, we study the effect of N on the max- ter λ is set (i.e the way the fake accesses are programmed), imum stash size used (which in other words, captures the we perform on an average λ real accesses per fake access. failure probability of the ORAM), the dependence of the (the average of a Poisson process with parameter λ is λ). stashsize(outsourcingratio)onbandwidthandsecurity((cid:15)), Hence, the bandwidth gets an addition factor of (1+1/λ) per real access. (cid:4) the growth of the maximum stash size with the number of accessesandfinallythelatencyofRootORAMprotocolsfor access over remote21 Amazon EC2 servers. 7. SYSTEMSEVALUATION In the previous sections, we have established the design 7.2 Evaluationresults space made possible by formalizing DP-ORAM using the- Max stash usage vs N: Inlightoftherecentpaperby oretical analysis. We also saw Root ORAM as a way to Bindschaedleretal.[2],webaseourexperimentalevaluation achieve some interesting points in the design space. In this by giving due importance to the constants involved. Fig. 6 section,weanalyzethestashsizeusage,inotherwords,the showsthedependenceofthemaximumstashusedonN,the outsourcingratio. We define outsourced ratio as the ratio number of outsourced blocks. Different lines correspond to of the total data outsourced to the amount of local storage different values of λ. As can be seen, the effect of λ goes required. down for relatively large values of N. Weresorttosimulationstodemonstratethestashsizere- To put numbers in perspective, the worst case stash size quired. We simulate Root ORAM for various values of the for standard 4 KB blocks for 10 GB of data outsourced is parameter to understand the impact of design parameters roughly 40 MB. The growth indicates that this outsourced on the stash size. The parameter choices have been tab- ratio will be larger for larger values of N. ulated in Table 5. We begin by giving the details of our Outsourcing ratio vs Bandwidth, (cid:15) : Fig. 7a shows implementations. the outsourcing ratio as a function of the bandwidth and 7.1 Detailsoftheimplementation security ((cid:15)) for 4 different values of Z (Z increases from 2 (lowest plane) to 5 (top plane)). When low bandwidth WeimplementedthecompletefunctionalityofRootORAM protocols are used (small k), the outsourcing ratio is rela- in C++. We plan to make our implementation publicly tively small. Specifically, in such regimes, for small values available as an open source software. We performed all ex- of Z, we have an outsourcing ratio of about 10X, whereas periments on a 1.4 GHz Intel processor. The Amazon EC2 for larger Z values, the ratio is almost 1000X. This enables experimentswereperformedusingaTCPconnectionforre- asmartphoneclienttooutsourceabout1TBofdatatothe liable data downloads. untrustedcloudserverwithlessthana1GBoflocalstorage 20Giventheexponentiallydecreasingsmallnatureofδwedo not improve on this weak bound. 21Locations hidden for author anonymity. (a) (b) Figure 7: Fig. 7a illustrates the outsourcing ratio as a function of k and p. Different surfaces correspond to Z = 2 to Z = 5 from bottom to top respectively. The y-axis is related to the security parameter p by the equation p = 1−2−i. Fig. 7b shows for a fixed value of (cid:15), the trade-off between outsourcing ratio and bandwidth (taking recursion into consideration). required and extremely low bandwidth requirements. With terestingdesignpoints,someofwhichareshowninFig.7b. higher bandwidths, much better outsourcing ratios can be achieved and this can be seen in Fig. 7. 7.4 PerfectSecurityDesignPoints Another interesting feature of Root ORAM is the ex- Root ORAM offers a number of interesting design points tremely high outsourcing ratios achieved by models with even with perfect security. For (cid:15) = 0, Root ORAM en- almost complete tree structures i.e., k ≈ from 13 and 20. ablesabandwidthoutsourcingratiotrade-offascanbeseen Though Root ORAM does not have theoretical bounds on in Fig. 7b. For small values of k, Fig. 7a shows that we thestashusagelikeotherprotocolssuchasPathORAMor can design protocols with low bandwidth at the cost of low Ring ORAM, it can be seen clearly how these aspects tie outsourcing ratio. Similarly, for large values of k, we can together as Path ORAM is a instantiation of Root ORAM achieve extremely high outsourcing ratios at lower band- in high bandwidth, (cid:15)=0 regime, consequently having high widths by using smaller k and utilizing fake accesses. For outsourcing ratio. Similarly, using appropriate values of p, ex: k ∼ 15,Z = 2,λ = 4 =⇒ 75X overhead, a factor we can achieve (cid:15) = 0.001,0.01 etc. for a given bandwidth of 3X improvement over Path ORAM and comparable to with outsourcing ratios as seen in Fig. 7a. overhead of Ring ORAM. Similarly, Root ORAM can out- Real-world implementation: Next, we discuss the perform both Path ORAM and Ring ORAM at the cost of resultsofourimplementationofRootORAMoverAmazon a smaller outsourcing ratio as can be seen in Fig. 7b. EC2servers. Ouraimwastocomputethelatencyoverhead ofamemoryaccessesasafunctionofRootORAMparame- 7.5 Howtochooseparameters? ters. Fig.8adepictslatencyasafunctionofthebandwidth Giventhevariousparameters,wedescribehowRootORAM for Z =5 while Fig. 8b depicts the latency as a function of can be enabled for any particular application. As in any the constrained client bandwidth across different values of other differential privacy application, we set a privacy bud- k. We used the trickle application to constrain the band- get(anupperbound: (cid:15) )forthesystem. Thenweallow width at client machines to desired values. We notice an budget (cid:15)-DP queries till the budget is exhausted i.e., cumulative (cid:15) order of magnitude difference between latencies at low and of the queries by the composition theorem reaches (cid:15) . high values the bandwidth i.e., low and high values of k. budget Then we can choose two of the following three parameters independently: desired security value (cid:15), the available band- 7.3 Recursivestashreduction width and the outsourcing ratio R. The third parameter is Recursively storing the stash in an ORAM improves the determined by the choice of the other two and the optimal performanceexponentially. SupposethatweuseRootORAM choicewouldbedeterminedbytheapplicationrequirements. with a specific set of parameters k,p,Z and λ and achieves 7.6 Summary aworstcaseoutsourcingratioofR. The(cid:15),δvaluesachieved aregivenbyTheorem.2,bandwidthbyTheorem.3andthe We have shown the practicality of Root ORAM through outsourcing ratio by the above simulations. theoreticalanalysis,simulationsandreal-worldexperiments. Wecanseethatusingtroundsofrecursion,theoutsourc- Theoretically, we have analyzed the bandwidth and secu- ingratiogrowsexponentiallytoRt,whereasthesecurityand rityfordifferentparametervalues. Experimentally,wehave the bandwidth increase only linearly viz., ((cid:15),δ) → (t(cid:15),tδ) shown the dependence of the outsourcing ratio and access andbandwidth→t×bandwidth. Thisintroducesfurtherin- latencyonRootORAMparameters;demonstratingthefea-