ebook img

LibFTE: A Toolkit for Constructing Practical, Format-Abiding Encryption Schemes PDF

15 Pages·2014·0.46 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview LibFTE: A Toolkit for Constructing Practical, Format-Abiding Encryption Schemes

LibFTE: A Toolkit for Constructing Practical, Format-Abiding Encryption Schemes DanielLuchaup1 KevinP.Dyer2 SomeshJha1 [email protected] [email protected] [email protected] ThomasRistenpart1 ThomasShrimpton2 [email protected] [email protected] 1DepartmentofComputerSciences,UniversityofWisconsin-Madison 2DepartmentofComputerScience,PortlandStateUniversity Abstract caseofcredit-cardnumbers,thismeanstakinginastring of 16 decimal digits as plaintext and returning a string Encryptionschemeswheretheciphertextmustabidebya of16decimaldigitsasciphertext. Thisisanexampleof specifiedformathavediverseapplications,rangingfrom format-preserving encryption (FPE). NIST is now con- in-placeencryptionindatabasestoper-messageencryp- sidering proposals for standardized FPE schemes, such tionofnetworktrafficforcensorshipcircumvention.De- astheFFXmode-of-operation[7],whichisalreadyused spite this, a unifying framework for deploying such en- in some commercial settings [3]. On a totally different cryption schemes has not been developed. One conse- front, a recent paper [11] builds a format-transforming quenceofthisisthatcurrentschemesaread-hoc;another encryption scheme. It takes in plaintext bit strings (for- is a requirement for expert knowledge that can disuade mattedornot)andreturnsciphertextsformattedtobein- onefromusingencryptionatall. distinguishable, from the point of view of several state- We present a general-purpose library (called libfte) of-the-art network monitoring tools, from real HTTP, that aids engineers in the development and deploy- SMTP, SMB or other network protocol messages. This mentofformat-preservingencryption(FPE)andformat- FTE scheme is now part of the Tor Project’s Browser transformingencryption(FTE)schemes. Itincorporates Bundle,andisbeingintegratedintootheranti-censorship a new algorithmic approach for performing FPE/FTE systems. using the nondeterministic finite-state automata (NFA) representation of a regular expression when specifying It seems clear that FPE and FTE have great poten- formats. This approach was previously considered un- tialforotherapplications,too.Unfortunately,developers workable, and our approach closes this open problem. willfindadauntingcollectionofdesignchoicesanden- Weevaluatelibfteandshowthat,comparedtootheren- gineering challenges when they try to use existing FPE cryptionsolutions,itintroducesnegligiblelatencyover- orFTEschemesinnewapplications,ortoinstantiateen- head, and can decrease diskspace usage by as much tirely new schemes. To begin with, there isn’t a stan- as 62.5% when used for simultaneous encryption and dardwaytospecifytheformatsthatplaintextsorcipher- compression in a PostgreSQL database (both relative to texts must respect. There are no established guidelines, conventional encryption mechanisms). In the censor- andcertainlynoautomatedtools,tohelpdevelopersun- ship circumvention setting we show that, using regular- derstand whether they should be targeting deterministic expression formats lifted from the Snort IDS, libfte can schemes or randomized ones, or how their chosen for- reduceclient/servermemoryrequirementsbyasmuchas matsmightaffectruntimeperformanceandmemoryus- 30%. age. (In the case of FTE, it can be difficult to tell if a giveninputandoutputformatwillresultinaschemethat operates properly.) There are no established APIs, and 1 Introduction noreferenceimplementationsoropen-sourcelibrariesto aiddevelopment. Both in practice and in the academic literature, we see anincreasingnumberofapplicationsdemandingencryp- Making FPE/FTE More Approachable: libfte. In tion schemes whose ciphertexts abide by specific for- this work, we offer a unifying framework for build- matting requirements. A small industry has emerged ing and deploying FPE and FTE schemes. We design around the need for in-place encryption of credit-card and implement an algorithm library, libfte, and include numbers, and other personal and financial data. In the in it developer-assistance tools. A paramount goal of 1 our effort is ease-of-use: our library exposes an inter- Deployment Examples Setting Type Constraint face in which formats for plaintexts and ciphertexts are Databases creditcardnumber 16-digitstring easily specified via Perl-compliant regular expressions datefield YYYYMMDD (regexes),anditrelievestheprogrammeroftheburdens accountbalance 32-bitintegers ofmakinggoodalgorithmandparameterchoices. WebForms emailaddress contains@symbol, Some of what we do is to make existing algorithms endswith{.com,...} year,month,day YYYY,MM,DD (e.g., FFX) significantly easier to use. But some of the URL startswithhttp(s) engineeringanddeploymentchallengesdemandentirely Network HTTPGETrequest “GET/...” newapproachestobothFPEandFTE.Perhapsmostno- Monitors BrowserX “...User-Agent:X...” tably,wesolveanopenproblemregardinghowtobuild SSHtraffic “SSH-...” regular-expression-based schemes using a regex’s non- Table 1: Example deployment settings and constraints deterministic finite automaton (NFA) representation, as forFPE/FTEschemes. opposed to its DFA representation. This is desirable because it can lead to significantly more space-efficient schemes,buttheapproachwaspreviouslythoughttobe unworkable [5,11]. We dispel this thought, and experi- databasefields,aclassicmotivationalsettingforFPE,but mentallyobservetheresultingboostinefficiency. onethathas(tothebestofourknowledge)neverbeenre- Tosummarizethemaincontributionsofthiswork,we: ported upon. We show that performance loss compared to conventional encryption is negligible. We also show • Design and implement a library and toolkit to how to leverage the flexibility of libfte to improve per- makedevelopmentanddeploymenteasy. Thelibfte formance,byusinga(deterministic)FTEschemethatsi- library exposes simple interfaces for performing multaneouslyencryptsandcompressesfields(inaprov- FPE/FTE over regex formats specified by the user. ablysecuremanner). We provide a configuration tool that guides devel- Wethenuselibftetobuildaproof-of-conceptbrowser opers towards good choices for the algorithms that plugin that encrypts form data on websites such as Ya- will instantiate the scheme, and that provides con- hoo! Contacts. This uses a variety of FPE and FTE crete feedback on expected offline and online per- schemes,andallowsonetoabidebyavarietyofformat formanceandmemoryusage. restrictionchecksperformedbythewebsite. • Develop new FTE schemes that take regular- Finally, we show that our NFA-based algorithms in expressionformats,butcanworkdirectlywiththeir libfteenablesignificantmemorysavings,specificallyfor NFArepresentation.Thiswaspreviouslythoughtto thecaseofusingFTEinthenetwork-monitor-avoidance beanunworkableapproach[5], duetoaPSPACE- setting [11]. Using a corpus of 3,458 regular expres- hardness result, but we show how to side-step this sionsfromtheSnortmonitorweshowthatwecanreduce via a new encoding primitive called relaxed rank- memoryconsumptionofthisFTEapplicationby30%. ing. TheresultisFTEschemesthathandlealarger class of regexes, and impose smaller offline/online memoryrequirements. 2 PreviousApproachesandChallenges • Detail a general, theoretical framework that cap- tures existing FPE/FTE schemes as special cases, We review in more detail some of the main results in and surfaces potentially useful new constructions, the areas of format-preserving and format-transforming e.g., deterministic FTE that encrypts and com- encryption,andthendiscusssomeofthechallengespre- presses simultaneously. Due to space constraints, sentedwhenoneattemptstoimplementandusethesein the formalisms appear mostly in the full ver- practice. Asweshallsee,existingtoolsfallshortforthe sion[16]. typesofapplicationswetarget. Table2providesasum- mary. Inaddition,thelibftelibrarywillbemadepubliclyavail- able as free and open-source software1, with APIs for Format-preserving encryption. In many settings the Python,C++andJavaScript. formatofaplaintextanditsencryptionmustbethesame, andthetoolusedtoachievethisisformat-preservingen- Applications. Weexerciselibftebyapplyingittoava- cryption (FPE). Work on FPE predates its name, with riety of application settings. Table1 gives a summary various special cases such as length-preserving encryp- ofthediversityofformatsrequiredacrossthesevarious tion of bit strings for disk-sector encryption (c.f., [14, applications. 15]), ciphers for integral sets [8], and elastic block ci- WefirstshowhowtouselibftetoperformFPEofSQL phers [10] including de novo constructions such as the 1https://libfte.org/ hasty pudding cipher [21]. For an overview of work on 2 Paper Builds Formats Schemes Implementation Comments [7] FPE sliceofΣ∗ deterministic none proposedNISTstandard [5] FPE sliceofchosen deterministic none firstFPEpaper,theoryonly, regularlanguage requiresregex-to-DFAconversion [11] FTE sliceofchosen randomized opensource, inputformatfixedasbitstrings, regularlanguage butdomainspecific controlofoutputformat, requiresregex-to-DFAconversion This FPE/ range-sliceof deterministic/ opensource, controlofinputandoutputformat, Work FTE chosenregularlanguage randomized configurationtoolchain, NFAandDFAranking, nondomainspecific regex-to-DFAconversionnotrequired Table2: Analysisofpriorworks,andacomparisonoffeatures. FPE,seeRogaway[20]. specify the FFX mode of operation, which is a specific FPE was first given a formal cryptographic treatment kindofFPEschemeandisbasedontheBRRSwork[5]. byBellare, Ristenpart, RogawayandSpies(BRRS)[5]. FFXtakesaparameter2 ≤ r ≤ 216, theradix, anden- In their work, BRRS suggested an approach to FPE crypts a plaintext P ∈ L = (cid:83) {0,1,...,r −1}(cid:96) to a (cid:96) called the “rank-encipher-unrank” construction. First, ciphertext in L with |L| = |P|. The length (cid:96) ranges they show how to build a cipher that maps Z to Z , between a minimum value of 2 (or 10, if r ≥ 10) and N N for an arbitrary fixed number N. (Recall that Z = 232−1. Forexample,FFX[10]enciphersstringsofdec- N {0,1,...N − 1}.) Now say that X is a set of strings imal digits to (same length) strings of decimal digits; thatallfitsomespecifiedformat,andonedesiresanen- FFX[8]doeslikewiseforoctalstrings. Inaddition,FFX cryption scheme mapping X to X. A classic algorithm hasanextra“tweak”input,makingitalength-preserving duetoGoldbergandSipser(GS)[12]showsthat,givena tweakablecipher,inthesenseof[17]. Thetweakallows DFAforX,thereexistsanefficientlycomputablefunc- FFXtosupportassociateddata. tion rank : X → Z , where |X| = N and rank(x) We are aware of no public, open-source implementa- N is defined to be its position (its “rank”) in the shortlex tionsofFFX,thoughtheredoexistproprietaryones[3]. orderingofX. Inaddition,rankhasanefficientlycom- Even given such an implementation, the formats sup- putable inverse unrank : Z → X, so that unrank (i) ported by FFX are not as general as we might like. N L is the i-th string in the ordering of L. Then to encrypt For example, the scheme does not support domain Z N a string x ∈ X: (1) rank the input x to yield a num- when N is not expressible as r(cid:96) for the supported bera←rank(x),(2)enciphera,givinganewnumberb, radices r. One can rectify this using cycle walking [8] then(3)unrankbtoyieldtheciphertexty ←unrank(b), buttheburdenisondeveloperstoproperlydoso,hinder- whichisanelementofX. ingusability. Moreover,theuserislefttodeterminehow BRRS focus on FPE for sets X that are a slice of a besttomapmoregeneralformatsintothesetofformats languageL,thatisX =L∩ΣnforsomenandwhereΣ thatFFXsupports. is the alphabet of L. Relatedly, we define a range-slice Format-transformingencryption.Dyer,Coull,Risten- ofalanguageLasX = L∩(Σn∪Σn+1∪···∪Σm), part and Shrimpton (DCRS) [11] introduced the notion forn≤m.Thelatterissuperiorbecauseitoffersgreater offormat-transformingencryption, andgaveapurpose- flexibility,althoughnotexploredbyBRRS.Still,extend- built scheme that mapped bitstring plaintexts to cipher- ing BRRS to an FPE scheme over the entire (regular) texts belonging to a specified regular language. Their languageispossible,byestablishingatotalrankingone FTEschemewasbuilttoforceprotocolmisidentification slice at a time. The main disadvantage of the BRRS in state of the art deep-packet-inspection (DPI) systems scheme is that it requires a DFA to represent the set X. usedforInternetcensorship. For most users, this is an unnatural way to specify lan- The DCRS scheme is randomized, which lets it tar- guages,orslicesthereof. get strong privacy goals for the plaintexts (namely, se- WequicklynotethattheBRRSalgorithmmaybesus- mantic security [13]), and also naturally aligns with us- ceptibletotiming-basedside-channelattacks,sincerank ingstandardencryptionschemesasbuildingblocks. The is not constant time. Timing information may therefore scheme itself is similar in spirit to BRRS: the plaintext leak partial information about plaintexts. We leave to bitstring is encrypted using an authenticated encryption future work exploration of this potential security issue, scheme,theresultingintermediateciphertextinterpreted whichextendstolibfteandothernon-constant-timemes- as a number, and this number is then unranked into the sageencodingsaswell. targetlanguage.LikeBRRS,thisschemeworksonslices The FFX scheme. Bellare, Rogaway, and Spies [7] ofagivenregularlanguage. 3 DCRS observe that regular expressions provide a FPE/FTE schemes. Loosely, the API takes a configura- friendlier programming interface for specifying inputs. tion, describing what algorithms to use, and some key But to use the GS scheme for ranking/unranking, they inputsforthosealgorithms,whiletheassistanthelpsde- mustfirstconvertthegivenregularexpressiontoanNFA velopers determine good configurations. Let us start by andthenfromanNFAtoaDFA.Thelaststepoftenleads talkingabouttheassistant. toalargeblowupinthenumberofstates,sometimesren- Configuration assistant. A format is a tuple F = deringtheprocesscompletelyintractable. (Examplesof (R,α,β), where R is a regular expression, and α ≤ β suchregexs,andtheassociatedNFAandDFAsizes,are are numbers. A format defines a set of strings L(F) = giveninTable6inSection6.) Evenwhentheprocessis (cid:12) {s ∈ L(R)(cid:12)α ≤ |s| ≤ β },whereL(R)istheset tractable,theprecomputedtablesthatDCRSandBRRS of strings matched by R. Following traditional naming use to implement ranking require space that scales lin- conventions, we call L(F) the language of the format. earlyinthenumberofstatesintheDFA.Manyofthefor- Because of its wide-spread use, in libfte the input R is matsusedbyDCRSrequireseveralmegabytesofmem- specifiedinPerl-CompatibleRegularExpressionsyntax. ory; in one case, 383MB. This is prohibitive for many However,wenotethatPCREsyntaxallowsexpressions applications, especiallyifonewantstokeepseveralpo- that have no equivalent, formal regular expression. For tentialformatsinmemory. instance,PCREexpressionsusing\1,\2,...(where\1is Thus,inmanyinstancesitwouldbepreferabletouse a back-reference to the first capture group; see [1]) are the NFA representation of the given regex, but BRRS not even context free, let alone regular. Thus, libfte ac- showed that ranking given just the NFA representation cepts expressions built from a subset of the full PCRE ofaregularlanguageisPSPACE-hard.BuildinganyFPE syntax. orFTEschemethatworksdirectlyfromanNFAhasre- Ourconfigurationassistanttakesasinputtwoformats, mainedanopenproblem. onedescribingtheformatofplaintextstrings(F ), and P We also note that developers might hope for a gen- one describing the desired format of ciphertext strings eralpurposeFTEscheme,thattakesarbitraryregularex- (F ). Italsoacceptssome“preferenceparameters”,for C pressions for the input and output formats, and that can example specifying the maximum memory footprint of bebuiltfromexistingdeterministiccryptographicprim- anyschemeconsideredbytheassistant,buttheseareset itives (e.g., wideblock tweakable blockciphers) or ran- tosomereasonabledefaultvalueifnotspecified. Itthen domized ones (e.g., authenticated encryption schemes). runs a battery of tests, in an effort to determine which But actually instantiating such a scheme presents an ar- configurationswillresultinFPE/FTEschemethatabide rayofalgorithmicandengineeringchoices;inthecurrent by the user’s inputs. Concretely, the assistant outputs a stateofaffairs,expertknowledgeisrequired. table listing various possible configurations (some con- Summary. While a number of approaches to FPE and figurations may not be possible, given the user’s input), FTEexist,thereisagapbetweentheoryanddeveloper- along with information pertaining to expected perfor- friendly tools. Implementations are non-existent, and manceandmemoryusage. Giventheuser’spreferences, even expert developers encounter challenges when im- the table lists the best option first. In the case that no plementing schemes from the literature, including: un- availableconfigurationispossible,theassistantprovides derstandingandmanagingmemoryrequirements,devel- informationastowhy,sothattheusercanaltertheirin- oping a “good” construction, or engineering the plain- putsandtryagain. text/ciphertext format pair. Finally, there exist funda- TheencryptionAPI. Thealgorithmlibraryexposesan mentalperformanceroadblockswhenusingsomeclasses encryptionAPIthattakesasinputanencryptionconfig- of regular expressions. This is compounded by the fact uration, which consist of a plaintext format, a cipher- that, a priori, it isn’t obvious when a given regex will text format, and a configuration identifier. The latter is raisetheseroadblocks. a string that specifies the desired methods for perform- ingranking,unranking,encryptionanddecryption. The library performs all necessary precomputations (initial- 3 Overviewoflibfte izerankers,buildlook-uptables,etc.) inaninitialization function andreturnsahandletoanobjectthatcanper- To aid adoption and usage of FPE and FTE, we de- formencryptionanddecryption, accordingtothespeci- veloped a set of tools known collectively as libfte. At fiedconfiguration. Currently,tenconfigurationsaresup- a high level, libfte has two primary components (see portedbylibfte(seeSection6fordescriptions). Figure3): astandalonetoolcalledtheconfigurationas- sistant, and a library of algorithms (implemented in a Roadmap. In Sections 4 and 5 we describe in detail mixture of Python and C/C++) that exposes an API for thealgorithmsthatresultintheseconfigurations. InSec- encryption and decryption via a number of underlying tion 4 we detail a new type of ranking algorithm, what 4 encryption is prefer (memory input/output format d(reatnedrmominiizsetdic )| … utilpizeartfioornm |a rnucneti)me M foinrpmuatt foourtmpuatt … “valid?” a b b rank encrypt Y unrank … … N “try again” C valid valid invalid valid config. config. config. config. Figure3: Left: Thelibfte configurationassistant(builtagainstthelibrary)helpsuserscreateformatsthatmeettheir specific performance requirements. The assistant takes an input/output format pair and uses a decision-tree process to determine if the formats are valid. If the formats are deemed valid, performance statistics are reported for the instantiatedscheme(s). Right: ThelibraryimplementsAPIsforFPE/FTEschemes. Shownisadiagramofthebasic flowofourFPE/FTEschemes. Asinputittakesaninput/outputformatandmessageM andreturnsaciphertextC. wecallrelaxedranking,thatallowsustoworkmoredi- lenges. rectlywithregularexpressions(inparticular,theirequiv- alent NFAs), and sidestep the PSPACE-hardness obsta- 4.1 RelaxedRanking cle. In Section 5, we lay out methods of combining re- laxed ranking with standard cryptographic primitives to We introduce a framework for building FPE and FTE build both deterministic and randomized FPE and FTE schemes directly from NFAs. The resulting algorithms schemes. Fordeterministicschemes,weleverageatech- willoftenusesignificantlylessmemorythantheDFAap- niquecalledcyclewalking,andforrandomizedschemes, proach,thusenablinggeneral-purposeregex-basedrank- weemployrejectionsampling. ing in memory-constrained applications. For instance, Then in Section 6 we describe specific instantiations theNFAfortheregex(a|b)∗a(a|b){20}has48states. oftheseschemes,andexplainhowtheconfigurationas- A key insight is that we can circumvent the negative sistant works in more detail. Finally, in Section 7 we result about NFA ranking if we shift to a relaxed rank- showhowtheseschemescanbeputtoworkinthreedif- ing approach, which we formally define in a moment. ferentusecases: databaseencryption,webformencryp- This will require, in turn, constructing FPE and FTE tion,andnetworkmonitorcircumvention. schemes given only relaxed ranking which we address inSection5. 4 Fast,RelaxedRanking 4.1.1 RelaxedRankingSchemes The rank-encipher-unrank method for constructing Informally,arelaxedrankingofalanguageLrelaxesthe FPE/FTE schemes needs efficient techniques for map- requirementforabijectionfromLtoZ . |L| ping strings in a regular language L to positive integers Formally, arelaxedrankingschemeforLisapairof aswellascomputingtheinverseoperation(mappingpos- functionsRank andUnrank ,suchthat: L L itive integers back to strings in the language). Exist- 1. Rank : L → Z is injective, i ≥ |L| (Note that L i ing techniques are often impractical for two main rea- wecapitalize‘Rank’todistinguishrelaxedranking sons. First, the traditional DFA-based ranking requires fromranking.) the construction of a DFA corresponding to a regular 2. Unrank :Z →Lissurjective;and expression. DFAs for some regular expressions can be L i verylarge.Forinstance,theminimumDFAfortheregex 3. ForallX ∈L,UnrankL(RankL(X))=X. (a|b)∗a(a|b){20} has 1+221 states. Second, the num- The last condition means that we can correctly invert bersinvolvedinrankingcanbeverylarge(forlanguages points in the image of L, denoted Img(L) ⊆ Z . Note withmanystrings)andoperationsontheseintegerscan i thatarankingisarelaxedrankingwithi=|L|. therefore be computationally expensive. As an extreme example, ranking a 10,000-byte long element accepted DFA-based ranking revisited. As a thought experi- bytheregex.∗requiresnumbersofupto(28)10000 bits, ment,onecanviewthetraditionalGSDFA-basedrank- or 10,000 bytes. This section tackles these two chal- ingforregularlanguagesasfollows:letIbethesetofall 5 acceptingpathsinaDFA.First,onemapsastringX ∈L Pathπcanalsobeexpressedasasequenceoftransitions toitsacceptingpathπ ∈ I. Then,onemapsπ toan τ τ ···τ ,wheren = |π|isthelengthofπ. Thesuffix X X 1 2 n integerviaan(exact)ranking. Thecompositionofthese π1ofthepathπisτ ···τ ,andwehaveπ =τ π1. The 2 n 1 twofunctionsyieldsarankingfunctionforallstringsin sequenceofcharactersinthepathisπ| =a a ...a . Σ j1 j2 jn L. In the DFA ranking algorithms of [5,12], these two The intermediate set I. An accepting path is one that stepsaremerged. ends in an accepting state. Let Acc (q) be the set M Atwo-stageframework. Wecanusethistwo-steppro- of accepting paths starting from state q. We let I = ceduretobuildefficientrelaxedrankingalgorithms.Sup- Acc (q ). M 0 posewedesiretobuildarelaxedrankingfunctionRank L The functions map and unmap. We must map from L fromagivensetLintoZ . Wefirstidentifythreecom- i toI andback. Thelatterissimpler: defineunmap(π)to ponents: bethewordπ| .Thisisfasttocompute,intimelinearin Σ 1. an intermediate set I for which we can efficiently |w|. Theforwarddirectionmap(w)requiresadetermin- performranking,i.e.,thereisanefficientalgorithm istic choice for an accepting path for w. This is called forrank :I →Z wherei=|I|; I i parsing. Any suitable parsing algorithm will work, but 2. aninjectivefunctionmap:L→I;and we note that the most obvious algorithms may be quite 3. a surjective function unmap: I → L such that for inefficient. Forexample, simplyrecordingallaccepting allX ∈Litholdsthatunmap(map(X))=X. pathswhilerunningtheNFArunsintimeexponentialin Wethendefine |w|intheworstcase. Rank (X) = rank (map(X)) Linear-time parsing. We now give the (to the best of L I our knowledge) first algorithm for determining a com- Unrank (Y) = unmap(unrank (Y)) L I pactrepresentationofallofanNFA’sacceptingpathsfor Shouldunmapadditionallybeinjective,thenRank isa astringw. Thenmap(w)simplyrunsthisalgorithmfor L bijection,andwehave(strict)ranking. wandoutputsthelexicographicallyleastacceptingpath. Atfirstglance, thisframeworkmayseemtonothave Ouralgorithmconstructsanimplicitrepresentationofa accomplishedmuchaswerelyonastrictrankingtoreal- directed-acyclicgraph(DAG)representingallaccepting izeit. ButwewillensurethatthelanguageI allowsfor paths for w. The lexicographically least accepting path strict ranking, and so the framework allows us to trans- for w can then be found using a simple traversal of the formtheproblemofrankingfromadifficultset(L)toan DAG.Nextwedescribethealgorithmindetail. easierone(I). Let M = (Q,Σ,δ,q ,F) be an NFA, Q(cid:48) ⊆ Q, and 0 c∈Σ. Wedenotebyδ(Q(cid:48),c)thesetofstatesqsuchthat 4.1.2 RelaxedRankingUsingNFAs (q(cid:48),c,q)∈δforsomeq(cid:48) ∈Q(cid:48),andbyδ−1(Q(cid:48),c)theset ofstatesqsuchthat(q,c,q(cid:48))∈δforsomeq(cid:48) ∈Q(cid:48). We construct relaxed ranking for NFAs using the ap- Consider a string w = c c ...c . Traditional NFA proach above. We use as intermediate set I the set of 1 2 n matchingstartswithafrontierofstatesF = {q },and allacceptingpathsintheNFA.Tomapintothisset,for 0 0 ateverypositionk inw itcomputesF = δ(F ,c ). each string in L we deterministically pick an accepting k k−1 k The string is accepted if F ∩ F (cid:54)= ∅. However, this path (a process called parsing). To rank on I we de- n doesnotalloweasyrecoveryofanacceptingpath,even fineapathordering,andgeneralizetheGoldberg-Sipser ifallF setsaresaved. Themainreasonforthisisthat rankingalgorithmforDFAstocountpathsbasedonthis k there might be states in the frontiers that do not lead to ordering. an accepting state. To work around this, we also scan RecallthatanNFAisa5-tupleM = (Q,Σ,δ,q ,F), 0 the input backwards, maintaining a backwards frontier where Q is a finite set of states, Σ is the alphabet, δ ⊆ Q × Σ × Q is the transition relation2, q ∈ Q is the set of states where Bn = F, and Bk−1 = δ−1(Bk,ck). 0 Giventhesequences{F }and{B },withk = 0,...,n, start state, and F ⊆ Q is the set of final (or accepting) k k states. If(q,a,q(cid:48)) ∈ δ thenMmaytransitionfromstate we compute {Sk} where Sk = Fk ∩ Bk. The set Sk qtostateq(cid:48) whenthecurrentinputsymbolisa. Wealso contains all states reachable from the start state follow- writeatransitionτ = (q,a,q(cid:48)) ∈ δ asq →a q(cid:48),whereq ing transitions on c1...ck such that ck+1ck+2...cn is an isthesourceandq(cid:48)isthedestinationofτ. acceptingpath. Together,{Sk}andtheNFAtransitions oftheform(q,c ,q(cid:48))withq ∈S ∧q(cid:48) ∈S ,forman ApathπinMisasequenceoftransitions k k−1 k implicit Direct Acyclic Graph (DAG) representation of q a→j1 q a→j2 q ···q a→jn q · allacceptingpathsforw. Finally,wetraversethisDAG i0 i1 i2 in−1 in starting from q ∈ S and following the lexicographi- 0 0 2Weassumethatthereareno(cid:15)-transitions,butthisiswithoutloss callysmallesttransitions,whichyieldsmap(w). ofgeneralityastherearestandardmethodstoefficientlyremovethem fromanNFA. NFA path ranking. All that remains is to give a strict 6 ranking algorithm for I, the set of accepting paths in c(cid:48)(cid:48)). InaDFAc(cid:48) = c(cid:48)(cid:48) =⇒ q(cid:48) = q(cid:48)(cid:48). Butequation (1) theNFA.Here,wecanadapttechniquesfromtheDFA- doesnothavetousestandardlexicographicalordering. based ranking by Goldberg and Sipser. Their algorithm Our idea is to give priority to states over characters. canbeviewedasarecursiveprocedureforcountingthe We assume a state and character order given by an ar- numberofacceptingDFApathsthatprecedeagivenpath bitrary but fixed enumeration of Q and Σ, and use the inlexicographicalorder. followingorderfortransitionsoriginatingfromthesame Let T(q,n) be the number of paths of length n in stateq:(q,c(cid:48),q(cid:48))(cid:108)(q,c(cid:48)(cid:48),q(cid:48)(cid:48))if-and-only-if(q(cid:48) <q(cid:48)(cid:48))or Acc (q). Note that, for all q ∈ Q and 0 ≤ i ≤ n, q(cid:48) = q(cid:48)(cid:48) andc(cid:48) < c(cid:48)(cid:48). Thisspecificorderallowsforpre- M thevalueofT(q,i)canbecomputedinpolynomialtime computationinequation(2). Inequation(2)wecanre- usingasimpledynamic-programmingalgorithm. placeallthetermsT(q(cid:48),n−1)whichcorrespondtotran- Assume that the NFA transitions are enumerated ac- sitions (q,c(cid:48),q(cid:48))(cid:108)τ with n[q,q(cid:48)]×T(q(cid:48),n−1), where cording to a total ordering, and that τ (cid:108)τ(cid:48) means that theprecomputedvaluen[q,q(cid:48)]representsthenumberof τ precedes τ(cid:48) according to this order. The ordering transitions from q to q(cid:48). Similarly, all the terms corre- ontransitionsinducesalexicographicalordering’≺’on sponding to edges τ(cid:48) = (q,c(cid:48),q(cid:48)(cid:48)), where τ(cid:48) (cid:108) τ = paths (which are sequences of transitions). Formally, if (q,c(cid:48)(cid:48),q(cid:48)(cid:48)), can be replaced by r[q,c(cid:48)(cid:48),q(cid:48)(cid:48)] × T(q(cid:48)(cid:48),n π =τ π1andπ =τ π1,thenthisorderis: −1),wherer[q,c(cid:48)(cid:48),q(cid:48)(cid:48)]isthenumberofsuchtransitions. 1 1 1 2 2 2 π ≺π ⇐⇒ τ (cid:108)τ ∨(cid:0)τ =τ ∧π1 ≺π1(cid:1) (1) These optimizations have benefit, because the numbers 1 2 1 2 1 2 1 2 T(q,n)canbeverylargemultipleprecisionintegers. Let rank(π) be the number of accepting paths π(cid:48) ≺ π that precede π in the lexicographical order on paths. It 5 BuildingFTESchemes followsthat,rank((cid:15)) = 0(therankoftheemptystring is0),andforanyπ =τπ1 ∈AccM(q),wehave: Now we turn to building FTE schemes, treating FPE in passing as a special case of FTE. We specifically give (cid:88) rank(π) = rank(π1)+ T(q(cid:48),n−1) (2) two methods for composing relaxed-ranking algorithms (q,c(cid:48),q(cid:48))(cid:108)τ with an underlying cryptographic primitive to make an Notethatthesumisovertransitionsτ(cid:48) = (q,c(cid:48),q(cid:48)) ∈ δ FTE scheme. For deterministic FTE, the cryptographic that precede τ in transition order, τ(cid:48) (cid:108)τ. In words, we component is a tweakable cipher (e.g. FFX), and we call the composition cycle-walking FTE. For random- are summing over all outgoing edges from q that lead ized FTE, the cryptographic component is an authenti- topathsthatarelexicographicallysmallerthanthepaths cated encryption scheme, and we call the composition thatfollowthetransitionτ.Unrollingtherecursiongives method rejection-sampling FTE. (Impatient readers can us an iterative procedure for ranking accepting paths of look ahead to Figure 4 for the pseudocode.) We delay lengthnthatcanbeefficientlyimplementedviadynamic specificinstantiationsoftheschemesuntilSection6.1. programming. To conclude, the relaxed ranking for a string w ac- Informal FTE scheme syntax. We provide a formal ceptedbyanNFAisRank(w)=rank(map(w)),andthe treatmentofFTEschemesyntaxinthefullversion. We reverseisUnrank(r)=unmap(unrank(r)). provide a simpler, more informal discussion of it here; this will suffice for what follows. An FTE scheme is a pair of algorithms (Enc,Dec). The FTE encryption al- 4.2 LargeIntegerDFA/NFAOptimization gorithmEnctakesasinputs We present a simple but effective optimization that • akeyK speeds up both NFA and DFA-based ranking. In prac- • apairofformats(F ,F )thatdescribethelanguage P C tice,rankingefficiencydependsonhowfastweevaluate L(F )ofplaintextinputs, andthelanguageL(F ) P C thesuminequation(2),andthisdependsontheprecise ofciphertextoutputs definitionofthetransitionorder. Wedefinethisorderso • aplaintextstringM ∈L(F ) thatwecanreplacemultiplelarge-integeradditionswith P asinglemultiplication. Ourexperimentsconfirmedthat • associateddata,andencryptionparameters(bothop- thisreplacementindeedresultedinfastercode. tional) Observethatequations(1)and(2)usedforpathrank- andoutputsaciphertextstringC ∈ L(F ),oraspecial C ingdependonlyontransition(edge)orderandstructure “failure” symbol ⊥. Associated data is data that must oftheautomaton. ThisobservationisvalidforbothNFA beboundtotheunderlyingplaintext,butwhoseprivacy andDFA.Previous,traditional,DFArankingisgivenby is not required. (For example, metadata meant to pro- these equations and standard lexicographical ordering, videcontextfortheuseorprovenanceoftheplaintext.) usingcharacterorder: (q,c(cid:48),q(cid:48))(cid:108)(q,c(cid:48)(cid:48),q(cid:48)(cid:48))⇐⇒(c(cid:48) < Weallowforencryptionparameterstohelpenforcespe- 7 EncT(M): DecT(C): EncT(M): DecT(C): K K K K c ←n2s(r,Rank (M)) p ←n2s(r,Rank (C)) a←Rank (M) b←Rank (C) 0 X 0 Y X Y i←0 i←0 M(cid:48) ←n2s(t−τ,a) C(cid:48) ←n2s(t,b) Do Do i←0 IfC(cid:48) =⊥Ret⊥ ifi>imaxthenRet⊥ i←i+1 Do M(cid:48)←$DKT(C(cid:48)) i←i+1 p ←DT(p ) ifi>i thenRet⊥ IfM(cid:48) =⊥Ret⊥ i K i−1 max c ←ET(c ) u←s2n(r,p ) i←i+1 a←s2n(t−τ,M(cid:48)) i K i−1 i v←s2n(r,ci) Untilu∈Img(X) C(cid:48)←$EKT(M(cid:48)) RetUnrankX(a) Untilv∈Img(X)∪Img(Y) RetUnrank (u) b←s2n(t,C(cid:48)) X Ifv∈Img(Y)then Untilb∈Img(Y) RetUnrank (v) RetUnrank (b) Y Y Ret⊥ Figure4: Left: Cycle-walkingdeterministicFTE.n2s(r,a)returnsthestringrepresentingnumberainradixr, and s2n(r,b)returnsthenumberwhoseradixrrepresentationisb. Theparameteri determinesthemaximumnumber max ofiterations. Right: Rejection-samplingrandomizedFTE. cific failure criteria, which will become clear when we ate sets I(X) for X and I(Y) for Y. If Rank and X describe our schemes. We write EncT,P(M) for FTE Rank arethecorrespondingrelaxed-rankingfunctions, K Y encryption of message M, under key K, using associ- let Img(X) be the image of X under Rank , and like- X ated data T and parameters P. To ease the burden of wise Img(Y) be the image of Y under Rank . Define Y notation(slightly),wetypicallydonotexplicitlylistthe N = |I(X)| and N = |I(Y)|. (Recall that if we X Y parameters as inputs. The encryption algorithm may be are using NFA-based ranking over either X or Y, these randomized, meaning that fresh randomness is used for values can be significantly larger than |X| or |Y|.) We eachencryption. assumethatbothN ,N arefinite. X Y The FTE decryption algorithm Dec takes as input Say one has a tweakable cipher3 E that natively (F ,F ),K,aciphertextC,andtheassociateddataT supports strings over a variety of radices, e.g. FFX. P C (ifany),andreturnsaplaintextM or⊥. Thedecryption (At a minimum, there are many constructions of se- algorithmisalwaysdeterministic. cure tweakable ciphers that support radix 2, e.g. [9, Unlike conventional encryption schemes, we do not 14, 15].) Now, fix integers r ≥ 2 and t ≥ demandthatEncTK,P(M)alwaysyieldavalidciphertext, (cid:100)max{logr(NX),logr(NY)}(cid:101),sothatastringoftsym- oralwaysyield⊥,whenT,P andK arefixed. Instead, bols from {0,1,...,r − 1} suffices to represent the weallowencryptionto“fail”,withsomesmallprobabil- relaxed-rankings of X and Y. Then if E can encipher ity, to produce a ciphertext for a any given plaintext in the set of strings {0,1,...,r − 1}t, we can encrypt a its domain. Doing so will permit us to give simple and plaintextM ∈X asshownontheleftsideofFigure4. naturalFTEschemesthatwouldberuledoutotherwise. Cyclewalking. Awell-knownfactaboutpermutations Ingeneral,theformatscanchangeduringthelifetime is that they can be decomposed into a collection of dis- of the key, even on a per-plaintext basis. (Of course, jointcycles:startingfromanyelementainthedomainof changes must be synchronized between parties.) When thepermutationπ,repeatedapplicationofπwillresultin we talk about an FTE scheme being over some given asequenceofdistinctvaluesthateventuallyendswitha. formats, or their languages, we implicitly have in mind BlackandRogaway[8]werethefirsttoexploitthisfact some notion of a format-session, during which the for- tobuildcipherswithnon-standarddomains,andweuse matsdonotchange. it,too. ForanyfixedK andT,themappinginducedby ET isapermutation. Thus,insidetheDo-loop,thedis- K tinctstringsc ,c ← ET(c ),c ← ET(ET(c )),and 5.1 Cycle-walking(deterministic)FTE 0 1 K 0 2 K K 0 soonformasequencethateventuallymustreturntoc . 0 To build deterministic FTE schemes we take inspira- Intuitively,ifwewantaciphertextthatbelongstoapar- tionfromBRRSrank-encrypt-unrank. However,accom- ticular subset S ⊆ {0,1,...,r −1}t, we can walk the modating format transformations and, especially, NFA- cycleuntilwehitastringci ∈S. based language representations introduces new chal- Thereare,however,twoimportantdetailstoconsider. lenges. The first is that encryption is not guaranteed to hit any To begin, let X = L(F ) and Y = L(F ). As- P C 3IftheFTEschemedoesnotneedtosupportassociateddata,then sume that we perform relaxed ranking using the two- theunderlyingcipherneednotbetweakable,andreferencestoTinthe stage framework in Section4.1.1, with the intermedi- pseudocodecanbedropped. 8 stringc ∈ S. Forexample,ifthesubsetissmall,orthe thishappens,theresultingincycle-walkingschememay i cycleisveryshort. Soencryptionmustbeequippedwith beprohibitivelyinefficientinsomeapplications. test that tells it when this has happened, and ⊥ should Simplifications. We note that the cycle-walking tech- bereturned. Thesecondisthattheremustbeatestthat niqueisusedin[5],aswell,buttheyrestricttothemuch uniquelyidentifiesthestartingstringc . Thisisbecause 0 simpler case that X = Y. More generally when we decryption should work by waking the cycle in reverse. know that Img(X) ⊆ Img(Y), we can simplify our Absent a test that uniquely identifies c , it may not be 0 construction. One may still need to cycle-walk in this clearwhenthereversecycle-walkshouldstop. case if rt > |Y|. For example, say one desires to use Our implementation deals with both of these issues. r = 2 (binary strings) but the larger of |X|,|Y| is not In particular, c is the t-symbol string that results from 0 a power of two. But when Img(X) ⊆ Img(Y) we relaxed-ranking our FTE plaintext input M. By defini- know that, if the encryption cycle-walk terminates be- tion,c isastringthat,whenviewedasaradix-rinteger, 0 forei steps, thenitalwaysfindsapointinImg(Y), max isinImg(X). Wedesiretofindac that, whenviewed i i.e.p =1.Also,theexpectednumberofstepsisatmost s as an integer, is in Img(Y), since this is the set of in- rt/|Img(Y)|=rt/|Y|,againmodelingET asarandom tegers that yield ciphertexts in Y that will be properly K permutation. Finally, we note that the walk termination decrypted. Intuitively,thewalkshouldhaltonthefirsti testcanbesimplifiedtov ∈Img(Y),andencryptioncan for which this is true. But then, if any of c ,...,c 1 i−1 thereafterimmediatelyreturnUnrank (v). Y representintegersthatareinImg(X),properdecryption isnotpossible(becausewedonotknowhowmanysteps Security. Wementioned, above, thatthestandardsecu- to go from c back to c ). Thus our cycle-walking en- rityassumptionforatweakablecipheristhat, whenthe i 0 cryption checks, at each step, to see if the current walk keyK issecret,everyassociateddatastringT resultsin shouldbeterminatedbecausedecryptionwillnotbepos- EKT(·) being indistinguishable from a random permuta- sible, or because we have found a c that will yield a tion. Underthisassumption,itisnothardtoseethatthe i ciphertext Y that will decrypt properly. We also allow cycle-walkingconstructionoutputs(essentially)random cycle-walkingFTEtotakeamaximum-number-of-steps elementsofthesetY = L(Fc),whenitdoesnotoutput parameter imax, and encryption fails if that number of ⊥. Intuitively,eachEKT(ci−1)inthecycle-walkisaran- stepsisexceeded. domstring(subjecttopermutivity),sothecorresponding numbervrepresentedbythestringisrandom,too. Thus, Efficiency. The standard security assumption for a ifv ∈Img(Y),itisarandomelementofthisset,result- tweakable cipher is that, for any secret key K, and any ing in a random element of Y being chosen when v is associateddataT,themappinginducedbyET isindis- K unranked. tinguishable from that of a random permutation. Mod- In the full version we formally define a security no- eling ET as such, the expected number of steps be- K tionfordeterministicFTEschemes, andgiveatheorem forethecycle-walkterminatesisatmostrt/|Img(X)∪ statingthesecurityofourconstructionrelativetothisse- Img(Y)| (a conservative bound) and never more than curitynotion. i . Assuming the walk terminates before i steps, max max thentheprobabilitythattheencryptionsucceedsisp = s |Img(Y)|/|Img(X)∪Img(Y)|. Since relaxed ranking 5.2 Rejection-Sampling(randomized)FTE is injective, |Img(X)| = |X| and |Img(Y)| = |Y|, so p ≥ 1/(1+|X|/|Y|). Thusweexpectthatp isquite We now turn our attention to building randomized FTE s s closeto1if|Y|(cid:29)|X|. schemes. Let Π = (K,E,D) be a conventional, ran- Each step of the cycle-walk requires checking v ∈ domized,authenticated-encryptionschemewithsupport Img(X)∪Img(Y),whichcanbedonebycheckingv ∈ forassociateddata(AEAD).Weassumethatthisscheme Img(X) first (signaling termination of the walk), and has a fixed ciphertext stretch τ; this is typical of in-use thenv ∈ Img(Y)(signalingsuccessfultermination). A AEADschemes. TobuildarandomizedFTEschemeus- straightforwardwaytoimplementthelastistotestifv = ing a generalized ranking scheme, we use a rejection- Rank (Unrank (v)) or, using our two-stage viewpoint sampling approach. Let t be the least integer such that Y Y on relaxed ranking, map(Unrank(v)) = unrank (v), both of the following are true: (1) |I(X)| ≤ 2t−τ, and I whichmaybefaster.Checkingv ∈Img(X)canbedone (2) |I(Y)| ≤ 2t. Then to encrypt M ∈ X, or de- likewise. crypt C ∈ Y, under key K and associated data T, we Recall that the NFA representation of a regex, un- doasshownontherightsideofFigure4. like a DFA representation, may have many accepting AstandardsecurityassumptionforAEADschemesis paths for a given string in its language. This can lead thatitsciphertextsareindistinguishablefromstrings(of to N (cid:29) |X| = Img(X) or N (cid:29) |Y| = Img(Y), thesamelength)thatareuniformlyrandom. Underthis X Y hence, potentially, rt (cid:29) |Img(X) ∪ Img(Y)|. When assumption,treatingeachC(cid:48)asarandomt-bitstring,the 9 Sub-Component Writtenin... LinesofCode RegularExpressionParser C/C++/Flex/Bison 2,057 For deterministic schemes (those without the final $) DFAMinimizer C/C++ 1,166 we use the cycle-walking construction, with FFX[2] NFA/DFARanking C/C++ 2,752 as the underlying tweakable cipher. For randomized FFX C++ 842 schemes, we use the rejection-sampling construction. FPE/FTE C++ 870 ConfigurationAssistant C++/Python 731 As the underlying encryption scheme, we employ the Bellare-Rogaway“encode-then-encipher”paradigm[6], Table5: Thesub-componentsoflibfte. prepending the result of Rank (M) (interpreted as a X fixed-length bitstring) with the appropriate number of random padding bits, and applying FFX[2] to this. Be- causeourparticularapplicationofrandomizedFTEdoes expectednumberofinvocationsofET is2t/|Img(Y)|= K notneedsupportforassociateddata,theFFXtweakwas 2t/|Y|. (Andcertainlynomorethani .) max fixed to an all-zeros string, and we do not need redun- Underthisstandardsecurityassumption,itisintuitive dancypaddinginourencode-then-encipherscheme. that any element of Y = F returned by our rejection- c Wenotethat,althoughwefixedspecificinstantiations sampling FTE is a uniform one. If each C(cid:48) is indistin- of FPE/FTE schemes for the sake of a concrete evalua- guishablefromarandomstring, thenthecorresponding tion, there is no reason to restrict to these. In the ran- number b represented by C(cid:48) is random, too. Hence if domizedscheme,forexample,onecoulduseCTR-AES b∈Img(Y),thenitisarandomelementofthatset,and (witharandomIV)andHMAC-SHA256inan“encrypt- sotheelementofY thatresultsfromunrankingbwillbe then-mac”composition[4,18](includinganyassociated random. datainthemac-scope)fortheunderlyingprimitive.4 We give a formal security notion for randomized FTE, and a security theorem for rejection-sampling- basedFTE,inthefullversion. 6.2 TheLibFTE ConfigurationAssistant We now turn our attention towards the implementation 6 RealizingLibFTE details of the libfte configuration assistant. We di- videtheinternalworkflowoftheconfigurationassistant In Section5 we explored strategies for constructing into three steps. First, we gather requirements from the FPE/FTE schemes in theory. Now, let’s concretely de- user, this is done by the user passing parameters to a scribetheschemesimplementedinlibfte. command-line interface. Then, we start with an initial setofallpossibleFPE/FTEschemes(i.e.,P-xx,T-xx,T- Implementation. The libfte implementation is a hy- xx-$) that one could instantiate, and use a decision tree brid of C, C++, Python, Flex and Bison. We present a algorithm to eliminate schemes from the initial set that detailed breakdown of the sub-components in Table5. donotsatisfyuserrequirements. Finally, theconfigura- We engineered a custom regular expression parser be- tion assistant analyzes the set of all schemes that were cause off-the-shelf solutions did not expose the appro- not eliminated in stage two, performs a battery of tests priate data structures necessary to implement our NFA onthem, andreturnstheresultstotheuser. Weprovide relaxed-rankingalgorithm. asampleoutputofthistoolinFigure7. In addition to a native C++ interface, we also pro- vide interfaces in Python and JavaScript for libfte. The Collectingrequirementsfromtheuser.Thecommand- Python interface is exposed through a manually-written line configuration assistant (see Figure7) takes two re- wrapperaroundtheC++implementation.TheJavaScript quiredparameters,theinputandoutputformats. Inaddi- interface is provided through C++-to-JavaScript compi- tion to the required parameters, the configuration assis- lation. tant takes optional parameters, most notably: the mem- ory threshold for the configuration assistant to deter- minize regexs, and the memory threshold for FPE/FTE 6.1 SchemesImplementedinLibFTE schemeinstantiation. Identifying feasible schemes. Next, the configuration Weuseashorthandnotationtorefertotypesoflibfte in- assistant’s job is to eliminate schemes that fall outside stantiations.Asanexample,T-ND-$isaanFTEscheme theuser-specifiedrequirements. Itstartswithasetofall that uses NFA-based ranking (Section4.1) for the input possibleFPE/FTEschemesthatonecouldconstruct(i.e., format,andDFA-basedranking(Section4.2)fortheout- put format, and is randomized ($); T-ND denotes the 4One should keep in mind the interaction between the cipher- same,buttheschemeisdeterministic.FPEconstructions textlengthoverheadsofAEADandtheexpectednumberofstepsin aresimilarlynamed,butbeginwithP. rejection-sampling. 10

Description:
using the nondeterministic finite-state automata (NFA) representation of a matted or not) and returns ciphertexts formatted to be in- distinguishable
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.