Speeding up sum-of-squares for tensor decomposition and planted sparse vectors

Samuel B. Hopkins∗   Tselil Schramm†   Jonathan Shi‡   David Steurer§

January 17, 2016

Abstract

We consider two problems that arise in machine learning applications: the problem of recovering a planted sparse vector in a random linear subspace and the problem of decomposing a random low-rank overcomplete 3-tensor. For both problems, the best known guarantees are based on the sum-of-squares method. We develop new algorithms inspired by analyses of the sum-of-squares method. Our algorithms achieve the same or similar guarantees as sum-of-squares for these problems but the running time is significantly faster.

For the planted sparse vector problem, we give an algorithm with running time nearly linear in the input size that approximately recovers a planted sparse vector with up to constant relative sparsity in a random subspace of ℝ^n of dimension up to Ω̃(√n). These recovery guarantees match the best known ones of Barak, Kelner, and Steurer (STOC 2014) up to logarithmic factors.

For tensor decomposition, we give an algorithm with running time close to linear in the input size (with exponent ≈ 1.086) that approximately recovers a component of a random 3-tensor over ℝ^n of rank up to Ω̃(n^{4/3}). The best previous algorithm for this problem due to Ge and Ma (RANDOM 2015) works up to rank Ω̃(n^{3/2}) but requires quasipolynomial time.

∗ Cornell University, [email protected]. Supported by an NSF Graduate Research Fellowship (NSF award no. 1144153) and by David Steurer's NSF CAREER award.
† UC Berkeley, [email protected]. Supported by an NSF Graduate Research Fellowship (NSF award no. 1106400).
‡ Cornell University, [email protected]. Supported by David Steurer's NSF CAREER award.
§ Cornell University, [email protected]. Supported by a Microsoft Research Fellowship, an Alfred P. Sloan Fellowship, an NSF CAREER award, and the Simons Collaboration for Algorithms and Geometry.

Contents

1 Introduction
  1.1 Planted sparse vector in random linear subspace
  1.2 Overcomplete tensor decomposition
  1.3 Tensor principal component analysis
  1.4 Related work
2 Techniques
  2.1 Planted sparse vector in random linear subspace
  2.2 Overcomplete tensor decomposition
  2.3 Tensor principal component analysis
3 Preliminaries
4 Planted sparse vector in random linear subspace
  4.1 Algorithm succeeds on good basis
  4.2 Closeness of input basis and good basis
5 Overcomplete tensor decomposition
  5.1 Proof of Theorem 5.3
  5.2 Spectral gap for diagonal terms: proof of Proposition 5.4
  5.3 Bound for cross terms: proof of Proposition 5.5
  5.4 Full algorithm and proof of Theorem 5.2
6 Tensor principal component analysis
  6.1 Spiked tensor model
  6.2 Linear-time algorithm
References
A Additional preliminaries
  A.1 Linear algebra
  A.2 Concentration tools
B Concentration bounds for planted sparse vector in random linear subspace
C Concentration bounds for overcomplete tensor decomposition
D Concentration bounds for tensor principal component analysis

1 Introduction

The sum-of-squares (SoS) method (also known as the Lasserre hierarchy) [Sho87, Par00, Nes00, Las01] is a powerful, semidefinite-programming-based meta-algorithm that applies to a wide range of optimization problems. The method has been studied extensively for moderate-size polynomial optimization problems that arise for example in control theory and in the context of approximation algorithms for combinatorial optimization problems, especially constraint satisfaction and graph partitioning (see e.g. the survey [BS14]). For the latter, the SoS method captures and generalizes the best known approximation algorithms based on linear programming (LP), semidefinite programming (SDP), or spectral methods, and it is in many cases the most promising approach to obtain algorithms with better guarantees, especially in the context of Khot's Unique Games Conjecture [BBH+12].

A sequence of recent works applies the sum-of-squares method to basic problems that arise in unsupervised machine learning: in particular, recovering sparse vectors in linear subspaces and decomposing tensors in a robust way [BKS14, BKS15, HSS15, BM15, GM15]. For a wide range of parameters of these problems, SoS achieves significantly stronger guarantees than other methods, in polynomial or quasi-polynomial time.

Like other LP and SDP hierarchies, the sum-of-squares method comes with a degree parameter d ∈ ℕ that allows for trading off running time and solution quality. This trade-off is appealing because for applications the additional utility of better solutions could vastly outweigh additional computational costs. Unfortunately, the computational cost grows rather steeply in terms of the parameter d: the running time is n^{O(d)} where n is the number of variables (usually comparable to the instance size). Further, even when the SDP has size polynomial in the input (when d = O(1)), solving the underlying semidefinite programs is prohibitively slow for large instances.

In this work, we introduce spectral algorithms for planted sparse vector, tensor decomposition, and tensor principal component analysis (PCA) that exploit the same high-degree information as the corresponding sum-of-squares algorithms without relying on semidefinite programming, and achieve the same (or close to the same) guarantees. The resulting algorithms are quite simple (a couple of lines of matlab code) and have considerably faster running times: quasi-linear or close to linear in the input size.

A surprising implication of our work is that for some problems, spectral algorithms can exploit information from larger values of the parameter d without spending time n^{O(d)}. For example, one of our algorithms runs in nearly-linear time in the input size, even though it uses properties that the sum-of-squares method can only use for degree parameter d ≥ 4. (In particular, the guarantees that the algorithm achieves are strictly stronger than the guarantees that SoS achieves for values of d < 4.)

The initial successes of SoS in the machine learning setting gave hope that techniques developed in the theory of approximation algorithms, specifically the techniques of hierarchies of convex relaxations and rounding convex relaxations, could broadly impact the practice of machine learning. This hope was dampened by the fact that in general, algorithms that rely on solving large semidefinite programs are too slow to be practical for the large-scale problems that arise in machine learning. Our work brings this hope back into focus by demonstrating for the first time that with some care SoS algorithms can be made practical for large-scale problems.
In the following subsections we describe each of the problems that we consider, the prior best-known guarantee via the SoS hierarchy, and our results.

1.1 Planted sparse vector in random linear subspace

The problem of finding a sparse vector planted in a random linear subspace was introduced by Spielman, Wang, and Wright as a way of learning sparse dictionaries [SWW12]. Subsequent works have found further applications and begun studying the problem in its own right [DH14, BKS14, QSW14]. In this problem, we are given a basis for a d-dimensional linear subspace of ℝ^n that is random except for one planted sparse direction, and the goal is to recover this sparse direction. The computational challenge is to solve this problem even when the planted vector is only mildly sparse (a constant fraction of non-zero coordinates) and the subspace dimension is large compared to the ambient dimension (d ≥ n^{Ω(1)}).

Several kinds of algorithms have been proposed for this problem based on linear programming (LP), basic semidefinite programming (SDP), sum-of-squares, and non-convex gradient descent (alternating directions method).

An inherent limitation of simpler convex methods (LP and basic SDP) [SWW12, dGJL04] is that they require the relative sparsity of the planted vector to be polynomial in the subspace dimension (less than n/√d non-zero coordinates).

Sum-of-squares and non-convex methods do not share this limitation. They can recover planted vectors with constant relative sparsity even if the subspace has polynomial dimension (up to dimension O(n^{1/2}) for sum-of-squares [BKS14] and up to O(n^{1/4}) for non-convex methods [QSW14]).

We state the problem formally:

Problem 1.1 (Planted sparse vector problem with ambient dimension n ∈ ℕ, subspace dimension d ≤ n, sparsity ε > 0, and accuracy η > 0). Given an arbitrary orthogonal basis of a subspace spanned by vectors v_0, v_1, ..., v_{d−1} ∈ ℝ^n, where v_0 is a vector with at most εn non-zero entries and v_1, ..., v_{d−1} are vectors sampled independently at random from the standard Gaussian distribution on ℝ^n, output a unit vector v ∈ ℝ^n that has correlation ⟨v, v_0⟩² ≥ 1 − η with the sparse vector v_0.

Our Results. Our algorithm runs in nearly linear time in the input size, and matches the best-known guarantees up to a polylogarithmic factor in the subspace dimension [BKS14].

Theorem 1.2 (Planted sparse vector in nearly-linear time). There exists an algorithm that, for every sparsity ε > 0, ambient dimension n, and subspace dimension d with d ≤ √n/(log n)^{O(1)}, solves the planted sparse vector problem with high probability for some accuracy η ≤ O(ε^{1/4}) + o_{n→∞}(1). The running time of the algorithm is Õ(nd).

We give a technical overview of the proof in Section 2, and a full proof in Section 4.

Previous work also showed how to recover the planted sparse vector exactly. The task of going from an approximate solution to an exact one is a special case of standard compressed sensing (see e.g. [BKS14]).

Table 1: Comparison of algorithms for the planted sparse vector problem with ambient dimension n, subspace dimension d, and relative sparsity ε.

  Reference                      | Technique                | Runtime  | Largest d  | Largest ε
  Demanet, Hand [DH14]           | linear programming       | poly     | any        | Ω(1/√d)
  Barak, Kelner, Steurer [BKS14] | SoS, general SDP         | poly     | Ω(√n)      | Ω(1)
  Qu, Sun, Wright [QSW14]        | alternating minimization | Õ(n²d⁵)  | Ω(n^{1/4}) | Ω(1)
  this work                      | SoS, partial traces      | Õ(nd)    | Ω̃(√n)      | Ω(1)
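To make the input model of Problem 1.1 concrete, the following minimal Python sketch generates an instance (the function name, parameter values, and the final random rotation of the basis are our own illustrative choices, not from the paper; the recovery algorithm itself is described in Sections 2.1 and 4).

```python
import numpy as np

def planted_sparse_vector_instance(n, d, eps, seed=None):
    """Sample an instance of Problem 1.1: an orthogonal basis of a subspace
    spanned by one (eps*n)-sparse unit vector v0 and d-1 Gaussian vectors."""
    rng = np.random.default_rng(seed)
    v0 = np.zeros(n)
    support = rng.choice(n, size=max(1, int(eps * n)), replace=False)
    v0[support] = rng.standard_normal(support.size)
    v0 /= np.linalg.norm(v0)
    others = [rng.standard_normal(n) for _ in range(d - 1)]
    Q, _ = np.linalg.qr(np.column_stack([v0] + others))   # orthonormal basis of the span
    R, _ = np.linalg.qr(rng.standard_normal((d, d)))      # random rotation of that basis,
    return Q @ R, v0                                      # so the basis does not expose v0

# Example: ambient dimension n = 4096, subspace dimension d = 32, 5% relative sparsity.
A, v0 = planted_sparse_vector_instance(4096, 32, 0.05, seed=0)
```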
1.2 Overcomplete tensor decomposition

Tensors naturally represent multilinear relationships in data. Algorithms for tensor decompositions have long been studied as a tool for data analysis across a wide range of disciplines (see the early work of Harshman [Har70] and the survey [KB09]). While the problem is NP-hard in the worst case [Hås90, HL13], algorithms for special cases of tensor decomposition have recently led to new provable algorithmic results for several unsupervised learning problems [AGH+14, BCMV14, GVX14, AGHK14], including independent component analysis, learning mixtures of Gaussians [GHK15], Latent Dirichlet topic modeling [AFH+15], and dictionary learning [BKS15]. Some previous learning algorithms can also be reinterpreted in terms of tensor decomposition [Cha96, MR06, NR09].

A key algorithmic challenge for tensor decompositions is overcompleteness, when the number of components is larger than their dimension (i.e., the components are linearly dependent). Most algorithms that work in this regime require tensors of order 4 or higher [LCC07, BCMV14]. For example, the FOOBI algorithm of [LCC07] can recover up to Ω(d²) components given an order-4 tensor in dimension d under mild algebraic independence assumptions for the components, satisfied with high probability by random components. For overcomplete 3-tensors, which arise in many applications of tensor decompositions, such a result remains elusive.

Researchers have therefore turned to investigate average-case versions of the problem, when the components of the overcomplete 3-tensor are random: Given a 3-tensor T ∈ ℝ^{d³} of the form

    T = Σ_{i=1}^n a_i ⊗ a_i ⊗ a_i,

where a_1, ..., a_n are random unit or Gaussian vectors, the goal is to approximately recover the components a_1, ..., a_n.

Algorithms based on tensor power iteration (a gradient-descent approach for tensor decomposition) solve this problem in polynomial time when n ≤ C·d for any constant C ≥ 1; the running time is exponential in C [AGJ15]. Tensor power iteration also admits local convergence analyses for up to n ≤ Ω̃(d^{1.5}) components [AGJ15, AGJ14]. Unfortunately, these analyses do not give polynomial-time algorithms because it is not known how to efficiently obtain the kind of initializations assumed by the analyses.

Recently, Ge and Ma [GM15] were able to show that a tensor-decomposition algorithm [BKS15] based on sum-of-squares solves the above problem for n ≤ Ω̃(d^{1.5}) in quasi-polynomial time n^{O(log n)}. The key ingredient of their elegant analysis is a subtle spectral concentration bound for a particular degree-4 matrix-valued polynomial associated with the decomposition problem of random overcomplete 3-tensors.

Table 2: Comparison of decomposition algorithms for overcomplete 3-tensors with n components in dimension d.

  Reference                     | Technique              | Runtime      | Largest n   | Components
  Anandkumar et al. [AGJ15] (a) | tensor power iteration | poly         | C·d         | incoherent
  Ge, Ma [GM15]                 | SoS, general SDP       | n^{O(log n)} | Ω̃(d^{3/2}) | N(0, (1/d)·Id_d)
  this work (b)                 | SoS, partial traces    | Õ(nd^{1+ω})  | Ω̃(d^{4/3}) | N(0, (1/d)·Id_d)

  (a) The analysis shows that for every constant C ≥ 1, the running time is polynomial for n ≤ C·d components. The analysis assumes that the components also satisfy other random-like properties besides incoherence.
  (b) Here, ω ≤ 2.373 is the constant so that d×d matrices can be multiplied in O(d^ω) arithmetic operations.

We state the problem formally:

Problem 1.3 (Random tensor decomposition with dimension d, rank n, and accuracy η). Let a_1, ..., a_n ∈ ℝ^d be independently sampled vectors from the Gaussian distribution N(0, (1/d)·Id_d), and let T ∈ (ℝ^d)^{⊗3} be the 3-tensor T = Σ_{i=1}^n a_i^{⊗3}.

  Single component: Given T sampled as above, find a unit vector b that has correlation max_i ⟨a_i, b⟩ ≥ 1 − η with one of the vectors a_i.

  All components: Given T sampled as above, find a set of unit vectors {b_1, ..., b_n} such that ⟨a_i, b_i⟩ ≥ 1 − η for every i ∈ [n].
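The input model of Problem 1.3 can be simulated directly; the short Python sketch below is only an illustration of the model (dimension and rank are arbitrary choices), and the decomposition algorithm itself is developed in Sections 2.2 and 5.

```python
import numpy as np

def random_overcomplete_3tensor(d, n, seed=None):
    """Sample a_1,...,a_n ~ N(0, Id_d / d) and return T = sum_i a_i^{⊗3} (Problem 1.3)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, d)) / np.sqrt(d)    # rows are the components a_i
    T = np.einsum('ij,ik,il->jkl', a, a, a)         # T[j,k,l] = sum_i a_i[j] a_i[k] a_i[l]
    return T, a

# An overcomplete example: n > d components, still below the ~d^{4/3} regime of Theorem 1.4.
T, components = random_overcomplete_3tensor(d=50, n=120, seed=1)
print(T.shape)   # (50, 50, 50)
```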
Our Results. We give the first polynomial-time algorithm for decomposing random overcomplete 3-tensors with up to ω(d) components. Our algorithm works as long as the number of components satisfies n ≤ Ω̃(d^{4/3}), which comes close to the bound Ω̃(d^{1.5}) achieved by the aforementioned quasi-polynomial algorithm of Ge and Ma. For the single-component version of the problem, our algorithm runs in time close to linear in the input size.

Theorem 1.4 (Fast random tensor decomposition). There exist randomized algorithms that, for every dimension d and rank n with d ≤ n ≤ d^{4/3}/(log n)^{O(1)}, solve the random tensor decomposition problem with probability 1 − o(1) for some accuracy η ≤ Õ(n³/d⁴)^{1/2}. The running time for the single-component version of the problem is Õ(min{d^{1+ω}, d^{3.257}}), where d^ω is the time to multiply two d-by-d matrices. The running time for the all-components version of the problem is Õ(n · min{d^{1+ω}, d^{3.257}}).

We give a technical overview of the proof in Section 2, and a full proof in Section 5.

We remark that the above algorithm only requires access to the input tensor with some fixed inverse polynomial accuracy because each of its four steps amplifies errors by at most a polynomial factor (see Algorithm 5.17). In this sense, the algorithm is robust.

1.3 Tensor principal component analysis

The problem of tensor principal component analysis is similar to the tensor decomposition problem. However, here the focus is not on the number of components in the tensor, but on recovery in the presence of a large amount of random noise. We are given as input a tensor τ·v^{⊗3} + A, where v ∈ ℝ^n is a unit vector and the entries of A are chosen iid from N(0, 1). This spiked tensor model was introduced by Montanari and Richard [RM14], who also obtained the first algorithms to solve the model with provable statistical guarantees. The spiked tensor model was subsequently addressed by a subset of the present authors [HSS15], who applied the SoS approach to improve the signal-to-noise ratio required for recovery from odd-order tensors.

Table 3: Comparison of algorithms for principal component analysis of 3-tensors in dimension d and with signal-to-noise ratio τ.

  Reference                     | Technique           | Runtime | Smallest τ
  Montanari, Richard [RM14]     | spectral            | Õ(d³)   | n
  Hopkins, Shi, Steurer [HSS15] | SoS, spectral       | Õ(d³)   | Ω(n^{3/4})
  this work                     | SoS, partial traces | O(d³)   | Ω̃(n^{3/4})

We state the problem formally:

Problem 1.5 (Tensor principal components analysis with signal-to-noise ratio τ). Let T ∈ (ℝ^d)^{⊗3} be a tensor so that T = τ·v^{⊗3} + A, where A is a tensor with independent standard Gaussian entries and v ∈ ℝ^d is a unit vector. Given T, recover a unit vector v′ ∈ ℝ^d such that ⟨v′, v⟩ ≥ 1 − η.

Our results. For this problem, our improvements over the previous results are more modest: we achieve signal-to-noise guarantees matching [HSS15], but with an algorithm that runs in linear time rather than near-linear time (time O(d³) rather than O(d³ polylog d), for an input of size d³).

Theorem 1.6 (Tensor principal component analysis in linear time). There is an algorithm which solves the tensor principal component analysis problem when τ ≥ n^{3/4} log^{1/2}(n)/ε, with η ≤ O(ε). Furthermore, the algorithm runs in time O(d³).

Though for tensor PCA our improvement over previous work is modest, we include the results here as this problem is a pedagogically poignant illustration of our techniques. We give a technical overview of the proof in Section 2, and a full proof in Section 6.
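The spiked tensor model of Problem 1.5 is easy to simulate. The sketch below generates an instance and applies a simple unfolding-based spectral baseline (flatten the tensor and take a top singular vector); this is only our illustration of the model and of a basic spectral approach, not the linear-time partial-trace algorithm analyzed in Section 6, and the dimension and signal strength are arbitrary choices.

```python
import numpy as np

def spiked_tensor(d, tau, seed=None):
    """Sample T = tau * v^{⊗3} + A with A i.i.d. standard Gaussian entries (Problem 1.5)."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    A = rng.standard_normal((d, d, d))
    return tau * np.einsum('i,j,k->ijk', v, v, v) + A, v

def unfolding_baseline(T):
    """Flatten T to a d x d^2 matrix and return its top left singular vector.
    A simple spectral baseline, not the partial-trace algorithm of Section 6."""
    d = T.shape[0]
    U, _, _ = np.linalg.svd(T.reshape(d, d * d), full_matrices=False)
    return U[:, 0]

d = 60
T, v = spiked_tensor(d, tau=5 * d ** 0.75, seed=2)
v_hat = unfolding_baseline(T)
print("recovered correlation:", abs(v_hat @ v))   # should be close to 1 at this signal level
```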
1.4 Related work

Foremost, this work builds upon the SoS algorithms of [BKS14, BKS15, GM15, HSS15]. In each of these previous works, a machine learning decision problem is solved using an SDP relaxation for SoS. In these works, the SDP value is large in the yes case and small in the no case, and the SDP value can be bounded using the spectrum of a specific matrix. This was implicit in [BKS14, BKS15], and in [HSS15] it was used to obtain a fast algorithm as well. In our work, we design spectral algorithms which use smaller matrices, inspired by the SoS certificates in previous works, to solve these machine-learning problems much faster, with almost matching guarantees.

A key idea in our work is that given a large matrix with information encoded in the matrix's spectral gap, one can often efficiently "compress" the matrix to a much smaller one without losing that information. This is particularly true for problems with planted solutions. In this way, we are able to improve running time by replacing an n^{O(d)}-sized SDP with an eigenvector computation for an n^k × n^k matrix, for some k < d.

The idea of speeding up LP and SDP hierarchies for specific problems has been investigated in a series of previous works [dlVK07, BRS11, GS12], which shows that with respect to local analyses of the sum-of-squares algorithm it is sometimes possible to improve the running time from n^{O(d)} to 2^{O(d)}·n^{O(1)}. However, the scopes and strategies of these works are completely different from ours. First, the notion of local analysis from these works does not apply to the problems considered here. Second, these works employ the ellipsoid method with a separation oracle inspired by rounding algorithms, whereas we reduce the problem to an ordinary eigenvector computation.

It would also be interesting to see if our methods can be used to speed up some of the other recent successful applications of SoS to machine-learning type problems, such as [BM15], or the application of [BKS14] to tensor decomposition with components that are well-separated (rather than random). Finally, we would be remiss not to mention that SoS lower bounds exist for several of these problems, specifically for tensor principal components analysis, tensor prediction, and sparse PCA [HSS15, BM15, MW15]. The lower bounds in the SoS framework are a good indication that we cannot expect spectral algorithms achieving better guarantees.

2 Techniques

Sum-of-squares method (for polynomial optimization over the sphere). The problems we consider are connected to optimization problems of the following form: Given a homogeneous n-variate real polynomial f of constant degree, find a unit vector x ∈ ℝ^n so as to maximize f(x). The sum-of-squares method allows us to efficiently compute upper bounds on the maximum value of such a polynomial f over the unit sphere.

For the case that k = deg(f) is even, the most basic upper bound of this kind is the largest eigenvalue of a matrix representation of f. A matrix representation of a polynomial f is a symmetric matrix M with rows and columns indexed by monomials of degree k/2 so that f(x) can be written as the quadratic form f(x) = ⟨x^{⊗k/2}, M x^{⊗k/2}⟩, where x^{⊗k/2} is the k/2-fold tensor power of x. The largest eigenvalue of a matrix representation M is an upper bound on the value of f(x) over unit x ∈ ℝ^n because

    f(x) = ⟨x^{⊗k/2}, M x^{⊗k/2}⟩ ≤ λ_max(M)·‖x^{⊗k/2}‖_2^2 = λ_max(M).

The sum-of-squares method improves on this basic spectral bound systematically by associating a large family of polynomials (potentially of degree higher than deg(f)) with the input polynomial f and computing the best possible spectral bound within this family of polynomials. Concretely, the sum-of-squares method with degree parameter d applied to a polynomial f with deg(f) ≤ d considers the affine subspace of polynomials {f + (1 − ‖x‖_2^2)·g | deg(g) ≤ d − 2} ⊆ ℝ[x] and minimizes λ_max(M) among all matrix representations¹ M of polynomials in this space.² The problem of searching through this affine linear space of polynomials and their matrix representations and finding the one of smallest maximum eigenvalue can be solved using semidefinite programming.

¹ Earlier we defined matrix representations only for homogeneous polynomials of even degree. In general, a matrix representation of a polynomial g is a symmetric matrix M with rows and columns indexed by monomials of degree at most ℓ = deg(g)/2 such that g(x) = ⟨x^{⊗≤ℓ}, M x^{⊗≤ℓ}⟩ (as a polynomial identity), where x^{⊗≤ℓ} = (1/√(ℓ+1))·(x^{⊗0}, x^{⊗1}, ..., x^{⊗ℓ}) is the vector of all monomials of degree at most ℓ. Note that ‖x^{⊗≤ℓ}‖ = 1 for all x with ‖x‖ = 1.

² The name of the method stems from the fact that this last step is equivalent to finding the minimum number λ such that the space contains a polynomial of the form λ − (g_1² + ··· + g_t²), where g_1, ..., g_t are polynomials of degree at most d/2.
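To make the basic spectral bound above concrete, here is a small numerical sketch (an illustrative example of ours, not taken from the paper): it builds a symmetric matrix representation of a random homogeneous degree-4 polynomial, checks the quadratic-form identity, and verifies that values on the unit sphere never exceed λ_max(M).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# A random homogeneous degree-4 polynomial f(x) = sum_{ijkl} T[i,j,k,l] x_i x_j x_k x_l.
T = rng.standard_normal((d, d, d, d))

def f(x):
    return np.einsum('ijkl,i,j,k,l->', T, x, x, x, x)

# Matrix representation with rows/columns indexed by pairs (i,j):
# f(x) = <x⊗x, M (x⊗x)>, hence max_{|x|=1} f(x) <= lambda_max(M).
M = T.reshape(d * d, d * d)
M = (M + M.T) / 2                       # symmetrizing keeps the quadratic form unchanged
lam_max = np.linalg.eigvalsh(M)[-1]

for _ in range(5):
    x = rng.standard_normal(d)
    x /= np.linalg.norm(x)
    xx = np.kron(x, x)
    assert np.isclose(f(x), xx @ M @ xx)   # quadratic-form identity
    assert f(x) <= lam_max + 1e-9          # basic spectral upper bound on the sphere
print("lambda_max(M) =", lam_max)
```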
Our approach for faster algorithms based on SoS algorithms is to construct specific matrices (polynomials) in this affine linear space, then compute their top eigenvectors. By designing our matrices carefully, we ensure that our algorithms have access to the same higher degree information that the sum-of-squares algorithm can access, and this information affords an advantage over the basic spectral methods for these problems. At the same time, our algorithms avoid searching for the best polynomial and matrix representation, which gives us faster running times since we avoid semidefinite programming. This approach is well suited to average-case problems where we avoid the problem of adversarial choice of input; in particular it is applicable to machine learning problems where noise and inputs are assumed to be random.

Compressing matrices with partial traces. A serious limitation of the above approach is that the representation of a degree-d polynomial requires size roughly n^d. Hence, even avoiding the use of semidefinite programming, improving upon running time O(n^d) requires additional ideas.

In each of the problems that we consider, we have a large matrix (suggested by a SoS algorithm) with a "signal" planted in some amount of "noise". We show that in some situations, this large matrix can be compressed significantly without loss in the signal by applying partial trace operations. In these situations, the partial trace yields a smaller matrix with the same signal-to-noise ratio as the large matrix suggested by the SoS algorithm, even in situations when lower degree sum-of-squares approaches are known to fail.

The partial trace Tr_{ℝ^d} : ℝ^{d²×d²} → ℝ^{d×d} is the linear operator that satisfies Tr_{ℝ^d}(A ⊗ B) = (Tr A)·B for all A, B ∈ ℝ^{d×d}. To see how the partial trace can be used to compress large matrices to smaller ones with little loss, consider the following problem: Given a matrix M ∈ ℝ^{d²×d²} of the form M = τ·(v ⊗ v)(v ⊗ v)^⊤ + A ⊗ B for some unit vector v ∈ ℝ^d and matrices A, B ∈ ℝ^{d×d}, we wish to recover the vector v. (This is a simplified version of the planted problems that we consider in this paper, where τ·(v ⊗ v)(v ⊗ v)^⊤ is the signal and A ⊗ B plays the role of noise.)

It is straightforward to see that the matrix A ⊗ B has spectral norm ‖A ⊗ B‖ = ‖A‖·‖B‖, and so when τ ≫ ‖A‖‖B‖, the matrix M has a noticeable spectral gap, and the top eigenvector of M will be close to v ⊗ v. If |Tr A| ≈ ‖A‖, the matrix Tr_{ℝ^d} M = τ·vv^⊤ + Tr(A)·B has a matching spectral gap, and we can still recover v, but now we only need to compute the top eigenvector of a d×d (as opposed to d²×d²) matrix.³

³ In some of our applications, the matrix M is only represented implicitly and has size superlinear in the size of the input, but nevertheless we can compute the top eigenvector of the partial trace Tr_{ℝ^d} M in nearly-linear time.

If A is a Wigner matrix (e.g. a symmetric matrix with iid ±1 entries), then both |Tr A| and ‖A‖ are on the order of √d, and the above condition is indeed met. In our average-case/machine-learning settings the "noise" component is not as simple as A ⊗ B with A a Wigner matrix. Nonetheless, we are able to ensure that the noise displays a similar behavior under partial trace operations. In some cases, this requires additional algorithmic steps, such as random projection in the case of tensor decomposition, or centering the matrix eigenvalue distribution in the case of the planted sparse vector.
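The compression step just described can be checked numerically. The sketch below (our own illustration; the noise factors A and B are chosen so that |Tr A| ≈ ‖A‖ as assumed above) collapses the d²×d² matrix τ·(v⊗v)(v⊗v)^⊤ + A⊗B by a partial trace to the d×d matrix τ·vv^⊤ + Tr(A)·B, whose top eigenvector still recovers v.

```python
import numpy as np

def partial_trace_first(M, d):
    """Partial trace over the first tensor factor: a d^2 x d^2 matrix goes to d x d,
    and Tr_{R^d}(A ⊗ B) = Tr(A) * B."""
    return np.trace(M.reshape(d, d, d, d), axis1=0, axis2=2)

rng = np.random.default_rng(3)
d, tau = 40, 60.0

v = rng.standard_normal(d); v /= np.linalg.norm(v)
A = np.triu(rng.choice([-1.0, 1.0], size=(d, d)))      # Wigner-like: symmetric +-1 entries,
A = A + np.triu(A, 1).T                                 # so |Tr A| and ||A|| are both ~ sqrt(d)
B = rng.standard_normal((d, d)); B = (B + B.T) / (2 * np.sqrt(d))

# Defining property of the partial trace on Kronecker products.
assert np.allclose(partial_trace_first(np.kron(A, B), d), np.trace(A) * B)

M = tau * np.outer(np.kron(v, v), np.kron(v, v)) + np.kron(A, B)   # signal + noise, d^2 x d^2
N = partial_trace_first(M, d)                                      # = tau * v v^T + Tr(A) * B
eigvals, eigvecs = np.linalg.eigh((N + N.T) / 2)
v_hat = eigvecs[:, -1]                                             # top eigenvector of the d x d matrix
print("correlation with planted v:", abs(v_hat @ v))
```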
It is an interesting question if there are general theorems describing the behavior of spectral norms under partial trace operations. In the current work, we compute the partial traces explicitly and estimate their norms directly. Indeed, our analyses boil down to concentration bounds for special matrix polynomials. A general theory for the concentration of matrix polynomials is a notorious open problem (see [MW13]).

Partial trace operations have previously been applied for rounding SoS relaxations. Specifically, the operation of reweighing and conditioning, used in rounding algorithms for sum-of-squares such as [BRS11, RT12, BKS14, BKS15, LR15], corresponds to applying a partial trace operation to the moments matrix returned by the sum-of-squares relaxation.

We now give a technical overview of our algorithmic approach for each problem, and some broad strokes of the analysis for each case. Our most substantial improvements in runtime are for the planted sparse vector and overcomplete tensor decomposition problems (Section 2.1 and Section 2.2 respectively). Our algorithm for tensor PCA is the simplest application of our techniques, and it may be instructive to skip ahead and read about tensor PCA first (Section 2.3).

2.1 Planted sparse vector in random linear subspace

Recall that in this problem we are given a linear subspace U (represented by some basis) that is spanned by a k-sparse unit vector v_0 ∈ ℝ^n and random unit vectors v_1, ..., v_{d−1} ∈ ℝ^n. The goal is to recover the vector v_0 approximately.

Background and SoS analysis. Let A ∈ ℝ^{n×d} be a matrix whose columns form an orthonormal basis for U. Our starting point is the polynomial f(x) = ‖Ax‖_4^4 = Σ_{i=1}^n (Ax)_i^4. Previous work showed that for d ≪ √n the maximizer of this polynomial over the sphere corresponds to a vector close to v_0 and that degree-4 sum-of-squares is able to capture this fact [BBH+12, BKS14]. Indeed, typical random vectors v in ℝ^n satisfy ‖v‖_4^4 ≈ 1/n whereas our planted vector satisfies ‖v_0‖_4^4 ≥ 1/k ≫ 1/n, and this degree-4 information is leveraged by the SoS algorithms.

The polynomial f has a convenient matrix representation M = Σ_{i=1}^n (a_i a_i^⊤)^{⊗2}, where a_1, ..., a_n are the rows of the generator matrix A. It turns out that the eigenvalues of this matrix indeed give information about the planted sparse vector v_0. In particular, the vector x_0 ∈ ℝ^d with Ax_0 = v_0 witnesses that M has an eigenvalue of at least 1/k because M's quadratic form with the vector x_0^{⊗2} satisfies ⟨x_0^{⊗2}, M x_0^{⊗2}⟩ = ‖v_0‖_4^4 ≥ 1/k. If we let M′ be the corresponding matrix for the subspace U without the planted sparse vector, M′ turns out to have only eigenvalues of at most O(1/n), up to a single spurious eigenvalue with eigenvector far from any vector of the form x ⊗ x [BBH+12].

It follows that in order to distinguish between a random subspace with a planted sparse vector (yes case) and a completely random subspace (no case), it is enough to compute the second-largest eigenvalue of a d²-by-d² matrix (representing the 4-norm polynomial over the subspace as above).
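The following sketch (our own illustration, with arbitrary parameter choices) makes this distinguishing step numerical: it builds the d²×d² matrix M = Σ_i (a_i a_i^⊤)^{⊗2} for a subspace with and without a planted sparse vector and compares the top two eigenvalues. This is the basic O(d⁴)-size approach, before the partial trace speedup discussed next.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, k = 4000, 20, 50        # ambient dimension, subspace dimension, sparsity of v0

def basis_matrix(vectors):
    """n x d matrix A whose columns are an orthonormal basis of span(vectors)."""
    Q, _ = np.linalg.qr(np.column_stack(vectors))
    return Q

def top_two_eigenvalues(A):
    """Two largest eigenvalues of M = sum_i (a_i a_i^T)^{⊗2}, with a_i the rows of A."""
    W = np.einsum('ti,tj->tij', A, A).reshape(A.shape[0], -1)   # row i -> vec(a_i a_i^T)
    M = W.T @ W                                                 # d^2 x d^2 matrix
    return np.linalg.eigvalsh(M)[-2:]

# "yes" case: one k-sparse unit vector plus d-1 random Gaussian directions.
v0 = np.zeros(n)
v0[rng.choice(n, k, replace=False)] = 1.0 / np.sqrt(k)
A_yes = basis_matrix([v0] + [rng.standard_normal(n) for _ in range(d - 1)])
# "no" case: a completely random d-dimensional subspace.
A_no = basis_matrix([rng.standard_normal(n) for _ in range(d)])

# The planted case should exhibit an eigenvalue of at least 1/k (witnessed by x0^{⊗2});
# the random case should have all eigenvalues O(1/n) apart from one spurious eigenvalue.
print("planted:", top_two_eigenvalues(A_yes), "  1/k =", 1 / k)
print("random: ", top_two_eigenvalues(A_no), "  1/n =", 1 / n)
```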
This decision version of the problem, while strictly speaking easier than the search version above, is at the heart of the matter: one can show that the large eigenvalue for the yes case corresponds to an eigenvector which encodes the coefficients of the sparse planted vector in the basis.

Improvements. The best running time we can hope for with this basic approach is O(d⁴) (the size of the matrix). Since we are interested in d ≤ O(√n), the resulting running time O(nd²) would be subquadratic but still super-linear in the input size n·d (for representing a d-dimensional subspace of ℝ^n). To speed things up, we use the partial trace approach outlined above. We will begin by applying the partial trace approach naively, obtaining reasonable bounds, and then show that a small modification to the matrix before the partial trace operation allows us to achieve even smaller signal-to-noise ratios.

In the planted case, we may approximate M ≈ (1/k)·(x_0 x_0^⊤)^{⊗2} + Z, where x_0 is the vector of coefficients of v_0 in the basis representation given by A (so that Ax_0 = v_0), and Z is the noise
