arXiv:1501.02579v1 [stat.ML] 12 Jan 2015

COMBINED MODELING OF SPARSE AND DENSE NOISE FOR IMPROVEMENT OF RELEVANCE VECTOR MACHINE

Martin Sundin, Saikat Chatterjee and Magnus Jansson

ACCESS Linnaeus Center, School of Electrical Engineering
KTH Royal Institute of Technology, Stockholm, Sweden
[email protected], [email protected], [email protected]

This work was partially supported by the Swedish Research Council under contract 621-2011-5847.

ABSTRACT

Using a Bayesian approach, we consider the problem of recovering sparse signals under additive sparse and dense noise. Typically, sparse noise models outliers, impulse bursts or data loss. To handle sparse noise, existing methods simultaneously estimate the sparse signal of interest and the sparse noise of no interest. For estimating the sparse signal, without the need of estimating the sparse noise, we construct a robust Relevance Vector Machine (RVM). In the RVM, sparse noise and ever-present dense noise are treated through a combined noise model. The precision of the combined noise is modeled by a diagonal matrix. We show that the new RVM update equations correspond to a non-symmetric sparsity inducing cost function. Further, the combined modeling is found to be computationally more efficient. We also extend the method to block-sparse signals and noise with known and unknown block structures. Through simulations, we show the performance and computation efficiency of the new RVM in several applications: recovery of sparse and block sparse signals, housing price prediction and image denoising.

1. INTRODUCTION

Noise modeling has an important role in the Bayesian inference setup to achieve better robustness and accuracy. Typically noise is considered to be additive and dense (possibly even white) in nature. In this paper we investigate the effect of sparse noise modeling in a standard Bayesian inference tool called the Relevance Vector Machine (RVM) [1].

The RVM is a Bayesian sparse kernel technique for applications in regression and classification [1]. Interest in the RVM can be attributed to the cause that it shares many characteristics of the popular support vector machine whilst providing Bayesian advantages [2, 3, 4], mainly providing posteriors for the object of interest. Generally the RVM is a fully Bayesian technique that aims to learn all the relevant system parameters iteratively to infer the object of interest. In a linear model setup used for regression, the RVM introduces sparsity through a weight vector where the weights are essential to form linear combinations of relevant kernels to predict the object of interest; the weight vector is a set of system parameters and its sparsity leads to a reduction of model complexity for regression. Naturally, the RVM has been further used for sparse representation techniques as well as for developing Bayesian compressive sensing methods [5].

For a Bayesian linear model, the standard RVM uses a multivariate isotropic Gaussian prior to model the additive dense noise. Here isotropic means that the associated covariance matrix is proportional to the identity matrix. Such a dense noise model has inherent limitations to accommodate instances of outliers [6, 7, 8, 9, 10], impulse bursts [11, 12] or missing (lost) data [13, 14]. We hypothesize that a sparse and dense noise model can accommodate the statistics of a variety of noise types, without causing degradation in performance for any noise type compared to the standard case of using only a dense noise model. In this paper, we develop an RVM for such a combined (joint) sparse and dense noise scenario.

1.1. System model

We consider the following linear system model

y = Ax + e + n,    (1)

where y ∈ R^m is the measurement vector, x ∈ R^n is a sparse vector (for example, weights in regression or a sparse signal to estimate in compressed sensing), and A ∈ R^{m×n} is a known system matrix (for example, regressors or a sampling system). Further, e ∈ R^m is sparse noise and n ∈ R^m is dense noise. Using the ℓ0-norm notation to represent the number of non-zeros in a vector, we assume that ||x||_0 ≪ n and ||e||_0 ≪ m are small and unknown.
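As a concrete illustration of the model (1), the following NumPy sketch generates synthetic data with a sparse signal, sparse noise and dense noise; the dimensions, sparsity levels and noise level are illustrative assumptions rather than values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 50, 100          # measurements and signal dimension (illustrative)
k_x, k_e = 5, 3         # number of non-zeros in x and e (illustrative)
sigma_n = 0.01          # dense-noise standard deviation (illustrative)

# Known system matrix A with unit-norm columns
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)

# Sparse signal x and sparse noise e
x = np.zeros(n)
x[rng.choice(n, k_x, replace=False)] = rng.standard_normal(k_x)
e = np.zeros(m)
e[rng.choice(m, k_e, replace=False)] = rng.standard_normal(k_e)

# Dense noise n and measurements y = A x + e + n, as in model (1)
noise = sigma_n * rng.standard_normal(m)
y = A @ x + e + noise
```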
The random vectors x, e and n are independent. The model (1) is used in face recognition [15], image denoising [6] and compressed sensing [5].

1.2. Our contribution

We develop an RVM for the model (1) by treating e + n as a combined noise. By learning parameters of x and e + n, we estimate x without the need of estimating e. We also consider the scenario where the signal x and the noise e are block sparse. By using techniques similar to the ones in [16] we generalize the methods to signals with unknown block structure. The main technical contribution is to derive update equations that are used iteratively for estimation of parameters in the new RVM. We refer to the new RVM as the RVM for combined sparse and dense noise (SD-RVM). By an approximate analysis, the SD-RVM algorithm is shown to be equivalent to the minimization of a non-symmetric sparsity inducing cost function. Finally, the performance of SD-RVM is evaluated numerically using examples from compressed sensing, block sparse signal recovery, house price prediction and image denoising. Throughout the paper, we take an approach of comparing SD-RVM vis-a-vis the existing Robust Bayesian RVM (RB-RVM) [6] (described in the next section).

1.3. Prior work

To establish the relevance of our work, we briefly describe prior work in this section. Almost all prior works [6, 7, 8, 9] translate the linear setup (1) to the equivalent setup

y = [A  I_m] [x^⊤  e^⊤]^⊤ + n,    (2)

where I_m is the m × m identity matrix, [A  I_m] acts as the effective system matrix and [x^⊤ e^⊤]^⊤ acts as the parameter vector to be estimated. The RB-RVM of [6] uses the standard RVM approach for (2) directly. Hence RB-RVM learns model parameters for all three signals x = [x_1, x_2, ..., x_n]^⊤, e = [e_1, e_2, ..., e_m]^⊤ and n, and thus estimates both x and e jointly. RB-RVM assumes the Gaussian priors

x ∼ ∏_{i=1}^n N(0, γ_i^{-1}),   e ∼ ∏_{i=1}^m N(0, ν_i^{-1}),   n ∼ N(0, β^{-1} I_m),

where the precisions (inverse variances) γ_i, ν_i and β are unknown. The precisions are given Gamma priors

p(γ_i) = Gamma(γ_i | a+1, b),    (3)
p(ν_i) = Gamma(ν_i | a+1, b),
p(β)  = Gamma(β | c+1, d),

where Gamma(γ_i | a+1, b) ∝ γ_i^a e^{−bγ_i} [1]. Typical practice is to maximize p(y | γ, ν, β) to infer the precisions, where we use boldface symbols to denote the vectors

γ = [γ_1, γ_2, ..., γ_n]^⊤,   ν = [ν_1, ν_2, ..., ν_m]^⊤.

Instead we take the alternative (full Bayesian) approach of maximizing p(y, γ, ν, β) and assume that the precisions have non-informative priors by taking the limit (a, b, c, d) → 0. For the distributions considered here, maximizing the conditional distribution p(y | γ, ν, β) becomes equivalent to maximizing the joint distribution p(y, γ, ν, β) in the limit of non-informative priors. In calculations, however, the parameters (a, b, c, d) are often given small values to avoid numerical instabilities. To estimate [x^⊤ e^⊤]^⊤, RB-RVM fixes the precisions and sets

[x̂^⊤  ê^⊤]^⊤ = β Σ_RB [A  I_m]^⊤ y,    (4)

Σ_RB = ( [Γ 0; 0 N] + β [A  I_m]^⊤ [A  I_m] )^{-1},

where Γ = diag(γ_1, γ_2, ..., γ_n) and N = diag(ν_1, ν_2, ..., ν_m). The RB-RVM iteratively updates the precisions by maximizing p(y, γ, ν, β), resulting in the update equations

γ_i^{new} = (1 − γ_i [Σ_RB]_{ii}) / x̂_i^2,    ν_i^{new} = (1 − ν_i [Σ_RB]_{n+i,n+i}) / ê_i^2,

β^{new} = ( ∑_{i=1}^n γ_i [Σ_RB]_{ii} + ∑_{j=1}^m ν_j [Σ_RB]_{n+j,n+j} ) / ||y − Ax̂ − ê||_2^2,    (5)

where [Σ_RB]_{ii} denotes the (i, i) component of the matrix Σ_RB. The update equations (4) and (5) are found by applying the standard RVM to (2). Derivations can be found in e.g. [1, 2]. Iterating until convergence gives the final estimates x̂ and ê. In the iterations, some precisions become large, making their respective components in x̂ and ê close to zero. This makes the final estimates x̂ and ê sparse.
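For illustration, a minimal NumPy sketch of the RB-RVM iteration (4)-(5) described above is given below; the initialization, the fixed iteration count and the small constant guarding against division by zero are our own assumptions.

```python
import numpy as np

def rb_rvm(y, A, n_iter=100, eps=1e-12):
    """Sketch of RB-RVM: jointly estimate sparse x and sparse noise e via (4)-(5)."""
    m, n = A.shape
    Phi = np.hstack([A, np.eye(m)])            # effective system matrix [A I_m]
    gamma = np.ones(n)                         # precisions of x
    nu = np.ones(m)                            # precisions of e
    beta = 1.0                                 # precision of dense noise
    for _ in range(n_iter):
        D = np.diag(np.concatenate([gamma, nu]))
        Sigma = np.linalg.inv(D + beta * Phi.T @ Phi)          # Sigma_RB in (4)
        z = beta * Sigma @ Phi.T @ y                           # [x_hat; e_hat]
        x_hat, e_hat = z[:n], z[n:]
        diag = np.diag(Sigma)
        gamma = (1.0 - gamma * diag[:n]) / (x_hat**2 + eps)    # gamma update in (5)
        nu = (1.0 - nu * diag[n:]) / (e_hat**2 + eps)          # nu update in (5)
        beta = (np.sum(gamma * diag[:n]) + np.sum(nu * diag[n:])) \
               / (np.linalg.norm(y - A @ x_hat - e_hat)**2 + eps)
    return x_hat, e_hat
```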
The RVM has high similarity with Sparse Bayesian Learning (SBL) [16, 17, 3, 4]. Sparse Bayesian learning has been used for structured sparse signals, for example block sparse signals [16], where the problem of unknown signal block structure was treated using overlapping blocks. The model extension of RB-RVM shown in (2) for handling block sparse noise with unknown block structure is straightforward to derive. However, in our formulation, as we are not estimating the noise explicitly, the use of block sparse noise with unknown block structure is non-trivial.

Further, non-Bayesian (even non-statistical) methods have been used for sparse estimation problems [7, 18, 19, 20]. For example, the ℓ1-norm minimization method justice pursuit (JP) [7] uses the optimization technique of the standard basis pursuit denoising method [18], as follows

x̂, ê = argmin_{x,e} ||x||_1 + ||e||_1   s.t.   ||y − Ax − e||_2 ≤ ε,    (6)

where ε > 0 is a model parameter set by the user. For unknown noise power, it is impossible to know ε a-priori. We mention that a fully Bayesian setup like the RVM does not require parameters set by a user.

2. RVM FOR COMBINED SPARSE AND DENSE NOISE (SD-RVM)

2.1. SD-RVM method

For (1), we propose to use a combined model for the two additive noises, as follows

e + n ∼ N(0, B^{-1}),    (7)

where B = diag(β_1, β_2, ..., β_m). We also use β to denote the vector β = [β_1, β_2, ..., β_m]^⊤. That means the two noises are treated as a single combined noise where each noise component has its own precision. The rationale is that we do not need to separate the two noises. Although our model promotes sparsity in the noise, we empirically find that it is able to model sparse and non-sparse noise. Using the noise model (7) and that x ∼ ∏_{i=1}^n N(0, γ_i^{-1}), we find the maximum a posteriori (MAP) estimate

x̂ = Σ A^⊤ B y,    Σ = (Γ + A^⊤ B A)^{-1},

where as before Γ = diag(γ_1, γ_2, ..., γ_n). The precisions are updated as

γ_i^{new} = (1 − γ_i Σ_{ii}) / x̂_i^2,    (8)

β_j^{new} = (1 − β_j [AΣA^⊤]_{jj}) / [y − Ax̂]_j^2,    (9)

where Σ_{ii} = [Σ]_{ii}. The derivations of (8) and (9) are given in the next section.
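The SD-RVM iteration just described (the MAP estimate together with the updates (8) and (9)) can be sketched in a few lines of NumPy; the initialization, the fixed iteration count and the numerical safeguard are illustrative assumptions.

```python
import numpy as np

def sd_rvm(y, A, n_iter=100, eps=1e-12):
    """Sketch of SD-RVM: combined sparse + dense noise, one precision per noise component."""
    m, n = A.shape
    gamma = np.ones(n)                          # signal precisions
    beta = np.ones(m)                           # combined-noise precisions
    x_hat = np.zeros(n)
    for _ in range(n_iter):
        B = np.diag(beta)
        Sigma = np.linalg.inv(np.diag(gamma) + A.T @ B @ A)
        x_hat = Sigma @ A.T @ B @ y             # MAP estimate
        r = y - A @ x_hat                       # residual
        gamma = (1.0 - gamma * np.diag(Sigma)) / (x_hat**2 + eps)         # update (8)
        beta = (1.0 - beta * np.diag(A @ Sigma @ A.T)) / (r**2 + eps)     # update (9)
    return x_hat
```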
2.2. Derivation of update equations for SD-RVM

To update the precisions we maximize the distribution p(y, γ, β) = p(y | γ, β) p(γ) p(β) (obtained by marginalizing over x) with respect to γ_i and β_j, where we use the prior

p(β_j) = Gamma(β_j | c+1, d),

and p(γ_i) is as in (3). The log-likelihood of the parameters is

L = constant − (1/2) log det(B^{-1} + AΓ^{-1}A^⊤) − (1/2) y^⊤ (B^{-1} + AΓ^{-1}A^⊤)^{-1} y + ∑_{i=1}^n (a log γ_i − b γ_i) + ∑_{j=1}^m (c log β_j − d β_j).    (10)

We maximize L w.r.t. γ_i by setting the derivative to zero. To simplify the derivative we use that

∂/∂γ_i ( y^⊤ (B^{-1} + AΓ^{-1}A^⊤)^{-1} y ) = x̂_i^2,    (11)

and the determinant lemma [21]

det(B^{-1} + AΓ^{-1}A^⊤) = det(Σ^{-1}) det(Γ^{-1}) det(B^{-1}).    (12)

Using (11) and (12) we find that L is maximized w.r.t. γ_i when

−(1/2) Σ_{ii} + 1/(2γ_i) + a/γ_i − b − (1/2) x̂_i^2 = 0.    (13)

Instead of solving for γ_i (which would require solving a nonlinear coupled equation since Σ and x̂ depend on γ_i) we approximate the equation as

1 − γ_i Σ_{ii} + 2a − (x̂_i^2 + 2b) γ_i^{new} = 0.    (14)

We solve (14) for γ_i^{new} rather than (13) for γ_i since this in practice often results in better convergence [1, 22]. The update equation then becomes

γ_i^{new} = (1 − γ_i Σ_{ii} + 2a) / (x̂_i^2 + 2b).

Setting a = b = 0 we obtain (8).

For the noise precisions we use that

∂/∂β_j ( y^⊤ (B^{-1} + AΓ^{-1}A^⊤)^{-1} y ) = [y − Ax̂]_j^2.    (15)

We find that L is maximized w.r.t. β_j when

−(1/2) tr(Σ A_{j,:}^⊤ A_{j,:}) + 1/(2β_j) − (1/2) [y − Ax̂]_j^2 + c/β_j − d = 0,

where A_{j,:} denotes the j'th row vector of A. Rewriting the equation as

1 − β_j A_{j,:} Σ A_{j,:}^⊤ + 2c − ([y − Ax̂]_j^2 + 2d) β_j^{new} = 0,

and using that A_{j,:} Σ A_{j,:}^⊤ = [AΣA^⊤]_{jj}, we find that

β_j^{new} = (1 − β_j [AΣA^⊤]_{jj} + 2c) / ([y − Ax̂]_j^2 + 2d).

Setting c = d = 0 we obtain (9). The derivations of (11) and (15) are given in Appendix 6.1.

2.3. Analysis of sparsity

Several approximations are made in the derivation of the iterative update equations. It is interesting to see how the approximations affect the sparsity of the solution. In this subsection, we show that the approximations make the SD-RVM equivalent to minimizing a non-symmetric sparsity promoting cost function.

To motivate that the standard RVM is sparsity promoting, one can use that the marginal distribution of x_i is a Student-t distribution. For a fixed β (and e = 0), the standard RVM is therefore an iterative method for solving (details can be found in [1])

min_x (β/2) ||y − Ax||_2^2 + (1 + a/2) ∑_{i=1}^n log(x_i^2 + 2b).

The log-sum cost function can be used as a sparsity promoting cost function, making it plausible that the RVM promotes sparsity.

For the SD-RVM, the precisions are updated by maximizing the marginal distribution p(y, γ, β). The problem is equivalent to maximizing L in (10). We show approximations for the relevant parts of the right hand side of L as follows

log det(Σ^{-1}) ≈ log det((Σ^old)^{-1}) + ∑_{i=1}^n Σ_{ii}^old (γ_i − γ_i^old) + ∑_{j=1}^m [AΣ^old A^⊤]_{jj} (β_j − β_j^old),    (16)

where the approximation is up to first order in γ and β. We rewrite the problem in the variables x and ẽ using that [23]

y^⊤ (AΓ^{-1}A^⊤ + B^{-1})^{-1} y = min_{x, ẽ} ∑_{i=1}^n γ_i x_i^2 + ∑_{j=1}^m β_j ẽ_j^2,   such that Ax + ẽ = y,    (17)

where now ẽ = e + n as in (7). Under the approximation (16) and the reformulation (17), maximization of log p(y, γ, β) becomes equivalent to

min_{γ_i, β_j, x, ẽ} ∑_{i=1}^n [ (x_i^2 + Σ_{ii}^old + 2b) γ_i + (1 + 2a) log(γ_i) ] + ∑_{j=1}^m [ (ẽ_j^2 + [AΣ^old A^⊤]_{jj} + 2d) β_j + (1 + 2c) log(β_j) ],   such that Ax + ẽ = y.

By minimizing the objective with respect to γ_i and β_j, the problem reduces to

min_{x, ẽ} (1 + 2a) ∑_{i=1}^n log(x_i^2 + Σ_{ii}^old + 2b) + (1 + 2c) ∑_{j=1}^m log(ẽ_j^2 + [AΣ^old A^⊤]_{jj} + 2d),   such that Ax + ẽ = y,    (18)

where we have ignored additive constants. Because of the approximations, the constants Σ_{ii}^old and [AΣ^old A^⊤]_{jj} make the cost function non-symmetric in the components of x and ẽ. The SD-RVM is thus equivalent to minimizing a non-symmetric sparsity promoting cost function. In a similar way, it can be shown that the standard RVM and RB-RVM are also equivalent to non-symmetric cost functions under appropriate approximations. A two-dimensional example using x = [x_1, x_2]^⊤ is shown in Fig. 1.

Fig. 1. The non-symmetric log-ball log(x_1^2 + 0.02) + log(x_2^2 + 0.1) ≤ 0. SD-RVM is equivalent to finding the smallest non-symmetric log-ball that intersects the linear subspace Ax + ẽ = y.
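For intuition, the non-symmetric objective in (18) is straightforward to evaluate numerically. The sketch below computes it for a candidate x with ẽ = y − Ax (so that the constraint Ax + ẽ = y holds); setting a = b = c = d = 0 and the choice of Σ^old are assumptions made by the caller.

```python
import numpy as np

def sd_rvm_cost(x, A, y, Sigma_old, a=0.0, b=0.0, c=0.0, d=0.0):
    """Evaluate the non-symmetric log-sum objective (18) with e_tilde = y - A x."""
    e_tilde = y - A @ x                              # enforces A x + e_tilde = y
    sig_x = np.diag(Sigma_old)                       # Sigma^old_ii
    sig_e = np.diag(A @ Sigma_old @ A.T)             # [A Sigma^old A^T]_jj
    return ((1 + 2 * a) * np.sum(np.log(x**2 + sig_x + 2 * b))
            + (1 + 2 * c) * np.sum(np.log(e_tilde**2 + sig_e + 2 * d)))
```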
2.4. Computational complexity

In this section we take a non-rigorous approach for quantifying the computational complexity of SD-RVM. The complexity is computed per iteration, since the number of iterations depends on the stopping criterion used, and with the assumption of a naive implementation. Each iteration of SD-RVM requires O(n^3) flops to compute the matrix Σ using Gauss-Jordan elimination [24]. Updating the precisions requires O(nm) flops since the residual y − Ax̂ needs to be computed. Hence the computational complexity of SD-RVM is

O(max(nm, n^3)) = O(n · max(m, n^2)).

A natural interest is the complexity of RB-RVM. Again with the assumption of a naive implementation, each iteration of RB-RVM requires the inversion of an (n+m) × (n+m) matrix to compute Σ_RB. Updating the precisions requires O(nm) flops and hence the computational complexity of RB-RVM is

O(max(nm, (n+m)^3)) = O((n+m)^3).

In Section 4.1 we provide numerical evaluations of algorithm run time requirements that confirm that SD-RVM is typically faster than RB-RVM.

3. SD-RVM WITH BLOCK STRUCTURE

3.1. SD-RVM for known block structure

To describe a block sparse signal x ∈ R^n with known block structure we partition [n] = {1, 2, ..., n} into blocks as

[n] = I_1 ∪ I_2 ∪ ··· ∪ I_p,

where |I_i| = n_i and I_i ∩ I_j = ∅ for i ≠ j. The signal is block sparse when only a few blocks of the signal are non-zero. The component-wise SD-RVM generalizes to this scenario by requiring that the precisions are equal in each block, i.e. we choose the prior distribution for the components of block I_i to be

x_{I_i} ∼ N(0, γ_i^{-1} I_{n_i}),

where x_{I_i} ∈ R^{n_i} denotes the vector consisting of the components of x with indices in I_i.

Similarly, we can partition the components of the sparse noise ẽ ∈ R^m into blocks as

[m] = J_1 ∪ J_2 ∪ ··· ∪ J_q,

where |J_j| = m_j, J_i ∩ J_j = ∅ for i ≠ j, and the block J_j of ẽ is given the prior distribution

ẽ_{J_j} ∼ N(0, β_j^{-1} I_{m_j}).

As before, the precisions are given gamma distributions (3) as priors, where now

p(β_j) = Gamma(β_j | c+1, d).    (19)

Using this model, we derive the update equations of the precisions as below:

γ_i^{new} = (n_i − γ_i tr(Σ_{I_i}) + 2a) / (||x̂_{I_i}||_2^2 + 2b),    (20)

β_j^{new} = (m_j − β_j tr([AΣA^⊤]_{J_j}) + 2c) / (||(y − Ax̂)_{J_j}||_2^2 + 2d),    (21)

where Σ_{I_i} denotes the n_i × n_i submatrix of Σ formed by elements appropriately indexed by I_i. By setting I_i = {i} and J_j = {j} we obtain the update equations for component-wise sparse signal and noise. We see that when I_i = {i} and J_j = J = [m], then (21) reduces to the update equation of the standard RVM since

n − β tr(AΣA^⊤) = ∑_i γ_i Σ_{ii}.

The derivation of the update equations (20) and (21) is found in Appendix 6.2.
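A sketch of one pass of the block updates (20)-(21) for a known, disjoint block structure is given below; the block partitions, the shared-precision initialization and the numerical safeguard are illustrative assumptions.

```python
import numpy as np

def block_sd_rvm_step(y, A, gamma, beta, blocks_x, blocks_e,
                      a=0.0, b=0.0, c=0.0, d=0.0, eps=1e-12):
    """One precision update with known, disjoint blocks, following (20)-(21).
    Precisions are assumed equal within each block (shared block precision)."""
    B = np.diag(beta)
    Sigma = np.linalg.inv(np.diag(gamma) + A.T @ B @ A)
    x_hat = Sigma @ A.T @ B @ y
    r = y - A @ x_hat
    ASAt = A @ Sigma @ A.T
    for I in blocks_x:                                   # signal blocks I_i
        g_new = (len(I) - gamma[I][0] * np.trace(Sigma[np.ix_(I, I)]) + 2 * a) \
                / (np.sum(x_hat[I]**2) + 2 * b + eps)
        gamma[I] = g_new                                 # one precision per block
    for J in blocks_e:                                   # noise blocks J_j
        b_new = (len(J) - beta[J][0] * np.trace(ASAt[np.ix_(J, J)]) + 2 * c) \
                / (np.sum(r[J]**2) + 2 * d + eps)
        beta[J] = b_new
    return gamma, beta, x_hat
```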
3.2. SD-RVM for unknown block structure

In some situations the signal can have an unknown block structure, i.e. the signal is block sparse, but the dimensions and positions of the blocks are unknown. This scenario can be handled by treating the signal as a superposition of block sparse signals [16] (see illustration in Figure 2). This approach also describes the scenario (1) when e is component-wise sparse and n is dense (e.g. Gaussian). The precision of each component is then a combination of the precisions of the blocks to which the component belongs. Let γ_i be the precision of the component x_i and γ̃_k be the precision of block I_k. We model the signal as

x_i ∼ N(0, γ_i^{-1}),    γ_i^{-1} = ∑_{k: i ∈ I_k} γ̃_k^{-1}.    (22)

Fig. 2. Illustration of non-overlapping and overlapping block parameterizations.

We model the noise in a similar way with precisions β_j for component j and precisions β̃_l for the block with support J_l. To promote sparsity, the precisions of the underlying blocks are given gamma distributions as priors

γ̃_k ∼ Gamma(γ̃_k | a+1, b),    β̃_l ∼ Gamma(β̃_l | c+1, d).

In each iteration we update the underlying precisions γ̃_k. The component-wise precisions are then updated using (22). With this model, the update equations for the precisions become

γ̃_k^{new} = ( (1/γ̃_k) tr(Γ_k) − (1/γ̃_k) tr(Γ_k Σ Γ_k) + 2a ) / ( (1/γ̃_k^2) ||Γ_k x̂||_2^2 + 2b ),    (23)

β̃_l^{new} = ( (1/β̃_l) tr(B_l) − (1/β̃_l) tr(B_l AΣA^⊤ B_l) + 2c ) / ( (1/β̃_l^2) ||B_l (y − Ax̂)||_2^2 + 2d ),    (24)

where Γ_k is the diagonal matrix with [Γ_k]_{ii} = γ_i if i ∈ I_k and [Γ_k]_{ii} = 0 otherwise. We denote the corresponding matrix for β_l by B_l. The component-wise precisions are updated using (22) and similarly for β_j.

We see that when the underlying blocks are disjoint, then γ_i = γ̃_k for all i ∈ I_k and β_j = β̃_l for all j ∈ J_l. The update equations then reduce to the update equations (21) for the block sparse model with known block structure.

3.2.1. Sparse and dense noise

In the model (1) where x and e are component-wise sparse and n is dense, then

I_i = {i},   J_j = {j},   J_{m+1} = [m],    (25)

where i = 1, 2, ..., n and j = 1, 2, ..., m. In this scenario the support set of the sparse and dense noise is overlapping, so the update equations for the precisions become

γ_i^{new} = (1 − Σ_{ii} γ_i + 2a) / (x̂_i^2 + 2b),

β̃_j^{new} = (1 − (β_j/β̃_j) [AΣA^⊤]_{jj} + 2c) / (β_j [y − Ax̂]_j^2 + 2d),   j = 1, 2, ..., m,

β̃_{m+1}^{new} = ( ∑_{j=1}^m β_j − (1/β̃_{m+1}) ∑_{j=1}^m β_j^2 [AΣA^⊤]_{jj} + 2c ) / ( ∑_{j=1}^m β_j^2 [y − Ax̂]_j^2 + 2d ),

β_j = (β̃_j^{-1} + β̃_{m+1}^{-1})^{-1}.

We will use these update equations in the simulations where the signal is component-wise sparse and the noise is a sum of (component-wise) sparse and dense noise. It turns out that this method is slightly better than the SD-RVM in Section 2.1.

4. SIMULATION EXPERIMENTS

In this section we evaluate the performance of the SD-RVM using several scenarios – for simulated and real signals. For simulated signals, we considered the sparse and block sparse recovery problem in compressed sensing. Then for real signals, we considered prediction of house prices using the Boston housing dataset [25] and denoising of images contaminated by salt and pepper noise. In the simulations we used the cvx toolbox [26] to implement JP.

4.1. Compressed sensing

The recovery problem in compressed sensing consists of estimating a sparse vector x ∈ R^n in (1) from m linear measurements, where m ≪ n. To evaluate the performance of the algorithms, we generated measurement matrices A ∈ R^{m×n} by drawing their components from a N(0,1) distribution and scaling their column vectors to unit norm. We selected the positions of the active components of x and e uniformly at random and drew their values from N(0,1). In the simulation we drew the additive noise n from N(0, σ_n^2 I_m). We compared JP, the standard RVM, RB-RVM and SD-RVM. For JP (6) we assumed σ_n to be known and set ε = σ_n √(m + 2√(2m)) as proposed in [27].

In the simulations we varied the measurement rate m/n (the ratio of the number of measurements and the signal dimension) for measurements without outliers and with 5% outliers. We chose n = 100 and fixed the signal-to-dense-noise ratio (SDNR)

SDNR = E[||Ax||_2^2] / E[||n||_2^2] = ||x||_0 / (m σ_n^2)

to 20 dB. By generating 100 measurement matrices and 100 vectors x and e for each matrix we numerically evaluated the Normalized Mean Square Error (NMSE)

NMSE = E[||x − x̂||_2^2] / E[||x||_2^2].

Fig. 3. NMSE vs. m/n for outlier free measurements.

Fig. 4. NMSE vs. m/n for 5% outliers contaminated measurements.
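The mechanics of this setup (unit-norm columns, scaling the dense noise to a target SDNR, and NMSE evaluation) can be sketched as follows; the sparsity level of x, the trial function signature and the reuse of the sd_rvm sketch from Section 2.1 are illustrative assumptions.

```python
import numpy as np

def nmse_db(x_true, x_hat):
    """Normalized mean square error in dB for one realization."""
    return 10 * np.log10(np.sum((x_true - x_hat)**2) / np.sum(x_true**2))

def run_trial(m, n=100, k_x=10, k_e=0, sdnr_db=20, rng=np.random.default_rng(0)):
    A = rng.standard_normal((m, n))
    A /= np.linalg.norm(A, axis=0)                 # unit-norm columns
    x = np.zeros(n)
    x[rng.choice(n, k_x, replace=False)] = rng.standard_normal(k_x)
    e = np.zeros(m)
    if k_e:                                        # 5% outliers would use k_e = round(0.05 * m)
        e[rng.choice(m, k_e, replace=False)] = rng.standard_normal(k_e)
    # choose the dense-noise variance so that ||A x||^2 / (m sigma_n^2) hits the target SDNR
    sigma_n = np.sqrt(np.sum((A @ x)**2) / (m * 10**(sdnr_db / 10)))
    y = A @ x + e + sigma_n * rng.standard_normal(m)
    x_hat = sd_rvm(y, A)                           # estimator from the earlier sketch
    return nmse_db(x, x_hat)
```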
The results are shown in Figure 3 and Figure 4. We found that SD-RVM outperformed the other methods. The improvement of SD-RVM over RB-RVM was 1 to 1.5 dB for m/n > 0.5, with and without outliers. Compared to JP, the improvement of SD-RVM was 3 to 3.7 dB without outlier noise and 1 to 4 dB with outlier noise when m/n > 0.5. The poor performance of RVM is due to sensitivity to the regularization parameters. The performance of RVM improves greatly when the regularization is optimally tuned; however, the optimal values vary with SNR and measurement dimensions. The experiments show that the performance of SD-RVM does not degrade in the absence of sparse noise.

For each realization of the problem we measured the runtime (cputime) of each algorithm. The histogram of the runtimes is shown in Figure 5. We found that the runtimes of the RVM algorithms (the standard RVM, RB-RVM and SD-RVM) were shorter than the runtime of JP, and the runtimes of JP were spread over a larger range. Of the RVM algorithms, SD-RVM had the highest concentration of low runtimes (≤ 0.2 seconds), while the runtimes of the standard RVM and RB-RVM were more concentrated around 0.2 seconds. The histogram in Figure 5 has been truncated to only show percentages for the visible values.

Fig. 5. Histogram of cputimes for the compressed sensing problem. Time is in seconds.

4.2. Block sparse signals

The recovery problem in compressed sensing can be generalized to block sparse signals and noise [28]. For block sparse signals, the signal components are partitioned into blocks of which only a few blocks are non-zero. Sparse Bayesian learning (SBL) extended to the block sparse signal case is often referred to as block SBL (BSBL) [16, 17]. The problem of unknown block structure can be solved by overparametrizing the blocks [16]. In BSBL [16], the signal is modelled as

Ax = ∑_{I ⊂ [n]} A_I x_I,

i.e. the measured signal is modelled as a sum of signals where each signal represents a block of the original signal. The resulting problem can then be solved using BSBL for known block structure. When the minimum block size K_min is known, the summation can be restricted to subsets of size |I| = K_min [16].

The SD-RVM can be extended to the block sparse case using the methods developed in Section 3.1 and Section 3.2. Justice pursuit can be extended to the block sparse case in a similar way as BSBL by setting

x̂, ê = argmin_{x,e} ∑_i ||x_{I_i}||_2 + ∑_j ||e_{J_j}||_2,   such that ||y − Ax − e||_2 ≤ ε,    (26)

where the sum runs over all blocks (non-overlapping or overlapping) and as before we assume the noise variance to be known and set ε = σ_n √(m + √(8m)). For unknown block structure we also compared with the component sparse methods RVM, RB-RVM and SD-RVM.
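A sketch of the block justice pursuit problem (26), written with the Python cvxpy package rather than the MATLAB cvx toolbox used in the paper; the block partitions and ε are assumed to be supplied by the caller.

```python
import cvxpy as cp
import numpy as np

def block_jp(y, A, blocks_x, blocks_e, eps):
    """Block justice pursuit (26): sum of block l2-norms subject to a residual bound."""
    m, n = A.shape
    x = cp.Variable(n)
    e = cp.Variable(m)
    objective = cp.Minimize(sum(cp.norm(x[list(I)], 2) for I in blocks_x)
                            + sum(cp.norm(e[list(J)], 2) for J in blocks_e))
    constraints = [cp.norm(y - A @ x - e, 2) <= eps]
    cp.Problem(objective, constraints).solve()
    return x.value, e.value
```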
For known block structure, the blocks were 0.4 1.26 0.29 0.39 1.25 0.35 0.25 choosen uniformly at random from a set of predefined and 0.5 1.27 0.42 0.39 2.20 0.36 0.38 non-overlapping blocks while for unknown block structure, 0.6 1.28 0.60 0.41 3.28 0.37 0.53 the first component of each block was choosen uniformly at 0.7 1.28 0.92 0.45 5.27 0.43 0.80 random, making it possible for the blocks to overlap. The active components of the signal and the sparse noise were 24 drawn from N(0,1). By generating 50 measurement matri- Median filter ces A and 50 signals x and sparse noises e for each matrix 22 RVM wenumericallyevaluatedtheNMSE. RB−RVM 20 SD−RVM ForknownblockstructurewefoundthatblockSD-RVM outperformed the other methods. The NMSE of the block 18 sparse SD-RVM was lower than the NMSE of block JP by R more than 10 dB for m/n = 0.3,0.4 and from 2 to 6 dB SN 16 P lowerform/n≥0.5. TheresultsarepresentedinFigure6. 14 For unknown block structure we found that for m/n < 0.5,SD-RVMforunknownblockstructuregavebestperfor- 12 mance while for m/n ≥ 0.5, the usual component sparse 10 SD-RVMgavethebestperfomance. TheNMSEofJPforun- knownblockstructurewasabout5dBlargerthantheNMSE 8 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 ofblockSD-RVMfor0.4 ≤ m/n ≤ 0.6, whileform/n ≥ ρ 0.9 block JP gave a better NMSE than block SD-RVM. As expected,RVMandBSBLgavepoorperformancesincethey Fig. 8. PSNR vs. percentage of salt and pepper noise (ρ) arenotdevelopedtohandlemeasurementswithsparsenoise. averaged over 7 images with 10 noise realizations for each Theresultsareshowninfigure7. imageforeachvalueofρ. 4.3. Housepriceprediction theaveragecustomer,xcanthereforebemodelledasasparse Onerealworldproblemisthepredictionofhouseprices. To vector. testthealgorithmsonrealdata, weusedtheBostonhousing We used a fraction ρ of the dataset as training data and dataset[25]. Thedatasetconsistsof506housepricesinsub- the rest as test set. By choosing the training set uniformly urbs of Boston and 13 parameters (air quality, accessibility, at random we evaluated the mean absolute error of the pre- pupil-to-teacher ratio, etc.) for each house. The problem is dictedmedianandmeancputime(inseconds)over1000real- topredictthemedianhousepriceforpartofthedataset(test izations. data)usingthecomplementdataset(trainingdata)tolearnre- gressionparameters. Wemodelthehousepricesas WefoundthatSD-RVMgave10%to5%lowermeaner- rorthanthatofRB-RVMandthemeanerrorofRB-RVMand pi =wi(cid:62)x+ni+ei, SD-RVM was about 70% lower than the error of the RVM (seeTable1). ThecputimeofSD-RVMwas16%to25%of where p is the price of house i, w ∈ R13 contains the pa- i i thecputimeofRB-RVM. rameters of house i, x ∈ R13 is the regression vector, n is i (Gaussian)noiseande isa(possible)outlier.Veryexpensive i 4.4. Imagedenoising or inexpensive houses can treated as outliers. The goal is to estimatethemedianhousepriceforthetestset. Wefindthe A grayscale image (represented in double-precision) can be medianbyestimatingtheregressionparametersandsetting modelled as an array of numbers in the interval from 0 to mˆ =median(W(cid:62)xˆ), 1. Common sources of noise in images are electronic noise in sensors and bit quantization errors. Salt and Pepper [6] where W contains the parameters of the houses in the test noise makes some pixles black (0) or white (1). To test the set. Itisbelievedthatonlyafewparametersareimportantto algorithms we added ρ percent of salt and pepper noise in Fig.9. Onerealizationofsaltandpeppernoisedenoisingforρ = 0.2. 
4.4. Image denoising

A grayscale image (represented in double precision) can be modelled as an array of numbers in the interval from 0 to 1. Common sources of noise in images are electronic noise in sensors and bit quantization errors. Salt and pepper noise [6] makes some pixels black (0) or white (1). To test the algorithms we added ρ percent of salt and pepper noise to 7 different images (Boat, Baboon, Barbara, Elaine, House, Lena and Peppers) and denoised them using the median filter, the RVM, the RB-RVM and the SD-RVM. The pixels were set to either black or white with equal probability.

The median filter estimates the value of each pixel by the median in a 3×3 square patch. For the RVM, RB-RVM and SD-RVM, the value of a pixel was estimated by forming a 5×5 square patch around the pixel. In the patch, the pixels were modeled as [29]

y_i = β_0 + β_1^⊤ (x − x_i) + β_2^⊤ vech((x − x_i)(x − x_i)^⊤) + n_i,

where n_i is noise, x is the position of the central pixel, x_i is the position of pixel i, i = 1, 2, ..., 25, and vech is the half-vectorization operator [29], i.e.

vech([a b; b c]) = [a, b, c]^⊤.

Given the regression parameters, the value of the central pixel is estimated as ŷ = β̂_0. Since pixels close to the central pixel are more important, the errors are weighted by a kernel K(x, x_i). The estimation problem thus becomes

min_{β_0, β_1, β_2} ∑_{i=1}^P | y_i − β_0 − β_1^⊤ (x − x_i) − β_2^⊤ vech((x − x_i)(x − x_i)^⊤) |^2 K(x, x_i),

where we used the kernel

K(x, x_i) = exp(−||x − x_i||_2^2 / r^2) (1 + x^⊤ x_i)^p,

i.e. the kernel is a composition of a Gaussian and a polynomial kernel [6, 29]. In the simulation we used r = 2.1 and p = 1 as in [6]. To avoid overfitting, it is beneficial to promote sparsity in [β_0, β_1^⊤, β_2^⊤]^⊤ [30, 31, 32].

We compared the algorithms by computing the Peak Signal to Noise Ratio (PSNR)

PSNR = −10 · log_10( E[||X − X̂||_F^2] / (E[max_{i,j} |X_{ij}|^2] · pq) ),

where the size of the image is p × q and the expectation is taken over the different images and realizations of the noise. All images in the simulation were of size p = q = 256 with max_{i,j} |X_{ij}|^2 = 1. Figure 9 shows one realization of the problem, where SD-RVM gives lower PSNR than the median filter and RB-RVM. In the simulations we varied ρ and used 10 noise realizations for each image. The result is shown in Figure 8.

Fig. 8. PSNR vs. percentage of salt and pepper noise (ρ) averaged over 7 images with 10 noise realizations for each image for each value of ρ.

Fig. 9. One realization of salt and pepper noise denoising for ρ = 0.2. Columns from left to right: noisy image, median filter, RB-RVM and SD-RVM. Peak Signal to Noise Ratio (PSNR) has been rounded to two decimals.

We found that the median filter performed best for ρ ≤ 0.1, while SD-RVM outperformed RB-RVM for all values of ρ and also the median filter for ρ ≥ 0.2. The gain in using SD-RVM over the median filter was significant for the images Boat, Elaine, Lena, House and Peppers (for ρ ≥ 0.2). The mean cputime of SD-RVM was 68% of the mean cputime of RB-RVM (see Table 2), while the median filter was by far the fastest method.

Table 2. Mean cputime (in seconds) for denoising images corrupted by salt and pepper noise averaged over 7 images.

  Algorithm       Mean cputime
  Median filter   5
  RVM             925
  RB-RVM          1154
  SD-RVM          788
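The locally weighted polynomial patch regression described above can be sketched as a weighted least-squares fit; for brevity, the sparsity-promoting (RVM-type) estimation of [β_0, β_1^⊤, β_2^⊤]^⊤ used in the paper is replaced here by plain least squares, and only the kernel parameters r = 2.1 and p = 1 follow [6].

```python
import numpy as np

def vech(M):
    """Half-vectorization of a symmetric 2x2 matrix [[a, b], [b, c]] -> [a, b, c]."""
    return M[np.triu_indices_from(M)]

def denoise_pixel(patch, r=2.1, p=1):
    """Estimate the central pixel of a 5x5 patch as beta_0 of a kernel-weighted quadratic fit."""
    size = patch.shape[0]
    center = np.array([(size - 1) / 2.0, (size - 1) / 2.0])    # position x of the central pixel
    rows, cols = np.divmod(np.arange(size * size), size)
    features, weights, targets = [], [], []
    for i, j in zip(rows, cols):
        xi = np.array([float(i), float(j)])                    # position x_i of pixel i
        d = center - xi                                        # x - x_i
        phi = np.concatenate(([1.0], d, vech(np.outer(d, d)))) # regressors for beta_0, beta_1, beta_2
        k = np.exp(-d @ d / r**2) * (1.0 + center @ xi)**p     # kernel K(x, x_i)
        features.append(phi); weights.append(k); targets.append(patch[i, j])
    Phi = np.array(features)
    sw = np.sqrt(np.array(weights))                            # weighted LS via sqrt-weights
    beta = np.linalg.lstsq(Phi * sw[:, None], np.array(targets) * sw, rcond=None)[0]
    return beta[0]                                             # estimated value of the central pixel
```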
5. CONCLUSION

In this paper we introduced the combined Sparse and Dense noise Relevance Vector Machine (SD-RVM), which is robust to sparse and dense additive noise. SD-RVM was shown to be equivalent to the minimization of a non-symmetric sparsity promoting cost function. Through simulations, SD-RVM was shown to empirically perform better than the standard RVM and the robust RB-RVM.

6. APPENDIX: DERIVATION OF UPDATE EQUATIONS

Here we derive the update equations for x̂, γ_i and β_j for the SD-RVM in Sections 2.1, 3.1 and 3.2. For fixed precisions γ and β, the Maximum A Posteriori (MAP) estimate of x becomes

x̂ = argmax_x log p(y, x | γ, β) = argmin_x (y − Ax)^⊤ B (y − Ax) + x^⊤ Γ x = Σ A^⊤ B y,

where Σ = (Γ + A^⊤ B A)^{-1}. The form of the MAP estimate is the same for all models considered in this paper.

6.1. Derivation of (11) and (15)

Proof of (11). Since

B^{-1} + AΓ^{-1}A^⊤ = B^{-1} + ∑_{i=1}^n γ_i^{-1} a_i a_i^⊤,

where a_i is the i'th column vector of A, we find that

∂/∂γ_i ( y^⊤ (B^{-1} + AΓ^{-1}A^⊤)^{-1} y ) = γ_i^{-2} ( a_i^⊤ (B^{-1} + AΓ^{-1}A^⊤)^{-1} y )^2.

Using that

Γ^{-1}A^⊤ (B^{-1} + AΓ^{-1}A^⊤)^{-1} y    (27)
= Γ^{-1}A^⊤ ( B − BA(Γ + A^⊤BA)^{-1}A^⊤B ) y
= Γ^{-1}A^⊤By − Γ^{-1}A^⊤BA (Γ + A^⊤BA)^{-1} A^⊤By,   with A^⊤BA = Σ^{-1} − Γ and (Γ + A^⊤BA)^{-1} = Σ,
= ΣA^⊤By = x̂,

we get that

∂/∂γ_i ( y^⊤ (B^{-1} + AΓ^{-1}A^⊤)^{-1} y ) = γ_i^{-2} (γ_i x̂_i)^2 = x̂_i^2.

Proof of (15). Since

y^⊤ (B^{-1} + AΓ^{-1}A^⊤)^{-1} y = y^⊤By − y^⊤BA (Γ + A^⊤BA)^{-1} A^⊤By,   with (Γ + A^⊤BA)^{-1} = Σ,

we get that

∂/∂β_j ( y^⊤ (B^{-1} + AΓ^{-1}A^⊤)^{-1} y )
= y_j^2 − 2 y_j A_{j,:} ΣA^⊤By + y^⊤BAΣ A_{j,:}^⊤ A_{j,:} ΣA^⊤By
= y_j^2 − 2 y_j A_{j,:} x̂ + (A_{j,:} x̂)^2 = [y − Ax̂]_j^2.
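The chain of identities in (27), and hence (11), can also be verified numerically; the following sketch checks that Γ^{-1}A^⊤(B^{-1} + AΓ^{-1}A^⊤)^{-1}y coincides with ΣA^⊤By = x̂ for random positive precisions (the dimensions are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 5
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
Gamma = np.diag(rng.uniform(0.5, 2.0, n))     # signal precisions
B = np.diag(rng.uniform(0.5, 2.0, m))         # combined-noise precisions

Sigma = np.linalg.inv(Gamma + A.T @ B @ A)
x_hat = Sigma @ A.T @ B @ y                   # MAP estimate

lhs = (np.linalg.inv(Gamma) @ A.T
       @ np.linalg.inv(np.linalg.inv(B) + A @ np.linalg.inv(Gamma) @ A.T) @ y)
print(np.allclose(lhs, x_hat))                # True: the identity in (27) holds
```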
