The Annals of Statistics
2004, Vol. 32, No. 2, 407–499
© Institute of Mathematical Statistics, 2004

LEAST ANGLE REGRESSION

BY BRADLEY EFRON,¹ TREVOR HASTIE,² IAIN JOHNSTONE³ AND ROBERT TIBSHIRANI⁴

Stanford University

The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a $C_p$ estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.

Received March 2002; revised January 2003.
¹Supported in part by NSF Grant DMS-00-72360 and NIH Grant 8R01-EB002784.
²Supported in part by NSF Grant DMS-02-04162 and NIH Grant R01-EB0011988-08.
³Supported in part by NSF Grant DMS-00-72661 and NIH Grant R01-EB001988-08.
⁴Supported in part by NSF Grant DMS-99-71405 and NIH Grant 2R01-CA72028.
AMS 2000 subject classification. 62J07.
Key words and phrases. Lasso, boosting, linear regression, coefficient paths, variable selection.

1. Introduction. Automatic model-building algorithms are familiar, and sometimes notorious, in the linear model literature: Forward Selection, Backward Elimination, All Subsets regression and various combinations are used to automatically produce "good" linear models for predicting a response y on the basis of some measured covariates $x_1, x_2, \ldots, x_m$. Goodness is often defined in terms of prediction accuracy, but parsimony is another important criterion: simpler models are preferred for the sake of scientific insight into the x–y relationship. Two promising recent model-building algorithms, the Lasso and Forward Stagewise linear regression, will be discussed here, and motivated in terms of a computationally simpler method called Least Angle Regression.

Least Angle Regression (LARS) relates to the classic model-selection method known as Forward Selection, or "forward stepwise regression," described in Weisberg [(1980), Section 8.5]: given a collection of possible predictors, we select the one having largest absolute correlation with the response y, say $x_{j_1}$, and perform simple linear regression of y on $x_{j_1}$. This leaves a residual vector orthogonal to $x_{j_1}$, now considered to be the response. We project the other predictors orthogonally to $x_{j_1}$ and repeat the selection process.
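As a concrete illustration of this classic Forward Selection recipe (select the predictor most correlated with the current residual, regress on it, orthogonalize the remaining predictors, repeat), here is a minimal NumPy sketch. It assumes covariates already standardized as in (1.1) below and is written for exposition only; it is not the paper's published code.

```python
import numpy as np

def forward_selection(X, y, k):
    """Minimal sketch of classic Forward Selection as described above.

    X : (n, m) array of standardized covariates, y : centered response,
    k : number of steps.  Returns the selected column indices, in order.
    """
    Xw = X.astype(float).copy()
    r = y.astype(float).copy()            # residual, treated as the current response
    selected = []
    for _ in range(k):
        c = Xw.T @ r                      # correlations with the current residual
        c[selected] = 0.0                 # ignore predictors already chosen
        j = int(np.argmax(np.abs(c)))
        xj = Xw[:, j].copy()
        r = r - ((xj @ r) / (xj @ xj)) * xj              # simple regression of r on x_j
        Xw = Xw - np.outer(xj, (xj @ Xw) / (xj @ xj))    # orthogonalize the rest to x_j
        selected.append(j)
    return selected
```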
After k steps this results in a set of predictors $x_{j_1}, x_{j_2}, \ldots, x_{j_k}$ that are then used in the usual way to construct a k-parameter linear model. Forward Selection is an aggressive fitting technique that can be overly greedy, perhaps eliminating at the second step useful predictors that happen to be correlated with $x_{j_1}$.

Forward Stagewise, as described below, is a much more cautious version of Forward Selection, which may take thousands of tiny steps as it moves toward a final model. It turns out, and this was the original motivation for the LARS algorithm, that a simple formula allows Forward Stagewise to be implemented using fairly large steps, though not as large as a classic Forward Selection, greatly reducing the computational burden. The geometry of the algorithm, described in Section 2, suggests the name "Least Angle Regression." It then happens that this same geometry applies to another, seemingly quite different, selection method called the Lasso [Tibshirani (1996)]. The LARS–Lasso–Stagewise connection is conceptually as well as computationally useful. The Lasso is described next, in terms of the main example used in this paper.

Table 1 shows a small part of the data for our main example. Ten baseline variables, age, sex, body mass index, average blood pressure and six blood serum measurements, were obtained for each of n = 442 diabetes patients, as well as the response of interest, a quantitative measure of disease progression one year after baseline. The statisticians were asked to construct a model that predicted response y from covariates $x_1, x_2, \ldots, x_{10}$. Two hopes were evident here, that the model would produce accurate baseline predictions of response for future patients and that the form of the model would suggest which covariates were important factors in disease progression.

TABLE 1
Diabetes study: 442 diabetes patients were measured on 10 baseline variables; a prediction model was desired for the response variable, a measure of disease progression one year after baseline

          AGE   SEX   BMI    BP            Serum measurements              Response
Patient   x1    x2    x3     x4     x5     x6      x7    x8    x9    x10   y
1         59    2     32.1   101    157    93.2    38    4     4.9   87    151
2         48    1     21.6    87    183    103.2   70    3     3.9   69     75
3         72    2     30.5    93    156    93.6    41    4     4.7   85    141
4         24    1     25.3    84    198    131.4   40    5     4.9   89    206
5         50    1     23.0   101    192    125.4   52    4     4.3   80    135
6         23    1     22.6    89    139    64.8    61    2     4.2   68     97
...
441       36    1     30.0    95    201    125.2   42    5     5.1   85    220
442       36    1     19.6    71    250    133.2   97    3     4.6   92     57

The Lasso is a constrained version of ordinary least squares (OLS). Let $x_1, x_2, \ldots, x_m$ be n-vectors representing the covariates, m = 10 and n = 442 in the diabetes study, and let y be the vector of responses for the n cases. By location and scale transformations we can always assume that the covariates have been standardized to have mean 0 and unit length, and that the response has mean 0,

(1.1)    $\sum_{i=1}^n y_i = 0, \qquad \sum_{i=1}^n x_{ij} = 0, \qquad \sum_{i=1}^n x_{ij}^2 = 1 \quad \text{for } j = 1, 2, \ldots, m.$

This is assumed to be the case in the theory which follows, except that numerical results are expressed in the original units of the diabetes example.

A candidate vector of regression coefficients $\hat{\beta} = (\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_m)'$ gives prediction vector $\hat{\mu}$,

(1.2)    $\hat{\mu} = \sum_{j=1}^m x_j \hat{\beta}_j = X\hat{\beta} \qquad [X_{n\times m} = (x_1, x_2, \ldots, x_m)]$

with total squared error

(1.3)    $S(\hat{\beta}) = \|y - \hat{\mu}\|^2 = \sum_{i=1}^n (y_i - \hat{\mu}_i)^2.$

Let $T(\hat{\beta})$ be the absolute norm of $\hat{\beta}$,

(1.4)    $T(\hat{\beta}) = \sum_{j=1}^m |\hat{\beta}_j|.$
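To make the notation in (1.1)–(1.4) concrete, the following small NumPy sketch standardizes the data and evaluates $S(\hat{\beta})$ and $T(\hat{\beta})$. It simply restates the definitions above in code; the function names are ours, not the paper's.

```python
import numpy as np

def standardize(X, y):
    """Center y, and center/scale each column of X to unit length, as in (1.1)."""
    y0 = y - y.mean()
    Xc = X - X.mean(axis=0)
    X0 = Xc / np.sqrt((Xc ** 2).sum(axis=0))   # unit-length columns
    return X0, y0

def squared_error(X, y, beta):
    """Total squared error S(beta), (1.3), for the prediction vector mu = X beta, (1.2)."""
    mu = X @ beta
    return float(((y - mu) ** 2).sum())

def absolute_norm(beta):
    """T(beta), (1.4): the sum of the absolute regression coefficients."""
    return float(np.abs(beta).sum())
```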
The Lasso chooses $\hat{\beta}$ by minimizing $S(\hat{\beta})$ subject to a bound t on $T(\hat{\beta})$,

(1.5)    Lasso: minimize $S(\hat{\beta})$ subject to $T(\hat{\beta}) \le t$.

Quadratic programming techniques can be used to solve (1.5) though we will present an easier method here, closely related to the "homotopy method" of Osborne, Presnell and Turlach (2000a).

The left panel of Figure 1 shows all Lasso solutions $\hat{\beta}(t)$ for the diabetes study, as t increases from 0, where $\hat{\beta} = 0$, to t = 3460.00, where $\hat{\beta}$ equals the OLS regression vector, the constraint in (1.5) no longer binding. We see that the Lasso tends to shrink the OLS coefficients toward 0, more so for small values of t. Shrinkage often improves prediction accuracy, trading off decreased variance for increased bias as discussed in Hastie, Tibshirani and Friedman (2001).

FIG. 1. Estimates of regression coefficients $\hat{\beta}_j$, j = 1, 2, ..., 10, for the diabetes study. (Left panel) Lasso estimates, as a function of $t = \sum_j |\hat{\beta}_j|$. The covariates enter the regression equation sequentially as t increases, in order j = 3, 9, 4, 7, ..., 1. (Right panel) The same plot for Forward Stagewise Linear Regression. The two plots are nearly identical, but differ slightly for large t as shown in the track of covariate 8.

The Lasso also has a parsimony property: for any given constraint value t, only a subset of the covariates have nonzero values of $\hat{\beta}_j$. At t = 1000, for example, only variables 3, 9, 4 and 7 enter the Lasso regression model (1.2). If this model provides adequate predictions, a crucial question considered in Section 4, the statisticians could report these four variables as the important ones.

Forward Stagewise Linear Regression, henceforth called Stagewise, is an iterative technique that begins with $\hat{\mu} = 0$ and builds up the regression function in successive small steps. If $\hat{\mu}$ is the current Stagewise estimate, let $c(\hat{\mu})$ be the vector of current correlations

(1.6)    $\hat{c} = c(\hat{\mu}) = X'(y - \hat{\mu}),$

so that $\hat{c}_j$ is proportional to the correlation between covariate $x_j$ and the current residual vector. The next step of the Stagewise algorithm is taken in the direction of the greatest current correlation,

(1.7)    $\hat{j} = \arg\max_j |\hat{c}_j| \quad\text{and}\quad \hat{\mu} \to \hat{\mu} + \varepsilon \cdot \mathrm{sign}(\hat{c}_{\hat{j}}) \cdot x_{\hat{j}},$

with ε some small constant. "Small" is important here: the "big" choice $\varepsilon = |\hat{c}_{\hat{j}}|$ leads to the classic Forward Selection technique, which can be overly greedy, impulsively eliminating covariates which are correlated with $x_{\hat{j}}$. The Stagewise procedure is related to boosting and also to Friedman's MART algorithm [Friedman (2001)]; see Section 8, as well as Hastie, Tibshirani and Friedman [(2001), Chapter 10 and Algorithm 10.4].

The right panel of Figure 1 shows the coefficient plot for Stagewise applied to the diabetes data. The estimates were built up in 6000 Stagewise steps [making ε in (1.7) small enough to conceal the "Etch-a-Sketch" staircase seen in Figure 2, Section 2]. The striking fact is the similarity between the Lasso and Stagewise estimates. Although their definitions look completely different, the results are nearly, but not exactly, identical.

The main point of this paper is that both Lasso and Stagewise are variants of a basic procedure called Least Angle Regression, abbreviated LARS (the "S" suggesting "Lasso" and "Stagewise"). Section 2 describes the LARS algorithm while Section 3 discusses modifications that turn LARS into Lasso or Stagewise, reducing the computational burden by at least an order of magnitude for either one. Sections 5 and 6 verify the connections stated in Section 3.
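The Stagewise iteration (1.6)–(1.7) is short enough to sketch directly. In the illustration below the step size eps and the number of steps are arbitrary choices (the text used 6000 tiny steps for Figure 1), and the covariates are assumed standardized as in (1.1); this is an expository sketch, not the paper's implementation.

```python
import numpy as np

def forward_stagewise(X, y, eps=0.01, n_steps=6000):
    """Sketch of Forward Stagewise, (1.6)-(1.7): repeatedly nudge the coefficient
    of the covariate most correlated with the current residual by +/- eps."""
    n, m = X.shape
    beta = np.zeros(m)
    mu = np.zeros(n)
    for _ in range(n_steps):
        c = X.T @ (y - mu)                # current correlations, (1.6)
        j = int(np.argmax(np.abs(c)))     # greatest current correlation, (1.7)
        delta = eps * np.sign(c[j])
        beta[j] += delta
        mu += delta * X[:, j]
    return beta
```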
Least Angle Regression is interesting in its own right, its simple structure lending itself to inferential analysis. Section 4 analyzes the "degrees of freedom" of a LARS regression estimate. This leads to a $C_p$ type statistic that suggests which estimate we should prefer among a collection of possibilities like those in Figure 1. A particularly simple $C_p$ approximation, requiring no additional computation beyond that for the $\hat{\beta}$ vectors, is available for LARS.

Section 7 briefly discusses computational questions. An efficient S program for all three methods, LARS, Lasso and Stagewise, is available. Section 8 elaborates on the connections with boosting.

2. The LARS algorithm. Least Angle Regression is a stylized version of the Stagewise procedure that uses a simple mathematical formula to accelerate the computations. Only m steps are required for the full set of solutions, where m is the number of covariates: m = 10 in the diabetes example compared to the 6000 steps used in the right panel of Figure 1. This section describes the LARS algorithm. Modifications of LARS that produce Lasso and Stagewise solutions are discussed in Section 3, and verified in Sections 5 and 6. Section 4 uses the simple structure of LARS to help analyze its estimation properties.

The LARS procedure works roughly as follows. As with classic Forward Selection, we start with all coefficients equal to zero, and find the predictor most correlated with the response, say $x_{j_1}$. We take the largest step possible in the direction of this predictor until some other predictor, say $x_{j_2}$, has as much correlation with the current residual. At this point LARS parts company with Forward Selection. Instead of continuing along $x_{j_1}$, LARS proceeds in a direction equiangular between the two predictors until a third variable $x_{j_3}$ earns its way into the "most correlated" set. LARS then proceeds equiangularly between $x_{j_1}$, $x_{j_2}$ and $x_{j_3}$, that is, along the "least angle direction," until a fourth variable enters, and so on.

The remainder of this section describes the algebra necessary to execute the equiangular strategy. As usual the algebraic details look more complicated than the simple underlying geometry, but they lead to the highly efficient computational algorithm described in Section 7.

LARS builds up estimates $\hat{\mu} = X\hat{\beta}$, (1.2), in successive steps, each step adding one covariate to the model, so that after k steps just k of the $\hat{\beta}_j$'s are nonzero. Figure 2 illustrates the algorithm in the situation with m = 2 covariates, $X = (x_1, x_2)$. In this case the current correlations (1.6) depend only on the projection $\bar{y}_2$ of y into the linear space $\mathcal{L}(X)$ spanned by $x_1$ and $x_2$,

(2.1)    $c(\hat{\mu}) = X'(y - \hat{\mu}) = X'(\bar{y}_2 - \hat{\mu}).$

The algorithm begins at $\hat{\mu}_0 = 0$ [remembering that the response has had its mean subtracted off, as in (1.1)]. Figure 2 has $\bar{y}_2 - \hat{\mu}_0$ making a smaller angle with $x_1$ than $x_2$, that is, $c_1(\hat{\mu}_0) > c_2(\hat{\mu}_0)$. LARS then augments $\hat{\mu}_0$ in the direction of $x_1$, to

(2.2)    $\hat{\mu}_1 = \hat{\mu}_0 + \hat{\gamma}_1 x_1.$

Stagewise would choose $\hat{\gamma}_1$ equal to some small value ε, and then repeat the process many times. Classic Forward Selection would take $\hat{\gamma}_1$ large enough to make $\hat{\mu}_1$ equal $\bar{y}_1$, the projection of y into $\mathcal{L}(x_1)$. LARS uses an intermediate value of $\hat{\gamma}_1$, the value that makes $\bar{y}_2 - \hat{\mu}_1$ equally correlated with $x_1$ and $x_2$; that is, $\bar{y}_2 - \hat{\mu}_1$ bisects the angle between $x_1$ and $x_2$, so $c_1(\hat{\mu}_1) = c_2(\hat{\mu}_1)$.
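For the two-covariate picture, the first LARS step size $\hat{\gamma}_1$ can be computed directly by equating current correlations. The sketch below does this for unit-length, linearly independent covariates, assuming the correlation with $x_1$ is the larger one and positive (otherwise flip signs first); it is a special case of the general formula (2.13) below, shown here only as an illustration.

```python
import numpy as np

def lars_first_step_2cov(x1, x2, y):
    """First LARS step with m = 2 unit-length covariates (illustrative sketch).

    Assumes c1 = x1'y is the larger current correlation and c1 > 0.  Moves along
    x1 until x2 or -x2 becomes equally correlated with the residual, i.e. the
    special case of (2.13) with a single active covariate.
    """
    c1, c2 = float(x1 @ y), float(x2 @ y)
    a = float(x1 @ x2)                          # inner product of the two covariates
    candidates = [(c1 - c2) / (1.0 - a), (c1 + c2) / (1.0 + a)]
    gamma1 = min(g for g in candidates if g > 0)
    mu1 = gamma1 * x1                           # the estimate mu_hat_1 of (2.2)
    return mu1, gamma1
```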
FIG. 2. The LARS algorithm in the case of m = 2 covariates; $\bar{y}_2$ is the projection of y into $\mathcal{L}(x_1, x_2)$. Beginning at $\hat{\mu}_0 = 0$, the residual vector $\bar{y}_2 - \hat{\mu}_0$ has greater correlation with $x_1$ than $x_2$; the next LARS estimate is $\hat{\mu}_1 = \hat{\mu}_0 + \hat{\gamma}_1 x_1$, where $\hat{\gamma}_1$ is chosen such that $\bar{y}_2 - \hat{\mu}_1$ bisects the angle between $x_1$ and $x_2$; then $\hat{\mu}_2 = \hat{\mu}_1 + \hat{\gamma}_2 u_2$, where $u_2$ is the unit bisector; $\hat{\mu}_2 = \bar{y}_2$ in the case m = 2, but not for the case m > 2; see Figure 4. The staircase indicates a typical Stagewise path. Here LARS gives the Stagewise track as ε → 0, but a modification is necessary to guarantee agreement in higher dimensions; see Section 3.2.

Let $u_2$ be the unit vector lying along the bisector. The next LARS estimate is

(2.3)    $\hat{\mu}_2 = \hat{\mu}_1 + \hat{\gamma}_2 u_2,$

with $\hat{\gamma}_2$ chosen to make $\hat{\mu}_2 = \bar{y}_2$ in the case m = 2. With m > 2 covariates, $\hat{\gamma}_2$ would be smaller, leading to another change of direction, as illustrated in Figure 4. The "staircase" in Figure 2 indicates a typical Stagewise path. LARS is motivated by the fact that it is easy to calculate the step sizes $\hat{\gamma}_1, \hat{\gamma}_2, \ldots$ theoretically, short-circuiting the small Stagewise steps.

Subsequent LARS steps, beyond two covariates, are taken along equiangular vectors, generalizing the bisector $u_2$ in Figure 2. We assume that the covariate vectors $x_1, x_2, \ldots, x_m$ are linearly independent. For $\mathcal{A}$ a subset of the indices {1, 2, ..., m}, define the matrix

(2.4)    $X_{\mathcal{A}} = (\cdots\ s_j x_j\ \cdots)_{j \in \mathcal{A}},$

where the signs $s_j$ equal ±1. Let

(2.5)    $\mathcal{G}_{\mathcal{A}} = X_{\mathcal{A}}' X_{\mathcal{A}} \quad\text{and}\quad A_{\mathcal{A}} = (1_{\mathcal{A}}' \mathcal{G}_{\mathcal{A}}^{-1} 1_{\mathcal{A}})^{-1/2},$

$1_{\mathcal{A}}$ being a vector of 1's of length equaling $|\mathcal{A}|$, the size of $\mathcal{A}$. The

(2.6)    equiangular vector $u_{\mathcal{A}} = X_{\mathcal{A}} w_{\mathcal{A}}$ where $w_{\mathcal{A}} = A_{\mathcal{A}} \mathcal{G}_{\mathcal{A}}^{-1} 1_{\mathcal{A}},$

is the unit vector making equal angles, less than 90°, with the columns of $X_{\mathcal{A}}$,

(2.7)    $X_{\mathcal{A}}' u_{\mathcal{A}} = A_{\mathcal{A}} 1_{\mathcal{A}} \quad\text{and}\quad \|u_{\mathcal{A}}\|^2 = 1.$

We can now fully describe the LARS algorithm. As with the Stagewise procedure we begin at $\hat{\mu}_0 = 0$ and build up $\hat{\mu}$ by steps, larger steps in the LARS case. Suppose that $\hat{\mu}_{\mathcal{A}}$ is the current LARS estimate and that

(2.8)    $\hat{c} = X'(y - \hat{\mu}_{\mathcal{A}})$

is the vector of current correlations (1.6). The active set $\mathcal{A}$ is the set of indices corresponding to covariates with the greatest absolute current correlations,

(2.9)    $\hat{C} = \max_j\{|\hat{c}_j|\} \quad\text{and}\quad \mathcal{A} = \{j : |\hat{c}_j| = \hat{C}\}.$

Letting

(2.10)    $s_j = \mathrm{sign}\{\hat{c}_j\} \quad\text{for } j \in \mathcal{A},$

we compute $X_{\mathcal{A}}$, $A_{\mathcal{A}}$ and $u_{\mathcal{A}}$ as in (2.4)–(2.6), and also the inner product vector

(2.11)    $a \equiv X' u_{\mathcal{A}}.$

Then the next step of the LARS algorithm updates $\hat{\mu}_{\mathcal{A}}$, say to

(2.12)    $\hat{\mu}_{\mathcal{A}_+} = \hat{\mu}_{\mathcal{A}} + \hat{\gamma} u_{\mathcal{A}},$

where

(2.13)    $\hat{\gamma} = \min_{j \in \mathcal{A}^c}^{+} \left\{ \frac{\hat{C} - \hat{c}_j}{A_{\mathcal{A}} - a_j},\; \frac{\hat{C} + \hat{c}_j}{A_{\mathcal{A}} + a_j} \right\};$

"min⁺" indicates that the minimum is taken over only positive components within each choice of j in (2.13).

Formulas (2.12) and (2.13) have the following interpretation: define

(2.14)    $\mu(\gamma) = \hat{\mu}_{\mathcal{A}} + \gamma u_{\mathcal{A}},$

for γ > 0, so that the current correlation

(2.15)    $c_j(\gamma) = x_j'\bigl(y - \mu(\gamma)\bigr) = \hat{c}_j - \gamma a_j.$

For $j \in \mathcal{A}$, (2.7)–(2.9) yield

(2.16)    $|c_j(\gamma)| = \hat{C} - \gamma A_{\mathcal{A}},$

showing that all of the maximal absolute current correlations decline equally.
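The quantities (2.4)–(2.7) and the update (2.8)–(2.13) translate almost line by line into code. The sketch below is an illustrative NumPy rendering of a single LARS step, not the efficient program of Section 7 (which would update factorizations rather than re-solve a linear system); it assumes standardized covariates and falls back to $\bar{\gamma} = \hat{C}/A_{\mathcal{A}}$ when the active set already contains every covariate, the last-stage convention discussed below.

```python
import numpy as np

def equiangular(X, active, signs):
    """Equiangular quantities (2.4)-(2.7) for the current active set.

    X : (n, m) standardized covariates; active : list of active indices;
    signs : the signs s_j of the current correlations, (2.10).
    Returns the equiangular vector u_A and the scalar A_A.
    """
    XA = X[:, active] * signs                  # X_A of (2.4)
    GA = XA.T @ XA                             # G_A of (2.5)
    ones = np.ones(len(active))
    Ginv1 = np.linalg.solve(GA, ones)          # G_A^{-1} 1_A
    AA = 1.0 / np.sqrt(ones @ Ginv1)           # A_A of (2.5)
    uA = XA @ (AA * Ginv1)                     # u_A = X_A w_A, (2.6)
    return uA, AA

def lars_step(X, y, mu, active):
    """One LARS update (2.8)-(2.13): move mu along u_A until some new covariate
    reaches the maximal absolute current correlation (illustrative sketch)."""
    c = X.T @ (y - mu)                         # current correlations, (2.8)
    C = np.max(np.abs(c))                      # C-hat of (2.9)
    uA, AA = equiangular(X, active, np.sign(c[active]))
    a = X.T @ uA                               # inner products, (2.11)
    candidates = []
    for j in range(X.shape[1]):
        if j in active:
            continue
        for g in ((C - c[j]) / (AA - a[j]), (C + c[j]) / (AA + a[j])):
            if np.isfinite(g) and g > 1e-12:   # "min+": positive components, (2.13)
                candidates.append(g)
    gamma = min(candidates) if candidates else C / AA   # last-stage convention
    return mu + gamma * uA                     # the update (2.12)
```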
For $j \in \mathcal{A}^c$, equating (2.15) with (2.16) shows that $c_j(\gamma)$ equals the maximal value at $\gamma = (\hat{C} - \hat{c}_j)/(A_{\mathcal{A}} - a_j)$. Likewise $-c_j(\gamma)$, the current correlation for the reversed covariate $-x_j$, achieves maximality at $(\hat{C} + \hat{c}_j)/(A_{\mathcal{A}} + a_j)$. Therefore $\hat{\gamma}$ in (2.13) is the smallest positive value of γ such that some new index $\hat{j}$ joins the active set; $\hat{j}$ is the minimizing index in (2.13), and the new active set $\mathcal{A}_+$ is $\mathcal{A} \cup \{\hat{j}\}$; the new maximum absolute correlation is $\hat{C}_+ = \hat{C} - \hat{\gamma} A_{\mathcal{A}}$.

Figure 3 concerns the LARS analysis of the diabetes data. The complete algorithm required only m = 10 steps of procedure (2.8)–(2.13), with the variables joining the active set $\mathcal{A}$ in the same order as for the Lasso: 3, 9, 4, 7, ..., 1. Tracks of the regression coefficients $\hat{\beta}_j$ are nearly but not exactly the same as either the Lasso or Stagewise tracks of Figure 1.

FIG. 3. LARS analysis of the diabetes study: (left) estimates of regression coefficients $\hat{\beta}_j$, j = 1, 2, ..., 10, plotted versus $\sum |\hat{\beta}_j|$; plot is slightly different than either Lasso or Stagewise, Figure 1; (right) absolute current correlations as function of LARS step; variables enter active set (2.9) in order 3, 9, 4, 7, ..., 1; heavy curve shows maximum current correlation $\hat{C}_k$ declining with k.

The right panel shows the absolute current correlations

(2.17)    $|\hat{c}_{kj}| = |x_j'(y - \hat{\mu}_{k-1})|$

for variables j = 1, 2, ..., 10, as a function of the LARS step k. The maximum correlation

(2.18)    $\hat{C}_k = \max_j\{|\hat{c}_{kj}|\} = \hat{C}_{k-1} - \hat{\gamma}_{k-1} A_{k-1}$

declines with k, as it must. At each step a new variable j joins the active set, henceforth having $|\hat{c}_{kj}| = \hat{C}_k$. The signs $s_j$ of each $x_j$ in (2.4) stay constant as the active set increases.

Section 4 makes use of the relationship between Least Angle Regression and Ordinary Least Squares illustrated in Figure 4. Suppose LARS has just completed step k − 1, giving $\hat{\mu}_{k-1}$, and is embarking upon step k. The active set $\mathcal{A}_k$, (2.9), will have k members, giving $X_k, \mathcal{G}_k, A_k$ and $u_k$ as in (2.4)–(2.6) (here replacing subscript $\mathcal{A}$ with "k"). Let $\bar{y}_k$ indicate the projection of y into $\mathcal{L}(X_k)$, which, since $\hat{\mu}_{k-1} \in \mathcal{L}(X_{k-1})$, is

(2.19)    $\bar{y}_k = \hat{\mu}_{k-1} + X_k \mathcal{G}_k^{-1} X_k'(y - \hat{\mu}_{k-1}) = \hat{\mu}_{k-1} + \frac{\hat{C}_k}{A_k} u_k,$

the last equality following from (2.6) and the fact that the signed current correlations in $\mathcal{A}_k$ all equal $\hat{C}_k$,

(2.20)    $X_k'(y - \hat{\mu}_{k-1}) = \hat{C}_k 1_{\mathcal{A}_k}.$

Since $u_k$ is a unit vector, (2.19) says that $\bar{y}_k - \hat{\mu}_{k-1}$ has length

(2.21)    $\bar{\gamma}_k \equiv \frac{\hat{C}_k}{A_k}.$

Comparison with (2.12) shows that the LARS estimate $\hat{\mu}_k$ lies on the line from $\hat{\mu}_{k-1}$ to $\bar{y}_k$,

(2.22)    $\hat{\mu}_k - \hat{\mu}_{k-1} = \frac{\hat{\gamma}_k}{\bar{\gamma}_k}(\bar{y}_k - \hat{\mu}_{k-1}).$

FIG. 4. At each stage the LARS estimate $\hat{\mu}_k$ approaches, but does not reach, the corresponding OLS estimate $\bar{y}_k$.

It is easy to see that $\hat{\gamma}_k$, (2.12), is always less than $\bar{\gamma}_k$, so that $\hat{\mu}_k$ lies closer than $\bar{y}_k$ to $\hat{\mu}_{k-1}$. Figure 4 shows the successive LARS estimates $\hat{\mu}_k$ always approaching but never reaching the OLS estimates $\bar{y}_k$.

The exception is at the last stage: since $\mathcal{A}_m$ contains all covariates, (2.13) is not defined. By convention the algorithm takes $\hat{\gamma}_m = \bar{\gamma}_m = \hat{C}_m/A_m$, making $\hat{\mu}_m = \bar{y}_m$ and $\hat{\beta}_m$ equal the OLS estimate for the full set of m covariates.

The LARS algorithm is computationally thrifty. Organizing the calculations correctly, the computational cost for the entire m steps is of the same order as that required for the usual Least Squares solution for the full set of m covariates.
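As a quick numerical sanity check of the last-stage convention (and of the sketches above), the following snippet runs m LARS steps on simulated data, recomputing the active set from (2.9) at each step, and confirms that the final estimate coincides with the full OLS fit $\bar{y}_m$. The data here are hypothetical, and `standardize` and `lars_step` refer to the earlier illustrative sketches, not to the paper's program.

```python
import numpy as np

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(100, 5))
y_raw = X_raw @ rng.normal(size=5) + rng.normal(size=100)

X, y = standardize(X_raw, y_raw)           # standardize as in (1.1)
n, m = X.shape

mu = np.zeros(n)
for _ in range(m):
    c = X.T @ (y - mu)
    C = np.max(np.abs(c))
    active = [j for j in range(m) if abs(abs(c[j]) - C) < 1e-8 * C]   # (2.9)
    mu = lars_step(X, y, mu, active)

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(mu, X @ beta_ols, atol=1e-8))   # True: mu_m equals y-bar_m
```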
Section 7 describes an efficient LARS program available from the authors. With the modifications described in the next section, this program also provides economical Lasso and Stagewise solutions.

3. Modified versions of Least Angle Regression. Figures 1 and 3 show Lasso, Stagewise and LARS yielding remarkably similar estimates for the diabetes data. The similarity is no coincidence. This section describes simple modifications of the LARS algorithm that produce Lasso or Stagewise estimates. Besides improved computational efficiency, these relationships elucidate the methods' rationale: all three algorithms can be viewed as moderately greedy forward stepwise procedures whose forward progress is determined by compromise among the currently most correlated covariates. LARS moves along the most obvious compromise direction, the equiangular vector (2.6), while Lasso and Stagewise put some restrictions on the equiangular strategy.

3.1. The LARS–Lasso relationship. The full set of Lasso solutions, as shown for the diabetes study in Figure 1, can be generated by a minor modification of the LARS algorithm (2.8)–(2.13). Our main result is described here and verified in Section 5. It closely parallels the homotopy method in the papers by Osborne, Presnell and Turlach (2000a, b), though the LARS approach is somewhat more direct.

Let $\hat{\beta}$ be a Lasso solution (1.5), with $\hat{\mu} = X\hat{\beta}$. Then it is easy to show that the sign of any nonzero coordinate $\hat{\beta}_j$ must agree with the sign $s_j$ of the current correlation $\hat{c}_j = x_j'(y - \hat{\mu})$,

(3.1)    $\mathrm{sign}(\hat{\beta}_j) = \mathrm{sign}(\hat{c}_j) = s_j;$

see Lemma 8 of Section 5. The LARS algorithm does not enforce restriction (3.1), but it can easily be modified to do so.
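Condition (3.1) is easy to check numerically for any candidate Lasso solution. The helper below is a small illustrative sketch of that check only; the function name, interface and tolerance are our choices, not part of the paper.

```python
import numpy as np

def lasso_sign_condition(X, y, beta, tol=1e-8):
    """Check restriction (3.1): every nonzero Lasso coefficient beta_j must share
    the sign of its current correlation c_j = x_j'(y - X beta)."""
    c = X.T @ (y - X @ beta)
    nonzero = np.abs(beta) > tol
    return bool(np.all(np.sign(beta[nonzero]) == np.sign(c[nonzero])))
```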
