Universality for eigenvalue algorithms on sample covariance matrices

PERCY DEIFT AND THOMAS TROGDON

Abstract. We prove a universal limit theorem for the halting time, or iteration count, of the power/inverse power methods and the QR eigenvalue algorithm. Specifically, we analyze the required number of iterations to compute extreme eigenvalues of random, positive-definite sample covariance matrices to within a prescribed tolerance. The universality theorem provides a complexity estimate for the algorithms which, in this random setting, holds with high probability. The method of proof relies on recent results on the statistics of the eigenvalues and eigenvectors of random sample covariance matrices (i.e., delocalization, rigidity and edge universality).

2000 Mathematics Subject Classification. 15B52, 65L15, 70H06.

Key words and phrases. Universality, eigenvalue computation, random matrix theory.

The authors would like to thank Folkmar Bornemann for the data to display $F^{\mathrm{gap}}_2(t)$. This work was supported in part by grants NSF-DMS-1303018 (TT) and NSF-DMS-1300965 (PD).

1. Introduction

In this paper, we prove a universal limit theorem for the fluctuations in the runtime (or halting time) of three classical eigenvalue algorithms applied to positive-definite random matrices. The theorem is universal in the sense that the limiting distribution does not depend on the distribution of the entries of the matrix (within a class).

One can trace the search for universal behavior in eigenvalue algorithm runtimes to the largely experimental work of Pfrang, Deift and Menon [20]. The authors considered three algorithms (QR, matrix sign and Toda) and ran the algorithms to the time of first deflation, which we now describe in more detail. Given an $N \times N$ matrix $H$, the algorithms produce isospectral iterates $X_n$, $X_0 = H$, $\operatorname{spec} X_n = \operatorname{spec} H$, and generically $X_n \to \operatorname{diag}(\lambda_1, \ldots, \lambda_N)$. Necessarily, the $\lambda_i$'s are the eigenvalues of $H$. However, one does not typically run the algorithm until the norm of all of the off-diagonal entries is small. Rather, one considers the submatrices $X_n^{(k)}$ which consist of the entries of $X_n$ that are in the first $k$ rows and the last $N - k$ columns. The $k$-deflation times are defined as
$$T^{(k)}(H) := \min\{n : \|X_n^{(k)}\|_F \le \epsilon\}, \qquad \epsilon > 0.$$
Here $\|\cdot\|_F$ denotes the Frobenius norm.¹ The time of first deflation is then given by $T_{\mathrm{def}}(H) := \min_{1 \le k \le N-1} T^{(k)}(H)$. We define $\hat{k} = \hat{k}(H)$ to be the largest value of $k$ such that $T^{(k)}(H) = T_{\mathrm{def}}(H)$. It follows that when $k = \hat{k}(H)$, the eigenvalues of the leading $k \times k$ and $(N-k) \times (N-k)$ submatrices approximate the eigenvalues of $H$ to $O(\epsilon)$. The algorithm is then applied to the smaller submatrices, and so on.

¹The authors in [20] actually considered a scaled $\infty$-norm rather than the Frobenius norm.

[Figure 1: two histogram panels. (a) "QR algorithm on $H_G$ and $H_B$"; horizontal axis: rescaled time of first deflation, vertical axis: frequency. (b) "Statistics of $\hat{k}$ for $H_G$"; horizontal axis: $k$, vertical axis: frequency.]

Figure 1. (a) Sampled histograms for $\bar{T}_{\mathrm{def}}(H_G)$ and $\bar{T}_{\mathrm{def}}(H_B)$. Despite the differences in the underlying distributions for $H_G$ and $H_B$, the statistics of the shifted and scaled time of first deflation are effectively the same. (b) The statistics of $\hat{k}$ when $N = 30$ for $H_G$. It is clear that $\hat{k} = N - 1$ occurs with the largest probability.
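For concreteness, the deflation times above can be computed directly. The following is a minimal sketch, not the code from [20] (which used a scaled $\infty$-norm, and in practice shifted QR steps would be preferred); the function name, the tolerance `eps`, and the iteration cap are our own choices, and the unshifted QR iteration $X_{n+1} = R_n Q_n$, $X_n = Q_n R_n$, is assumed.

```python
import numpy as np

def first_deflation_time(H, eps=1e-8, max_iter=10**6):
    """Run the unshifted QR iteration X_{n+1} = R_n Q_n from X_0 = H until
    some corner block X_n^(k) (first k rows, last N-k columns) has Frobenius
    norm <= eps. Returns (T_def, k_hat), with k_hat the largest deflating k."""
    X = H.copy()
    N = X.shape[0]
    for n in range(1, max_iter + 1):
        Q, R = np.linalg.qr(X)
        X = R @ Q                                   # isospectral step
        norms = [np.linalg.norm(X[:k, k:]) for k in range(1, N)]
        if min(norms) <= eps:
            k_hat = max(k for k, s in zip(range(1, N), norms) if s <= eps)
            return n, k_hat
    raise RuntimeError("no deflation within max_iter iterations")
```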
A typical experiment from [20] goes as follows. Let $Y_G$ and $Y_B$ be $N \times N$ matrices of iid standard normal and iid mean-zero, variance-one Bernoulli random variables, respectively. Then define $H_G = (Y_G + Y_G^T)/\sqrt{2N}$ and $H_B = (Y_B + Y_B^T)/\sqrt{2N}$, which are real, symmetric random matrices (see [6] for complex Hermitian matrices). After sampling the integer-valued random variables $T_{\mathrm{def}}(H_G)$ and $T_{\mathrm{def}}(H_B)$, for $N$ large, say 10,000 times, we define the empirical fluctuations
$$\bar{T}_{\mathrm{def}}(H_G) = \frac{T_{\mathrm{def}}(H_G) - \langle T_{\mathrm{def}}(H_G) \rangle}{\sigma_G}, \qquad \bar{T}_{\mathrm{def}}(H_B) = \frac{T_{\mathrm{def}}(H_B) - \langle T_{\mathrm{def}}(H_B) \rangle}{\sigma_B},$$
where $\langle \cdot \rangle$ and $\sigma_{(\cdot)}$ represent the sample mean and sample standard deviation, respectively. We plot the histograms for the empirical fluctuations in Figure 1(a); a code sketch of this sampling procedure appears in Section 1.1 below. The histograms overlap surprisingly well for the two different ensembles, indicating that after centering and rescaling, the distribution of the time of first deflation is universal.

Proving theorems about the random variable $T_{\mathrm{def}}(H)$ is particularly difficult, as one has to analyze the minimum of $N - 1$ correlated random variables. In [8] the authors proved a (universal) limit theorem for the 1-deflation time of the so-called Toda algorithm. In this paper we prove an analog of that result for the $(N-1)$-deflation time for the QR (eigenvalue) algorithm acting on positive-definite matrices. This is an important first step in proving a limit theorem for $T_{\mathrm{def}}(H)$ because $\hat{k} = N - 1$ is the most likely value; see Figure 1(b). We also include similar results for the power (P) and inverse power (IP) methods, as the analysis is similar. For these two methods, we incorporate random starting vectors. The analysis and results of the current work are quite similar to those in [8], where we prove universality for the Toda eigenvalue algorithm, showing the wide applicability of the approach.

1.1. Relation to previous work and complexity theory. The statistical analysis of algorithms has been performed in many settings, usually with an eye towards complexity theory. In relation to Gaussian elimination, the seminal work is the analysis of Goldstine and von Neumann [13] on the condition number of random matrices. This is closely related to the later work of Edelman [9], also on condition numbers. The expected number of pivot steps in the simplex algorithm was analyzed by Smale [27] and Borgwardt [4]. The methodology of smoothed analysis was introduced in [28] and applied in a variety of settings [17, 19, 24].

The closest work, within the realm of complexity theory, to the current work is that of Kostlan [16]. Kostlan showed that for the power method on $H_G$ the expected halting time to compute an eigenvector is infinite, but that when one conditions on all of the eigenvalues being positive, the halting time is bounded by $O(N^2 \log N)$. Instead of conditioning, and instead of eigenvector computation, we turn to sample covariance matrices, which (with high probability) have positive eigenvalues, and use the power methods to compute the extreme eigenvalues. With this we are able to determine the precise limiting distribution of the halting time, which contains far more information than simply an upper bound. To our knowledge, this is the first time this has been done for a classical numerical method. For $\alpha$ in the scaling region given by Condition 2.1, the halting times given in Theorem 1 scale like $(\alpha - 2/3) N^{2/3} \log N$ in order to obtain an accuracy of $N^{-\alpha/2}$. This is a key conclusion of our results, which gives an estimate on the complexity of the QR algorithm, and also of the power and inverse power methods.
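Returning to the experiment of Figure 1(a): the sampling, centering, and rescaling can be reproduced with the following hedged sketch (reusing `first_deflation_time` from the sketch above; the trial count, seed, and names are our own, not from [20]).

```python
import numpy as np

def sample_fluctuations(ensemble, N=30, trials=1000, seed=0):
    """Sample T_def for H = (Y + Y^T)/sqrt(2N) and return the empirical
    fluctuations (T_def - sample mean) / sample standard deviation."""
    rng = np.random.default_rng(seed)
    times = []
    for _ in range(trials):
        if ensemble == "gaussian":                  # H_G
            Y = rng.standard_normal((N, N))
        elif ensemble == "bernoulli":               # H_B: +/-1, equal odds
            Y = rng.choice([-1.0, 1.0], size=(N, N))
        else:
            raise ValueError("ensemble must be 'gaussian' or 'bernoulli'")
        H = (Y + Y.T) / np.sqrt(2 * N)
        times.append(first_deflation_time(H)[0])
    t = np.array(times, dtype=float)
    return (t - t.mean()) / t.std()                 # histogram these
```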
Through many detailed computations, universality has been observed in numerical algorithms well beyond the QR algorithm and the power and inverse power methods (see [8, 6, 7, 20, 23]): the conjugate gradient algorithm, the matrix sign eigenvalue algorithm, the Toda eigenvalue algorithm, the Jacobi eigenvalue algorithm, the GMRES algorithm, a genetic algorithm, and the gradient and stochastic gradient descent algorithms. This work presents further examples, in addition to [8], where one can prove this type of universality. This advances the contention of the authors that universality is a bona fide and basic phenomenon in numerical computation.

1.2. Open questions. The main open question related to this work is the asymptotics of the time of first deflation $T_{\mathrm{def}}$. A related and unknown detail is the tail behavior of the limiting distribution. As discussed in detail in [8], the limiting distribution in Theorem 1 of [8] for the halting time has one finite moment for real matrices and two finite moments for complex matrices. If one constructed an algorithm with a sub-Gaussian limiting distribution, it might be preferable. We believe this is the case for $T_{\mathrm{def}}$. We also believe that its distribution is related to the largest gap in the spectrum of the stochastic Airy operator [22].

Furthermore, can one extend our results for the QR algorithm to indefinite ensembles? We only consider random matrices with entries that are exponentially localized, see (1). It is not known whether this condition can be relaxed, but that is itself an important open question. Finally, additional halting criteria can be employed. One could look for the time to compute eigenvectors with the power method, or to compute the entire spectrum with the QR algorithm.

2. Main results

In this paper we discuss computing the smallest and largest eigenvalues of random positive-definite matrices to an accuracy $\epsilon$. We have a basic condition that we enforce on $\epsilon$ which requires that $\epsilon$ is appropriately small.

Condition 2.1. $\alpha := \log \epsilon^{-1} / \log N \ge 5/3 + \sigma/2$ for $0 < \sigma < 1/3$ fixed.

For $j = 0, 1, 2, \ldots$, we let $X_j$ be the iterates of the QR algorithm (QR, defined in Section 5.2) and $\lambda_{P,j}$ and $\lambda_{IP,j}$ be the iterates of the power and inverse power methods, respectively (P and IP, respectively, defined in Section 5.1). We specify (discrete) halting times for these algorithms applied to a matrix $H$ with starting vector $v$ as follows:
$$\tau_{QR,\epsilon}(H) := \min\Big\{ j : \sum_{n=1}^{N-1} |[X_j]_{Nn}|^2 \le \epsilon^2 \Big\},$$
$$\tau_{P,\epsilon}(H, v) := \min\{ j : |\lambda_{P,j} - \lambda_{P,j+1}| \le \epsilon^2 \},$$
$$\tau_{IP,\epsilon}(H, v) := \min\{ j : |\lambda_{IP,j}^{-1} - \lambda_{IP,j+1}^{-1}| \le \epsilon^2 \}.$$
Note that for the QR algorithm the $(N,N)$ entry of $X_j$, $[X_j]_{NN}$, is an approximation of the smallest eigenvalue $\lambda_1$, as is $\lambda_{IP,j}$. On the other hand, $\lambda_{P,j}$ is an approximation of the largest eigenvalue $\lambda_N$.
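These halting criteria translate directly into code. The following is a hedged sketch: the QR iteration is assumed unshifted (its exact form is in Section 5.2), a Rayleigh quotient stands in for the exact power-method iterates $\lambda_{P,j}$ of Section 5.1, and the inverse power criterion is analogous with iterates of $H^{-1}$ (see the sketch at the end of this excerpt).

```python
import itertools
import numpy as np

def tau_qr(H, eps):
    """Smallest j with sum_{n<N} |[X_j]_{N,n}|^2 <= eps^2."""
    X, j = H.copy(), 0
    while np.sum(np.abs(X[-1, :-1])**2) > eps**2:
        Q, R = np.linalg.qr(X)
        X, j = R @ Q, j + 1
    return j

def tau_p(H, v, eps):
    """Smallest j with |lambda_{P,j} - lambda_{P,j+1}| <= eps^2."""
    v = v / np.linalg.norm(v)
    lam_prev = None
    for j in itertools.count():
        w = H @ v
        lam = np.vdot(v, w).real            # Rayleigh quotient, ||v|| = 1
        if lam_prev is not None and abs(lam_prev - lam) <= eps**2:
            return j - 1                    # criterion links iterates j-1, j
        v, lam_prev = w / np.linalg.norm(w), lam
```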
Our main results are summarized in the following theorem and proposition. See Definition 1 for the definition of sample covariance matrices and Definition 3 for the distribution function $F^{\mathrm{gap}}_\beta(t)$. The constants $\lambda_\pm$ and $d$ are given in (2).

Theorem 1 (Universality). Let $H$ be a real ($\beta = 1$) or complex ($\beta = 2$) $N \times N$ sample covariance matrix and let $v$ be a (random) unit vector independent of $H$. Assuming $\epsilon$ satisfies Condition 2.1, for $t \in \mathbb{R}$,
$$F^{\mathrm{gap}}_\beta(t) = \lim_{N \to \infty} \mathbb{P}\left( \frac{\tau_{QR,\epsilon}(H)}{2^{-7/6} \lambda_-^{1/3} d^{-1/2} N^{2/3} (\log \epsilon^{-1} - \tfrac{2}{3} \log N)} \le t \right)$$
$$= \lim_{N \to \infty} \mathbb{P}\left( \frac{\tau_{IP,\epsilon}(H, v)}{2^{-7/6} \lambda_-^{1/3} d^{-1/2} N^{2/3} (\log \epsilon^{-1} - \tfrac{2}{3} \log N)} \le t \right)$$
$$= \lim_{N \to \infty} \mathbb{P}\left( \frac{\tau_{P,\epsilon}(H, v)}{2^{-7/6} \lambda_+^{1/3} d^{-1/2} N^{2/3} (\log \epsilon^{-1} - \tfrac{2}{3} \log N)} \le t \right).$$

This theorem is a direct consequence of Theorems 5 and 6, after noting that, for example, $|\tau_{QR,\epsilon} - T_{QR,\epsilon}| \le 1$, where $T_{QR,\epsilon}$ appears in Theorem 5. This is a universality theorem in the sense that it states that for $N$ sufficiently large the distribution of the halting time is independent of the distribution of the entries of $H$.

The following proposition shows that we obtain an accuracy of $\epsilon$ but not of $\epsilon^2$, i.e., that our halting criteria are sufficient but not too restrictive. It is a restatement of Propositions 4 and 5.

Proposition 1. Assuming $\epsilon$ satisfies Condition 2.1, for any real or complex sample covariance matrix,
$$\epsilon^{-1} |[X_{\tau_{QR,\epsilon}}]_{NN} - \lambda_1|, \quad \epsilon^{-1} |\lambda_{IP,\tau_{IP,\epsilon}} - \lambda_1|, \quad \text{and} \quad \epsilon^{-1} |\lambda_{P,\tau_{P,\epsilon}} - \lambda_N|$$
converge to zero in probability, while
$$\epsilon^{-2} |[X_{\tau_{QR,\epsilon}}]_{NN} - \lambda_1|, \quad \epsilon^{-2} |\lambda_{IP,\tau_{IP,\epsilon}} - \lambda_1|, \quad \text{and} \quad \epsilon^{-2} |\lambda_{P,\tau_{P,\epsilon}} - \lambda_N|$$
converge to $\infty$ in probability.

A numerical demonstration of Theorem 1 is given in Section 4.

The outline of the paper is as follows. In Section 3 we discuss the fundamental results of random matrix theory that are required to prove our results. In Section 4 we give a numerical demonstration of Theorem 1. Next, in Section 5, we discuss the fundamentals of the power methods and the QR algorithm before we apply the random matrix estimates in Section 6 to prove our results. In Appendix A we analyze the true error of the methods with our chosen halting criteria to see that these criteria are indeed appropriate to the task. Finally, in Appendix B we discuss the asymptotic normality of eigenvector projections of random vectors. This allows us to show that Theorem 1 indeed holds for random starting vectors in the power and inverse power methods.

3. Results from random matrix theory

We now introduce the ideas and results from random matrix theory that are needed to prove our main theorems. Let $V$ be an $M \times N$ real or complex matrix with $M \ge N$. We consider the ordered eigenvalues $\lambda_j(H) = \lambda_j$, $j = 1, 2, \ldots, N$, of $H = V^* V / M$, $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_N$. Let $\beta_1, \beta_2, \ldots, \beta_N$ denote the absolute values of the last components of the associated normalized eigenvectors. We only consider sample covariance matrices from independent samples.

Definition 1 (Sample covariance matrix (SCM)). A sample covariance matrix (ensemble) is a real symmetric ($\beta = 1$) or complex Hermitian ($\beta = 2$) matrix $H = V^* V / M$, $V = (V_{ij})_{1 \le i \le M,\, 1 \le j \le N}$, such that the $V_{ij}$ are independent random variables for $1 \le i \le M$, $1 \le j \le N$, given by probability measures $\nu_{ij}$ with
$$\mathbb{E} V_{ij} = 0, \qquad \mathbb{E} |V_{ij}|^2 = 1.$$
Next, assume there is a fixed constant $\nu$ (independent of $N, i, j$) such that
(1) $\mathbb{P}(|V_{ij}| > x) \le \nu^{-1} \exp(-x^{\nu})$, $x > 1$.
For $\beta = 2$ (when $V_{ij}$ is complex-valued) the condition
$$\mathbb{E} V_{ij}^2 = 0$$
must also be satisfied.

We assume all SCMs have $M \ge N$. Define the averaged empirical spectral measure
$$\mu_N(z) = \mathbb{E}\, \frac{1}{N} \sum_{i=1}^N \delta(\lambda_i - z),$$
where the expectation is taken with respect to the given ensemble. For technical reasons we let $M = M(N)$ and $d_N := N/M$ satisfy $\lim_{N \to \infty} d_N =: d \in (0, 1)$. More specifically, we consider $M = \lfloor N/d \rfloor$.
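A minimal sketch of this construction, using Gaussian entries (which satisfy (1)); any entry law meeting Definition 1 would do, and the function name is our own.

```python
import numpy as np

def sample_scm(N, d=0.5, beta=1, rng=None):
    """H = V* V / M with M = floor(N/d) >= N; real (beta=1) or complex
    (beta=2) Gaussian entries with E V = 0, E|V|^2 = 1 (and E V^2 = 0)."""
    rng = rng or np.random.default_rng()
    M = int(np.floor(N / d))
    if beta == 1:
        V = rng.standard_normal((M, N))
    else:
        V = (rng.standard_normal((M, N))
             + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
    return V.conj().T @ V / M
```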
Remark 3.1. The case where $\lim_{N \to \infty} d_N = 1$ is of considerable interest: if $M = N + R$ then it is known that the limiting distribution of the smallest eigenvalue is given in terms of the so-called Bessel kernel [2, 10] when the $X_{ij}$ have Gaussian divisible entries. If $R \to \infty$, $R \le C N^{1/2}$, and the $X_{ij}$ are standard complex normal random variables, then it is known that the smallest eigenvalue has Tracy–Widom fluctuations [7]. It is noted in [21, Section 1.4] that establishing all the estimates we use below in the $\lim_{N \to \infty} d_N = 1$ case is a difficult problem. In light of the current work, this is a particularly interesting problem, as it would give different scalings for the halting times.

Define the Marchenko–Pastur law
(2) $$\rho_d(x) := \frac{1}{2\pi d} \sqrt{\frac{[(\lambda_+ - x)(x - \lambda_-)]_+}{x^2}}, \qquad \lambda_\pm = (1 \pm \sqrt{d})^2,$$
where $[\cdot]_+$ denotes the positive part. For SCMs, $\mu_N$ converges weakly to $\rho_d(x)\,dx$, and $\rho_d(x)\,dx$ is called the equilibrium measure for the ensemble (see, for example, [18, 21, 26, 29, 32]).

Definition 2. Define $\gamma_n$ to be the smallest value of $t$ such that
$$\frac{n}{N} = \int_{-\infty}^{t} \rho_d(x)\,dx, \qquad n = 1, 2, \ldots, N.$$
Thus the $\{\gamma_n\}$ represent the quantiles of the equilibrium measure.
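The quantiles $\gamma_n$ have no closed form, but they are easy to compute numerically. A sketch, assuming a simple rectangle-rule quadrature on $[\lambda_-, \lambda_+]$ (grid size and names are our own choices):

```python
import numpy as np

def mp_quantiles(N, d=0.5, grid=200_000):
    """gamma_n: smallest t with integral of rho_d over (-inf, t] = n/N."""
    lam_m, lam_p = (1 - np.sqrt(d))**2, (1 + np.sqrt(d))**2
    x = np.linspace(lam_m, lam_p, grid)
    rho = np.sqrt(np.maximum((lam_p - x) * (x - lam_m), 0.0)) / (2 * np.pi * d * x)
    cdf = np.cumsum(rho) * (x[1] - x[0])     # left-endpoint rule
    cdf /= cdf[-1]                           # normalize away quadrature error
    idx = np.searchsorted(cdf, np.arange(1, N + 1) / N)
    return x[np.minimum(idx, grid - 1)]      # gamma_1 <= ... <= gamma_N
```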
We now describe conditions on the matrices that simplify the analysis of the algorithms QR, P and IP.

Condition 3.1. For $0 < p < \sigma/4$,
$$\frac{\lambda_{N-2}}{\lambda_{N-1}} < \left( \frac{\lambda_{N-1}}{\lambda_N} \right)^p.$$
Let $U_{N,p}$ denote the set of matrices that satisfy this condition.

Condition 3.2. For $0 < p < \sigma/4$,
$$\frac{\lambda_2}{\lambda_3} < \left( \frac{\lambda_1}{\lambda_2} \right)^p.$$
Let $L_{N,p}$ denote the set of matrices that satisfy this condition.

Given an SCM, let $v$ be a random (or deterministic) unit vector independent of the SCM. Define $\beta_n = |\langle v, u_n \rangle|$, $n = 1, 2, \ldots, N$, where $u_n$ is the $n$th eigenvector of the SCM.

Condition 3.3. For any fixed $0 < s < \sigma/40$:
(1) $\beta_n \le N^{-1/2 + s/2}$ for all $n$;
(2) $N^{-1/2 - s/2} \le \beta_n$ for $n = 1, 2, N-1, N$;
(3) $N^{-2/3 - s/2} \le \lambda_N - \lambda_{n-1} \le N^{-2/3 + s/2}$ for $n = N, N-1$;
(4) $N^{-2/3 - s/2} \le \lambda_n - \lambda_1 \le N^{-2/3 + s/2}$ for $n = 2, 3$; and
(5) $|\lambda_n - \gamma_n| \le N^{-2/3 + s/2} (\min\{n, N - n + 1\})^{-1/3}$ for all $n$.
Let $R_{N,s}$ denote the set of matrices that satisfy these conditions.

Remark 3.2. Clearly the quantiles $\{\gamma_n\}$ lie in the interval $(\lambda_-, \lambda_+)$. Property (5) above implies, in particular, that for $N$ sufficiently large, the eigenvalues $\{\lambda_n\}$ of matrices in $R_{N,s}$ lie in the interval $(\lambda_- - \eta, \lambda_+ + \eta)$ for any given $\eta > 0$.

The analysis of the eigenvalues of sample covariance matrices has a long history, beginning with the work of Marčenko and Pastur [18]. The seminal work of Geman [12] showed that for $M, N \to \infty$, $N/M \to y \in (0, \infty)$, the largest eigenvalue of an SCM converges a.s. to $\lambda_+$. Silverstein [25] established that for $M, N \to \infty$, $N/M \to y \in (0, 1)$, the smallest eigenvalue converges a.s. to $\lambda_-$ when the $V_{ij}$ are iid standard normal random variables. See [10, 14, 15] for the first results on the fluctuations of the largest and smallest eigenvalues when the $V_{ij}$ are iid (real or complex) standard normal random variables. Universality for the eigenvalues of $\frac{1}{N} V^* V$, at the edges and in the bulk, was first proved by Ben Arous and Péché [2] for Gaussian divisible ensembles, in the limit $N, M \to \infty$, $M = N + \nu$, $\nu$ fixed. We reference [21] and [3] for the most comprehensive results. Note that we require (1), which is stronger than the assumptions in [12, 32], which only require moment conditions. Various limits of the eigenvectors have also been considered, see [1, 26], but we reference [3] for the full generality we need to prove our theorems.

Theorem 2. For SCMs,
$$N^{2/3} \lambda_+^{-2/3} d^{1/2} (\lambda_+ - \lambda_N,\, \lambda_+ - \lambda_{N-1},\, \lambda_+ - \lambda_{N-2})$$
and
$$N^{2/3} \lambda_-^{-2/3} d^{1/2} (\lambda_1 - \lambda_-,\, \lambda_2 - \lambda_-,\, \lambda_3 - \lambda_-)$$
separately converge jointly in distribution to random variables $(\Lambda_{1,\beta}, \Lambda_{2,\beta}, \Lambda_{3,\beta})$ which are the smallest three eigenvalues of the so-called stochastic Airy operator. Furthermore, $(\Lambda_{1,\beta}, \Lambda_{2,\beta}, \Lambda_{3,\beta})$ are distinct with probability one.

Proof. The first statement follows from [3, Theorem 8.3]. The second statement follows from [21, Theorem 1.1 & Corollary 1.2]. The fact that the eigenvalues of the stochastic Airy operator are distinct is shown in [22, Theorem 1.1]. □

Definition 3. The distribution function $F^{\mathrm{gap}}_\beta(t)$, supported on $t \ge 0$ for $\beta = 1, 2$, is given by
$$F^{\mathrm{gap}}_\beta(t) = \mathbb{P}\left( \frac{1}{\Lambda_{2,\beta} - \Lambda_{1,\beta}} \le t \right) = \lim_{N \to \infty} \mathbb{P}\left( \frac{1}{2^{-7/6} N^{2/3} \lambda_+^{-2/3} d^{-1/2} (\lambda_N - \lambda_{N-1})} \le t \right)$$
$$= \lim_{N \to \infty} \mathbb{P}\left( \frac{1}{2^{-7/6} N^{2/3} \lambda_-^{-2/3} d^{-1/2} (\lambda_2 - \lambda_1)} \le t \right).$$

The remaining theorems in this section are compiled from results that have been obtained recently in the literature. We use a simple lemma (see, for example, [8, Lemma 3.2]):

Lemma 1. If $X_N \to X$ in distribution² as $N \to \infty$, then for any $R > 0$,
$$\mathbb{P}(|X_N / a_N| < R) = 1 + o(1)$$
as $N \to \infty$, provided that $a_N \to \infty$.

²For convergence in distribution, we require that the limiting random variable $X$ satisfies $\mathbb{P}(|X| < \infty) = 1$.

Theorem 3. For SCMs, Condition 3.3 holds with high probability as $N \to \infty$; that is, for any $s > 0$,
$$\mathbb{P}(R_{N,s}) = 1 + o(1)$$
as $N \to \infty$.

Proof. It suffices to show that each of the sub-conditions (1)–(5) in Condition 3.3 holds with high probability. Conditions 3.3(1)–(2) hold with high probability directly by Proposition 6. Conditions 3.3(3)–(4) hold with high probability by the joint convergence of the top (bottom) three eigenvalues in Theorem 2 and Lemma 1. Finally, Condition 3.3(5) holds with high probability as a direct consequence of [21, Theorem 3.3]. □

Theorem 4. For SCMs,
$$\lim_{p \downarrow 0} \limsup_{N \to \infty} \mathbb{P}(U_{N,p}^c) = \lim_{p \downarrow 0} \limsup_{N \to \infty} \mathbb{P}(L_{N,p}^c) = 0.$$

Proof. It follows from Theorem 2 that
$$\lim_{N \to \infty} \mathbb{P}(\lambda_3 - \lambda_2 < p(\lambda_2 - \lambda_1)) = \mathbb{P}(\Lambda_{3,\beta} - \Lambda_{2,\beta} < p(\Lambda_{2,\beta} - \Lambda_{1,\beta})).$$
Then
$$\lim_{p \downarrow 0} \mathbb{P}(\Lambda_{3,\beta} - \Lambda_{2,\beta} < p(\Lambda_{2,\beta} - \Lambda_{1,\beta})) = \mathbb{P}\Big( \bigcap_{p > 0} \{\Lambda_{3,\beta} - \Lambda_{2,\beta} < p(\Lambda_{2,\beta} - \Lambda_{1,\beta})\} \Big) = \mathbb{P}(\Lambda_{3,\beta} = \Lambda_{2,\beta}).$$
But from [22, Theorem 1.1], $\mathbb{P}(\Lambda_{3,\beta} = \Lambda_{2,\beta}) = 0$. And so, it suffices to show that
$$\lim_{N \to \infty} \mathbb{P}(\lambda_3 - \lambda_2 < p(\lambda_2 - \lambda_1)) = \lim_{N \to \infty} \mathbb{P}\left( \frac{\lambda_2}{\lambda_3} < \left( \frac{\lambda_1}{\lambda_2} \right)^p \right).$$
This will, in turn, follow if we show that
$$\Gamma_N := \lambda_3 - \lambda_2 - p(\lambda_2 - \lambda_1) + \lambda_- \left[ \frac{\lambda_2}{\lambda_3} - \left( \frac{\lambda_1}{\lambda_2} \right)^p \right]$$
converges to zero in probability for $p$ fixed. We set $\lambda_j = \lambda_- + N^{-2/3} \xi_j$, where $(\xi_1, \xi_2, \xi_3)$ converges jointly in distribution by Theorem 2. Let $B_R$ be the event $\|(\xi_1, \xi_2, \xi_3)\| \le R$, and for $\delta > 0$ consider
$$\mathbb{P}(|\Gamma_N| \ge \delta) = \mathbb{P}(|\Gamma_N| \ge \delta, B_R) + \mathbb{P}(|\Gamma_N| \ge \delta, B_R^c).$$
Given $B_R$, we perform a formal expansion
$$\frac{\lambda_2}{\lambda_3} - \left( \frac{\lambda_1}{\lambda_2} \right)^p = \lambda_-^{-1} N^{-2/3} (\xi_2 - \xi_3) - p \lambda_-^{-1} N^{-2/3} (\xi_1 - \xi_2) + O(N^{-4/3}).$$
Therefore, given $B_R$, $\Gamma_N$ tends to zero uniformly, and we find
$$\limsup_{N \to \infty} \mathbb{P}(|\Gamma_N| \ge \delta) \le \limsup_{N \to \infty} \mathbb{P}(B_R^c).$$
Because of the joint convergence (in distribution) of $(\xi_1, \xi_2, \xi_3)$, the right-hand side tends to zero as $R \to \infty$. This establishes the result for $L_{N,p}$. Similar considerations yield the result for $U_{N,p}$. □
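Before turning to the numerical demonstration, note that Definition 3 suggests a direct Monte Carlo approximation of $F^{\mathrm{gap}}_\beta$: histogram the reciprocal of the rescaled smallest gap over many sampled matrices. A hedged sketch (reusing `sample_scm` from the earlier sketch; sizes and seed are arbitrary):

```python
import numpy as np

def gap_statistic_samples(N=200, d=0.5, beta=1, trials=2000, seed=1):
    """Empirical samples whose distribution approximates F_beta^gap."""
    rng = np.random.default_rng(seed)
    lam_m = (1 - np.sqrt(d))**2
    out = np.empty(trials)
    for i in range(trials):
        lam = np.linalg.eigvalsh(sample_scm(N, d=d, beta=beta, rng=rng))
        scaled_gap = (2**(-7/6) * N**(2/3) * lam_m**(-2/3)
                      * d**(-1/2) * (lam[1] - lam[0]))
        out[i] = 1.0 / scaled_gap            # cf. Definition 3
    return out
```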
4. A numerical demonstration

We include some numerical simulations that serve to demonstrate Theorem 1; these build on ideas that were discussed in detail in [8]. En route to proving Theorem 1, we perform the following approximation step for $A = $ QR, IP or P:
$$\tau_{A,\epsilon} = \underbrace{\tau_{A,\epsilon} - T_{A,\epsilon}}_{:= D_1} + \underbrace{T_{A,\epsilon} - T^*_{A,\epsilon}}_{:= D_2} + T^*_{A,\epsilon},$$
where $T^*_{A,\epsilon}$ is given in (7) and (14) below. The difference $D_1$ is always less than unity and the difference $D_2$ is $O(N^{2/3})$ (see Proposition 2, for example). Then $T^*_{A,\epsilon}$ converges in distribution, after rescaling, to $F^{\mathrm{gap}}_\beta$, but it is clear from the proof of Theorem 6 that the rate of convergence is logarithmic, at best. To improve the rate we note that
(3) $$F^{\mathrm{gap}}_\beta(t) = \lim_{N \to \infty} \mathbb{P}\left( \frac{\tau_{A,\epsilon}(H)}{2^{-7/6} \lambda_\pm^{1/3} d^{-1/2} N^{2/3} (\log \epsilon^{-1} - \tfrac{2}{3} \log N + \zeta_A)} \le t \right)$$
for any constant $\zeta_A$. Here $\lambda_+$ is taken if $A = $ P and $\lambda_-$ is taken if $A = $ QR, IP. We choose $\zeta_{QR}$ (cf. the choice of $\zeta$ in [8]), using (7), by
$$\zeta_{QR} = \mathbb{E}[\log N^{2/3} (\lambda_2 - \lambda_1)].$$
After examining (14), we choose
$$\zeta_{IP} = \mathbb{E}[\log N^{2/3} (\lambda_2 - \lambda_1)] - \tfrac{3}{2} \log \lambda_- + \tfrac{1}{2} \log 2.$$
Then, changing $\lambda_2 \to \lambda_{N-1}^{-1}$ and $\lambda_1 \to \lambda_N^{-1}$ in (14), we choose
$$\zeta_{P} = \mathbb{E}[\log N^{2/3} (\lambda_N - \lambda_{N-1})] - \tfrac{1}{2} \log \lambda_+ + \tfrac{1}{2} \log 2.$$
Despite the fact that these $\zeta_A$'s are not constant, from Theorem 2 one should expect that they have well-defined limits as $N \to \infty$. These effective constants can be easily approximated by sampling the associated matrix distributions.

In Figure 2 we demonstrate (3), and hence Theorem 1, for the QR algorithm. Figures 3 and 4 demonstrate the analogous results for the inverse power method and the power method, respectively. The ensembles we use are the following:

LOE: $V$ (in Definition 1) has iid standard real Gaussian entries;
LUE: $V$ has iid standard complex Gaussian entries;
BE: $V$ has iid mean-zero, variance-one Bernoulli entries ($\pm 1$ with equal probability);
CBE: $V$ has iid mean-zero, variance-one complex Bernoulli entries ($\{a, -a, \bar{a}, -\bar{a}\}$, $a = (1+i)/\sqrt{2}$, with equal probability).

The density $\frac{d F^{\mathrm{gap}}_1}{dt}(t)$ was computed by the authors in [8]. We sample the matrix distributions for $N$ large and use appropriate interpolation. The density $\frac{d F^{\mathrm{gap}}_2}{dt}(t)$ was computed in [31] (and rescaled in [8]), and the data to reproduce it here was provided by the authors of that work.

Finally, in Figure 5 we show the statistics of the time of first deflation, as defined in the introduction, for LOE and BE when $d = 2$. This demonstrates universality for the time of first deflation, but the limiting distribution (whatever it may be!) is clearly distinct from both histograms in Figure 1(a) and from the limiting distribution in Theorem 1. And so, computing the limiting distribution for the rescaled time of first deflation requires information about much more than just the $(N-1)$-deflation time.

[Figure 2: two histogram panels labeled $\beta = 1$ and $\beta = 2$; horizontal axes: rescaled halting time, vertical axes: frequency.]

Figure 2. A demonstration of Theorem 1 and (3) for the QR algorithm. (a) The rescaled halting times following (3) for LOE and BE for $d = 1/2$ and $d = 2/3$ plotted against $\frac{d F^{\mathrm{gap}}_1}{dt}(t)$. (b) The rescaled halting times following (3) for LUE and CBE for $d = 1/2$ and $d = 2/3$ plotted against $\frac{d F^{\mathrm{gap}}_2}{dt}(t)$.

[Figure 3: two histogram panels labeled $\beta = 1$ and $\beta = 2$; horizontal axes: rescaled halting time, vertical axes: frequency.]

Figure 3. A demonstration of Theorem 1 and (3) for the inverse power method. (a) The rescaled halting times following (3) for LOE and BE for $d = 1/2$ and $d = 2/3$ plotted against $\frac{d F^{\mathrm{gap}}_1}{dt}(t)$. (b) The rescaled halting times following (3) for LUE and CBE for $d = 1/2$ and $d = 2/3$ plotted against $\frac{d F^{\mathrm{gap}}_2}{dt}(t)$.

5. Fundamentals of the algorithms

Here we discuss the QR algorithm and the power/inverse power methods. We derive explicit formulae to analyze the halting times of the algorithms.

5.1. The power and inverse power methods. Let $Y_1, Y_2, Y_3, \ldots$ be a sequence of independent, real, mean-zero, and variance-one random variables. The power and inverse power methods with random starting vectors are given in Algorithms 1 and 2.
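Algorithms 1 and 2 themselves fall outside this excerpt. As a stand-in, the following is a standard inverse power sketch with the random starting vector $v = (Y_1, \ldots, Y_N)/\|(Y_1, \ldots, Y_N)\|$ described above, in the real case; it factors the positive-definite $H$ once and halts per $\tau_{IP,\epsilon}$. The design choice of a single Cholesky factorization avoids forming $H^{-1}$ explicitly.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def inverse_power_random_start(H, eps, seed=2):
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal(H.shape[0])     # any mean-zero, variance-one law
    v = Y / np.linalg.norm(Y)
    c = cho_factor(H)                        # Cholesky, reused each step
    mu_prev = None
    while True:
        w = cho_solve(c, v)                  # w = H^{-1} v
        mu = v @ w                           # iterate for 1/lambda_1
        if mu_prev is not None and abs(mu - mu_prev) <= eps**2:
            return 1.0 / mu                  # approximates lambda_1
        v, mu_prev = w / np.linalg.norm(w), mu
```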
