Consistency properties of AIC, BIC, Cp and their modifications in the PDF

23 Pages·2014·0.12 MB·English

Consistency properties of AIC, BIC, Cp and their modifications in the growth curve model under a large-$(q,n)$ framework

Rie Enomoto, Tetsuro Sakurai and Yasunori Fujikoshi

Abstract. This paper is concerned with consistency properties of some criteria for selecting row vectors of a $k \times p$ design matrix within individuals in the growth curve model, based on a sample of size $n$. Recently Enomoto, Sakurai and Fujikoshi (2013) showed that AIC and its modification have a consistency property for selecting hierarchical models of the row vectors under a condition on the order of the noncentrality matrix, assuming a large-$(q,n)$ asymptotic framework such that $q/n \to d \in [0,1)$. We extend the result to a family of log-likelihood-based information criteria including AIC and BIC, and to Cp. Further, their consistency properties are also obtained under a new condition on the order of the noncentrality matrix. Our results are checked numerically by conducting a Monte Carlo simulation.

AMS 2010 Mathematics Subject Classification. 62H12, 62H30.

Key words and phrases. AIC, BIC, Cp, Consistency property, Growth curve model, Large-$(q,n)$ asymptotic framework, Simulation study.

§1. Introduction

The growth curve model introduced by Potthoff and Roy (1964) is written as

$$Y = A\Theta X + E, \tag{1.1}$$

where $Y;\, n \times p$ is an observation matrix, $A;\, n \times q$ is a design matrix across individuals, $X;\, k \times p$ is a design matrix within individuals, $\Theta$ is an unknown matrix, and each row of $E$ is independent and identically distributed as a $p$-dimensional normal distribution with mean $0$ and an unknown covariance matrix $\Sigma$. We assume that $n-p-k-1 > 0$ and $\mathrm{rank}(X) = k$. If we consider a polynomial regression of degree $k-1$ on the time $t$ with $q$ groups, then

$$A = \begin{pmatrix} 1_{n_1} & 0 & \cdots & 0 \\ 0 & 1_{n_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1_{n_q} \end{pmatrix}, \qquad X = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ t_1 & t_2 & \cdots & t_p \\ \vdots & \vdots & & \vdots \\ t_1^{k-1} & t_2^{k-1} & \cdots & t_p^{k-1} \end{pmatrix}. \tag{1.2}$$

It is important to decide the degree in a polynomial growth curve model. In general, we consider the problem of selecting the row vectors of $X$. Suppose that $j$ denotes a subset of $\omega = \{1,\ldots,k\}$ containing $k_j$ elements, and $X_j$ denotes the $k_j \times p$ matrix consisting of the rows of $X$ indexed by the elements of $j$. Note that $X_\omega = X$ and $k_\omega = k$. We will let $k_A$ denote the number of elements of a set $A$. We then consider the following candidate model $M_j$ with $k_j$ explanatory variables defined by

$$M_j:\; Y = A\Theta_j X_j + E, \tag{1.3}$$

where $\Theta_j$ is a $q \times k_j$ matrix consisting of the columns of $\Theta$ indexed by the elements of $j$, and $E$ has the same distribution as in (1.1). Here we note that the design matrix $A$ may also be an observation matrix of several explanatory variables. For such an application, see Satoh and Yanagihara (2010). Let $\hat\Theta_j$ and $\hat\Sigma_j$ be the MLEs of $\Theta_j$ and $\Sigma$ under $M_j$, which are given by

$$\hat\Theta_j = (A'A)^{-1}A'YS^{-1}X_j'(X_jS^{-1}X_j')^{-1}, \qquad \hat\Sigma_j = \frac{1}{n}(Y - A\hat\Theta_jX_j)'(Y - A\hat\Theta_jX_j),$$

where $S = (n-q)^{-1}Y'(I_n - P_A)Y$ and $P_A = A(A'A)^{-1}A'$.

There are several criteria for selecting a "best" model from a family of models $M_j$. The AIC and the BIC in our problem are given by

$$\mathrm{AIC} = n\log|\hat\Sigma_j| + np(\log 2\pi + 1) + 2\left\{qk_j + \tfrac{1}{2}p(p+1)\right\}, \tag{1.4}$$

$$\mathrm{BIC} = n\log|\hat\Sigma_j| + np(\log 2\pi + 1) + (\log n)\left\{qk_j + \tfrac{1}{2}p(p+1)\right\}. \tag{1.5}$$

Here, the last term $\{qk_j + p(p+1)/2\}$ is the number of independent parameters under $M_j$. A consistent AIC (CAIC) based on Bozdogan (1987) is given by

$$\mathrm{CAIC} = n\log|\hat\Sigma_j| + np(\log 2\pi + 1) + (1+\log n)\left\{qk_j + \tfrac{1}{2}p(p+1)\right\}. \tag{1.6}$$

We also consider the other modifications AICc, $\mathrm{MAIC}_L$ and $\mathrm{MAIC}_H$, which are given in Section 2.
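The criteria (1.4) to (1.6) can be evaluated directly from the closed-form MLEs above. The following is a minimal NumPy sketch, not the authors' code: the function name `growth_curve_criteria` and its argument layout are hypothetical, and it assumes $n - q \ge p$ so that $S$ is invertible.

```python
import numpy as np


def growth_curve_criteria(Y, A, Xj):
    """Sketch: AIC, BIC and CAIC of (1.4)-(1.6) for a candidate model M_j.

    Y  : (n, p) observation matrix
    A  : (n, q) between-individual design matrix
    Xj : (k_j, p) rows of the within-individual design matrix X indexed by j
    (Hypothetical helper; names are illustrative, not from the paper.)
    """
    n, p = Y.shape
    q = A.shape[1]
    kj = Xj.shape[0]

    # (A'A)^{-1} A' and the projection P_A = A (A'A)^{-1} A'
    AtA_inv_At = np.linalg.solve(A.T @ A, A.T)
    P_A = A @ AtA_inv_At

    # S = (n - q)^{-1} Y' (I_n - P_A) Y
    S = Y.T @ (np.eye(n) - P_A) @ Y / (n - q)
    S_inv = np.linalg.inv(S)

    # MLEs of Theta_j and Sigma under M_j
    Theta_hat = AtA_inv_At @ Y @ S_inv @ Xj.T @ np.linalg.inv(Xj @ S_inv @ Xj.T)
    resid = Y - A @ Theta_hat @ Xj
    Sigma_hat = resid.T @ resid / n

    # Common term n log|Sigma_hat_j| + n p (log 2 pi + 1)
    _, logdet = np.linalg.slogdet(Sigma_hat)
    base = n * logdet + n * p * (np.log(2 * np.pi) + 1)

    # Number of independent parameters under M_j
    n_par = q * kj + p * (p + 1) / 2

    return {
        "AIC": base + 2 * n_par,
        "BIC": base + np.log(n) * n_par,
        "CAIC": base + (1 + np.log(n)) * n_par,
    }
```

Since the three criteria share the same log-likelihood term, they differ only through the penalty multiplier ($2$, $\log n$, or $1 + \log n$), which is what drives their different asymptotic behaviour.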
Further, we consider Cp defined by

$$\mathrm{Cp} = n\,\mathrm{tr}\,\hat\Sigma_jS^{-1} + 2qk_j, \tag{1.7}$$

and its modification MCp, which is given in Section 2.

In this paper, we assume that the true model is included in the full model $M_\omega$. So, without loss of generality, we may assume that the minimum model including the true model is expressed as $M_{j_0}$ for some $j_0$. Then, the true model is expressed as

$$M_0:\; Y \sim N_{n\times p}(A\Theta_0X_0,\; \Sigma_0 \otimes I_n), \tag{1.8}$$

where $\Theta_0 = \Theta_{j_0}$, $X_0 = X_{j_0}$, and $\Sigma_0$ is a given positive definite matrix. We write $k_0 = k_{j_0}$. Let $\mathcal{F}$ denote a set of candidate models. The set of all candidate models consists of $2^k - 1$ candidate models. A candidate model is called an overspecified model if it includes the true model $M_0$, and an underspecified model if it does not. We denote the set of overspecified models and the set of underspecified models by $\mathcal{F}_+$ and $\mathcal{F}_-$, respectively. In general, it can be seen that the criteria considered in this paper depend on the parameters only through $p$, $n$, $k_0$, $k_j$ and the characteristic roots of

$$\Omega_j = \Gamma_j'\Gamma_j, \tag{1.9}$$

which is called a noncentrality matrix, where $\Gamma_j = (A'A)^{1/2}\Theta_0X_0\Sigma_0^{-1/2}H_2^{(j)}$, $H_1^{(j)} = (X_j\Sigma_0^{-1/2})'(X_j\Sigma_0^{-1}X_j')^{-1/2};\, p \times k_j$, and $(H_1^{(j)}, H_2^{(j)})$ is an orthogonal matrix.

It is known that AIC and Cp are not consistent, but BIC and CAIC are consistent, under a large-sample framework

$$p,\ q \text{ and } k \text{ are fixed},\quad n \to \infty, \tag{1.10}$$

and $\Omega_j = O(n)$. However, it has recently been noted that AIC and Cp have a consistency property in a high-dimensional framework. Such results can be found for the multivariate regression model; see Fujikoshi, Sakurai and Yanagihara (2014) and Yanagihara, Wakaki and Fujikoshi (2014).
Further, Enomoto, Sakurai and Fujikoshi (2013) have noted that AIC and its modification $\mathrm{MAIC}_H$ in our problem have a consistency property for selecting hierarchical models of the row vectors of $X$ under a large-$(q,n)$ framework such that

$$p \text{ and } k \text{ are fixed},\quad q \to \infty,\quad n \to \infty,\quad q/n \to d \in [0,1), \tag{1.11}$$

and $\Omega_j = O(n)$. In this paper we extend such properties to various criteria including AIC, AICc, BIC, CAIC, $\mathrm{MAIC}_L$, $\mathrm{MAIC}_H$, Cp and MCp under $\Omega_j = O(nq)$ as well as $\Omega_j = O(n)$. When $\Omega_j = O(nq)$, it is noted that these criteria have a consistency property, though some condition on the value of $d$ is imposed for AIC. When $\Omega_j = O(n)$, it is shown that BIC and CAIC have no consistency property, but the other criteria have a consistency property under some additional conditions. More precisely, we note that the probability of selecting the true model by BIC or CAIC tends to zero. Our results are also examined through a simulation experiment.

The present paper is organized as follows. In Section 2, we summarize modifications of AIC and Cp. Consistency properties of a log-likelihood-based information criterion are given in Section 3. In Section 4 we give consistency properties of Cp and MCp. Numerical experiments are given in Section 5. In Section 6, we summarize our conclusions. The proofs of our results are given in the Appendix.

§2. Modifications of AIC and Cp

In this section we summarize modifications of AIC and Cp, and review their bias properties as estimators of the risks. As is well known, the AIC was proposed as an approximately unbiased estimator of the risk defined by the expected $-2 \times$ log-predictive likelihood. Let $f(Y; \Theta_j, \Sigma_j)$ be the density function of $Y$ under $M_j$. Then the expected $-2 \times$ log-predictive likelihood under $M_j$ is defined by

$$R_A = E^*_Y E^*_{Y_F}\left\{-2\log f(Y_F; \hat\Theta_j, \hat\Sigma_j)\right\}, \tag{2.1}$$

where $\hat\Sigma_j$ and $\hat\Theta_j$ are the maximum likelihood estimators of $\Sigma$ and $\Theta_j$ under $M_j$, respectively. Here $Y_F;\, n \times p$ may be regarded as a future random matrix that has the same distribution as $Y$ and is independent of $Y$, and $E^*$ denotes the expectation with respect to the true model. The risk is expressed as

$$R_A = E^*_Y E^*_{Y_F}\left\{-2\log f(Y; \hat\Theta_j, \hat\Sigma_j)\right\} + b_A, \tag{2.2}$$

where

$$b_A = E^*_Y E^*_{Y_F}\left\{-2\log f(Y_F; \hat\Theta_j, \hat\Sigma_j) + 2\log f(Y; \hat\Theta_j, \hat\Sigma_j)\right\}. \tag{2.3}$$

The AIC and its modifications have been proposed by regarding the term "$-b_A$" as the bias term when we estimate $R_A$ by

$$-2\log f(Y; \hat\Theta_j, \hat\Sigma_j) = n\log|\hat\Sigma_j| + np(\log 2\pi + 1),$$

and considering an asymptotic approximation of $b_A$. A bias-corrected AIC is defined by

$$\mathrm{AICc} = n\log|\hat\Sigma_j| + np(\log 2\pi + 1) + b_{A1}, \tag{2.4}$$

where

$$b_{A1} = -np + \frac{n^2(p-k_j)}{n-p+k_j-1} + \frac{n(n+q)(n-q-1)k_j}{(n-q-p-1)(n-q-p+k_j-1)}. \tag{2.5}$$

Note that AICc is an exact unbiased estimator of $R_A$ when $M_j$ is an overspecified model, i.e.

$$E(\mathrm{AICc}) = R_A, \quad j \in \mathcal{F}_+.$$

The term $b_{A1}$ can be expressed as

$$b_{A1} = 2\left\{qk_j + \tfrac{1}{2}p(p+1)\right\} + \frac{(p-k_j)(p-k_j+1)^2}{n-p+k_j-1} + \frac{k_j(2p+q-k_j+1)(2q+p+1)}{n-q-p-1} + \frac{(n+q)k_j(q+p-k_j+1)(p-k_j)}{(n-q-p-1)(n-q-p+k_j-1)}. \tag{2.6}$$

Therefore, we can easily see that under a large-sample framework

$$\mathrm{AICc} = \mathrm{AIC} + O(n^{-1}).$$

It is important that a modification has a small bias under underspecified models as well as overspecified models. Let $b_A = b_{A1} + b_{A2}$. It is known (Enomoto, Sakurai and Fujikoshi (2013)) that

$$b_{A2} = -\frac{n(p-k_j)(p-k_j+1)}{n-p+k_j-1} + 2(p-k_j+1)\xi_1 - \xi_2 + O_g(n^{-1}),$$

where $O_g(n^i)$ denotes the term of $i$-th order with respect to $n$ under (1.11), and

$$\xi_1 = \mathrm{tr}\left(I_{p-k_j} + \frac{1}{n}\Omega_j\right)^{-1}, \qquad \xi_2 = \xi_1^2 + \mathrm{tr}\left(I_{p-k_j} + \frac{1}{n}\Omega_j\right)^{-2}. \tag{2.7}$$

A modification under a large-sample framework (1.10) is given by

$$\mathrm{MAIC}_L = n\log|\hat\Sigma_j| + np(\log 2\pi + 1) + b_{AL}, \tag{2.8}$$

where

$$b_{AL} = b_{A1} + \tilde b_{A2}, \qquad \tilde b_{A2} = (p-k_j+1)\{2\tilde\xi_1 - (p-k_j)\} - \tilde\xi_2, \tag{2.9}$$

and

$$\tilde\xi_1 = \frac{n}{n-q}\left\{\mathrm{tr}\,(n\hat\Sigma_j)^{-1}(n-q)S - k_j\right\}, \qquad \tilde\xi_2 = (\tilde\xi_1)^2 + \left(\frac{n}{n-q}\right)^2\left[\mathrm{tr}\{(n\hat\Sigma_j)^{-1}(n-q)S\}^2 - k_j\right].$$

Then, it is known (Satoh, Kobayashi and Fujikoshi (1997)) that under a large-sample framework (1.10)

$$E(b_{AL}) = \begin{cases} b_A + O(n^{-2}), & j \in \mathcal{F}_+, \\ b_A + O(n^{-1}), & j \in \mathcal{F}_-. \end{cases}$$

The other modification, based on a large-$(n,q)$ framework (1.11), is given by

$$\mathrm{MAIC}_H = n\log|\hat\Sigma_j| + np(\log 2\pi + 1) + b_{AH}, \tag{2.10}$$

where

$$b_{AH} = b_{A1} + \hat b_{A2}, \qquad \hat b_{A2} = (p-k_j+1)\{2\hat\xi_1 - (p-k_j)\} - \hat\xi_2, \tag{2.11}$$

and $\hat\xi_1 = \tilde\xi_1$, $\hat\xi_2 = f\tilde\xi_2$,

$$f = \frac{3(n-q)(p-k_j+1)(n-2p+2k_j-2)}{n(n-p+k_j-1)} \times \left\{\frac{2(n-q+2)(p-k_j+2)}{n+2} + \frac{(n-q-1)(p-k_j-1)}{n-1}\right\}^{-1}.$$

Then, it is known (Enomoto, Sakurai and Fujikoshi (2013)) that under a large-$(n,q)$ framework (1.11)

$$E(b_{AH}) = \begin{cases} b_A, & j \in \mathcal{F}_+, \\ b_A + O_g(n^{-1}), & j \in \mathcal{F}_-. \end{cases}$$

The Cp in the regression model was proposed by Mallows (1973) for the univariate case. Sparks, Coutsourides and Troskie (1983) extended Mallows' approach to the multivariate case. Fujikoshi and Satoh (1997) gave a more general approach to Cp in the multivariate case.
The criterion in the growth curve model may be essentially considered as an approximately unbiased estimator of the risk of $M_j$ defined by

$$R_C = E^*_Y E^*_{Y_F}\left\{\mathrm{tr}\,\Sigma_0^{-1}(Y_F - \hat Y_j)'(Y_F - \hat Y_j)\right\}, \tag{2.12}$$

where $\hat Y_j$ is a predictor of $Y$ under $M_j$ given by $\hat Y_j = X_j\hat\Theta_j = P_jY$, and $Y_F$ is the same random matrix as in (2.1). The risk is expressed as

$$R_C = E^*_Y\left\{(n-k_j)\,\mathrm{tr}\,\hat\Sigma_\omega^{-1}\hat\Sigma_j\right\} + b_C, \tag{2.13}$$

where

$$b_C = E^*_Y E^*_{Y_F}\left\{\mathrm{tr}\,\Sigma_0^{-1}(Y_F - \hat Y_j)'(Y_F - \hat Y_j) - (n-k_j)\,\mathrm{tr}\,\hat\Sigma_\omega^{-1}\hat\Sigma_j\right\}. \tag{2.14}$$

Similarly, the Cp and its modification have been proposed by regarding "$-b_C$" as the bias term when we estimate $R_C$ by the minimum value of the standardized residual sum of squares,

$$(n-k_j)\,\mathrm{tr}\,\hat\Sigma_\omega^{-1}\hat\Sigma_j,$$

and by evaluating the bias term $b_C$. Satoh, Kobayashi and Fujikoshi (1997) proposed the following Cp and its modification MCp:

$$\mathrm{Cp} = n\,\mathrm{tr}\,\hat\Sigma_jS^{-1} + 2qk_j, \tag{2.15}$$

$$\mathrm{MCp} = n\,\mathrm{tr}\,\hat\Sigma_jS^{-1} + q(p+k_j) - \frac{q(p-k_j)(n-q-k_j)}{n-q-p+k_j-1} + \frac{2k_j-p-1}{n-q-p+k_j-1} \times \left\{\frac{n(n-q-p+k_j-1)}{n-q}\,\mathrm{tr}\,\hat\Sigma_jS^{-1} - (n-p+k_j-1)p + qk_j\right\}. \tag{2.16}$$

The MCp satisfies

$$E(\mathrm{MCp}) = R_C.$$

Further, collecting the terms of (2.16) and using $n-p+k_j-1 = (n-q-p+k_j-1) + q$, we can write MCp as

$$\mathrm{MCp} = \left\{1 + \frac{2k_j-p-1}{n-q}\right\} n\,\mathrm{tr}\,\hat\Sigma_jS^{-1} + 2qk_j + p(p-2k_j+1) = \mathrm{Cp} + (2k_j-p-1)\frac{n}{n-q}\,\mathrm{tr}\,\hat\Sigma_jS^{-1} + p(p-2k_j+1). \tag{2.17}$$

§3. Consistency of a log-likelihood-based information criterion

We treat AIC and its modifications as a unified criterion

$$\mathrm{IC}_j = n\log\det(\hat\Sigma_j) + np(\log 2\pi + 1) + m_j, \tag{3.1}$$

which is called a log-likelihood-based information criterion, where $m_j$ is a positive constant expressing a penalty for the complexity of the model (1.3). A specific criterion is given by specifying the individual penalty term $m_j$. It contains AIC, BIC, CAIC, AICc, $\mathrm{MAIC}_L$ and $\mathrm{MAIC}_H$ as special cases, as follows:

$$m_j = \begin{cases} 2\{qk_j + p(p+1)/2\} & (\mathrm{AIC}) \\ \{qk_j + p(p+1)/2\}\log n & (\mathrm{BIC}) \\ \{qk_j + p(p+1)/2\}(1+\log n) & (\mathrm{CAIC}) \\ b_{A1;j} & (\mathrm{AICc}) \\ b_{AL;j} & (\mathrm{MAIC}_L) \\ b_{AH;j} & (\mathrm{MAIC}_H) \end{cases} \tag{3.2}$$

Here the quantities $b_{A1;j}$, $b_{AL;j}$ and $b_{AH;j}$ are the same ones as in (2.6), (2.9) and (2.11), respectively.

In this section we show that the asymptotic probability of selecting the true model by AIC and its modifications goes to 1 when the number $q$ and the sample size $n$ tend to $\infty$ as in (1.11), under some additional assumptions. We denote the AIC for $M_j$ by $\mathrm{AIC}_j$. The best model chosen by minimizing the AIC is written as

$$\hat j_{\mathrm{AIC}} = \arg\min_{j \in \mathcal{F}} \mathrm{AIC}_j.$$

Similar notations are used for the other criteria.

The consistency property of IC is examined by using a key result (see, e.g., Fujikoshi, Enomoto and Sakurai (2013))

$$\frac{|(n-q)S|}{|n\hat\Sigma_j|} = \frac{|W_{(j)}|}{|W_{(j)} + B_{(j)}|}, \tag{3.3}$$

where $W_{(j)}$ and $B_{(j)}$ are independently distributed as a Wishart distribution $W_{p-k_j}(n-q, I_{p-k_j})$ and a noncentral Wishart distribution $W_{p-k_j}(q, I_{p-k_j}; \Omega_j)$, respectively. The matrix $\Omega_j$ is defined by (1.9).

Our main assumptions are summarized as follows:

A1 (The true model $M_0$): $j_0 \in \mathcal{F}$.

A2 (The asymptotic framework): $q \to \infty$, $n \to \infty$, $q/n \to d \in [0,1)$.

A3 (The order assumption (i) of $\Omega_j$): For $j \in \mathcal{F}_-$, $\Omega_j = n\Delta_j = O_g(n)$ and $\lim_{q/n \to d} \Delta_j = \Delta_j^*$.

A4 (The order assumption (ii) of $\Omega_j$): For $j \in \mathcal{F}_-$, $\Omega_j = nq\Xi_j = O_g(nq)$ and $\lim_{q/n \to d} \Xi_j = \Xi_j^*$.

Our consistency properties of a log-likelihood-based information criterion are given in two theorems, depending on the assumptions A3 and A4 on the order of the noncentrality matrix $\Omega_j$, as follows.

Theorem 3.1. Suppose that the assumptions A1, A2 and A3 are satisfied.
(1) Let $d_a$ ($\approx 0.797$) be the constant satisfying $\log(1-d_a) + 2d_a = 0$. Further, assume that $d \in [0, d_a)$, and

A5: For any $j \in \mathcal{F}_-$, $\log|I_{p-k_j} + \Delta_j^*| > (k_0-k_j)\{2d + \log(1-d)\}$.

Then, the model selection criterion AIC is consistent, i.e., the asymptotic probability of selecting the true model $j_0$ by the AIC tends to 1, which may be stated as

$$\lim_{q/n \to d} P(\hat j_{\mathrm{AIC}} = j_0) = 1.$$

(2) Suppose that

A6: For any $j \in \mathcal{F}_-$, $\log|I_{p-k_j} + \Delta_j^*| > (k_0-k_j)\left\{\dfrac{2d}{1-d} + \log(1-d)\right\}$.

Then, the model selection criteria AICc, $\mathrm{MAIC}_L$ and $\mathrm{MAIC}_H$ are consistent.

(3) The model selection criteria BIC and CAIC are not consistent. More precisely, the probability of selecting the true model by BIC or CAIC tends to zero.

Theorem 3.1 is an extension of Enomoto, Sakurai and Fujikoshi (2013), which proves consistency of AIC and $\mathrm{MAIC}_H$ in the case of selection of hierarchical models on the row vectors of $X$.

Theorem 3.2. Suppose that the assumptions A1, A2 and A4 are satisfied.

(1) If $d \in [0, d_a)$, then the model selection criterion AIC is consistent. Here $d_a$ is given in Theorem 3.1.

(2) Suppose that for any $j \in \mathcal{F}_-$, $|\Xi_j| > 0$. Then, the model selection criteria AICc, BIC, CAIC, $\mathrm{MAIC}_L$ and $\mathrm{MAIC}_H$ are consistent.

§4. Consistency of Cp and MCp

In this section we give consistency properties of Cp and MCp. The derivation is done in a way similar to the one for a log-likelihood-based information criterion, with the help of

$$\frac{n}{n-q}\,\mathrm{tr}\,\hat\Sigma_jS^{-1} = \mathrm{tr}\,(n\hat\Sigma_j)\{(n-q)S\}^{-1} = p + \mathrm{tr}\,B_{(j)}W_{(j)}^{-1}, \tag{4.1}$$

where $W_{(j)}$ and $B_{(j)}$ are the same random matrices as in (3.3).

Theorem 4.3. Suppose that the assumptions A1, A2 and A3 are satisfied. Further, assume that

A7: For any $j \in \mathcal{F}_-$, $\mathrm{tr}\,\Delta_j^* > d(k_0-k_j)$.

Then, the model selection criteria Cp and MCp are consistent.

Theorem 4.4. Suppose that the assumptions A1, A2 and A4 are satisfied.
Further, suppose that for any $j \in \mathcal{F}_-$, $\mathrm{tr}\,\Xi_j^* > 0$. Then, the model selection criteria Cp and MCp are consistent.

These results are worthy of note, since Cp and MCp are known to be inconsistent under a large-sample framework.

§5. Simulation study

In this section, we numerically examine the validity of our claims and the speed of convergence of the criteria. Monte Carlo simulations were conducted for several different values of $n$ and $q = dn$, where $p = 5$, $n = 50, 100, 200$, $n_1 = \cdots = n_q = n/q$ and $d = 0.1, 0.2$. We constructed a $5 \times 5$ matrix $X$ of explanatory variables as in (1.2) with $t_i = 1 + (i-1)(p-1)^{-1}$. The true covariance matrix $\Sigma_0$ was determined such that its $(i,j)$th element is $\rho^{|i-j|}$, where $\rho = 0.2, 0.8$. We consider the five candidate models $M_1, \ldots, M_5$, where $M_j$ denotes the model with the first $j$ rows of $X$. So, in this section a subset $j$ means $j = \{1, \ldots, j\}$. We assume that $M_2$ is the minimum model including the true model. The true model is included in $M_2, M_3, M_4, M_5$, but it is not included in $M_1$. Therefore, $\Omega_j = 0$ for $M_2, M_3, M_4, M_5$, and $\Omega_j \neq 0$ for $M_1$.
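The design quantities described above can be assembled as in the following NumPy sketch. It covers only the parts stated in the text (the group-indicator matrix $A$, the polynomial matrix $X$, and $\Sigma_0$); the true $\Theta_0$ and the number of replications are not given in this excerpt and are deliberately left out. The function name `simulation_design` is hypothetical.

```python
import numpy as np


def simulation_design(n, d, p=5, rho=0.2):
    """Sketch of the Section 5 design matrices (hypothetical helper).

    Returns A (n x q group indicators with equal group sizes n/q),
    X (p x p polynomial design with t_i = 1 + (i-1)/(p-1)),
    and Sigma0 with (i, j)th element rho^{|i-j|}.
    """
    q = int(round(d * n))
    if q < 1 or n % q != 0:
        raise ValueError("n must be a positive multiple of q = d * n")
    n_i = n // q  # n_1 = ... = n_q = n / q

    # A as in (1.2): block-diagonal stack of ones-vectors 1_{n_i}
    A = np.kron(np.eye(q), np.ones((n_i, 1)))

    # X as in (1.2) with k = p: rows 1, t, t^2, ..., t^{p-1}
    t = 1.0 + np.arange(p) / (p - 1)
    X = np.vander(t, p, increasing=True).T

    # Sigma0 with entries rho^{|i-j|}
    idx = np.arange(p)
    Sigma0 = rho ** np.abs(idx[:, None] - idx[None, :])

    return A, X, Sigma0
```

Under this layout the candidate model $M_j$ of this section uses `X[:j]`, the first $j$ rows of $X$.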
