VarianceAnalysisofLinearSIMOModelswithSpatially (cid:63) CorrelatedNoise NiklasEveritta,GiulioBottegala,CristianR.Rojasa,H˚akanHjalmarssona aACCESSLinnaeusCenter,SchoolofElectricalEngineering,KTH-RoyalInstituteofTechnology,Sweden 5 1 0 Abstract 2 Substantialimprovementinaccuracyofidentifiedlineartime-invariantsingle-inputmulti-output(SIMO)dynamicalmodelsis n possiblewhenthedisturbancesaffectingtheoutputmeasurementsarespatiallycorrelated.Usinganorthogonalrepresentation a J forthemodulescomposingtheSIMOstructure,inthispaperweshowthatthevarianceofaparameterestimateofamodule is dependent on the model structure of the other modules, and the correlation structure of the disturbances. In addition, we 3 quantifythevariance-errorfortheparameterestimatesforfinitemodelorders,wheretheeffectofnoisecorrelationstructure, 1 model structure and signal spectra are visible. From these results, we derive the noise correlation structure under which the mentioned model parameterization gives the lowest variance, when one module is identified using less parameters than the ] Y othermodules. S . Keywords: Systemidentification,Asymptoticvariance,LinearSIMOmodels,Least-squares. s c [ 1 1 Introduction available information to the identification process, we v ask the fundamental questions: will, and how much, an 7 added sensor improve the accuracy of an estimate of a 7 Recently,systemidentificationindynamicnetworkshas certaintargettransferfunctioninthenetwork?Weshall 0 gained popularity, see e.g., Van den Hof et al. (2013); attempt to give some answers by focusing on a special 3 Dankers et al. (2013b,a, 2014); Ali et al. (2011); Mat- case of dynamic networks, namely single-input multi- 0 erassietal.(2011);Torresetal.(2014);HaberandVer- 1. haegen (2014); Gunes et al. (2014); Chiuso and Pil- output(SIMO)systems,andinawidercontext,dynamic networks. 0 lonetto (2012). In this framework, signals are modeled 5 asnodesinagraphandedgesmodeltransferfunctions. 1 To estimate a transfer function in the network, a large SIMO models are interesting in themselves. They find : numberofmethodshavebeenproposed.Somehavebeen applications in various disciplines, such as signal pro- v showntogiveconsistentestimates,providedthatacer- cessing and speech enhancement (Benesty et al., 2005), i X tainsubsetofsignalsisincludedintheidentificationpro- (Doclo and Moonen, 2002), communications (Bertaux r cess(Dankersetal.,2013b,a).Inthesemethods,theuser etal.,1999),(Schmidt,1986),(Trudnowskietal.,1998), a has the freedom to include additional signals. However, biomedical sciences (McCombie et al., 2005) and struc- little is known on how these signals should be chosen, turalengineering(Ulusoyetal.,2011).Someoftheseap- and how large the potential is for variance reduction. plications are concerned with spatio-temporal models, Fromatheoreticalpointofviewthereareonlyafewre- inthesensethatthemeasuredoutputcanbestrictlyre- sultsregardingspecificnetworkstructurese.g.,H¨agget latedtothelocationatwhichthesensorisplaced(Sto- al.(2011);Ramazietal.(2014);Wahlbergetal.(2009). ica et al., 1994), (Viberg et al., 1997), (Viberg and Ot- Togetabetterunderstandingofthepotentialofadding tersten, 1991). In these cases, it is reasonable to expect that measurements collected at locations close to each otherareaffectedbydisturbancesofthesamenature.In (cid:63) This work was partially supported by the Swedish Re- otherwords,noiseontheoutputscanbecorrelated;un- searchCouncilundercontract621-2009-4017,andbytheEu- derstanding how this noise correlation affects the accu- ropeanResearchCouncilundertheadvancedgrantLEARN, racyoftheestimatedmodelisakeyissueindata-driven contract267381. modelingofSIMOsystems. Emailaddresses: [email protected](NiklasEveritt), [email protected](GiulioBottegal),[email protected](Cristian R.Rojas),[email protected](H˚akanHjalmarsson). For SIMO systems, our aim in this contribution is to PreprintsubmittedtoAutomatica 14January2015 quantify the model error induced by stochastic distur- 1.1 Contributionofthispaper bances and noise in prediction error identification. We consider the situation where the true system can accu- Asamotivation,letusfirstintroducethefollowingtwo rately be described by the model, i.e., the true system outputexample.Considerthemodel: lies within the set of models used, and thus the bias (systematic)erroriszero.Then,themodelerrormainly y (t)=θ u(t 1)+e (t), consistsofthevariance-error,whichiscausedbydistur- 1 1,1 − 1 y (t)=θ u(t 2)+e (t), bances and noise when the model is estimated using a 2 2,2 − 2 finitenumberofinput-outputsamples.Inparticular,we shall quantify the variance error in terms of the noise where the input u(t) is white noise and ek,k = 1,2 is covariancematrix,inputspectrumandmodelstructure. measurement noise. We consider two different types of These quantities are also crucial in answering the ques- measurementnoise(uncorrelatedwiththeinput).Inthe tionswehaveposedabove,namely,whenandhowmuch, firstcase,thenoiseisperfectlycorrelated,letusforsim- adding a sensor pays off in terms of accuracy of the es- plicity assume that e1(t) = e2(t). For the second case, timatedtargettransferfunction. e1(t)ande2(t)areindependent.Itturnsoutthatinthe first case we can perfectly recover the parameters θ 1,1 andθ ,while,inthesecondcasewedonotimprovethe 2,2 There are expressions for the model error, in terms of accuracy of the estimate of θ by also using the mea- 1,1 the asymptotic (in sample size) (co-) variance of the surementy (t).Thereasonforthisdifferenceisthat,in 2 estimated parameters, for a variety of identification thefirstcase,wecanconstructthenoisefreeequation methodsformulti-outputsystems(Ljung,1999),(Ljung and Caines, 1980). Even though these expressions cor- y (t) y (t)=θ u(t 1) θ u(t 2) respondtotheCram´er-Raolowerbound,theyaretypi- 1 − 2 1,1 − − 2,2 − callyratheropaque,inthatitisdifficulttodiscernhow and we can perfectly recover θ and θ , while in the themodelaccuracyisinfluencedbytheaforementioned 1,1 2,2 second case neither y (t) nor e (t) contain information quantities.Thereisawell-knownexpressionforthevari- 2 2 aboute (t). ance of an estimated frequency response function that 1 lends itself to the kind of analysis we wish to facilitate Alsothemodelstructureplaysanimportantroleforthe (Ljung, 1985; Yuan and Ljung, 1984; Zhu, 1989). This benefit of the second sensor. To this end, we consider formula is given in its SIMO version by (see e.g., Zhu a third case, where again e (t) = e (t). This time, the (2001)) 1 2 modelstructureisslightlydifferent: n CovGˆ(ejω)≈ NΦu(ω)−1Φv(ω), (1) y1(t)=θ1,1u(t−1)+e1(t), y (t)=θ u(t 1)+e (t). 2 2,1 2 − whereΦ istheinputspectrumandΦ isthespectrum u v Inthiscase,wecanconstructthenoisefreeequation of the noise affecting the outputs. Notice that the ex- pression is valid for large number of samples N and y (t) y (t)=(θ θ )u(t 1). large model order n. For finite model order there are 1 2 1,1 2,2 − − − mainly results for SISO models (Hjalmarsson and Nin- ness, 2006; Ninness and Hjalmarsson, 2004, 2005a,b) The fundamental difference is that now only the differ- andrecently,multi-input-single-output(MISO)models. ence (θ1,1 θ2,1) can be recovered exactly, but not the − For MISO models, the concept of connectedness (Gev- parametersθ1,1andθ2,1themselves.Theycanbeidenti- ersetal.,2006)givesconditionsonwhenoneinputcan fiedfromy1(t)andy2(t)separately,aslongasy1 andy2 help reduce the variance of an identified transfer func- aremeasured.AsimilarconsiderationismadeinLjung tion. These results were refined in M˚artensson (2007). et al. (2011), where SIMO cascade systems are consid- For white inputs, it was recently shown in Ramazi et ered. al. (2014) that an increment in the model order of one transferfunctionleadstoanincrementinthevarianceof This paper will generalize these observations in the fol- another estimated transfer function only up to a point, lowingcontributions: after which the variance levels off. It was also quan- tified how correlation between the inputs may reduce Weprovidenovelexpressionsforthevariance-error • the accuracy. The results presented here are similar of an estimated frequency response function of a in nature to those in Ramazi et al. (2014), while they SIMO linear model in orthonormal basis form, or regard another special type of multi-variable models, equivalently,infixeddenominatorform(Ninnesset namelymulti-inputsingle-output(MISO)models.Note al.,1999).Theexpressionrevealshowthenoisecor- that variance expressions for the special case of SIMO relationstructure,modelordersandinputvariance cascade structures are found in Wahlberg et al. (2009), affectthevariance-erroroftheestimatedfrequency Everittetal.(2013). responsefunction. 2 (cid:104) (cid:105) • Fthoerfarenqoune-nwchyitsepiencptruutmsptehcetrbumen,ewfiteoshfotwhewchoerrreelain- wfohresroemne1o≤rt.h.o.n≤ormnmalabnadsisΓif(uqn)c=tionBs1(q),(.q.).,nBmni.(qO)r-, tionstructureisfocused. {Bk }k=1 thonormalwithrespecttothescalarproductdefinedfor • Winphuetnaonndeomnoedouflteh,ie.eo.u,ttphuetrse,laistiiodnesnhtiipfiebdetuwsienegnltehses c1om(cid:82)pπlfex(efiuωn)gcti(oeniωs)fd(ωz.),Lge(tz)us:iCntr→oduCc1e×mtheasve(cid:104)cft,ogr(cid:105)n:o=- parameters, we derive the noise correlation struc- 2π π ∗ tatio−n tureunderwhichthementionedmodelparameter- izationgivesthelowesttotalvariance. y (t) e (t) 1 1 Thepaperisorganizedasfollows:inSection2wedefine y2(t) e2(t) the SIMO model structure under study and provide an y(t):= .. , e(t):= .. . . . expression for the covariance matrix of the parameter estimates. Section 3 contains the main results, namely y (t) e (t) m m a novel variance expression for LTI SIMO orthonormal basisfunctionmodels.TheconnectionwithMISOmod- The noise sequence e(t) is zero mean and temporally elsisexploredinSection4.InSection5,themainresults { } white,butmaybecorrelatedinthespatialdomain: areappliedtoanon–whiteinputspectrum.Insection6 we derive the correlation structure that gives the mini- E[e(t)]=0 mumtotalvariance,whenoneblockhaslessparameters than the other blocks. Numerical experiments illustrat- E(cid:2)e(t)e(s)T(cid:3)=δt sΛ (4) − ingtheapplicationofthederivedresultsarepresentedin Section7.AfinaldiscussionendsthepaperinSection8. for some positive definite matrix covariance matrix Λ, and where E[] is the expected value operator. We ex- · pressΛintermsofitsCholeskyfactorization 2 ProblemStatement Λ=Λ ΛT , (5) CH CH whereΛ Rm m islowertriangular,i.e., e (t) CH × 1 y (t) ∈ 1 G1(q) Σ γ 0 ... 0 . 11 . u(t) Gk.(q) ek(t)Σ yk(t) ΛCH =γ2...1 γ.2.2. ...... 00 (6) ... γm1 γm2 ... γmm e (t) m y (t) Gm(q) Σ m forsome{γij}.AlsonoticethatsinceΛ>0, Fig.1.BlockschemeofthelinearSIMOsystem. Λ−1 =Λ−CHTΛ−CH1 . (7) We summarize the assumptions on input, noise and Weconsiderlineartime-invariantdynamicsystemswith modelasfollows: one input and m outputs (see Fig. 1). The model is de- scribedasfollows: Assumption1 The input u(t) is zero mean station- { } ary white noise with finite moments of all orders, and y1(t) G1(q) e1(t) varianceσ2 >0.Thenoise e(t) iszeromeanandtem- porally white, i.e, (4) holds{with}Λ > 0, a positive def- y2(t) G2(q) e2(t) (cid:2) (cid:3) ... = ... u(t)+ ... , (2) isnoimteemρa>tr0ix..TIhteisdaatsasuismgeednethraattedEin|eo(tp)e|n4+lρoop<, th∞atfiosr, the input u(t) is independent of the noise e(t) . The ym(t) Gm(q) em(t) trueinput{-outp}utbehaviorofthedatagenera{tings}ystem can be captured by our model, i.e the true system can be whereq denotestheforwardshiftoperator,i.e.,qu(t)= described by (2) and (3) for some parameters θo Rni, i ∈ u(t+1)andtheGi(q)arecausalstablerationaltransfer i = 1,...,m, where n1 ... nm. The orthonormal functions.TheGi aremodeledas basisfunctions k(q) a≤reassu≤medstable. (cid:50) {B } G (q,θ )=Γ (q)θ , θ Rni, i=1,...,m, (3) Remark2 Generalizationofthemodelstructure: i i i i i ∈ 3 Theassumptionthatthemoduleshavethesameor- (1999); S¨oderstr¨om and Stoica (1989)). Λ is assumed • thogonal parameterization in (3) is less restrictive known,however,thisassumptionisnotrestrictivesince than it might appear at first glance, and made for Λcanbeestimatedfromdata.Theestimateofθisgiven clarity and ease of presentation. A model consist- by ingofnon-orthonormalbasisfunctionscanbetrans- formedintothisformatbyalineartransformation, (cid:32) (cid:33) 1 whichcanbecomputedbytheGram-Schmidtproce- θˆ = (cid:88)N ϕ(t)Λ 1ϕT(t) − (cid:88)N ϕ(t)Λ 1y(t). (9) N − − dure (Trefethen and Bau, 1997). Notice that fixed t=1 t=1 denominatormodels,withthesamedenominatorin alltransferfunctions,alsofitthisframework(Nin- Inserting (8)in(9)gives ness et al., 1999). However, it is essential for our analysisthatthemodulessharepoles. (cid:32) (cid:33) 1 It is not necessary to restrict the input to be (cid:88)N − (cid:88)N • white noise. A colored input introduces a weight- θˆN = θ+ ϕ(t)Λ−1ϕT(t) ϕ(t)Λ−1e(t). ing by its spectrum Φu, which means that the t=1 t=1 basis functions need to be orthogonal with re- spect to the inner product f,g := fΦ ,g . Under Assumption 1, the noise sequence is zero mean, If Φu(z) = σ2R(z)R∗(z), w(cid:104)here(cid:105)RΦu(z) is(cid:104)a muoni(cid:105)c hence θˆN is unbiased. It can be noted that this is the stable minimum phase spectral factor; the trans- sameestimateastheoneobtainedbythepredictioner- formation Γ˜(q) = σ 1R(q) 1Γ (q) is a procedure ror method and, if the noise is Gaussian, by the max- i − − i imum likelihood method (Ljung, 1999). It also follows that gives a set of orthogonal basis functions in the thattheasymptoticcovariancematrixoftheparameter weighted space. If we use this parameterization, all estimatesisgivenby the main results of the paper carry over naturally. However,ingeneral,thenewparameterizationdoes not contain the same set of models as the original AsCovθˆN =(cid:0)E(cid:2)ϕ(t)Λ−1ϕT(t)(cid:3)(cid:1)−.1 (10) parameterization.AnotherwayistousetheGram- Schmidt method, which maintains the same model Here AsCov θˆ is the asymptotic covariance matrix set and the main results are still valid. If we would N of the parameter estimates, in the sense that the like to keep the original parameterization, we may, asymptotic covariance matrix of a stochastic sequence in some cases, proceed as in Section 5 where the f ,f C1 q isdefinedas1 inputsignalisgeneratedbyanAR-spectrum. { N}∞N=1 N ∈ × Notethattheassumptionthatn ... n isnot 1 m • restrictive as it only represents a≤n orde≤ring of the AsCovfN := limN E[(fN E[fN])∗(fN E[fN])]. modulesinthesystem. N→∞ · − − In the problem we consider, using Parseval’s formula 2.1 Weightedleast-squaresestimate and(7),theasymptoticcovariancematrix,(10),canbe writtenas2 Banydinthtreonducimngtθra=nsf(cid:104)eθr1Tf,u.n.c.t,iθomTn(cid:105)mTa∈triRxn, n := (cid:80)mi=1ni AsCovθˆ =(cid:20) 1 (cid:90) πΨ(ejω)Ψ (ejω)dω(cid:21)−1 N ∗ × 2π π Γ 0 0 = Ψ,Ψ −−1, (11) 1 (cid:104) (cid:105) Ψ˜(q)=0 ... 0 , where 0 0 Γ m 1 Ψ(q)= σΨ˜(q)Λ−CHT. (12) wecanwritethemodel(2)asalinearregressionmodel y(t)=ϕT(t)θ+e(t), (8) Note that Ψ(q) is block upper triangular since Ψ˜(q) is blockdiagonalandΛ−CHT isuppertriangular. where 1 Thisdefinitionisslightlynon-standardinthatthesecond ϕT(t)=Ψ˜(q)Tu(t). term is usually conjugated. For the standard definition, in general,allresultshavetobetransposed,however,allresults An unbiased and consistent estimate of the parameter inthispaperaresymmetric. vector θ can be obtained from weighted least-squares, 2 Non-singularityof(cid:104)Ψ,Ψ(cid:105)usuallyrequiresparameteriden- with optimal weighting matrix Λ 1 (see, e.g., Ljung tifiabilityandpersistenceofexcitation(Ljung,1999). − 4 2.2 Theintroductoryexample correlation of the noise processes, i.e., they are in- dependentofthevalueofβ.However,notethatthe crosscorrelationbetweenθˆ andθˆ in(16): With formal assumptions in place, we now consider the 1,1 2,1 introductory example in greater detail. Consider the (cid:112) model Var(θˆ1,1 1 β2θˆ2,1) − − (cid:34) (cid:35)T (cid:34) (cid:35) 1 1 1 yy1((tt))==θθ1,1qq−11uu((tt))++θe1(tq), 2u(t)+e (t) ((1143)) = (cid:112)1 β2 σ2Λ (cid:112)1 β2 2 2,1 − 2,2 − 2 − − − − 1 = β2. (17) whichusesthedelaysq 1 andq 2 asorthonormalbasis σ2 − − functions. With θ = [θ θ θ ]T; the corresponding 1,1 2,1 2,2 regressionmatrixis Thiscrosscorrelationwillinduceacrosscorrelation inthetransferfunctionestimatesaswell. (cid:34) (cid:35) (2) As seen in (16), the variance of θˆ1,2 strongly de- u(t 1) 0 0 ϕ(t)T = − . pends on β. In particular, when β tends to 0, one 0 u(t 1) u(t 2) is ideally able to estimate θˆ1,2 perfectly. Note that − − in the limit case β = 0 one has e (t) = e (t), so 1 2 that (2) can be rearranged to obtain the noise-free Thenoisevectorisgeneratedby equation (cid:34) (cid:35) (cid:34) (cid:35)(cid:34) (cid:35) e1(t) =Lw(t)= (cid:112) 1 0 w1(t) , (15) y1(t)−y2(t)=(θ1,1−θ2,1)u(t−1)+θ1,2u(t−2), e (t) 1 β2 β w (t) 2 2 − whichshowsthatbothθ andthedifferenceθ 1,2 1,1 − θ can be estimated perfectly. This can of course wherew (t)andw (t)areuncorrelatedwhiteprocesses 2,1 1 2 alsobeseenfrom(16),cf.(17). with unit variance. The parameter β [0, 1] tunes the ∈ correlationbetweene (t)ande (t).Whenβ =0,thetwo 1 2 The example shows that correlated measurements can processes are perfectly correlated (i.e., identical); con- be favorable for estimating for estimating certain pa- versely, when β = 1, they are completely uncorrelated. (cid:2) (cid:3) rameters, but not necessarily for all. The main focus of Note that, for every β [0, 1], one has E e (t)2 = (cid:2) (cid:3) ∈ 1 this article is to generalize these observations to arbi- E e (t)2 =1.Infact,thecovariancematrixofe(t)be- 2 trarybasisfunctions,numberofsystemsandnumberof comes estimatedparameters.Additionally,theresultsareused (cid:34) (cid:112) (cid:35) to derive the optimal correlation structure of the noise. 1 1 β2 Butfirst,weneedsometechnicalpreliminaries. Λ=LLT = − . (cid:112) 1 β2 1 − 2.3 Reviewofthegeometryoftheasymptoticvariance Then,whencomputing (11)inthisspecificcasegives Thefollowinglemmaisinstrumentalinderivingthere- (cid:112) sultsthatfollow. 1 1 β2 0 AsCovθˆN = σ12 (cid:112)1−β2 1− 0 . (16) (L2e0m11m))aL3et(JLe:mRmnaIIC.91inqHbjeadlmiffaerrsesnotniaabnledwMi˚athrtreensspseocnt 0 0 β2 stopaθn,naedndbyΨth∈e rLon2w×s→mo;f Ψle×taSnΨd be thre su,brspacneboefaLn1×orm- Wenotethat: thonormal basis for . Suppo{sBekSt}hka=t1J (θ≤o) Cn q is Ψ (cid:48) × the gradient of J witSh respect to θ and J (θo)∈= Ψ(z )L AsCov (cid:104)θˆ1,1 θˆ2,1(cid:105)T = σ12Λ, forsomez0 ∈CandL∈Cm×q.Then (cid:48) o 1 AsCovθˆ2,2 = σ2β2. AsCovJ(θˆ )=L (cid:88)r (z ) (z )L. (18) N ∗ BkS o ∗BkS o k=1 Theaboveexpressionsrevealstwointerestingfacts: 2.4 Non-estimablepartofthenoise (1) The (scalar) variances of θˆ and θˆ , namely the 1,1 2,1 estimatesofparametersofthetwomodulesrelated As seen in the example of Section 2.2, strong noise cor- to the same time lag, are not affected by possible relation may be helpful in the estimation. In fact, the 5 variance error will depend on the non-estimable part of Similartotheabove,fori m,wealsodefine ≤ thenoise,i.e.,thepartthatcannotbelinearlyestimated fromothernoisesources.Tobemorespecific,definethe (cid:104) (cid:105)T signal vector e (t) to include the noise sources from ei:m(t):= ei(t) ... em(t) , j i module 1 to mo\dule j, with the one from module i ex- cluded,i.e., and for j < i we define eˆi:mj(t) as the linear minimum variance estimate of e (t)|based on the other signals i:m e (t):= ej i(t),and j i \ \ (cid:104) (cid:105)T (cid:104) e1(t),...,ej(t) (cid:105)T j <i, Λi:m|j :=Cov[ei:m(t)−eˆi:m|j(t)]. e (t),...,e (t) j =i, 1 i 1 (cid:104)e (t),...,e (t),e −(t),...,e (t)(cid:105)T j >i. As a small example of why this formulation is useful, 1 i 1 i+1 j considerthecovariancematrixbelow,wherethereiscor- − relationbetweenanypair(e (t),e (t)): i j Now, the linear minimum variance estimate of e given i e (t),isgivenby j i 1 0.6 0.9 1 0 0 1 0.6 0.9 \ eˆ (t):=(cid:37)Te (t). (19) Λ=0.6 1 0.54=0.6 0.8 0 0 0.8 0 . ij ij j i | \ 0.9 0.54 1 0.9 0 0.44 0 0 0.44 Introducethenotation (cid:124) (cid:123)(cid:122) (cid:125) ΛCH λ :=Var[e (t) eˆ (t)], (20) i|j i − i|j FromtheCholeskyfactorizationaboveweseethat,since γ is zero, Lemma 5 gives that e (t) is orthogonal to with the convention that λ := λ . The vector (cid:37) in 32 3 i0 i ij e (t) given e (t), i.e., there is no information about (19)isgivenby | 2 2 3 \ e (t) in e (t) if we already know e (t). This is not ap- 3 2 1 parentfromΛwhereeveryentryisnon-zero.Ifweknow (cid:2) (cid:3) 1 (cid:2) (cid:3) (cid:37)ij = Covej i(t) − E ej i(t)ei(t) . e (t) a considerable part of e (t) and e (t) can be esti- \ \ 1 2 3 mated.Withoutknowinge (t),λ =λ =λ =1,while 1 1 2 3 Wecall ifweknowe (t),λ =0.64andλ =0.19. 1 21 31 | | e (t) eˆ (t) i − i|j 3 Mainresults thenon-estimablepartofe (t)givene (t). i j i \ In this section, we present novel expressions for the variance-errorofanestimatedfrequencyresponsefunc- Definition4 When eˆ (t) does not depend on e (t), ij k tion. The expression reveals how the noise correlation where 1 k j, k = i|, we say that e (t) is orthogonal ≤ ≤ (cid:54) i structure, model orders and input variance affect the toe (t)conditionallytoe (t). k j i variance-erroroftheestimatedfrequencyresponsefunc- \ tion. We will analyze the effect of the correlation struc- The variance of the non-estimable part of the noise is ture of the noise on the transfer function estimates. To closely related to the Cholesky factor of the covariance thisend,collectallmtransferfunctionsinto matrixΛ.Wehavethefollowinglemma. (cid:104) (cid:105) Lemma5 Let e(t) Rm have zero mean and covari- G:= G1 G2 ... Gm . ∈ ance matrix Λ > 0. Let Λ be the lower triangular CH Cholesky factor of Λ, i.e., Λ satisfies (5), with γ For convenience, we will simplify notation according to CH ik asitsentriesasdefinedby (6).Thenforj <i, { } thefollowingdefinition: i Definition6 The asymptotic covariance of Gˆ(ejω0) := (cid:88) λij = γi2k. G(ejω0,θˆN)forthefixedfrequencyω0 isdenotedby | k=j+1 AsCovGˆ. Furthermore,γ =0isequivalenttothat e (t)isorthog- ij i onaltoej(t)conditionallytoej i(t). In particular, the variance of Gˆi(ejω0) := Gi(ejω0,θˆiN) \ forthefixedfrequencyω isdenotedby 0 PROOF. SeeAppendixA. (cid:4) AsVarGˆ . i 6 Defineχ astheindexofthefirstsystemthatcontainsthe k boafssiyssftuenmcstitohnatBdko(enjωo0t)c.oNnotatiicnetthheatbaχskis−fu1niscttihoen.number AsCovθˆ˜1|B1(eiω)|2 . . Theorem7 Let Assumption 1 hold. Suppose that the . parameters θi Rni, i = 1,...,m, are estimated using ∈ weightedleast-squares(9).Lettheentriesofθbearranged AsCovθˆ˜ (eiω)2 asfollows: k|Bk | θ¯=[θ ... θ θ ... θ ... θ ... .. 1,1 m,1 1,2 m,2 1,n1 . ... θ θ ... θ ... θ ]T.(21) m,n1 2,n1+1 m,n1+1 m,nm AsCovθˆ˜ (eiω)2 andthecorrespondingweightedleast-squaresestimatebe nm|Bnm | denotedbyθˆ¯.Then,thecovarianceofθˆ¯is AsCovθˆ¯= 1 diag(Λ ,Λ ,..., σ2 1:m χ2:m|χ2−1 Fig. 2. A graphical representation of AsCovGˆ where each ...,Λ ). (22) term of the sum in (25) is represented by a layer. A basis χnm:m|χnm−1 function only affects the covariance between modules that Inparticular,thecovarianceoftheparametersrelatedto alsocontainthatbasisfunction.Thus,thefirstbasisfunction basisfunctionnumberk isgivenby affects the complete covariance matrix while the last basis functionn onlyaffectsmodulesχ ,...,m. m nm AsCovθˆ¯ = 1 Λ , (23) Remark9 The orthogonal basis functions correspond k σ2 χk:m|χk−1 to a decomposition of the output signals into orthogonal components and the problem in a sense becomes decou- where pled.Asanexample,considerthesystemdescribedby θˆ¯k =(cid:104)θˆχk,k ... θˆm,k(cid:105)T , y1(t)=θ1,1 1(q)u(t)+e1(t), B y (t)=θ (q)u(t)+e (t), 2 2,1 1 2 andwhere,forχ i m, B k ≤ ≤ y3(t)=θ3,1 1(q)u(t)+θ3,2 2(q)u(t)+e3(t), (26) B B λ AsVarθˆi,k = i|σχk2−1. (24) pSaurpapmoseetetrh,a(t2w4e)abreecoimnteesrestedinestimatingθ3,2.Forthis Italsoholdsthat λ AsCovGˆ =(cid:88)nm (cid:34)0χ0k−1AsCo0vθˆ¯ (cid:35)|Bk(ejω0)|2, (25) To understand thAesVmaerchθˆ3a,n2i=smsσ32b|2ehind this express(i2o7n), k=1 k let u (t) = (q)u(t), and u (t) = (q)u(t) so that 1 1 2 2 B B the system can be visualized as in Figure 3, i.e., we can 1wherχe AsC1ovmθˆ¯aktriisxgwivitehnablyl e(2n3tr)ieasndeq0uχakl−t1oizserao.χkFo−r consideru1 andu2 asseparateinputs. k × − zχekro=m1a,tr0iχceks−1ofisdiamneenmsipotnysmcoamtrpiax.tibInle(t2o5t)h,e0ddiaegnoonteasl Ftioirnstawboeuotbθserv,eatnhdatthietitseromnlyθy3uthactocnotrnitbauitnisngintfoorymais- 3,2 3,1 1 3 blocks. a nuisance from the perspective of estimating θ . This 3,2 term vanishes when u = 0 and we will not be able to 1 achievebetteraccuracythantheoptimalestimateofθ PROOF. SeeAppendixB. (cid:4) for this idealized case. So let us study this setting fir3s,t2. Straightforward application of the least-squares method, Remark8 Thecovarianceofθˆ¯ ,whichcontainthepa- using u and y , gives an estimate of θ with variance k 2 3 3,2 rameters related to basis function k, is determined by λ/σ2, which is larger than (27) when e depends on e 3 1 which other models share the basis function ; cf. (17) and e . However, in this idealized case, y = e and Bk 2 1 1 of the introductory example. The asymptotic covariance y = e , and these signals can thus be used to estimate 2 2 ofGˆcanbeunderstoodasasumofthecontributionsfrom e3. This estimate can then be subtracted from y3 before eachofthen basisfunctions.Thecovariancecontribu- theleast-squaresmethodisapplied.Theremainingnoise m tion from a basis function k is weighted by k(ejω0)2 iny3 willhavevarianceλ32,ife3 isoptimallyestimated andonlyaffectsthecovarianBcebetweensystem|Bsthatcon|- (see(19)–(20)),andhence|theleast-squaresestimatewill tainthatbasisfunction,asvisualizedinFigure2. nowhavevarianceλ /σ2,i.e.,thesameas (27). 32 | 7 Tounderstandwhyitispossibletoachievethesameac- curacy as this idealized case when u is non-zero, we e1(t) 1 need to observe that our new inputs u (t) and u (t) are y (t) 1 2 1 orthogonal(uncorrelated)3.Returningtothecasewhen θ1,1 Σ onlytheoutputy isusedforestimatingθ ,thisimplies 3 3,2 thatwepaynopriceforincludingthetermθ3,1u1 inour e2(t) model,andthenestimatingθ andθ jointly,i.e.,the u (t) y (t) 3,1 3,2 1 2 variance of θˆ will still be λ/σ24. The question now is θ2,1 Σ 3,2 ifwecanusey andy asbeforetoestimatee .Perhaps 1 2 3 e (t) eˆ (t) surprisingly, we can use the same estimate as when u1 3 − 3|2 wnoaws,zeinroa.dTdhiteiorneatdoetrhemparyevoibojuecstotphtaimt tahliessetismtimataetoefweil,l u1(t) θ˜3,1 Σ y˜3(t) 3 contain a term which is a multiple of u . However, due 1 to the orthogonality between u and u , this term will 1 2 only affect the estimate of θ (which we anyway were u (t) 3,1 2 θ not interested in, in this example), and the accuracy of 3,2 the estimate of θ will be λ /σ2, i.e. (27). Figure 4 3,2 32 illustrates the setting with y˜ d|enoting y subtracted by 3 3 Fig. 4. The SIMO system of Remark 9, described by (26), theoptimalestimateofe .Inthefigure,thenewparam- 3 butwithy˜ denotingy subtractedbytheoptimalestimate eter θ˜3,1 reflects that the relation between u1 and y˜3 is ofe3 andθ˜33,1 reflectsth3attherelationbetweenu1 andy˜3 is differentfromθ3,1asdiscussedabove.Akeyinsightfrom differentfromθ3,1. thisdiscussionisthatfortheestimateofaparameterin the path from input i to output j, it is only outputs that Corollary10 Let the same assumptions as in Theo- are not affected by input i that can be used to estimate rem7hold.Then,foranyfrequencyω ,itholdsthat 0 the noise in output j; when this particular parameter is estimated,usingoutputsinfluencedbyinputiwillintro- (cid:88)ni duce a bias, since the noise estimate will then contain a AsVarGˆ = (ejω0)2AsVarθˆ , (28) i k i,k term that is not orthogonal to this input. In (24), this |B | k=1 manifestsitselfinthatthenumeratorisλ ,onlythe χ 1firstsystemsdonotcontainu . i|χk−1 where k i − λ AsVarθˆi,k = i|σχk2−1, (29) e (t) 1 y (t) 1 θ1,1 Σ andλ isdefinedin (20). ij | e (t) u (t) 2 y (t) 1 θ2,1 Σ 2 PROOF. Follows from Theorem 7, since (28) is a di- agonalelementof (25). (cid:4) e (t) 3 y (t) 3 θ3,1 Σ FromCorollary10,wecantellwhenincreasingthemodel orderofG willincreasetheasymptoticvarianceofGˆ . j i u2(t) Corollary11 Under the same conditions as in Theo- θ 3,2 rem7,ifweincreasethenumberofestimatedparameters ofG fromn ton +1,theasymptoticvarianceofG will j j j i Fig.3.TheSIMOsystemofRemark9,describedby (26). increase,ifandonlyifallthefollowingconditionshold: (1) n <n , j i (2) e (t) is not orthogonal to e (t) conditioned on We now turn our attention to the variance of the indi- i j e (t), vidualtransferfunctionestimates. j i (3) |B\nj+1(ejω0)|2 (cid:54)=0. 3 Thissinceu(t)iswhiteandB andB areorthonormal. 1 2 4 Withu andu correlated,thevariancewillbehigher,see Section41forafu2rtherdiscussionofthistopic. PROOF. SeeAppendixC. (cid:4) 8 Remark12 Corollary 11 explicitly tells when an in- Ifthemodelordersaredistinct,thecorrespondinggraph crease in the model order of Gj from nj to nj +1 will is given in Figure 5, where one can see that AsVarGˆ4 increase the variance of Gi. If nj ≥ ni, there will be no dependsony2(andony4ofcourse),butdependsneither increase in the variance of Gi, no matter how many ad- on y nor y since γ =γ =0, AsVarGˆ depends on 3 1 43 41 3 ditionalparametersweintroducetothemodelGj,which y ,butnotony sinceγ =0andAsVarGˆ dependson was also seen the introductory example in Section 2.2. 1 2 32 2 y ,whilethevarianceofGˆ isgivenby(30).Ifn =n , Naturally, if e (t) is orthogonal to e (t) conditioned on 1 1 2 4 i j thefirstconditionofCorollary11isnotsatisfiedandwe e (t), eˆ (t) does not depend on e (t) and there is no inj\cireaseiin|jvarianceofGˆ ,cf.Remajrk9. have to cut the edge between Gˆ4 and Gˆ2. Similarly, if i n = n , we have to cut the edge between Gˆ and Gˆ , 1 2 2 1 and if additionally n = n = n , we have to cut the 1 2 3 3.1 AgraphicalrepresentationofCorollary11 edgebetweenGˆ andGˆ . 3 1 FollowingthenotationinBayesianNetworks(Koskiand Noble, 2012), Conditions 1) and 2) in Corollary 11 can Gˆ Gˆ Gˆ Gˆ 1 2 3 4 be interpreted graphically. Each module is represented by a vertex in a weighted directed acyclic graph . Let G theverticesbeorderedbymodelorder,i.e.,letthefirst vertexcorrespondtoGˆ .UnderAssumption1,withthe Fig. 5. Graphical representation of Conditions 1) and 2) in 1 additionalassumptionthatmoduleiisthefirstmodule Corollary11fortheCholeskyfactorgivenin(31). with order n , let there be an edge, denoted by j i, i → fromvertexjtoi,j <i,ife (t)isnotorthogonaltoe (t) i j conditioned on e (t). Notice that this is equivalent to j i 4 ConnectionbetweenMISOandSIMO γ = 0. Let the\weight of the edge be γ and define ij ij (cid:54) the parents of vertex i to be all nodes with a link to vertexi,i.e.,pa (i):= j :j i .Then,(29),together There is a strong connection between the results pre- withLemma5,sGhowsth{atonl→you}tputscorrespondingto sentedhereandthoseregardingMISOsystemspresented parentsofnodeiaffecttheasymptoticvariance.Indeed, in Ramazi et al. (2014). We briefly restate the problem avertexwithoutparentshasvariance formulationandsomeresultsfromRamazietal.(2014) toshowtheconnection.TheMISOdatageneratingsys- AsVarGˆ = λi (cid:88)ni (ejω0)2, (30) tmemspiastiinallsyomcoerrseelnasteedthinepduutaslaonfdthoeneSIoMutOpucta,sae.MWISitOh i σ2 |Bk | k=1 systemisdescribedby whichcorrespondsto(28)with u (q) 1 λ =...=λ =λ . (cid:104) (cid:105)u2(q) i|0 i|i−1 i y(t)= G1(q) G2(q) ... Gm(q) .. +e(t). . Thus,AsVarGˆ isindependentofthemodelorderofthe i u (q) othermodules. m Theinputsequence u(t) iszeromeanandtemporally As an example, consider four systems with the lower { } white,butmaybecorrelatedinthespatialdomain, Choleskyfactorofthecovarianceofe(t)givenby: E[u(t)]=0 1 0.4 0.2 0 E(cid:2)u(t)u(s)T(cid:3)=δ Σ , t s u 0.4 1.16 0.08 0.3 − Λ= forsomepositivedefinitematrixcovariancematrixΣ = 0.2 0.08 1.04 0 u Σ ΣT ,whereΣ isthelowertriangularCholesky CH CH CH 0 0.3 0 1.09 factorofΣ.Thenoisee(t)iszeromeanandhasvariance T λ. The asymptotic covariance of the estimated parame- 1 0 0 0 1 0 0 0 terscanbeexpressedusing (11)with 0.4 1 0 00.4 1 0 0 = (31) Ψ =ΨMISO :=Ψ˜Σ . (32) 0.2 0 1 00.2 0 1 0 CH 0 0.3 0 1 0 0.3 0 1 We make the convention that (cid:80)k2 x = 0 whenever (cid:124) (cid:123)(cid:122) (cid:125) k=k1 k k >k . ΛCH 1 2 9 Theorem13(Theorem4inRamazietal.(2014)) MISO systems. Let the first χ systems contain basis k UnderAssumption1,butwithn n ..., n ,for functionk,so 1 2 m ≥ ≥ ≥ anyfrequencyω itholdsthat 0 (cid:88)m λ (cid:88)nj AsVarθˆ¯kMISO =λΣ1−:χ1k AsVarGˆ = (ejω0)2, (33) i σ2 |Bk | where Σ denotes the covariance matrix of the first j=i i|j k=nj+1+1 χ input1s:.χHkence k ewshteimreabnlme+pa1rt:=of0uia(nt)dgσivi2|ejniusjt\hi(et)v.ariance of the non- AsCovGˆ =λ(cid:88)n1 (cid:34)Σ1−0:χ1k 0 0 (cid:35)|Bk(ejω0)|2, Corollary14(Corollary6inRamazietal.(2014)) k=1 m−χk Under Assumption 1, but with n1 n2 ..., nm. and ≥ ≥ ≥ Suppose that the order of block j is increased from n to j nanj+ce1.oTfGhˆeniftahnedreoinslyanifianllctrheaesfeolilnowthinegacsoynmdpittiootnicshvaorldi-: AsCovθˆ¯MISO =λ diag(Σ1−:χ11,Σ1−:χ12,...,Σ1−:χ1nm).(36) i Note that, while the correlation between the noise (1) n <n , j i sourcesisbeneficial,thecorrelationintheinputisdetri- (2) u (t) is not orthogonal to u (t) conditioned on i j mentalfortheestimationaccuracy.Intuitively,ifweuse u (t), j i thesameinputtoparallelsystems,andonlyobservethe (3) |Bn\j+1(ejω0)|2 (cid:54)=0. sum of the outputs, there is no way to determine the contribution from the individual systems. On the other Remark15 The similarities between Corollary 10 and hand, as observed in the example in Section 2.2, if the Theorem13,andbetweenCorollary11andCorollary14 noise is correlated, we can construct equations with re- arestriking.Inbothcasesitisthenon-estimablepartof ducednoiseandimprovetheaccuracyofourestimates. theinputandnoise,respectively,alongwiththeestimated basis functions that are the key determinants for the re- This difference may also be understood from the struc- sulting accuracy. Just as in Corollary 10, Theorem 13 ture of Ψ, which through (11) determines the variance canbeexpressedwithrespecttothebasisfunctions: properties of any estimate. Consider a single SISO sys- tem G as the basic case. For the SIMO structure con- AsVarGˆ =(cid:88)ni AsVarθˆ (ejω0)2. (34) sidered1in this paper, as noted before, ΨSIMO of (12) is i i,k|Bk | block upper triangular with m columns (the number of k=1 outputs), while ΨMISO is block lower triangular with as However,now many columns as inputs. ΨMISO is block lower triangu- lar since Ψ˜ is block diagonal and Σ is lower triangu- CH AsVarθˆi,k = σ2λ (35) lcaorrriensp(3o2n)d.sAtdodeinxgteanndionugtpΨuStIMyOj twoitthheoSnIeMcoOlusmtrnuc(taunrde i|χk nj rows): whereσ2 isdeterminedbythecorrelationstructureofthe inputs ui|il(t) to the systems Gi(q,θi) that do share basis ΨSIMO =(cid:34)ΨSIMO (cid:63)(cid:35), (37) function k(q) (i = 1,...,χk). Note that in the SIMO e 0 (cid:63) B casewehad λ wherethezerocomesfromthatΨSIMO alsoisblockup- AsVarθˆi,k = σi|χ2k per triangular. Since ΨMISO is bleock lower triangular, addinganinputu totheMISOstructureextendsΨMISO j wthheerneoλisie|χksoiusrdceetserem(itn)eadffbeycttihnegcsoyrsrteelamtisonGs(tqru,θct)urtehaotf withnj rows(andonecolumn): i i i (cid:34) (cid:35) do not share basis function k(q) (i = 1,...,χk). Note ΨMISO 0 that (33) found in Ramazi eBt al. (2014) does not draw ΨMISO = , (38) e theconnectiontothevarianceoftheparameters.Thisis (cid:63) (cid:63) madeexplicitinthealternateexpressions (35)and (34). where (cid:63) denotes the added column and added row re- Thecorrelationbetweenparametersrelatedtothesame spectively.AdditionofcolumnstoΨ decreasesthevari- basis functions is not explored in Ramazi et al. (2014). anceofG ,whileadditionofrowsincreasesthevariance. 1 Infact,itispossibletofollowthesamelineofreasoning First, a short motivation of this will be given. Second, leadingtoTheorem7andarriveatthecounter-partfor wewilldiscusstheimplicationforthevarianceanalysis. 10