ebook img

Random regression analyses using B-splines to model growth of Australian Angus cattle PDF

28 Pages·2005·0.52 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Random regression analyses using B-splines to model growth of Australian Angus cattle

Genet.Sel.Evol.37(2005)473–500 473 (cid:1)c INRA,EDPSciences,2005 DOI:10.1051/gse:2005012 Original article Random regression analyses using B-splines to model growth of Australian Angus cattle ∗ Karin M AnimalGeneticsandBreedingUnit1,UniversityofNewEngland, ArmidaleNSW2351,Australia (Received4October2004;accepted2May2005) Abstract – Regression on the basis function of B-splines has been advocated as an alterna- tivetoorthogonalpolynomialsinrandomregressionanalyses.Basictheoryofsplinesinmixed modelanalysesisreviewed,andestimatesfromanalysesofweightsofAustralianAnguscattle from birth to 820 days of age are presented. Data comprised 84533 records on 20731 ani- malsin43herds,withahighproportionofanimalswith4ormoreweightsrecorded.Changes inweightswithageweremodelledthrough B-splinesof ageat recording. Atotal ofthirteen analyses,consideringdifferentcombinationsoflinear,quadraticandcubicB-splinesandupto sixknots,werecarriedout.Resultsshowedgoodagreementforallageswithmanyrecords,but fluctuatedwheredataweresparse.Onthewhole,analysesusingB-splinesappearedmorerobust against“end-of-range”problemsandyieldedmoreconsistentandaccurateestimatesofthefirst eigenfunctions thanprevious, polynomial analyses. A model fittingquadratic B-splines, with knotsat0,200,400,600and821daysandatotalof91covariancecomponents,appearedtobe agoodcompromisebetweendetailednessofthemodel,numberofparameterstobeestimated, plausibilityofresults,andfit,measuredasresidualmeansquareerror. covariancefunction/growth/beefcattle/randomregression/B-splines 1. INTRODUCTION Randomregression(RR)analyseshavebecomeastandardprocedureforthe geneticanalysisof“repeatedrecords”onindividualswhicharerecordedalong acontinuousscale,suchastime.Alargeproportionofapplicationsconsidered test day production of dairy cows, but RR analyses of growth records or data onfeed intake ofmeatproducing animals arefindingincreasing use. Arecent review of applications has been given by Schaeffer [35]. The underlying idea inRRanalysesisthatsubject-specific curvescanbedescribedastheweighted ∗Correspondingauthor:[email protected] 1AGBUisajointventurebetweentheNSWDepartmentofPrimaryIndustriesandtheUniver- sityofNewEngland. Article published by EDP Sciences and available at http://www.edpsciences.org/gseor http://dx.doi.org/10.1051/gse:2005012 474 K.Meyer sumofasetoffunctionsofthecontinuouscovariable,theso-calledbasisfunc- tions. The majority of RR analyses so far fitted polynomial of time or age at recording as basis functions. In particular, following Kirkpatrick et al. [22], Legendrepolynomials havebeenwidelyused. Higher order polynomials are flexible and have been found to be capable of modelling changes in means and variances along a continuous scale well. However, such polynomials often put a high emphasis on observations at the extremes, and are notoriously problematic for high orders of fit. “Runge’s phenomenon” describes the observation that the error of polynomial approx- imation of a curve increased with the order of polynomial fit, and, moreover, thaterrorswerepredominantly duetooscillations attheextremesofthecurve (e.g.[2],Chapter2).Notsurprisingly,RRanalysesfittingcubic,quarticoreven higher order polynomials thus have frequently encountered “end-of-range” problems, resulting in erratic and implausible estimates of variance compo- nentsandgeneticparameters.Thishasbeenespeciallyprevalentatthehighest ages with least records, for data sets with many more records at earlier than laterages,andforanalyseswheresubstantialproportionsofanimalshadfewer recordsthantheorderofpolynomials fitted. An alternative to high degree polynomials are “piece-wise polynomials”, i.e.,curvesconstructed frompiecesoflowerdegreepolynomials, joinedatse- lectedpoints, so-called knots. Suchcurvesarecommonly referred toasspline curves,andarewidelyused,inparticularinnon-parametricanalysesinvolving smoothingofcurves(e.g.[13,34]).Anapplicationofsplinesfortheanalysisof test dayrecords ofdairy cowshas been presented byWhite etal. [41]. Apar- ticular type of spline curve is the so-called B-spline [2]. B-splines yield the samefitassplinesbasedontruncatedpowerfunctions,buthavebetternumeri- calproperties[9,34];pertinentdetailsarereviewedbelow.Shietal.[36]found changes in CD4 counts of children over time to be inadequately modelled by regressingonquadraticpolynomialsoftime,andproposedB-splinesasamore flexiblealternative. Recently,RiceandWu[33]describedtheuseofB-splines to model random effects curves in mixed model analyses, and demonstrated their efficacy in estimating covariance functions. Similarly, James et al. [17] employedB-splinefunctionstomodelgrowthcurvesforlongitudinaldatawith sparse, irregular measurements. All these applications of RR analyses using B-splines wereatthephenotypic level,butuseofB-splines canreadily beex- tendedtogeneticRRanalyses.TorresandQuaas[39]performedaRRanalysis using B-splines to model test day records of dairy cows, but there have been noapplications togrowthdataoffarmanimals. B-splinerandomregression 475 AnumberofRRanalysesofweightdataforbeefcattlehavebeenreported. Previous,largescaleanalysesofrecordsfrombirthtoapproximatelytwoyears of ageconsidered amodel fitting cubic regressions on age atrecording for all randomeffects,i.e.,directandmaternal,geneticandpermanentenvironmental effects [26,32]. This choice was based on a compromise between computa- tionaldemandsanddetailedness ofthemodel,thefactthatacubicpolynomial is the lowest degree polynomial for which both first and second derivatives, i.e. growth rate and acceleration could vary with age, and limits in the num- bersofrecordsavailableperanimal.Recentanalysesofasetoffielddatafrom AustralianAnguscattle,however,foundthismodeltorepresent aproblematic choice for such data: cubic polynomials were most likely to yield erratically high estimates of variances at the highest ages. In addition, estimates of the firstgeneticeigenfunctions differedmarkedly fromthosefittingaquadratic or quartic polynomial to modelchanges inanimals’ additive genetic effects with age[28]. Afterabriefreviewofsplinefunctions, thispaperpresentsasetofanalyses using RR on basis functions of B-spline to model growth trajectories of beef cattle,frombirthtojustovertwoyearsofage.Dataarefieldrecords,analysed previously fitting regressions on Legendre polynomials of age at recording, andresultsfromthetwosetsofanalyses arecontrasted. 2. REVIEWOFSPLINEBASICS Splinesaregenerallydefinedascurveswhichconsistofindividualsegments which are joined smoothly; the segments are given by polynomials and the points at which they join are referred to as knots. Often the polynomials are cubicandfirstorsecond derivativesareconstrained tobesmooth[5]. The name spline originates from the long thin, flexible strip of wood tra- ditionally used in drawing curves. When fixed at given points (“knots”), the wood takes the shape which minimises the energy required for bending, thus yielding the smoothest shape possible. Similarly, bivariate splines are com- monly called “thin plate” splines, referring to a thin, bendable sheet of metal used to approximate the surface. There is a variety of different spline func- tions, andalarge bodyofliterature. Numerous textbooks covering thesubject areavailable (e.g.[2,4,13,34]). In the simplest case, a spline function consists of linear segments. This is commonly described as a “broken-stick” curve, and is a straightforward ex- tension of the parametric, linear regression. Let y denote some observations i recorded attimest,withi = 1,...,n,andassume wehaveknots T .Infitting i k 476 K.Meyer alinearspline,wethenassumeanon-parametric regression model (cid:1) yi = β0+β1ti+ β1k(ti−Tk)++ei (1) k fory,withβ andβ denotingtheinterceptandlinearregressioncoefficients, i 0 1j respectively, ei the residual error pertaining to yi, and (x)+ = max(0,x) equal toxifxispositiveandequalto0otherwise[34].Thisgivesaslopeofβ forthe 1 firstsegment, i.e.,fort ≤ T ,aslope ofβ +β for thesecond segment with i 1 (cid:2) 1 11 T ≤ t ≤ T , and a slope of β + m β for the segment between bordered 1 i 2 1 k=1 1k by Tm and Tm+1. For splines consisting of polynomial segments of degree p, Eq.(1)isexpandedto (cid:1) y = β +β t +...+β tp+ β (t −T )p +e, (2) i 0 1 i p i pk i k + i k where β are the p−th degree regression coefficient. (x)p are known as trun- p + catedpowerfunctions,i.e.,Eq.(2)givesasplinewithtruncatedpowerbasisof degree p[34]. These are the “grafted” polynomials considered by Fuller [11]. For p = 2, El Faro et al. [10] used these to model lactation curves of dairy cows, and Meyer[25]appliedsuchsplinesinmodellingseasonalchangesinvariancesof maturecowweightsinaRRanalysis. 2.1. Penalisedandsmoothingsplines Fitting spline functions as given above can yield “wiggly” estimates of curves, especially if p is low and there are many knots. Moreover, the choice of knots can have a substantial influence on estimates. These problems can be alleviated by imposing a penalty, which reduces the influence of the re- gression coefficients β .Thisyieldsasmoothercurve,andthepenalty isthus pk referred to as “roughness” penalty [13]. Ruppert et al. [34] suggested to con- strain the sum of squared regression coefficients β in Eq. (2), imposing a (cid:2) pk penalty of λ β2 , added to the criterion to be minimised in least-squares or k pk maximumlikelihoodestimation,tosmooththeresultingcurve.Thesmoothing parameter, λ, governs the degree of smoothing, small values yielding curves closetotheunpenalisedfit,andlargevaluesresultinginestimatesmoresimilar to the corresponding parametric regression. With smoothing, choice of knots islesscrucial thanforλ= 0. Penalised splines are readily accommodated in a mixed model framework. In essence, imposing a roughness penalty yields a system of equations where B-splinerandomregression 477 the penalised regression coefficients are estimated as random effects, i.e., shrinking them towards their mean, whilst the unpenalised coefficients are treated asfixedeffects. Theshrinkage isdetermined byλ,whichis equivalent to the ratio of error variance and variance due to the regression coefficients. This implies that a suitable value of λ can be estimated from the data in a REML analysis. A widely used spline is the cubic smoothing spline, consid- eredindepthbyGreenandSilverman[13]andVerbylaetal.[40],whichgen- erallyfitsoneknotateachobservation.Thisisthesplinefunctionincorporated intheAsRemlsoftwarepackage[12],andmostRRanalysesinquantitativege- netic analysis using splines have fitted this type, though often with a reduced setofknots(e.g.[3,15,41]). Whiteetal.[41]reviewedtheresults ofVerbylaetal.[40], extending them tothecasewhereknotsarechosenindependentlyofthedatalocations,andpre- sented an application to the analysis of test-day records on dairy cows. They showed that a mixed model analysis fitting splines for each animal is equiv- alent to a RR analysis fitting unity, time at recording, and segment-specific functions of time as covariables. However, in contrast to “standard” RR anal- yses, this analysis considers the matrix of covariances among RR coefficients to be highly structured. Whilst the first two RR coefficients (β and β ) are 0 1 assumed correlated and have different variance, all other spline terms are as- sumed to have the same variance and to be independently distributed. This results in a total of only four covariance components among RR coefficients to be estimated. White et al. [41] emphasized that this fitted separate splines with the same degree of smoothing for each animal, but questioned whether the model might be improved by allowing for non-zero covariances between thefirsttwoandthesegment-specific coefficients. 2.2. B-splines Alternativestothetruncatedpowerbaseconsideredaboveexist.Acommon choice are B-spline functions, which yield equivalent fits, but have better nu- mericalproperties [34].Here,the“B”standsforbasis[2].B-splinescomprise a set of overlapping, smooth and non-negative functions, which are unimodal andsumtounity forallvalues oft.Likesplines withatruncated powerbasis, theycanhavedifferentdegree p. B-spline functions can be defined recursively. Basis functions of degree p = 0havevaluesofunityforallpointsinagiveninterval,andzerootherwise. 478 K.Meyer Forthek−thintervalgivenbyknotsTk andTk+1 withTk ≤ Tk+1, (cid:3) B (t) = 1 if Tk ≤ t ≤ Tk+1 (3) k,0 0 otherwise. Higherdegreebasisfunctions, B for p > 0,arethendetermined fromthe k,p valuesofthelowerdegreebasisfunctions, alreadyevaluated, andthewidthof theadjoining intervals betweenknots. Thegeneralrelationship is[2]: Bk,p(t)= Tkt+−p −TkTkBk,p−1(t)+ TkT+kp++p1+−1 −Tkt+1Bk+1,p−1(t). (4) Foreach p,therearealimitednumber ofnon-zero basis functions oflower order, which can beexploited whenevaluating Eq.(4). Efficient strategies are described in the relevant literature (e.g. [2]). Alternatively, for equally spaced knots, B-spline functions can be obtained as the difference between splines withabasisoftruncated powerfunctions; seeEilersandMarx[9]fordetails. Foragivenrangeoft,T toT ,divisionintomintervalsrequiresspecifica- 0 m tionofm−1internalknots.Togetherwiththetwoexternalknots(T andT ), 0 m thisyieldsm+1knotpointsandm+pnon-zerofunctionsB tobeconsidered. k,p Evaluation of Eq. (4), however, requires p additional knots to be specified at each side of the interval, so that, for calculation, we have 2p + m + 1 knots intotal.For“uniform”B-splines withknotsplacedatequalintervals∆,Eilers andMarx[9]suggest toplacethese additional knotsatequal intervals outside therangeoft,i.e.atT −p∆,...,T −∆andT +∆,...,T +p∆.Thiswould 0 0 m m result in all B for a given p to have the same shape, i.e., simply be hori- k,p zontally shifted copies of each other. Alternatively, we could define the two external knots tohave multiplicity p+1, i.e.,set the padditional knots either side to the value of the corresponding external knot. This is common prac- tice for implementation of B-splines in statistical packages, for example in R [16]. For p ≥ 2, this would yield B of different shapes for those involving k,p only internal knots and those spanning external and additional knots, even if allinternalknotsarespacedatequalintervals. Likesplineswithatruncatedpowerbasis,B-splinescanbepenalisedtoob- tainsmoother curves. Thepenalty fortheformer, described above, resulted in a formulation analogous to a ridge regression. Corresponding penalties for a B-spline basis, suggested by Eilers and Marx [8], are difference based, where the degree of fit and order of difference for the penalty can be chosen inde- pendently. Eilers and Marx [8] refer to the resulting functions as “P-splines”, whilst other authors use the term P-splines for any penalised splines, regard- lessofbase.Again,penalisedB-splinesarereadilyestimatedwithinthemixed B-splinerandomregression 479 modelframework[7].Thecovariancematrixamongrandomsplinecoefficient isagainassumed tobehighly structured, withthestructure determined bythe differencematrixusedindefiningpenalties.Adetailedreviewoftheproperties ofB-splinesandP-splinesisgivenbyEilersandMarx[9]. 2.3. Splinesinrandomregression analyses Penalised splines work best with a certain degree of “overfitting”, i.e., a relatively large number ofknots compared to the number of distinct values of thecontinuouscovariabletornumberofobservations [34].Inthatcase,fitting random,subject-specific splinecurves(e.g.[6,14,40]) mayyieldareasonable descriptionofthecovariancestructurealongthetrajectory,inspiteoftherigid structure imposed on the covariance matrix among RR coefficients. However, computationalrequirementsforsuchmodelsinvolving manycoefficientstobe estimatedmaybelarge, inparticular ingeneticevaluation. Hence, random regression analyses of animal breeding data have mainly beencarriedoutfittingpolynomialbasisfunctionswithanarrowbasis,i.e.,few regression coefficients. A logical alternative to penalised splines as described above,thenistofitarandomregressiononalimitednumberofB-splinebasis functions, assuming the covariance matrix among RR coefficients is unstruc- tured, as outlined by Rice and Wu [33]. The same approach has been taken by Shi et al. [36] and James et al. [17], who combined it with reduced rank estimation toavoidoverparameterisation. ThisyieldsthestandardRRmodelcommonlyappliedinquantitativegenetic analyses of longitudinal or “repeated” records. Whilst such models are likely to entail more coefficients than a corresponding RR analysis with polynomial basis, thematricesinthemixedmodelaresparser. ForaB-spline ofdegree p, each row in the design matrix has at most p + 1 non-zero coefficients, i.e., computational requirements are less than for corresponding analyses with the samenumberofcoefficients forapolynomialbasis. Few guidelines on the choice of knots are available. For “grafted polyno- mials”, the obvious recommendation was the place knots where the curve to be modelled was expected to change [11]. Most applications of B-splines to modelflexiblerandomcurves[17,33,36]sofarusedcubicsplineswithequally spaced knots, and cross-validation or information criteria to choose between models involving different numbers of knots. Torres [38] chose a quadratic splinefor10equallyspacedintervalstomodellactationrecordsofdairycows, resulting in 12 basis function and RR coefficients. Eilers and Marx [9] em- phasized that with penalised B-splines, equal placement of knots is all that is 480 K.Meyer required,anddemonstrated goodinterpolation forareaswithfewobservations when the distribution of data over the range of t was highly uneven. How- ever,theyconsideredascenariowithrelativelymanyknots.Splinecurveswith fewer knots tend to be smoother, but give a less good fit to local details than those withmoreknot points. Concerns about thepotential impactofthenum- berandplacementofknotsforthemodelsuggestedbyRiceandWu[33]have been voiced [6]. Considering penalised splines with truncated power basis, Ruppert et al. [34] strongly advocated placement of knots at equal quantiles oft. CovariancematricesamongRRcoefficientscanbeestimatedusingstandard REML methodology or Bayesian analyses; see Meyer and Kirkpatrick [31] for a recent review. For a B-spline basis, the covariance functions defined by thematrices amongRRcoefficients aretensor products ofB-splines [33], i.e., bivariate B-splines (if f(x) is a function of x and g(y) is a function of y, the bivariatefunction p(x,y) = f(x)g(y)iscalledatensorproduct). In addition to covariance functions per se, we are interested in their eigen- functions and corresponding eigenvalues [31]. As emphasised by Kirkpatrick and Heckman [20], these can be estimated directly from the estimated ma- trix of coefficients of the covariance function, i.e., the matrix of covariances among RR coefficient, if we fit orthogonal basis functions, such as Legen- drepolynomials, ascovariables. AsB-splinesfunctions arenotorthogonal, an eigendecomposition of the covariance matrix ofRRcoefficient does not yield the eigenfunctions and -values of the covariance function. Hence, Rice and Wu [33] suggested to extract the required quantities numerically, by evaluat- ing the covariance function on a fine grid over the range of t, and calculating theeigenvalues and-vectorsoftheresulting covariance matrix. 2.4. Illustration Figure1showsthevaluesofcovariables foracontinuous scaledividedinto fourequal intervals. Thefirstcolumnrepresents thebasisfunctions for p = 0. Clearly, fitting such covariables would be equivalent to a multivariate analy- sis,considering observations ineachintervaltobeadifferenttrait.Regressing on B-spline functions of degree p, an observation has a non-zero effect for at most p+1intervals, i.e.,hasalocalratherthanglobalinfluence. Asshownin the second column, a linear B-spline basis (p = 1) results in blending of in- formation across twointervals, withvalues ofthetrajectory attheknots com- pletely determined bytheobservations atthesepoints. Atafixedeffectslevel, linear B-spline have been suggested to blend information for various ages at B-splinerandomregression 481 1 1 1 1 1 0.5 0.5 0.5 0.5 0 0 0 0 0 −1 1 1 1 1 1 0.5 0.5 0.5 0.5 0 0 0 0 0 −1 1 1 1 1 1 0.5 0.5 0.5 0.5 0 0 0 0 0 −1 1 1 1 1 1 0.5 0.5 0.5 0.5 0 0 0 0 0 −1 1 1 1 1 0.5 0.5 0.5 0 0 0 0 −1 1 1 1 0.5 0.5 0 0 0 −1 Figure 1. Values of covariables in random regression analyses using B-spline ba- sis functions, dividing the longitudinal scale into four equal intervals (first column: degree0, second column: degree 1, and third column: degree 2), together with cor- respondingvaluesforaquadraticsplinefunctionwithtruncatedpowerfunctionbase (fourthcolumn),andLegendrepolynomials(fifthcolumn,polynomialsofdegree0to 5shown). 482 K.Meyer recording within fixed contemporary groups, instead of the customary subdi- visionintodistinct subclasses [1,37]. Quadratic B-spline functions, given in the third column, have been calcu- latedrepeating theexternalknotsasdescribed above,resulting inthreeidenti- calknotsateachside.Thisgivesweightsofunityfortheexternalknotpoints. If equal-distant additional knots outside the range had been used, the end points ofthecurvewouldhavebeenthecombination oftwoestimated splines curves, with equal weights of 0.5. Similarly, the curve at the mid-point of t, is the straight average of the estimates of the third and fourth spline curve. For the point in the middle of the second interval (fourth tick-mark), weights for the spline functions two, three and four are 0.125, 0.750 and 0.125, re- spectively. For a cubic B-spline (not shown), there would be a seventh basis function spanning allfourintervals. Corresponding functions for a quadratic spline with truncated power basis are given in the fourth column, for t standardised to the interval from 0 to 1. When fitting a regression on Legendre polynomials (fifth column; values shownareunnormalisedLegendrepolynomials),observationsatallpoints(ex- cept for selected points where the polynomial has a value of zero) affect the estimate ofeach regression coefficient, i.e.,allobservations have global influ- ence.Moreover, weightsattheextremesareconsistently high. 3. MATERIALSANDMETHODS 3.1. Data Data comprised records for weights of Australian Angus cattle, measured frombirthto820daysofage.Datawereextractedforherdswithmostrecords per animal available, deleting any records on animals with less than three observations and records in single record subclasses; see Meyer [28] for de- tails. This yielded 84533 records on 20731 animals, which were progeny of 1348siresand8167dams. Figure 2 shows the distribution of records over ages at recording in 10-day intervals, together with corresponding mean weights. Not shown in Figure 2 are 17891 birth weight records. Genetic evaluation of beef cattle in Australia considersweightsattargetagesof200,400and600dayswithpermissibleage ranges of 80 to 300 days, 301 to 500 and 501 to 700 days, respectively. The distribution of records over ages of recording reflects this régime, with very few weights recorded between birth and 80 days of age and limited numbers of weights available after 700 days. On average there were 4.08 records per

Description:
covariance function / growth / beef cattle / random regression / B-splines. 1. INTRODUCTION . Splines are generally defined as curves which consist of individual segments which are splines_knots_penalties.pdf. [10] El Faro L.,
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.