Journal of Statistical Software, June 2014, Volume 58, Issue 5. http://www.jstatsoft.org/

Smoothing Spline ANOVA Models: R Package gss

Chong Gu
Purdue University

Abstract

This document provides a brief introduction to the R package gss for nonparametric statistical modeling in a variety of problem settings including regression, density estimation, and hazard estimation. Functional ANOVA (analysis of variance) decompositions are built into models on product domains, and modeling and inferential tools are provided for tasks such as interval estimates, the "testing" of negligible model terms, the handling of correlated data, etc. The methodological background is outlined, and data analysis is illustrated using real-data examples.

Keywords: Bayesian confidence interval, cross-validation, functional ANOVA decomposition, Kullback-Leibler projection, mixed-effect models, penalized likelihood.

1. Introduction

Nonparametric function estimation using stochastic data, otherwise known as smoothing, has been studied by generations of statisticians. While scores of methods have proved successful for univariate smoothing, ones practical in multivariate settings number far less. Smoothing spline ANOVA models are a versatile family of smoothing methods that are suitable for both univariate and multivariate problems.

In this article, we introduce the package gss for R (R Core Team 2014) that embodies suites of functions implementing smoothing spline ANOVA models in the settings of Gaussian and non-Gaussian regression, density estimation, and hazard estimation. The first public release of gss dated back to 1999, when the total number of R packages on CRAN, the Comprehensive R Archive Network, was in the dozens. The package was originally designed as a front end to RKPACK (Gu 1989), a collection of Ratfor routines for Gaussian regression. Over the years, new functionalities have been added, numerical efficiency improved, and the user interface refined, and gss has now matured into a comprehensive package that can be used in a great variety of problem settings. As active development tapers off, gss is likely to remain stable in the foreseeable future, and it is time to compile an introductory document for the current version of the package.

A treatise on the theory and practice of smoothing spline ANOVA models can be found in a recently updated monograph by the author (Gu 2013), which contains detailed discussions of gss with extensive code illustrations. This document is intended as a software-oriented executive summary of the monograph, and for the exposition of the key methodological ingredients, technical discussions are unavoidable. Attempts have been made to keep the technical discussions lucid if not rigorous, yet readers unfamiliar with the methodology may still find the contents more mathematical than they might prefer.

The rest of the article is organized as follows. In Section 2, the methodological background is outlined, and in Section 3, the basic functionality of the gss suites is introduced with limited illustrations. Model configurations via explicit and implicit means are described in Section 4, and extra features such as semiparametric models and mixed-effect models are discussed in Section 5. After a brief discussion of the numerical engine in Section 6, three real-data examples are presented in Section 7 to demonstrate data analysis using gss facilities.
Section 8 notes some further software utilities without detailed elaboration, and Section 9 concludes the article with a brief summary. Chunks of executable R code are collected in the supplementary material. The output of the code, which includes a few figures, is however not reproduced in this document.

2. Methodological background

Observing $Y_i \sim N(\eta(x_i), \sigma^2)$, $i = 1, \dots, n$, on $x_i \in [0, 1]$, one may estimate $\eta(x)$ via the minimization of a penalized least squares functional,
$$\frac{1}{n} \sum_{i=1}^{n} \big(Y_i - \eta(x_i)\big)^2 + \lambda \int_0^1 \big(\eta''(x)\big)^2 dx. \quad (1)$$
The minimization takes place in a function space $\{f : \int_0^1 (f''(x))^2 dx < \infty\}$ of infinite dimension, and the solution is known as a smoothing spline. The minimizer of (1) is a piecewise cubic polynomial, so it is called a cubic smoothing spline.

The method in (1) has been generalized from univariate Gaussian regression to a variety of estimation problems on generic domains, which include Gaussian and non-Gaussian regression, density estimation, and hazard estimation. Key ingredients of the methodology are outlined below.

Smoothing splines are not to be confused with penalized regression splines, with which estimation takes place in a finite-dimensional function space pre-configured using basis functions of local support. Penalized regression splines are numerically more vulnerable to the curse of dimensionality, as the number of local-support basis functions explodes when the dimension of the domain goes up.

2.1. Penalized likelihood estimation

The penalized least squares functional in (1) is a special case of the general penalized likelihood functional,
$$L(\eta) + \lambda J(\eta), \quad (2)$$
where $L(\eta)$ is usually taken as the minus log likelihood of the observed data and $J(\eta)$ is a quadratic "roughness" functional; the minimization of (2) takes place in $H = \{f : J(f) < \infty\}$. With different configurations of $L(\eta)$ and $J(\eta)$, the minimizer of (2) may or may not be a piecewise polynomial, but it is called a smoothing spline for being the solution to a penalized minimization problem.

Function evaluation $\eta(x)$ appears in $L(\eta)$, so the evaluation functional $[x]f = f(x)$ should be continuous in $H = \{f : J(f) < \infty\}$. A space in which the evaluation functional is continuous is known as a reproducing kernel Hilbert space possessing a reproducing kernel $R(x, y)$, a nonnegative-definite function satisfying the reproducing property, $\big(f, R(x, \cdot)\big) = f(x)$, $\forall f \in H$, where $(\cdot, \cdot)$ is the inner product in $H$.

$J(f)$ is usually a square semi-norm in $H$, and a square full norm in an orthogonal complement $H_J$ of the null space $N_J = \{f : J(f) = 0\}$ of $J(f)$, $H = N_J \oplus H_J$. The reproducing kernel $R_J(x, y)$ of $H_J$ satisfies $\big(R_J(x, \cdot), f\big)_J = f(x)$, $\forall f \in H_J$. When $L(\eta)$ depends on $\eta$ only through $\eta(x_i)$, $i = 1, \dots, n$, as is the case in (1), the minimizer of (2) resides in the space $N_J \oplus \mathrm{span}\{R_J(x_i, \cdot), i = 1, \dots, n\}$, of finite dimension (Kimeldorf and Wahba 1971).

For $J(f) = \int_0^1 \big(f''(x)\big)^2 dx$ as in (1), $N_J = \mathrm{span}\{1, x\}$, and an orthogonal complement of $N_J$ is $H_J = \{f : \int_0^1 f(x) dx = \int_0^1 f'(x) dx = 0, \int_0^1 (f''(x))^2 dx < \infty\}$.
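Though the gss interface proper is not introduced until Section 3, readers wanting to see (1) in action can try base R's smooth.spline, which also fits a cubic smoothing spline by penalized least squares; a minimal sketch on synthetic data (test function and seed are of course arbitrary):

R> # sketch: a cubic smoothing spline per (1), via base R's smooth.spline
R> set.seed(5732)
R> x <- runif(100)
R> y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)
R> fit <- smooth.spline(x, y)  # lambda chosen by GCV, essentially (5) below with alpha = 1
R> xx <- seq(0, 1, length = 101)
R> plot(x, y, col = 3)
R> lines(predict(fit, xx))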
2.2. Functional ANOVA decomposition

On $X = X_1 \times X_2$, one has a functional ANOVA decomposition of $\eta(x) = \eta(x_{\langle 1\rangle}, x_{\langle 2\rangle})$,
$$\eta(x) = (I - A_1 + A_1)(I - A_2 + A_2)\eta = A_1 A_2 \eta + (I - A_1) A_2 \eta + A_1 (I - A_2)\eta + (I - A_1)(I - A_2)\eta = \eta_\emptyset + \eta_1(x_{\langle 1\rangle}) + \eta_2(x_{\langle 2\rangle}) + \eta_{12}(x_{\langle 1\rangle}, x_{\langle 2\rangle}), \quad (3)$$
where $I$ is the identity operator, $A_1$, $A_2$ are averaging operators acting respectively on arguments $x_{\langle 1\rangle}$, $x_{\langle 2\rangle}$ that satisfy $A1 = 1$; subscripts in angle brackets denote coordinates of a point on a multi-dimensional domain while ordinary subscripts are reserved for multiple points. Examples of averaging operators include $Af = \int_a^b f(x) dx/(b - a)$ and $Af = \sum_{i=1}^m f(x_i)/m$. The two-way ANOVA decomposition of (3) also implies a one-way ANOVA decomposition on $X$ with $A = A_1 A_2$, $\eta(x) = (I - A + A)\eta = A\eta + (I - A)\eta = \eta_\emptyset + \eta_x(x)$, where $\eta_x = \eta_1 + \eta_2 + \eta_{12}$ for $\eta_1$, $\eta_2$, and $\eta_{12}$ as in (3). Similar constructions in more than two dimensions can be done directly or recursively.

For $X_1 \times X_2$ discrete, $\eta(x_{\langle 1\rangle}, x_{\langle 2\rangle})$ is a matrix of treatment means usually denoted by $\mu_{ij}$ in standard ANOVA model notation, with (3) in the form of
$$\mu_{ij} = \mu_{\cdot\cdot} + (\mu_{i\cdot} - \mu_{\cdot\cdot}) + (\mu_{\cdot j} - \mu_{\cdot\cdot}) + (\mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu_{\cdot\cdot}) = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij},$$
where $\mu_{i\cdot} = \sum_j c_j \mu_{ij}$ for $\sum_j c_j = 1$, $\mu_{\cdot j} = \sum_i d_i \mu_{ij}$ for $\sum_i d_i = 1$, and $\mu_{\cdot\cdot} = \sum_{i,j} c_j d_i \mu_{ij}$.

ANOVA decompositions can be built into (2) via the configuration of $J(\eta)$ in the so-called tensor product reproducing kernel Hilbert spaces, and the resulting estimates are called tensor product splines. Schematically, one may write the ANOVA decomposition as $\eta = \sum_\beta \eta_\beta$, with $J(\eta)$ of the form $J(\eta) = \sum_\beta \theta_\beta^{-1} J_\beta(\eta_\beta)$, where $J_\beta(\eta_\beta)$ quantify the roughness of $\eta_\beta$ and $\theta_\beta$ are tuning parameters adjusting their relative weights in $J(\eta)$.

Selective term elimination in ANOVA decompositions helps to combat the curse of dimensionality in estimation; it also facilitates the interpretation of the fitted models. Models containing only main effects are known as additive models (Hastie and Tibshirani 1990).
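The decomposition (3) is easy to verify numerically; below is a minimal sketch (not from the article) applying equally weighted averaging operators to a function evaluated on a 50 x 50 grid:

R> # sketch: two-way functional ANOVA of eta on a grid, with A1, A2
R> # taken as simple means over the grid points of each argument
R> x1 <- seq(0, 1, length = 50)
R> x2 <- seq(0, 1, length = 50)
R> eta <- outer(x1, x2, function(u, v) sin(2 * pi * u) + v^2 + u * v)
R> A1eta <- matrix(colMeans(eta), 50, 50, byrow = TRUE)  # average out x1
R> A2eta <- matrix(rowMeans(eta), 50, 50)                # average out x2
R> eta0 <- mean(eta)                                     # constant term
R> eta1 <- A2eta - eta0                                  # main effect of x1
R> eta2 <- A1eta - eta0                                  # main effect of x2
R> eta12 <- eta - eta0 - eta1 - eta2                     # interaction
R> max(abs(eta0 + eta1 + eta2 + eta12 - eta))            # 0 up to rounding

The last line confirms that the four components reconstruct eta exactly, and by construction each main effect averages to zero over its own argument.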
2.3. Efficient approximation

As noted earlier, the minimizer of (2) often resides in $N_J \oplus \mathrm{span}\{R_J(x_i, \cdot), i = 1, \dots, n\}$, permitting numerical computation, but the computation is generally of the order $O(n^3)$. When $L(\eta)$ depends on $\eta$ via more than a finite number of function evaluations, however, the exact solution is usually intractable numerically.

Under mild conditions, as $n \to \infty$ and $\lambda \to 0$, the minimizer $\hat\eta$ of (2) in $H$ converges to the true $\eta_0$ at a rate $U(\hat\eta - \eta_0) = O_p(n^{-1}\lambda^{-1/r} + \lambda^p)$, where $U(\hat\eta - \eta_0)$ is a setting-specific quadratic loss, $r > 1$ codes the "smoothness" of functions in $H$ implied by $J(f)$, and $p \in [1, 2]$ depends on how smooth $\eta_0$ is; in the setting of (1), $U(g) = \int_0^1 g^2(x) f_0(x) dx$ for $f_0(x)$ the limiting density of $\{x_i\}$, $r = 4$, and $p = 2$ when $\int_0^1 \big(\eta_0^{(4)}(x)\big)^2 dx < \infty$. Now consider a space
$$H^* = N_J \oplus \mathrm{span}\{R_J(z_j, \cdot), j = 1, \dots, q\}, \quad (4)$$
where $\{z_j\}$ is a random subset of $\{x_i\}$. The minimizer $\hat\eta^*$ of (2) in $H^*$ converges to $\eta_0$ at the same rate as $\hat\eta$ when $q\lambda^{2/r} \to \infty$, hence is an efficient approximation of $\hat\eta$. The optimal convergence rate $O_p(n^{-pr/(pr+1)})$ is achieved at $\lambda \asymp n^{-r/(pr+1)}$, so one needs $q \asymp n^{2/(pr+1)+\epsilon}$, $\forall \epsilon > 0$.

One may calculate $\hat\eta^*$ for practical estimation, as is done in most of the gss suites, and the computation is of the order $O(nq^2)$. Unlike penalized regression splines, the space $H^*$ is data-adaptive rather than pre-configured, with resources allocated only where needed. This does not make the estimation problem any easier, but does make the numerical tasks much more manageable on high-dimensional domains.

Unless $q = n$, the minimizer of (1) in $H^*$ is no longer a piecewise cubic polynomial, but we will abuse the terminology and still call it a cubic spline.

2.4. Cross-validation

For the method in (2) to work in practice, a critical issue is the proper selection of smoothing parameters: the $\lambda$ in front of $J(\eta)$ and the $\theta$'s hidden in $J(\eta)$ for tensor product splines.

For Gaussian regression as in (1), write $\mathbf{Y} = (Y_1, \dots, Y_n)^\top$ and $\hat{\mathbf{Y}} = \big(\eta_\lambda(x_1), \dots, \eta_\lambda(x_n)\big)^\top$, where the dependence of $\hat\eta = \eta_\lambda$ on $\lambda$ is spelled out. One has $\hat{\mathbf{Y}} = A(\lambda)\mathbf{Y}$, where $A(\lambda)$ is the so-called smoothing matrix and the argument $\lambda$ also represents the $\theta$'s if present. One may select the smoothing parameters by minimizing the generalized cross-validation score of Craven and Wahba (1979),
$$V(\lambda) = \frac{n^{-1}\mathbf{Y}^\top\big(I - A(\lambda)\big)^2\mathbf{Y}}{\big\{n^{-1}\mathrm{tr}\big(I - \alpha A(\lambda)\big)\big\}^2}, \quad (5)$$
for $\alpha = 1$, which is designed to track the mean square error $\sum_{i=1}^n \big(\eta_\lambda(x_i) - \eta_0(x_i)\big)^2$. Cross-validation occasionally yields severe undersmoothing, and a fudge factor $\alpha = 1.4$ proves to be effective in curbing undersmoothing on "bad" samples while maintaining the generally favorable performance of cross-validation on "good" ones.

In most other settings, one may use a setting-specific Kullback-Leibler discrepancy $\mathrm{KL}(\eta_0, \eta_\lambda)$ as the performance measure for the minimizer $\eta_\lambda$ of (2), and schematically, $\mathrm{KL}(\eta_0, \eta_\lambda) = A(\eta_\lambda) + B(\eta_0, \eta_\lambda) + C(\eta_0)$, where $A(\eta_\lambda)$ can be computed, $C(\eta_0)$ can be dropped, and $B(\eta_0, \eta_\lambda)$ can be estimated by some version of cross-validation. The resulting cross-validation scores are typically of the form
$$V(\lambda) = A(\eta_\lambda) + \hat{B}(\eta_0, \eta_\lambda) = L(\eta_\lambda) + \alpha P(\eta_\lambda) \quad (6)$$
for $\alpha = 1$, where the likelihood $L(\eta_\lambda)$ decreases with $\lambda$ while $P(\eta_\lambda)$ moves in the opposite direction; see, e.g., Gu and Wang (2003). A fudge factor $\alpha > 1$ may improve performance in some settings but not in some others, though a larger $\alpha$ generally yields a smoother estimate.
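Returning to the Gaussian score (5), a small sketch (not from the article) of its evaluation for a given response vector and smoothing matrix; a symmetric $A(\lambda)$ is assumed, so that the numerator reduces to the mean squared residual:

R> # sketch: the GCV score (5); assumes a symmetric smoothing matrix A
R> gcv <- function(y, A, alpha = 1.4) {
+    r <- y - A %*% y                           # (I - A(lambda)) Y
+    mean(r^2) / (mean(1 - alpha * diag(A)))^2  # numerator over squared trace term
+  }

Minimizing such a score over a grid of lambda values mimics the selection that the gss fitting functions perform internally.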
2.5. Bayesian confidence intervals

The square norm $J(f)$ in $H_J$ and the reproducing kernel $R_J(\cdot, \cdot)$ are "inverses" of each other, and $\lambda J(\eta)$ in (1) acts like the minus log likelihood of a Gaussian process prior with $E\big[\eta(x)\big] = 0$ and $E\big[\eta(x)\eta(y)\big] \propto R_J(x, y)$. The minimizer $\hat\eta$ yields the posterior mean under such a prior, and one may also calculate the posterior standard deviation $s(x)$, yielding Bayesian confidence intervals, $\hat\eta(x) \pm 1.96\, s(x)$, for $\eta(x)$ (Wahba 1983). Doing the analysis on individual terms in an ANOVA decomposition, one gets the component-wise intervals (Gu and Wahba 1993).

When $L(\eta)$ is non-quadratic, such as with non-Gaussian regression, one may consider its quadratic approximation at $\hat\eta$, $Q_{\hat\eta}(\eta)$; the approximate posterior minus log likelihood $Q_{\hat\eta}(\eta) + \lambda J(\eta)$ is Gaussian with mean $\hat\eta$, based on which approximate Bayesian confidence intervals can be constructed.

2.6. Kullback-Leibler projection

ANOVA structures may be enforced via selective term elimination in estimation, and may be inferred from fitted models containing possibly redundant terms. The latter task resembles hypothesis testing, with $H_0: \eta \in \mathcal{H}_0$ versus $H_a: \eta \in \mathcal{H}_0 \oplus \mathcal{H}_1$; for an example, consider $\mathcal{H}_0 = \{\eta : \eta = \eta_\emptyset + \eta_1 + \eta_2\}$ and $\mathcal{H}_1 = \{\eta : \eta = \eta_{12}\}$.

Lacking sampling distributions in settings with infinite-dimensional nulls, the classical testing approach is of little help in this situation. Instead, an approach based on the Kullback-Leibler geometry was developed in Gu (2004): one calculates an estimate $\hat\eta \in \mathcal{H}_0 \oplus \mathcal{H}_1$, obtains its Kullback-Leibler projection $\tilde\eta \in \mathcal{H}_0$ by minimizing a setting-specific $\mathrm{KL}(\hat\eta, \eta)$ over $\eta \in \mathcal{H}_0$, then inspects an "entropy decomposition," $\mathrm{KL}(\hat\eta, \eta_c) = \mathrm{KL}(\hat\eta, \tilde\eta) + \mathrm{KL}(\tilde\eta, \eta_c)$, an exact or approximate identity, where $\eta_c$ is a degenerate fit such as a constant regression function or a uniform density. When $\mathrm{KL}(\hat\eta, \tilde\eta)/\mathrm{KL}(\hat\eta, \eta_c)$ is small, one loses little by cutting out $\mathcal{H}_1$.

3. Basic functionality: Estimation and inference

Listed in Table 1 are the gss suites, with brief descriptions. To be presented in this section are their basic user interfaces with core arguments and some defaults. The syntax of the gss suites has been designed to resemble that of the lm and glm suites in base R. Each suite has a fitting function and utility functions for the evaluation of the fits, and most also have a method project implementing the Kullback-Leibler projection or a variant thereof.

Suite       Purpose/Feature
ssanova     Gaussian regression
ssanova9    Gaussian regression, with correlated data
ssanova0    Gaussian regression, legacy routine
gssanova    Non-Gaussian regression, direct cross-validation
gssanova1   Non-Gaussian regression, indirect cross-validation
gssanova0   Non-Gaussian regression, legacy routine with indirect CV
ssden       Density estimation
ssden1      Density estimation, in high dimensions
sscden      Conditional density estimation
sscden1     Conditional density estimation, for large samples
ssllrm      Conditional density estimation, with discrete y in f(y|x)
sshzd       Hazard estimation, with or without covariate
sshzd1      Hazard estimation, with continuous covariate
sscox       Estimation of relative risk in proportional hazard models

Table 1: gss suites with brief descriptions.

3.1. Regression suites

For Gaussian regression, one may use ssanova or ssanova0. For non-Gaussian regression, one may use gssanova, gssanova1, or gssanova0. For Gaussian regression with correlated data, one may use ssanova9.

ssanova

The following sequence loads the NOx data included in gss and calculates the minimizer of (1) in $H^*$ of (4), with $\lambda$ minimizing $V(\lambda)$ in (5) for $\alpha = 1.4$.

R> data("nox", package = "gss")
R> fit <- ssanova(log(nox) ~ equi, data = nox)

The default $\alpha = 1.4$ is based on simulations of limited scales (Kim and Gu 2004), and may be overridden via an argument alpha. One may also choose to select $\lambda$ using the GML score of Wahba (1985) through method = "m". To evaluate the fit on a grid, one may try

R> xx <- sort(nox$equi)
R> est <- predict(fit, data.frame(equi = xx), se = TRUE)

where est is a list with $\hat\eta(x)$ in est$fit and $s(x)$ in est$se. The fitted curve can then be plotted along with the Bayesian confidence intervals.

R> plot(nox$equi, log(nox$nox), col = 3)
R> lines(xx, est$fit)
R> lines(xx, est$fit + 1.96 * est$se, col = 5)
R> lines(xx, est$fit - 1.96 * est$se, col = 5)
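The tuning overrides mentioned above can be exercised explicitly; a quick sketch revisiting the NOx fit (same data, argument names as described in the text):

R> fit.gml <- ssanova(log(nox) ~ equi, data = nox, method = "m")  # GML instead of GCV
R> fit.a1 <- ssanova(log(nox) ~ equi, data = nox, alpha = 1)      # GCV without fudge factor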
The default $q$ in (4) is set at $10 n^{2/9}$, again based on simulations (Kim and Gu 2004), and can be overridden via an argument nbasis. Due to the random selection of $\{z_j\}$ in (4), repeated calls to ssanova generally return slightly different fits unless $q = n$, but one may ensure the same selection of $\{z_j\}$ by using an argument seed, or one may pass the same $\{z_j\}$ into subsequent calls through id.basis = fit$id.basis.

To fit a tensor product spline on $X = X_1 \times X_2$ with all terms included, use

R> fit <- ssanova(y ~ x1 * x2)

and to evaluate $\eta_1(x_{\langle 1\rangle}) + \eta_{12}(x_{\langle 1\rangle}, x_{\langle 2\rangle})$ for $x_{\langle 1\rangle}$ in xx1 and $x_{\langle 2\rangle}$ in xx2, say, with standard errors, try

R> predict(fit, data.frame(x1 = xx1, x2 = xx2), se = TRUE,
+    inc = c("x1", "x1:x2"))

To calculate the Kullback-Leibler projection into the space of additive models, use

R> project(fit, inc = c("x1", "x2"))

which returns a list object with the element ratio containing the ratio $\mathrm{KL}(\hat\eta, \tilde\eta)/\mathrm{KL}(\hat\eta, \eta_c)$. To fit an additive model, use

R> fit <- ssanova(y ~ x1 + x2)

gssanova and gssanova1

Non-Gaussian regression models can be fitted using gssanova or gssanova1, which calculate the minimizer of (2) in $H^*$ of (4); gssanova uses direct cross-validation as in (6) for smoothing parameter selection, while gssanova1 employs the indirect cross-validation of Gu (1992a). Discussions concerning direct and indirect cross-validation can be found in Gu (2013, Section 5.2). Practical performances of gssanova and gssanova1 for different distribution families were compared in Gu (2013, Sections 5.4, 8.6) via limited simulations.

The syntax is similar to that of glm in base R, but the argument family only takes character strings from the list "binomial", "poisson", "Gamma", "inverse.gaussian", "nbinomial", "weibull", "lognorm", and "loglogis"; only one link is used for each family, one that is free of constraint. The likelihood term $L(\eta)$ in (2) is of the form
$$L(\eta) = \frac{1}{n} \sum_{i=1}^{n} l\big(\eta(x_i); Y_i\big).$$

The link and the minus log likelihood $l(\eta; y)$ for the exponential families binomial, Poisson, Gamma, and inverse Gaussian are listed in Table 2, along with those of the negative binomial family. For the binomial family, the response may be two columns of $(y_i, m_i - y_i)$, or a column of $y_i/m_i$ with $m_i$ entered via an optional argument weights. For the negative binomial family, the response is either two columns of $(y_i, \nu_i)$, or a column of $y_i$ with a common $\nu$ to be estimated along with $\eta(x)$.

Family       Response density $f(y)$                                              Link                             $l(\eta; y) = -\log f(y)$
Binomial     $\binom{m}{y} p^y (1-p)^{m-y}$                                       $\eta = \log\{p/(1-p)\}$         $-y\eta + m\log(1 + e^\eta)$
Poisson      $\lambda^y e^{-\lambda}/y!$                                          $\eta = \log\lambda$             $-y\eta + e^\eta$
Gamma        $y^{\alpha-1} e^{-y/\beta}/\{\beta^\alpha \Gamma(\alpha)\}$          $\eta = \log\mu = \log(\alpha\beta)$  $y e^{-\eta} + \eta$
Inv. Gauss.  $(2\pi\sigma^2)^{-1/2} y^{-3/2} e^{-(y-\mu)^2/(2\sigma^2\mu^2 y)}$   $\eta = \log\mu$                 $y e^{-2\eta}/2 - e^{-\eta}$
Neg. Binom.  $\Gamma(\nu+y) p^\nu (1-p)^y/\{y!\,\Gamma(\nu)\}$                    $\eta = \log\{p/(1-p)\}$         $(\nu + y)\log(1 + e^\eta) - \nu\eta$

Table 2: Likelihood for exponential families in gssanova.

The following sequence generates some synthetic binomial data and calculates a cubic spline logistic fit.

R> x <- (0:100)/100
R> eta <- 400 * x^5 * (1 - x)^3 - 1
R> p <- plogis(eta)
R> y <- rbinom(x, 3, p)
R> fit <- gssanova(cbind(y, 3 - y) ~ x, family = "binomial")

gssanova fits can also be evaluated using predict, with $\hat\eta(x)$ and $s(x)$ on the link scale.

R> est <- predict(fit, data.frame(x = x), se = TRUE)

The fit can then be plotted along with Bayesian confidence intervals on the probability scale.

R> plot(x, plogis(est$fit), type = "l", ylim = c(0, 1))
R> points(x, y/3, col = 3)
R> lines(x, plogis(est$fit + 1.96 * est$se), col = 5)
R> lines(x, plogis(est$fit - 1.96 * est$se), col = 5)

The method project also works with gssanova fits.
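For instance, with a hypothetical two-covariate binomial fit (the names y, m, x1, and x2 below are placeholders, not data from the article), the call pattern mirrors the ssanova usage shown earlier:

R> # hypothetical: binomial responses y out of m trials, covariates x1 and x2
R> fit2 <- gssanova(cbind(y, m - y) ~ x1 * x2, family = "binomial")
R> project(fit2, inc = c("x1", "x2"))$ratio  # small ratio supports the additive model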
The Weibull, log normal, and log logistic families are the same accelerated life models as implemented in the survreg suite of the survival package (Therneau and Grambsch 2000; Therneau 2014). The response is of the form $Y = (X, \delta, Z)$, where $X = \min(T, C)$ is the follow-up time for $T$ the lifetime of an item and $C$ the right-censoring time, $\delta = I_{[T \le C]}$ is the censoring indicator, and $Z < X$ is a possible left-truncation time at which the item enters surveillance. Of interest is the estimation of the hazard function $\lambda(t) = -d\log S(t)/dt$, where $S(t) = P(T > t)$ is the survival function. For the accelerated life models, $\log T$ follows a location-scale family distribution, with
$$\lambda(t; \mu, \sigma) = \frac{1}{\sigma t}\, \frac{f(z)}{1 - F(z)}, \quad \text{for } z = \frac{\log t - \mu}{\sigma},$$
where $f(z)$ is a probability density on $(-\infty, \infty)$ and $F(z)$ is the associated distribution function; the Weibull family has $f(z) = w e^{-w}$ for $w = e^z$, the log normal family has $f(z) = \phi(z)$, and the log logistic family has $f(z) = w/(1 + w)^2$ for $w = e^z$. One is to estimate the location parameter $\eta = \mu$ as a function of the covariate $u$, and the minus log likelihood is given by
$$l(\eta; Y) = -\delta \log\lambda(X; \mu, \sigma) + \int_Z^X \lambda(t; \mu, \sigma) dt,$$
where $Z = 0$ if there is no left-truncation.

Instead of a Surv(...) construct as for survreg, the response for gssanova should be a matrix of two or three columns, with $X_i$ in the first column, $\delta_i$ in the second, and $Z_i$ in an optional third. The scale parameter $\sigma$ is assumed to be common and is estimated along with $\eta(u)$, in the form of $\nu = \sigma^{-1}$. The following sequence loads the Stanford heart transplant data included in gss and fits a Weibull regression, with the age at transplant as the covariate.

R> data("stan", package = "gss")
R> fit <- gssanova(cbind(time + 0.01, status) ~ age, data = stan,
+    family = "weibull")

One has a proportional hazard model in a Weibull fit,
$$\lambda(t, u) = \frac{\nu}{t} e^z = \nu t^{\nu-1} e^{-\nu\eta(u)} = \lambda_0(t)\lambda_1(u),$$
where the relative risk $\lambda_1(u)$ is defined up to a multiplicative constant. Cutting out $\eta_\emptyset$ from $\eta = \eta_\emptyset + \eta_u$, one may evaluate and plot the relative risk $e^{-\nu\eta_u(u)}$.

R> est <- predict(fit, stan, se = TRUE, inc = "age")
R> ix <- order(stan$age)
R> age0 <- stan$age[ix]
R> plot(age0, exp(-fit$nu * est$fit[ix]), type = "l", ylim = c(0, 5))
R> lines(age0, exp(-fit$nu * (est$fit[ix] - 1.96 * est$se[ix])), col = 5)
R> lines(age0, exp(-fit$nu * (est$fit[ix] + 1.96 * est$se[ix])), col = 5)

ssanova0 and gssanova0

The original ssanova and gssanova referred to in Gu (2002) have been renamed ssanova0 and gssanova0, which calculate the minimizer of (2) in $H$ using the legacy RKPACK routines. The numerical treatments in RKPACK take advantage of a special structure that only holds for $q = n$, and the $O(n^3)$ algorithms in RKPACK often execute faster than the $O(nq^2)$ algorithms powering ssanova, gssanova, and gssanova1. Repeated calls to ssanova0 or gssanova0 yield identical results, as $q = n$ in this setting.

Unfortunately, important modeling tools such as the Kullback-Leibler projection and the mixed-effect models (see the section on optional arguments) are numerically incompatible with the algorithms implemented in RKPACK, and an $\alpha > 1$ in (5) is difficult to insert into the RKPACK routines. With limited capabilities compared to the alternatives, these legacy suites are largely obsolete.
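As the legacy suites were renamed rather than redesigned, they share the formula interface of their successors; for instance, the NOx fit of Section 3.1 could be reproduced in the full space (a sketch, assuming the same data frame):

R> fit0 <- ssanova0(log(nox) ~ equi, data = nox)  # exact minimizer in H, with q = n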
ssanova9

For Gaussian regression with correlated data, $\mathbf{Y} \sim N\big(\boldsymbol\eta(\mathbf{x}), \sigma^2 W^{-1}\big)$, where $\mathbf{Y} = (Y_1, \dots, Y_n)^\top$ and $\boldsymbol\eta(\mathbf{x}) = \big(\eta(x_1), \dots, \eta(x_n)\big)^\top$, one may set
$$L(\eta) = \big(\mathbf{Y} - \boldsymbol\eta(\mathbf{x})\big)^\top W \big(\mathbf{Y} - \boldsymbol\eta(\mathbf{x})\big)$$
in (2), where $W$ may depend on a few correlation parameters. One may use ssanova9 in this setting, with the smoothing parameters selected along with the correlation parameters by a cross-validation score derived in Han and Gu (2008).

An argument cov specifies $W$ in ssanova9 calls. For $W^{-1}$ known, use cov = list("known", w), where w contains $W^{-1}$. For longitudinal observations, use cov = list("long", id), where id is a factor; when data at the same id levels are grouped together in $\mathbf{Y}$, $W^{-1} = I + \gamma Z Z^\top$, where $Z Z^\top$ is block-diagonal with blocks of the form $\mathbf{1}\mathbf{1}^\top$. For serially correlated data, one may set cov = list("arma", c(p, q)) to use an ARMA(p, q) model for $\boldsymbol\epsilon = \mathbf{Y} - \boldsymbol\eta(\mathbf{x})$. More generally, one may enter $W^{-1}$ via cov = list(fun = fun, env = env, init = init), with $W^{-1}$ to be evaluated by cov$fun(gamma, cov$env); env contains constants and init provides initial values for the correlation parameters $\gamma$, which should be on scales free of constraint.
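For instance, a minimal sketch of a longitudinal fit (y, x, and the subject factor id here are hypothetical, not data from the article):

R> # hypothetical longitudinal data: repeated measures grouped by the factor id
R> fit <- ssanova9(y ~ x, cov = list("long", id))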
3.2. Density estimation suites

To estimate a probability density $f(x)$ on $X$, one may use ssden or ssden1. To estimate a conditional density $f(y|x)$ on $X \times Y$, use sscden or sscden1. For $Y$ discrete, one may use ssllrm to fit $f(y|x)$, which is regression with cross-classified responses.

ssden

A probability density $f(x)$ is positive and integrates to one. Consider a logistic density transform $f(x) = e^{\eta(x)}/\int_X e^{\eta(x)} dx$, which can be made one-to-one by cutting out $\eta_\emptyset$ in a one-way ANOVA decomposition $\eta = \eta_\emptyset + \eta_x$. One may set
$$L(\eta) = -\frac{1}{n}\sum_{i=1}^{n} \eta(X_i) + \log\int_X e^{\eta(x)} dx$$
in (2), where $X_i$ are independent samples from $f(x)$.

The following sequence draws samples from a normal mixture and fits a cubic spline to the log density.

R> u <- runif(100)
R> x <- rnorm(100) * 0.1
R> x <- ifelse(u > 1/3, x + 0.7, x + 0.3)
R> fit <- ssden(~ x)

The domain $X$ does contribute to estimation through $\int_X e^{\eta(x)} dx$, and it is a good idea to specify $X$ explicitly via domain = data.frame(x = c(a, b)); if unspecified, as is the case in the call above, the domain is set internally, extending the data range by 5% on both ends.

To use the fitted univariate density, one has the usual d-, p-, and q- functions, but no r-.

R> domain <- fit$domain$x
R> xx <- seq(domain[1], domain[2], length = 101)
R> plot(xx, dssden(fit, xx), type = "l")
R> qq <- qssden(fit, (0:10)/10)
R> pssden(fit, qq)
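Although no r- function is supplied, random draws from the fitted density are easy to improvise by inverting the fitted distribution function; a sketch using qssden:

R> # sketch: inverse-CDF sampling from the fitted density via qssden
R> x.new <- qssden(fit, runif(500))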