ebook img

NASA Technical Reports Server (NTRS) 20140011280: Ensemble Data Assimilation Without Ensembles: Methodology and Application to Ocean Data Assimilation PDF

0.13 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview NASA Technical Reports Server (NTRS) 20140011280: Ensemble Data Assimilation Without Ensembles: Methodology and Application to Ocean Data Assimilation

Ensemble Data Assimilation without Ensembles: Methodology and Application to Ocean Data Assimilation Christian L. Keppenne, Michele M. Rienecker, Robin M. Kovach, and Guillaume Vernieres Global Modeling and Assimilation Office Code 610.1 NASA Goddard Space Flight Center Greenbelt, MD 20771 Abstract Two methods to estimate background error covariances for data assimilation are introduced. While both share properties with the ensemble Kalman filter (EnKF), they differ from it in that they do not require the integration of multiple model trajectories. Instead, all the necessary covariance information is obtained from a single model integration. The first method is referred-to as SAFE (Space Adaptive Forecast error Estimation) because it estimates error covariances from the spatial distribution of model variables within a single state vector. It can thus be thought of as sampling an ensemble in space. The second method, named FAST (Flow Adaptive error Statistics from a Time series), constructs an ensemble sampled from a moving window along a model trajectory. The underlying assumption in these methods is that forecast errors in data assimilation are primarily phase errors in space and/or time. Here, SAFE and FAST are applied to the assimilation of Argo temperature profiles into version 4.1 of the Modular Ocean Model (MOM4.1) coupled to the GEOS-5 atmospheric model and to the CICE model developed at Los Alamos National Laboratory. The results are validated against un-assimilated Argo salinity data. They show that SAFE and FAST are competitive with the ensemble optimal interpolation (EnOI) used to produce the latest ocean analysis by the Global Modeling and Assimilation Office (GMAO). Further, when only temperature data are assimilated, FAST is better able to improve the model salinity field than SAFE or EnOI. 1. Introduction Following a seminal paper by Evensen (1994), the ensemble Kalman filter (EnKF) has become a widely used research tool and research topic. At the time of writing, more than 400 papers have been published on the EnKF and many more on closely related ensemble data assimilation methods. While they differ in terms of the approach used to update or resample the ensemble of model states, all ensemble data assimilation methods require an ad hoc number of concurrent model integrations to estimate the distribution of background errors. This approach (also known as sequential Monte Carlo) is essentially an O(n) procedure, where n is the size of the model state vector. In contrast, the original Kalman (1960) filter algorithm propagates its background-error- covariance estimates by means of matrix multiplications of O(n3). Hence, ensemble methods are considered economical from a numerical standpoint, even though their cost can be seen as overwhelming in comparison to conventional data assimilation methods. Because of this, nearly all implementations of ensemble assimilation methods must compromise between ensemble size and model resolution. 1 Because the analysis and error estimates depend on the state of each ensemble member, ensemble methods are considered flow-adaptive. Another attractive property of these methods is that they provide estimates of the cross-field covariance between observed and unobserved model fields that are needed to update unobserved system variables. For example, ocean sub-surface fields can be updated even if only surface observations are available. In what follows we will use the term “multivariate” to refer to the ability to update fields of prognostic model variables other than the observed variable using estimates of the cross-field error covariances. Conversely, algorithms that update only a single observed model field will be referred to as “univariate”. The purpose of this paper is to introduce two data assimilation algorithms that, like the EnKF and other ensemble methods, are flow-adaptive and multivariate but, unlike ensemble methods, rely on only a single model trajectory to estimate all the necessary error-covariance information. As such, these methods obviate the requirement to compromise between ensemble size and model resolution faced by all EnKF implementations for higher resolution numerical models. The first algorithm (Space Adaptive Forecast-error Estimation: SAFE) is based on the concept of sampling an ensemble in space. It estimates error covariances from the joint spatial distribution of model variables in a single background state. The second algorithm (Flow Adaptive error Statistics from a Time series: FAST) estimates covariances from the distribution of an ensemble of high-pass filtered lagged instances of the model state vector sampled along the same trajectory. Because they do not require multiple integrations of the numerical model, SAFE and FAST are overwhelmingly faster and considerably less resource hungry than the EnKF and other ensemble assimilation schemes. The underlying assumption on which SAFE and FAST are based is that errors in the forecasts used in assimilation are primarily phase errors in space and/or time. For the ocean this assumption makes sense as the dominant source of error can be related to errors in surface forcing, i.e., the timing, intensity, or location of particular atmospheric synoptic events. Thus, the forecast (or background) errors can be related to the timing or intensity in the propagation or advection of oceanic anomalies. The algorithms are outlined in Section 2 together with the Goddard Earth Observing System (GEOS) modeling and data assimilation system developed at the Global Modeling and Assimilation Office (GMAO). The methodology is compared to conventional assimilation techniques in Section 3 where SAFE and FAST are applied to the assimilation of Argo temperature (T) profiles into the ocean component of the coupled modeling system. Unassimilated Argo salinity (S) observations are used to validate the assimilation. Conclusions follow in Section 4. 2. Assimilation Algorithms 2.1 Preamble Most sequential data assimilation algorithms are inspired by or derived from the Kalman filter (e.g., Analytical Sciences Corporation Technical Staff 1974) and involve the following steps, xf (cid:4) M(xa , f ), (1a) k k(cid:2)1 k(cid:2)1 y (cid:4) H (xt)(cid:3)ε , E(ε εT)(cid:4) R , (1b) k k k k k k k K (cid:4) PfHT[H PfHT (cid:3)R ](cid:2)1, (1c) k k k k k k k xa (cid:4) xf (cid:3)K [y (cid:2)H (xf)], (1d) k k k k k k where the subscript k refers to the kth of a sequence of assimilations, xf and xa denote the model forecast and analyzed states, M is the model operator, and f represents the forcing between times t and t . The k-1 k-1 k 2 observations, y , assimilated at time t are related to the true system state, xt, at time t by equation (1b) where k k k H is the observation operator, E denotes the expectation operator and (cid:5)(cid:5), with covariance matrix R , is the k k k observation error vector. The Kalman gain matrix, K , dictates how the observations and model forecast are k weighted in the analysis computation (equation 1d). It depends on H , R and the background error covariance k k matrix, P (cid:4)E((xt (cid:2)xf)(xt (cid:2)xf)T). (2) k k k Of course, since xt is unknown, P cannot be computed directly from equation (2) and must be estimated, either k explicitly or implicitly, by some other means. In fact, the procedure used to estimate P can be used to classify k data assimilation methods. In the EnKF and most ensemble methods inspired from it, P is estimated from the statistical distribution of an k ensemble of model forecasts, xf (cid:4) M(xa , f ), i(cid:4)1,(cid:2),n, (3) i,k i,k(cid:2)1 i,k(cid:2)1 started from an ensemble of n analyzed model states at time t . To filter spurious long-range covariances k-1 resulting from the finite ensemble size, nearly all ensemble data assimilation implementations follow Houtekamer and Mitchell (2001) and (dropping the k subscript) decompose P as P (cid:4) P (cid:6)C, (4) e where P represents the background covariances estimated from the ensemble of model states, C is a compactly e supported correlation matrix and (cid:6) denotes the Schur (i.e., element by element) product of two matrices. In a class of methods known alternatively as ensemble optimal interpolation (EnOI: e.g., Borovikov et al. 2005; Oke et al. 2005, 2010; Wan et al. 2010; Vernieres et al. 2012) or asymptotic ensemble filters, the time dependency is neglected and P is estimated from the statistics of one or more model run histories or from combinations of model histories. In many cases, EnOI methods are competitive with the EnKF because they make up for the performance degradation due to neglecting the forecast-error evolution by estimating error statistics from a much larger ensemble. Optimal interpolation (OI: Eliassen 1954) refers to an older class of data assimilation methods in which background-error covariances are modeled with Gaussian functions or other analytically or empirically derived functions. Cross-field covariances are generally neglected in these methods and only the model field corresponding to the observed variable is updated. 2.2 Space Adaptive Forecast error Estimation (SAFE) The SAFE algorithm attempts to combine the simplicity and cost effectiveness of OI with the large sample size of EnOI and the flow dependency of the EnKF. It estimates background error covariances by sampling each model field in the neighborhood of every grid point and using the resulting sample as if it came from an ensemble of model states. Because the size of the sampling neighborhood determines the covariance amplitudes, rescaling is necessary. Note however that an error-covariance rescaling step is also implicitly present in most ensemble assimilation techniques where the background-error covariance amplitude is determined by parameters of a covariance inflation procedure. 3 While the concept of sampling errors from an ensemble in space is heuristically convenient, it is impractical in geophysical fluid models with complicated boundaries. Instead, the procedure is approximated with the following algorithm. For simplicity of notation, we assume that the model state can be split according to (cid:12)Pvv Pvw(cid:9) x (cid:4)[v,w], P (cid:4)(cid:10) (cid:7), (5) (cid:11)Pwv Pww(cid:8) where v is an observed model field and w is unobserved. The generalization to more than two model fields is obvious. We also assume that all the data assimilated correspond to the same model field. In view of the above, the model update is split according to va =vf +PPvvHHHHT(cid:11)(cid:11)(cid:11)(cid:12)(cid:11)HHHHHHHPPvvHHHHT +++RRR(cid:8)(cid:8)(cid:8)(cid:9)(cid:8)(cid:1)1(cid:12)(cid:11)(cid:11)(cid:11)(cid:11)yy(cid:1)(cid:1)HHHH((vvvvf)))(cid:8)(cid:9)(cid:8). (6a) (cid:1)v wa =wf +PwvHT(cid:12)(cid:11)HPvvHT +R(cid:9)(cid:8)(cid:1)1(cid:12)(cid:11)y(cid:1)H(vf)(cid:9)(cid:8), (6b) =wf +Pwv(Pvv)(cid:1)1(cid:1)v. (6c) The application of equation (6c) is further simplified by assuming that the w background error in grid cell (i, j, k) is predominantly related to the v error in grid cell (i, j, k) and negligibly related to the v errors in other grid cells, thus neglecting the off diagonal elements of Pvv in (6c). Instead the unobserved model field is updated according to Pwv wa =wf + ijk (cid:1)v , i=1, ,I, j=1, ,J, k=1, ,K, (6d) ijk ijk Pvv ijk ijk where I, J and K denote the number of grid cells along the x, y, and z space dimensions, respectively. The first step is to estimate the background-error variance of the observed field with (cid:13) (cid:14) σ2 (cid:4)diag(Pvv)(cid:4)(cid:15)(v(cid:2)(cid:15)(v) 2), (7) vv where (cid:15) is a local 3D averaging operator. Several averaging operators were tested. For our implementation, repetitive application of a gridpoint (spatial) Laplacian smoother was found to be effective. The variance estimate is rescaled such that diag(HPvvHt) (cid:4)(cid:16)2 diag(R), (8) where the double vertical bar stands for an L2 vector norm. The parameter (cid:16) is prescribed. It is a scalar representing the global mean (asymptotic) target ratio of background error variances to data error variances and its role is similar to that of multiplicative covariance inflation parameters used in many EnKF implementations. Note that this formulation assumes a steady state regime where the average global mean error variance increase between successive assimilations equals the mean error variance decrease resulting from each assimilation step. 4 After estimating the background error variances, the update of equation (6a) is applied. This step corresponds to an OI analysis with the model background error variances calculated with equation (7). Let (cid:17) represent the 12 covariance of the v background errors at locations 1 and 2. It is estimated with (cid:17) (cid:4)(cid:19) (cid:19) (cid:18) , (9a) 12 v1 v2 12 (cid:18) (cid:4)c (max( 1 v (cid:2)v , 1 x (cid:2)x (cid:3) 1 y (cid:2) y (cid:3) 1 z (cid:2)z )), (9b) 12 0 L 1 2 L 1 2 L 1 2 L 1 2 v x y z where (cid:19) and (cid:19) are estimated with equation (7), the Ls are length scales, in units of the variable v and in the v1 v2 three space dimensions and c is the popular function given by equation (4.10) of Gaspari and Cohn (1999), or 0 any other compactly supported correlation function. The max function selects the largest of its arguments. Equation (9b) ensures that (cid:17) is 0 if either v differs significantly from v or if locations 1 and 2 are very distant 12 1 2 from each other. The intent is that in the majority of cases, (cid:1) =(cid:1)(cid:1) c ((v (cid:1)v )/L ), 12 v1 v2 0 1 2 v and the modulation of the background error covariances with the c function enforces error covariance 0 localization in a state-dependent manner. The local cross-field covariances of the v and w errors in every grid cell are estimated with (cid:13) (cid:14)(cid:13) (cid:14) (cid:19)2 (cid:4)(cid:15)(v(cid:2)(cid:15)(v) w(cid:2)(cid:15)(w)). (10) vw They are used to update the fields of unobserved variables according to equation (6b). The size of the regions over which the (cid:15)(cid:20)smoothing operator is applied is generally of little consequence. Figure 1 illustrates this fact. It shows time series of differences in RMS observation minus forecast (OMF) for Argo temperature (T) and salt (S) data when Argo T data are assimilated every five days into the coupled GEOS-5 system, using SAFE. Each panel shows the RMS OMF reduction from the corresponding RMS OMF from a control run without data assimilation, such that negative numbers indicate that the analysis is closer to the data than the control. Fig. 1a corresponds to the active (i.e., assimilated) T data and Fig. 1b to the passive (un-assimilated) S data. As in every other experiment discussed herein, the initial ocean state of each run (including the control run) is taken from the GMAO ocean analysis (Vernieres et al. 2012) and the experimental setup is as detailed in Section 3 and uses the model configuration discussed in Section 2.4. In the three cases shown, the (cid:15) operator consists in 5 (red), 10 (blue) and 20 (green) iterations of a diffusive filter. While the case with 20 iterations produces somewhat better results for S (larger RMS S reduction), the differences from the other two cases are small. In the SAFE run of Section 3, (cid:15) involves 10 diffusion steps. 2.3 Flow Adaptive error Statistics from a Time series (FAST) While SAFE is based on the idea of sampling an ensemble in space, FAST samples an ensemble in time. The analyzed state at time t is computed from n previous instances of the model state vector sampled from the k recent history of the current model run, (cid:21) (cid:22) X (cid:4) x (cid:2)x , j (cid:4)0,(cid:2),n(cid:2)1, (11a) k k(cid:2)j (k) x (cid:4) 1(cid:23)n(cid:2)1 x , (11b) (k) n j(cid:4)0 k(cid:2)j 5 where x =x(t ), x =x(t =t (cid:1)(cid:1)), etc., for a given time lag (cid:24)(cid:25) Arguably, (cid:24) should be such that x differs k k k(cid:1)1 k(cid:1)1 k k-1 significantly from x while it still contains information that is useful at t . Hence, it seems reasonable to set L k k to the assimilation interval, as is done in Section 3. While one could be tempted to compute the analysis from X without further preprocessing as though it were k made of the current state of the ensemble of model trajectories from an EnKF run, the resulting error covariance estimates would be dominated by the instances furthest away from the center of the time window since x is (k) the simple moving average of length n estimated at time t . To prevent this from occurring and improve the k(cid:2)n/2 assimilation performance, the lagged state instances are first high-pass filtered, (cid:21) (cid:22) X(cid:26) (cid:4) x (cid:2)x0 , j (cid:4)0,(cid:2),n(cid:2)1, (12) k k(cid:2)j k(cid:2)j where the sequence of x0 is obtained by low-pass filtering the model history. One simple way to do this is with k an exponential moving average (EMA), x0 (cid:4)(cid:27)x (cid:3)(1(cid:2)(cid:27))x0 , (13) k k k(cid:2)1 where 0(cid:28)(cid:27)(cid:28)1. A good choice to filter out time scales longer than half the time window is (cid:27)(cid:4)4/(n(cid:3)2). A simple moving average can be used, as in (11b), at the expense of more bookkeeping. The case with (cid:27) = 0.5 is essentially equivalent to forming the ensemble of first order differences over the time window. Finally, the X(cid:26) ensemble is resampled. (This step removes sequential ordering information but it is not strictly k necessary.) The resulting ensemble mean is removed, (cid:21) (cid:22) X(cid:26)(cid:26)(cid:4) x(cid:26)(cid:26) (cid:4)(cid:23)n(cid:2)1(cid:29) x(cid:26) , j (cid:4)0,(cid:2),n(cid:2)1, (14a) k k(cid:2)j (cid:21)i(cid:4)0 ij k(cid:2)i (cid:22) X(cid:26)(cid:26)(cid:26)(cid:4) x(cid:26)(cid:26) (cid:2)x(cid:26)(cid:26), j (cid:4)0,(cid:2),n(cid:2)1, (14b) k k(cid:2)j where the (cid:29) are drawn from a uniform random distribution. ij The FAST procedure makes the same calculations to estimate background error covariances and compute assimilation increments with the ensemble of equation (14b) as the EnKF makes with its ensemble of model states at time t (e.g., equation 2b-f of Keppenne et al. 2008). One notable difference is that FAST calculates k only one increment. Because a single model integration is involved, the ensemble size can be increased at a very minimal cost. Figure 2 illustrates the importance of time-filtering and resampling the lagged ensemble of background states. It shows the reduction in RMS OMF for both T and S with respect to the corresponding RMS OMF statistics from the control run in experiments assimilating Argo T every five days. Five cases are shown. The green lines correspond to the full FAST methodology (equations 11-14) with n=20 and (cid:27)=0.18 (period 10 EMA). The four other cases shown correspond to (1) the deviations from their ensemble mean of the most recent 20 unfiltered background states sampled every five days (magenta), (2 and 3) the deviations from their ensemble mean of the most recent 20 first order time differences (cyan) and second-order time differences (blue) of background states sampled every five days and (4) the EnOI with the static ensemble of 20 error EOFs used to produce the GMAO 6 analysis (see Section 2.4.2: red). The latter is estimated from a 186-member static ensemble according to the procedure detailed in Vernieres et al. (2012). Clearly, computing covariances from unfiltered background states, a procedure which corresponds to using signal covariances, results in the poorest performance for the passive S data but draws the model state closest to the active T data (possible overfitting). The performance obtained with the dynamic ensembles of most recent first and second order time differences is close to that obtained with the static ensemble of leading EOFs. Among the five cases shown, FAST with 50-day high-pass filtering (period-10 EMA removal from a time series with L = 5 days) performs best for salt and achieves a good compromise for temperature. Presumably, the 50- day filtering retains pertinent information and avoids aliasing to the lower frequencies but it is possible that better results could be obtained with a different FAST configuration. The results of Figure 2 are for the entire water column, but it is shown in Section 3 that the better FAST performance is most noticeable above the thermocline. 2.4 GEOS-5 Modeling and Ocean Data Assimilation System 2.4.1 GEOS-5 atmosphere-ocean general circulation model The SAFE and FAST algorithms are tested in Section 3 in the context of assimilating Argo temperature data into the GFDL MOM4.1 ocean model coupled to the NASA GEOS-5 AGCM and to the Los Alamos CICE ice model (all of which comprise the GEOS-5 AOGCM). The model configuration is the same as that used for the GMAO ocean analysis (Vernieres et al. 2012). In summary, the OGCM is run with a geopotential vertical coordinate on a ½° grid with a gradual meridional refinement to ¼° at the Equator and with 40 vertical levels. The grid is Cartesian south of 60°N and tripolar northward thereof. The AGCM grid is 1° × 1.25° with 72 levels. The CICE model is run on the same horizontal grid as the OGCM. The AGCM is constrained by replaying the Modern-Era Retrospective analysis for Research and Applications (MERRA: Rienecker et al. 2011) while the ocean observations are assimilated. The replay procedure replaces the AGCM state with the state of the analysis every six hours. 2.4.2 GEOS integrated ocean data assimilation system (iODAS) The components of the GEOS-5 AOGCM are connected to each other and to the GEOS integrated ocean data assimilation system (iODAS) with the Earth System Modeling Framework (ESMF). Besides SAFE and FAST, two other assimilation algorithms available in iODAS are an EnOI utilizing EOFs of short-term forecast errors (Vernieres et al. 2012) and a univariate OI (UOI) algorithm with adaptive error covariance localization. They are both used in Section 3 as comparison benchmarks. The parallel implementation of iODAS follows Keppenne and Rienecker (2003). The procedure used to compute the short-term forecast-error empirical orthogonal functions (EOFs) used by the EnOI involves a 186-member ensemble of forecast error estimates. It is detailed in Vernieres et al. (2012). The covariance model used by the UOI assumes that the forecast errors have a compactly supported distribution with variance equal to the variance of the ensemble of error estimates from which the EOFs used with EnOI are computed. This corresponds to equation 9 with (cid:19) and (cid:19) corresponding to the cumulative EOF standard v1 v2 deviation at locations 1 and 2. In the SAFE, FAST, EnOI and UOI experiments discussed in Section 3, background-error covariances are localized according to equation (4) where the element of C corresponding to the ith and jth model state variables at space-time locations (x, y, z, t) and (x, y, z, t), is given by i i i i j j j j c (cid:4)c (max( 1 r (cid:2)r , 1 x (cid:2)x (cid:3) 1 y (cid:2)y (cid:3) 1 z (cid:2)z ))c (1 t (cid:2)t ), (15) ij 0 L i j L i j L i j L i j 0 L i j r x y z t 7 where r and r are the adaptive localization variable at locations i and j and the r field is chosen to correspond to i j the observed variable. Note the similarity with equation (9b), except for the appearance of the temporal term, c (1 t (cid:2)t ). The latter results from differences between the measurement times and the analysis time. The 0 L i j t application of equation (15) to modulate the background-error covariances enforces a state-dependent error- covariance localization, even when the raw covariances are time-independent, as is the case with EnOI and UOI. 3 Application To validate the SAFE and FAST algorithms and to evaluate their usefulness as alternatives to the EnOI, we ran four AOGCM experiments assimilating T profiles from the broad-scale global array of temperature/salinity profiling floats (Argo: Gould et al. 2004). In three of these runs, namely in the SAFE, FAST and EnOI runs, both T and S ocean model fields are updated. In the fourth run, referred to as UOI run, only the T field is updated. [Note that the UOI is included for completeness, even though it has been known for some time that assimilation that does not update salinity carefully can give a poor analysis (e.g., Sun et al. 2007).] Besides the active Argo T data, passive Argo S data are also processed. The passive qualifier means that these data are not assimilated. Their only use is to quantify the effect of the assimilation on the unassimilated S variable. A control run in which all the T and S data are processed passively was also run. The runs cover a two-year period starting January 1, 2010. The ocean initial conditions are the same for all runs and come from the GMAO ocean analysis (Vernieres et al. 2012). The GEOS-5 replay procedure constrains the atmosphere to MERRA over the period of the runs. Every five days, data from a 5-day time window centered about the analysis time are processed. The observational error model is Gaussian in the horizontal and vertical. The observational error variance varies as a function of depth according to the magnitude of the vertical gradient. Details are provided in Vernieres et al. (2012). The assimilation increments are applied incrementally over a five-\day period, as in the incremental analysis update procedure of Bloom et al. (1996), but without rewinding the model clock (Keppenne et al. 2008). The SAFE run estimates its background-error covariances from equations (7) and (10) where the (cid:15) operator consists of 10 diffusion steps. To improve the performance in the low latitudes, SAFE error covariances are explicitly disabled when they involve a grid cell within the 10°N-10°S latitude band and another grid cell outside of it. This step is exclusively applied in the SAFE run to prevent the state variables at grid cells outside the waveguide from participating in the estimation of the background error variance ((cid:15) operator) at grid cells inside the waveguide. The FAST run applies equations (11-14) with a five-day lag, n=20 and (cid:27)=0.18. Only 20 lags are used to facilitate comparison with the EnOI, since the latter uses a static ensemble of 20 leading EOFs (Section 2.4.2). The UOI run takes its background temperature error variance estimate from the EnOI run (weighted sum of squared EOFs). The error-covariance localization scales are the same in all runs and are identical to those used in Vernieres et al. (2012). For each assimilation cycle, SSTs are assimilated first, followed by the in situ data. To illustrate the SAFE and FAST error covariance models, Figure 3 shows time sequences of zonal vertical cross sections at the Equator through the SAFE (Fig. 3 a-d) and FAST (Fig. 3 e-h) background error standard deviation estimates for the model’s ocean temperature. The succession is shown with a 3-month interval. The FAST and SAFE sections are qualitatively similar. Yet, the SAFE estimates are noticeably smoother because the number of grid cells participating in the SAFE spatial averaging is larger than the number of lagged state instances used in the FAST computations. Also note the general resemblance to the corresponding section through the time-independent background error standard deviation field used by both the EnOI and UOI runs of 8 Section 3 (Fig. 3i). The differences between the equatorial sections are largest in the Indian and Atlantic Ocean. The processing time of each run with data assimilation expressed in terms of the time taken by the control run on 30 Intel Altix Sandy Bridge nodes (360 2.8 GHz cores) is shown in Figure 4. UOI takes 70% longer than the control run while FAST and EnOI both take about twice as long as UOI and SAFE takes nearly 50% longer than UOI. For comparison, the best case scenario for a 20-member EnKF run in which ensemble members are run sequentially is also shown. Running ensemble members in parallel, while possible with the GEOS iODAS would require many more compute nodes. Figure 5 illustrates the background-error covariance models used in each run by showing marginal T and S assimilation increments corresponding to the impact of a unit T innovation at (0ºN, 140ºW, 180m) at the end of the runs (January 1, 2012). The top row of panels (a), (e) and (i) shows zonal sections through the corresponding marginal T increments in the SAFE (left), FAST (middle) and EnOI (right) runs. The 2nd row of panels (b), (f) and (j) shows corresponding meridional T sections. Panels (c), (g) and (k) (3rd row) and the bottom row of panels (d), (h) and (l) show zonal and meridional sections through the corresponding marginal S increments. The differences apparent in Figure 5 result primarily from differences in covariance modeling approach (static ensemble in EnOI, time-lagged ensemble in FAST, spatial pseudo ensemble in SAFE). However, differences also arise from differences in the state adaptive error-localization of equation (15) since the differences between the respective background states have increased over time (particularly evident in Figure 6). The amplitude differences between the SAFE, FAST and EnOI marginal gains reflect differences in the background error estimates at the observation location. In this example, there is more correspondence between the shapes of the marginal T and S increments from the EnOI (panels (i) and (k) and panels (j) and (l)) than those from SAFE or FAST. The amplitude of the T marginal increment is also largest in the EnOI run. Yet, the amplitude of the S marginal increment is relatively small in the EnOI run, reflecting lower covariance between the T and S error estimates at this particular observation location. To further illustrate how the SAFE, FAST and EnOI error-covariance models differ, Figure 6 shows the time evolution (sampled every three months) of zonal sections through the marginal S increment corresponding to a unit T innovation at the same Equatorial location considered in Figure 5. Not surprisingly since the EnOI estimates background covariances from a static ensemble, its marginal S gain at this location displays the least temporal variation. The latter result from how the background T field (r in equation (15)) changes with time. Conversely, the FAST marginal S gain varies the most with time as one could have expected because the corresponding background error covariances are high pass filtered by design and represent errors/uncertainties at periods shorter than 50 days in this case. Clearly, the FAST covariances are influenced by tropical instability waves which mostly occur between July and November and have wavelengths of 1000-2000 km and periods of 20-40 days (e.g., Willett et al., 2006). While the spatial ensemble involved in the SAFE background error covariance calculations also varies with the background fields, the resulting covariances only capture variability in space, not in time. Figure 7 quantifies the improvement (negative values) or worsening (positive values) over the control by showing to what extent the RMS OMF statistics differ from the corresponding statistics from the control run. RMS OMF differences are shown in each panel for the SAFE (blue), FAST (red), EnOI (green) and UOI (magenta) runs. Figure 7a corresponds to the active Argo T data, while Figs. 7b and 7c correspond to the passive Argo S data above and below 300 meters. While the four data assimilation methods perform similarly for T, FAST stands out for its better performance in terms of S, especially in the upper ocean (Fig. 7b). On the other hand, the underperformance of UOI, which degrades the model salt field compared to the control run, is especially striking in the thermocline (Fig. 7c). 9 The global RMS observation minus forecast (OMF) differences corresponding to the active T data are comparable in the four runs with T data assimilation (SAFE: 0.76 ºC, FAST: 0.88 ºC, EnOI: 0.76 ºC, UOI: 0.87 ºC), as they each explain approximately the same fraction of the T innovation variance of the control run (1.272 ºC2). This result is as expected given that each run sets (cid:16)=1 in equation (8) to facilitate the comparison. Figure 8 further illustrates the respective performance of each run with T assimilation. The difference of the RMS OMF (horizontally and over time) in the data assimilation runs from that in the control is shown as a function of depth for 2011 (blue: SAFE, red: FAST, green: EnOI, magenta: UOI). Negative numbers mean that the data assimilation brings the (5-day lead) forecast state closer to the data than the control and should be the norm if the data are unbiased. Fig. 8a corresponds to the active T data and Fig. 8b to the passive S data. For T, the level of improvement over the control is similar for all runs and is largest near a depth of 100 meters. For S, the results are markedly different. UOI is worse than the control over the entire water column and while SAFE, FAST and EnOI all improve upon the control over the entire column, FAST produces the largest improvement over the entire depth range. The horizontal distributions of the differences in RMS S OMF from those of the control during 2011 for each of the SAFE, FAST, EnOI and UOI runs are shown in Figure 9 for the upper 300 meters and in Figure 10 for depths greater than 300 meters. In the upper ocean, SAFE, EnOI and UOI all show significant degradations from the control in the Western Equatorial South Pacific (red areas in Figs. 9a, 9c, and 9d). FAST does better in the same area and performs best overall (Fig. 7b). Since the upper ocean salt content is heavily influenced by precipitation and evaporation and the corresponding fluxes are constrained to the MERRA forcing in all runs, including the control, it is not surprising that the analyses (which all assimilate T only) do not outperform the control at the surface and in the mixed layer above the halocline. Positive impacts on the model salinity from the T data assimilation are most likely to manifest themselves further away from the surface. Accordingly, the positive impact of the S field correction in the SAFE, FAST and EnOI runs is more apparent below 300 meters, especially in the Northern Atlantic, Gulf Stream and Kuroshio areas and in the area of the West Australian and Leeuwin currents in the Southeast Indian Ocean. While FAST performs best overall, it under-performs the control in the Indian sector of the Southern Ocean. A similar but less-pronounced underperformance in the same area is also noticeable for SAFE (Fig. 10a), but not for EnOI. Since the comparison is restricted to 2011, these regional comments are not definitive. 4. Conclusions When ensemble data assimilation schemes are applied to complex numerical models, the ensemble size is always a limiting factor or the object of compromise. The methodologies introduced here are designed to possess the main advantages of ensemble data assimilation methods such as the EnKF, namely the ability to update state variables even if unobserved (or not directly assimilated) and to estimate the spatial distribution of background errors, without incurring the cost of ensemble integrations. In EnKF implementations, the amplitude of ensemble perturbations is often the result of manually tuned covariance inflation. The approach used in SAFE and in FAST whereby the global ratio of the traces of the background error and observation error covariance matrices is specified at runtime achieves essentially the same purpose. FAST more closely resembles the EnKF since it is essentially an EnKF with a high-pass filtered proxy ensemble sampled from the model’s recent history. The spatial background error covariance representation of SAFE is a more radical departure from mainstream ensemble data assimilation methods. While SAFE is nearly as economical as conventional OI, our results hint that it is not as effective as FAST or EnOI in updating fields of un-observed variables. The better performance of FAST in this respect may stem from its error covariance model ability to capture sub-seasonal variability. While more work is required to fully evaluate the potential of the new methods, our initial results suggest that 10

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.