Scaling, Similarity, and the Fourth Paradigm for Hydrology

Christa D. Peters-Lidard1, Martyn Clark2, Luis Samaniego3, Niko E. C. Verhoest4, Tim van Emmerik5, Remko Uijlenhoet6, Kevin Achieng7, Trenton E. Franz8, Ross Woods9

1Earth Sciences Division, NASA Goddard Space Flight Center, Greenbelt, MD 20771, USA
2Research Applications Laboratory, National Center for Atmospheric Research, Boulder, CO 80301, USA
3UFZ-Helmholtz Centre for Environmental Research, Leipzig, 04318, Germany
4Laboratory of Hydrology and Water Management, Ghent University, Coupure links 653, B-9000 Ghent, Belgium
5Water Resources Section, Delft University of Technology, Delft, 2628 CN, The Netherlands
6Hydrology and Quantitative Water Management Group, Wageningen University, 6700 AA Wageningen, The Netherlands
7Department of Civil and Architectural Engineering, University of Wyoming, Laramie, WY 82071, USA
8School of Natural Resources, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
9Department of Civil Engineering, University of Bristol, Bristol, BS8 1TR, UK

Correspondence to: Christa D. Peters-Lidard ([email protected])

Abstract. In this synthesis paper addressing hydrologic scaling and similarity, we posit that the search for universal laws of hydrology is hindered by our focus on computational simulation (the third paradigm), and assert that it is time for hydrology to embrace a fourth paradigm of data-intensive science. Advances in information-based hydrologic science, coupled with an explosion of hydrologic data and advances in parameter estimation and modelling, have laid the foundation for a data-driven framework for scrutinizing hydrological scaling and similarity hypotheses. We summarize important scaling and similarity concepts (hypotheses) that require testing, describe a mutual information framework for testing these hypotheses, describe boundary condition, state/flux, and parameter data requirements across scales to support testing these hypotheses, and discuss some challenges to overcome while pursuing the fourth hydrological paradigm. We call upon the hydrologic sciences community to develop a focused effort towards adopting the fourth paradigm and apply this to outstanding challenges in scaling and similarity.

1 Introduction

This synthesis paper is an outcome of the "Symposium in Honor of Eric Wood: Observations and Modeling across Scales", held June 2-3, 2016 in Princeton, New Jersey, USA. The focus of this contribution is the heterogeneity of hydrological processes, their organization, scaling and similarity, and the impact of the heterogeneity on water and energy states and fluxes (and vice versa). We argue here that the growth of hydrologic science, from empiricism (1st paradigm), via theory (2nd paradigm), to computational simulation (3rd paradigm), has yielded important advances in understanding and predictive capabilities; yet accelerating advances in hydrologic science will require us to embrace the 4th paradigm of data-intensive science, to use emerging datasets to synthesize and scrutinize theories and models, and to improve the data support for the mechanisms of Earth System change. The Fourth Paradigm is a concept that focuses on how science can be advanced by enabling full exploitation of data via new computational methods.
The concept is based on the idea that data-intensive science constitutes a new set of methods beyond empiricism, theory, and simulation, and is concerned with data discovery in the sense that researchers and scientists require tools, technologies, and platforms that seamlessly integrate into standard scientific methodologies and processes. By integrating these tools and technologies for research, we provide new opportunities for researchers and scientists to share and analyze data and thereby encourage new scientific discovery. As shown in Figure 1, the scientific method applied to hydrology is not a linear process; rather, because hydrology is already in the 3rd paradigm, empiricism (the 1st paradigm) and theoretical development (the 2nd paradigm) both lead to new theories and hypotheses that are embodied in computational models. These hypotheses may not be rigorously tested against many datasets, either because the datasets have not been gathered into an effective, accessible platform, or because the datasets require additional processing and information theoretic techniques to apply them to the model predictions for hypothesis testing. Further, as noted by Pfister and Kirchner (2017), hypothesis testing with models is fraught with challenges that require not only consideration of the data required to test a given hypothesis, but also careful consideration of how to encode hypotheses as uniquely falsifiable predictions (Figure 1). Advances in data science now allow the 4th paradigm to inject "big data" into the scientific method using rigorous information theoretic methods, without displacing the other parts of the scientific method.

Our focus here on scaling and similarity directs attention to one of the most challenging problems in the hydrologic sciences. As defined by Blöschl and Sivapalan (1995), scale is a "characteristic length (or time) of process, observation, model" and scaling is a "transfer of information across scales" (see also Bierkens et al., 2000; Grayson and Blöschl, 2000). Functional relationships between hydrologic variables may also exist, and these may be scale-independent (or scale-invariant). Similarity is present when characteristics of one system can be related to the corresponding characteristics of another system by a simple conversion factor, called the scale factor. We should note that the terms 'scaling' and 'similarity' used here are specific to the hydrology literature and distinct from the general notions of self-similarity, fractals, and emergent behavior in the nonlinear dynamics literature. Classic examples of similarity measures include the ratio of catchment areas (Willgoose et al., 1991; Smith, 1992), used to relate the flows of two catchments, and the topographic index ln(a/tanβ) (Beven and Kirkby, 1979), which relates topographic slope and contributing area to water table depth. Another example is the hillslope Péclet number (Berne et al., 2005; Lyon and Troch, 2007). Heterogeneity or variability in hydrology manifests itself at multiple spatial scales (e.g., Seyfried and Wilcox, 1995; Blöschl and Sivapalan, 1995), from local (O(1 m); e.g., macropores) to hillslope (O(100 m); e.g., preferential flowpaths) to catchment (O(10 km); e.g., soils) and regional (O(1000 km); e.g., geology) scales. Similarly, temporal variability is reflected at event, seasonal and decadal time scales (e.g., Woods, 2005).
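As a concrete illustration of the topographic index ln(a/tanβ) introduced above as a similarity measure, the short sketch below computes the index on a small grid. It is purely illustrative and not part of any analysis in this paper: the array names, the toy values, and the guard against flat cells are our own assumptions, and in practice the specific catchment area and slope would be derived from a digital elevation model.

```python
# Minimal sketch: computing the topographic index ln(a / tan(beta)) on a grid.
# "upslope_area" (specific catchment area per unit contour width, m) and
# "slope_rad" (local slope angle, radians) are hypothetical inputs that would
# normally come from a DEM analysis; a small floor on tan(beta) guards against
# division by zero on flat cells.
import numpy as np

def topographic_index(upslope_area, slope_rad, min_tan=1e-4):
    """Return ln(a / tan(beta)) for gridded specific upslope area and slope."""
    tan_beta = np.maximum(np.tan(slope_rad), min_tan)  # avoid division by zero
    return np.log(upslope_area / tan_beta)

# Toy example: a 3x3 patch with converging flow (large upslope area, low slope)
a = np.array([[ 5.,  10.,  5.],
              [20.,  80., 20.],
              [40., 400., 40.]])            # m (specific catchment area)
beta = np.deg2rad([[12., 10., 12.],
                   [ 8.,  5.,  8.],
                   [ 4.,  1.,  4.]])        # slope angle
ti = topographic_index(a, beta)
print(ti.round(2))   # higher values flag wetter, convergent parts of the landscape
```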
Understanding scaling and similarity requires understanding how the interactions among multiple processes across scales affect the (emergent) hydrologic behaviour at other space-time scales; such understanding underpins methods for computational simulation.

The scaling and similarity problem is nevertheless very difficult. As asserted by Dooge (1986), "within the physical sciences and the earth sciences there is and can be no universal model for water movement." Despite numerous attempts at integrating local models across soils (e.g., Kim et al., 1997), hillslopes (Troch et al., 2015) and watersheds (e.g., Reggiani et al., 1998, 1999, 2000, 2001), universal laws in hydrology and the required closure relations remain elusive, because the physics are likely scale-dependent (e.g., Bierkens, 1996) and the data required to test these hypotheses are either not readily available, not easily synthesized, or, even worse, would never be observable (Beven, 2006). Further, computational advances have enabled so-called "hyper-resolution" or, using an alternative term that is not necessarily equivalent, "hillslope-resolving" modelling (e.g., Chaney et al., 2016; Wood et al., 2011), but as noted in the discussion between Beven and Cloke (2012) and Wood et al. (2012), and later discussed in Beven et al. (2015), the ability to provide meaningful information from hillslope-resolving models is limited both by a lack of tested parameterizations at a given model scale and by a lack of data for model evaluation (e.g., Melsen et al., 2016a).

In principle, moving to finer spatial and temporal resolutions may improve accuracy simply by reducing the truncation error in the numerical solution of the system of partial differential equations. In an analogy with fluid mechanics and the atmospheric sciences, where "large eddy simulations" are designed to capture the most energetic motions and thereby reduce the sensitivity to turbulence closure, one might ask whether "hillslope-resolving" models might resolve the most energetic components (in an information theoretic/entropy sense) of the terrestrial water storage spectrum such that the closure problem may be simplified. As discussed in many of the studies cited above, topography is fractal, and this, combined with scaling between the pedon and the hillslope, drives much of the scaling behavior seen in hydrology. Most of the apparent fractal nature in relation to hydrology has been demonstrated at the scale of river networks (e.g., Tarboton et al., 1988), so a hypothesis that could be tested with data following the 4th paradigm is the extent to which resolving these river networks in models reduces information loss. Further, proposed scaling relationships may be appropriate above a given scale, but as we move downward in scale from watershed to hillslope to local, these relationships may break down.

These current tactics in the hydrologic sciences are representative of the third paradigm of scientific investigation (Hey et al., 2009), characterized by applying computational science to simulate complex systems. The so-called third paradigm builds on the earlier first (empirical) and second (theoretical) paradigms. As discussed by Clark et al. (this issue), computational science approaches to modeling hydrologic systems have been discussed for decades.
With the advent of high-resolution earth observing systems (McCabe et al., this issue), proximal sensing (Robinson et al., 2008), sensor networks (Xia et al., 2015), and advances in data-intensive hydrologic science (e.g., Nearing and Gupta, 2015), there is now an opportunity to recast the hydrologic scaling problem into a data-driven hypothesis testing framework (e.g., Rakovec et al., 2016a). By embracing such a framework, hydrologic analysis can become explicitly "scale-aware" by testing specific parameterizations at a given model scale. Now is the time for a fourth paradigm in hydrologic science. With this goal in mind, this paper addresses the following questions:

1. What are the key scaling and similarity concepts (hypotheses) that require testing?
2. What framework could we use to test these hypotheses?
3. What are the data requirements to test these hypotheses?
4. What are the model requirements to test these hypotheses?

2 Scaling and similarity hypotheses

Most scaling work to date has built on the Representative Elementary Area (REA) concept (Wood et al., 1988), and its extension to the Representative Elementary Watershed (REW) concept introduced by Reggiani et al. (1998, 1999, 2000, 2001); the REA/REW concept seeks to define physically meaningful control volumes for which it is possible to obtain simpler descriptions of the rainfall-runoff process (i.e., simpler than those at the point scale). An alternative, but related, concept is the Representative Hillslope (RH; Troch et al., 2003; Berne et al., 2005; Hazenberg et al., 2015). The REA/REW approach is conceptually similar to Reynolds averaging, and relies on the fundamental assumption that the physics are known at the smallest scale considered (e.g., Miller and Miller, 1956). Critically, the fluxes at the boundaries of the model control volumes require parameterization (the so-called "closure" relations). These closure assumptions are typically ad hoc, and include sub-grid probability distributions, scale-aware parameters, or new flux parameterizations. Fundamentally, these approaches conform to the third paradigm, in the sense that they take as given a set of conservation equations that govern behaviour at the fundamental (patch, tile, grid, hillslope, or REW) scale (Figure 2). Testing both the scaling and closure assumptions as hypotheses using data would move hydrology towards the fourth paradigm.

The examples above represent the classic "Newtonian" approach in hydrology, but the 4th paradigm advocated here is not specific to testing hypotheses derived from that approach and, as shown in Figure 1, represents an augmentation to the scientific method in hydrology. Foundational (Sivapalan, 2005; McDonnell et al., 2007) and more recent work (Thompson et al., 2011; Harman and Troch, 2014) on "Darwinian" hydrology has used scale and similarity concepts to synthesize catchments across scales, places and processes. As noted in McDonnell et al. (2007), there has been a call for a reconciliation of the Newtonian and Darwinian approaches, starting first in the ecology community (Harte, 2002), and we believe that moving to a 4th paradigm with the augmented scientific method depicted in Figure 1 will embody the wishes of Darwin from his "Structure of Coral Reefs" as quoted in Harman and Troch (2014): ". . .
In effect, what an immense addition to our knowledge of the laws of nature should we possess if a tithe of the facts dispersed in the Journals of observant travellers, in the Transactions of academies and learned societies, were collected together and judiciously arranged! From their very juxtaposition, plan, co-relation, and harmony, before unsuspected, would become instantly visible, or the causes of anomaly be rendered apparent; erroneous opinions would at once be detected; and new truths – satisfactory as such alone, or supplying corollaries of practical utility – be added to the mass of human knowledge. A better testimony to the justice of this remark can hardly be afforded than in the work before us."

An important avenue to advance hydrologic understanding and predictive capabilities is through attention to hypotheses of hydrologic scaling and similarity, i.e., different ways to relate processes and process interactions across spatial scales. One of the foundational works in hydrologic similarity is the topographic index (Beven and Kirkby, 1979); the topographic index defines local areas of topographic convergence, and is used to relate the probability distribution of local water table fluctuations to catchment-average surface runoff and sub-surface flow. Building on this topographic similarity, the index was extended to include soils and used to study runoff production (Sivapalan et al., 1987; Sivapalan et al., 1990), and further applied to examine scaling of evaporation (Famiglietti and Wood, 1994) and soil moisture (Wood, 1995; Peters-Lidard et al., 2001). Such controls of water table depth on runoff production and evapotranspiration at catchment scales represent just one hypothesis of similarity and scaling behaviour; an example alternative hypothesis, used in the VIC model (Liang et al., 1994), is the description of how sub-element variability in soil moisture affects the development of saturated areas in a catchment and the partitioning of precipitation into surface runoff and infiltration (Moore and Clarke, 1981; Dümenil and Todini, 1992; Wood et al., 1992; Hagemann and Gates, 2003). Other scaling hypotheses are used for other physical processes, for example, how small-scale variability in snow affects large-scale snow melt (Luce et al., 1999; Liston, 2004; Clark et al., 2011a), and how energy fluxes for individual leaves scale up to the vegetation canopy (de Pury and Farquhar, 1997; Wang and Leuning, 1998).

The critical issue here is the interplay between the scale of the model elements and the choice of the closure relations: as computational resources permit higher-resolution simulations across larger domains (Wood et al., 2011), more physical processes can be represented explicitly, and the closure relations must be tailored to fit the spatial scale of the model simulation. To some extent, such "hyper-resolution" approaches abandon the quest for physically meaningful control volumes that characterizes the REA and REW concepts, and the representation of sub-element processes in fully 3-D simulations of watersheds (e.g., Kollet and Maxwell, 2008; Maxwell and Miller, 2005) becomes less and less obvious, and perhaps less and less necessary. A key question now is whether "hyper-resolution" applications through explicit 3-D models, or (at least for some variables) with clustered 2-D simulations (e.g., the HydroBlocks of Chaney et al., 2016), provide reasonable representations of scaling and similarity.
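To illustrate how a similarity index of this kind becomes a testable sub-grid closure, the sketch below turns a distribution of topographic index values into a saturated contributing fraction using the classical TOPMODEL-type assumption that the local storage deficit decreases linearly with the local index. This is only a schematic of the hypothesis, not an implementation from any of the cited studies; the synthetic index distribution, the parameter value, and the function names are our own illustrative assumptions.

```python
# Minimal sketch of a TOPMODEL-style sub-grid closure: the fraction of a
# catchment that is saturated (and hence produces saturation-excess runoff)
# is read off the distribution of the topographic index.  Assumes the classic
# relation D_i = D_mean + m * (lambda_mean - lambda_i) between the local
# storage deficit D_i and the local index lambda_i; "ti_values" is a
# hypothetical sample of index values for the catchment.
import numpy as np

def saturated_fraction(ti_values, mean_deficit, m):
    """Fraction of the area where the local storage deficit drops to zero."""
    threshold = ti_values.mean() + mean_deficit / m   # lambda_i >= threshold => saturated
    return np.mean(ti_values >= threshold)

rng = np.random.default_rng(0)
ti_values = rng.gamma(shape=4.0, scale=1.8, size=50_000)   # synthetic index distribution

m = 0.03  # scaling parameter of the exponential transmissivity profile (m)
for d_bar in (0.00, 0.02, 0.05, 0.10):                     # catchment-mean deficit (m)
    print(f"mean deficit {d_bar:4.2f} m -> saturated fraction "
          f"{saturated_fraction(ti_values, d_bar, m):.2f}")
```

A fourth-paradigm test of such a closure would compare the saturated fractions it implies with mapped saturated areas or with explicit fine-scale simulations across a range of catchments and aggregation scales.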
Considering infiltration excess and saturation excess runoff generation processes, high-resolution numerical studies indicate that infiltration excess runoff does not appear to have an ergodic limit (e.g., Maxwell and Kollet, 2008), while saturation excess processes scale with the geometric mean of the subsurface saturated hydraulic conductivity (e.g., Meyerhoff and Maxwell, 2011). Similarly, one might imagine different scaling relations for evapotranspiration depending on the nature of controls due to radiation (topography), vegetation, and/or soil moisture (e.g., Rigden and Salvucci, 2015). For example, as recently shown by Maxwell and Condon (2016), the interplay of water table depths with rooting depths along a given hillslope exerts different controls on evaporation and transpiration, which links the water table dynamics with the land surface energy balance, even at continental scales. This finding is based on limited data, and would benefit from formal hypothesis testing in an information-based framework, as described in the next section.

3 A hypothesis testing framework for hydrologic scaling and similarity

As demand increases for hillslope-resolving or "hyper-resolution" modelling (e.g., Beven et al., 2015; Beven and Cloke, 2012; Bierkens et al., 2015; Wood et al., 2011, 2012), the question arises as to whether the physics in our models, the parameters that are used in the models, and the input data (e.g., "forcings") are adequate to support such endeavours (e.g., Melsen et al., 2016b). Following from Nearing and Gupta (2015), we can formulate a framework for testing hypotheses based on measuring information provided by a model (e.g., parameterizations based on similarity concepts) as distinct from information provided to a model (e.g., forcing data or parameters). We should note that this is not hypothesis testing in the traditional sense, but rather a framework for scrutinizing hydrological scaling and similarity hypotheses with data. This concept was demonstrated by Nearing et al. (2016), who evaluated the information loss due to forcing data, parameters, and physics in the North American Land Data Assimilation System (NLDAS) model ensemble. In this example, information was first measured using point data for soil moisture and evaporation, and compared to benchmark regressions (kernel density estimates of the conditional probability densities) that represent the upper bound of information available about a given variable from the forcing data alone, and from the forcing data and parameters together. As shown in Figure 2, we can measure the total information about a given variable z contained in observations (H(z), left bar), and then measure the information about that variable provided by a given model simulation (I(z; yM), right bar). The intermediate bars represent losses of information due to forcing data (boundary conditions) and due to parameters.

If we take this example, and expand it to conceptualize a framework for hypothesis testing in hydrology, we can imagine multiple instances of H(z) computed at different spatial scales, as well as multiple instances of mutual information I(z; yM), computed for models employing different representations of processes at that scale. One concrete example hypothesis described in the previous section is the use of TOPMODEL parameterizations for groundwater, versus representative hillslopes, versus "HydroBlocks" (Chaney et al., 2016), versus explicit 3-D modeling.
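The sketch below illustrates the bookkeeping behind Figure 2: estimating H(z) from observations and I(z; yM) from paired observations and simulations, with the difference indicating information the model fails to capture. It uses simple histogram (plug-in) estimators and synthetic series purely for illustration; Nearing et al. (2016) rely on more careful kernel-based conditional density estimates, and the variable names here are our own.

```python
# Minimal sketch of the information accounting in Figure 2: estimate the
# entropy H(z) of an observed variable and the mutual information I(z; y_M)
# between observations and a model simulation with simple histogram (plug-in)
# estimators.  "z_obs" and "y_model" are hypothetical paired series
# (e.g., soil moisture at one site).
import numpy as np

def entropy(x, bins=20):
    """Plug-in estimate of H(x) in bits from a 1-D sample."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(x, y, bins=20):
    """Plug-in estimate of I(x; y) in bits: H(x) + H(y) - H(x, y)."""
    counts_xy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts_xy[counts_xy > 0] / counts_xy.sum()
    h_xy = -np.sum(pxy * np.log2(pxy))
    return entropy(x, bins) + entropy(y, bins) - h_xy

rng = np.random.default_rng(1)
z_obs = rng.normal(size=5000)                                # "observed" variable
y_model = 0.8 * z_obs + rng.normal(scale=0.6, size=5000)     # imperfect simulation

h_z = entropy(z_obs)
i_zy = mutual_information(z_obs, y_model)
print(f"H(z)      = {h_z:.2f} bits")
print(f"I(z; y_M) = {i_zy:.2f} bits")
print(f"information not captured by the model: {h_z - i_zy:.2f} bits")
```

Repeating such calculations for models that differ in only one scaling or closure assumption, and at several aggregation scales, is the kind of comparison envisioned in this framework.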
Critical to this exercise is the availability of forcing data, such as precipitation, radiation, humidity, temperature and wind speed, that have sufficient information content at the scale being evaluated such that they can adequately characterize the variable (e.g., soil moisture) or process (e.g., evapotranspiration, runoff) being studied (e.g., Berne et al., 2004). Similarly, the parameters provided to the model must also contain information about the variable or process being studied at a particular spatial and temporal scale. The Nearing and Gupta approach provides a framework for explicitly measuring the information available from observations, comparing that to the information provided by a model, and attributing lost information to forcings, parameters and physics, and hence provides a rigorous method to test our physics assumptions by confronting them with observations. Clearly, this leads to requirements for data that can support such a framework.

4 Data requirements

As shown in Figure 1, the 4th paradigm for hydrology is characterized by the rigorous application of large datasets towards testing hypotheses as encapsulated in models. The process of constructing models requires observations both as input data and for model and process validation or hypothesis testing. A distinguishing characteristic of data for model and process validation will be that we are observing spatial and temporal patterns of the fluxes and states represented in our modeling framework, for example, soil moisture, snowpack or evapotranspiration. As discussed by McCabe et al. (this issue), there has been a dramatic increase in the type and density of hydrologic information that is becoming available at multiple scales, from point- to meso-scale and regional to global. For example, the growing number of remote sensing missions dedicated to observing the water cycle allows further development of (large-scale) hydrological models and data assimilation frameworks for more accurate soil moisture, evaporation, and streamflow prediction. In particular, there are exciting developments in meso-scale (i.e., hillslope to catchment) observations, which are critical for testing hypotheses about scaling (REA, RH, REW) by connecting point measurements, hydrological models, and remote sensing observations. Examples include recent advances in cosmic-ray neutron sensors (Franz et al., 2015; Köhli et al., 2016; Zreda et al., 2008) and distributed temperature sensing (DTS; Steele-Dunne et al., 2010; Bense et al., 2016; Dong et al., 2016) for soil moisture observations, the use of crowd-sourcing (De Vos et al., 2016) and microwave signal propagation from telecommunication towers for precipitation (Leijnse et al., 2007), and the rise in the use of unmanned autonomous vehicles to characterize the landscape at centimeter scale (Vivoni et al., 2014). These alternative data sources enhance our ability to observe, understand, and simulate the hydrological cycle. Advances in citizen science (Buytaert et al., 2014; Hut et al., 2016) and the use of so-called "soft" data for hydrological modeling (Van Emmerik et al., 2015; Seibert and McDonnell, 2002) show that even though these new data are collected on nontraditional spatiotemporal scales, they might give us new insights into how processes at different scales are coupled.
Advances in hydrogeophysical characterization of the subsurface (Binley et al., 2015), such as electrical methods, ground penetrating radar and gravimetry, offer non-invasive meso-scale information that can be used to provide parameters or to infer boundary conditions, states or fluxes. Recently, Christensen et al. (2017) demonstrated that dense airborne electromagnetic data can be used to map hydrostratigraphic zones, which is an encouraging capability. Imaging the subsoil may be feasible at local scales, but it is a challenge at river basin or continental scales. Hence, we encourage more joint efforts in hydrogeophysical imaging for integrated characterization of the subsurface. Combined, these observations may be used in a benchmarking exercise similar to Nearing et al. (2016). Synthesizing hydrogeophysical methods with point observations and laboratory/field techniques for estimating "effective" soil hydraulic functions/parameters is a challenging opportunity (e.g., Kim et al., 1997), but one which might be tractable using a data-driven hypothesis testing framework. These new data sources allow us to understand and apply scaling between data sources (point scale to remotely sensed data) and between model scales, and provide the critical data required to test alternative scaling hypotheses.

Beyond the new meso-scale observations, extensive catchment databases now exist to support hypothesis testing, including the TERENO (Zacharias et al., 2011), MOPEX (Duan et al., 2006), CONUS benchmarking (Newman et al., 2015a), GRDC (http://www.bafg.de/GRDC/EN/01_GRDC/13_dtbse/database_node.html) and EURO-FRIEND (Stahl et al., 2010) databases. Recent similarity studies (Sawicz et al., 2011) have systematically analyzed large numbers of catchments focusing on streamflow-oriented signatures such as the runoff coefficient, baseflow index and slope of the flow duration curve, and have then explored relationships between these signatures and model process time scales (Carrillo et al., 2011). Coopersmith et al. (2012) generalized this work with four nearly orthogonal signatures comprising aridity, seasonality of rainfall, peak rainfall, and peak streamflow, and demonstrated that 77% of MOPEX catchments can be described by only six classes defined by combinations of the four signatures. Clearly there is information contained in these catchment databases not just about the coevolution of climate (forcing) and landscape properties (parameters), but also about the physics of the catchment responses. Comparative hydrology (e.g., Kovács, 1984; Falkenmark and Chapman, 1989; Gupta et al., 2014) takes a needed first step in the direction of the fourth paradigm, and, following the framework described above, we can explicitly quantify the mutual information in the signatures, parameters and forcings to help elucidate these connections beyond classification.

One of the crucial factors that complicate scaling is the anthropogenic effect on catchments. Recent advances in modeling the co-evolution of the human-water system (see e.g., Troy et al., 2015; Ciullo et al., 2017) have focused on identifying generic key processes and relations. Yet, it is unknown how these relate to systems at larger (and smaller) scales. To arrive at new understandings of scaling and similarities in human-influenced catchments, studying these issues from a socio-hydrological point of view should be an integral part of the way forward (e.g., Van Loon et al., 2016).
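As a concrete illustration of the streamflow-oriented signatures referred to above, the sketch below computes a runoff ratio, a baseflow index, and a slope of the flow duration curve from daily series. The baseflow separation uses one common one-parameter recursive filter, and the synthetic data, constants and function names are illustrative assumptions rather than the definitions used in the cited studies.

```python
# Minimal sketch of streamflow-oriented signatures of the kind used in
# catchment-similarity studies: runoff ratio, baseflow index (BFI), and the
# slope of the flow duration curve (FDC) between the 33rd and 66th exceedance
# percentiles.  "precip" and "flow" are hypothetical daily series in the same
# units (e.g., mm/day); all constants are illustrative choices.
import numpy as np

def runoff_ratio(precip, flow):
    return flow.sum() / precip.sum()

def baseflow_index(flow, alpha=0.925):
    """BFI from a simple one-parameter recursive filter (one of several options)."""
    quick = np.zeros_like(flow)
    for t in range(1, len(flow)):
        quick[t] = alpha * quick[t - 1] + 0.5 * (1 + alpha) * (flow[t] - flow[t - 1])
        quick[t] = min(max(quick[t], 0.0), flow[t])   # keep 0 <= quickflow <= flow
    baseflow = flow - quick
    return baseflow.sum() / flow.sum()

def fdc_slope(flow, lower=33, upper=66):
    """Slope of the flow duration curve in log space between two exceedance levels."""
    q_low = np.percentile(flow, 100 - upper)    # flow exceeded 66% of the time
    q_high = np.percentile(flow, 100 - lower)   # flow exceeded 33% of the time
    return (np.log(q_high) - np.log(q_low)) / ((upper - lower) / 100.0)

# Toy data: a year of synthetic rainfall and a crudely damped flow response
rng = np.random.default_rng(2)
precip = rng.gamma(shape=0.5, scale=6.0, size=365)
flow = 0.4 * np.convolve(precip, np.exp(-np.arange(30) / 7.0), mode="full")[:365] / 7.0 + 0.1

print(f"runoff ratio       = {runoff_ratio(precip, flow):.2f}")
print(f"baseflow index     = {baseflow_index(flow):.2f}")
print(f"FDC slope (33-66%) = {fdc_slope(flow):.2f}")
```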
5 Modeling framework requirements

Embracing the fourth paradigm in hydrology will face several challenges. First, it is necessary to implement or extend a hydrologic modelling framework with sufficient flexibility to evaluate competing hypotheses of similarity and scaling behavior (Clark et al., 2011b). One possible framework is the Structure for Unifying Multiple Modeling Alternatives (SUMMA), recently introduced by Clark et al. (2015), which has the capability to incorporate alternative spatial configurations and alternative flux parameterizations. Frameworks like SUMMA, which pursue the method of multiple working hypotheses, enable decomposing complex models into the individual decisions made as part of model development, and focusing attention on specific decisions (e.g., related to scaling and similarity) while keeping all other components of a model constant, hence enabling users to isolate and scrutinize specific hypotheses. One confounding issue is that models with parameterizations designed to represent sub-grid processes may not add information in a manner proportional to increased information in the inputs, while models that have a single-column tile/sub-tile form may show a more direct relationship between information in inputs and information in outputs. Similarly, integrated models with lateral flow of water in surface and subsurface systems that generate runoff directly will have a different spatial sensitivity to the resolution of the input data than more traditional land surface models with no lateral flow and a parameterized runoff generation. Hence, the modeling framework must be able to isolate the role that surface and subsurface connectivity plays in processing information at different scales.

A second challenge is understanding how to deal with the different uncertainties/errors of observational products and hydrologic models when comparing them to study scaling behavior. Several papers have highlighted the problem of differing climatologies or sensitivities of remote sensing products (e.g., Albergel et al., 2012; Brocca et al., 2011), gridded meteorological products (Clark and Slater, 2006; Newman et al., 2015b), and streamflow observations (Di Baldassarre and Montanari, 2009; McMillan et al., 2010). A true correspondence of these remotely sensed variables with model results is often hampered by vertical mismatches in the soil column between the different products (Wilker et al., 2006), approximations in the structure of the hydrological model used, its parameterization and discretization, the initial conditions, and errors in the forcing data (De Lannoy et al., 2007). Because of this, modeled variables often do not correspond well to observations; nevertheless, similar trends and dynamics between the different products are found (Koster et al., 2009). In several data assimilation studies, the problem of differences in climatologies is resolved by bias-correcting the observations towards the model (e.g., Crow et al., 2005; Kumar et al., 2014; Lievens et al., 2015a, 2015b; Martens et al., 2016; Reichle and Koster, 2004; Sahoo et al., 2013; Verhoest et al., 2015). Yet, such (statistical) operations may not be appropriate for scaling studies. First, these methods rescale only the remotely sensed values, yet the uncertainties in these products need rescaling as well.
Second, depending on the bias-correction method used (ranging from correcting only the first moment to full CDF matching), different scaling relations may be found. Ideally, multiscale data should be used in a way that best demonstrates the ability of the models to reproduce processes at the scales at which those data are available, particularly with respect to reproducing attributes of the dynamics, such as the time rate of decorrelation using an information metric, and the mutual information across variables, space and time.

Testing hypotheses with multi-scale information also requires assimilation/modeling frameworks that allow integrating data into models at their native resolution, so that simulations and observations can be compared without the need to introduce ad hoc downscaling/upscaling rules. One such framework has recently been proposed by Rakovec et al. (2016b). This framework uses the multiscale parameter regionalization (MPR) technique (Samaniego et al., 2010) to link the resolutions of the various data sources with the target modeling resolution, while keeping a single set of model transfer parameters that are applicable at all scales. As a result, seamless, flux-matching simulations can be obtained. The MPR-based assimilation framework proposed by Rakovec et al. (2016b) is general and can be used within any land surface or hydrologic model. This framework was originally tested with the mesoscale hydrological model (mHM) (Kumar et al., 2013; Samaniego et al., 2010) in order to test hypotheses related to model transferability across scales and locations, as well as process descriptions. This data assimilation approach is general and can be used, for example within the SUMMA modeling framework (Clark et al., 2015), to test hypotheses related to the appropriate model complexity at a given scale. A model-agnostic MPR system called MPR-Flex has recently been applied to the Variable Infiltration Capacity (VIC) model to estimate seamless parameter and flux fields over CONUS (Mizukami et al., under review). This symbiosis of model parameterization (MPR-Flex) and simulation frameworks (e.g., SUMMA, mHM) is a very promising avenue to test scaling laws as well as the uncertainty decomposition described above. Finally, subjective modeling decisions (e.g., the choice of time step, spatial resolution, numerical scheme, study region, time period for calibration/validation, performance metrics) and their associated uncertainties require further attention (e.g., Krueger et al., 2012).

6 Summary and Next Steps

In this paper we review advances in hydrologic scaling and similarity. Beginning with the challenge of Dooge (1986), we posit that the search for universal laws of hydrology is hindered by our third-paradigm approach, and assert that it is time for hydrology to embrace a fourth paradigm of data-intensive science. Building on other synthesis papers in this issue (Clark et al., McCabe et al.), advances in data-intensive hydrologic science (e.g., Nearing and Gupta, 2015) have laid the foundation for a data-driven hypothesis testing framework for scaling and similarity. To achieve this goal, we have (1) summarized important scaling and similarity concepts (hypotheses) that require testing; (2) described a mutual information framework for testing these hypotheses; (3) described boundary condition, state/flux, and parameter data