LLNL-JRNL-661076 Hierarchical Probabilistic Inference of Cosmic Shear M. D. Schneider, D. W. Hogg, P. J. Marshall, W. A. Dawson, D. J. Bard, D. Boutigny, D. Lang, J. Meyers September 23, 2014 The Astrophysical Journal Disclaimer This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes. LLNL-JRNL-661076 PreprinttypesetusingLATEXstyleemulateapjv.8/13/10 HIERARCHICAL PROBABILISTIC INFERENCE OF COSMIC SHEAR Michael D. Schneider1,2, David W. Hogg3, Philip J. Marshall4, William A. Dawson1, Joshua Meyers5, Deborah J. Bard4, Dustin Lang6 (Dated: Draft May 14, 2015) LLNL-JRNL-661076 ABSTRACT Point estimators for the shearing of galaxy images induced by gravitational lensing involve a com- plex inverse problem in the presence of noise, pixelization, and model uncertainties. We present a probabilistic forward modeling approach to gravitational lensing inference that has the potential to mitigatethebiasedinferencesinmostcommonpointestimatorsandispracticalforupcominglensing surveys. The first part of our statistical framework requires specification of a likelihood function for the pixel data in an imaging survey given parameterized models for the galaxies in the images. We derive the lensing shear posterior by marginalizing over all intrinsic galaxy properties that contribute to the pixel data (i.e., not limited to galaxy ellipticities) and learn the distributions for the intrinsic galaxy properties via hierarchical inference with a suitably flexible conditional probabilitiy distri- bution specification. We use importance sampling to separate the modeling of small imaging areas from the global shear inference, thereby rendering our algorithm computationally tractable for large surveys. With simple numerical examples we demonstrate the improvements in accuracy from our importance sampling approach, as well as the significance of the conditional distribution specification for the intrinsic galaxy properties when the data are generated from an unknown number of distinct galaxy populations with different morphological characteristics. Subject headings: gravitational lensing: weak; methods: data analysis; methods: statistical; catalogs; surveys; cosmology: observations 1. INTRODUCTION 2014). All analyses of which we are aware use point estimators for the 2D ellipticities of galaxies in a cat- All observations to date are consistent with a cosmo- alog to infer cosmic shear effects. From the estimated logicalmodelinwhichmassisclusteredonspatialscales galaxy ellipticity components one can compute a two- ranging from galaxy sizes of order a kiloparsec to hun- point angular correlation function that is related to the dredsofmegaparsecs,correspondingtoangularscaleson mass clustering power spectrum in the standard cosmo- theskyoftensofarcsecondstoseveraldegrees. Thecos- logical framework. mological mass distribution acts as a gravitational lens There are several drawbacks to existing cosmic shear for all luminous sources, which imparts inhomogeneous inference methods that may severely limit the amount image distortions because the mass distribution is clus- of information that can be extracted about the 3D cos- tered. In principle, the coherent lensing distortions of mological mass distribution and underlying cosmologi- luminous sources can be used to reconstruct the 3D cos- cal model, for a given set of observations (LSST Dark mological mass distribution over most of the observable Energy Science Collaboration 2012). Bernstein & Arm- volume of the universe. However, except near the most strong (2014) (BA14) give a review of the primary chal- dense and rare mass peaks, the gravitational lensing lenges in existing methods including low signal-to-noise, shearing of galaxy images is a percent-level effect per- large instrumental and observational systematics, finite turbing unknown at order unity intrinsic galaxy shapes. pixelsampling,uncertaintyinthemorphologiesofgalax- The ‘cosmic shear’ of galaxies therefore can only be de- ies prior to lensing distortions, and biased ellipticity es- tected through statistical correlations of large numbers timators in the presence of pixel noise. of galaxy images. AsinBA14andalsoSheldon(2014),Milleretal.(2007, Recently, cosmicshearhasbeenusedtoplacecompet- 2013) we consider a probabilistic model for cosmic shear itive constraints on cosmological model parameters (Jee inference that can in principle avoid many of the weak- et al. 2013; Kilbinger et al. 2013; Heymans et al. 2013; nesses in existing methods. Unlike in BA14 we initially MacCrann et al. 2014; Kitching et al. 2014; Kilbinger set aside issues of computational feasibility and aim to specify a statistical framework that is a complete de- [email protected] scription of the data (i.e., including all physical param- 1Lawrence Livermore National Laboratory, Livermore, CA, 94551,USA. eters that are needed to completely describe the mea- 2UniversityofCalifornia,Davis,Davis,CA95616,USA. sured pixel values in a photometric survey). While we 3Center for Cosmology and Particle Physics, New York Uni- seevalueinsimplywritingdownthecompletestatistical versity,NewYork,NY,USA. 4SLAC National Accelerator Laboratory, Menlo Park, CA framework for cosmic shear we were also able to identify 94025,USA. statistical sampling and computational algorithms that 5Kavli Institute for Particle Astrophysics and Cosmology, are likely to make our framework practical for upcom- StanfordUniversity,Stanford,CA94035,USA. ing surveys. We demonstrated a simple implementation 6Department of Physics, Carnegie Mellon University, Pitts- burgh,PA,15213,USA. 2 M. D. Schneider et al. of our framework in the GREAT37 community cosmic iesarefurthercomplicatedbyobjectblendingthat shear measurement challenge (Mandelbaum et al. 2013, may mimic intrinsic features of the galaxy popula- 2014) and will further demonstrate the computational tion(Dawsonetal.2014). Werequireaparametric performance of our framework in subsequent papers. model for galaxies and a likelihood function that Most of the parameters that are needed to describe can describe all such variations in galaxy images the pixel data are uninteresting for cosmology and can over the model parameter ranges that are impor- be marginalized out; this is an advantage of proba- tant for describing the data. With an incomplete bilistic modeling. If we can infer probability distribu- galaxy parametric model and without propagating tion functions (PDFs) for the nuisance parameters and all the information about the model in the likeli- marginalize—instead of either asserting distributions or hood function, it is well known that ‘model fitting else fixing the nuisance parameters to heuristically cho- biases’ can be catastrophic for cosmic shear (Voigt sen or maximum-likelihood values—we expect to obtain &Bridle2010;Melchioretal.2010;Bernstein2010; better (more accurate and also more conservative) infer- Kacprzak et al. 2014). ences about the cosmological parameters. Adopting this probabilistic approach moves us (relative to other meth- 2. Specification of the probability distribution for the ods)alongthebias–variancetrade-offtowardslessbiased galaxymodelparameterstobemarginalizedinthe inferences. Ofcourse,inthecontextoffullyprobabilistic shearinference. Whiletheaccuratespecificationof inference, without point estimators, there isn’t as clear modelsforgalaxyimagesmaysoundchallengingto a definition of “bias” and it may be difficult to put the the discerning reader, the specification of the joint long literature on weak-lensing biases into context here. distribution of intrinsic galaxy properties is per- However, we expect many of the known biases in point haps even more daunting. A primary aim of this estimators to be ameliorated when we permit the free- paper is to describe and demonstrate how a hier- dom to infer and marginalize out nuisances. In detail, archical model can allow meaningful inference of as we give more freedom to the model (more flexibility the properties of the galaxy distribution simulta- in the nuisance-parameter space), we will move to even neouslywiththeshear. Withouthierarchicalinfer- lessbiased(thoughalsoprobablylessprecise)inferences, ence(i.e.,inferringthedistributionsofnuisancepa- possibly at large computational cost. rameters) we are susceptible to an ‘overconfidence Explicit specification or inference of the distributions problem’ wherein asserting a prior that is too sim- of all model parameters not only has the potential to re- ple can yield inferences that appear more precise duce biases in the shear inference but also is the only than warranted. way to guarantee an optimal measurement (in the sense 3. Monte Carlo sampling of model parameters from that no other accurate inference method could yield the joint posterior distribution given multi-epoch tighter marginal posterior distributions on the cosmo- imaging of a large galaxy survey. Because both logicalmodelparameters). Mostpastcosmicshearanal- theshearandthePSFvaryincorrelatedwaysover yseshavenotspecifiedthedistributionsofgalaxyintrin- any given image, the inferences for different galax- sic properties and observing conditions (e.g., the PSF) iesarestatisticallydependent. Inastraightforward that are assumed in their cosmological inferences. But, probabilistic inference we would need to fit model because intrinsic galaxy properties and observing con- parameters simultaneously to all galaxy images, ditions do describe important features of the data, all whichisaformidablecomputationalchallenge. In- cosmic shear analyses must have (at least implicitly) stead we sample model parameters for each galaxy marginalized over some distributions in these parame- independently and perform a subsequent joint in- ters. Implicit marginalization over un-specified priors ference using importance sampling given interim cannotyieldanoptimalmeasurement. Saidanotherway, posterior samples for each galaxy image. without explicit marginalization of model parameters it is not possible to saturate the Cram´er-Rao bound (e.g. In Section 2 we outline our framework for the com- Kendall, M. G. & Stuart, A. 1969). In particular, anal- plete cosmic shear inference problem. In Section 2.1 we yses in which measured ellipticities of galaxies are aver- enumerate the components the framework and in Sec- agedtogetheroverskypatches(orsomethingequivalent) tion 2.2 we outline the key algorithms we use for Monte havemadeanimplicit(andstrong)assumptionaboutthe Carlosamplingofmodelparameterstorenderourframe- distribution of intrinsic shapes; this assumption (since work computationally feasible. In Section 3 we focus on unexamined) is likely to be sub-optimal at best. implementation details for just the inference of intrinsic Once a complete statistical model is specified, there galaxyproperties,leavingsimilardetailsforthePSFand are three key practical challenges for probabilistic shear lensing mass inferences to future work. We describe our inference, model for the distribution of intrinsic galaxy properties 1. Specificationofaninterimmodeldescribinggalaxy in Section 3.1 and give numerical examples of shear and morphologies,fluxprofiles,andspectralenergydis- ellipticity inferences in Section 3.2. In Section 4 we dis- tributions. Galaxyfeaturesarecomplexandmulti- cuss how to expand the inferences demonstrated for the faceted because of the evolution of galaxy proper- intrinsic ellipticity distribution to the PSF and lensing tieswithcosmictime, galaxymergers, andvarying mass models and summarize our conclusions. environments. The observable features of galax- 2. PART1: DESCRIPTIONOFTHESTATISTICAL 7 http://www.great3challenge.info/, Our GREAT3 submis- FRAMEWORK sions are under team ‘MBI’: http://great3.projects.phys.ucl. 2.1. Three conditionally independent branches ac.uk/leaderboard/team/14 Hierarchical probabilistic inference of cosmic shear 3 TABLE 1 Sampling parameters for the full statistical model. The central line separates sampled from conditional parameters. Parameter Description θ Cosmologicalparameters ΨIC Initialconditonsforthe3Dgravitationalpotential ΨLT Late-time3Dgravitationalpotential ψs 2Dlenspotential(givensourcephoto-z bins) As Parametersfortheline-of-sightsourcedistribution Πns,i PSFforgalaxyns observedinepochi Ωi Observingconditionsinepochi {ωns} Galaxymodelparameters;ns=1,...,ngal,s {αns} Parametersforthedistributionof{ωns} {ξns} Scalingparametersfor{ωns} m,τ Hyperpriorparametersfor{ξns} A Hyperparameterfor{αns}classifications {dns,i} Pixeldataforgalaxiesns=1,...,ngal,s inepochi G0|aη Priorspecificationfor{αns} s Sourcesample(e.g.,photo-z bin) W Surveywindowfunction danc,i AncillarydataforPSFinepochi p Priorparams. forobservingconditions a Priorparams. forA σpix,i Pixelnoiser.m.s. inepochi I Modelselectionassumptions Fig. 1.— Probabilistic graphical model for our complete frame- workforshearinference. Arrowsindicatestatisticaldependencies. We begin this section by enumerating the variables Grey shading indicates quantities that are not sampled. See Ta- anddependenciesofacompletestatisticalframeworkfor ble1fordescriptionsoftheparametersymbols. shear inference. In subsequent sections we demonstrate the inference of a subset of the variables in our frame- constraints on ΨIC in Section 2.2.1.0 but otherwise con- work with numerical models. We defer a more detailed fine our analyses to the inference of gravitational lensing examinationofsomeaspectsofthestatisticalframework quantities after projecting the 3D gravitational poten- to later publications. But we believe it is useful at this tial ΨLT over the line-of-sight distribution of lenses in a stage to list all contributions to the cosmic shear infer- survey. ence problem given the considerable literature on the We define the 2D lensing potential that describes the subject that has not yet converged on a unified statis- shearing (and magnification) of any sample of galaxies tical picture. in a survey by the following integral of the 3D potential Under a limited set of assumptions described below, ΨLT (see, e.g., equation 6.14 of Bartelmann & Schneider the statistical framework for cosmic shear separates into 2001)(also section3.2of Narayan& Bartelmann1996), three conditionally independent branches: 1) cosmolog- (cid:90) ical lensing mass density, 2) observational point-spread ψ¯ (x)≡W(x) daΨLT(x,a)K(a;A ), (1) s s functions, 3) intrinsic galaxy properties. In Appendix F we describe the computational imple- where a is the cosmological scale factor in the Freid- mentation of the framework in this Section. mann equation, x are the coordinates in the plane of the sky, W(x) is the angular window function of the 2.1.1. Lensing mass distribution survey, the subscript s denotes a particular sample of We start by specifying the parameters θ of the cos- ‘source’galaxiesthatarelensedbytheforegroundpoten- mological model (top right of Figure 1), sampled from a tialΨLT,andK isthelensingkernelincludingthelensing prior distribution Pr(θ) given all past cosmological ex- efficiency based on the distances between observer, lens, periments. Although not explicitly implemented in this andsources,andtheintegraloverthesourcedistribution paper, we expect θ to include parameters such as the with parameters A (see equation 6.19 of Bartelmann & s mean mass density Ω and the r.m.s. of mass density Schneider 2001). m fluctuations σ that primarily determine the amplitude TheparametersA canincludetheuncertaintiesinthe 8 s of the cosmic shear correlation function. survey redshift distribution (e.g., from photometric red- From these parameters, we can predict the 3D cos- shifterrors,seeLoh&Spillar1986;Connollyetal.1995; mological mass distribution, described by the 3D grav- Huterer et al. 2006) as well as other astrophysical sys- itational lensing potential Ψ. Assuming, for example, tematics that are redshift-dependent (such as intrinsic Gaussian initial conditions ΨIC for the 3D mass den- alignments of galaxies, see Catelan et al. 2001; Bridle & sity in the early universe and gravitational evolution King 2007). While both the galaxy redshift distribution according to General Relativity, the probability distri- andintrinsicalignmentcorrelationsmightbemoreprop- bution for the late-time 3D mass density ΨLT responsi- erly included in the specification of the properties of the ble for gravitational lensing of galaxies depends only on galaxy distribution (Section 2.1.3), we include parame- the cosmological parameters and the initial conditions, ters A for the line-of-sight integration of the potentials s Pr(ΨLT|θ,ΨIC). We discuss possibilities for inferring to connect with more common analysis conventions. 4 M. D. Schneider et al. Marginalizing the nuisance parameters in the line-of- potential ΨLT. We introduce the scaling parameters ξ ns sight projection, the 2D lensing potential ψ given the inAppendixBasacomputationalconvenienceinMCMC cosmologyandsurveysampleisdistributedaccordingto, sampling. The gravitational potential ΨLT can influence the distribution of galaxy properties because of assem- (cid:90) bly bias (Wechsler et al. 2002), which is the label in the Pr(ψ |θ,s,W)∝ dΨLT s literature for variations in galaxy (or surrounding dark (cid:90) matter halo) properties with the local surrounding mass × dΨICPr(cid:0)ΨLT|ΨIC,θ(cid:1)Pr(cid:0)ΨIC|θ(cid:1) density. In a simple description of the dependence of galaxy properties on mass density, it is well established (cid:90) × dA Pr(ψ |ΨLT,A ,W)Pr(A |θ,s,W). (2) that the amplitude of galaxy clustering correlations de- s s s s pendsongalaxycolor(e.g.,Zehavietal.2011). However, because cosmic shear is insensitive to the clustering of For a deterministic cosmological model (as empha- source galaxies to first order (Schneider et al. 2002), we sized in Jasche & Wandelt 2013), Pr(ψ|ΨLT,A,W) ∝ do not consider ΨLT dependencies further in this paper δD(cid:0)ψ−ψ¯(ΨLT,A,W)(cid:1),whereδDistheDiracdeltafunc- in any model distributions for the intrinsic galaxy prop- tion. Inserting this expression into Equation (2) yields a erties ω. trivialsolutionfortheintegraloverΨLT,leavingintegra- 2.2. Sampling methods to divide and conquer tionsoverthe3DmassdensityinitialconditionsΨIC and line-of-sight uncertainties A to evaluate the marginal 2.2.1. Interim sampling of model parameters for individual s distribution of the lens potential ψ given the cosmology sources θ. Efficient numerical evaluation of Equation (2) is po- The pixel data d in the vicinity of a galaxy n enters n tentially a major computational challenge for our frame- our framework only through the likelihood function work, whichwewilladdressinsubsequentpublications. Pr(d |ω ,ψ,Π). (4) n n 2.1.2. Point-spread functions Wecanreasonablyassumeuncorrelatednoiseinthecom- Toconnectthelenspotentialtotheobservableproper- mon case that sky noise (i.e., Poisson noise from sky ties of galaxy images, we must also specify the distribu- background)dominatesthenoiseinthepixelvalues,giv- tion of the point-spread function (PSF) realizations spe- ing a likelihood function, cimifigeincvtteson, taphnaedr(adtmiemteetece-tvrosarrΩiraeibslspeuo)cnhosbeas,esPrtvrh(inΠegsceoe|Ωnindg)i,.tiooFnpostricagstiveaeplniogΩcnh-, lnPr(dn)=−12ne(cid:88)pochs (cid:0)dn,i−σd¯2(ωn,ψ,Πi)(cid:1)2 +const., n,i i i=1 pix,n,i the PSF Π(Hn;Ω) can be computed at the focal plane (5) position Hn for any galaxy n. We model the PSF Πn,i where d¯ is the model prediction for the pixel values, i as distinct random variables for each galaxy location n indexes multiple observation epochs, and σ is the and epoch i. pix,n,i noise r.m.s. per pixel for object n in epoch i. Note the The observing conditions Ω in an observation epoch i i model prediction in the likelihood includes the PSF Π willnotbeknownperfectlybutwillbeconstrainedbyan- i for each observation i of a given galaxy n. This makes cillarydatad andadistributionPr(Ω |p)conditioned anc i the model predictions potentially expensive to compute on time-independent PSF model parameters p (e.g., the unlesstheconvolutionofthePSFwiththegalaxymodel alloweddistributionsofopticsdistortionsoratmospheric can be done efficiently. We will return to this issue in turbulence power spectra). Appendix A. In Equation (4) we have assumed that the likelihood 2.1.3. Intrinsic source properties functionforallsourcesinanimagecanbefactoredintoa Finally, the intrinsic properties ωn of a model for product of likelihoods for distinct sources. This will not galaxy n (e.g., ellipticity, size, brightness) are described be true in when the pixel noise is correlated for sources by the distribution that are adjacent on the sky (e.g., when unresolved flux contaminates the pixels of neighboring sources). Pr(ω |α,I), (3) n The model prediction d¯ for the pixel values of a where α are parameters in the distribution of the in- galaxyimagedependsonthecosmologicalmassdensityψ trinsic galaxy properties and I denotes galaxy modeling throughtheactionofgravitationallensingonthegalaxy assumptions (such as the form of the galaxy morphol- morphologyandflux(inanygivenbandpassaslensingis ogy parameterization). For example, α could describe achromatic). Forsimpleellipticalmodelsofgalaxies,the the width of the intrinsic (i.e., unlensed) ellipticity dis- ellipticityparameterinω isdegeneratewiththereduced n tribution of a given galaxy population. And in our con- shearg ≡γ/(1−κ). However,thestatisticaldistribution tribution to the GREAT3 challenge described in Man- of all elements of ω should be isotropic across different delbaum et al. (2014), the model assumptions I include galaxies in the sky (although this is not true for small modeling all galaxies as elliptical S´ersic profiles. It will fields, e.g., around a cluster), while the ellipticities of be important to keep I explicit as we consider possible lensed galaxy images are not. We therefore maintain a model-fitting biases in our framework (Voigt & Bridle conceptualdistinctionbetweenω andψinthemodelfor n 2010;Zuntzetal.2013;Violaetal.2014;Kacprzaketal. pixelvaluesofagalaxyimaged¯ eventhoughthisdistinc- 2014; Bruderer et al. 2015). tion is not computationally meaningful for the simplest In Figure 1, the galaxy properties ω also depend on models of galaxy images. See also the discussion follow- ns ‘scaling parameters’ ξ and, via α , the gravitational ing Equation (15) below. ns ns Hierarchical probabilistic inference of cosmic shear 5 The first step in our computational algorithm is to shears at different sky locations and redshifts. This is draw‘interimsamples’of(ω ,ψ,Π )viaMarkovChain distinct from conventional algorithms that involve cross- n n,i Monte Carlo (MCMC) from the likelihood in Equa- correlating galaxy ellipticity estimators at different sky tion (5) for a single source (or region of sky) identified locations. We instead seek a generative model for the in all available exposures of a survey. The method of shear correlations. So, the interim sampling of the vari- source identification for selecting the pixel values d in able shear or lens potential must happen at a computa- n Equation (5) is not important as long as our model pre- tional step following the interim sampling of individual dictions for the observed pixels allow for observationally galaxy and PSF properties. truncated source profiles or multiple overlapping sources Wecaninfervariableshearmodelswithoutacosmolog- (as will be needed for blended objects). Allowing for ical prediction of the mass density (which can be com- such selection effects may also admit interim sampling putationally expensive) by specifying an interim prior of the combined pixel data from different instruments or for the lens potential ψ for a given source distribution. s surveys. The only requirement is that the mathematical specifi- cationoftheinterimpriorforψ allowsforspatialcorre- s PSFmodelhierarchy— NotethatweinferadistinctPSF lations that encapsulate the possible cosmological inter- realization Πn,i for each galaxy n in each observation pretations. For example, even though the late-time lens epochi,whicharerelatedacrossgalaxiesinagivenepoch potential is not Gaussian distributed, we might specify by the common observing conditions Ωi. The PSFs in a multi-variate Gaussian prior on ψ for the purposes of different epochs are further related by another level of interimsampling. Correlationsinthelenspotentialsam- hierarchywithacommonPSFmodelwithparametersp. plescouldbecapturedwithacovariancemotivatedfrom So, we can also directly marginalize the likelihood over cosmology, the PSF realizations, nepoch(cid:90) (cid:89) Pr(d|{ω },ψ,p)= dΩ Pr(Ω |p) n i i (cid:90) (cid:96)d(cid:96) i=1 (cid:104)ψ(x)ψ(x(cid:48))(cid:105)= C (|x−x(cid:48)|)J ((cid:96)|x−x(cid:48)|), (9) ngal(cid:90) 2π (cid:96) 0 (cid:89) × dΠ Pr(Π |Ω )Pr(d |ω ,ψ,Π ) n,i n,i i n,i n n,i n=1 nepoch(cid:90) (cid:34)ngal (cid:35) (cid:89) (cid:89) = dΩiPr(Ωi|p) Pr(dn,i|ωn,ψ,Ωi) (6) where C(cid:96) is an angular power spectrum computed, e.g., i=1 n=1 via Equation (1), and J0 is the zeroth order Bessel func- nepoch tion of the first kind. While Equation (9) is motivated (cid:89) = Pr(di|{ωn},ψ,p) (7) fromcosmologicaltheory,wedonotconnectC(cid:96) toacos- mological model when interim sampling ψ. Instead, we i=1 useEquation(9)tointroducesomedegreeofspatialcor- ngal (cid:89) relationsintheinterimsamplesofψthatcanincreasethe (cid:54)= Pr(d |ω ,ψ,p) (8) n n efficiencyoflatersamplingstepsforconstrainingcosmol- n=1 ogy as we describe in the next section. Marginalizing the PSF realizations Π over all galax- In practice, we first sample galaxy model parameters n,i ies n in a given epoch i still yields a likelihood function without a correction for the applied shear and magnifi- that is separable for separate objects n in Equation (6), cation. Giventhegalaxymodelparameters,wecanthen conditioned on the observing conditions Ω . This shows reinterpret the parameters under an assumed lens po- i that more information about the observing conditions tential model because we know how lensing affects the Ω leads to smaller statistical correlations among the model for the galaxy light profile. We therefore ignore i marginal likelihoods for different galaxies independent the lensing parameters in the first step of our inference, from the actual PSF realizations. Under our framework including a model for lensing shears and magnifications therefore, it is the parameters of the distribution from only when combining inferences from distinct sources as which PSF realizations are drawn that are most impor- we describe next. tant to constrain rather than estimation and interpola- tion of PSF realizations at different image plane loca- tions. Lensing mass inference— The intrinsic galaxy properties ω and the PSF Π can be interim sampled with dis- n n,i 2.2.2. Hierarchical inference via importance sampling tinct parameters for each object. We might consider in- terim sampling uncorrelated shears and magnifications for every object, but in most cases the shear and the in- It is possible to infer model constraints independently trinsic ellipticity are strongly degenerate for isolated ob- for each galaxy and then combine these independent in- jects (the exception being resolved galaxy images with ferences in a hierarchical framework using the technique multiple morphological components that have different of‘importancesampling’(e.g.Geweke1989;Wraithetal. responses to the action of an applied shear). To in- 2009; Hogg et al. 2010). fer a shear or lens potential that varies over the sky, Our goal with the importance sampling algorithm is we then need to specify the model correlations between to evaluate the likelihood marginalized over individual 6 M. D. Schneider et al. galaxy intrinsic properties and PSF realizations8, the marginalized likelihood can be approximated by n(cid:89)gal(cid:90) Pr(dn|α,Ω,ψ)≈ ZKn (cid:88)PPr(rω(ωnk|α|I,ψ))PPrr((ΠΠnk||IΩ)), Pr(d|α,Ω,ψ)∝ dω Pr(ω |α) nk 0 nk 0 n n k (15) n=1 n(cid:89)epoch(cid:90) n(cid:89)gal × dΠ Pr(Π |Ω ) Pr(d|α,Ω,ψ)= Pr(d |α,Ω,ψ). (16) n,i n,i i n i=1 n=1 ×Pr(d |ω ,ψ,Π ), (10) n,i n n,i WhatwehaveachievedwithEquation(15)istheability tofitgalaxymodelsandPSFstoindividualgalaxies(via where MCMC) and then to combine the fit information from d≡(cid:8){d }nepoch(cid:9)ngal , (11) eachgalaxyintoinferencesonthedistributionsofintrin- n,i i=1 n=1 sic galaxy properties α and PSF parameters p. Equa- tion (15) is fast to compute once the K samples for each and Ω≡{Ω}nepoch. Using the identity, galaxyaregenerated. Forthefinalshearinferencewewill i=1 marginalizetheparametersα,butwithEquation(15)we (cid:90) (cid:90) f(x) have already addressed the primary computational bot- dxp(x)f(x)= dxp(x)g(x) , (12) tleneck in modeling images of large numbers of sources g(x) in a field. Although not part of the preceding derivation, we in- for arbitrary probability distributions p(x),f(x) and as- serted the lensing potential ψ into the list of conditional suming g(x) has nonzero probability mass over the do- dependencies in the numerator of Equation (15). This is mainwheref(x)isnonzero,wecanfactorEquation(10) because given interim posterior samples of galaxy model as, parameters, we can always reinterpret those model pa- rameters under a different assumption for the shear for ngal(cid:90) nepoch(cid:90) any model where we know how to calculate the action of (cid:89) (cid:89) Pr(d|α,Ω,ψ)∝ dω dΠ shear on the model galaxy9. n n,i To evaluate Equation (15) we need K independent n=1 i=1 samples from the interim posteriors for each galaxy. Pr(ω |α) Pr(Π |Ω ) × n n,i i WhilethiscanbeslowtogenerateviaMCMCwhenK is Pr(ω |I ) Pr(Π |I ) n 0 n,i 0 required to be large, we have found in simple tests that ×Pr(d |ω ,ψ,Π )Pr(ω |I )Pr(Π |I ). (13) even K =1 may be sufficient when assuming a constant n,i n n,i n 0 n,i 0 shearoveragivenareaofthesky. WeuseK =10inthe The final line of Equation (13) defines the ‘interim pos- numerical examples in Section 3. terior’ from which we draw samples when we analyze Importance sampling the lens potential— Given interim each galaxy independently (following Hogg et al. 2010). samples of the lens potential ψ we can then infer cos- We refer to Pr(ω |I ) and Pr(Π |I ) as ‘interim priors’. s n 0 i 0 mological parameter constraints from the sampled lens The interim priors can be chosen to make computations potentials for all galaxy samples in all fields using im- convenient. We have found flat or broad Gaussian dis- portance sampling as in Equation (15), but with a new tributions to be sufficient interim prior specifications. conditionalPDFthatincludesthedependenceofthelens We use the following algorithm to combine samples potential cosmological parameters θ and the cosmologi- from the interim posteriors for each galaxy into a hier- cal initial conditions ΨIC, archical inference of the global marginal posterior. For eachobjectnwegenerateK samples(cid:2)ω ,{Π }nepoch(cid:3), sampled from the interim posterior givnekn pixneilkdia=t1a for Pr(d |ΨIC,θ,W)∝ 1 (cid:88)N (cid:81)sPr(ψˆs,k|ΨIC,θ,W), galaxy n in all epochs, n N (cid:81) Pr(ψˆ |I) k=1 s s,k (17) 1 where, again, we have implicitly marginalized over the Pr(ω ,Π |d )= n n n Z line-of-sightdistributionerrorparametersAs fromEqua- n tion (2). ×Pr(d |ω ,Π )Pr(ω |I )Pr(Π |I ), (14) n n n n 0 n 0 Evaluating the numerator on the right-hand side of Equation (17) requires first realizing the 3D mass den- where Z is a normalization that we never need to com- n sity perturbations at some early time (e.g., by a tra- pute,andI denotesthesetofassumptionsencodedinto 0 ditional initial conditions generator for cosmological N- theinterimpriors. OncewehavethisK-elementinterim bodysimulations). TheinitialconditionsΨIC thenmust sampling for every galaxy n we can build importance- be evolved under gravity in an expanding universe (e.g., sampling approximations to various other marginalized byrunninganN-bodysimulation). Eachrealizationand likelihoods. Specifically,fortheintegralinEquation(13), evolution of ΨIC is a potentially expensive computation becauseΨICmustcoveratleasttheentirevolumeprobed 8 WemaynotalwayswanttoassumethatthePSFrealizations foragivengalaxyarestatisticallyindependentacrossobservation 9 Thisisadistinctconceptfromestimatingtheactionofshear epochs,butitisausefulclarifyingassumptionhere. ontheobservedpixelvaluesofagalaxyasinBA14. Hierarchical probabilistic inference of cosmic shear 7 by the survey in question (tens of gigaparsecs for future the shear field, apply the realized shear to each galaxy surveys) and contain enough small-scale mass resolution model and then compare the pixel model with the data to identify model galaxy locations with respect to the for each galaxy. We consider the computational sepa- cosmologicalmassdensity(typicallyofordertensofkilo- ration of each of these steps to be a key benefit of the parsecs). Thus evaluating Equation (17) for one pair of framework we present here. values of θ and ΨIC could require running an N-body Finally, one should keep in mind that the relationship simulation covering ∼6 orders of magnitude in dynamic between the lens potential defined over the sky and the range. Ultimately to infer cosmological parameter con- lensingdistortionsofmeasuredgalaxyimagesaredepen- straints we would run an MCMC chain sampling in val- dentontheastrometricsolutionforeachexposure. How- ues of ΨIC and θ, potentially requiring (cid:38) 104 expensive ever,weexpectthistobeanegligiblecontributiontothe cosmological N-body simulations10. shear inference in most cases. Possible centroid shifts of Inferring cosmological parameter constraints from the galaxy images should instead be captured by the PSF procedure just described may sound impractical. How- model. ever, we are encouraged by noting that, 2.3. Statistical framework summary • Jasche et al. (2014) demonstrated feasible MCMC In Sections 2.1 and 2.2 we introduced a number of pa- sampling of 3D density cells of the cosmological rameterswithnontrivialinterdependencies. Herewecol- mass density by using an analytic perturbative lect all the parameters we have introduced and summa- model for the evolution of the mass density from rizethestatisticalframeworkbypresentingthefactoriza- Gaussian initial conditions, tionoftheunmarginalizedjointposteriorforallsampled model parameters, • upcoming surveys such as LSST are already con- sidering in excess of 104 expensive cosmological Pr(θ,{ψ },{Π ,Ω },{ω ,α ,ξ },A,τ,m,a |{d },X)∝ s i i n n n η n simulations for parameter inferences (Dodelson & Pr(θ)·Pr({ψ }|W,{s},θ)·Pr(A|a) Schneider 2013; Taylor et al. 2013), s (cid:34)ngalnepoch (cid:35) (cid:89) (cid:89) • several post-processing methods have shown × Pr(Π |Ω ,I)·Pr(d |ω ,ξ ,ψ ,Π ) n,i i n,i n n s n,i promise for reducing the computation times n=1 i=1 in generating ensembles of cosmological simula- (cid:34)ngal (cid:35) tions(Schneideretal.2011;Angulo&White2010; (cid:89) × Pr(ω |α ,I)·Pr(α |A,a )·Pr(ξ |m,τ) Mead & Peacock 2014; Angulo & Hilbert 2015). n n n η n n=1 The practicality and tradeoffs of these methods remains (cid:34)nepoch (cid:35) (cid:89) an area for future research, both with respect to each × Pr(Ωi|p,danc,i) (18) other and with respect to traditional analyses with two- i=1 point function estimators of the mass density. where in the first line we collapsed the conditional pa- Equation (17) also helps to understand the utility in rameters into, separating the galaxy population into different samples s. It is straightforward to derive separate inferences of X ≡[d ,{s},W,a,p,I]. (19) anc ΨIC and θ for different sub-samples s of the galaxies to WecollectthedefinitionsofallvariablesinEquation(18) test for consistency with respect to unmodeled system- in Table 1 and display the statistical dependencies in atic errors or new physical phenomena not captured by Figure 1. The variables a ,m,τ,A will be defined in the cosmological model for the mass density evolution. η Section 3. One could similarly infer different distributions of the The final line of Equation (18) is the likelihood func- intrinsic galaxy properties ω for different sub-samples to tion for the pixel data d of galaxy n in observation testforvariablemodelfittingbiasesandunmodeledred- n,i epoch i, which depends on the intrinsic galaxy proper- shift evolution in the galaxy population. But, once we ties ω ,ξ , the lensing potential ψ for source sample havecomputedtheinferencesofthe2Dlenspotentialfor n n s s containing galaxy n, and the PSF Π . The preced- different sub-samples s, the combined inference of cos- n,i inglinesinEquation(18)describethehierarchicalPDFs mology in Equation (17) is not much more complicated for the conditional dependencies in the likelihood func- than that for an undivided galaxy sample. tion, including the important factorizations across dis- To summarize, by placing an interim prior on ψ that tinct galaxies and observation epochs. In Section 4.1 we includes spatial correlations over the observed sky area, discusshowmarginalizationoftheconditionaldependen- we stand to gain from a similar computational factoriza- cies in the likelihood couples the inference from individ- tion of the analysis as that achieved with the model for ualgalaxiesandepochsasdeterminedbythehierarchical the intrinsic galaxy properties. Most published cosmic parameters that are constant across galaxies and epochs shearanalysesfollowareversealgorithmwheretheshear in Equation (18) (e.g., parameters that do not have n or is estimated, turned into two-point correlation estima- i subscripts). See Appendix F for a description of the tors and compared with a two-point correlation model. computational implementation of this framework. Instead, our framework must start with the two-point correlation model (for a Gaussian density field), realize 3. PART2: IMPLEMENTATIONFORINFERRING INTRINSICGALAXYPROPERTIES 10 Thenumber104issimplyaguessfortherequirednumberof 3.1. Model for the conditional distribution of intrinsic MCMC steps to infer marginal constraints on θ from a converged galaxy properties chain. 8 M. D. Schneider et al. We choose a non-parametric distribution to describe the parameters α (eq. 2.2 of Neal 2000), the galaxy model parameters both because we have lit- tle information to guide us on the choice of a paramet- n−1 1 (cid:88) ric distribution and because we need a flexible distri- α |α ,...,α ∼ δ (α ) n 1 n−1 n−1+A D (cid:96) bution to minimize any biases in the model parameter (cid:96)=1 inferences (as mentioned in Section 1). We therefore A choose a Dirichlet Process (DP) model (e.g. Ferguson + n−1+AG0(·), (21) 1973; Walker et al. 1999; Neal 2000; Mu¨ller & Quintana 2004; Wang & Dunson 2011) for the distribution of in- where n indexes galaxies and δ is the Dirac delta func- D trinsicgalaxyproperties,withdistinctparametersα for tion. Given the discreteness property of the DP there n each galaxy n. isnonzeroprobabilitythatdrawsof α willberepeated. n For a given galaxy n we assume the properties ω Letα∗,...,α∗ betheuniquevaluesamongα ,...,α , n 1 K 1 n−1 describing the galaxy image are drawn from a (multi- and let N be the number of repeats of α∗ in the ‘latent c c variate) Normal distribution with mean 0 and variance class’c. Thentheconditionalupdatedistributioncanbe parameters α . Denote this by, ω ∼ N(0,α ), where equivalently written as (eq. 10 of Teh 2010) n n n N indicates the Normal distribution. By including the index n on α , we allow for the possibility that every 1 n α |α ,...,α ∼ galaxy has properties drawn from a distinct generating n 1 n−1 n−1+A distribution. But, we are accustomed to thinking about (cid:32) K (cid:33) galaxies as belonging to different types or populations. (cid:88)N δ (α∗)+AG (·) . (22) So a better model might be to select a small number of c D c 0 c=1 valuesforαthatcharacterizedifferentgalaxytypes. The parameters ω for galaxy n would then be drawn from a This equation is useful because it shows the clustering n zero-meanNormaldistributionwithvarianceparameters properties of the DP and the meaning of the param- α selected from one of a few possible classes. This is a eter A. The value of αc∗ will be assigned to αn with description of a mixture of Normal distributions for ω. probability proportional to the number of times α∗ has c The DP model for ω can be derived by starting been previously drawn, Nc. Note that when A is large, with a mixture of Normal distributions where the num- successive values for αn are drawn from the base distri- ber of mixture components is allowed to be arbitrarily bution G0 with high probability, generating a new αK∗ large (e.g., Wang & Dunson 2011). We can describe this each time. When A is small, αn is assigned to one of mathematicallywiththe followinghierarchy(Neal2000, the previous values of α∗ with high probability. We will c Eqn. 2.1), refer to groups of galaxies {n} that have equal values of α∗ as a ‘cluster’ or ‘sub-population’, representing po- k tentialgalaxytypeclassifications(analogoustothecom- mon ‘early-’ or ‘late-type’ classifications). Equation (21) showsthatlargervaluesofAwillleadtomoreclustersor categories of galaxy types while bringing A to zero will force all galaxies to be assigned to the same cluster or ω ∼N(0,α ), α ∼G(α |A), G∼DP(A,G ). typeclassification. Notethataslongastheobservations n n n n 0 (20) areexchangeable(whichistrueforsourcesinanimage), Equation (20) indicates that the properties of galaxy n the ordering of values in α is arbitrary. aredrawnfromaNormaldistributionwithvariancesα , To summarize, we are motivated to use the DP model n as previously described. The variances α for the distri- for the following reasons, n bution of the properties ω of galaxy n are also modeled n • The DP is analogous to a mixture model when the asrandomvariables,assumedtobedrawnfromsomedis- numberofmixturecomponentsislearnedfromthe tribution G(α |A) (yet to be specified) with parameter n data, A. A key feature of the DP model is that G(α |A) is a n • Samples for galaxy n are naturally informed by discrete distribution in α . If we consider again a mix- n those for galaxies 1,...,n−1, thereby improving ture of Normal distributions for ω, then G(α|A) would the sampling efficiency (for arbitrary ordering of specify the finite and discrete values that α can take in observed galaxies), ordertodescribetheparametersofeachmixturecompo- nent. ThefinalstatementinEquation(20)saysthatthe • The DP can find unknown clusters in the galaxy discrete support of the distribution G(α|A) is sampled parameters, teaching us about galaxy formation from yet another distribution denoted DP(A,G0) with and different prospects for shear inference from parameter A and ‘base distribution’ G0. The base dis- different data subsets. For example, we show tributiondescribesthecontinuousrangeofsupportfrom in Section 3.2.2 that knowledge of a galaxy sub- whichthediscretesupportofG(α|A)canbedrawn. The population that is more round than average can parameter A is called the DP ‘precision’ or ‘clustering’ improve the shear inference. parameter, which we describe next. Forsamplingalgorithmsandtobetterunderstandhow With such an algorithm to cluster the features in the parameters α for distinct galaxies may take identical galaxy population we are able to simultaneously fit dif- n values it is useful to first marginalize the distribution G ferent distributions for each galaxy and exploit the sta- in Equation (20) to get the conditional update rule for tistical information from a large sample of galaxies. As
Description: