ebook img

Subadditivity of the entropy and its relation to Brascamp-Lieb type inequalities PDF

0.36 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Subadditivity of the entropy and its relation to Brascamp-Lieb type inequalities

Subadditivity of the Entropy and its Relation to Brascamp–Lieb Type Inequalities Eric A. Carlen1 and Dario Cordero–Erausquin2 8 0 1. Department of Mathematics, Rutgers University, 0 2 Hill Center, 110 Frelinghuysen Road Piscataway NJ 08854-8019USA n 2. Institut de Math´ematiques de Jussieu, Universit´e Pierre et Marie Curie - Paris 6, a 4 Place Jussieu, 75252 Paris Cedex 05, France J 9 October 2007 2 ] A F Abstract . h t WeproveageneraldualityresultshowingthataBrascamp–Liebtypeinequalityisequivalent a m toaninequalityexpressingsubadditivityoftheentropy,withacompletecorrespondenceofbest [ constants and cases of equality. This opens a new approach to the proof of Brascamp–Lieb type inequalities, via subadditivity of the entropy. We illustrate the utility of this approach by 2 proving a general inequality expressing the subadditivity property of the entropy on Rn, and v 0 fully determining the cases of equality. As a consequence of the duality mentioned above, we 7 obtain a simple new proof of the classical Brascamp–Lieb inequality, and also a fully explicit 8 determination of all of the cases of equality. We also deduce several other consequences of 0 . the general subadditivity inequality, including a generalization of Hadamard’s inequality for 0 1 determinants. Finally, we also prove a second duality theorem relating superadditivity of the 7 Fisher information and a sharp convolution type inequality for the fundamental eigenvalues of 0 Schr¨odinger operators. Though we focus mainly on the case of random variables in Rn in this : v paper, we discuss extensions to other settings as well. i X r a Mathematics Subject Classification Numbers: 26D15, 94A17 1Work partially supported by U.S.National Science Foundation grant DMS 06-00037. (cid:13)c 2007 bythe authors. This papermay bereproduced, in its entirety,for non-commercial purposes. 1 1 Introduction Let (Ω, ,µ) bea measure space, and let f bea probability density on (Ω, ,µ). That is, f is a non S S negative integrable function on Ω with fdµ= 1. On the convex subset of probability densities Ω R f : fln(1+f)dµ< , (1.1) ∞ (cid:26) ZΩ (cid:27) the entropy of f, S(f), is defined by S(f)= f(x)lnf(x)dµ(x). ZΩ With this sign convention for the entropy, the inequalities we derive are of superadditive type; however, the terminology “subadditivity of the entropy” is too well entrenched to use anything else. Now let p : Ω R be measurable. Let ν be a Borel measure on R, and define f to be the (p) → probability density on (R, ,ν) such that for all bounded continuous functions φ on R, B φ(p(x))f(x)dµ(x) = φ(t)f (t)dν(t). (1.2) (p) ZΩ ZR In other words, the measure f dν is the “push–forward” of the measure f dµ under p: (p) p#(fdµ)= f dν . (p) The entropy of S(f ) is defined just as S(f) was, except with (R, ,ν) replacing (Ω, ,µ). We (p) B S shall be concerned with the following two questions: (1) Given m measurablefunctions p ,...,p on Ω, and m nonnegative numbersc ,...,c , is there 1 m 1 m a finite constant D such that m c S(f ) S(f)+D (1.3) j (pj) ≤ j=1 X for all probability densities f with finite entropy (i.e. satisfying (1.1))? (2) Given m measurablefunctions p ,...,p on Ω, and m nonnegative numbersc ,...,c , is there 1 m 1 m a finite constant D such that m m cj f (p (x))dµ(x) eD f1/cj(t)dν(t) (1.4) ZΩj=1 j j ≤ j=1(cid:18)ZR j (cid:19) Y Y for any m nonnegative functions f ,...,f on R? 1 m For example, consider the case that Ω = Rn with its standard Euclidean structure, and µ is LebesguemeasureonRn, whileν is Lebesguemeasureon R. Supposethata ,...,a arem vectors 1 m that span Rn, and define p (x) = a x . j j · In this case, if we let X denote a random vector with values in Rn whose law has the density f, then f is simply the density of the law of a X. If we define the entropy of a random variable (pj) j · 2 to be the entropy of its density, provided it has one, we can rewrite this Euclidean version of (1.3) as m c S(a X) S(X)+D. j j · ≤ j=1 X In case m = n, c = 1 for all j, and a ,...,a is an orthonormal basis of Rn, then this inequality j 1 n { } holds with D = 0, and is the classical subadditivity of the entropy inequality. It is even easier to recognize (1.4) as a classical result in this setting: It becomes m m cj f (a x)dµ(x) eD f1/cj(t)dt , Rn j j · ≤ R j Z j=1 j=1(cid:18)Z (cid:19) Y Y which is the classical Brascamp–Lieb inequality. A celebrated theorem of Brascamp and Lieb [9] says that the best constant eD in this inequality can be computed by using only centered Gaussian functions as trial functions. A new proof based on optimal mass transport was given by Barthe [2] who also gave a characterization (depending on the vectors a and the constants c ) of when the j j constant is finite together with a description of the optimizers in some situations. Carlen, Lieb and Loss [11] introduced a new approach to the Brascamp-Lieb inequalities based on heat flow (see also [3]). These authors also completed the gaps left by Barthe in the description of the optimizers. Bennett, Carbery, Christ and Tao [6] used a similar approach to deal with the multidimensional versions of the Brascamp-Lieb inequality (see also [7] for a direct approach of the finiteness of the constant eD). The paper [11] (and [6] in the multidimensional setting) develops a “splitting procedure” that will prove useful in our situation too. But we shall see that working with entropy clarifies many technical points. There are a number of other examples, besides the classical one in the Euclidean setting, where choices of (Ω,µ) and ν lead to inequalities of interest. For a second example, take Ω = Sn−1, the unit sphere in Rn, and let µ be the uniform probability measure on Sn−1. Then take ν to be the probability measure on R that is the law of u x, where u is any unit vector in Rn, so that for all · continuous functions φ, φ(u x)dµ(x) = φ(t)dν(t). ZSn−1 · ZR (By the rotational invariance of µ, this does not depend on the choice of u.) Now let e ,...,e 1 n { } denote the standard orthonormal basis in Rn, and define p (x) on Sn−1 by p (x) = e x. Then j j j · one has the optimal inequalities n 1 S(f ) S(f), (1.5) 2 (pj) ≤ j=1 X for any probability density f on (Ω,µ) with finite entropy, and n n 1/2 n 1/2 f (e x)dµ(x) f2(e x)dµ(x) = f2(t)dν(t) , (1.6) ZSn−1j=1 j j · ≤ j=1(cid:18)ZSn−1 j j · (cid:19) j=1 Z[−1,1] j ! Y Y Y for any n nonnegative functions f ,...,f on [ 1,1]. See [11] for the original proofs of (1.5) and 1 n − (1.6), in which (1.5) was deduced from (1.6). See [4] for a different and direct proof of (1.5). Since we are concerned in this paper with the relation between subadditivity of entropy and Brascamp–Liebtypeinequalities,itisworthrecallingtheshortargumentfrom[11]thatprovidedthe 3 passage from (1.6) to (1.5): Let f be any probability density on Sn−1, and let f ,f ,...,f (p1) (p2) (pn) be its n marginals, as above. Then define another probability density g on Sn−1 by n n 1 1/2 1/2 g(x) := f (e x) where C := f (e x)dµ(x) . C j=1 (pj) j · ZSn−1j=1 (pj) j · Y Y Then by positivity of the relative entropy (Jensen’s inequality), we have n f 1/2 0 ln fdµ = S(f) lnf (e x) f(x)dµ(x)+lnC ≤ ZSn−1 (cid:18)g(cid:19) −ZSn−1j=1 (pj) j ·  X   n 1 = S(f) f lnf dν +lnC − 2 R (pj) (pj) Z j=1 X n   1 = S(f) S(f )+lnC. (1.7) − 2 (pj) j=1 X Finally, (1.6) implies that n n 1/2 1/2 C = f (e x)dµ(x) f (e x)dµ(x) = 1 ZSn−1j=1 (pj) j · ≤ j=1(cid:18)ZSn−1 (pj) j · (cid:19) Y Y since each f is a probability density. Thus, ln(C) 0, so that (1.6) now follows from (1.7). This (pj) ≤ argument may give the impression that (1.6) is a “stronger” inequality than (1.5), but as we shall see, this is not the case. For a third example, take Ω = , the symmetric group on n letters. Let µ be the uniform n S probability measure on , and take ν to be the uniform probability measure on 1,2,...,n , so n S { } ν(i) = 1/n for all i. Define the functions p : Ω 1,2,...,n R by p (σ) = σ(j) for any j j → { } ⊂ permutation σ of 1,2,...,n Then one has the optimal inequalities { } n 1 S(f ) S(f), (1.8) 2 (pj) ≤ j=1 X for any probability density f on (Ω,µ), and n n 1/2 n n 1/2 f (p (σ))dµ(σ) f2(p (σ))dµ(σ)) = f2(i)ν(i) , (1.9) j j ≤ j j j ZSnj=1 j=1(cid:18)ZSn (cid:19) j=1 i=1 ! Y Y Y X for any n nonnegative functions f ,...,f on 1,...,n . See [12] for the proof of (1.9). One could 1 n { } then derive (1.8) using the exact same argument that was used to derive (1.5) from (1.6). There are more examples of interesting specializations of (1.3) and (1.4). However, these ex- amples suffice to illustrate the context in which the present work is set, and we now turn to the results. One basic result of this paper is the following: The two questions concerning (1.3) and (1.4) that were raised above are in fact one and the same: We shall prove here that the answer to one question is “yes” if and only if the answer to the 4 other question is “yes” — with the same constant D, and with a complete correspondence of cases of equality. Thus, if one’s goal is to prove a generalized Brascamp-Lieb type inequality, one possible route is to directly prove the corresponding generalized subadditivity of the entropy inequality. We shall demonstrate the utility of this approach by giving a simple proof of the classical Brascamp-Lieb inequality on Rn, including a determination of all of the cases of equality, through a direct analysis of the entropy. We shall use rather elementary properties of the entropy (scaling properties and conditional entropy) together with geometric properties of the Fisher information. Moreover, the generalized subadditivity of the entropy inequality that we prove here is new (in its full generality), and is interesting in and of itself. As we shall see, it turns out to have a rich geometric structure. From the point of view of information theory, it might also be of interest to use the converse impli- cation and to reinterpret some Brascamp-Lieb inequalities (such as the sharp Young’s convolution inequality) in terms of subadditivity inequalities for the entropy. Therest of the paperis organized as follows. InSection 2, we give theproof that (1.3) and(1.4) are dual to one another, so that once one has one inequality established with the cases of equality determined, one has the same for the other. We shall state this duality in a very general setting. In Section 3, we prove the sharp version of the general Euclidean subadditivity of the entropy inequality. In Section 4 we shall deducesome interesting consequences from this, including a generalization of Hadamard’s inequality for the determinant. The finalSection 5 gives another duality result showing that the superadditivity inequalities for Fisher information are dual to certain convolution type inequalities of ground state eigenvalues of Schro¨dingeroperators. Theseinequalitiesappeartobenew. Theymaybeofsomeintrinsicinterest, but our interest in them here is that a direct proof of the eigenvalue inequalities would yield a direct proof of Fisher information inequalities that would in turn yield entropy and Brascamp-Lieb inequalities. 2 Duality of the Brascamp–Lieb inequality and subadditivity of the entropy We show that the Brascamp–Lieb inequality is dual to the subadditivity of the entropy, so that once one has proved one of these inequalities with sharp constants, one has the other with sharp constants too. In fact, we shall see that there is an exact correspondence also for cases of equality, but in the next theorem, we focus on the constants. Weshallstatetheresultinamoregeneralsettingthantheonedescribedintheintroduction. We consider a reference measure space (Ω, ,µ) and a family of measure spaces (M , ,ν ) together j j j S M with measurable functions p : Ω M , j m. For a probability density f on Ω (with respect j j → ≤ to µ), the marginal f is thus defined as the probability density on M (with respect to ν ) such (pj) j j that φ(p (x))f(x)dµ(x) = φ(t)f (t)dν (t). (2.1) j (pj) j ZΩ ZMj 5 for all bounded measurable functions φ on M ; accordingly the entropies are given by j S(f)= fln(f)dµ and S(f )= f ln(f )dν . (pj) (pj) (pj) j ZΩ ZMj As explained in the introduction, we are mainly interested in the case (M , ,ν ) = (R, ,ν) for j j j M B all j m, where ν is the Lebesgue measure on R. ≤ 2.1 THEOREM. Let (Ω, ,µ) be a measure space, m 1 and for j m, let (M , ,ν ) be a j j j S ≥ ≤ M measure space together with a measurable function p from Ω to M . For any probability density f j j on Ω, let f the probability density on M be defined as in (2.1). Finally, let c ,...,c be any (pj) j { 1 m} set of m nonnegative numbers. Then for any D R, the following two assertions are equivalent: ∈ 1. For any m nonnegative functions f :M R , j m, we have j j + → ≤ m m cj f (p (x))dµ(x) eD f1/cj(t)dν (t) . (2.2) j j ≤ j j ZΩ j=1 j=1 ZMj ! Y Y 2. For every probability density f on (Ω, ,µ) with finite entropy, we have S m c S(f ) S(f)+D. (2.3) j (pj) ≤ j=1 X The proof depends an a well known expression for the entropy as a Legendre transform: For any probability density f in Ω, and any function φ such that eφ is integrable, eφ fln dµ = fφdµ f lnfdµ. f − ZΩ (cid:18) (cid:19) ZΩ ZΩ On the other hand, by Jensen’s inequality, eφ ln eφdµ f ln dµ. ≥ f (cid:18)ZΩ (cid:19) ZΩ (cid:18) (cid:19) Therefore, f lnfdµ+ln eφdµ fφdµ, (2.4) ≥ ZΩ (cid:18)ZΩ (cid:19) ZΩ and there is equality if and only if eφ is a constant multiple of f on the support of f. We shall use that this Legendre duality nicely combines with the operation of taking marginals. Proof of Theorem 2.1: First, assume (2.2). Consider any probability density f on Ω, and any m functions φ on M , j m. Using (2.4) with φ defined on Ω by j j ≤ m φ(x) := c φ (p (x)) (2.5) j j j j=1 X 6 and (2.1) we get m m f(x)lnf(x)dµ f(x) c φ (p (x)) dµ ln ecjφj(pj(x))dµ(x) j j j ≥   −   ZΩ ZΩ j=1 ZΩj=1 X Y     m m = c f (t)φ (t)dν (t) ln ecjφj(pj(x))dµ(x) . j (pj) j j −   j=1 ZMj ZΩj=1 X Y   (2.6) Then from the assumption (2.2) applied with f = eφj, j m n cj ecjφj(pj(x))dµ(x) eD eφj(t)dν (t) . j ≤ ZΩj=1 j=1 ZMj ! Y Y Therefore, (2.6) becomes m f(x)lnf(x)dµ(x) c f (t)φ (t)dν (t) ln eφj(t)dν (t) D . (2.7) ≥ j (pj) j j − j − ZΩ j=1 ZMj ZMj !! X Now the optimal choice φ = lnf leads to (2.3). j (pj) Conversely, suppose that (2.3) is true. Consider m functions φ on M , j m, and define φ on j j ≤ Ω as in (2.5). Suppose that eφ is integrable, and choose f to be the probability density −1 f(x)= eφ(x)dµ(x) eφ(x) , (2.8) (cid:18)ZΩ (cid:19) so that there is equality in (2.4). Then we have from (2.4) that m n ln ecjφj(pj(x))dµ(x) = f(x) c φ (p (x)) dµ(x) f(x)lnf(x)dµ(x) j j j     − ZΩj=1 ZΩ j=1 ZΩ Y X   m   = c f (t)φ (t)dν (t) f(x)lnf(x)dµ(x) j (pj) j j − j=1 ZMj ZΩ X (2.9) On the other hand, (2.3) reads as n f(x)lnf(x)dµ(x) c f (t)lnf (t)dν (t) D , (2.10) ≥ j (pj) (pj) j − ZΩ j=1 ZMj X and so (2.9), and then (2.4) applied on (M ,ν ) with the probability density f and the function j j (pj) φ for each j m, imply j ≤ m m ln ecjφj(pj(x))dµ c f (t)φ (t)dν (t) f (t)lnf (t)dν (t) +D   ≤ j (pj) j j − (pj) (pj) j ZΩj=1 j=1 ZMj ZMj ! Y X   m c ln eφj(t)dν (t) +D . j j ≤ j=1 ZMj ! X (2.11) 7 Exponentiating both sides, we obtain (2.2). We next examine the relation between cases of equality in the two inequalities. 2.2 THEOREM. Using the notation of the previous theorem, suppose that f is a probability density on Ω for which equality holds in the subadditivity inequality (2.3). Then the marginals f ,f ,...,f of f yield equality in the Brascamp–Lieb inequality (2.2), and moreover, f and (p1) (p2) (pm) its marginals satisfy m f = e−D (f (p (x)))cj . (2.12) (pj) j j=1 Y Conversely, suppose that f ,...,f are m probability densities (on M with respect to ν for 1 m j j j = 1,...,m, respectively) for which equality holds in the Brascamp–Lieb inequality (2.2). Then the probability density f defined on Ω by n f(x):= e−D (f (p (x)))cj j j j=1 Y yields equality in the subadditivity inequality (2.3) and moreover f is the jth marginal of f; i.e. j f = f for j m . j (pj) ≤ Proof: Suppose that for some probability density f, m c S(f ) S(f) = D. Then with this i=1 i (pi) − f, we must have equality in the first inequality in (2.6), which comes from (2.4). By what we have P said about the cases of equality in (2.4), this means that φ, defined in (2.5) is a constant multiple of lnf. Moreover, to get equality in (2.7), we were forced to choose φ = ln(f ). This ensures j (pj) that (2.12) is true. Furthermore, to get equality in our intermediate application of the Brascamp–Lieb inequality, we must have that f ,...,f is a set of extremals for the Brascamp–Lieb inequality. { (p1) (pn)} The other assertion follows in the same way. By whatwehave justestablished, onecould tryto provetheclassical Brascamp–Lieb inequality by first proving a general subadditivity of the entropy inequality for random variables in Rn. We do this in the next section, and shall see that the determination of all of the cases of equality is particularly transparent via this route. While the Brascamp–Lieb inequality and subadditivity inequality are equivalent, there is an extra richness to the investigation of the cases of equality in the subadditivity inequality, as this involves statistical independence in a crucial way. Some hint of this can be seen in the following simple example, which sets the stage for the next section: Let m = n, c = 1 for all j, and a ,...,a be an orthonormal basis of Rn. Take all reference j 1 n { } measures to be Lebesgue measure. Then the Brascamp-Lieb inequality reduces to an equality, by Fubini’s theorem, with D = 0, and any set of non negative integrable functions f ,...,f 1 n { } provides a case of equality. On the other hand the dual inequality, is the classical subadditivity of the entropy inequality m S(X a ) S(X) , i · ≤ i=1 X and equality occurs exactly when the coordinates X a ,...,X a form a set of independent 1 n { · · } random variables. 8 In this example, it may appear that the entropy inequality is the more complicated of the two inequalities. However, the fact that statistical independence enters the picture on the entropy side is quite helpful: We will make much use of simple entropy inequalities that are saturated only for independent random variables in our investigation of the cases of equality in the next section. 3 The general subadditivity of the entropy inequality in Rn Let Rn be equipped with its standard Euclidean structure. Let X denote a random vector (or variable if n = 1) with values in Rn, and suppose that X has a density f. We denote this correspondence between the random variable X and its density f by writing X f and set ∼ S(X) = S(f)= f(x)lnf(x)dnx. Rn Z Thus, in this section, we are specializing the general context of the introduction to the case in which Ω is Rn, and µ is Lebesgue measure. We shall also take ν to be Lebesgue measure on R. Given a non zero vector a on Rn, identify a with the linear functional a(x) = a x. Then, if · f X is a probability density on Rn, f , as defined by (1.2), is the density of a X, that is (a) ∼ · f a X, and (a) ∼ · S(X a) = S(f ) = f (t)lnf (t)dt. (a) (a) (a) · R Z Note that (1.2) specializes to the requirement that for every bounded and continuous φ:R R, → φ(x a)f(x) dnx = φ(t)f (t) dt . (3.1) (a) Rn · R Z Z It follows that for all t R, f (t) = 1 f(x)dn−1x. It is a direct consequence of (3.1) that ∈ (a) |a| {a·x=t} for all λ> 0, R f (t)= λ−1f (λ−1t) . (3.2) (λa) (a) With these preliminaries out of the way, we turn to the main question to be addressed in this section: Consider m non zero vectors a ,...,a in Rn, and m numbers c ,...,c with c > 0 for 1 m 1 m j all j. Then, we ask: Is there a finite constant D R so that ∈ m c S(a X) S(X)+D (3.3) j j · ≤ j=1 X for all random vectors X in Rn, and if so, what is the least such value of D, and what are the cases of equality? In general there is no finite constant D for which (3.3) is true for all X. There are some simple requirements on a ,...,a and c ,...,c for this to be the case. 1 m 1 m { } { } First of all, for (3.3) to hold for any finite constant D, the set of vectors a ,...,a must span 1 m { } Rn. Thefollowing construction is usefulfor this and other purposes: Let V beany proper subspace 9 of Rn, and let V⊥ be its orthogonal complement. Then for any number λ> 0, let X denote the V,λ centered Gaussian random vector (see below for definition) such that u V, E (u X )2 = λ and u V⊥, E (u X )2 = 1. (3.4) V,λ V,λ ∀ ∈ · ∀ ∈ · (cid:0) (cid:1) (cid:0) (cid:1) Then n dim(V) S(X ) = ln(2πe) ln(λ) (3.5) V,λ −2 − 2 while for any a in Rn, 1 1 S(a X ) = ln(2πe) ln(λ Pa2+ P⊥a2) , (3.6) V,λ · −2 − 2 | | | | where P is the orthogonal projection onto V, and P⊥ = I P. − Now take V to be the orthogonal complement of the span of a ,...,a . If the latter is a 1 m { } proper subspace of Rn, then dim(V) 1, and we see that for any finite D, (3.3) would be violated ≥ for sufficiently large λ, since then Pa 2 = 0 for each j. j | | Beyond this spanning condition, there are some simple compatibility conditions that must be satisfied by the vectors a and the numbers c . First of all, it follows from (3.2) that for all λ > 0, j j S(λX) = S(X) nln(λ) and S(a λX) = S(a X) ln(λ) . − · · − Therefore, (3.3) can only hold when m c = n . (3.7) j j=1 X There is a further necessary condition that is somewhat less obvious. The key observation to makeisthattherighthandsideof(3.6)tendstoinfinityasλtendstozeroifandonlyif P⊥a2 = 0, | | Consider any subset J of 1,...,m , and let { } V := span a ; j J . J j { ∈ } Let G denote the Gaussian random variable X defined by (3.4) when V = V . Note that for J VJ,λ J each j J, P⊥a 2 = 0, so that for such j, j ∈ | | 1 1 1 S(a G )= ln(2πe) ln(a 2) ln(λ) , j J j · −2 − 2 | | − 2 which tends to infinity as λ tends to zero. Therefore, letting λ approach zero, we see that the leading term in m c S(a G ) S(G ) is at least j=1 j j · J − J P 1 dim(V ) c ln(λ) . J j 2 −  j∈J X   (It is exactly this unlessfor some i / J, a V , in which case we could have taken an even “worse” i J ∈ ∈ set J.) Hence, if dim(V ) c < 0, there can be no upper boundon m c S(a G) S(G). J − j∈J j j=1 j j· − Therefore, (3.3) can only hold when it is the case that for all J, P P c dim(V ) . (3.8) j J ≤ j∈J X 10

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.