Equivalence of additive-combinatorial linear inequalities for Shannon entropy and differential entropy

Ashok Vardhan Makkuva and Yihong Wu*

September 12, 2016

Abstract

This paper addresses the correspondence between linear inequalities of Shannon entropy and differential entropy for sums of independent group-valued random variables. We show that any balanced (with the sum of coefficients being zero) linear inequality of Shannon entropy holds if and only if its differential entropy counterpart also holds; moreover, any linear inequality for differential entropy must be balanced. In particular, our result shows that recently proved differential entropy inequalities by Kontoyiannis and Madiman [KM14] can be deduced from their discrete counterparts due to Tao [Tao10] in a unified manner. Generalizations to certain abelian groups are also obtained.

Our proof of extending inequalities of Shannon entropy to differential entropy relies on a result of Rényi [Rén59] which relates the Shannon entropy of a finely discretized random variable to its differential entropy, and which also helps in establishing that the entropy of the sum of quantized random variables is asymptotically equal to that of the quantized sum; the converse uses the asymptotics of the differential entropy of convolutions with weak additive noise.

Contents

1 Introduction and main result
  1.1 Additive-combinatorial inequalities for cardinality and Shannon entropy
  1.2 Equivalence of Shannon and differential entropy inequalities
  1.3 Main results
  1.4 Organization
2 On sharp constants in additive-combinatorial entropy inequalities
3 Proof of Theorem 1
4 Proof of Theorem 2
5 Proofs of lemmas
  5.1 Proof of Lemma 1
  5.2 Proof of Lemma 2
  5.3 Proof of Lemma 4
  5.4 Proof of Lemma 5
6 Extensions to general groups
A Proof of Proposition 1

* Ashok Vardhan Makkuva is with the Department of ECE and the Coordinated Science Lab, University of Illinois at Urbana-Champaign, Urbana, IL, email: [email protected]. Yihong Wu is with the Department of Statistics, Yale University, New Haven, CT 06511, email: [email protected].

1 Introduction and main result

1.1 Additive-combinatorial inequalities for cardinality and Shannon entropy

Over the past few years, the field of additive combinatorics has invited a great deal of mathematical activity; see [TV06] for a broad introduction. An important repository of tools in additive combinatorics is the sumset inequalities, relating the cardinalities of the sumset and the difference set $A \pm B = \{a \pm b : a \in A, b \in B\}$ to those of $A$ and $B$, where $A$ and $B$ are arbitrary subsets of integers, or, more generally, of any abelian group.

One can consider the information-theoretic analogs of these additive-combinatorial inequalities by replacing the sets with (independent, discrete, group-valued) random variables and, correspondingly, the log-cardinality with the Shannon entropy.
For example, the inequality $\max\{|A|,|B|\} \le |A+B| \le |A||B|$ translates to

$$\max\{H(X),H(Y)\} \le H(X+Y) \le H(X)+H(Y), \qquad (1)$$

which follows from elementary properties of entropy. The motivation to consider these analogs comes from the interpretation that the Shannon entropy

$$H(X) \triangleq \sum_{x} \mathbb{P}[X=x] \log \frac{1}{\mathbb{P}[X=x]}$$

of a discrete random variable $X$ can be viewed as the logarithm of the effective cardinality of the alphabet of $X$ in the sense of the asymptotic equipartition property (AEP) [CT06], which states that the random vector consisting of $n$ independent copies of $X$ is concentrated on a set of cardinality $\exp(n(H(X)+o(1)))$ as $n \to \infty$. While this observation was fruitful in deducing certain entropy inequalities, e.g., Han's inequality [Han78], directly from their set counterparts (cf. [Ruz09a, p. 5]), it has not proven useful for inequalities dealing with sums, since the typical set of sums can be exponentially larger than sums of individual typical sets. Forgoing this soft approach and capitalizing on the submodularity property of entropy, in the past few years several entropy inequalities for sums and differences have been obtained [TV05, LP08, Mad08, Tao10, MK10, MMT12], such as the sum-difference inequality [Tao10, Eq. (2.2)]

$$H(X+Y) \le 3H(X-Y) - H(X) - H(Y), \qquad (2)$$

which parallels the following (cf., e.g., [GHR07, Eq. (4)])

$$|A+B| \le \frac{|A-B|^3}{|A||B|}.$$

More recently, a number of entropy inequalities for integer linear combinations of independent random variables have been obtained in [WSV15, Appendix E], e.g.,

$$H(pX+qY) - H(X+Y) \le (7\lfloor \log|p| \rfloor + 7\lfloor \log|q| \rfloor + 2)(2H(X+Y) - H(X) - H(Y)),$$

for non-zero integers $p, q$, which are counterparts of results on sums of dilated sets in [Buk08].

It is worth noting that all of the aforementioned results for Shannon entropy are linear inequalities for entropies of weighted sums of independent random variables, which are of the general form:

$$\sum_{i=1}^{n} \alpha_i H\Big(\sum_{j=1}^{m} a_{ij} Z_j\Big) \le 0, \qquad (3)$$

with $a_{ij} \in \mathbb{Z}$, $\alpha_i \in \mathbb{R}$, and $Z_1,\dots,Z_m$ being independent discrete group-valued random variables.

1.2 Equivalence of Shannon and differential entropy inequalities

Recall that the differential entropy of a real-valued random vector $X$ with probability density function (pdf) $f_X$ is defined as

$$h(X) = \int f_X(x) \log \frac{1}{f_X(x)} \, dx.$$

Again, in the sense of the AEP, $h(X)$ can be interpreted as the log-volume of the effective support of $X$ [CT06]. In a similar vein, one can consider additive-combinatorial inequalities for differential entropies on Euclidean spaces. Recently Kontoyiannis and Madiman [KM14] and Madiman and Kontoyiannis [MK10, MK15] made important progress in this direction by showing that while the submodularity property, the key ingredient for proving discrete entropy inequalities, fails for differential entropy, several linear inequalities for Shannon entropy nevertheless extend verbatim to differential entropy; for example, the sum-difference inequality (2) admits an exact continuous analog [KM14, Theorem 3.7]:

$$h(X+Y) \le 3h(X-Y) - h(X) - h(Y). \qquad (4)$$

These results prompt us to ask the following question, which is the focus of this paper:

Question 1. Do all linear inequalities of the form (3) for discrete entropy extend to differential entropies, and vice versa?

A simple but instructive observation reveals that all linear inequalities for differential entropies are always balanced, that is, the sum of all coefficients must be zero. In other words, should

$$\sum_{i=1}^{n} \alpha_i h\Big(\sum_{j=1}^{m} a_{ij} Z_j\Big) \le 0 \qquad (5)$$

hold for all independent $\mathbb{R}^d$-valued $Z_j$'s, then we must have $\sum_{i=1}^{n} \alpha_i = 0$.
To see this, recall the fact that $h(aZ) = h(Z) + d\log a$ for any $a > 0$; in contrast, Shannon entropy is scale-invariant. Therefore, whenever the inequality (5) is unbalanced, i.e., $\sum_{i=1}^{n} \alpha_i \neq 0$, scaling all random variables by $a$ and sending $a$ to either zero or infinity leads to a contradiction. For instance, in (1), the left inequality (balanced) extends to differential entropy but the right inequality (unbalanced) clearly does not.

Surprisingly, as we show in this paper, a balanced linear inequality holds for Shannon entropy if and only if it holds for differential entropy, thereby fully resolving Question 1. This result, in a way, demystifies the striking parallel between discrete and continuous entropy inequalities. In particular, it shows that the results in [KM14, MK15], which are linear inequalities for mutual information such as $I(X; X+Y) = h(X+Y) - h(Y)$ or the Ruzsa distance $\mathrm{dist}_{\mathrm{R}}(X,Y) \triangleq h(X-Y) - \frac{1}{2}h(X) - \frac{1}{2}h(Y)$ [Ruz09a, Tao10, KM14], and hence expressible as balanced linear inequalities for differential entropy, can be deduced from their discrete counterparts [Tao10] in a unified manner.

While our results establish that all balanced linear inequalities for Shannon entropy extend to differential entropy and vice versa, it is worth pointing out that this does not hold for affine inequalities. Note that non-trivial affine inequalities for Shannon entropy do not exist simply because one can set all random variables to be deterministic; however, this is not the case for differential entropy. For instance, the following balanced affine inequality

$$h(X+Y) \ge \frac{1}{2}(h(X)+h(Y)) + \frac{d}{2}\log 2 \qquad (6)$$

holds for any independent $\mathbb{R}^d$-valued random variables $X$ and $Y$, which is a direct consequence of the entropy power inequality (see [Bar84, Lemma 3.1] for generalizations of (6)). However, the Shannon entropy analog of (6), replacing all $h$ by $H$, is clearly false (consider deterministic $X$ and $Y$). On the other hand, there exists no unbalanced linear inequality for differential entropy, while this is not true for Shannon entropy. Consider, for instance, the Shannon entropy inequality $H(X+Y) \le H(X)+H(Y)$, which holds for any independent discrete random variables $X$ and $Y$ and follows directly from the elementary properties of Shannon entropy. However, the differential entropy counterpart, $h(X+Y) \le h(X)+h(Y)$, can be shown to be false by taking $X$ and $Y$ to be independent Gaussian random variables with zero mean and variances $1$ and $\frac{1}{2\pi e}$, respectively.

To explain our proof that discrete entropy inequalities admit continuous counterparts, we first note that the main tool for proving differential entropy inequalities in [MK10, KM14, MK15] is the data processing inequality of mutual information, replacing the submodularity of Shannon entropy exploited in [Tao10]. However, this method has been applied on a case-by-case basis, as there seems to be no principled way to recognize the correct data processing inequality that needs to be introduced. Instead, to directly deduce a differential inequality from its discrete version, our strategy is to rely on a result due to Rényi [Rén59] which gives the asymptotic expansion of the Shannon entropy of a finely quantized continuous random variable in terms of its differential entropy, namely,

$$H(\lfloor mX \rfloor) = d\log m + h(X) + o(1), \quad m \to \infty \qquad (7)$$

for continuous $\mathbb{R}^d$-valued $X$.
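For a quick numerical illustration of (7) (not part of the original argument), the Python sketch below computes $H(\lfloor mX \rfloor)$ exactly from the cell probabilities of a standard Gaussian $X$, so $d = 1$ and $h(X) = \frac{1}{2}\log(2\pi e)$, and shows that $H(\lfloor mX \rfloor) - \log m$ approaches $h(X)$ as $m$ grows; the choice of the Gaussian distribution and the function name are purely illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def quantized_entropy(m):
    """Shannon entropy (in nats) of floor(m*X) for X ~ N(0,1), computed from
    the exact cell probabilities P[floor(mX) = k] = Phi((k+1)/m) - Phi(k/m)."""
    ks = np.arange(-6 * m, 6 * m)          # cells covering |X| <= 6, essentially all the mass
    p = norm.cdf((ks + 1) / m) - norm.cdf(ks / m)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

h_gauss = 0.5 * np.log(2 * np.pi * np.e)   # differential entropy of N(0,1) in nats
for m in [2, 8, 32, 128, 512]:
    gap = quantized_entropy(m) - np.log(m) # Renyi's expansion (7) predicts gap -> h(X)
    print(f"m = {m:4d}   H(floor(mX)) - log m = {gap:.4f}   h(X) = {h_gauss:.4f}")
```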
In fact, this approach has been discussed in [KM14] at the suggestion of a reviewer, where it was noted that differential entropy inequalities can be approximately obtained from their discrete counterparts via this quantization approach, since $H(\lfloor mX \rfloor + \lfloor mY \rfloor)$ and $H(\lfloor m(X+Y) \rfloor)$ can only differ by a few bits, which might be further improvable. Indeed, as we shall prove later in Lemma 1, this entropy difference is in fact vanishingly small, which enables the additive-combinatorial entropy inequalities to carry over exactly from discrete to Euclidean spaces and, even more generally, to connected abelian Lie groups. Interestingly, in addition to bridging the discrete and continuous notions of entropy, Rényi's result also plays a key role in establishing the vanishing entropy difference.

In establishing that all linear discrete entropy inequalities follow from their continuous analogs, the following are the two key ideas of our approach: First, we show that given any finite collection of discrete $\mathbb{R}^d$-valued random variables, we can embed them into a high-dimensional Euclidean space and project them back to $\mathbb{R}^d$ such that the Shannon entropy of any linear combination of the projected random variables is equal to an arbitrarily large multiple of that of the given random variables. Next, we add independent noise, e.g., Gaussian, with arbitrarily small variance to these projected discrete random variables and relate their Shannon entropy to the differential entropy of their noisy versions. Sending the variance to zero and then the dimension to infinity yields the desired inequality for discrete entropy.

1.3 Main results

Throughout the rest of the paper, to make the statements concise and to exclude trivial cases, all differential entropies are assumed to exist and be finite. We now state our main results on linear entropy inequalities.

Theorem 1. Let $(a_{ij}) \in \mathbb{Z}^{n\times m}$ be such that $a_{i1},\dots,a_{im}$ are relatively prime for each $i = 1,\dots,n$. Let $\alpha_1,\dots,\alpha_n \in \mathbb{R}$ be such that $\sum_{i=1}^{n} \alpha_i = 0$. Suppose that for any independent $\mathbb{Z}^d$-valued random variables $U_1,\dots,U_m$, the following holds:

$$\sum_{i=1}^{n} \alpha_i H\Big(\sum_{j=1}^{m} a_{ij} U_j\Big) \le 0. \qquad (8)$$

Then for any independent $\mathbb{R}^d$-valued continuous random variables $X_1,\dots,X_m$, the following holds:

$$\sum_{i=1}^{n} \alpha_i h\Big(\sum_{j=1}^{m} a_{ij} X_j\Big) \le 0. \qquad (9)$$

Remark 1. Without any loss of generality, we can always assume that the coefficients of each linear combination of random variables in (8) are relatively prime. This is because for each $i$ we can divide $a_{i1},\dots,a_{im}$ by their greatest common divisor so that the resulting entropy inequality remains the same, thanks to the scale invariance of the Shannon entropy.

Theorem 2. Let $(a_{ij}) \in \mathbb{R}^{n\times m}$ and $\alpha_1,\dots,\alpha_n \in \mathbb{R}$ be such that $\sum_{i=1}^{n} \alpha_i = 0$. If

$$\sum_{i=1}^{n} \alpha_i h\Big(\sum_{j=1}^{m} a_{ij} X_j\Big) \le 0$$

holds for any $\mathbb{R}^d$-valued independent and continuous random variables $X_1,\dots,X_m$, then

$$\sum_{i=1}^{n} \alpha_i H\Big(\sum_{j=1}^{m} a_{ij} U_j\Big) \le 0$$

holds for any $\mathbb{R}^d$-valued independent and discrete random variables $U_1,\dots,U_m$.

Remark 2 (iid random variables). For additive-combinatorial entropy inequalities, when (some of) the random variables are further constrained to be identically distributed, a number of strengthened inequalities have been obtained. For instance, if $U$ and $U'$ are independent and identically distributed (iid) discrete random variables, then (cf., e.g., [MK10, Theorems 1.1 and 2.1])

$$\frac{1}{2} \le \frac{H(U-U')-H(U)}{H(U+U')-H(U)} \le 2 \qquad (10)$$

and for iid continuous $X, X'$,

$$\frac{1}{2} \le \frac{h(X-X')-h(X)}{h(X+X')-h(X)} \le 2, \qquad (11)$$

which are stronger than what would be obtained from (2) and (4) by substituting $Y = X'$; a quick numerical check of (10) is given below.
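The following Python sketch (not part of the paper) evaluates the ratio in (10) by exact enumeration of the convolutions for a few integer-valued distributions; the example pmfs are arbitrary illustrative choices, and (10) asserts that the ratio lies in $[1/2, 2]$ for any non-degenerate $U$.

```python
import numpy as np
from itertools import product

def entropy(pmf):
    """Shannon entropy (nats) of a pmf given as {value: probability}."""
    p = np.array([v for v in pmf.values() if v > 0])
    return -np.sum(p * np.log(p))

def conv(pmf, sign):
    """PMF of U + sign*U' for iid integer-valued U, U' with the given pmf."""
    out = {}
    for (x, px), (y, py) in product(pmf.items(), pmf.items()):
        out[x + sign * y] = out.get(x + sign * y, 0.0) + px * py
    return out

examples = {
    "uniform on {0,1,2}": {0: 1/3, 1: 1/3, 2: 1/3},
    "skewed":             {0: 0.7, 1: 0.2, 5: 0.1},
    "two-point":          {0: 0.5, 3: 0.5},
}
for name, pmf in examples.items():
    H = entropy(pmf)
    ratio = (entropy(conv(pmf, -1)) - H) / (entropy(conv(pmf, +1)) - H)
    print(f"{name:20s} ratio = {ratio:.3f}   (inequality (10): must lie in [0.5, 2])")
```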
As evident from the proof, both Theorem 1 and Theorem 2 apply verbatim to entropy inequalities involving independent random variables with arbitrary distributions. Consequently, (11) and (10) are in fact equivalent. Formally, fix a partition $S_1,\dots,S_K$ of $[m] \triangleq \{1,\dots,m\}$. Then (8) holds for independent $U_1,\dots,U_m$ such that $\{U_j\}_{j\in S_k}$ are iid for each $k \in [K]$ if and only if (9) holds for independent $X_1,\dots,X_m$ such that $\{X_j\}_{j\in S_k}$ are iid for each $k \in [K]$. It is worth noting that this result is not a special case of Theorems 1 and 2; nevertheless, the proofs are identical.

Remark 3. The nature of the equivalence results that we obtain in this paper for linear inequalities for weighted sums of independent random variables bears some similarity to a result established by Chan in [Cha03] for linear entropy inequalities of subsets of random variables, as opposed to sums of independent random variables. In particular, he established that the classes of linear inequalities for Shannon entropy and differential entropy are equivalent provided the inequalities are "balanced" in the following sense. For example, consider the following entropy inequalities for discrete random variables $X_1$ and $X_2$:

$$H(X_1) + H(X_2) - H(X_1,X_2) \ge 0, \qquad (12)$$
$$H(X_1,X_2) - H(X_1) \ge 0. \qquad (13)$$

The inequality (12) is said to be balanced because the sum of the coefficients of the entropy terms in which $X_1$ appears equals zero, and the same is true for $X_2$ as well. However, the inequality (13) is unbalanced because $X_2$ appears only in the first term. Though the notion of balancedness considered in [Cha03] is different from ours, the technique employed for extending the discrete entropy inequalities to the continuous case is similar to ours, i.e., through discretization of continuous random variables; however, as discussed before, our key argument is to show that the entropy of the sum of quantized random variables is asymptotically equal to that of the quantized sum, a difficulty which is not present in dealing with subsets of random variables. To deduce the discrete inequality from its continuous counterpart, the method in [Cha03] is to assume, without loss of generality, that the discrete random variables are integer-valued and to use the fact that $H(A) = h(A+U)$ for any $\mathbb{Z}$-valued $A$ and $U$ independently and uniformly distributed on $[0,1]$. Clearly this method does not apply to sums of independent random variables.

1.4 Organization

The rest of the paper is organized as follows. Before giving the proofs of the main results, in Section 2 we pause to discuss the open problem of determining the sharp constants in additive-combinatorial entropy inequalities and the implications of our results. The proofs of the main theorems are given in Sections 3 and 4, with the technical lemmas proved in Section 5. Following [KM14], the notion of differential entropy can be extended to locally compact groups by replacing the reference measure (Lebesgue) by the corresponding Haar measure. In Section 6 we generalize Theorem 1 to random variables taking values in connected abelian Lie groups.

2 On sharp constants in additive-combinatorial entropy inequalities

The entropy inequalities (10) and (11) can be viewed as the information-theoretic analogs of the following additive-combinatorial inequality proved by Ruzsa [Ruz91]: For any finite $A \subset \mathbb{Z}^n$ (or any abelian group),

$$\log\frac{|A-A|}{|A|} \le 2\log\frac{|A+A|}{|A|}. \qquad (14)$$

The constant "2" in (14) is known to be sharp (see [HRY99] or [Ruz09b, p. 107]).
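Before turning to the construction that demonstrates this sharpness, here is a quick empirical sanity check of (14) itself over random finite sets of integers (this sketch is not part of the paper; the sampling range, set sizes, and function name are arbitrary illustrative choices).

```python
import math
import random

def max_ruzsa_ratio(trials=2000, lo=-50, hi=50, max_size=12, seed=0):
    """Empirically check (14): over random finite sets A of integers, the ratio
    log(|A-A|/|A|) / log(|A+A|/|A|) should never exceed 2."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        A = set(rng.sample(range(lo, hi + 1), rng.randint(2, max_size)))
        sumset = {a + b for a in A for b in A}
        diffset = {a - b for a in A for b in A}
        ratio = math.log(len(diffset) / len(A)) / math.log(len(sumset) / len(A))
        worst = max(worst, ratio)
    return worst

print("largest observed ratio over random sets:", round(max_ruzsa_ratio(), 3))
```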
The crucial idea for the construction is to approximate cardinality by volume by considering the lattice points inside a convex body. In particular, for any convex body $K$ in $\mathbb{R}^n$, denote its quantized version $[K]_L \triangleq K \cap (\frac{1}{L}\mathbb{Z}^n)$, where $L \in \mathbb{N}$. The sum and difference sets of $[K]_L$ are related to those of $K$ through $[K \pm K]_L = [K]_L \pm [K]_L$. If we fix the dimension $n$ and let $L \to \infty$, it is well known that the cardinality of $[K]_L$ is related to the volume of $K$ via $|[K]_L| = \mathrm{vol}(K)L^n(1+o(1))$. Thus,

$$\frac{|[K]_L \pm [K]_L|}{|[K]_L|} = \frac{\mathrm{vol}(K \pm K)}{\mathrm{vol}(K)}(1+o(1)).$$

A classical result of Rogers and Shephard [RS57] states that for any convex $K \subset \mathbb{R}^n$, $\mathrm{vol}(K-K) \le \binom{2n}{n}\mathrm{vol}(K)$, with equality if and only if $K$ is a simplex. Since $K$ is convex, $K+K = 2K$ and thus $\mathrm{vol}(K+K) = 2^n\,\mathrm{vol}(K)$. Now taking $K$ to be the standard simplex $\Delta_n = \{x \in \mathbb{R}_+^n : \sum_{i=1}^n x_i \le 1\}$, we obtain

$$\frac{\log\frac{|[\Delta_n]_L - [\Delta_n]_L|}{|[\Delta_n]_L|}}{\log\frac{|[\Delta_n]_L + [\Delta_n]_L|}{|[\Delta_n]_L|}} = \frac{\log\frac{1}{n!}\binom{2n}{n} - \log\frac{1}{n!} + o_L(1)}{\log\frac{2^n}{n!} - \log\frac{1}{n!} + o_L(1)} = \frac{\log\binom{2n}{n} + o_L(1)}{n\log 2 + o_L(1)},$$

where we used $\mathrm{vol}(\Delta_n) = \frac{1}{n!}$, $\mathrm{vol}(\Delta_n - \Delta_n) = \frac{1}{n!}\binom{2n}{n}$ and $\mathrm{vol}(\Delta_n + \Delta_n) = \frac{2^n}{n!}$. Sending $L \to \infty$ followed by $n \to \infty$ yields the sharpness of (14).

Analogously, one can investigate the best possible constants in the Shannon entropy inequality (10) as well as its continuous analog (11). It is unclear if the constants 1/2 and 2 are the best possible. However, as a consequence of Theorem 1 and Theorem 2, one can establish that the sharp constants for the discrete and continuous versions are the same, and dimension-free (see Appendix A for a proof):

Proposition 1. For i.i.d. $U$ and $U'$ and i.i.d. $X$ and $X'$,

$$\frac{1}{2} \le \inf_{U \in \mathbb{Z}^n} \frac{H(U-U')-H(U)}{H(U+U')-H(U)} = \inf_{X \in \mathbb{R}^n} \frac{h(X-X')-h(X)}{h(X+X')-h(X)} \le \sup_{X \in \mathbb{R}^n} \frac{h(X-X')-h(X)}{h(X+X')-h(X)} = \sup_{U \in \mathbb{Z}^n} \frac{H(U-U')-H(U)}{H(U+U')-H(U)} \le 2.$$

Furthermore, the infimum and the supremum are independent of the dimension $n$.

It is worth pointing out that the dimension-freeness of the best Shannon entropy ratio follows from standard arguments (tensorization and linear embedding of $\mathbb{Z}^n$ into $\mathbb{Z}$), which have been previously used for proving analogous results for set cardinalities [HRY99]; however, it is unclear how to directly prove that the ratio of differential entropies is dimension-independent without resorting to Theorem 1. In view of the success of continuous approximation in proving the sharpness of (14), proving the sharpness of (11) for differential entropies might be more tractable than its discrete counterpart (10).

3 Proof of Theorem 1

We first introduce the notation used throughout the paper. For $x \in \mathbb{R}$, let $\lfloor x \rfloor \triangleq \max\{k \in \mathbb{Z} : k \le x\}$ and $\{x\} = x - \lfloor x \rfloor$ denote its integer and fractional parts, respectively. For any $k \in \mathbb{N}$, define

$$[x]_k \triangleq \frac{\lfloor 2^k x \rfloor}{2^k}, \qquad \{x\}_k \triangleq \frac{\{2^k x\}}{2^k}. \qquad (15)$$

Hence,

$$x = \frac{\lfloor 2^k x \rfloor}{2^k} + \frac{\{2^k x\}}{2^k} = [x]_k + \{x\}_k.$$

For $x \in \mathbb{R}^d$, $[x]_k$ and $\{x\}_k$ are defined similarly by applying the above operations componentwise. For $N > 0$, denote the hypercube $B_N^{(d)} \triangleq [-N,N]^d$. For an $\mathbb{R}^d$-valued random variable $X$, let $X^{(N)}$ denote a random variable distributed according to the conditional distribution $P_{X \mid X \in B_N^{(d)}}$. If $X$ has a pdf $f_X$, then $X^{(N)}$ has the following pdf:

$$f_{X^{(N)}}(x) = \frac{f_X(x)\,\mathbf{1}\{x \in B_N^{(d)}\}}{\mathbb{P}[X \in B_N^{(d)}]}. \qquad (16)$$

The following lemma is the key step to proving Theorem 1.

Lemma 1. Let $X_1,\dots,X_m$ be independent $[0,1]^d$-valued continuous random variables such that both $h(X_j)$ and $H(\lfloor X_j \rfloor)$ are finite for each $j \in [m]$. Then for any $a_1,\dots,a_m \in \mathbb{Z}$ that are relatively prime,

$$\lim_{k\to\infty}\left[H\Big(\Big[\sum_{i=1}^{m} a_i X_i\Big]_k\Big) - H\Big(\sum_{i=1}^{m} a_i [X_i]_k\Big)\right] = 0.$$
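To see Lemma 1 in action numerically (this sketch is not part of the proof), take $d = 1$, $m = 2$, $a_1 = a_2 = 1$ and $X_1, X_2$ iid uniform on $[0,1]$; both entropies in the lemma can then be computed exactly from cell probabilities, and their difference is seen to vanish as $k$ grows. The uniform distribution and the function names are illustrative assumptions.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def tri_cdf(t):
    """CDF of X+Y for X, Y iid Uniform[0,1] (triangular density on [0,2])."""
    t = np.clip(t, 0.0, 2.0)
    return np.where(t <= 1.0, 0.5 * t**2, 1.0 - 0.5 * (2.0 - t)**2)

for k in range(1, 11):
    n = 2**k
    # H([X+Y]_k): exact cell probabilities of the quantized sum
    edges = np.arange(2 * n + 1) / n
    p_sum_then_quant = np.diff(tri_cdf(edges))
    # H([X]_k + [Y]_k): convolution of two uniform pmfs on {0, 1/n, ..., (n-1)/n}
    p_quant_then_sum = np.convolve(np.full(n, 1.0 / n), np.full(n, 1.0 / n))
    gap = entropy(p_sum_then_quant) - entropy(p_quant_then_sum)
    print(f"k = {k:2d}   H([X+Y]_k) - H([X]_k + [Y]_k) = {gap:.5f}")
```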
The next lemma allows us to focus on bounded random variables.

Lemma 2 (Truncation). Let $X_1,\dots,X_m$ be independent $\mathbb{R}^d$-valued random variables and $a_1,\dots,a_m \in \mathbb{R}$. If each $X_j$ has an absolutely continuous distribution and $h(X_j)$ is finite, then

$$\lim_{N\to\infty} h\Big(\sum_{j=1}^{m} a_j X_j^{(N)}\Big) = h\Big(\sum_{j=1}^{m} a_j X_j\Big).$$

The following lemma is a particularization of [Rén59, Theorem 1] (see (7)) to the dyadic subsequence $m = 2^k$:

Lemma 3. For any $\mathbb{R}^d$-valued random variable $X$ with an absolutely continuous distribution such that both $H(\lfloor X \rfloor)$ and $h(X)$ are finite,

$$\lim_{k\to\infty}\big(H([X]_k) - dk\log 2\big) = h(X).$$

We are now ready to prove Theorem 1.

Proof. We start by considering the case where $X_j \in [0,1]^d$ for each $j \in [m]$. Since the $X_j$'s are independent and $2^k[X_j]_k$ is $\mathbb{Z}^d$-valued for each $j \in [m]$, by assumption,

$$\sum_{i=1}^{n} \alpha_i H\Big(\sum_{j=1}^{m} a_{ij} [X_j]_k\Big) \le 0 \qquad (17)$$

holds, where

$$\sum_{i=1}^{n} \alpha_i = 0. \qquad (18)$$

By Lemma 3, $H([X]_k) = dk\log 2 + h(X) + o_k(1)$. Thus,

$$h\Big(\sum_{j=1}^{m} a_{ij} X_j\Big) + dk\log 2 + o_k(1) = H\Big(\Big[\sum_{j=1}^{m} a_{ij} X_j\Big]_k\Big) \overset{(a)}{=} H\Big(\sum_{j=1}^{m} a_{ij} [X_j]_k\Big) + o_k(1),$$

where (a) follows from Lemma 1. Multiplying both sides by $\alpha_i$ and summing over $i$, and in view of (18), we have

$$\sum_{i=1}^{n} \alpha_i h\Big(\sum_{j=1}^{m} a_{ij} X_j\Big) + o_k(1) = \sum_{i=1}^{n} \alpha_i H\Big(\sum_{j=1}^{m} a_{ij} [X_j]_k\Big).$$

By (17), sending $k$ to infinity yields the desired result.

For the general case where $X_j \in \mathbb{R}^d$, let $Y_i = \sum_{j=1}^{m} a_{ij} X_j$ for $i \in [n]$. Let $\tilde{X}_j^{(N)} \triangleq \frac{X_j^{(N)} + N}{2N}$, which belongs to $[0,1]^d$. Thus,

$$\sum_{i=1}^{n} \alpha_i h\Big(\sum_{j=1}^{m} a_{ij} \tilde{X}_j^{(N)}\Big) = \sum_{i=1}^{n} \alpha_i h\Big(\sum_{j=1}^{m} a_{ij} X_j^{(N)}\Big) + \sum_{i=1}^{n} \alpha_i \cdot d\log\Big(\frac{1}{2N}\Big) = \sum_{i=1}^{n} \alpha_i h\Big(\sum_{j=1}^{m} a_{ij} X_j^{(N)}\Big), \qquad (19)$$

where (19) follows from (18). Hence,

$$\sum_{i=1}^{n} \alpha_i h(Y_i) \overset{(a)}{=} \lim_{N\to\infty} \sum_{i=1}^{n} \alpha_i h\Big(\sum_{j=1}^{m} a_{ij} X_j^{(N)}\Big) \overset{(b)}{=} \lim_{N\to\infty} \sum_{i=1}^{n} \alpha_i h\Big(\sum_{j=1}^{m} a_{ij} \tilde{X}_j^{(N)}\Big) \overset{(c)}{\le} 0,$$

where (a) follows from Lemma 2, (b) follows from (19), and (c) follows from the earlier result for $[0,1]^d$-valued random variables. The proof of Theorem 1 is now complete.

4 Proof of Theorem 2

Theorem 2 relies on the following two lemmas. The first result is a well-known asymptotic expansion of the differential entropy of a discrete random variable contaminated by weak additive noise. For completeness, we provide a short proof in Section 5.3.

Lemma 4. Let $U$ be a discrete $\mathbb{R}^d$-valued random variable such that $H(U) < \infty$ and let $Z$ be an $\mathbb{R}^d$-valued continuous random variable with $h(Z) > -\infty$. If $U$ and $Z$ are independent, then

$$h(U + \varepsilon Z) = h(Z) + d\log\varepsilon + H(U) + o_\varepsilon(1).$$

(A numerical illustration of this expansion is given at the end of this section.)

The following lemma, proved in Section 5.4, allows us to blow up the Shannon entropy of linear combinations of discrete random variables arbitrarily.

Lemma 5. Let $U_1,\dots,U_m$ be $\mathbb{R}^d$-valued discrete random variables and let $k \in \mathbb{N}$. Then for any $A = (a_{ij}) \in \mathbb{R}^{n\times m}$, there exist $\mathbb{R}^d$-valued discrete random variables $U_1^{(k)},\dots,U_m^{(k)}$ such that

$$H\Big(\sum_{j=1}^{m} a_{ij} U_j^{(k)}\Big) = kH\Big(\sum_{j=1}^{m} a_{ij} U_j\Big), \quad \forall\, i \in [n].$$

We now prove Theorem 2.

Proof. Let $Z_1,\dots,Z_m$ be independent $\mathbb{R}^d$-valued Gaussian random variables with zero mean and let $U_1,\dots,U_m$ be independent $\mathbb{R}^d$-valued discrete random variables. Let $U_1^{(k)},\dots,U_m^{(k)}$ be independent $\mathbb{R}^d$-valued discrete random variables such that $H(\sum_{j=1}^{m} a_{ij} U_j^{(k)}) = kH(\sum_{j=1}^{m} a_{ij} U_j)$ for each $i \in [n]$, as guaranteed by Lemma 5.

Let $\varepsilon > 0$. For each $j \in [m]$, let $X_j = U_j^{(k)} + \varepsilon Z_j$. Then we have

$$h(X_j) = H(U_j^{(k)}) + h(Z_j) + d\log\varepsilon + o_\varepsilon(1).$$

Hence, for each $i \in [n]$,

$$h\Big(\sum_{j=1}^{m} a_{ij} X_j\Big) = h\Big(\sum_{j=1}^{m} a_{ij} U_j^{(k)} + \varepsilon\sum_{j=1}^{m} a_{ij} Z_j\Big) \overset{(a)}{=} H\Big(\sum_{j=1}^{m} a_{ij} U_j^{(k)}\Big) + h\Big(\sum_{j=1}^{m} a_{ij} Z_j\Big) + d\log\varepsilon + o_\varepsilon(1) = kH\Big(\sum_{j=1}^{m} a_{ij} U_j\Big) + h\Big(\sum_{j=1}^{m} a_{ij} Z_j\Big) + d\log\varepsilon + o_\varepsilon(1),$$

where (a) follows from Lemma 4.
Since the $X_j$'s are independent, by assumption, $\sum_{i=1}^{n} \alpha_i h\big(\sum_{j=1}^{m} a_{ij} X_j\big) \le 0$. Hence, in view of $\sum_{i=1}^{n} \alpha_i = 0$,

$$k\sum_{i=1}^{n} \alpha_i H\Big(\sum_{j=1}^{m} a_{ij} U_j\Big) + \sum_{i=1}^{n} \alpha_i h\Big(\sum_{j=1}^{m} a_{ij} Z_j\Big) + o_\varepsilon(1) \le 0.$$

Thus,

$$\sum_{i=1}^{n} \alpha_i H\Big(\sum_{j=1}^{m} a_{ij} U_j\Big) + \frac{\sum_{i=1}^{n} \alpha_i h\big(\sum_{j=1}^{m} a_{ij} Z_j\big)}{k} + \frac{o_\varepsilon(1)}{k} \le 0.$$

The proof is completed by letting $\varepsilon \to 0$ followed by $k \to \infty$.
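As promised after Lemma 4, here is a small numerical illustration of the expansion $h(U+\varepsilon Z) = H(U) + h(Z) + d\log\varepsilon + o_\varepsilon(1)$ that drives the above proof; it is not part of the paper. It takes $d = 1$, $U$ uniform on $\{0,1,2\}$ and $Z \sim N(0,1)$, and evaluates $h(U+\varepsilon Z)$ by numerically integrating the Gaussian-mixture density; the specific distributions and function names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

# U uniform on {0, 1, 2}, so H(U) = log 3; Z ~ N(0, 1), so h(Z) = 0.5*log(2*pi*e)
atoms, probs = np.array([0.0, 1.0, 2.0]), np.array([1/3, 1/3, 1/3])
H_U = np.log(3.0)
h_Z = 0.5 * np.log(2 * np.pi * np.e)

def h_noisy(eps, grid_pts=200_000):
    """Differential entropy (nats) of U + eps*Z, by numerical integration of the
    Gaussian-mixture density of the smoothed discrete random variable."""
    x = np.linspace(atoms.min() - 10 * eps, atoms.max() + 10 * eps, grid_pts)
    f = sum(p * norm.pdf(x, loc=a, scale=eps) for a, p in zip(atoms, probs))
    dx = x[1] - x[0]
    return -np.sum(f * np.log(f + 1e-300)) * dx

for eps in [0.3, 0.1, 0.03, 0.01]:
    approx = H_U + h_Z + np.log(eps)   # Lemma 4 prediction with d = 1
    print(f"eps = {eps:5.2f}   h(U + eps*Z) = {h_noisy(eps):8.4f}   "
          f"H(U) + h(Z) + log(eps) = {approx:8.4f}")
```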