
Sharp Bounds on Arimoto's Conditional Rényi Entropies Between Two Distinct Orders

Yuta Sakai and Ken-ichi Iwata
Graduate School of Engineering, University of Fukui, Japan
Email: {y-sakai, k-iwata}@u-fukui.ac.jp

Abstract—This study examines sharp bounds on Arimoto's conditional Rényi entropy of order β when another conditional Rényi entropy of distinct order α ≠ β is fixed. Arimoto inspired the relation between the Rényi entropy and the ℓ_r-norm of probability distributions, and he introduced a conditional version of the Rényi entropy. From this perspective, we analyze the ℓ_r-norms of particular distributions. As a result, we identify specific probability distributions which achieve our sharp bounds on the conditional Rényi entropy. The sharp bounds derived in this study are applicable to other information measures, e.g., the minimum average probability of error, the Bhattacharyya parameter, Gallager's reliability function E_0, and Sibson's α-mutual information, which are strictly monotone functions of the conditional Rényi entropy.

I. INTRODUCTION

In information theory, the Shannon entropy H(X) and the conditional Shannon entropy H(X | Y) [34] are traditional information measures of random variables (RVs) X and Y, which characterize several theoretical limits of information transmission. Later, the Rényi entropy H_α(X) [26] was axiomatically proposed as a generalization of the Shannon entropy with order α. For a discrete RV¹ X ∼ P, the Rényi entropy of order α ∈ [0,∞] is defined by

$$ H_{\alpha}(X) = H_{\alpha}(P) := \lim_{r \to \alpha} \frac{r}{1-r} \ln \|P\|_{r}, \tag{1} $$

where ln denotes the natural logarithm, the ℓ_r-norm of a discrete probability distribution P is defined by

$$ \|P\|_{r} := \bigg( \sum_{x \in \mathrm{supp}(P)} P(x)^{r} \bigg)^{1/r} \tag{2} $$

for r ∈ ℝ, and supp(P) := {x ∈ 𝒳 | P(x) > 0} denotes the support of a distribution P on a countable alphabet 𝒳.
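To make the definitions concrete, here is a minimal Python sketch of the ℓ_r-norm (2) and the Rényi entropy (1) for finitely supported distributions, with the Shannon and order-∞ limits handled explicitly. The helper names and the restriction to finite supports are ours, not the paper's:

```python
import math

def lr_norm(P, r):
    """l_r-norm (2) of a probability vector P, summed over its support."""
    return sum(p ** r for p in P if p > 0) ** (1.0 / r)

def renyi_entropy(P, a):
    """Renyi entropy of order a in (0, inf]; the limits in (1) are taken explicitly."""
    if a == math.inf:
        return -math.log(max(p for p in P if p > 0))      # order-infinity case
    if a == 1:
        return -sum(p * math.log(p) for p in P if p > 0)  # Shannon entropy
    return (a / (1 - a)) * math.log(lr_norm(P, a))

# A uniform distribution has Renyi entropy ln n at every order.
U = [0.25] * 4
for a in (0.5, 1, 2, math.inf):
    print(a, renyi_entropy(U, a))
```

A familiar sanity check: H_α(X) is non-increasing in α, with equality across orders exactly for uniform distributions.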
Note that (1) is well-defined since the limiting value exists for each α ∈ [0,∞] as follows:

$$ H_{\alpha}(X) = \frac{\alpha}{1-\alpha} \ln \|P\|_{\alpha} \quad \text{for } \alpha \in (0,1) \cup (1,\infty), \tag{3} $$
$$ H_{0}(X) = \ln |\mathrm{supp}(P)|, \tag{4} $$
$$ H_{1}(X) = \mathbb{E}[-\ln P(X)] =: H(X), \tag{5} $$
$$ H_{\infty}(X) = -\ln \|P\|_{\infty}, \tag{6} $$

where |·| denotes the cardinality² of a countable set, E[·] denotes the expectation of the RV, and ||P||_∞ := lim_{r→∞} ||P||_r = max_{x∈supp(P)} P(x) denotes the ℓ_∞-norm of P. Moreover, Arimoto [2] proposed a conditional version³ of the Rényi entropy H_α(X | Y) as a generalized conditional Shannon entropy with order α. For a pair of RVs (X, Y) ∼ P_{X|Y} P_Y, the conditional Rényi entropy [2] of order α ∈ [0,∞] is defined by

$$ H_{\alpha}(X \mid Y) := \lim_{r \to \alpha} \frac{r}{1-r} \ln N_{r}(X \mid Y), \tag{7} $$

where the expectation of the ℓ_r-norm is denoted by

$$ N_{r}(X \mid Y) := \mathbb{E}\big[ \| P_{X|Y}(\cdot \mid Y) \|_{r} \big] \tag{8} $$

for r ∈ (0,∞], and note in (8) that Y ∼ P_Y. In this study, the RV Y can be considered to be either discrete or continuous. By convention, we write H_α(X | Y = y) := H_α(P_{X|Y}(· | y)) for y ∈ supp(P_Y). As with the unconditional Rényi entropy (1), note that (7) is also well-defined since the limiting value also exists for each α ∈ [0,∞] as follows:⁴

$$ H_{\alpha}(X \mid Y) = \frac{\alpha}{1-\alpha} \ln N_{\alpha}(X \mid Y) \quad \text{for } \alpha \in (0,1) \cup (1,\infty), \tag{9} $$
$$ H_{0}(X \mid Y) = \sup_{y \in \mathrm{supp}(P_Y)} H_{0}(X \mid Y = y), \tag{10} $$
$$ H_{1}(X \mid Y) = \mathbb{E}[-\ln P_{X|Y}(X \mid Y)] =: H(X \mid Y), \tag{11} $$
$$ H_{\infty}(X \mid Y) = -\ln N_{\infty}(X \mid Y). \tag{12} $$

Note that Arimoto [2] proposed H_α(X | Y) in terms of the relation between the ℓ_r-norm and the unconditional Rényi entropy shown in (1) (see also [41, Section II-A]). As shown in (5) and (11), Rényi's information measures reduce to Shannon's information measures as α → 1.

This work was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research through the Japan Society for the Promotion of Science under Grant 26420352.
¹ An RV X with distribution P is denoted by X ∼ P.

February 3, 2017 DRAFT
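For a finitely supported Y, the expectation (8) and Arimoto's conditional entropy via (9), (11), and (12) admit an equally short sketch. The representation of P_{X|Y} as a list of rows and all helper names are our own illustrative choices:

```python
import math

def lr_norm(P, r):
    """l_r-norm (2); r = inf gives the maximum probability mass."""
    supp = [p for p in P if p > 0]
    if r == math.inf:
        return max(supp)
    return sum(p ** r for p in supp) ** (1.0 / r)

def N_r(PY, PX_given_Y, r):
    """Expectation (8) of the l_r-norms of the conditional distributions."""
    return sum(py * lr_norm(pxy, r) for py, pxy in zip(PY, PX_given_Y) if py > 0)

def cond_renyi(PY, PX_given_Y, a):
    """Arimoto's conditional Renyi entropy via (9), (11), and (12)."""
    if a == math.inf:
        return -math.log(N_r(PY, PX_given_Y, math.inf))
    if a == 1:
        return sum(py * -sum(p * math.log(p) for p in pxy if p > 0)
                   for py, pxy in zip(PY, PX_given_Y) if py > 0)
    return (a / (1 - a)) * math.log(N_r(PY, PX_given_Y, a))

# Example: a binary observation with two conditional rows for X.
PY = [0.5, 0.5]
rows = [[0.9, 0.1], [0.2, 0.8]]
print(cond_renyi(PY, rows, 2), cond_renyi(PY, rows, math.inf))
```

When X is independent of Y (identical rows), the conditional entropy collapses to the unconditional one, which is a quick consistency check on (7)–(9).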
In many situations, Rényi's information measures yield stronger results than Shannon's information measures (cf. [1], [6], [7], [9], [38]). In addition, the quantity H_α(X | Y) is closely related to Gallager's reliability function E_0 [14, Eq. (5.6.14)] and Sibson's α-mutual information [21], [35]; thus, coding theorems with them can be written in terms of H_α(X | Y) (cf. [2], [41]). Many basic properties of H_α(X | Y) were studied by Fehr and Berens [13].

Bounds on information measures are crucial tools in several engineering fields, e.g., information theory, coding theory, cryptology, machine learning, statistics, etc. In this paper, a bound is said to be sharp if there is no tighter bound in the same situation. One well-known sharp bound is Fano's inequality [11], which bounds the conditional Shannon entropy H(X | Y) from above for fixed (i) average probability of error Pr(X ≠ f(Y)) and (ii) size of support |supp(P_X)|, where the function f is an estimator of X given Y. As related bounds, the reverse of Fano's inequality, i.e., sharp lower bounds on H(X | Y) with a fixed minimum average probability of error

$$ P_{\mathrm{e}}(X \mid Y) := \min_{f} \Pr(X \neq f(Y)), \tag{13} $$

were established by Kovalevsky [23] and Tebbe and Dwyer [36] (see also [12]). Ho and Verdú [20] generalized Fano's inequality by relaxing its constraints from a fixed number |supp(P_X)| to a fixed distribution P_X. Very recently, Sason and Verdú [33] generalized Fano's inequality and its reverse to sharp bounds on the conditional Rényi entropy H_α(X | Y).

² In this study, suppose that |S| = ∞ if S is a countably infinite set; thus, note in (4) that H_0(X) = ∞ if supp(P) is countably infinite.
³ There are many definitions of conditional Rényi entropy (cf. [13], [37], [38]). In this paper, the conditional Rényi entropy means Arimoto's definition unless otherwise noted.
⁴ Proofs of (10)–(12) can be found in, e.g., [13, Propositions 1 and 2].
In their study [33], the interplay between H_α(X | Y) and P_e(X | Y) was investigated with broad applications and comparisons to related works. On the other hand, we [29] derived sharp bounds on H(X | Y) with a fixed H_α(X | Y), and vice versa, by analyzing the interplay between H(X | Y) and N_r(X | Y) (cf. (7)). Unconditional versions of those results [29] were also examined in [30]. In this study, we further generalize the interplay between Shannon's and Rényi's information measures of our results [29], [30] to the interplay between two Rényi information measures with distinct orders α ≠ β.

We start by analyzing the extremal probability distributions v_n(·) and w(·) defined in Section II, where "extremal" means that our sharp bounds on H_α(X) are achieved by them (cf. Section III). To utilize the nature of the expectation N_r(X | Y) of the ℓ_r-norm, our analyses in this study are concentrated on the ℓ_r-norms of the extremal distributions. The main results of this study are shown in Section IV, which gives sharp bounds on H_β(X | Y) with a fixed H_α(X | Y), α ≠ β, in several situations. We represent our bounds via specific distributions to ensure sharpness of the bounds. The main results of this study are organized as follows:

• Section IV-A shows sharp bounds on H_β(X | Y) with fixed H_α(X | Y) and cardinality |supp(P_X)| < ∞ for distinct orders α ≠ β as follows:
  – Theorem 5 gives bounds on H_α(X | Y) with a fixed H_∞(X | Y) for α ∈ (0,∞), and vice versa.
  – Theorem 6 gives bounds on H_β(X | Y) with a fixed H_α(X | Y) for α, β ∈ [1/2, ∞] and |supp(P_X)| ≤ 2.
  – Theorem 7 gives bounds on H_β(X | Y) with a fixed H_α(X | Y) for α, β ∈ [1/2, ∞] and |supp(P_X)| ≥ 3.
• Section IV-B shows sharp bounds on H_β(X | Y) with a fixed H_α(X | Y) for two orders α ∈ (0,1) ∪ (1,∞] and β ∈ (0,∞], as shown in Theorem 8. Note that unlike Theorems 5–7, Theorem 8 has no constraint on the support supp(P_X).
Finally, Section V shows some applications of the sharp bounds on the conditional Rényi entropy to other related information measures, which are strictly monotone functions of the conditional Rényi entropy.

II. EXTREMAL DISTRIBUTIONS v_n(·) AND w(·) AND THEIR PROPERTIES

In this section, we introduce the probability distributions v_n(·) and w(·), which play significant roles in this study. In addition, their ℓ_r-norms and Rényi entropies are investigated. We defer showing the extremality of these distributions v_n(·) and w(·) in terms of the ℓ_r-norm and Rényi entropy until Section III.

For each n ∈ ℕ and p ∈ [1/n, 1], we define the n-dimensional probability vector⁵

$$ v_{n}(p) := (v_{0}, v_{1}, v_{2}, \dots, v_{n-1}), \tag{14} $$

where ℕ denotes the set of positive integers and v_i is chosen so that

$$ v_{i} := \begin{cases} p & \text{if } i = 0, \\ \dfrac{1-p}{n-1} & \text{otherwise} \end{cases} \tag{15} $$

for each i ∈ {0, 1, 2, …, n−1}. In addition, for p ∈ (0,1], we define the infinite-dimensional probability vector

$$ w(p) := (w_{0}, w_{1}, w_{2}, \dots), \tag{16} $$

where w_i is chosen so that

$$ w_{i} := \begin{cases} p & \text{if } 0 \le i < \lfloor 1/p \rfloor, \\ 1 - \lfloor 1/p \rfloor \, p & \text{if } i = \lfloor 1/p \rfloor, \\ 0 & \text{if } i > \lfloor 1/p \rfloor \end{cases} \tag{17} $$

for each i ∈ {0, 1, 2, …}, and ⌊x⌋ := max{z ∈ ℤ | z ≤ x} denotes the floor function of x ∈ ℝ. Note that |supp(v_n(p))| = n for every p ∈ [1/n, 1), and |supp(w(p))| = m + 1 for every m ∈ ℕ and p ∈ [1/(m+1), 1/m), i.e., these are discrete probability distributions with finite supports. Since v_1(1) has only one probability mass 1 whenever n = 1, we omit this trivial case from our analyses and assume that n ∈ ℕ_{≥2} in this study, where ℕ_{≥k} denotes the set of integers n satisfying n ≥ k.

⁵ Note in [30, Eq. (3)] that the probability vector v_n(·) is defined in another form; however, a simple change of variables immediately shows that these are essentially equivalent.
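The vectors (14)–(17) are easy to generate and check for normalization and support size. A minimal sketch (the constructors and the small tail guard against floating-point round-off are ours):

```python
import math

def v(n, p):
    """Extremal distribution v_n(p) of (14)-(15): one mass p, n-1 equal masses."""
    assert n >= 2 and 1.0 / n <= p <= 1.0
    return [p] + [(1.0 - p) / (n - 1)] * (n - 1)

def w(p):
    """Extremal distribution w(p) of (16)-(17), listing only its finite support."""
    assert 0.0 < p <= 1.0
    m = math.floor(1.0 / p)
    tail = 1.0 - m * p
    # drop a zero tail (tolerating float round-off) so len() equals |supp(w(p))|
    return [p] * m + ([tail] if tail > 1e-15 else [])

print(v(4, 0.7))   # one mass 0.7 and three equal masses (1 - 0.7)/3
print(w(0.3))      # three masses 0.3 and a tail mass, support size m + 1 = 4
```

In particular, for p ∈ [1/(m+1), 1/m) the support of w(p) has exactly m + 1 points, matching the remark above.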
By the definition (2) of the ℓ_r-norm, for each r ∈ (0,∞), the ℓ_r-norms of the distributions v_n(·) and w(·) can be calculated as follows:

$$ \| v_{n}(p) \|_{r} = \Big( p^{r} + (n-1)^{1-r} (1-p)^{r} \Big)^{1/r}, \tag{18} $$
$$ \| w(p) \|_{r} = \bigg( \Big\lfloor \frac{1}{p} \Big\rfloor p^{r} + \Big( 1 - \Big\lfloor \frac{1}{p} \Big\rfloor p \Big)^{r} \bigg)^{1/r}, \tag{19} $$

respectively. In particular, the ℓ_∞-norms are ||v_n(p)||_∞ = p for p ∈ [1/n, 1] and ||w(p)||_∞ = p for p ∈ (0,1]. Substituting (18) and (19) into (1), the Rényi entropies of the distributions v_n(·) and w(·), respectively, can also be calculated as follows:

$$ H_{\alpha}(v_{n}(p)) = \frac{1}{1-\alpha} \ln \Big( p^{\alpha} + (n-1)^{1-\alpha} (1-p)^{\alpha} \Big), \tag{20} $$
$$ H_{\alpha}(w(p)) = \frac{1}{1-\alpha} \ln \bigg( \Big\lfloor \frac{1}{p} \Big\rfloor p^{\alpha} + \Big( 1 - \Big\lfloor \frac{1}{p} \Big\rfloor p \Big)^{\alpha} \bigg), \tag{21} $$

respectively. We first show the monotonicity of the ℓ_r-norms of the distributions v_n(·) and w(·) in the following lemma.

Lemma 1. Let r ∈ (0,1) ∪ (1,∞] and n ∈ ℕ_{≥2} be fixed numbers. If r ∈ (0,1), then both ℓ_r-norms p_v ↦ ||v_n(p_v)||_r and p_w ↦ ||w(p_w)||_r are strictly decreasing functions of p_v ∈ [1/n, 1] and p_w ∈ (0,1], respectively. Conversely, if r ∈ (1,∞], then both ℓ_r-norms p_v ↦ ||v_n(p_v)||_r and p_w ↦ ||w(p_w)||_r are strictly increasing functions of p_v ∈ [1/n, 1] and p_w ∈ (0,1], respectively.

Proof of Lemma 1: Since ||v_n(p_v)||_∞ = p_v and ||w(p_w)||_∞ = p_w for p_v ∈ [1/n, 1] and p_w ∈ (0,1], respectively, Lemma 1 is trivial if r = ∞. Hence, it suffices to consider the ℓ_r-norm for r ∈ (0,1) ∪ (1,∞).

We first verify the monotonicity of the function p ↦ ||v_n(p)||_r.
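Before following the derivatives in detail, the closed forms (18) and (20) can be sanity-checked numerically against direct evaluation of (2); a short sketch with our own helper names:

```python
import math

def lr_norm_direct(P, r):
    """Direct l_r-norm (2) over the support."""
    return sum(x ** r for x in P if x > 0) ** (1.0 / r)

def v_norm(n, p, r):
    """Closed form (18) of ||v_n(p)||_r."""
    return (p ** r + (n - 1) ** (1 - r) * (1 - p) ** r) ** (1.0 / r)

def v_renyi(n, p, a):
    """Closed form (20) of H_a(v_n(p))."""
    return math.log(p ** a + (n - 1) ** (1 - a) * (1 - p) ** a) / (1 - a)

n, p = 4, 0.7
vec = [p] + [(1 - p) / (n - 1)] * (n - 1)   # v_n(p) from (14)-(15)
for r in (0.5, 2.0, 3.0):
    assert abs(lr_norm_direct(vec, r) - v_norm(n, p, r)) < 1e-12
    # (20) agrees with (1)/(3): H_r = (r/(1-r)) ln ||v_n(p)||_r
    assert abs(v_renyi(n, p, r) - (r / (1 - r)) * math.log(v_norm(n, p, r))) < 1e-12
```

The same kind of check works for (19) and (21) once the floor term ⌊1/p⌋ is evaluated for the given p.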
A direct calculation shows

$$ \frac{\partial \| v_{n}(p) \|_{r}}{\partial p} \overset{(18)}{=} \frac{\partial}{\partial p} \Big( p^{r} + (n-1)^{1-r}(1-p)^{r} \Big)^{1/r} \tag{22} $$
$$ = \frac{1}{r} \Big( p^{r} + (n-1)^{1-r}(1-p)^{r} \Big)^{(1/r)-1} \bigg( \frac{\partial}{\partial p} \Big( p^{r} + (n-1)^{1-r}(1-p)^{r} \Big) \bigg) \tag{23} $$
$$ = \frac{1}{r} \Big( p^{r} + (n-1)^{1-r}(1-p)^{r} \Big)^{(1/r)-1} \Big( r p^{r-1} - r (n-1)^{1-r} (1-p)^{r-1} \Big) \tag{24} $$
$$ = \Big( p^{r} + (n-1)^{1-r}(1-p)^{r} \Big)^{(1/r)-1} \Big( p^{r-1} - (n-1)^{1-r} (1-p)^{r-1} \Big). \tag{25} $$

If we define the sign function as

$$ \mathrm{sgn}(x) := \begin{cases} -1 & \text{if } x < 0, \\ 0 & \text{if } x = 0, \\ 1 & \text{if } x > 0 \end{cases} \tag{26} $$

for x ∈ ℝ, then it follows that

$$ \mathrm{sgn}\bigg( \frac{\partial \| v_{n}(p) \|_{r}}{\partial p} \bigg) \overset{(25)}{=} \underbrace{ \mathrm{sgn}\Big( \big( p^{r} + (n-1)^{1-r}(1-p)^{r} \big)^{(1/r)-1} \Big) }_{=1} \, \mathrm{sgn}\Big( p^{r-1} - (n-1)^{1-r}(1-p)^{r-1} \Big) \tag{27} $$
$$ = \mathrm{sgn}\Big( p^{r-1} - (n-1)^{1-r}(1-p)^{r-1} \Big) \tag{28} $$
$$ = \begin{cases} -1 & \text{if } r < 1, \\ 0 & \text{if } r = 1, \\ 1 & \text{if } r > 1 \end{cases} \tag{29} $$

for every n ∈ ℕ_{≥2}, p ∈ (1/n, 1), and r ∈ (0,∞), since p > (1−p)/(n−1) on this range. This implies that for any fixed n ∈ ℕ_{≥2},
• if r ∈ (0,1), then p ↦ ||v_n(p)||_r is strictly decreasing for p ∈ [1/n, 1],
• if r ∈ (1,∞), then p ↦ ||v_n(p)||_r is strictly increasing for p ∈ [1/n, 1];
and therefore, the assertion of Lemma 1 holds for p ↦ ||v_n(p)||_r.

We next verify the monotonicity of the function p ↦ ||w(p)||_r. Since ⌊1/p⌋ = m for each p ∈ (1/(m+1), 1/m] and m ∈ ℕ, we readily see that

$$ \frac{\partial \| w(p) \|_{r}}{\partial p} \overset{(19)}{=} \frac{\partial}{\partial p} \Big( m p^{r} + (1 - m p)^{r} \Big)^{1/r} \tag{30} $$
$$ = \frac{1}{r} \Big( m p^{r} + (1 - m p)^{r} \Big)^{(1/r)-1} \bigg( \frac{\partial}{\partial p} \Big( m p^{r} + (1 - m p)^{r} \Big) \bigg) \tag{31} $$
$$ = \frac{1}{r} \Big( m p^{r} + (1 - m p)^{r} \Big)^{(1/r)-1} \Big( r m p^{r-1} - r m (1 - m p)^{r-1} \Big) \tag{32} $$
$$ = m \Big( m p^{r} + (1 - m p)^{r} \Big)^{(1/r)-1} \Big( p^{r-1} - (1 - m p)^{r-1} \Big) \tag{33} $$

for every m ∈ ℕ, p ∈ (1/(m+1), 1/m), and r ∈ (0,∞).
Hence, we obtain

$$ \mathrm{sgn}\bigg( \frac{\partial \| w(p) \|_{r}}{\partial p} \bigg) \overset{(33)}{=} \underbrace{ \mathrm{sgn}\Big( m \big( m p^{r} + (1 - m p)^{r} \big)^{(1/r)-1} \Big) }_{=1} \, \mathrm{sgn}\Big( p^{r-1} - (1 - m p)^{r-1} \Big) \tag{34} $$
$$ = \mathrm{sgn}\Big( p^{r-1} - (1 - m p)^{r-1} \Big) \tag{35} $$
$$ = \begin{cases} -1 & \text{if } r < 1, \\ 0 & \text{if } r = 1, \\ 1 & \text{if } r > 1 \end{cases} \tag{36} $$

for every m ∈ ℕ, p ∈ (1/(m+1), 1/m), and r ∈ (0,∞), since p > 1 − mp on this range. This implies that for each fixed m ∈ ℕ and r ∈ (0,1) ∪ (1,∞),
• if r ∈ (0,1), then p ↦ ||w(p)||_r is strictly decreasing for p ∈ (1/(m+1), 1/m],
• if r ∈ (1,∞), then p ↦ ||w(p)||_r is strictly increasing for p ∈ (1/(m+1), 1/m].
Finally, it follows that

$$ \lim_{p \to (1/m)^{+}} \| w(p) \|_{r} = \lim_{p \to (1/m)^{+}} \Big( \lfloor 1/p \rfloor \, p^{r} + \big( 1 - \lfloor 1/p \rfloor \, p \big)^{r} \Big)^{1/r} \tag{37} $$
$$ = \Big( (m-1) \big( 1/m \big)^{r} + \Big( 1 - (m-1)\big( 1/m \big) \Big)^{r} \Big)^{1/r} \tag{38} $$
$$ = \Big( (m-1) m^{-r} + m^{-r} \Big)^{1/r} \tag{39} $$
$$ = m^{(1-r)/r} \tag{40} $$
$$ = \| w(1/m) \|_{r} \tag{41} $$

for each m ∈ ℕ_{≥2} and r ∈ (0,1) ∪ (1,∞), which implies that p ↦ ||w(p)||_r is continuous on p ∈ (0,1]; therefore, the monotonicity of p ↦ ||w(p)||_r obtained from (36) can be strengthened as follows:
• if r ∈ (0,1), then p ↦ ||w(p)||_r is strictly decreasing for p ∈ (0,1],
• if r ∈ (1,∞), then p ↦ ||w(p)||_r is strictly increasing for p ∈ (0,1].
This completes the proof of Lemma 1. ∎

Lemma 1 implies the existence of inverse functions. Let⁶

$$ \theta(r) := \lim_{t \to r} \frac{1-t}{t}, \tag{42} $$

and let I_n(r) and J(r) be real intervals defined by

$$ I_{n}(r) := \begin{cases} \big[ 1, n^{\theta(r)} \big] & \text{if } 0 < r < 1, \\ \big[ n^{\theta(r)}, 1 \big] & \text{if } 1 < r \le \infty, \end{cases} \tag{43} $$
$$ J(r) := \begin{cases} [1, \infty) & \text{if } 0 < r < 1, \\ (0, 1] & \text{if } 1 < r \le \infty \end{cases} \tag{44} $$

for each n ∈ ℕ_{≥2} and r ∈ (0,1) ∪ (1,∞], respectively.

⁶ Eq. (42) is defined to fulfill θ(∞) = −1.
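Both the derivative (25) and the across-breakpoint monotonicity asserted by Lemma 1 are easy to probe numerically; a sketch with our own helper names (the `max()` guards against floating-point round-off at the breakpoints p = 1/m):

```python
import math

def v_norm(n, p, r):
    """Closed form (18) of ||v_n(p)||_r."""
    return (p ** r + (n - 1) ** (1 - r) * (1 - p) ** r) ** (1.0 / r)

def v_norm_deriv(n, p, r):
    """Right-hand side of (25)."""
    base = p ** r + (n - 1) ** (1 - r) * (1 - p) ** r
    return base ** (1.0 / r - 1) * (p ** (r - 1) - (n - 1) ** (1 - r) * (1 - p) ** (r - 1))

def w_norm(p, r):
    """Closed form (19) of ||w(p)||_r."""
    m = math.floor(1.0 / p)
    return (m * p ** r + max(0.0, 1.0 - m * p) ** r) ** (1.0 / r)

# (25) versus a central finite difference:
n, p, h = 5, 0.4, 1e-6
for r in (0.5, 3.0):
    fd = (v_norm(n, p + h, r) - v_norm(n, p - h, r)) / (2 * h)
    assert abs(fd - v_norm_deriv(n, p, r)) < 1e-5

# Lemma 1 for w(.): strictly decreasing for r < 1, increasing for r > 1,
# on a grid of p values that crosses many breakpoints p = 1/m.
ps = [i / 1000.0 for i in range(10, 1001)]
assert all(w_norm(a, 0.5) > w_norm(b, 0.5) for a, b in zip(ps, ps[1:]))
assert all(w_norm(a, 2.0) < w_norm(b, 2.0) for a, b in zip(ps, ps[1:]))
```

The continuity step (37)–(41) can be checked the same way: approaching p = 1/m from above reproduces the value m^{(1−r)/r} at the breakpoint.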
For each r ∈ (0,1) ∪ (1,∞] and n ∈ ℕ_{≥2}, we denote by

$$ N_{r}^{-1}(v_{n} : \cdot) : I_{n}(r) \to [1/n, 1], \tag{45} $$
$$ N_{r}^{-1}(w : \cdot) : J(r) \to (0, 1] \tag{46} $$

the inverse functions of p_v ↦ ||v_n(p_v)||_r and p_w ↦ ||w(p_w)||_r, respectively. As simple instances, if r = ∞, then N_∞^{-1}(v_n : t_v) = t_v and N_∞^{-1}(w : t_w) = t_w for t_v ∈ [1/n, 1] and t_w ∈ (0,1], respectively, because ||v_n(p_v)||_∞ = p_v and ||w(p_w)||_∞ = p_w for p_v ∈ [1/n, 1] and p_w ∈ (0,1], respectively.

Since logarithm functions are strictly monotone, it also follows from (1) and Lemma 1 that both Rényi entropies p_v ↦ H_α(v_n(p_v)) and p_w ↦ H_α(w(p_w)) have inverse functions for every⁷ α ∈ (0,∞], as with (45) and (46). For each n ∈ ℕ_{≥2} and α ∈ (0,∞], we denote by

$$ H_{\alpha}^{-1}(v_{n} : \cdot) : [0, \ln n] \to [1/n, 1], \tag{47} $$
$$ H_{\alpha}^{-1}(w : \cdot) : [0, \infty) \to (0, 1] \tag{48} $$

the inverse functions of p_v ↦ H_α(v_n(p_v)) and p_w ↦ H_α(w(p_w)), respectively. By convention of the Shannon entropy, we write H^{-1}(v_n : ·) and H^{-1}(w : ·) for the inverse functions H_α^{-1}(v_n : ·) and H_α^{-1}(w : ·) with α = 1, respectively. In general, these inverse functions are hard to express in closed form, as with the inverse of the binary entropy function h_2 : t ↦ −t ln t − (1−t) ln(1−t). As special cases, we give the following specific closed forms.

Fact 1. If α = 1/2, α = 2, or α = ∞, then the inverse functions (47) and (48) can be expressed in the following closed forms:

$$ H_{1/2}^{-1}(v_{n} : \mu) = \frac{ n(n-1) - (n-2) e^{\mu} + 2 \sqrt{ e^{\mu} (n-1)(n - e^{\mu}) } }{ n^{2} } \quad \text{for } n \in \mathbb{N}_{\ge 2} \text{ and } \mu \in [0, \ln n], \tag{49} $$
$$ H_{1/2}^{-1}(w : \mu) = \frac{ (m+1) + (m-1) e^{\mu} + 2 \sqrt{ e^{\mu} m (1 + m - e^{\mu}) } }{ m (1+m)^{2} } \quad \text{with } m = \lfloor e^{\mu} \rfloor \text{ for } \mu \in [0, \infty), \tag{50} $$
$$ H_{2}^{-1}(v_{n} : \mu) = \frac{ 1 + \sqrt{ e^{-\mu} (n-1)(n - e^{\mu}) } }{ n } \quad \text{for } n \in \mathbb{N}_{\ge 2} \text{ and } \mu \in [0, \ln n], \tag{51} $$
$$ H_{2}^{-1}(w : \mu) = \frac{ m + \sqrt{ e^{-\mu} m (1 + m - e^{\mu}) } }{ m (1+m) } \quad \text{with } m = \lfloor e^{\mu} \rfloor \text{ for } \mu \in [0, \infty), \tag{52} $$
$$ H_{\infty}^{-1}(v_{n} : \mu) = e^{-\mu} \quad \text{for } n \in \mathbb{N}_{\ge 2} \text{ and } \mu \in [0, \ln n], \tag{53} $$

⁷ If α = 1, i.e., if these are Shannon entropies, these inverse functions also exist due to [30, Lemma 1].
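The closed form (51), for instance, can be verified by a round trip through (20) at α = 2. A sketch (function names ours; the `max()` guards the square root against round-off when e^µ is numerically just above n):

```python
import math

def H2_v(n, p):
    """(20) at alpha = 2: H_2(v_n(p)) = -ln(p^2 + (1-p)^2/(n-1))."""
    return -math.log(p ** 2 + (1 - p) ** 2 / (n - 1))

def H2_v_inv(n, mu):
    """Closed form (51) for H_2^{-1}(v_n : mu)."""
    return (1 + math.sqrt(math.exp(-mu) * (n - 1) * max(0.0, n - math.exp(mu)))) / n

n = 4
for p in (0.3, 0.5, 0.7, 1.0):
    mu = H2_v(n, p)
    assert abs(H2_v_inv(n, mu) - p) < 1e-9   # round trip recovers p
```

The order-∞ case (53) is immediate from (6): µ = −ln p gives p = e^{−µ} directly.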
[Fig. 1. Plots of the inverse functions (47) and (48) with α = 1/2, α = 1 (Shannon), α = 2, and α = ∞: (a) H_α^{-1}(v_n : µ) with n = 4 for µ ∈ [0, ln 4]; (b) H_α^{-1}(w : µ) for µ ∈ [0, ln 4]. The horizontal axes denote the Rényi entropy µ = H_α(v_n(p)) or µ = H_α(w(p)) in nats, i.e., the arguments of the inverse functions; the vertical axes denote the parameter p, i.e., the values of the inverse functions. The curves pass through (µ, p) = (ln 2, 1/2) and (ln 3, 1/3). Fact 1 is used to plot the cases α = 1/2, α = 2, and α = ∞.]

$$ H_{\infty}^{-1}(w : \mu) = e^{-\mu} \quad \text{for } \mu \in [0, \infty), \tag{54} $$

where e denotes the base of the natural logarithm.

Fact 1 can be verified by the quadratic formula in the cases⁸ α = 1/2 and α = 2. In Fig. 1, we illustrate instances of the inverse functions of Fact 1, along with the inverse functions H^{-1}(v_n : ·) and H^{-1}(w : ·) of the Shannon entropies. As with Fact 1, from the relation between the Rényi entropy and the ℓ_r-norm (cf. (1)), the inverse functions N_r^{-1}(v_n : ·) of (45) and N_r^{-1}(w : ·) of (46) can also be expressed in closed forms if r = 1/2, r = 2, or r = ∞. By Fact 1, the sharp bounds established in this paper can be expressed in closed forms in some situations.

Using the inverse functions H^{-1}(v_n : ·) and H^{-1}(w : ·) of the Shannon entropies, we introduce relations of the convexity/concavity of the ℓ_r-norm with respect to the Shannon entropy of the distributions v_n(·) and w(·) in Lemmas 2 and 3, respectively.

Lemma 2 ([29, Lemma 2]). If n = 2, for each r ∈ (0,1) ∪ (1,∞), the ℓ_r-norm µ ↦ ||v_2(H^{-1}(v_2 : µ))||_r is strictly concave in µ ∈ [0, ln 2]. In addition⁹, for each n ∈ ℕ_{≥2}, the ℓ_∞-norm µ ↦ ||v_n(H^{-1}(v_n : µ))||_∞ is strictly concave in µ ∈ [0, ln n].
Moreover, for each n ∈ ℕ_{≥3} and r ∈ [1/2,1) ∪ (1,∞), there exists an inflection point χ_n(r) ∈ (0, ln n) such that the following hold:
• the ℓ_r-norm µ ↦ ||v_n(H^{-1}(v_n : µ))||_r is strictly concave in µ ∈ [0, χ_n(r)],
• the ℓ_r-norm µ ↦ ||v_n(H^{-1}(v_n : µ))||_r is strictly convex in µ ∈ [χ_n(r), ln n].

Lemma 3 ([29, Lemma 3]¹⁰). For each m ∈ ℕ and r ∈ (0,1) ∪ (1,∞], the ℓ_r-norm µ ↦ ||w(H^{-1}(w : µ))||_r is strictly concave in µ ∈ [ln m, ln(m+1)].

⁸ If α = ∞, then Fact 1 is almost trivial from the definition (6).
⁹ This concavity is shown not in [29, Lemma 2] itself but in the paragraph below [29, Lemma 2].
¹⁰ In [29, Lemma 3], the case r = ∞ is not considered; however, it can also be proved by the fact that ||w(p)||_∞ = p for p ∈ (0,1], as with Lemma 2.

In [29], Lemmas 2 and 3 were used to derive sharp bounds on the conditional Shannon entropy H(X | Y) with a fixed conditional Rényi entropy H_α(X | Y), and vice versa, from the perspectives of the expectations (8) and (11). In this study, we establish sharp bounds on H_β(X | Y) with a fixed H_α(X | Y) for distinct orders α ≠ β in a similar manner to [29], i.e., the property of the expectation (8) is employed. To this end, we further examine the convexity/concavity of ℓ_s-norms with respect to the ℓ_r-norm for the distributions v_n(·) and w(·), as with Lemmas 2 and 3, respectively. To derive such convexity/concavity lemmas, we first give the following Lemma 4.

Lemma 4. We define the function

$$ g(n, z; r, s) := \big( z^{r} + (n-1) \big) \ln_{r} z - \big( z^{s} + (n-1) \big) \ln_{s} z \tag{55} $$

for each n ∈ ℕ_{≥2}, z ∈ (0,∞), and r, s ∈ (0,∞), where the q-logarithm function¹¹ [40] is defined by

$$ \ln_{q} x := \begin{cases} \ln x & \text{if } q = 1, \\ \dfrac{x^{1-q} - 1}{1-q} & \text{if } q \neq 1 \end{cases} \tag{56} $$

for x > 0 and q ∈ ℝ.

¹¹ Note that the limiting value lim_{q→1} (x^{1−q} − 1)/(1−q) = ln x can be verified by L'Hôpital's rule.
Then, the following three assertions hold:
• For any n ∈ ℕ_{≥2}, any z ∈ (0,1), and any 0 < r < s < ∞, it holds that

$$ g(n, z; r, s) = -g(n, z; s, r) > 0; \tag{57} $$

• if n = 2, then for any z ∈ (1,∞) and 1/2 ≤ r < s < ∞, it holds that

$$ g(2, z; r, s) = -g(2, z; s, r) < 0; \tag{58} $$

• for any n ∈ ℕ_{≥3} and any 1/2 ≤ r < s < ∞, there exists ζ(n; r, s) ∈ (1,∞) such that

$$ \mathrm{sgn}\big( g(n, z; r, s) \big) = -\mathrm{sgn}\big( g(n, z; s, r) \big) = \begin{cases} -1 & \text{if } \zeta(n; r, s) < z < \infty, \\ 0 & \text{if } z = 1 \text{ or } z = \zeta(n; r, s), \\ 1 & \text{if } 1 < z < \zeta(n; r, s) \end{cases} \tag{59} $$

for every z ∈ (1,∞).

Lemma 4 is proved in Appendix A. Defining¹²

$$ \gamma(r, s) := \lim_{(a,b) \to (r,s)} \frac{1-a}{1-b}, \tag{60} $$

we present the convexity/concavity of the ℓ_s-norms of v_n(·) and w(·) with respect to their ℓ_r-norms, r ≠ s, in Lemmas 5 and 6, respectively. We emphasize that Lemma 4 is a key lemma for deriving Lemmas 5 and 6.

¹² In (60), suppose that γ(∞,∞) = 1.

Lemma 5. For each n ∈ ℕ_{≥2} and r, s ∈ (0,1) ∪ (1,∞), it holds that
• the ℓ_∞-norm t ↦ ||v_n(N_r^{-1}(v_n : t))||_∞ is strictly concave in t ∈ I_n(r),
• if s ∈ (0,1), then t ↦ ||v_n(N_∞^{-1}(v_n : t))||_s is strictly concave in t ∈ [1/n, 1],
• if s ∈ (1,∞), then t ↦ ||v_n(N_∞^{-1}(v_n : t))||_s is strictly convex in t ∈ [1/n, 1].
Moreover, if n = 2, then for any distinct r, s ∈ [1/2,1) ∪ (1,∞), it holds that
• if γ(r,s) > 1, then t ↦ ||v_n(N_r^{-1}(v_n : t))||_s is strictly convex in t ∈ I_n(r),
• if γ(r,s) < 1, then t ↦ ||v_n(N_r^{-1}(v_n : t))||_s is strictly concave in t ∈ I_n(r).
Furthermore, for each n ∈ ℕ_{≥3} and distinct r, s ∈ [1/2,1) ∪ (1,∞), there exists an inflection point τ(n; r, s) ∈ I_n(r) \ {1, n^{θ(r)}} such that
• if γ(r,s) > 1, then
  – the ℓ_s-norm t ↦ ||v_n(N_r^{-1}(v_n : t))||_s is strictly convex in t ∈ I_n^{(1)}(r,s),
  – the ℓ_s-norm t ↦ ||v_n(N_r^{-1}(v_n : t))||_s is strictly concave in t ∈ I_n^{(2)}(r,s),
• if γ(r,s) < 1, then
  – the ℓ_s-norm t ↦ ||v_n(N_r^{-1}(v_n : t))||_s is strictly convex in t ∈ I_n^{(2)}(r,s),
  – the ℓ_s-norm t ↦ ||v_n(N_r^{-1}(v_n : t))||_s is strictly concave in t ∈ I_n^{(1)}(r,s),
where the real intervals I_n^{(1)}(r,s) and I_n^{(2)}(r,s) are defined by

$$ I_{n}^{(1)}(r,s) := \begin{cases} \big[ 1, \tau(n; r, s) \big] & \text{if } r \in (0,1), \\ \big[ \tau(n; r, s), 1 \big] & \text{if } r \in (1,\infty), \end{cases} \tag{61} $$
$$ I_{n}^{(2)}(r,s) := \begin{cases} \big[ \tau(n; r, s), n^{\theta(r)} \big] & \text{if } r \in (0,1), \\ \big[ n^{\theta(r)}, \tau(n; r, s) \big] & \text{if } r \in (1,\infty), \end{cases} \tag{62} $$

respectively.

Note that the convexity and the concavity of Lemma 5 are switched with each other according to whether γ(r,s) > 1 or γ(r,s) < 1. We illustrate the two regions of pairs (r,s) which fulfill γ(r,s) > 1 and γ(r,s) < 1, respectively, in Fig. 2.

Proof of Lemma 5: In a similar way to the proofs of [15, Lemma 1] and [29, Lemma 2], we prove this lemma by verifying the signs of derivatives. A simple calculation yields

$$ \frac{\partial^{2} \| v_{n}(p) \|_{r}}{\partial p^{2}} \overset{(25)}{=} \frac{\partial}{\partial p} \bigg( \Big( p^{r} + (n-1)^{1-r}(1-p)^{r} \Big)^{(1/r)-1} \Big( p^{r-1} - (n-1)^{1-r}(1-p)^{r-1} \Big) \bigg) \tag{63} $$
$$ = \bigg( \frac{\partial}{\partial p} \Big( p^{r} + (n-1)^{1-r}(1-p)^{r} \Big)^{(1/r)-1} \bigg) \Big( p^{r-1} - (n-1)^{1-r}(1-p)^{r-1} \Big) + \Big( p^{r} + (n-1)^{1-r}(1-p)^{r} \Big)^{(1/r)-1} \bigg( \frac{\partial}{\partial p} \Big( p^{r-1} - (n-1)^{1-r}(1-p)^{r-1} \Big) \bigg) \tag{64} $$
$$ = \bigg( \frac{1-r}{r} \Big( p^{r} + (n-1)^{1-r}(1-p)^{r} \Big)^{(1/r)-2} \bigg( \frac{\partial}{\partial p} \Big( p^{r} + (n-1)^{1-r}(1-p)^{r} \Big) \bigg) \bigg) \Big( p^{r-1} - (n-1)^{1-r}(1-p)^{r-1} \Big) + \Big( p^{r} + (n-1)^{1-r}(1-p)^{r} \Big)^{(1/r)-1} \Big( (r-1) p^{r-2} + (r-1)(n-1)^{1-r}(1-p)^{r-2} \Big) \tag{65} $$
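The γ(r,s) dichotomy of Lemma 5 for n = 2 can also be observed numerically, using a bisection inverse in place of N_r^{-1}(v_2 : ·) and a midpoint-versus-chord test; all names in this sketch are ours:

```python
import math

def v2_norm(p, r):
    """||v_2(p)||_r from (18) with n = 2."""
    return (p ** r + (1 - p) ** r) ** (1.0 / r)

def v2_norm_inv(t, r):
    """N_r^{-1}(v_2 : t) by bisection on p in [1/2, 1]; Lemma 1 gives strict monotonicity."""
    lo, hi = 0.5, 1.0
    increasing = r > 1
    for _ in range(80):
        mid = (lo + hi) / 2
        if (v2_norm(mid, r) < t) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def composed(t, r, s):
    """The map t -> ||v_2(N_r^{-1}(v_2 : t))||_s studied in Lemma 5."""
    return v2_norm(v2_norm_inv(t, r), s)

# gamma(2, 1/2) = (1-2)/(1-1/2) = -2 < 1: Lemma 5 predicts strict concavity
# on I_2(2) = [2^{-1/2}, 1]; the midpoint should lie above the chord.
a, b = 0.75, 0.95
lhs = composed((a + b) / 2, 2.0, 0.5)
rhs = (composed(a, 2.0, 0.5) + composed(b, 2.0, 0.5)) / 2
print(lhs > rhs)   # midpoint above the chord: concave
```

Swapping to orders with γ(r,s) > 1, e.g. (r,s) = (1/2, 3/4) with γ = 2, flips the test: the midpoint then falls below the chord on I_2(1/2) = [1, 2], as the lemma predicts.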
