ebook img

Channels with cost constraints: strong converse and dispersion PDF

0.33 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Channels with cost constraints: strong converse and dispersion

1 Channels with cost constraints: strong converse and dispersion Victoria Kostina Sergio Verdu´ California Institute of Technology Princeton University Pasadena, CA 91125 Princeton, NJ 08544 [email protected] [email protected] 5 Abstract—This paper shows the strong converse and the finite memory [9]. In a more general setting not requiring 1 dispersion of memoryless channels with cost constraints and the assumption of stationarity or finite memory, Verdu´ and 0 performs refined analysis of the third order term in the asymp- Han [10] showed a necessary and sufficient condition for a 2 toticexpansion of themaximum achievable channel codingrate, showing that it is equal to 1logn in most cases of interest. The channelwithoutcostconstraintstosatisfythestrongconverse, ct analysis is based on a non-a2symnptotic converse boundexpressed while Han [11, Theorem 3.7.1] generalized that condition to O in terms of the distribution of a random variable termed the the setting with cost constraints. In the special case of finite- b-tiltedinformationdensity,whichplaysarolesimilartothatof input channels, that necessary and sufficient condition boils 8 the d-tilted information in lossy source coding. We also analyze down to the capacity being equal to the limit of maximal ] tchheanfunneldsawmiethntcaolsltimcoitnssotrfalionstssy. joint-source-channel coding over normalized mutual informations. In turn, that condition is T implied by the information stability of the channel [12], a I Index Terms—Converse, finite blocklength regime, channels condition which in general is not easy to verify. Using a s. with cost constraints, joint source-channel coding, strong con- novel notion of strong information stability, a general strong c verse, dispersion, memoryless sources, memoryless channels, [ Shannon theory. converse result was recently shown in [13, Theorem 3]. The strong converse for DMC with separable cost was shown by 3 Csisza´r and Ko¨rner [14, Theorem 6.11] and by Han [11, v I. INTRODUCTION 4 Theorem 3.7.2]. Regarding continuous channels, in the most 2 This paper is concernedwith the maximumchannelcoding basic case of the memoryless additive white Gaussian noise 1 rate achievable at average error probability ǫ > 0 where (AWGN) channel with the cost function being the power of 5 the cost of each codeword is constrained. The capacity-cost the channel input block, b (xn) = 1 xn 2, the strong con- 1. functionC(β) ofachannelspecifiesthemaximumachievable verse was shown by Shannnon [15] (cno|ntem| poraneously with 0 channelcodingrate compatiblewith vanishingerrorprobabil- Wolfowitz’s finite-alphabet strong converse). Yoshihara [16] 4 ity and with codeword cost not exceeding β in the limit of proved the strong converse for the time-continuous channel 1 large blocklengths. with additive Gaussian noise having an arbitrary spectrum : v A channelis said to satisfy the strong converse if ǫ 1 as and also gave a simple proof of Shannon’s strong converse Xi n for any code operating at a rate above the ca→pacity. result. Under the requirementthat the power of each message → ∞ For memoryless channels without cost constraints, the strong converges stochastically to a given constant β, the strong r a conversewas first shown by Wolfowitz: [1] treats the discrete conversefortheAWGN channelwith feedbackwasshownby memoryless channel (DMC), while [2] generalizes the result Wolfowitz [17]. Note that in all those analyses of the power- to memoryless channels whose input alphabet is finite while constrained AWGN channel the cost constraint is meant on the output alphabet is the real line. Arimoto [3] showed a a per-codeword basis. In fact, the strong converse ceases to new converse bound stated in terms of Gallager’s random hold if the cost constraint is averaged over the codebook [18, coding exponent, which also leads to the strong converse Section 4.3.3]. for the DMC. Dueck and Ko¨rner [4] found the reliability Channel dispersion quantifies the backoff from capacity, function of DMC for rates above capacity, a result which implies a strong converse. Kemperman [5] showed that the unescapable at finite blocklengths due to the random nature strong converse holds for a DMC with feedback. A simple ofthechannelcomingintoplay,as opposedto the asymptotic proof of strong converse for memoryless channels that does representation of the channel as a deterministic bit pipe not invoke measure concentration inequalities was recently of a given capacity. More specifically, for coding over the given in [6]. For a class of discrete channels with finite DMC, the maximum achievable code rate at blocklength memory, the strong converse was shown by Wolfowitz [7] n compatible with error probability ǫ is approximated by and independently by Feinstein [8], a result soon generalized C VQ 1(ǫ) [19], [20] where C is the channel capacity, − n − to a more general class of stationary discrete channels with V isqthe channel dispersion, and Q−1() is the inverse of · theGaussian complementarycdf.Polyanskiyetal. [20] found ThisworkwassupportedinpartbytheNationalScienceFoundation(NSF) the dispersion of the DMC withoutcost constraintsas well as under Grant CCF-1016625 and by the Center for Science of Information (CSoI),anNSFScience andTechnology Center,underGrantCCF-0939370. thatoftheAWGNchannelwithapowerconstraint.Inparallel, 2 Hayashi[21,Theorem3]gavethedispersionoftheDMCwith n, n, P and b , respectively. Denote YnXn n andwithoutcostconstraints(withthelooseestimateofo(√n) A B | forthethirdorderterm).For constantcompositioncodesover C(β)= sup I(X;Y), (2) theDMC,Polyanskiy[18,Sec.3.4.6]showedthedispersionof E[bP(XX):] β ≤ constantcompositioncodesovertheDMC,whileMoulin[22] λ⋆ =C(β). (3) ′ refined the third-orderterm in the expansionof the maximum achievable code rate, under regularity conditions. Wang et Since C(β) is non-decreasing concave function of β [14, al. [23] gave a second-order analysis of joint source-channel Theorem6.11],λ⋆ 0.ForrandomvariablesY andY¯ defined ≥ coding over finite alphabets based on constant composition on the same space, denote codebooks. dP meInntathlilsimpaitpefor,rwcoeddienmgoonvsetrractheatnhnaetltshewnitohncaosystmcpotnostitcrafiunntsdais- ıYkY¯(y)=logdPYY¯(y). (4) closelyapproximatedintermsofthecdfofa randomvariable If Y is distributed accordingto P , we abbreviate the Y X=x we refer to as the b-tilted information density, which paral- notation as | lels the notion of d-tilted information for lossy compression dP [g2e4n]e.raWlechsahnonwelsa wsiimthplienpnuotn-caossytmcpotnostitcraicnotnsvienrseterbmousnodffbor- ıX;Y¯(x;y)=log dYP|XY¯=x(y). (5) tilted information density. Not only does this bound lead to a in lieu of ıY X=x Y¯(y). The information density ıX;Y(x;y) general strong converse result, but it is also tight enough to betweenreali|zationksoftworandomvariableswithjointdistri- find the channel dispersion-cost function and the third order butionP P followsbyparticularizing(5)to P ,P , X Y X Y X Y term equal to 21logn when coupled with the corresponding wherePX |PY X PY1. Ingeneral,however,{the|functio}n aDcMhiCev,alboigliMty⋆b(onu,nǫd,.βM),othreelsopgeacriifithcmalloy,fwtheemshaoxwimtuhmatafcohriethve- dinist(r5ib)utdioone→s. not|req→uire PY¯ to be induced by any input able code size at blocklength n, error probability ǫ and cost Further, define the function β, is given by, under mild regularity assumptions 1 X;Y¯(x;y,β)=ıX;Y¯(x;y)−λ⋆(b(x)−β). (6) logM⋆(n,ǫ,β)=nC(β) nV(β)Q 1(ǫ)+ logn+O(1) − − 2 (1) The special case of (6) with PY¯ =PY⋆, where PY⋆ is the p unique output distribution that achieves the supremum in (2) where V(β) is the dispersion-cost function, thereby refining [25], defines b-tilted information density: Hayashi’s result [21] and providing a matching converse to the result of Moulin [22]. We observe that the capacity- Definition 1 (b-tilted informationdensity). The b-tilted infor- cost and the dispersion-cost functions are given by the mean mation density between x and y is  (x;y,β). X;Y⋆ ∈ X ∈ Y and the variance of the b-tilted information density. This novel interpretation juxtaposes nicely with the corresponding SinceP isuniqueevenifthereareseveral(ornone)input results in [24] (d-tilted information in rate-distortion theory). Y⋆ distributions P that achieve the supremum in (2), there is Furthermore, we generalize (1) to lossy joint source-channel X⋆ no ambiguity in Definition 1. If there are no cost constraints codingofgeneralmemorylesssourcesoverchannelswithcost. (i.e. b(x)=0 x ), then C(β)=0 regardless of β, and Section II introduces the b-tilted information density. Sec- ∀ ∈X ′ tion III states the new non-asymptotic converse bound which X;Y¯(x;y,β)=ıX;Y¯(x;y). (7) holds for a general channel with cost constraints, without making any assumptions on the channel (e.g. alphabets, sta- The counterpart of the b-tilted information density in rate- tionarity, memorylessness). An asymptotic analysis of the distortion theory is the d-tilted information [24]. converse and achievability bounds, including the proof of the Example 1. For n uses of a memoryless AWGN channel strongconverseandtheexpressionforthechanneldispersion- with unit noise power and maximalpower not exceeding nP, cost function, is presented in Section IV in the context of C(P)= nlog(1+P),andtheoutputdistributionthatachieves memoryless channels. Section V generalizes the results in (2) is Yn2⋆ (0,(1+P)I). Therefore Sections III and IV to the lossy joint source-channel coding ∼N setup.  (xn;yn,P)= nlog(1+P) loge yn xn 2 Xn;Yn⋆ 2 − 2 | − | loge + yn 2 xn 2+nP , II. b-TILTEDINFORMATION DENSITY 2(1+P) | | −| | (cid:16) (cid:17) (8) In this section, we introduce the concept of b-tilted infor- mation density and several relevant properties in a general where the Euclidean norm is denoted by xn 2 = n x2. single-shot approach. It is easy to check that under P ,|the|distribuit=io1n oif YnXn=xn Fix the transitionprobabilitykernelP : and the | P Y X costfunctionb: X 7→[0,∞].Intheapplic|atioXno→fthYissingle- 1We write PX → PY|X → PY to indicate that PY is the marginal of shot approach in Section IV, X, Y, PY|X and b will become PXPY|X,i.e.PY(y)=RXdPY|X(y|x)dPX(x). 3  (xn;Yn,P) is the same as that of (by ‘ ’ we mean that Xn;Yn⋆ ∼ equality in distribution) D(P P ) I⋆+o(I⋆) xn n: b (xn) β,  (xn;Yn,P) Yn|Xn=xnk Yn ≤ n n ∀ ∈A n ≤(19) Xn;Yn⋆ where is the single-letter channel input alphabet, and n P loge xn 2 A ∼ 2 log(1+P)− 2(1+P)"W|nxPn2|2 −n− |P2| #, (9) lIain⋆mquna→=si∞-can1moIdan⋆.x,P(X1n8):bi(mXpnl)i≤esbat.sh.aIt(XPYn⋆;Y×n.)... ×SinPcYe⋆ Cis(βal)way=s where Wℓ denotes a non central chi-square distributed ran- λ dom variable with ℓ degrees of freedom and non-centrality Corollary 2. For all P P X X⋆ parameterλ. The mean of (9) is nlog(1+P), in accordance ≪ with (16), while its variance is212(nP(21++2P|x)2n|2)log2e which Var[X;Y⋆(X;Y,β)]==EE[[VVaarr[[ıX;Y⋆((XX;;YY,)βX)|]X].]] ((2210)) becomes nV(P) (found in [20] and displayed in (46)) af- X;Y⋆ | ter averaging with respect to Xn⋆ distributed according to P (0,PI). Xn⋆ ∼N Proof: Appendix B. Denote 2 β = inf b(x), (10) III. NONASYMPTOTICBOUNDS min x ∈X Converseandachievabilityboundsgivenecessaryandsuffi- β =sup β 0: C(β)<C( ) . (11) max { ≥ ∞ } cientconditions,respectively,on (M,ǫ,β) in orderfora code Theorem 1 below highlights the importance of b-tilted to exist with M codewords and average error probability not information density in the optimization problem (2). Of key exceeding ǫ and cost not exceeding β. Such codes (allowing significanceintheasymptoticanalysisinSectionIV,Theorem stochastic encodersanddecoders)are rigorouslydefinednext. 1givesanontrivialgeneralizationofthewell-knownproperties Definition 2 ((M,ǫ,β) code). An (M,ǫ,β) code for of information density to the setting with cost constraints. P ,b is a pair of random transformations P (en- Y X XS aTchheioevrienmg (12.) Fisixsuβcmhinth<at tβhe<conβsmtraaxi.ntAisssuamcehiethveadt PwXith⋆ {cSodeX|r) aYn}dZP,Zt|hYep(droebcoadbeilri)tysiuscehvatlhuaattePd[wSit6=hSZ]eq≤uipǫ,r|owbhaebrlee − − − equality: on an alphabet of cardinality M, and the codewords satisfy E[b(X⋆)]=β. (12) the maximal cost constraint (a.s.) b(X) β. (22) Then, the following equalities hold. ≤ C(β)=supE[ (X;Y,β)] (13) X;Y PX The non-asymptotic quantity of principal interest is =supE[X;Y⋆(X;Y,β)] (14) M⋆(ǫ,β), the maximum code size achievable at error proba- PX bility ǫ and cost β. =E[ (X⋆;Y⋆,β)] (15) X;Y⋆ =E[ (X⋆;Y⋆,β)X⋆], (16) Theorem 3 (Converse). The existence of an (M,ǫ,β) code X;Y⋆ | for P ,b requires that Y X where (16) holds P -a.s., and P P P , P { | } X⋆ X Y X Y X⋆ PY|XP→rooPf:YA⋆.ppendix A. → | → → ǫ≥mγ>a0x(cid:26) suY¯px:bi(nxf)≤βP(cid:2)ıX;Y¯(x;Y)≤logM −γ|X =x(cid:3) Throughout the paper, we assume that the assumptions of exp( γ) (23) − − Theorem 1 hold. (cid:27) For channels without cost, the inequality ≥mγ>a0x(cid:26) suY¯pxi∈nXf P X;Y¯(x;Y,β)≤logM −γ|X =x D(PY X=x PY⋆) C x (17) (cid:2) (cid:3) | k ≤ ∀ ∈X exp( γ) . (24) iskeytoprovingstrongconverses.Theorem1generalizesthis − − (cid:27) result to channels with cost, showing that E[ (x;Y,β)X =x] C(β) x . (18) Proof: The bound in (23) is due to Wolfowitz [26]. The X;Y⋆ | ≤ ∀ ∈X bound in (24) simply weakens (23) using b(x) β. ≤ Note that (18) is crucialfor showing both the strong converse By restricting the channel input space appropriately, con- and the refined asymptotic analysis. verse bounds for channels with cost constraints can be ob- Remark1. Thegeneralstrongconverseresultin[13,Theorem tained from the converse bounds in [20], [27]. Their analysis 3] includes channels with cost using the concept of ‘quasi- becomes tractable by the introduction of b-tilted information caod’, which is defined as any output distribution P such density in (24) and an application of (18). Yn Achievabilityboundsforchannelswith cost constraintscan 2Weallow βmax=+∞. be obtained from the random coding bounds in [20], [27] 4 by restricting the distribution from which the codewords are that achieves C(β), to obtain drawntosatisfyb(X) β a.s.Inparticular,fortheDMC,we ≤ n maychoosePXn tobeequiprobableonthesetofcodewordsof ǫ inf P  (x ;Y ,β) nC(β)+nα X;Y⋆ i i the type closest (among types satisfying the cost constraint) ≥xn n " ≤ # ∈A i=1 to the input distribution P that achieves the capacity-cost X X⋆ exp( nα) (27) function. As shown in [21], such constant composition codes − − n n achieve the dispersion of channel coding under input cost inf P  (x ;Y ,β) c(x )+nα constraints. Unfortunately, the computation of such bounds ≥xn n " X;Y⋆ i i ≤ i # ∈A i=1 i=1 may become challenging in high dimension, particularly with X X exp( nα), (28) continuous alphabets. − − where for notational convenience we have abbreviated IV. ASYMPTOTICANALYSIS c(x)=E[X;Y⋆(x;Y,β)X=x], (29) | To introduce the blocklength into the non-asymptotic and (28) employs (14). converse of Section III, we consider (M,ǫ,β) codes for To show that the right side of (28) converges to 1, we P ,b , where P : n n and b : n invokethe following law of largenumbersfor non-identically YnXn n YnXn n [{0, ]|. We ca}ll such codes|(n,MA,ǫ,β7→) coBdes, and denAote t7→he distributed random variables. ∞ corresponding non-asymptotically achievable maximum code Lemma 1 (e.g. [28]). Suppose that W are uncorrelated and size by M⋆(n,ǫ,β). i ∞i=1Var Wcii <∞forsome strictly positivesequence(cn) increasinghto +i . Then, P ∞ A. Assumptions n n 1 W E W 0 in L2. (30) The following basic assumptions hold throughout Section cn i− " i#!→ i=1 i=1 IV. X X (i) The channel is stationary and memoryless, P = YnXn Let W = (x ;Y ,β) and c =i. Since (recall (iv)) P ... P . | i X;Y⋆ i i i YX YX (ii) wThhe|erce×obst:fu×nctio[n0|,is s]e.parable, bn(xn)= n1 ni=1b(xi), ∞ Var 1iX;Y⋆(xi;Yi,β)|Xi =xi ≤Vmax ∞ i12 (31) (iii) EachcodeAwo7→rdisc∞onstrainedtosatisfythemPaximalcost Xi=1 (cid:20) (cid:21) Xi=1 constraint, b (xn) β. < , (32) n ∞ ≤ (iv) sup Var[ (x;Y,β)X=x]=V < . x∈A X;Y⋆ | max ∞ by virtue of Lemma 1, the right side of (28) converges to Under these assumptions, the capacity-cost function is given 1, so any channel satisfying (i)–(iv) also satisfies the strong by converse. As noted in [18, Theorem 77] in the contextof the AWGN C(β)= sup I(X;Y). (25) PX:E[b(X)]≤β channel,thestrongconversedoesnotholdifthecostconstraint is averaged over the codebook,i.e. if, in lieu of (22), the cost Observe that in view of assumptions (i) and (ii), as long as requirement is PY¯n is a product distribution, PY¯n =PY¯ ×...×PY¯, M 1 n E[b(X)S =m] β. (33) Xn;Y¯n(xn;yn,β)= X;Y¯(xi;yi,β). (26) M mX=1 | ≤ Xi=1 To see why the strong converse does not hold in general, fix a code of rate C(β) < R < C(2β) none of whose codewords cost more than 2β and whose error probability B. Strong converse satisfies ǫ 0. Since R < C(2β), such a code exists. n → Althoughthetoolsdevelopedin SectionsIIandIIIare able Now,replacehalfofthecodewordswiththeall-zerocodeword to resultin a strong conversefor channelsthatexhibitergodic (assumingb(0)=0)whileleavingthedecisionregionsofthe behavior(seealsoRemark1),forthesakeofconcretenessand remainingcodewordsuntouched.The averagecostof the new length,weonlydealherewiththememorylesssetupdescribed code satisfies (33), its rate is greater than the capacity-cost in Section IV-A. function,R>C(β), yetitsaverageerrorprobabilitydoesnot We show that if transmission occurs at a rate greater than exceed ǫ + 1 1. n 2 → 2 thecapacity-costfunction,theerrorprobabilitymustconverge to 1, regardlessof the specificsof the code.Towardsthis end, C. Dispersion wefixsomeα>0,wechooselogM nC(β)+2nα,andwe ≥ weaken the bound (24) in Theorem 3 by fixing γ = nα and First, we give the operational definition of the dispersion- PY¯n = PY⋆ ×...×PY⋆, where Y⋆ is the output distribution cost function of any channel. 5 Definition 3 (Dispersion-cost function). The channel Remark 3. As we show in Appendix F, Theorem 4 applies to dispersion-cost function, measured in squared information channels with abstract alphabets provided that in addition to units per channel use, is defined by (i)–(ii), they meet the following criteria: 1(nC(β) logM⋆(n,ǫ,β))2 (a) The cost function b: A→[0,∞] is such that for all γ ∈ V(β)= limlimsup − . (34) [β, ), b 1(γ) is nonempty. In particular, this condition ǫ→0 n→∞ n 2loge 1ǫ is s∞atisfie−d if the channel input alphabet is a metric A An explicit expression for the dispersion-cost function of a space, and b is continuousandunboundedwith b(0)=0. discrete memoryless channel is given in the next result. (b) The distribution of ı (xn;Yn), where P = Theorem 4. In addition to assumptions (i)–(iv), assume that Xn;Yn⋆ Yn⋆ P ... P doesnotdependonthechoiceofxn , the capacity-achieving input distribution PX⋆ is unique and wYh⋆e×re ×=Y⋆xn n: b (xn)=β . ∈Fn that the channel has finite input and output alphabets. Fn { ∈A n } (c) Forallxintheprojectionof onto ,i.e.forallxsuch n F A logM⋆(n,ǫ,β)=nC(β)− nV(β)Q−1(ǫ)+θ(n), (35) that (x,x2,...,xn)∈Fn for some x2,...,xn, C(β)=E[X;Y⋆(Xp⋆;Y⋆,β)], (36) E  (X;Y,β) C(β)3 X=x < . (42) X;Y⋆ V(β)=Var[ (X⋆;Y⋆,β)], (37) | − | | ∞ X;Y⋆ h i where the remainder term θ(n) satisfies: (d)3ThereexistsadistributionP supportedon suchthat Xn n F a) If V(β)>0, ıYn Yn⋆(Yn), where PXn PYnXn PYn, is almost surekly bounded by f =o(→√n) fro|m a→bove. 1 n (supp(PX⋆) 1)logn+O(1) θ(n) (38) Then, (35) holds identifying (43)–(45) or all x s.t. −2 | |− ≤ ∈ A b(x)=β: 1 logn+O(1). ≤ 2 C(β)=D(P P ), (43) YX=x Y⋆ (39) | k V(β)=Var[ı (x;Y)X=x], (44) X;Y⋆ b) If V(β)=0, (38) holds, and (39) is replaced by 1 | f +O(1) θ(n) logn+O(1), (45) n 1 − ≤ ≤ 2 θ(n) O n3 . (40) ≤ where f =o(√n) is specified in (d). n (cid:16) (cid:17) Remark 4. Theorem 4 with the remainder in (41) [31] also Proof: holds for the AWGN channel with maximal signal-to-noise Converse. Full details are given in Appendix D. The main ratio P, offering a novel interpretation of the dispersion of steps of the refined asymptotic analysis of the bound in the Gaussian channel [20] Theorem 3 are as follows. First, building on the ideas of [29], [30], we weaken the bound in (24) by a careful choice 1 1 of a non-product auxiliary distribution PY¯n. Second, using V(P)= 2 1− (1+P)2!log2e (46) Theorem 1 and the technical tools developed in Appendix C, we show that the infimum in the right side of (24) is lower as the variance of the b-tilted information density. We note bounded by ǫ for the choice of M in (35). that the AWGN channel satisfies the conditions of Remark 3 Achievability. Full details are given in Appendix E, which with PXn uniform on the power sphere and fn =O(1) [20]. provides an asymptotic analysis of the Dependence Testing Remark 5. As we show in Appendix G, a stationary memo- bound of [20] in which the random codewords are of type ryless channelwith b(x)=x which takes a nonnegativeinput closesttoPX⋆,ratherthandrawnfromtheproductdistribution andaddsanexponentialnoiseofunitmeantoit[32],satisfies PX×...×PX, as in achievability proofs for channel coding the conditions of Remark 3 with fn =O(1), and without cost constraints. We use Corollary 2 to establish that β such constant composition codes achieve the dispersion-cost  (x;y,β)=log(1+β)+ (x y+1)loge, (47) X;Y⋆ 1+β − function. C(β)=log(1+β), (48) Remark 2. According to a recent result of Moulin [22], β2 the achievability bound on the remainder term in (38) can V(β)= log2e. (49) be tightened to match the converse bound in (39), thereby (1+β)2 establishing that Remark 6. As should be clear from the proof of Theorem 4, if the capacity-achieving distribution is not unique, then 1 θ(n)= logn+O(1), (41) 2 minVar[ (X⋆;Y⋆,β)] 0<ǫ 1 V(β)= X;Y⋆ ≤ 2 (50) provided that the following regularity assumptions hold: (maxVar[X;Y⋆(X⋆;Y⋆,β)] 21 <ǫ<1 The random variable ı (X⋆;Y⋆) is of nonlattice type; • supp(P )= ; X;Y⋆ wherethe optimizationis performedoverall PX⋆ thatachieve X⋆ • Cov ı (X⋆A;Y⋆),ı (X¯⋆;Y⋆) < C(β). Thisparallelsthedispersionresultforchannelswithout • X;Y⋆ X;Y⋆ cost [20]. Var[ı (X⋆;Y⋆)] where X;Y⋆ (cid:2) (cid:3) PX¯⋆X⋆Y⋆(¯x,x,y)= PY⋆1(y)PX⋆(¯x)PY|X(y|¯x)PY|X(y|x)PX⋆(x). 3Fortheconverse result,assumptions (a)–(c)suffice. 6 V. JOINT SOURCE-CHANNEL CODING Theorem 6 (Gaussian approximation). Assume the channel In this section we state the counterpartsof Theorems3 and has finite input and output alphabets. For stationary memo- 4 in the lossy joint source-channel coding setting. Proofs of ryless sources satisfying the regularity assumptions (i)–(iv) of the results in this section are obtained by fusing the proofs in [27] and channels satisfying assumptions (ii)–(iv) of Section Sections III and IV and those in [27]. IV-A, the parameters of the optimal (k,n,d,ǫ) code satisfy In the joint source-channel coding setup the source is no nC(β) kR(d)= nV(β)+k (d)Q−1(ǫ)+θ(n), (55) longer equiprobable on an alphabet of cardinality M, as in − V Definition 2, rather it is arbitrarily distributed on an abstract where (d) = Var[pS(S,d)], V(β) is given in (37), and the V alphabet .Further,insteadofreproducingthetransmittedS remainder θ(n) satisfies, if V(β)>0, M under a probability of error criterion, we might be interested 1 logn+O logn θ(n) (56) in approximating S within a certain distortion, so that a − 2 ≤ decoding failure occurs if the distortion between the source 1 (cid:16)p (cid:17) θ¯(n)+ supp(P ) 1 logn, (57) and its reproduction exceeds a given distortion level d, i.e. ≤ 2| X⋆ |− (cid:18) (cid:19) if d(S,Z) > d, where Z ∈ M is the representation of S, where θ¯(n) = O(logn) denotes the upper bound on the re- is a reproduction alphabet, and d: R is M M × M 7→ + maindertermgivenin[27,Theorem10].IfV(β)= (d)=0, the distortion measure. A (d,ǫ,cβ) code is a code for a fixed V the upper bound on θ(n) stays the same, and the lower one scource-channel pair such that the probability cof exceeding distortiondisnolargerthanǫ andnochannelcodewordcosts becomes O n31 . more than β. A (d,ǫ,β) code in a block codingsetting, when Proof o(cid:16)utlin(cid:17)e: The achievability part is proven joining a source block of length k is mapped to a channel block of theasymptoticanalysesof[27,Theorem8]andofTheorem9, length n, is called a (k,n,d,ǫ,β) code. The counterpart of shown in AppendixE. For the conversepart, PY¯ is chosen as the b-tilted informationdensity in lossy compressionis the d- in (146), and similar to the proof of the converse part of [27, tilted information,S(s,d), which can be computedusing the Theorem10],atypicalsetofsourceoutcomesisidentified,and equality it is shown using Theorem 7.2 that for every source outcome inthatset,theinnerinfimumin(54)isapproximatelyachieved  (s,d)=ı (z;s)+λ d(s,z) λ d, (51) S Z⋆;S S S − by the capacity-achieving channel input type. whereZ⋆ istherandomvariablethatachievestheinfimumon the right side of VI. CONCLUSION R (d), min I(S;Z), (52) We introduced the concept of b-tilted information density S PZ|S: (Definition 1), a random variable whose distribution governs E[d(S,Z)] d ≤ the analysis of optimal channel coding under input cost con- λS = −R′S(d) > 0, and equality in (51) holds for PZ⋆-a.e. straints. The properties of b-tilted information density listed z [24]. In a certain sense, the d-tilted information quantifies in Theorem 1 play a key role in the asymptotic analysis of the number of bits required to reproduce the source outcome the converse bound in Theorem 3 in Section IV, which does s withindistortiond.Forrigorousdefinitionsandfurther not only lead to the strong converse and the dispersion-cost ∈M details we refer the reader to [27]. function when coupled with the corresponding achievability bound, but it also proves that the third order term in the Theorem 5 (Converse). The existence of a (d,ǫ,β) code for asymptotic expansion (1) is upper bounded (in the most S and P requires that Y|X common case of V(β) > 0) by 1logn+O(1). In addition, 2 ǫ≥ PiXn|fSmγ>a0x(cid:26)suY¯pP S(S,d)−X;Y¯(X;Y,β)≥γ cwoedisnhgowoveedricnhSanecnteiolsnwViththcaotstthecornessturlatisntosfa[n2d7]aglseonetirgahlitzeenetod (cid:2) (cid:3) the estimate of the third order term in [27]. As propounded exp( γ) (53) in [29], [30], the gateway to the refined analysis of the third − − (cid:27) order term is an apt choice of a non-productdistribution PY¯n ≥ mγ>a0x(cid:26)suY¯pE(cid:20)xi∈nXf P S(S,d)−X;Y¯(x;Y,β)≥γ |S (cid:21) in the bounds in Theorems 3 and 5. (cid:2) (cid:3) VII. ACKNOWLEDGEMENT exp( γ) , (54) − − We thankthe refereesfortheir unusuallythoroughreviews, (cid:27) where the probabilities in (53) and (54) are with respect to which are reflected in the final version. P P P and P , respectively. S XS Y X Y X=x | | | APPENDIX A Proof:Theboundisobtainedbyweakening[27,Theorem PROOF OF THEOREM1 1] (23) using b(x) β. ≤ We note first two auxiliary results. Under the usual memorylessness assumptions, applying Theorem 1 to the bound in (54), it is easy to show that the Lemma 2 ([33]). Let 0 α 1, and let P Q be ≤ ≤ ≪ strong converse holds for lossy joint source-channel coding distributions on the same probability space. Then, over channels with input cost constraints. A more refined 1 analysis leads to the following result. lim D(αP +(1 α)Q Q)=0. (58) α 0α − k → 7 (64). Note that ELanedmexmpleatg3(XX(¯¯D)obnes<kear-V.arTarahndedhnoa,mn[3v4a]r)i.abLleet go:nXX7→[−su∞ch,+th∞at] D(PY|XkPYˆ|PX¯)≥≥DD((1Y¯{kY¯Yˆ∈) F}k1{Yˆ ∈F}) ((7721)) (cid:2) (cid:0)E[g(X(cid:1)(cid:3))]−∞D(XkX¯)≤logE exp g(X¯) (59) ≥PY¯(F)logPPYY¯ˆ((FF)) (73) 1 with equality if and only if X has distr(cid:2)ibutio(cid:0)n PX⋆(cid:1)(cid:3)such that =PY¯(F)logα. (74) ıX⋆kX¯(x)=g(x)−logE exp g(X¯) . (60) Furthermore, we have (cid:2) (cid:0) (cid:1)(cid:3) E X;Yˆ(Xˆ;Yˆ,β) −E[X;Y⋆(X⋆;Y⋆,β)] Proof: If the left side of (59) is not −∞, we can write = αEh X;Yˆ(X¯;Y¯,βi) +(1−α)E X;Yˆ(X⋆;Y⋆,β) Eh[ (X⋆;Y⋆i,β)] h i(75) E[g(X)]−D(XkX¯)=E g(X)−ıXkX⋆(X)−ıX⋆kX¯(X(6)1) α−E  X;Y(⋆X¯;Y¯,β) αE[ (X⋆;Y⋆,β)] (76) (cid:2) (cid:3) ≥ X;Yˆ − X;Y⋆ =logE exp g(X¯) D(X X⋆), h 1 i − k (62) ≥ α PY¯(F)logα −λ⋆E b(X¯) +λ⋆β (cid:2) (cid:0) (cid:1)(cid:3) which is maximized by letting PX =PX⋆. −(cid:16)E[X;Y⋆(X⋆;Y⋆,β)](cid:2) (cid:3) (77) We proceed to prove Theorem 1 by generalizing [35, > 0, (cid:17) (78) Theorem 6.1]. Equality (13) is a standard result in convex optimization. By the assumption, the supremum in the right where (76) is due to D(Y⋆ Yˆ) 0, (77) invokes (74), and k ≥ side of(13) isattainedbyP , thereforeC(α) isequalto the (78) holdsfor sufficientlysmall α, therebycontradicting(13). X⋆ right side of (15). We conclude that indeed PY¯ ≪PY⋆. To show (14), fix 0 α 1. Denote To show (16), define the following function of a pair of ≤ ≤ probability distributions on : X PPXXˆ¯ =→αPPYX|¯X+→(1P−Y¯α, )PX⋆, ((6634)) F(PX,PX¯)==EE[X;Y¯((XX;;YY,,ββ))] −DD((XXkX¯X¯))+D(Y Y¯(7)9) PXˆ →PY|X →PYˆ =αPY¯ +(1−α)PY⋆, (65) (cid:2)X;Y (cid:3)− k k (80) and write E[ (X;Y,β)], (81) X;Y ≤ α E[ (X⋆;Y⋆,β)] E  (X¯;Y¯,β) where(81)holdsbythedataprocessinginequalityforrelative X;Y⋆ X;Y⋆ +(cid:0)D(YˆkY⋆) − (cid:2) (cid:3)(cid:1) ecnantrobpey.exSpinrecseseeqduaaslittyheind(o8u1b)leismacahxiiemviezdatbioynPX =PX¯,C(β) = α+Dλ(⋆PαYE|XbkP(XY¯⋆)|P−Xλ⋆)⋆α−Eα[Db((XP⋆Y)|]XkPY⋆|PX¯)+D(Yˆk(Y6⋆6)) C(β)=mPaX¯xmPaXxF(PX,PX¯). (82) = D(PY|Xk(cid:2)PY⋆|P(cid:3)X⋆)−D(PY|XkPY⋆|PXˆ)+D(YˆkY⋆) To solvetheinnermaximizationin (82), we invokeLemma λ⋆E[b(X⋆)]+λ⋆E b(Xˆ) (67) 3 with = D−(P P P ) hD(P i P P ) λ⋆E[b(X⋆)] g(x)=E X;Y¯(x;Y,β)|X =x (83) Y|Xk Y⋆| X⋆ − Y|Xk Yˆ| Xˆ − to conclude that (cid:2) (cid:3) +λ⋆E b(Xˆ) (68) =E[Xh;Y⋆(Xi⋆;Y⋆,β)]−E X;Yˆ(Xˆ;Yˆ,β) (69) mPaXxF(PX,PX¯)=logE exp E X;Y¯(X¯;Y¯,β)|X¯ (8,4) (cid:2) (cid:0) (cid:2) (cid:3)(cid:1)(cid:3) ≥0, h i (70) which in the special case PX¯ = PX⋆ yields, using represen- tation (82), where (70) holds because X⋆ achieves the supremum in the rLigemhtmsaide2oimf p(1li3e)s. AthsastuDme(YˆforYt⋆h)e=moom(eαn)t.tThahtusP,Y¯su≪ppoPsYin⋆g. C(β)≥lEog[E[exp(X(E⋆;[YX⋆;,Yβ⋆)(]X⋆;Y,β)|X⋆])] ((8856)) that E X;Y⋆(X¯;Y¯,β) > Ek[X;Y⋆(X⋆;Y⋆,β)] would lead ≥=C(βX);Y⋆ (87) to a contradiction, since then the left side of (66) would be (cid:2) (cid:3) negative for a sufficiently small α. where (86) applies Jensen’s inequality to the strictly convex To complete the proof of (14), it remains to show P function exp(), and (87) holds by the assumption. We con- Y⋆ · cdoonmtriandaitcetsioanl,l aPssY¯umseucthhatthPatX¯PaX¯nd→F ⊆PY|YX a→re suPcY¯h. thBayt cthluadteE[thXat;,Y⋆in(Xfa⋆c;tY,,(β8)6|)Xh⋆o]lidssalwmitohstesquureallyityc,onwshtaicnht,tihmeprelibeys PY¯(F) > PY⋆(F) = 0, and define the mixture PXˆ as in showing (16). 8 APPENDIX B 2. For δ =A logn, n n PROOF OF COROLLARY 2 q n n To show (21), we invoke (6) to write, for any x , minP W (z) n(D⋆ ∆) Q ∆ ∈X z∈D "i=1 i ≤ n− #≥ (cid:18) rVn⋆(cid:19) X Var[ (X;Y,β)X =x] X;Y⋆ | logn = Var[ı (X;Y) λ⋆(b(X) β) X =x] (88) K . X;Y⋆ − − | − r n = Var[ı (X;Y)X =x]. (89) (102) X;Y⋆ | To show (20), we invoke (16) to write 3. Fix 0 ≤ β ≤ 16. If in (98), Vn⋆ = 0 (which implies that V =0in(99),i.e.we droptherequirementinTheorems min E[Var[X;Y⋆(X;Y,β)|X]] 7.1 and7.2 thatVmin be positive), thenthere exists K ≥0 = E (X;Y⋆(X;Y,β))2 such that for all ∆> n12A+β, where A>0 is arbitrary Eh(E[X;Y⋆(X;Y,β)iX])2 (90) minP n W (z) n(D⋆ +∆) 1 K 1 . =− Eh(X;Y⋆(X;Y,β))2 | C2i(β) (91) z∈D "Xi=1 i ≤ n #≥ − A23 n14−32(1β03) − = Vahr[ (X;Y,β)].i (92) X;Y⋆ Theorem 7 gives a general result on the minimization of a APPENDIX C cdf of a sum of independent random variables parameterized AUXILIARY RESULTON THEMINIMIZATION OF THECDF OF by elements of a metric space: it says that the minimum is approximately achieved by the sum with the largest mean, A SUMOF INDEPENDENTRANDOMVARIABLES underregularityconditions.Themetricnatureoftheparameter Let is a metric space with metric d : 2 R+. space is essential in making sure the means and the variances D D 7→ Let W (z),i = 1,...,n be independent random variables i of W () behave like continuous functions: assumptions (98) i parameterized by z . Denote · and (97) essentially ensure that functions D () and D (z) ∈D n n · n are well-behaved in the neighborhood of the optimum, while 1 Dn(z)= n E[Wi(z)], (93) assumption(96)guaranteesthatDn(·)decaysfastenoughnear i=1 its maximum. X 1 n BeforeweproceedtoproveTheorem7,werecalltheBerry- V (z)= Var[W (z)], (94) n n i Esseen refinement of the central limit theorem. i=1 X n Theorem 8 (Berry-EsseenCLT, e.g. [36, Ch. XVI.5 Theorem 1 Tn(z)= n E |Wi(z)−E[Wi(z)]|3 . (95) 2]). Fix a positive integer n. Let Wi, i = 1,...,n be i=1 independent. Then, for any real t X (cid:2) (cid:3) Let ℓ , ℓ , ℓ , L , L , F , F , V and T be positive n 1 2 3 1 2 1 2 min max V B constants. We assume that there exist z⋆ and sequences P Wi >n Dn+t n Q(t) n, (104) Dn⋆, Vn⋆ such that for all z ∈D, ∈ D (cid:12)(cid:12)(cid:12) "Xi=1 r n !#− (cid:12)(cid:12)(cid:12)≤ √n wh(cid:12)ere (cid:12) Dn⋆ −Dn(z)≥ℓ1d2(z,z⋆)− √ℓ2nd(z,z⋆)− ℓn3, (96) (cid:12) D = 1 n E[W ], (cid:12) (105) n i n L D⋆ D (z⋆) 1, (97) Xi=1 n− n ≤ n 1 n V (z) V⋆ F d(z,z⋆)+ F2 , (98) Vn = n Var[Wi], (106) | n − n|≤ 1 √n Xi=1 n 1 Vmin ≤Vn(z), (99) Tn = n E |Wi−E[Wi]|3 , (107) Tn(z)≤Tmax. (100) Xi=1 (cid:2) (cid:3) c T B = 0 n, (108) Theorem 7. In the setupdescribed above,underassumptions n V3/2 (96)–(100),foranyA>0,thereexistsaK 0suchthat,for n ≥ and 0.4097 c 0.5600 (c 0.4784 for identically all ∆ δn (where δn is specified below) and all sufficiently ≤ 0 ≤ 0 ≤ | |≤ distributed W ). large n: i 1. If δ = A , We also make note of the following lemma, which deals n √n with the behavior of the Q-function. n minP W (z) n(D⋆ ∆) Q ∆ n K . Lemma 4 ( [27, Lemma 4]). Fix b 0. Then, there exists z∈D "Xi=1 i ≤ n− #≥ (cid:18) rVn⋆(cid:19)− √n q ≥0 such that for all z ≥−21b and a≥ll n≥1, (101) q Q √nz Q √nz(1+bz) . (109) − ≤ √n (cid:0) (cid:1) (cid:0) (cid:1) 9 proven shortly, for appropriately chosen a,b,c > 0 we can write We are now equipped to prove Theorem 7. cδ Proof of Theorem 7: To show (103), denote for brevity maxν (z) ν⋆ +bν⋆2+ n (121) ζ =d(z,z⋆) and write z n ≤ n n √n ∈D n for n large enough. P W (z)>n(D⋆ +∆) If i n "Xi=1 # ∆ √Vmin = A, (122) n ≥− 2b − ℓ ℓ A ≤ P"Xi=1Wi(z)>n(cid:18)Dn(z)+ℓ1ζ2− √2nζ− n3 + n12+β(cid:19)# tthheenfaνcn⋆t≥tha−t Q21b(,)anids mLeomnomtoan4icaapllpylideesctroeaνsn⋆in.gSoan,dusLinegm(m12a14),, (110) · we conclude that there exists q >0 such that 1 F1ζ+ √F2n (111) Q √nν⋆ minQ √nν (z) ≤ nK(cid:16)ℓ1ζ21− √ℓ2nζ− ℓn3 + n12A+β(cid:17)2 ≤ Q(cid:0)√nνnn⋆(cid:1)−−Qz∈D√n(cid:0)νn⋆ +√nnbν(cid:1)n⋆2+cδnc (123) ≤ A32 n41−32β, (112) ≤ Q(cid:0)√nνn⋆(cid:1)−Q(cid:0)√nνn⋆ +√nbνn⋆2 + √(cid:1)2πδn (124) where q(cid:0) c(cid:1) (cid:0) (cid:1) + δ , (125) n (110) uses (96) and the assumption on the range of ∆; ≤ √n √2π • (111) is due to Chebyshev’s inequality and V⋆ =0; where • n (112) is by a straightforwardalgebraicexercise revealing (124) is due to • thatζ thatmaximizestheleftsideof(112)isproportional • 1 ξ to A2 . Q(z+ξ) Q(z) , (126) n41+21β ≥ − √2π We proceed to show (101) and (102). which holds for arbitrary z and ξ 0, Denote ≥ (125) holds by Lemma 4 as long as ν⋆ 1. n • n ≥−2b g (z)=P W (z) n(D⋆ ∆) . (113) Thus, (125) establishes (101) and (102). It remains to prove n " i ≤ n− # (121).Toupper-boundmax ν (z),denoteforconvenience i=1 z n X ∈D Using (99) and (100), observe D (z) D⋆ f (z)= n − n, (127) n c T (z) c T V (z) 0 n B = 0 max < . (114) n 3 ≤ 3 ∞ 1 Vn2(z) Vm2in gn(z)= p , (128) V (z) Therefore the Berry-Esseen bound yields: n B and note, using (96), (97),p(99), (100) and (by Ho¨lder’s g (z) Q √nν (z) , (115) inequality) n n − ≤ √n 2 V (z) T3 , (129) (cid:12) (cid:0) (cid:1)(cid:12) n max where (cid:12) (cid:12) ≤ D (z) D⋆ +∆ ν (z), n − n . (116) that n Vn(z) D (z⋆) D⋆ D (z) D⋆ f (z⋆) f (z)= n − n n − n (130) Denote p n − n V (z⋆) − V (z) ∆ n n νn⋆ , Vn⋆ (117) ≥ℓ′1dp2(z,z⋆)− √ℓ′2nd(pz,z⋆)− ℓn′3, (131) Since p where g (z)=Q(√nν⋆)+ g (z) Q √nν (z) n + Q √nnν (z) n Q(−√nν⋆) n (118) ℓ′1 =Tm−a13xℓ1, (132) n (cid:2) − (cid:0)n (cid:1)(cid:3) 1 (cid:2)Q(√(cid:0)nν⋆) (cid:1)B + Q √n(cid:3)ν (z) Q(√nν⋆) , ℓ′2 =Vm−in2ℓ2, (133) ≥ n − √n (cid:2) (cid:0) n (cid:1)− n(1(cid:3)19) ℓ′3 =Vm−in12(L1+ℓ3). (134) Observe that for a,b>0 to show (101), it suffices to show that q 1 1 a b Q(√nν⋆) minQ √nν (z) (120) | − | , (135) n −z∈D (cid:0) n (cid:1)≤ √n (cid:12)(cid:12)√a − √b(cid:12)(cid:12)≤ 2min{a,b}32 for some q 0, and to show (102), replacing q with q√logn so, using (98) an(cid:12)d (99), we(cid:12)conclude ≥ (cid:12) (cid:12) in the right side of (120) would suffice. 1 1 F imSuimnceinQ(1i2s0)mwoneotnoeneidcatlolymdaexcrimeaiszieng√, ntoνna(czh)ie.vAesthweilmlibne- (cid:12)(cid:12) Vn(z) − Vn⋆(cid:12)(cid:12)≤F1′d(z,z⋆)+ √2n′ , (136) (cid:12) (cid:12) (cid:12)p p (cid:12) (cid:12) (cid:12) 10 where where PYK=k, k are defined as follows, for some c>0, { | ∈ K} 1 3 F1′ = 21Vm−in32F1, (137) PY|K=k(y)=PY⋆(y)+ √knyc, (147) F2′ = 2Vm−in2F2. (138) = k Z : k =0, |B| y K ( ∈ Let z0 achieve the maximum maxz νn(z), i.e. Xy∈B ∈D 1 k P (y)+ y 1 P (y) , (148) maxνn(z)=fn(z0)+∆gn(z0). (139) − Y⋆ √nc ≤ √nc ≤ − Y⋆ ) z ∈D A= exp k2 < . (149) Using (136) and (131), we have, −| | ∞ k X∈K (cid:0) (cid:1) Denote by P the minimum Euclidean distance approx- ν (z ) ν (z⋆) Π(Y) n 0 − n imation of an arbitrary PY , where is the set of = (fn(z0) fn(z⋆))+∆(gn(z0) gn(z⋆)) (140) distributions on the channel ∈outQput alphabeQt , in the set ≤ −ℓ′1d2(z−0,z⋆)+(cid:18)√ℓ′2n +|∆|F1′−(cid:19)d(z0,z⋆)+ 2F√2′|n∆| (cid:8)PPYΠ|K(Y=)k=: PkY∈KK=(cid:9)k⋆: where k⋆ =argmin PY BPYK=k . ℓ | k − | + ′3 (141) ∈K(cid:12) (1(cid:12)50) n The quality of approximation (150) is gov(cid:12)erned by [30] (cid:12) 1 ℓ 2 2F ∆ ℓ ≤ 4ℓ′1 (cid:18)√′2n +|∆|F1′(cid:19) + √2′|n | + n′3, (142) PΠ(Y)−PY ≤ |B|(|Bnc|−1). (151) r where (142) follows because the maximum of its left side is Wesaythatxn(cid:12)(cid:12)∈An hasty(cid:12)(cid:12)pePXˆ ifthenumberoftimeseach achieved at d(z ,z⋆)= 1 ℓ′2 + ∆F . Using (96), (99), letter a is encountered in xn is nPX(a). An n-type is a 0 2ℓ′1 √n | | 1′ distribut∈ionAwhose masses are multiples of 1. Denote by P (136), we upper-bound (cid:16) (cid:17) n Xˆ the minimum Euclidean distance approximation of P in the X F ∆ ℓ ℓ F set of n-types, that is, ν (z⋆) ν⋆ + 2′| | + 3 + 3 2′. (143) n ≤ n √n nVmin n32 PXˆ =arg Pmin: |PX−P|. (152) P isa∈nPn-type Applying (142) and (143) to upper-boundmax ν (z), we z n The accuracy of approximation in (152) is controlled by the have established (121) in which ∈D following inequality: b= F1′2Tm32ax, (144) PX−PXˆ ≤ |A|(n|A|−1). (153) 4ℓ′1 For each P (cid:12) , let(cid:12)xn p n be an arbitrary sequence X(cid:12)∈ P (cid:12) ∈ A where we used (98) and (129) to upper-bound ∆2 = ν⋆2V⋆, of type PXˆ, and lower-bound the sum in (146) by the term n n containing P to obtain: thereby completing the proof. Π(Y) n Xn;Y¯n(xn;yn,β)≤ X;Π(Y)(xi,yi,β) i=1 X 2 APPENDIX D +nc PΠ(Y) PY⋆ +A. (154) − PROOF OFTHECONVERSE PARTOF THEOREM4 Applying (146) and (154) t(cid:12)o loosen (24)(cid:12), we conclude by (cid:12) (cid:12) Theorem 3 that, as long as an (n,M,ǫ) code exists, for an ′ Given a finite set , let be the set of all distributions on arbitrary γ >0, A P that satisfy the cost constraint, A n E[b(X)]≤β, (145) ǫ′ ≥PmX∈inPP"Xi=1Wi(PX)≤logM −γ−A#−exp(−(γ15)5,) which is a convex set in R . where |A| Leveraging an idea of Tomamichel and Tan [30], we will W (P )= (x ,Y ,β)+c P P 2, (156) i X X;Π(Y) i i Π(Y) Y⋆ wnoena-kpernod(2u4ct)dbiystcrihbouotisoinngs PwY¯itnh twoebigehatsccohnovseexnctoomfbaivnoartiothnoosef amnidnimYiiziastidoinstorinbuthteedrigachctosriddeinogft(o15P5(cid:12)(cid:12)Y),|Xw=exwi.−4illTaoppe(cid:12)(cid:12)lvyaTluhaeteortehme distributions that are close to P . Specifically (cf. [30]), Y⋆n 4Strictly speaking, the order of Wi(PX), i = 1,...,n depends on the PY¯n(yn)= A1 k exp −|k|2 i=n1PY|K=k(yi), (146) pochafrottohicseuelsatuhrmicshPsoeiqcniu=ee1onfcWeseiaq(rPubeXitnr)caerdiolxye.ns noofttydpeepePnXˆd.oHnotwheevirerr,eslainticveetohrededri,stwribeumtioany X∈K (cid:0) (cid:1)Y

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.