Towards an Information Geometric characterization/classification of Complex Systems. I. Use of Generalized Entropies

Demetris P.K. Ghikas*, Fotios D. Oikonomou

Department of Physics, University of Patras, Patras 26500, Greece

Abstract

Using the generalized entropies which depend on two parameters, we propose a set of quantitative characteristics derived from the Information Geometry based on these entropies. Our aim, at this stage, is modest, as we are first constructing some fundamental geometric objects. We first establish the existence of a two-parameter family of probability distributions. Then, using this family, we derive the associated metric and state a generalized Cramer-Rao inequality. This gives a first two-parameter classification of complex systems. Finally, computing the scalar curvature of the information manifold, we obtain a further discrimination of the corresponding classes. Our analysis is based on the two-parameter family of generalized entropies of Hanel and Thurner (2011).

Keywords: Complex Systems, Generalized Entropies, Information Geometry

*Corresponding author. Email address: [email protected] (Demetris P.K. Ghikas)

1. Introduction

The problem of characterization and classification of complex systems

Complex systems are ubiquitous in nature and in man-made systems. They are objects of study in the natural sciences, in social and economic models, and in mathematical and information constructions. But despite this extensive activity there is still no universal consensus on the meaning of the word "complex". And though it is understood that complex is different from complicated, there is no generally accepted definition of "complex systems", let alone a quantitative characterization and qualitative classification. There are many "system approaches" and few "axiomatic approaches". In the first, within the framework of a particular discipline and concrete examples, there is an abstraction which encapsulates some common and definite properties. Typical examples are the Theory of Dynamical Systems, Neuroscience and Financial Markets. This is a case-study approach which cannot cover all systems qualified as complex [6, 15, 12]. The axiomatic approaches, certainly emanating from case studies, go further to identify universal properties [10, 8, 9]. In the system approaches there exist some quantitative measures, computable or mainly non-computable, but they cover only a particular area. In the abstract approaches there are qualitative characterizations but no algorithmic definition. There is, however, a common concept that plays a fundamental role in this activity: stochasticity and statistical behavior. Since it is not the purpose of this work to review and comment on definitions of complexity, we focus on the possibility of using statistical tools quantitatively. A well-established framework is based on the concept of entropy and its various generalizations. There is an extensive literature on the use of entropy in connection with case studies of complex systems [14, 11, 13, 16, 17, 18, 24, 28, 29]. But there is a particular generalization of entropy that is assumed to classify complex systems, the so-called (c,d)-entropies of Hanel and Thurner [1]. Our approach is to use this entropy to construct particular information manifolds. From these we construct geometric quantities with which we classify complex systems.
There have been similar constructions of Information Geometry, but they are based on single-parameter generalizations of entropy [5, 7, 19, 20, 21, 22, 23, 27, 26, 25, 31, 30].

Generalized Entropies

The development of Statistical Mechanics and its associated thermodynamic limit was based on the thermodynamic behavior of physical systems with short-range forces. The fundamental tool for the theoretical analysis is the Boltzmann-Gibbs entropy. Inherent in this formalism is the Legendre structure which incorporates the duality between extensive and intensive thermodynamic quantities. Shannon and later Khinchin were the first to deal with entropy in a rigorous way. Their treatment is based on four axioms which uniquely determine the well-known functional form of the entropy. Key concepts are the extensivity and additivity properties, which for the Boltzmann-Gibbs entropy coincide. After Renyi's non-standard entropy functional, Tsallis [17, 18] proposed a one-parameter entropy functional which is more suitable for systems with long-range forces. This entropy does not satisfy the property of additivity. After that there has been a host of different entropic functionals, constructed under particular assumptions and satisfying certain conditions supposed to hold for particular systems. We are interested in the two-parameter entropy functional of Hanel and Thurner [1] because it is proposed as a mathematical tool for the classification of complex systems. Now for a given entropy functional one can obtain, via the Maximum Entropy Principle, the probability distribution function that extremizes this functional. But in any variational procedure one needs the escort conditions, which enter with their Lagrange multipliers. All these entropies produce their associated probability distributions: uniform, exponential and so on. In this part of our work we use the distribution associated with the two-parameter entropy of Hanel and Thurner [1, 2, 3] to construct our information geometric quantities.

Information Geometric tools

Information Geometry emerged as a practical geometric framework in the theory of parameter estimation in mathematical statistics [5]. For a given statistical model, that is, a given class of probability measures, there is an associated information manifold and a Riemannian metric. This metric enters the estimation procedure through the Cramer-Rao inequality, giving the possible accuracy of an estimator of the parameters of the model. Further on, one may define non-Riemannian connections which offer a deeper analysis of the estimation procedure. Our work is based on the geometric quantities emerging in the Information Geometry built on the two-parameter entropy functional of Hanel and Thurner [1]. In this paper we construct the information manifold, prove the appropriate properties of the distribution function, and use the Cramer-Rao inequality and the scalar curvature to construct certain plots that differentiate between various classes of complex systems. In a subsequent paper we make a similar classification using different objects of the information geometry. This classification seems to be more discriminating but fails for certain parameter values, a problem that seems not to be of a merely technical origin.

In Paragraph 2 we introduce in a minimal way the necessary definitions and geometric quantities of Information Geometry. We present the Cramer-Rao inequality which we use for our classification, as well as the connection used to compute the scalar curvature that quantifies our classes.
In Paragraph 3 we present the main forms of the generalized entropies proposed in the literature, with a short discussion and comments on their nature and applicability. Then the generalized entropy of Hanel and Thurner [1] is introduced, with a few comments on its derivation and properties. In Paragraph 4 we present our results. First we state some theorems which prove the appropriateness of the generalized distribution function. Then we compute the Riemannian metric for the (c,d)-entropy and present the dependence of the Cramer-Rao bound on the values of c and d. Our graphs indicate the differences between various classes. Finally we compute the scalar curvature, which clearly indicates that various classes of complex systems have differences in their information manifolds. In the last Paragraph we discuss our approach and comment on its applicability and possible extensions. In the Appendix some extra formulas are given, together with the proofs of the theorems.

2. Basic concepts of Information Geometry

2.1. Geometry from probability distributions and the Cramer-Rao Inequality

Here we present only the necessary concepts in order to establish the notation. We refer to the bibliography for the details [5, 7]. Let

S = \{ p_\xi = p(x;\xi) \mid \xi = [\xi^1,\dots,\xi^n] \in \Xi \}   (1)

be a parametric family of probability distributions on X. This is an n-dimensional parametric statistical model. Given N observations x_1,\dots,x_N, the classical estimation problem concerns the statistical methods that may be used to detect the true distribution, that is, to estimate the parameters \xi. To this purpose, an appropriate estimator is used for each parameter. These estimators are maps from the parameter space to the space of the random variables of the model. The quality of the estimation is measured by the variance-covariance matrix V_\xi(\hat\xi) = [v^{ij}], where

v^{ij} = E_\xi[(\hat\xi^i(X)-\xi^i)(\hat\xi^j(X)-\xi^j)]   (2)

Suppose that the estimators are unbiased, namely

E_\xi[\hat\xi(X)] = \xi, \quad \forall \xi \in \Xi   (3)

Then a lower bound for the estimation error is given by the Cramer-Rao inequality

V_\xi(\hat\xi) \ge G(\xi)^{-1}   (4)

where G(\xi) = [g_{ij}(\xi)] is the classical Fisher matrix,

g_{ij}(\xi) = E_\xi[\partial_i l(x;\xi)\, \partial_j l(x;\xi)]   (5)

with

l_\xi = l(x;\xi) = \ln p(x;\xi)   (6)

the score function. As has been shown, the Fisher matrix provides a metric on the manifold of classical probability distributions. This metric, according to the theorem of Cencov [4], is the unique metric which is monotone under transformations of the statistical model. This means that if a map F : X \to Y induces a model S_F = \{q(y;\xi)\} on Y, then

G_F(\xi) \le G(\xi)   (7)

That is, the distance between the transformed distributions is smaller than that between the original distributions. The monotonicity of the metric is thus intuitively related to the fact that, in general, we lose distinguishability of the distributions under any transformation of the information.

The metric defined in this way is the ordinary Fisher metric. Using the Levi-Civita connection, the corresponding Riemannian structure is constructed. In this geometry the scalar curvature is a quantification of the information manifolds. But there is a further development connected with the existence of connections different from Levi-Civita. These are certain pairs of connections satisfying a duality property with respect to the Fisher metric and playing a fundamental role in estimation theory. An important case is that of the dually flat connections.
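To make the content of eqs. (4) and (5) concrete, the following minimal Python sketch, which is our illustration and not part of the original analysis, estimates the variance of an unbiased estimator for a Bernoulli model, where the Fisher information g(θ) = 1/(θ(1−θ)) is known in closed form, and compares it with the Cramer-Rao bound.

```python
import numpy as np

rng = np.random.default_rng(0)

def fisher_bernoulli(theta):
    # g(theta) = E[(d/dtheta ln p(x;theta))^2] = 1 / (theta (1 - theta))
    return 1.0 / (theta * (1.0 - theta))

theta, N, trials = 0.3, 200, 20000

# The sample mean is an unbiased estimator of theta; measure its variance
# over many repeated experiments of N observations each.
samples = rng.random((trials, N)) < theta
estimates = samples.mean(axis=1)

# Cramer-Rao: Var(estimator) >= 1 / (N g(theta)) for N i.i.d. observations.
print("empirical variance:", estimates.var())
print("Cramer-Rao bound:  ", 1.0 / (N * fisher_bernoulli(theta)))
```

Both numbers come out at θ(1−θ)/N: for the Bernoulli family the sample mean attains the bound (4), i.e. it is an efficient estimator.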
2.2. Geometry from Divergences

A further extension is the derivation of the differential structure from relative entropies or divergence functions. These are quasi-distances, and particular cases have been used under various names. Let p, q be distribution functions considered as points in an information manifold. A divergence D(p\|q) satisfies the property

D(p\|q) \ge 0 \quad \text{and} \quad D(p\|q) = 0 \iff p = q   (8)

Now, considering the function D(p\|p+dp) and expanding to third order, we get a metric and a connection characterized by D:

g^D_{ij} = -\partial_i \partial'_j D(p\|p')\big|_{p'=p}   (9)

\Gamma^D_{ij,k} = -\partial_i \partial_j \partial'_k D(p\|p')\big|_{p'=p}   (10)

where \partial_i = \frac{\partial}{\partial\xi^i} and \partial'_i = \frac{\partial}{\partial\xi'^i}. A fundamental concept of great practical usefulness in estimation theory is duality. Given a metric and two connections (g, \nabla, \nabla^*), the connections are dual with respect to the metric if

\partial_k g_{ij} = \Gamma_{ki,j} + \Gamma^*_{kj,i}   (11)

holds. From the geometry coming from a divergence, a dual structure is obtained by defining \Gamma^{D*}_{ij,k} = -\partial_k \partial'_i \partial'_j D(p\|p')\big|_{p'=p}. There is a general family of divergences, the so-called f-divergences, which are generalizations of the well-known Kullback-Leibler divergence. In statistical applications a special role is played by the dually flat connections. In this case there exist dual coordinate systems [\theta^i], [\eta_j] on the manifold and functions \psi and \varphi such that

\theta^i = \partial^i \varphi, \quad \eta_i = \partial_i \psi, \quad g_{ij} = \partial_i \partial_j \psi, \quad g^{ij} = \partial^i \partial^j \varphi   (12)

This is a Legendre transformation with the corresponding potential functions \psi and \varphi. There is a canonical divergence which is uniquely defined for dually flat manifolds:

D(p\|q) \equiv \psi(p) + \varphi(q) - \theta^i(p)\eta_i(q)   (13)

Exponential families have an inherent dually flat structure, and this offers a natural route to construct geometries for generalized exponentials, which are related to generalized entropies [5].

3. Generalized Entropies and Complex Systems

3.1. Generalized Entropies

Assuming the four Shannon-Khinchin axioms, it can be proved that there exists a unique entropy functional, the Boltzmann-Gibbs entropy

S[p] = -\sum_{j\in J} p(j)\ln p(j), \qquad \sum_{j\in J} p(j) = 1   (14)

These axioms are plausible assumptions abstracted from the typical behavior of thermodynamic systems and the role of thermodynamic entropy. But after the statistical foundation of thermodynamics and the association of entropy with information theory, it became necessary to look for other functionals, which were thought to cover systems more general than simple ones such as perfect gases, and in particular systems with long-range interactions. And though in the thermodynamic limit one expects functionals of a universal form, it is evident that for small systems one needs functionals dependent on parameters. These parameters, though not always having a transparent connection with the empirical properties of the systems, nevertheless offered a minimal parametric generalization of the Boltzmann-Gibbs functional as an information theoretic tool. One of the earliest generalizations is Renyi's entropy

S^q[p] = \frac{1}{1-q}\ln\Big(\sum_j p(j)^q\Big)   (15)

Later on Tsallis [16, 17, 18], in relation to the theory and practice of fractals, introduced his entropy

S_q^{Tsallis}[p] = \frac{1}{1-q}\Big(\sum_j p(j)^q - 1\Big)   (16)

a form that had been introduced earlier for mathematical reasons. Thereafter a host of other forms of entropic functionals were introduced, associated with particular properties of complex statistical systems. All these entropies, assuming a form of the Maximum Entropy Principle, give rise to probability distributions which depend on the parameters of the entropy.
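As a concrete illustration of the functionals (14)-(16) (ours, not from the original text), the following Python sketch evaluates all three for a discrete distribution and checks that both generalizations recover the Boltzmann-Gibbs value in the limit q → 1.

```python
import numpy as np

def boltzmann_gibbs(p):
    """S[p] = -sum_j p_j ln p_j, eq. (14)."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

def renyi(p, q):
    """S^q[p] = ln(sum_j p_j^q) / (1 - q), eq. (15)."""
    p = np.asarray(p, dtype=float)
    return np.log(np.sum(p ** q)) / (1.0 - q)

def tsallis(p, q):
    """S_q[p] = (sum_j p_j^q - 1) / (1 - q), eq. (16)."""
    p = np.asarray(p, dtype=float)
    return (np.sum(p ** q) - 1.0) / (1.0 - q)

p = [0.5, 0.25, 0.125, 0.125]
print("Boltzmann-Gibbs:", boltzmann_gibbs(p))
for q in (0.9, 0.999, 1.001, 1.1):   # both reduce to BG as q -> 1
    print(f"q={q}: Renyi={renyi(p, q):.4f}  Tsallis={tsallis(p, q):.4f}")
```

For the distribution above S[p] ≈ 1.2130, and both S^q[p] and S_q^{Tsallis}[p] approach this value as q → 1 from either side.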
In general these maximizing distributions are generalized exponentials, which are the inverse functions of generalized logarithms. These generalized exponentials, assumed to be particular exponentials of probability distributions, may be used to construct information geometric objects. In this work we use the two-parameter entropic functional of Hanel and Thurner to construct our geometric tools.

3.2. A two-parameter Generalized Entropy and Complex Systems

Given the fact that the four Shannon-Khinchin axioms impose a unique form for the entropy, which is the Boltzmann-Gibbs functional, Hanel and Thurner, seeking a generalization to the case of a functional not satisfying additivity, had to abandon the relevant axiom. Their analysis produced a two-parameter entropic functional of the form

S_{c,d}[p] = \frac{e\sum_i^W \Gamma(d+1,\, 1-c\ln p_i)}{1-c+cd} - \frac{c}{1-c+cd}   (17)

where W is the number of potential outcomes and \Gamma(a,b) = \int_b^\infty dt\, t^{a-1}\exp(-t) is the incomplete Gamma function. The Boltzmann-Gibbs entropy is recovered for (c,d) = (1,1), while for the Tsallis entropy we have (c,d) = (c,0). The maximizing distribution function is the generalized exponential

E_{c,d,r}(x) = e^{-\frac{d}{1-c}\left[W_k\left(B(1-x/r)^{1/d}\right) - W_k(B)\right]}   (18)

where r = (1-c+cd)^{-1} and B = \frac{(1-c)r}{1-(1-c)r}\exp\left(\frac{(1-c)r}{1-(1-c)r}\right). The function W_k is the k-th branch of the Lambert W function, which is a solution of the equation x = W(x)\exp(W(x)). This generalized exponential is the inverse function of the generalized logarithm (under appropriate conditions)

\Lambda_{c,d,r}(x) = r - r\,x^{c-1}\left[1 - \frac{1-(1-c)r}{rd}\ln x\right]^d   (19)

4. Results

4.1. The (c,d)-exponential family

Amari and Ohara [7] studied the geometry of the q-exponential family of probability distributions. We repeat this analysis for the (c,d)-exponential family. First, it is easily seen that for x \in (0,1] and c, d, r real, if \frac{1-(1-c)r}{dr} \ge 0, then the generalized logarithm \Lambda_{c,d,r}(x) is a real function. This is connected to the conditions given by Hanel and Thurner:

d > 0: \quad r < \frac{1}{1-c}   (20)

d = 0: \quad r = \frac{1}{1-c}   (21)

d < 0: \quad r > \frac{1}{1-c}   (22)

Now the distribution E_{c,d,r}(x) is characterized as an exponential family if p(x,\theta) = E_{c,d,r}(x_i\theta^i - \psi(\theta)), or equivalently \Lambda_{c,d,r}(p(x,\theta)) = x_i\theta^i - \psi(\theta). That this distribution is exponential is proved in our first theorem.

Theorem 1
The family with the discrete distribution p = (p_0, p_1, \dots, p_n), with p_i = \mathrm{Prob}(x = x_i) and p_0 = 1 - \sum_{i=1}^n p_i, has the structure of a (c,d)-exponential family with

\theta^i = r\,p_0^{c-1}\left[1 - \frac{1-(1-c)r}{dr}\ln p_0\right]^d - r\,p_i^{c-1}\left[1 - \frac{1-(1-c)r}{dr}\ln p_i\right]^d   (23)

x_i = \delta_i(x) = \begin{cases} 1 & x = x_i \\ 0 & x \ne x_i \end{cases}, \qquad i = 1,\dots,n   (24)

\psi(\theta) = -\Lambda_{c,d,r}(p_0)   (25)

Let the function

\Delta(x) = \frac{1}{r(1-c)}\, E_{c,d,r}(x)\, \frac{W\left(B(1-x/r)^{1/d}\right)}{1 + W\left(B(1-x/r)^{1/d}\right)}\left(1 - \frac{x}{r}\right)^{-1}   (26)

We get

\frac{\partial\psi}{\partial\theta^i} \equiv \partial_i\psi = \frac{\int x_i\,\Delta(x_j\theta^j - \psi(\theta))\,dx}{\int \Delta(x_j\theta^j - \psi(\theta))\,dx}   (27)

From this we have

Theorem 2
The function \psi(\theta) is convex for the values of c, d, r for which

\Delta(x_j\theta^j - \psi(\theta)) \ge 0, \quad \Delta'(x_j\theta^j - \psi(\theta)) \ge 0   (28)
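The inverse relation between eqs. (18) and (19) can be verified numerically. The sketch below is our illustration; it uses scipy.special.lambertw, and the choice of the principal branch k = 0 (adequate for the d > 0 example chosen here) is our assumption.

```python
import numpy as np
from scipy.special import lambertw

def Lambda(x, c, d, r):
    """Generalized logarithm, eq. (19)."""
    a = (1.0 - (1.0 - c) * r) / (r * d)
    return r - r * x ** (c - 1.0) * (1.0 - a * np.log(x)) ** d

def E(x, c, d, r, k=0):
    """Generalized exponential, eq. (18); k is the Lambert-W branch."""
    u = (1.0 - c) * r / (1.0 - (1.0 - c) * r)
    B = u * np.exp(u)          # B as defined below eq. (18)
    w = lambertw(B * (1.0 - x / r) ** (1.0 / d), k) - lambertw(B, k)
    return np.exp(-(d / (1.0 - c)) * w.real)

c, d = 0.8, 1.5
r = 1.0 / (1.0 - c + c * d)    # r = (1 - c + cd)^(-1)
for p in (0.1, 0.5, 0.9, 1.0): # E should invert Lambda on (0, 1]
    print(p, E(Lambda(p, c, d, r), c, d, r))
```

Each composition E_{c,d,r}(Λ_{c,d,r}(p)) returns p, confirming the inversion on (0,1] for these parameter values, which satisfy condition (20).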
4.2. The (c,d)-information metric

We define the functions

K(x) = x^{\frac{c-1}{d}}\left(1 - \frac{1-(1-c)r}{rd}\ln x\right)   (29)

so that

\Lambda_{c,d,r}(x) = r - r\,K^d(x)   (30)

and

h(p) = \int \Delta(\Lambda(p(x,\theta)))\,dx \equiv \int \Delta(x_j\theta^j - \psi(\theta))\,dx   (31)

For a discrete distribution we have

h(p) = \frac{1}{r(1-c)}\sum_{i=0}^{n} p_i\,\frac{W(BK(p_i))}{1 + W(BK(p_i))}\,K^{-d}(p_i)   (32)

We define the (c,d)-divergence as a canonical divergence

D_{c,d,r}(p(x,\theta_1), p(x,\theta_2)) = \psi(\theta_2) - \psi(\theta_1) - [\partial_i\psi(\theta_1)](\theta_2^i - \theta_1^i)   (33)

Then we have

Theorem 3
For two discrete distributions p = (p_0, p_1, \dots, p_n), q = (q_0, q_1, \dots, q_n) we have for the (c,d)-divergence the expression

D_{c,d,r}(p,q) = \frac{1}{(1-c)\,h(p)}\sum_{i=0}^{n} \frac{p_i\,W(BK(p_i))}{1 + W(BK(p_i))}\left(K^{-d}(p_i)\,K^d(q_i) - 1\right)   (34)

Finally, defining the metric for a discrete distribution

g_{ij}(p) = \frac{\partial^2}{\partial q_i\,\partial q_j} D_{c,d,r}(p,q)\Big|_{q=p}   (35)

we have

Theorem 4

g_{ij}(p) = \frac{1}{(1-c)\,h(p)}\left(H(p_0) + \delta_{ij}\,H(p_j)\right)   (36)

where

H(x) = x\,\frac{W(BK(x))}{1 + W(BK(x))}\left(d(d-1)\,K^{-2}(x)\,[K'(x)]^2 + d\,K^{-1}(x)\,K''(x)\right)   (37)

It can be seen that for d = 1 and c \to 1 this metric gives the Fisher metric. A numerical evaluation of this metric is sketched at the end of this section.

4.3. Cramer-Rao Inequalities for Complex Systems

Here we follow the analysis of Naudts [19, 20]. A new information metric can be defined using two distributions P_\theta = P_\theta(x,\theta) and p_\theta = p_\theta(x,\theta):

\tilde{g}_{ij}(\theta) = \int_\Omega \frac{1}{P_\theta(x)}\,\frac{\partial p_\theta}{\partial\theta^i}\,\frac{\partial p_\theta}{\partial\theta^j}\,d\mu(x)   (38)

For the discrete distribution we get

\tilde{g}_{ij}(\theta) = \frac{1}{P_0} + \delta_{i,j}\,\frac{1}{P_i}   (39)

The following theorem holds.
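As promised after Theorem 4, we close this subsection with a numerical sketch of the (c,d)-metric, eqs. (36)-(37), for a discrete distribution. This is our illustration, not the authors' code; the principal Lambert-W branch and the finite-difference evaluation of K' and K'' are our assumptions.

```python
import numpy as np
from scipy.special import lambertw

def make_K(c, d, r):
    """K(x) of eq. (29)."""
    b = (1.0 - (1.0 - c) * r) / (r * d)
    return lambda x: x ** ((c - 1.0) / d) * (1.0 - b * np.log(x))

def cd_metric(p, c, d, r, eps=1e-5):
    """g_ij(p) of Theorem 4 via eqs. (32), (36), (37); W-branch k=0 assumed."""
    u = (1.0 - c) * r / (1.0 - (1.0 - c) * r)
    B = u * np.exp(u)
    K = make_K(c, d, r)
    W = lambda x: lambertw(B * K(x), 0).real
    # central finite differences for K' and K''
    K1 = lambda x: (K(x + eps) - K(x - eps)) / (2.0 * eps)
    K2 = lambda x: (K(x + eps) - 2.0 * K(x) + K(x - eps)) / eps ** 2
    H = lambda x: x * W(x) / (1.0 + W(x)) * (
        d * (d - 1.0) * K(x) ** -2.0 * K1(x) ** 2 + d * K(x) ** -1.0 * K2(x))
    h = sum(pi / (r * (1.0 - c)) * W(pi) / (1.0 + W(pi)) * K(pi) ** -d
            for pi in p)                        # eq. (32)
    n = len(p) - 1                              # p = (p_0, ..., p_n)
    g = np.empty((n, n))
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            g[i - 1, j - 1] = (H(p[0]) + (i == j) * H(p[i])) / ((1.0 - c) * h)
    return g

c, d = 0.8, 1.5
r = 1.0 / (1.0 - c + c * d)
print(cd_metric(np.array([0.4, 0.3, 0.3]), c, d, r))
```

Closed-form expressions for K' and K'' follow directly from eq. (29); the finite differences merely keep the sketch short.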