Lower Bounds for Matrix Product

Amir Shpilka
Institute of Computer Science
Hebrew University
Jerusalem, Israel
[email protected]

February 1, 2008

Abstract

We prove lower bounds on the number of product gates in bilinear and quadratic circuits that compute the product of two n×n matrices over finite fields. In particular we obtain the following results:

1. We show that the number of product gates in any bilinear (or quadratic) circuit that computes the product of two n×n matrices over GF(2) is at least 3n^2 − o(n^2).

2. We show that the number of product gates in any bilinear circuit that computes the product of two n×n matrices over GF(p) is at least (2.5 + 1.5/(p^3−1))n^2 − o(n^2).

These results improve the former results of [3, 1], who proved lower bounds of 2.5n^2 − o(n^2).

1 Introduction

The problem of computing the product of two matrices is one of the most studied computational problems: we are given two n×n matrices x = (x_{i,j}), y = (y_{i,j}), and we wish to compute their product, i.e. there are n^2 outputs, where the (i,j)'th output is

    (x·y)_{i,j} = \sum_{k=1}^{n} x_{i,k} · y_{k,j} .

In '69 Strassen surprised the world by showing an upper bound of O(n^{log_2 7}) [12]. This bound was later improved, and the best upper bound today is O(n^{2.376}) [5] (see also [7] for a survey). The best lower bound is 2.5n^2 − o(n^2) on the number of products needed to compute the function [3, 1]. Thus the following problem is still open: can matrix product be computed by a circuit of size O(n^2)?

The standard computational model for computing polynomials is the model of arithmetic circuits, i.e. circuits over the base {+, ·} over some field F. This is indeed the most general model, but for matrix product two other models are usually considered: quadratic circuits and bilinear circuits. In the quadratic model we require that product gates are applied only on two linear functions.
In the bilinear model we also require that product gates are applied only on two linear functions, but in addition we require that the first linear function is linear in the variables of x and that the second linear function is linear in the variables of y. These models are more restricted than the general model of arithmetic circuits. However, it is interesting to note that over infinite fields we can always assume w.l.o.g. that any circuit for matrix product is a quadratic circuit [13]. In addition we note that the best circuits that we have today for matrix product are bilinear circuits.

In this paper we prove that any quadratic circuit that computes matrix product over the field GF(2) has at least 3n^2 − o(n^2) product gates, and that any bilinear circuit for matrix product over the field GF(p) must have at least (2.5 + 1.5/(p^3−1))n^2 − o(n^2) product gates.

From now on we will use the notation MP_n to denote the problem of computing the product of two n×n matrices.

1.1 Known Lower Bounds

In contrast to the major advances in proving upper bounds, the attempts to prove lower bounds on the size of bilinear circuits that compute MP_n were less successful. Denote by q∗(MP_n) and bl∗(MP_n) the number of product gates in a smallest quadratic circuit for MP_n, and in a smallest bilinear circuit for MP_n, respectively. We also denote by bl_tot(MP_n) the total number of gates in a smallest bilinear circuit for MP_n.

In '78 Brockett and Dobkin proved that bl∗(MP_n) ≥ 2n^2 − 1 over any field [10]. This lower bound was later generalized by Lafon and Winograd to a lower bound on q∗(MP_n) over any field [8]. In '89 Bshouty showed that over GF(2), q∗(MP_n) ≥ 2.5n^2 − O(n log n) [3]. Recently Bläser proved a lower bound of 2n^2 + n − 3 on q∗(MP_n) over any field [2]. In [1] Bläser proved that bl∗(MP_n) ≥ 2.5n^2 − 3n over any field.

In [9] it is shown that any bounded depth circuit for MP_n, over any field, has super linear (in n^2) size. Notice however, that the best known circuits for MP_n have depth Ω(log n).
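For contrast with these lower bounds, the upper-bound side is already visible at n = 2: Strassen's identity multiplies two 2×2 matrices with 7 products of linear forms instead of 8, which witnesses bl∗(MP_2) ≤ 7; the O(n^{log_2 7}) algorithm applies it blockwise. A minimal Python sketch of the base case (the recursion is omitted; the function name is ours):

```python
def strassen_2x2(x, y):
    """Multiply two 2x2 matrices using Strassen's 7 products.

    x, y are 2x2 lists of lists; works over any commutative ring.
    Each p_i is a product of a linear form in x and a linear form in y.
    """
    (a, b), (c, d) = x
    (e, f), (g, h) = y
    p1 = a * (f - h)
    p2 = (a + b) * h
    p3 = (c + d) * e
    p4 = d * (g - e)
    p5 = (a + d) * (e + h)
    p6 = (b - d) * (g + h)
    p7 = (a - c) * (e + f)
    # Only additions are used to recombine the 7 products into the output.
    return [[p5 + p4 - p2 + p6, p1 + p2],
            [p3 + p4, p1 + p5 - p3 - p7]]
```

Note that each p_i is a product of a linear form in the x-entries and a linear form in the y-entries, i.e. exactly a bilinear-circuit product gate in the sense of section 2.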
1.2 Bilinear Rank

An important notion that is highly related to the problem of computing matrix product in bilinear circuits is the notion of bilinear rank.

A bilinear form in two sets of variables x, y is a polynomial in the variables of x and the variables of y which is linear in the variables of x and linear in the variables of y. Clearly each output of MP_n is a bilinear form in x = {x_{i,j}}, y = {y_{i,j}}. The bilinear rank of a set of bilinear forms {b_1(x,y), ..., b_m(x,y)} is the smallest number of rank 1 bilinear forms that span b_1, ..., b_m, where a rank 1 bilinear form is a product of a linear form in the x variables and a linear form in the y variables. We denote by R_F(b_1, ..., b_m) the bilinear rank of {b_1, ..., b_m} over the field F. For further background see [4, 7].

We denote by R_F(MP_n) the bilinear rank over F of the n^2 outputs of matrix product, i.e. it is the bilinear rank of the set { \sum_{k=1}^{n} x_{i,k}·y_{k,j} }_{i,j} over F.

The following inequalities are obvious (over any field).

• q∗(MP_n) ≤ bl∗(MP_n) ≤ 2q∗(MP_n).

• R_F(MP_n) = bl∗(MP_n).

• The following inequality is less obvious, but also not so hard to see.

    bl∗(MP_n) ≤ bl_tot(MP_n) ≤ poly(log n)·bl∗(MP_n).

I.e. up to polylogarithmic factors, the number of product gates in a smallest bilinear circuit for MP_n, over any field F, is equal to the total number of gates in the circuit.

1.3 Results and Methods

We prove that any quadratic circuit that computes MP_n over the field GF(2) has at least 3n^2 − o(n^2) product gates (i.e. q∗(MP_n) ≥ 3n^2 − o(n^2) over GF(2)). We also prove that over the field GF(p) every bilinear circuit for MP_n must have at least (2.5 + 1.5/(p^3−1))n^2 − o(n^2) product gates (i.e. bl∗(MP_n) ≥ (2.5 + 1.5/(p^3−1))n^2 − o(n^2) over GF(p)). Both of these results actually hold for the bilinear rank as well.

The proof of the lower bound over GF(2) is based on techniques from the theory of linear codes.
However, we cannot use known results from coding theory in a straightforward way, since we are not dealing with codes in which every two words are distant, but rather with codes on matrices in which the distance between two code words, of two matrices, is proportional to the rank of the difference of the matrices. The reduction from circuits to codes and the proof of the bound are given in section 4.

The proof of the second bound is based on a lemma proved by Bläser in [1]. We prove that in the case of finite fields we can use the lemma with better parameters than those used by Bläser. This result is proved in section 5.

1.4 Organization of the paper

In section 2 we present the models of bilinear circuits and quadratic circuits. In section 3 we present some algebraic and combinatorial tools that we need for the proofs of our lower bounds. In section 4 we introduce the notion of linear codes of matrices, and prove our lower bound on bilinear and quadratic circuits that compute MP_n over GF(2). In section 5 we prove our lower bound on bilinear circuits that compute MP_n over GF(p).

2 Arithmetic Models

In this section we present the models of quadratic circuits and bilinear circuits. These are the models for which we prove our lower bounds. We first give the definition of a general arithmetic circuit.

An arithmetic circuit over a field F is a directed acyclic graph as follows. Nodes of in-degree 0 are called inputs and are labeled with input variables. Nodes of out-degree 0 are called outputs. Each edge is labeled with a constant from the field, and each node other than an input is labeled with one of the operations {+, ·}; in the first case the node is a plus gate and in the second case a product gate. The computation is done in the following way. An input just computes the value of the variable that labels it. Then, if v_1, ..., v_k are the vertices that fan into v, we multiply the result of each v_i with the value of the edge that connects it to v.
If v is a plus gate we sum all the results, otherwise v is a product gate and we multiply all the results. Obviously the value computed by each node in the circuit is a polynomial over F in the input variables.

We are interested in the problem of computing the product of two n×n matrices, MP_n. The input consists of two n×n matrices x, y. The output is the matrix x·y, i.e., there are n^2 outputs, and the (i,j)'th output is:

    (x·y)_{i,j} = \sum_{k=1}^{n} x_{i,k} · y_{k,j} .

Each output (x·y)_{i,j} is hence a bilinear form in x and y.

Since each output of MP_n is a bilinear form, it is natural to consider bilinear arithmetic circuits for it. A bilinear arithmetic circuit is an arithmetic circuit with the additional restriction that product gates are applied only on two linear functions, one function linear in the variables of x and the other function linear in the variables of y. Thus, bilinear circuits have the following structure. First, there are many plus gates computing linear forms in x and linear forms in y. Then there is one level of product gates that compute bilinear forms, and finally there are many plus gates that eventually compute the outputs. We will be interested in bounding from below the number of products in any bilinear circuit for MP_n. This model is more restricted than the general model of arithmetic circuits, but we note that all the known upper bounds (over any field) for MP_n are by bilinear circuits.

Another model that we will consider is the model of quadratic circuits. A quadratic circuit is an arithmetic circuit with the additional restriction that product gates are applied only on two linear functions. Notice that the only difference between quadratic circuits and bilinear circuits is that in the quadratic model the product gates compute quadratic forms in x, y, whereas in the bilinear model the product gates compute bilinear forms in x, y.
This model is more general than the model of bilinear circuits, but it is still more restricted than the general model. However, it is interesting to note that over infinite fields we can assume w.l.o.g. that any arithmetic circuit for MP_n is a quadratic circuit [13].

3 Algebraic and Combinatorial tools

In this section we present some algebraic and combinatorial tools that we will use. The following lemma is an extremely weak variant of the famous Schwartz-Zippel lemma, which shows that every nonzero polynomial (nonzero as a formal expression) over a large enough field has a nonzero assignment in the field (see [11, 15]).

Lemma 1 Let P be a polynomial of degree d in x_1, ..., x_n over some field F, such that d < |F|, and such that at least one of the coefficients of P is not zero. Then we can find an assignment, ρ ∈ F^n, to the x_i's, such that P(ρ_1, ..., ρ_n) ≠ 0.

We say that two polynomials p, q in n variables are equivalent over a field F if p(x_1,...,x_n) = q(x_1,...,x_n) for any x_1, ..., x_n ∈ F. We denote p ≡ q if p and q are equivalent over F (we omit F from the notation as the field that we deal with will be clear from the context).

Lemma 2 Let P be a polynomial of degree d in the variables x_1, ..., x_n over a field F. If P ≢ 0 then we can find an assignment, ρ ∈ F^n, to the x_i's such that at most d of the ρ_i's get a nonzero value, and such that P(ρ_1, ..., ρ_n) ≠ 0.

Proof: P is equal (as a function) to a polynomial P̄ in which the degree of each variable is at most |F|−1. We call P̄ the reduction of P. Consider some monomial M in P̄ whose coefficient is not zero. We assign all the variables that do not appear in M to zero. The resulting polynomial (after the assignment) is a polynomial in the variables of M, which is not the zero polynomial, as it is a reduced polynomial which has a monomial with a nonzero coefficient (M of course). Therefore according to Lemma 1 there is some assignment to the variables of M that gives this polynomial a nonzero value.
Therefore we have found an assignment which gives nonzero values only to the variables of M (and there are at most d such variables) under which P ≠ 0. ♣

The following useful lemma, which is a straightforward implication of the previous lemma, is the key lemma in most of our proofs. The lemma deals with linear forms in n^2 variables. From now on we shall think about such linear forms as linear forms in the entries of n×n matrices.

Lemma 3 Let μ_1, ..., μ_{n^2} be n^2 linearly independent linear forms in n^2 variables over some field F. Let P be a polynomial of degree d in kn^2 variables over F, i.e. we can view P as a polynomial P(x_1,...,x_k) in the entries of k matrices, x_1,...,x_k, of size n×n each. Assume that P ≢ 0. Then we can find k matrices a_1, ..., a_k ∈ M_n(F) such that P(a_1,...,a_k) ≠ 0 and such that there exist n^2−d linear forms among μ_1,...,μ_{n^2} that vanish on all the a_i's.

Proof: The idea of the proof is the following. Let b_1, ..., b_{n^2} be the dual basis of μ_1,...,μ_{n^2}, i.e. it is a basis of M_n(F) satisfying ∀i,j μ_i(b_j) = δ_{i,j}. We wish to find k matrices, a_1,...,a_k, such that P(a_1,...,a_k) ≠ 0, and such that there exist b_{i_1},...,b_{i_d} that span all of them. If we manage to find such matrices, then since the b_i's are the dual basis to the μ_i's we will get that n^2−d of the μ_i's vanish on a_1,...,a_k. The way to find such matrices that are contained in the span of a small subset of the b_i's is based on Lemma 2.

So let b_1, ..., b_{n^2} be the dual basis to μ_1, ..., μ_{n^2}, i.e. ∀i,j μ_i(b_j) = δ_{i,j}. We now change the variables of P. Let α_{i,j}, j = 1...k, i = 1...n^2, be a set of kn^2 variables. Denote x_j = \sum_{i=1}^{n^2} α_{i,j} b_i. Thus P(x_1,...,x_k) can be viewed as a polynomial of degree d in the kn^2 variables α_{i,j}. Therefore P ≢ 0 as a polynomial in the α_{i,j}'s. Hence, according to Lemma 2 there exists an assignment, ρ, to the α_{i,j}'s such that at most d of them get a nonzero value. Define a_j = \sum_{i=1}^{n^2} ρ_{i,j} b_i.
Clearly P(a_1,...,a_k) ≠ 0. Since at most d of the ρ_{i,j}'s got nonzero values, we see that there are at most d b_i's such that all the a_j's are linear combinations of them. Since the b_i's are the dual basis to μ_1, ..., μ_{n^2} we get that there are at least n^2−d of the μ_i's that vanish on all the a_j's. Therefore a_1, ..., a_k satisfy the requirements of the lemma. ♣

The next lemma will enable us to translate properties of matrices over large fields of characteristic p to properties of matrices (of higher dimension) over GF(p).

Lemma 4 There exists an embedding φ : GF(p^n) ↪ M_n(GF(p)). That is, there exists a mapping φ : GF(p^n) → M_n(GF(p)) such that

• φ is a one to one linear transformation.

• φ(1) = I, where I is the n×n identity matrix.

• φ is multiplicative, i.e. ∀x,y ∈ GF(p^n) we have that φ(xy) = φ(x)·φ(y).

This embedding also induces an embedding M_k(GF(p^n)) ↪ M_{nk}(GF(p)).

This lemma is a standard tool in algebra, but for completeness we give the proof.

Proof: GF(p^n) is an n dimensional vector space over GF(p). Each element x ∈ GF(p^n) can be viewed as a linear transformation x : GF(p^n) → GF(p^n) in the following way:

    ∀y ∈ GF(p^n)   x(y) = x·y .

Clearly this is a linear transformation of GF(p^n) into itself, as a vector space over GF(p). Therefore, by picking a basis of GF(p^n) we can represent the linear transformation corresponding to each x ∈ GF(p^n) by a matrix a_x ∈ M_n(GF(p)). Thus, we have defined a mapping φ : GF(p^n) → M_n(GF(p)) such that φ(x) = a_x, and it is easy to verify that this mapping is an embedding of GF(p^n) into M_n(GF(p)). The way to generalize it to an embedding of M_k(GF(p^n)) into M_{nk}(GF(p)) is the following. Let a = (a_{i,j}) ∈ M_k(GF(p^n)) be some matrix. Every entry a_{i,j} of a is some element of GF(p^n). We can now replace a_{i,j} with the matrix φ(a_{i,j}). Thus the resulting matrix will be a kn×kn matrix whose entries are in GF(p). Again it is easy to verify that this is indeed an embedding of M_k(GF(p^n)) into M_{nk}(GF(p)).
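For the smallest interesting case, p = n = 2, the embedding of Lemma 4 can be written out explicitly: represent GF(4) as GF(2)[t]/(t^2 + t + 1) and send each element to the matrix of multiplication by it in the basis {1, t}. A short Python sketch (the names and the pair encoding are ours, not from the paper):

```python
# GF(4) = GF(2)[t]/(t^2 + t + 1); an element c0 + c1*t is the pair (c0, c1).
GF4 = [(0, 0), (1, 0), (0, 1), (1, 1)]

def gf4_mul(a, b):
    # (a0 + a1 t)(b0 + b1 t) with t^2 = t + 1, coefficients mod 2.
    c0 = (a[0] * b[0] + a[1] * b[1]) % 2
    c1 = (a[0] * b[1] + a[1] * b[0] + a[1] * b[1]) % 2
    return (c0, c1)

BASIS = [(1, 0), (0, 1)]                      # the basis {1, t}

def phi(x):
    """phi(x) = matrix over GF(2) of multiplication by x on GF(4).

    Column j holds the coordinates of x * BASIS[j].
    """
    cols = [gf4_mul(x, b) for b in BASIS]
    return tuple(tuple(cols[j][i] for j in range(2)) for i in range(2))

def mat_mul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) % 2
                       for j in range(2)) for i in range(2))
```

Here phi((1, 0)) is the 2×2 identity, and phi(gf4_mul(x, y)) equals mat_mul(phi(x), phi(y)) for all 16 pairs, since phi(x) is just the matrix of the multiplication-by-x operator.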
♣

In addition to the algebraic lemmas we also need the following combinatorial tools.

Definition 1 Let F be a field, and let v, u be two vectors in F^m. We denote by weight(v) the number of nonzero coordinates of v. Let d_H(v,u) = weight(v−u), i.e. d_H(v,u) is the number of coordinates on which u and v differ. d_H(v,u) is also known as the Hamming distance of u and v. We also denote by agree(u,v) the number of coordinates on which u and v are equal, i.e. agree(u,v) = m − d_H(v,u).

The next lemma shows that if a vector space contains a set of vectors such that every pair/triplet of them don't agree on many coordinates (i.e. their Hamming distance is large), then it is of large dimension. There are numerous similar lemmas in coding theory, and in particular the first part of our lemma is the famous Plotkin bound (see [14]).

Lemma 5

1. In every set of k vectors in GF(p)^t, such that p < k, there are two vectors that agree on at least t/p − t/k coordinates.

2. In every set of k vectors in GF(p)^t, such that 2p < k, there are three vectors that agree on at least t/p^2 − 3t/(pk) coordinates.

Proof: We begin by proving the first claim. Let v_1,...,v_k be k vectors in GF(p)^t. We are going to estimate \sum_{i<j} agree(v_i,v_j) in two different ways. On the one hand this sum is at most \binom{k}{2} times the maximum of agree(v_i,v_j). On the other hand consider a certain coordinate. For every α ∈ GF(p) denote by n_α the number of vectors among the v_i's that are equal to α on this coordinate. Clearly \sum_{α=0}^{p−1} n_α = k. The contribution of this coordinate to \sum_{i<j} agree(v_i,v_j) is exactly \sum_{α=0}^{p−1} \binom{n_α}{2}. By convexity

    \sum_{α=0}^{p−1} \binom{n_α}{2} ≥ p · (1/2) · (k/p) · (k/p − 1) = k(k−p)/(2p) .

We get that

    \binom{k}{2} · max_{i<j} agree(v_i,v_j) ≥ \sum_{i<j} agree(v_i,v_j) ≥ t · k(k−p)/(2p) .

Therefore

    max_{i<j} agree(v_i,v_j) ≥ (t/p) · (k−p)/(k−1) ≥ (t/p) · (k−p)/k = t/p − t/k .

The proof of the second claim is similar.
We give two different estimates to \sum_{i<j<l} agree(v_i,v_j,v_l) (the number of coordinates on which v_i, v_j, and v_l are the same). In the same manner as before we get that

    max_{i<j<l} agree(v_i,v_j,v_l) ≥ (t/p^2) · ((k−p)/(k−1)) · ((k−2p)/(k−2)) ≥ t/p^2 − 3t/(pk) .

♣

Corollary 1 If {0,1}^t contains k vectors v_1, ..., v_k, such that 2 < k and ∀i ≠ j d_H(v_i,v_j) ≥ N, then t ≥ 2N − 4N/(k+2).

Proof: According to Lemma 5 there are two vectors, w.l.o.g. v_1 and v_2, such that agree(v_1,v_2) ≥ t/2 − t/k. Since d_H(v_1,v_2) = t − agree(v_1,v_2) we get that

    t − (t/2 − t/k) ≥ d_H(v_1,v_2) ≥ N

and the result follows. ♣

4 Lower bound over GF(2)

In this section we prove our main theorems.

Theorem 1 bl∗(MP_n) ≥ 3n^2 − O(n^{5/3}) (in other words R_{GF(2)}(MP_n) ≥ 3n^2 − O(n^{5/3})).

The second theorem that we shall prove is a lower bound for quadratic circuits.

Theorem 2 q∗(MP_n) ≥ 3n^2 − O(n^{5/3}). I.e. the number of product gates in any quadratic circuit that computes the product of two n×n matrices over GF(2) is at least 3n^2 − O(n^{5/3}).

Clearly Theorem 2 implies Theorem 1, but we first prove Theorem 1 as it is more intuitive and simple. We begin by introducing the notion of linear codes of matrices.

4.1 Linear Codes of Matrices

Definition 2 A linear code of matrices is a mapping, Γ : M_n(GF(2)) → {0,1}^m (for some m), with the following properties:

• Γ is linear.

• For any matrix a, weight(Γ(a)) ≥ n·rank(a).

From the linearity of Γ and the requirement on weight(Γ(a)) we get the following corollary.

Corollary 2 Γ is a one to one mapping, and for any two matrices a and b, d_H(Γ(a),Γ(b)) ≥ n·rank(a−b).

The following theorem shows that the dimension of the range of any linear code of matrices is large (i.e. m must be large).

Theorem 3 Let Γ : M_n(GF(2)) → {0,1}^m be a linear code of matrices; then m ≥ 3n^2 − O(n^{5/3}).

Proof: Denote

    Γ(a) = (μ_1(a), ..., μ_m(a)) .

The proof is based on the following lemma, which shows that we can find k = n^{1/3} matrices, a_1, ..., a_k ∈ M_n(GF(2)), with the following properties.
• ∀i ≠ j, a_i − a_j is an invertible matrix.

• There are n^2 − \binom{k}{2}·n linear forms among the μ_i's that vanish on all the a_i's.

We state the lemma for every k < 2^n, but we apply it only to k = n^{1/3}.

Lemma 6 For every n, k such that k < 2^n, and any μ_1, ..., μ_{n^2} linearly independent linear forms in n^2 variables over GF(2), there are k matrices, a_1,...,a_k ∈ M_n(GF(2)), such that for every i ≠ j, a_i − a_j is an invertible matrix, and such that n^2 − \binom{k}{2}·n of the μ_i's vanish on them.

Proof: Consider the following polynomial P in k matrices:

    P(a_1,...,a_k) = \prod_{i<j} det(a_i − a_j) .

Clearly a set of k matrices a_1,...,a_k satisfies P(a_1,...,a_k) ≠ 0 iff all the matrices a_i − a_j are invertible. In addition, it is easy to see that deg(P) = \binom{k}{2}·n. Therefore if we show that P ≢ 0 over GF(2), then according to Lemma 3 we will get what we wanted to prove.

In order to show that P ≢ 0 we just have to prove the existence of k matrices such that the difference of every two of them is invertible. Lemma 4 assures us that we can embed the field GF(2^n) into M_n(GF(2)). Denote this embedding by Φ : GF(2^n) ↪ M_n(GF(2)). We take k distinct elements of GF(2^n), x_1, ..., x_k. Their images, Φ(x_1), ..., Φ(x_k), are matrices in M_n(GF(2)) such that the difference of every two of them, Φ(x_i) − Φ(x_j) = Φ(x_i − x_j), is an invertible matrix. This is because the x_i's are distinct (i.e. x_i − x_j ≠ 0), and every nonzero element of GF(2^n) is invertible. Thus, Φ(x_1), ..., Φ(x_k) are exactly the k matrices that we were looking for. This concludes the proof of the lemma. ♣

We proceed with the proof of the theorem. Let k = n^{1/3}. Since Γ is a one to one mapping, there are n^2 independent linear forms among μ_1, ..., μ_m. Therefore we can use Lemma 6 and get that there are k matrices a_1, ..., a_k such that for every i ≠ j, a_i − a_j is invertible, and such that, w.l.o.g., μ_{m−r+1}, ..., μ_m vanish on a_1, ..., a_k for some r ≥ n^2 − \binom{k}{2}·n ≥ n^2 − n^{5/3}.
Since the last r linear forms vanish on all the a_i's, we are going to restrict our attention only to the first m−r linear forms. So from now on we only consider Γ(a_i) restricted to its first m−r coordinates. Since each of the differences a_i − a_j (∀i ≠ j) is an invertible matrix, we get that d_H(Γ(a_i),Γ(a_j)) ≥ n^2. Thus, Γ(a_1), ..., Γ(a_k) are k vectors contained in {0,1}^{m−r} (we consider only their first m−r coordinates!) such that the Hamming distance of every pair of them is at least n^2. Therefore according to Corollary 1 we get that

    m − r ≥ 2n^2 − 4n^2/(k+2) .

Since r ≥ n^2 − n^{5/3} and k = n^{1/3}, we get that

    m ≥ 3n^2 − O(n^{5/3}) ,

which is what we wanted to prove. This concludes the proof of the theorem. ♣

4.2 Proof of Theorem 1

Assume that bl∗(MP_n) = m. Let C be a smallest bilinear circuit for MP_n. Let

    μ_1(x)·η_1(y), ..., μ_m(x)·η_m(y)

be the m bilinear forms computed in the product gates of C. We will show that these bilinear forms define in a very natural way a code on M_n(GF(2)). The code thus defined will have the property that the dimension of the space into which the code maps M_n(GF(2)) is exactly m. Thus, according to Theorem 3 we will get that m ≥ 3n^2 − O(n^{5/3}), which is what we wanted to prove.

So we begin by defining a mapping from M_n(GF(2)) to {0,1}^m. Let Γ : M_n(GF(2)) → {0,1}^m be the following mapping:

    Γ(x) = (μ_1(x), ..., μ_m(x)) .

Notice that we ignore the η_i's in the definition of Γ. The next lemma shows that Γ is a linear code of matrices.

Lemma 7 Γ is a linear transformation with the property that for every matrix x ∈ M_n(GF(2)), weight(Γ(x)) ≥ n·rank(x).

Proof: Clearly Γ is a linear transformation from M_n(GF(2)) to {0,1}^m. So we only have to prove the claim about the weights. Let x be a matrix of rank r. Assume w.l.o.g. that μ_1(x) = ... = μ_k(x) = 1 and that μ_{k+1}(x) = ... = μ_m(x) = 0, i.e. weight(Γ(x)) = k. We shall show that k ≥ nr. For every y ∈ M_n(GF(2)), the n^2 entries of x·y are functions of μ_1(x)·η_1(y), ..., μ_m(x)·η_m(y). Since
Since n 1 1 m m µ (x) = ... = µ (x) = 0, we get that x·y is a function of η (y), ..., η (y). Therefore there are k+1 m 1 k at most 2k different matrices of the form x·y. Since rank(x) = r we get that there are exactly 2nr different matrices of the form x·y. Therefore k ≥ nr. This concludes the proof of the lemma. ♣ 5 Therefore Γ is a linear code of matrices, so according to theorem 3 we get that m ≥ 3n2−O(n3) which is what we wanted to prove. This concludes the proof of theorem 1. ♣ 9 4.3 Proof of Theorem 2 As in the proof of theorem 1 we will show that every quadratic circuit for MP , defines a code on n M (GF(2)). The code thus defined, will have the property that m (i.e the dimension of the space n into which the code maps M (GF(2))) is exactly the number of product gates in the circuit. Thus, n 5 according to theorem 3 we will get that m ≥ 3n2−O(n3), which is what we wanted to prove. LetC beaquadraticcircuitforMP . Assumethattheproductgates ofC computethequadratic n forms µ (x,y)·η (x,y), ..., µ (x,y)·η (x,y). Thus, each of the outputs (x·y) can be written 1 1 m m i,j as a sum of these quadratic forms: m (k) (x·y) = α ·µ (x,y)·η (x,y) , i,j i,j k k k=1 X (k) where α ∈ {0,1}. i,j We would like to have a proof similar to the proof of theorem 1. In that proof we defined a code of matrices using the linear transformation µ ,...,µ . Unfortunately this method will fail here as µ 1 m i is a linear function in both the variables of x and the variables of y and not just in the variables of x as in the proof of theorem 1. In order to overcome this obstacle we introduce a new set of variables z ={z } . We think about z as an n×n matrix. Define the following m linear forms in z: i,j i,j=1...n (k) γ (z) = α z , k = 1,...,m . k i,j i,j i,j X We get that m µ (x,y)·η (x,y)·γ (z) = k k k k=1 X m (k) ( z ·α )·µ (x,y)·η (x,y) = i,j i,j k k k=1 i,j X X m (k) z α ·µ (x,y)·η (x,y) = (1) i,j i,j k k i,j k=1 X X z ·(x·y) = trace(x·y·zt) , i,j i,j i,j X where (zt) = z . 
The computation that we just performed shows that the γ ’s that we introduced i,j j,i k are quite natural. We also notice that z plays the same role in trace(x·y ·zt) as x and y. These observations motivate us to try to repeat the proof of theorem 1 using the γ ’s instead of the µ ’s. k i So define a linear mapping Γ : M (GF(2)) 7→ {0,1}m by n Γ(z) = (γ (z), ..., γ (z)) . 1 m The following lemma shows that Γ is indeed a linear code of matrices. Lemma 8 Γ is a linear mapping and it has the property that for every matrix z, weight(Γ(z)) ≥ n·rank(z). 10