Private Broadcasting: an Index Coding Approach

Mohammed Karmoose, Linqi Song, Martina Cardone, Christina Fragouli
University of California Los Angeles, Los Angeles, CA 90095 USA
Email: {mkarmoose, songlinqi, martina.cardone, christina.fragouli}@ucla.edu

arXiv:1701.04958v2 [cs.IT] 22 Jan 2017

Abstract

Using a broadcast channel to transmit clients' data requests may impose privacy risks. In this paper, we address such privacy concerns in the index coding framework. We show how a malicious client can infer some information about the requests and side information of other clients by learning the encoding matrix used by the server. We propose an information-theoretic metric to measure the level of privacy and show how encoding matrices can be designed to achieve specific privacy guarantees. We then consider a special scenario for which we design a transmission scheme and derive the achieved levels of privacy in closed form. We also derive upper bounds and compare them to the levels of privacy achieved by our scheme, highlighting that an inherent trade-off exists between protecting the privacy of the request and of the side information of the clients.

I. INTRODUCTION

Consider a set of clients who share the same broadcast domain and wish to download data content from a server. Even though the content that they request may be publicly available, they wish to preserve the anonymity of their requests. For instance, assume that a client requests a video from YouTube related to a particular medical condition. If other clients learn the identity of that request, this may violate the privacy of that client. In this paper, we are interested in studying how to maintain the privacy of clients sharing a broadcast domain.

It is well established that coding across the content messages of the clients is needed to efficiently use the shared broadcast domain, as formalized in index coding [1]. A typical index coding instance consists of a server with $m$ messages, connected through a broadcast channel to a set of $n$ clients. Each client possesses a subset of the messages as side information and requires a specific new message. The server then uses these side information sets to send coded transmissions, which efficiently deliver the required messages to the clients.

In this paper we claim that index coding poses a privacy challenge. Consider, for example, that a server transmits $b_1 + b_2$ to satisfy client 1. Since this is a broadcast transmission, other clients observing this transmission will infer that the request of client 1 is either $b_1$ or $b_2$, while the other message must belong to her side information. This suggests that, although the clients can securely convey their requests to the server (e.g., through pairwise keys), a curious client may be able to infer information about the requests and/or side information sets of other clients by learning the encoding matrix used to generate the broadcast transmissions.

The first question we ask is: how much information does the encoding matrix in index coding reveal about the requests and the side information of other users? At a high level, one can think of the request and side information as two shared secrets between each client and the server, where one secret could be used to protect the other. Therefore, as we also show in the paper, these two aspects exhibit a trade-off: maintaining a certain level of privacy on one aspect limits the level of privacy that can be achieved on the other.
We also ask: can we design index coding matrices that, for a given number of transmissions, achieve the highest possible level of privacy? How should these matrices be designed and how much privacy can they guarantee? In this paper, we take first steps in answering such questions. Our main contributions can be summarized as follows:
1) We propose an information-theoretic metric to characterize the levels of privacy that can be guaranteed. We then provide guidelines for designing encoding matrices and transmission strategies to achieve high privacy levels;
2) We design an encoding matrix and characterize the maximum levels of privacy that it can achieve;
3) We derive universal upper bounds (i.e., which hold independently of the scheme that is used) on the maximum levels of privacy that can be attained;
4) We consider a special case of the problem and we characterize in closed form the levels of privacy achieved by our scheme, which we then compare to the outer bounds, hence highlighting the privacy trade-off.

Related Work. In secure index coding [2], the primary goal is to design strategies such that a passive external eavesdropper – who wiretaps the communication from the server to the clients – cannot learn any information about the messages. Differently, in this work we seek to protect clients' privacy against adversaries who wish to learn information about the identity of the requests and side information sets of the clients. Recently, there has been considerable effort to address privacy concerns in communication setups. For instance, a line of relevant work has considered the problem of protecting the privacy of a user against a database. This problem was introduced in [3] and is known as Private Information Retrieval (PIR). Specifically, in PIR a client wishes to receive a specific message from a set of (possibly colluding) databases, without revealing the identity of the request. Towards this end, data request and/or storage schemes were designed [4], [5] and recently the PIR capacity was characterized [6], [7]. In cryptography, the Oblivious Transfer (OT) problem [8] has a close connection to PIR [9]. Specifically, in OT the goal is to protect both the privacy of the client against the server (i.e., as in PIR, the identity of the request of the client is not revealed to the server) and the privacy of the server against the client (i.e., the client learns only the requested message). OT has also been used as a primitive to build techniques for secure multi-party computation [9]. Different from these works, in this paper we seek to understand the privacy issues that can arise among clients who share the same broadcast domain. Specifically, we seek to design techniques that guarantee high levels of privacy both in the side information and in the request of a client against another curious client. Given the different problem formulation, the techniques developed to solve the PIR and OT problems do not easily extend to our setup.

Paper Organization. The paper is organized as follows. In Section II we define our setup. In Section III we provide definitions and guidelines on how to design privacy-preserving transmission schemes and derive fundamental upper bounds. In Section IV we present the design of a privacy-preserving matrix. Based on this matrix, in Section V we consider a specific scenario for which we propose a transmission scheme and assess its performance. In Section VI we conclude the paper. Some of the proofs are delegated to the appendices.
Notation. Calligraphic letters indicate sets; boldface lower-case letters denote vectors and boldface upper-case letters indicate matrices; $|\mathcal{X}|$ is the cardinality of $\mathcal{X}$; $[n]$ is the set of integers $\{1, \cdots, n\}$; $2^{[n]}$ and $\binom{[n]}{s}$ are the power set of $[n]$ and the set of all possible subsets of $[n]$ of size $s$, respectively; for all $x \in \mathbb{R}$, the floor function is denoted by $\lfloor x \rfloor$; for a sequence $X = \{X_1, \ldots, X_n\}$, $X_{\mathcal{S}}$ is the subsequence of $X$ where only the elements indexed by $\mathcal{S}$ are retained; $\mathbf{0}_{i \times j}$ is the all-zero matrix of dimension $i \times j$; $A_{\mathcal{S}}$ is the submatrix of $A$ where only the columns indexed by $\mathcal{S}$ are retained; $\mathrm{span}(A)$ is the linear span of the columns of $A$; $H(X|y)$ is the entropy of the random variable $X$, conditioned on the specific realization $y$; $\binom{n}{k} = 0$ if $k < 0$ or $k > n$; logarithms are in base 2.

II. SETUP

We consider a typical index coding instance, where a set of clients $\mathcal{N} = \{c_i\}_{i \in [n]}$, with $|\mathcal{N}| = n$, are connected to a server through a shared broadcast channel. The server has a database of messages $\mathcal{M} = \{b_i\}_{i \in [m]}$, with $|\mathcal{M}| = m$. Each client $c_i$, $i \in [n]$, is represented by a pair of random variables, namely: (i) $\bar{Q}_i \in [m]$, associated with the index of the message that $c_i$ wishes to download from the server, and (ii) $\bar{S}_i \in 2^{[m]}$, associated with the indices of the subset of messages she already has as side information. We indicate with $\bar{q}_i$ and $\bar{\mathcal{S}}_i$ the realizations of $\bar{Q}_i$ and $\bar{S}_i$, respectively, which are chosen uniformly at random from their respective domains. Clearly, $\bar{q}_i \notin \bar{\mathcal{S}}_i$. We assume that the pairs $(\bar{Q}_i, \bar{S}_i)$, $\forall i \in [n]$, are independent across $i \in [n]$.

Server Model. We assume that the server knows the request and the side information of each client, i.e., it is aware of the realizations of the random variables $\bar{Q}_i = \bar{q}_i$ and $\bar{S}_i = \bar{\mathcal{S}}_i$, with $i \in [n]$. Given this, the server seeks to satisfy the requests of the clients through $T$ broadcast transmissions. The server employs linear encoding, i.e., each transmission consists of a linear combination of the $m$ messages, where the coefficients are chosen from a finite field $\mathbb{F}_L$ with $L$ being large enough. This can be mathematically formulated as $A\mathbf{b} = \mathbf{y}$, where $\mathbf{b} \in \mathbb{F}_L^{m}$ is the column vector of the $m$ messages, $A \in \mathbb{F}_L^{T \times m}$ is the encoding matrix used by the server, and $\mathbf{y} \in \mathbb{F}_L^{T}$ is the column vector with the linear combinations of the messages.

Therefore, a transmission scheme employed by the server consists of the following two components:
i) Transmission space: a specific set $\mathcal{A}$ of encoding matrices designed to satisfy the clients and protect their privacy;
ii) Transmission strategy: a function that, given $(\bar{q}_{[n]}, \bar{\mathcal{S}}_{[n]})$, determines the encoding matrix $\hat{A} \in \mathcal{A}$ to be used. We model the output of the function as a random variable $A$, where $A = \hat{A}$ according to a probability distribution $p_{A|\bar{Q}_{[n]},\bar{S}_{[n]}}(\hat{A} \mid \bar{q}_{[n]}, \bar{\mathcal{S}}_{[n]})$ that has to be designed.
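To make the linear-encoding model concrete, the following is a minimal sketch of how the server computes $\mathbf{y} = A\mathbf{b}$ and how a client uses her side information to cancel known messages from the received combinations. The prime field size $p = 101$ and all numerical values are illustrative assumptions and are not taken from the paper, which only requires the field to be large enough.

```python
p = 101                      # assumed prime field size; the paper uses F_L with L "large enough"
m, T = 5, 2

# Messages held by the server (arbitrary field elements for illustration).
b = [17, 42, 7, 99, 3]

# Encoding matrix: row t gives the coefficients of the t-th broadcast transmission.
A = [[1, 1, 0, 0, 0],
     [0, 0, 1, 1, 0]]

# Server: broadcast y = A b over F_p.
y = [sum(A[t][j] * b[j] for j in range(m)) % p for t in range(T)]

# Client 1 requests q1 = 0 and has side information S1 = {1}: she subtracts the
# contribution of the messages she already knows from each transmission.
q1, S1 = 0, {1}
y_clean = [(y[t] - sum(A[t][j] * b[j] for j in S1)) % p for t in range(T)]

# After cancellation, transmission 0 reduces to A[0][q1] * b[q1], so b_{q1} is
# recovered by dividing by the (nonzero) coefficient, using Fermat inversion.
b_q1 = (y_clean[0] * pow(A[0][q1], p - 2, p)) % p
assert b_q1 == b[q1]
```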
Adversary Model. We assume that some of the clients – referred to as eavesdroppers – are malicious. Specifically, the eavesdroppers are non-cooperative clients who, based on the broadcast transmissions they receive, are eager to infer information about the requests and the side information sets of other clients. Since the eavesdroppers do not cooperate, without loss of generality, we can assume that there is only one eavesdropper in the system, namely client $c_n$. In addition, we assume that the eavesdropper $c_n$: (i) is aware of both the transmission scheme employed by the server and the underlying distribution based on which the clients obtain their requests and side information sets; (ii) has infinite computational power; (iii) knows the size of the side information set of each client, i.e., $s_i = |\bar{\mathcal{S}}_i|$, $i \in [n]$. This last assumption, which we make to simplify the analysis, provides pessimistic privacy guarantees with respect to a scenario where the eavesdropper does not have this information.

Based on this knowledge, the eavesdropper $c_n$ wishes to infer information about the request and side information of the other clients. Specifically, we denote with $Q_i$ and $S_i$ the random variables which represent the eavesdropper's estimate of the request and side information of client $c_i$, respectively, and we let $p_{Q_i}(q_i)$ and $p_{S_i}(\mathcal{S}_i)$ be the corresponding probability density functions. For ease of notation, in the rest of the paper, we drop the subscripts from the probability density functions while retaining the arguments. Clearly, $Q_n = \bar{Q}_n$ and $S_n = \bar{S}_n$. Before transmission, the eavesdropper is completely oblivious to $Q_i$ and $S_i$ for $i \in [n-1]$; we model this situation by having $p(q_i | s_i)$ and $p(\mathcal{S}_i | s_i)$ uniformly distributed over $[m]$ and $\binom{[m]}{s_i}$, respectively.¹ Then, by learning the specific encoding matrix $\hat{A}$ employed by the server, the eavesdropper infers some information about the other clients, which is reflected in the conditional probability distributions $p(q_i | \hat{A}, s_{[n]}, q_n, \mathcal{S}_n)$ and $p(\mathcal{S}_i | \hat{A}, s_{[n]}, q_n, \mathcal{S}_n)$.

¹ In principle, in $p(q_i | s_i)$ and $p(\mathcal{S}_i | s_i)$ we should also have $q_n$, $\mathcal{S}_n$ and $s_{[n]\setminus\{i\}}$ in the conditioning. However, since $(\bar{Q}_i, \bar{S}_i)$, $\forall i \in [n]$, are independent across $i$, we can safely drop this dependence.

Privacy Metric. We consider the amount of knowledge the eavesdropper has about the variables $Q_i$ and $S_i$ as a privacy metric. In particular, we evaluate how far the uniform distribution is from the conditional distribution that the eavesdropper has after learning the encoding matrix $\hat{A}$. Let $X \in \{Q_i, S_i\}$. Then, inspired by the t-closeness metric for data privacy [10], we consider the Kullback–Leibler divergence as a distance metric between the distributions $p(x | \hat{A}, s_{[n]}, q_n, \mathcal{S}_n)$ and $p(x | s_i)$, namely

$$D_{\mathrm{KL}}\big(p(x | \hat{A}, s_{[n]}, q_n, \mathcal{S}_n) \,\|\, p(x | s_i)\big) = \log(|\mathcal{X}|) - H(X | \hat{A}, s_{[n]}, q_n, \mathcal{S}_n), \qquad (1)$$

where $\mathcal{X}$ is the support of $X$ (note that the entropy used throughout the paper is conditioned on specific realizations). If $D_{\mathrm{KL}}\big(p(x | \hat{A}, s_{[n]}, q_n, \mathcal{S}_n) \,\|\, p(x | s_i)\big) = 0$, i.e., $H(X | \hat{A}, s_{[n]}, q_n, \mathcal{S}_n) = \log(|\mathcal{X}|)$, then the eavesdropper has no knowledge of the variable $X$. Differently, larger values of $D_{\mathrm{KL}}\big(p(x | \hat{A}, s_{[n]}, q_n, \mathcal{S}_n) \,\|\, p(x | s_i)\big)$, i.e., smaller values of $H(X | \hat{A}, s_{[n]}, q_n, \mathcal{S}_n)$, indicate lower levels of privacy. Therefore, we consider $H(X | \hat{A}, s_{[n]}, q_n, \mathcal{S}_n)$ as an indication of the level of privacy attained for the variable $X$.

We focus on designing transmission schemes with guaranteed levels of privacy regarding three different quantities for each client:
i) Privacy in the request, captured by $H(Q_i | \hat{A}, s_{[n]}, q_n, \mathcal{S}_n)$;
ii) Privacy in the side information, captured by $H(S_i | \hat{A}, s_{[n]}, q_n, \mathcal{S}_n)$;
iii) Joint privacy, captured by $H(Q_i, S_i | \hat{A}, s_{[n]}, q_n, \mathcal{S}_n)$.

Therefore, our goal is to design a transmission scheme which provides privacy guarantees – in terms of the aforementioned metrics – for a given number of transmissions.
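As a sanity check on the metric in (1), the following minimal sketch computes the conditional entropy of a posterior and verifies that its Kullback–Leibler divergence to the uniform prior equals $\log|\mathcal{X}| - H(\cdot)$. The toy posterior (an eavesdropper who has narrowed a request down to two equally likely messages) is a hypothetical example, not a distribution from the paper.

```python
from math import log2

def entropy(p):
    """Shannon entropy (base 2) of a probability vector."""
    return -sum(x * log2(x) for x in p if x > 0)

def kl_to_uniform(p):
    """D_KL(p || uniform) over the same support, in bits."""
    n = len(p)
    return sum(x * log2(x * n) for x in p if x > 0)

m = 5
prior = [1 / m] * m                      # p(q_i | s_i): uniform before any transmission

# Hypothetical posterior p(q_i | A_hat, ...) after observing the encoding matrix:
# the request has been narrowed down to 2 equally likely messages.
posterior = [0.5, 0.0, 0.5, 0.0, 0.0]

H_post = entropy(posterior)              # privacy level attained for Q_i (here 1 bit)
# log2(m) - H_post equals D_KL(posterior || prior), as in equation (1):
assert abs((log2(m) - H_post) - kl_to_uniform(posterior)) < 1e-12
print(H_post, log2(m) - H_post)
```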
III. GUIDELINES FOR PROTECTING PRIVACY

Based on the knowledge of $(\bar{Q}_{[n]}, \bar{S}_{[n]})$, the server chooses to use an encoding matrix $A = \hat{A}$ such that it satisfies all clients, i.e., it allows each client to decode her request using her side information set.

Definition III.1. A $(q, \mathcal{S})$ pair is said to be decodable in $\hat{A}$ if, using $\hat{A}$ as encoding matrix, message $b_q$ can be decoded knowing $b_{\mathcal{S}}$.

Definition III.2. A $q$ (or $\mathcal{S}$) is said to be decodable in $\hat{A}$ if there exists $\mathcal{S}$ (or $q$) such that $(q, \mathcal{S})$ is decodable in $\hat{A}$.

In order to design an encoding matrix that satisfies all clients, we rely on the following lemma – a slight variation of [11, Lemma 4] – which provides a decodability criterion for $(q, \mathcal{S})$ using a matrix $\hat{A}$.

Lemma III.1 (Decodability Criterion). Let $\hat{A}$ be the encoding matrix used by the server. Then, the pair $(q, \mathcal{S})$ is decodable in $\hat{A}$ iff $\hat{A}_q \notin \mathrm{span}\big(\hat{A}_{[m] \setminus (\{q\} \cup \mathcal{S})}\big)$.

Lemma III.1 provides a necessary and sufficient algebraic condition on whether a particular $(q, \mathcal{S})$ pair is decodable using a given encoding matrix. The eavesdropper, when trying to infer information about $c_i$, $i \in [n-1]$, can therefore apply this decodability criterion to all possible $(q_i, \mathcal{S}_i)$ pairs with $|\mathcal{S}_i| = s_i$, to determine the subset of pairs that are decodable using $\hat{A}$. In other words, since she knows that the request of client $c_i$ must be satisfied, the actual $(\bar{q}_i, \bar{\mathcal{S}}_i)$ pair of client $c_i$ must belong to this set of decodable pairs. Thus, the size of the set of decodable pairs with side information sets of size $s_i$ determines the uncertainty that the eavesdropper has regarding the information of client $c_i$ and hence the attained levels of privacy for $c_i$. Therefore, in order to maintain high levels of privacy, it is imperative to design encoding matrices with decodable sets of large sizes. We next formalize this intuition.

Towards this end, we define the following three quantities: (i) $\mathcal{D}(\hat{A}, s_i)$, i.e., the set of decodable $(q_i, \mathcal{S}_i)$ pairs in $\hat{A}$ for client $c_i$; (ii) $\mathcal{D}^Q(\hat{A}, s_i)$, i.e., the set of decodable $q_i$ in $\hat{A}$ for client $c_i$; and (iii) $\mathcal{D}^S(\hat{A}, s_i)$, i.e., the set of decodable $\mathcal{S}_i$ in $\hat{A}$ for client $c_i$. To better understand this notation, consider the following example.

Example. Consider $m = 5$, $n = 2$ and $s_1 = 1$. If the server uses
$$\hat{A}_1 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}$$
as an encoding matrix, then $\mathcal{D}(\hat{A}_1, 1) = \{(1, i), (3, j)\}$ with $i \in [5] \setminus \{1\}$ and $j \in [5] \setminus \{3\}$, $\mathcal{D}^Q(\hat{A}_1, 1) = \{1, 3\}$ and $\mathcal{D}^S(\hat{A}_1, 1) = [5]$. Now, suppose that the server uses
$$\hat{A}_2 = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \end{bmatrix}.$$
Then, $\mathcal{D}(\hat{A}_2, 1) = \{(1,2), (2,1), (3,4), (4,3)\}$ and $\mathcal{D}^Q(\hat{A}_2, 1) = \mathcal{D}^S(\hat{A}_2, 1) = [4]$. Clearly, $|\mathcal{D}(\hat{A}_1, 1)| > |\mathcal{D}(\hat{A}_2, 1)|$ and $|\mathcal{D}^S(\hat{A}_1, 1)| > |\mathcal{D}^S(\hat{A}_2, 1)|$, but $|\mathcal{D}^Q(\hat{A}_1, 1)| < |\mathcal{D}^Q(\hat{A}_2, 1)|$.
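The decodability criterion of Lemma III.1 can be checked mechanically. The following is a minimal brute-force sketch that implements the span test via Gaussian elimination over an assumed prime field ($p = 101$; the paper only requires the field to be large enough) and enumerates the decodable sets $\mathcal{D}$, $\mathcal{D}^Q$ and $\mathcal{D}^S$ for the two matrices of the example above. Messages are 0-indexed in the code, whereas the text uses 1-indexed messages.

```python
from itertools import combinations

def rank_mod_p(rows, p):
    """Rank of a matrix (given as a list of rows) over GF(p), p prime."""
    M = [[x % p for x in row] for row in rows]
    rank, ncols = 0, (len(M[0]) if M else 0)
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][col], p - 2, p)          # Fermat inverse (p prime)
        M[rank] = [(x * inv) % p for x in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][col]:
                f = M[r][col]
                M[r] = [(a - f * b) % p for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

def is_decodable(A, q, S, p):
    """Lemma III.1: (q, S) is decodable in A iff column A_q is not in the span
    of the columns of A indexed by [m] \\ ({q} | S)."""
    T, m = len(A), len(A[0])
    rest = [j for j in range(m) if j != q and j not in S]
    cols = [[A[t][j] for t in range(T)] for j in rest]   # columns as row vectors
    colq = [A[t][q] for t in range(T)]
    return rank_mod_p(cols + [colq], p) > rank_mod_p(cols, p)

def decodable_sets(A, s, p):
    """Enumerate D(A, s), D^Q(A, s) and D^S(A, s) by brute force."""
    m = len(A[0])
    D = [(q, S) for q in range(m) for S in combinations(range(m), s)
         if q not in S and is_decodable(A, q, set(S), p)]
    DQ = {q for q, _ in D}
    DS = {S for _, S in D}
    return D, DQ, DS

P = 101
A1 = [[1, 0, 0, 0, 0],
      [0, 0, 1, 0, 0]]
A2 = [[1, 1, 0, 0, 0],
      [0, 0, 1, 1, 0]]
for A in (A1, A2):
    D, DQ, DS = decodable_sets(A, 1, P)
    print(len(D), sorted(DQ), len(DS))
# Expected output (0-indexed): "8 [0, 2] 5" for A1 and "4 [0, 1, 2, 3] 4" for A2,
# matching the sizes discussed in the example (indices shifted by one).
```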
With this notation in place, we have the following remark that relates the privacy metrics to the sizes of the decodable sets (see Appendix A for details).

Remark III.2. When the eavesdropper observes the encoding matrix $\hat{A}$, then for all $i \in [n-1]$ and $s_i \in [m-1]$, we have

$$H(Q_i, S_i \mid \hat{A}, s_{[n]}, q_n, \mathcal{S}_n) \leq \log |\mathcal{D}(\hat{A}, s_i)|, \qquad (2a)$$
$$H(Q_i \mid \hat{A}, s_{[n]}, q_n, \mathcal{S}_n) \leq \log |\mathcal{D}^Q(\hat{A}, s_i)|, \qquad (2b)$$
$$H(S_i \mid \hat{A}, s_{[n]}, q_n, \mathcal{S}_n) \leq \log |\mathcal{D}^S(\hat{A}, s_i)|. \qquad (2c)$$

Moreover, these bounds are tight iff the corresponding probability distributions are uniform. Namely:
i) eq. (2a) is tight iff $p(q_i, \mathcal{S}_i \mid \hat{A}, s_{[n]}, q_n, \mathcal{S}_n)$ is uniform over $(q_i, \mathcal{S}_i) \in \mathcal{D}(\hat{A}, s_i)$;
ii) eq. (2b) is tight iff $p(q_i \mid \hat{A}, s_{[n]}, q_n, \mathcal{S}_n)$ is uniform over $q_i \in \mathcal{D}^Q(\hat{A}, s_i)$;
iii) eq. (2c) is tight iff $p(\mathcal{S}_i \mid \hat{A}, s_{[n]}, q_n, \mathcal{S}_n)$ is uniform over $\mathcal{S}_i \in \mathcal{D}^S(\hat{A}, s_i)$.

Remark III.2 implies that the sizes of the decodable sets give an upper bound on the corresponding levels of the privacy metrics. Moreover, one can show that the conditions i) to iii) in Remark III.2 hold – and hence bounds (2a) to (2c) are tight – if $p(\hat{A} \mid \bar{q}_{[n]}, \bar{\mathcal{S}}_{[n]})$ in the transmission strategy (described in Section II) is properly designed. For instance, using Bayes' rule, it can be shown – see Appendix A for the details – that condition i) is satisfied iff

$$\sum_{(q_{\mathcal{K}}, \mathcal{S}_{\mathcal{K}}) \in \prod_{j \in \mathcal{K}} \mathcal{D}(\hat{A}, s_j)} p(\hat{A} \mid q_{[n]}, \mathcal{S}_{[n]}, s_{[n]}), \qquad \mathcal{K} = [n-1] \setminus \{i\},$$

is the same for all $(q_i, \mathcal{S}_i) \in \mathcal{D}(\hat{A}, s_i)$.

From Remark III.2, it follows that the design of privacy-preserving transmission schemes consists of two main steps: (i) designing encoding matrices with large decodable sets and (ii) using transmission strategies which satisfy the uniformity conditions and hence achieve maximum levels of privacy.

Based on the result in Remark III.2, we now derive universal upper bounds (i.e., which hold independently of the encoding matrix that the server uses) on the decodable sets and hence on the levels of the privacy metrics. In particular, we have

Lemma III.3. For any $\hat{A} \in \mathbb{F}_L^{T \times m}$ and $s_i \in [m-1]$, we have

$$|\mathcal{D}(\hat{A}, s_i)| \leq T \binom{m}{s_i} =: \mathrm{UB}_{Q,S}, \qquad (3a)$$
$$|\mathcal{D}^Q(\hat{A}, s_i)| \leq m =: \mathrm{UB}_{Q}, \qquad (3b)$$
$$|\mathcal{D}^S(\hat{A}, s_i)| \leq \binom{m}{s_i} =: \mathrm{UB}_{S}. \qquad (3c)$$

Proof: The upper bounds in (3b) and (3c) simply follow by noticing that the size of a decodable set is upper bounded by the size of the support of the corresponding random variable. We next prove the bound in (3a). For a given encoding matrix $\hat{A} \in \mathbb{F}_L^{T \times m}$, one can write $|\mathcal{D}(\hat{A}, s_i)| = \sum_{\mathcal{S}_i \in \binom{[m]}{s_i}} |\mathcal{N}(\hat{A}, \mathcal{S}_i)|$, where $\mathcal{N}(\hat{A}, \mathcal{S}_i)$ is the set of requests $q_i \in \mathcal{D}^Q(\hat{A}, s_i)$ for which the pair $(q_i, \mathcal{S}_i)$ is decodable. According to Lemma III.1, for each $q_i \in \mathcal{N}(\hat{A}, \mathcal{S}_i)$, $\hat{A}_{q_i}$ is not in the span of $\hat{A}_{[m] \setminus (\mathcal{S}_i \cup \{q_i\})}$. It is therefore straightforward to show that the columns of $\hat{A}_{\mathcal{N}(\hat{A}, \mathcal{S}_i)}$ are linearly independent. Thus, $|\mathcal{N}(\hat{A}, \mathcal{S}_i)| \leq T$ and hence we have $|\mathcal{D}(\hat{A}, s_i)| \leq T \binom{m}{s_i}$. $\square$

IV. DESIGN OF A TRANSMISSION SPACE

In this section, we take first steps towards designing a privacy-preserving transmission scheme. Specifically, we design an encoding matrix, referred to as the base matrix $A^{\mathrm{base}}$. Then, we populate the transmission space with the matrices obtained from $A^{\mathrm{base}}$ by taking all the permutations of its columns. Our design of $A^{\mathrm{base}}$ is based on the use of Maximum Distance Separable (MDS) codes.

A generator matrix of an $[m, T]$ MDS code has the property that any $T \times T$ submatrix is full rank, i.e., any $T$ columns are linearly independent. Such matrices promise to provide large decodable sets. To see this, notice that, for a given side information set $\mathcal{S}$ with $|\mathcal{S}| \geq m - T$, all requests in $[m] \setminus \mathcal{S}$ are decodable with $\mathcal{S}$. Therefore, if $B \in \mathbb{F}_L^{T \times m}$ is a generator matrix of an $[m, T]$ MDS code, then, for all $s \geq m - T$, we have $|\mathcal{D}^Q(B, s)| = m$ and $|\mathcal{D}(B, s)| = m \binom{m-1}{s} = O(m^s)$. However, this scheme might require a prohibitively large number of transmissions $T$, especially when $m$ is large and $s$ is small compared to $m$.
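As an illustration of this plain-MDS option, the following minimal sketch builds a generator matrix $B$ of an $[m, T]$ MDS code using a Vandermonde construction over a prime field and checks that every choice of $T$ columns is linearly independent. The field size $p = 101$ and the dimensions $m = 8$, $T = 3$ are illustrative assumptions, not values from the paper.

```python
from itertools import combinations

p, m, T = 101, 8, 3               # assumed field size and dimensions for illustration
alphas = list(range(1, m + 1))    # m distinct evaluation points in GF(p); requires m < p

# Vandermonde generator of an [m, T] MDS code: B[t][j] = alphas[j]**t mod p.
B = [[pow(a, t, p) for a in alphas] for t in range(T)]

# MDS property: every T x T submatrix is invertible, i.e. every choice of T
# columns is linearly independent.  For a Vandermonde matrix, the determinant of
# the submatrix on columns J is prod_{i<j in J} (alpha_j - alpha_i), which is
# nonzero mod p as long as the evaluation points are distinct.
for J in combinations(range(m), T):
    det = 1
    for a, b in combinations(J, 2):
        det = det * (alphas[b] - alphas[a]) % p
    assert det != 0
print("every choice of", T, "columns of B is linearly independent")
```

Since any $T$ columns of $B$ are linearly independent, whenever $|\mathcal{S}| \geq m - T$ the at most $T - 1$ columns indexed by $[m] \setminus (\{q\} \cup \mathcal{S})$ cannot span $B_q$, so every request $q \notin \mathcal{S}$ is decodable, which is exactly the property used above.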
To achieve high levels of privacy with a $T$ that is not that large, we next propose the design of $A^{\mathrm{base}}$, which is based on a block-MDS structure as shown in Figure 1 and is constructed as follows:
i) The columns of $A^{\mathrm{base}}$ are divided into $k+1$ segments, labeled "Seg. 0" to "Seg. $k$", where $T$ is a multiple of $k$;
ii) Segments 1 to $k$ consist of $\ell$ columns each, where $\ell \leq \min\{s_{\min} + T/k, \lfloor m/k \rfloor\}$, with $s_{\min} = \min_{i \in [n]} s_i$;
iii) A matrix $A_b \in \mathbb{F}_L^{(T/k) \times \ell}$ is constructed as the generator matrix of an $[\ell, T/k]$ MDS code; then, $A_b$ is repeated $k$ times and positioned in $A^{\mathrm{base}}$ as shown in Figure 1;
iv) The rest of $A^{\mathrm{base}}$ is filled with zeros.

Figure 1. Design of the base matrix $A^{\mathrm{base}}$ for the achievable scheme: the $k$ copies of $A_b$ occupy disjoint blocks of $T/k$ rows and $\ell$ columns (Segments 1 to $k$), while the remaining $m - k\ell$ columns (Segment 0) and all other entries are zero.

Note that, for any number of clients $n$ and messages $m$, one can always find values of $k$, $\ell$ and $T$ such that $A^{\mathrm{base}}$ satisfies all clients (e.g., $k = 1$, $\ell = s_{\min}$ and $T = n$).

We now analyze the performance of our proposed $A^{\mathrm{base}}$ in terms of the sizes of its decodable sets (see Appendix B). These, by means of Remark III.2, provide upper bounds on the levels of privacy that can be attained using $A^{\mathrm{base}}$.

Figure 2. Numerical evaluation of $r_Q$, $r_{Q,S}$ and $r_S$ as functions of $\ell$, for $T \in \{5, 10\}$, $m = 30$ and $s = 3$.

Theorem IV.1. For $A^{\mathrm{base}}$ and any $s_i \in [m-1]$, we have

$$H(Q_i, S_i \mid \hat{A}, s_{[n]}, q_n, \mathcal{S}_n) \leq \log \left( k\ell \sum_{j=\ell - T/k}^{\ell - 1} \binom{\ell - 1}{j} \binom{m - \ell}{s_i - j} \right), \qquad (4a)$$
$$H(Q_i \mid \hat{A}, s_{[n]}, q_n, \mathcal{S}_n) \leq \log(k\ell), \qquad (4b)$$

where the bounds can be achieved by satisfying the uniformity conditions in Remark III.2.

In the next section, we study a special scenario in which we use the transmission space proposed here (i.e., populated by the matrices obtained from $A^{\mathrm{base}}$ by taking all the permutations of its columns) and we design the transmission strategy.

V. TRANSMISSION STRATEGY FOR A SPECIAL CASE

In the previous section, we designed a transmission space that consists of all possible matrices obtained by permuting the columns of the matrix $A^{\mathrm{base}}$. Thus, as discussed in Section II, in order to design a transmission scheme, we need to design a transmission strategy that selects which specific matrix to use according to a probability distribution. However, designing such a transmission strategy that achieves the upper bounds in Remark III.2 is non-trivial. To get an analytical handle on the problem, we take a first step and consider a simplified model: we assume $n = 2$ and an eavesdropper who does not have a request. Such a scenario can model a situation where the $n = 2$ clients (the second of which is the eavesdropper) do not have a simultaneous request.

Since only one client needs to be satisfied, we can use our proposed encoding matrix $A^{\mathrm{base}}$ with $k = T$ and $\ell \leq \min\{s_1 + 1, \lfloor m/T \rfloor\}$, knowing that client $c_1$ can always be satisfied by using the appropriate column permutation of $A^{\mathrm{base}}$ (i.e., by ensuring that $A^{\mathrm{base}}_{q_1}$ is non-zero and that all other columns belonging to the same segment as $A^{\mathrm{base}}_{q_1}$ correspond to messages in $\mathcal{S}_1$). In this case, $A_b$ is a row vector of arbitrary non-zero values. The following theorem (whose proof can be found in Appendix C) then provides analytical guarantees on the attained performance of this scheme.
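To make the construction concrete, below is a minimal sketch that builds $A^{\mathrm{base}}$ following the segment structure of Figure 1 and then, for the special case of this section ($k = T$, $A_b$ a single row of non-zero values), picks one column permutation that satisfies client $c_1$. The field size, dimensions, the helper `base_matrix` and the deterministic placement of $q_1$ and $\mathcal{S}_1$ are illustrative assumptions; the paper's actual transmission strategy draws the permutation from a probability distribution designed to meet the uniformity conditions of Remark III.2.

```python
import random

def base_matrix(m, T, k, ell, A_b, p):
    """Figure 1 layout: k copies of the (T/k) x ell block A_b placed on disjoint
    row/column blocks (Segments 1..k); the last m - k*ell columns (Segment 0)
    and all remaining entries are zero."""
    assert T % k == 0 and k * ell <= m and len(A_b) == T // k and len(A_b[0]) == ell
    r = T // k
    A = [[0] * m for _ in range(T)]
    for blk in range(k):
        for t in range(r):
            for j in range(ell):
                A[blk * r + t][blk * ell + j] = A_b[t][j] % p
    return A

# Special case of Section V: n = 2 and k = T, so A_b is one row of non-zero values.
p, m, T, s1 = 101, 12, 3, 2                     # illustrative parameters
ell = min(s1 + 1, m // T)                       # here ell = 3
A_b = [[random.randrange(1, p) for _ in range(ell)]]
A_base = base_matrix(m, T, T, ell, A_b, p)

# One satisfying column permutation: place q1 in the first segment and fill the
# rest of that segment with messages from S1, so that client 1 can cancel them
# and decode b_{q1} from the first transmission (cf. Lemma III.1).
q1, S1 = 7, [2, 9]                              # example request / side information (0-indexed)
order = [q1] + S1[: ell - 1] + [j for j in range(m) if j != q1 and j not in S1[: ell - 1]]
A_hat = [[0] * m for _ in range(T)]
for col, msg in enumerate(order):               # column `col` of A_base encodes message `msg`
    for t in range(T):
        A_hat[t][msg] = A_base[t][col]
```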
