PETRELS: Parallel Estimation and Tracking of Subspace by Recursive Least Squares from Partial Observations

Yuejie Chi, Yonina C. Eldar and Robert Calderbank

Abstract

Many real world data sets exhibit an embedding of low-dimensional structure in a high-dimensional manifold. Examples include images, videos and internet traffic data. It is of great significance to reduce the storage requirements and computational complexity when the data dimension is high. Therefore we consider the problem of reconstructing a data stream from a small subset of its entries, where the data is assumed to lie in a low-dimensional linear subspace, possibly corrupted by noise. We further consider tracking the change of the underlying subspace, which can be applied to applications such as video denoising, network monitoring and anomaly detection. Our problem can be viewed as a sequential low-rank matrix completion problem in which the subspace is learned in an on-line fashion. The proposed algorithm, dubbed Parallel Estimation and Tracking by REcursive Least Squares (PETRELS), first identifies the underlying low-dimensional subspace via a recursive procedure for each row of the subspace matrix in parallel, with discounting for previous observations, and then reconstructs the missing entries via least-squares estimation if required. Numerical examples are provided for direction-of-arrival estimation and matrix completion, comparing PETRELS with state of the art batch algorithms.

Index Terms

Subspace estimation and tracking, recursive least squares, matrix completion, partial observations, online algorithms.

Y. Chi is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA (email: [email protected]). Y. C. Eldar is with the Department of Electrical Engineering, Technion, Israel Institute of Technology, Haifa 32000, Israel (email: [email protected]). R. Calderbank is with the Department of Computer Science, Duke University, Durham, NC 27708, USA (email: [email protected]). The work of Y. Chi and R. Calderbank was supported by ONR under Grant N00014-08-1-1110, by AFOSR under Grant FA 9550-09-1-0643, and by NSF under Grants CCF-0915299 and CCF-1017431. The work of Y. Eldar was supported in part by the Ollendorf Foundation, and by the Israel Science Foundation under Grant 170/10.

I. INTRODUCTION

Many real world data sets exhibit an embedding of low-dimensional structure in a high-dimensional manifold. When the embedding is assumed linear, the underlying low-dimensional structure becomes a linear subspace. Subspace Identification and Tracking (SIT) plays an important role in various signal processing tasks such as online identification of network anomalies [1], moving target localization [2], beamforming [3], and denoising [4]. Conventional SIT algorithms collect full measurements of the data stream at each time, and subsequently update the subspace estimate by utilizing the track record of the stream history in different ways [5], [6]. Recent advances in Compressed Sensing (CS) [7], [8] and in the theory of Matrix Completion (MC) [9], [10] have made it possible to infer data structure from highly incomplete observations. Compared with CS, which allows reconstruction of a single vector from only a few attributes by assuming it is sparse in a pre-determined basis or dictionary, MC allows reconstruction of a matrix from a few entries by assuming it is low-rank.
A popular method to perform MC is to minimize the nuclear norm of the underlying matrix [9], [10], which requires no prior knowledge of rank, in parallel with ℓ_1 minimization for sparse recovery in CS. Other approaches include greedy algorithms such as OptSpace [11], which requires an estimate of the rank for initialization. Identifying the underlying low-rank structure in MC is equivalent to subspace identification in a batch setting. When the number of observed entries is slightly larger than the subspace rank, it has been shown that with high probability it is possible to test whether a highly incomplete vector of interest lies in a known subspace [12].

In high-dimensional problems, it might be expensive or even impossible to collect data from all dimensions. For example, in wireless sensor networks, collecting from all sensors continuously will quickly drain the battery power. Ideally, we would prefer to collect data from only a fixed budget of sensors each time to increase the overall battery life, and still be able to identify the underlying structure. Another example is online recommendation systems, where it is impossible to expect rating feedback from all users on every product. Therefore it is of growing interest to identify and track a low-dimensional subspace from highly incomplete information of a data stream in an online fashion. In this setting, the estimate of the subspace is updated and tracked across time as new observations become available, with low computational cost. The GROUSE algorithm [13] has been proposed for SIT from online partial observations using rank-one updates of the estimated subspace on the Grassmannian manifold. However, its performance is limited by the existence of "barriers" in the search path [14] which result in GROUSE being trapped at a local minimum. We demonstrate this behavior through numerical examples in Section VI in the context of direction-of-arrival estimation.

In this paper we further study the problem of SIT given partial observations from a data stream, as in GROUSE. Our proposed algorithm is dubbed Parallel Estimation and Tracking by REcursive Least Squares (PETRELS). The underlying low-dimensional subspace is identified by minimizing the geometrically discounted sum of projection residuals on the observed entries per time index, via a recursive procedure applied to each row of the subspace matrix in parallel. The missing entries are then reconstructed via least-squares estimation if required. The discount factor balances the algorithm's ability to capture long term behavior and changes to that behavior, improving adaptivity. We also benefit from the fact that our optimization of the estimated subspace is over all possible low-rank subspaces, not restricted to the Grassmannian manifold. We prove the convergence properties of PETRELS by revealing its connection with the well-known Projection Approximation Subspace Tracking (PAST) algorithm [5] in the full observation scenario. Finally, we provide numerical examples to measure the impact of the discount factor, the estimated rank and the number of observed entries. In the context of direction-of-arrival estimation we demonstrate superior performance of PETRELS over GROUSE in terms of separating closely located modes and tracking changes in the scene. We also compare PETRELS with state of the art batch matrix completion algorithms, showing it to be a competitive alternative when the subspace is fixed.

The rest of the paper is organized as follows.
Section II states the problem and provides background in the context of matrix completion and conventional subspace tracking. Section III describes the algorithm in detail. Two extensions of PETRELS that improve robustness and reduce complexity are presented in Section IV. We discuss convergence issues of PETRELS in Section V. Section VI shows the numerical results, and we conclude the paper in Section VII.

II. PROBLEM STATEMENT AND RELATED WORK

A. Problem Statement

We consider the following problem. At each time t, a vector x_t ∈ R^M is generated as

x_t = U_t a_t + n_t ∈ R^M,   (1)

where the columns of U_t ∈ R^{M×r_t} span a low-dimensional subspace, the vector a_t ∈ R^{r_t} specifies the linear combination of columns and is Gaussian distributed as a_t ~ N(0, I_{r_t}), and n_t is additive white Gaussian noise distributed as n_t ~ N(0, σ²I_M). The rank r_t of the underlying subspace is not assumed known exactly and can be slowly changing over time. The entries of the vector x_t can be regarded as measurements from different sensors in a sensor network, or as values of different pixels in a video frame. In practice, an upper bound on the subspace rank is assumed known, so that r_t ≤ r for all t.

We assume only partial entries of the full vector x_t are observed, given by

y_t = p_t ⊙ x_t = P_t x_t ∈ R^M,   (2)

where ⊙ denotes point-wise multiplication, P_t = diag[p_t], and p_t = [p_{1t}, p_{2t}, ..., p_{Mt}]^T ∈ {0,1}^M with p_{mt} = 1 if the mth entry is observed at time t. We denote by Ω_t = {m : p_{mt} = 1} the set of observed entries at time t. In the random observation model, we assume the measurements are taken uniformly at random. A small synthetic sketch of this data model is given at the end of this subsection.

We are interested in an online estimate of a low-rank subspace D_n ∈ R^{M×r} at each time index n, which identifies and tracks the changes in the underlying subspace from the streaming partial observations (y_t, p_t)_{t=1}^n. The rank of the estimated subspace D_n is assumed known and fixed throughout the algorithm as r. In practice, we take r to be the upper bound on the rank of the underlying subspace. The desired properties of the algorithm include:
• Low complexity: each step of the online algorithm at time index n should be adaptive, with small complexity compared to running a batch algorithm on the history data;
• Small storage: the online algorithm should require a storage size that does not grow with the data size;
• Convergence: the subspace sequence generated by the online algorithm should converge to the true subspace U when U_t = U is constant, under the assumption that the sample covariance matrix of x_t converges;
• Adaptivity: the online algorithm should be able to track changes of the underlying subspace in a timely fashion.
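To make the observation model in (1)-(2) concrete, the following minimal Python sketch generates a synthetic stream with a fixed subspace. It is an illustration only; the dimensions M, r, N, the noise level and the observation fraction are placeholder values not taken from the paper.

```python
import numpy as np

# Synthetic instance of the data model (1)-(2): x_t = U a_t + n_t,
# observed only on a random subset Omega_t of its entries.
rng = np.random.default_rng(0)
M, r, N = 100, 5, 500            # ambient dimension, true rank, number of snapshots
sigma, frac_obs = 0.1, 0.3       # noise level and fraction of observed entries

U, _ = np.linalg.qr(rng.standard_normal((M, r)))   # fixed r-dimensional subspace U
A = rng.standard_normal((r, N))                    # coefficients a_t ~ N(0, I_r)
X = U @ A + sigma * rng.standard_normal((M, N))    # full data stream x_t
P = (rng.random((M, N)) < frac_obs).astype(float)  # masks p_t drawn uniformly at random
Y = P * X                                          # observations y_t = p_t ⊙ x_t
```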
B. Conventional Subspace Identification and Tracking

When the x_t's are fully observed, our problem is equivalent to the classical SIT problem, which is widely studied and has a rich literature in the signal processing community. Here we describe in detail the Projection Approximation Subspace Tracking (PAST) algorithm, which is the closest to our proposed algorithm in the conventional scenario. First, consider optimizing the scalar function

J(W) = E‖x_t − W W^T x_t‖²_2   (3)

with respect to a subspace W ∈ R^{M×r}. When U_t = U is fixed over time, let C_x = E[x_t x_t^T] = U U^T + σ²I_M be the data covariance matrix. It is shown in [5] that the global minimum of (3) is the only stable stationary point, and is attained at W = U_r Q, where U_r is composed of the r dominant eigenvectors of C_x and Q ∈ C^{r×r} is a unitary matrix. When the data is noise-free, we can choose U_r = U. This motivates PAST to optimize the following function at time n, without constraining W to have orthogonal columns:

W_n = argmin_{W ∈ R^{M×r}} Σ_{t=1}^n α^{n−t} ‖x_t − W W^T x_t‖²_2   (4)
    ≈ argmin_{W ∈ R^{M×r}} Σ_{t=1}^n α^{n−t} ‖x_t − W W_{n−1}^T x_t‖²_2,   (5)

where the expectation in (3) is replaced in (4) by a geometric weighting of the previous observations by α, which is further approximated in (5) by replacing the second W by its previous estimate. Based on (5), the subspace W_n can be found by first estimating the coefficients using the previous subspace estimate as a_n = W_{n−1}^T x_n, and then updating the subspace as

W_n = argmin_{W ∈ R^{M×r}} Σ_{t=1}^n α^{n−t} ‖x_t − W a_t‖²_2.   (6)

Now let α = 1 and R_n = Σ_{i=1}^n a_i a_i^T. In [15], the asymptotic dynamics of the PAST algorithm are described by the following Ordinary Differential Equation (ODE) and its equilibrium as t goes to infinity:

Ṙ = E[ã_n ã_n^T] − R = W^T C_x W − R,
Ẇ = E[(x_n − W ã_n) ã_n^T] R† = (I − W W^T) C_x W R†,

where ã_n = W^T x_n, R = R(t) and W = W(t) are continuous-time versions of R_n and W_n, and † denotes the pseudo-inverse. It is proved in [15] that as t increases, W(t) converges to the global optimum, i.e. to a matrix whose columns span the eigenvectors of C_x corresponding to the r largest eigenvalues. In Section V we show that our proposed PETRELS algorithm becomes essentially equivalent to the PAST algorithm when all entries of the data stream are observed, and can be shown to converge globally.

In particular, the PAST algorithm belongs to the class of power-based methods, together with the Oja method [16], the Novel Information Criterion (NIC) method [17] and their variations. These methods are treated under a unified framework in [18]. The estimate of the low-rank subspace W_n ∈ R^{M×r} is updated at time n as

W_n = C_n W_{n−1} (W_{n−1}^T C_n² W_{n−1})^{−1/2},   (7)

where C_n is the sample data covariance matrix, updated as

C_n = α_n C_{n−1} + x_n x_n^T,   (8)

and α_n is a parameter between 0 and 1. The normalization in (7) ensures that the updated subspace W_n is orthogonal, although this normalization is not performed strictly in all of these algorithms. It is shown in [18] that these power-based methods guarantee global convergence to the principal subspace spanned by the eigenvectors corresponding to the r largest eigenvalues of C_x. If the entries of the data vectors x_t are fully observed, then C_n converges to C_x very fast, which is exactly why the power-based methods perform very well in practice. When the data is highly incomplete, however, the convergence of (8) is very slow, since only a small fraction |Ω_n|²/M² of the entries of C_{n−1} is updated, where |Ω_n| is the number of observed entries at time n, making a direct adoption of the above methods unrealistic in the partial observation scenario.
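As a quick numerical illustration of the recursion (7)-(8), the sketch below performs one full-observation power-based update. This is a minimal sketch only: it assumes the r×r matrix W_{n−1}^T C_n² W_{n−1} is well conditioned, and the forgetting-factor value and the eigendecomposition-based inverse square root are implementation choices not specified in the text.

```python
import numpy as np

def power_based_update(C_prev, W_prev, x, alpha=0.95):
    """One step of the power-based recursion (7)-(8) with a fully observed vector x."""
    C = alpha * C_prev + np.outer(x, x)        # (8): sample covariance update
    A = C @ W_prev                             # C_n W_{n-1}
    G = A.T @ A                                # W_{n-1}^T C_n^2 W_{n-1}, since C_n is symmetric
    evals, evecs = np.linalg.eigh(G)
    G_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    W = A @ G_inv_sqrt                         # (7): W_n now has orthonormal columns
    return C, W
```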
C. Matrix Completion

When only partial observations are available and U_t = U is fixed, our problem is closely related to the Matrix Completion (MC) problem, which has been extensively studied recently. Assume X ∈ R^{M×n} is a low-rank matrix, and let P be an M×n mask matrix with 0 at missing entries and 1 at observed entries. Let Y = P ⊙ X be the observed partial matrix, where ⊙ denotes point-wise multiplication. Matrix completion aims to solve the following problem:

min_Z rank(Z)  s.t.  P ⊙ (X − Z) = 0,   (9)

i.e. to find a low-rank matrix that agrees with the observed entries. This problem is combinatorially intractable due to the rank function. It has been shown in [9] that by replacing the rank with the nuclear norm, (9) can be relaxed to a convex optimization problem, yielding the following spectrally-regularized MC problem:

min_Z (1/2)‖P ⊙ (X − Z)‖²_F + µ‖Z‖_∗,   (10)

where ‖Z‖_∗ is the nuclear norm of Z, i.e. the sum of the singular values of Z, and µ > 0 is a regularization parameter. Interestingly, the solution of (10) coincides with that of (9), i.e. it provides exact recovery for matrix completion, under mild conditions [9]. The nuclear norm of Z can be written as [19]

‖Z‖_∗ = min_{U,V: Z = UV^T} (1/2)(‖U‖²_F + ‖V‖²_F),   (11)

where U ∈ C^{M×r} and V ∈ C^{n×r}. Substituting (11) into (10), we can rewrite the MC problem as

min_{U,V} ‖P ⊙ (X − UV^T)‖²_F + µ(‖U‖²_F + ‖V‖²_F).   (12)

Our problem formulation can be viewed as an online way of solving the above batch MC problem, where the columns of X are drawn randomly and treated as a new measurement at each time index, with a fixed underlying subspace U_t = U. The online algorithm has potential advantages in adapting to changes in matrix size and in avoiding large matrix manipulations. We will compare the PETRELS algorithm against some of the popular MC algorithms proposed recently in Section VI.
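The paper does not attach a particular solver to (12). One standard way to attack this factorized objective (an assumption here, not the authors' method) is alternating ridge-regularized least squares over the rows of U and V, sketched below with placeholder values for the regularization weight and iteration count.

```python
import numpy as np

def altmin_mc(Y, P, r, mu=1.0, iters=50, seed=0):
    """Alternating minimization of ||P ⊙ (X − U V^T)||_F^2 + mu (||U||_F^2 + ||V||_F^2),
    as in (12), given the observed matrix Y = P ⊙ X and mask P."""
    M, N = Y.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((M, r))
    V = rng.standard_normal((N, r))
    for _ in range(iters):
        for m in range(M):                     # row of U sees the entries observed in row m
            idx = np.flatnonzero(P[m])
            Vm = V[idx]
            U[m] = np.linalg.solve(Vm.T @ Vm + mu * np.eye(r), Vm.T @ Y[m, idx])
        for n in range(N):                     # row of V sees the entries observed in column n
            idx = np.flatnonzero(P[:, n])
            Un = U[idx]
            V[n] = np.linalg.solve(Un.T @ Un + mu * np.eye(r), Un.T @ Y[idx, n])
    return U, V
```

Each row update is a small r×r ridge-regression solve, which is what makes the factorized form attractive compared to operating on the full M×n matrix.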
D. The GROUSE Algorithm

The GROUSE algorithm [13] proposed by Balzano et al. addresses the same problem of online identification of a low-rank subspace from highly incomplete information. It aims to minimize the projection residual on the observed entries with respect to the underlying subspace at each time n, using the previous estimate D_{n−1} as a warm start on the Grassmannian manifold

G_r = {D ∈ R^{M×r} : D^T D = I_r}.

The resulting algorithm is a fast rank-one update of D_{n−1} at each time n; convergence to a stationary point, but not to the global optimum, is guaranteed under mild conditions on the step size. However, performance is limited by the existence of "barriers" in the search path [14] which result in GROUSE being trapped at a local minimum. We compare against GROUSE in more detail in Section III-C, after the formulation of our method, and provide numerical comparisons in Section VI.

III. THE PETRELS ALGORITHM

We now describe our proposed Parallel Estimation and Tracking by REcursive Least Squares (PETRELS) algorithm.

A. Objective Function

We first define the function f_t(D) at each time t = 1, ..., n for a fixed subspace D ∈ R^{M×r}, which is the total projection residual on the observed entries:

f_t(D) = min_{a_t} ‖P_t(x_t − D a_t)‖²_2,  t = 1, ..., n.   (13)

Here r is the rank of the estimated subspace, which is assumed known and fixed throughout the algorithm (the rank may not equal the true subspace dimension). We aim to minimize the following loss function at each time n with respect to the underlying subspace:

D_n = argmin_{D ∈ R^{M×r}} F_n(D) = argmin_{D ∈ R^{M×r}} Σ_{t=1}^n λ^{n−t} f_t(D),   (14)

where D_n is the estimated subspace of rank r at time n, and the parameter 0 ≪ λ ≤ 1 discounts past observations. To motivate the loss function in (14), note that if U_t = U is not changing over time, then the right-hand side of (14) is minimized to zero when D_n spans the subspace defined by U. If U_t is slowly changing, then λ is used to control the memory of the system and maintain tracking ability at time n. For example, as λ → 1 the algorithm gradually loses its ability to forget the past.

Fixing D, f_t(D) can be written as

f_t(D) = x_t^T (P_t − P_t D (D^T P_t D)† D^T P_t) x_t,   (15)

where † denotes the matrix pseudo-inverse. Plugging this back into (14), the exact optimization problem becomes

D_n = argmin_{D ∈ R^{M×r}} Σ_{t=1}^n λ^{n−t} x_t^T (P_t − P_t D (D^T P_t D)† D^T P_t) x_t,

which is difficult to solve over D and requires storing all previous observations. Instead, we propose PETRELS to approximately solve this optimization problem.

Before developing PETRELS, we note that if there are further constraints on the coefficients a_t, a regularization term can be incorporated as

f_t(D) = min_{a_t ∈ R^r} ‖P_t(D a_t − x_t)‖²_2 + β‖a_t‖_p,   (16)

where p ≥ 0. For example, p = 1 enforces a sparsity constraint on a_t, and p = 2 enforces a norm constraint on a_t.

In (14) the discount factor λ is fixed, and the influence of past estimates decreases geometrically; a more general online objective function is given by

F_n(D) = λ_n F_{n−1}(D) + f_n(D),   (17)

where the sequence {λ_n} is used to control the memory and adaptivity of the system in a more flexible way.

Algorithm 1 PETRELS for SIT from Partial Observations
Input: a stream of vectors y_t and observation patterns P_t.
Initialization: an M×r random matrix D_0, and (R_m^0)† = δI_r, δ > 0, for all m = 1, ..., M.
1: for n = 1, 2, ... do
2:   a_n = (D_{n−1}^T P_n D_{n−1})† D_{n−1}^T y_n.
3:   x̂_n = D_{n−1} a_n.
4:   for m = 1, ..., M do
5:     β_m^n = 1 + λ^{−1} a_n^T (R_m^{n−1})† a_n,
6:     v_m^n = λ^{−1} (R_m^{n−1})† a_n,
7:     (R_m^n)† = λ^{−1}(R_m^{n−1})† − p_mn (β_m^n)^{−1} v_m^n (v_m^n)^T,
8:     d_m^n = d_m^{n−1} + p_mn (x_mn − a_n^T d_m^{n−1}) (R_m^n)† a_n.
9:   end for
10: end for

B. PETRELS

The proposed PETRELS algorithm, as summarized in Algorithm 1, alternates between coefficient estimation and subspace update at each time n. In particular, the coefficient vector a_n is estimated by minimizing the projection residual on the previous subspace estimate D_{n−1}:

a_n = argmin_{a ∈ R^r} ‖P_n(x_n − D_{n−1} a)‖²_2
    = (D_{n−1}^T P_n D_{n−1})† D_{n−1}^T y_n,   (18)

where D_0 is a random subspace initialization. The full vector x_n can then be estimated as

x̂_n = D_{n−1} a_n,   (19)

and the residual error is given by r_n = P_n(x_n − x̂_n).

The subspace D_n is then updated by minimizing

D_n = argmin_D Σ_{t=1}^n λ^{n−t} ‖P_t(x_t − D a_t)‖²_2,   (20)

where a_t, t = 1, ..., n, are the estimates from (18). Comparing (20) with (14), the optimal coefficients in (14) are replaced by the previously estimated coefficients. This results in a simpler problem for finding D_n. The discount factor mitigates error propagation and compensates for the fact that we use the previously estimated coefficients rather than solving (14) directly, thereby improving the performance of the algorithm.

The objective function in (20) can be equivalently decomposed into a set of smaller problems, one for each row of D_n = [d_1^n, d_2^n, ..., d_M^n]^T:

d_m^n = argmin_{d_m} Σ_{t=1}^n λ^{n−t} p_mt (x_mt − a_t^T d_m)²,   (21)

for m = 1, ..., M. To find the optimal d_m^n, we set the derivative of the right-hand side of (21) to zero, resulting in

(Σ_{t=1}^n λ^{n−t} p_mt a_t a_t^T) d_m^n − Σ_{t=1}^n λ^{n−t} p_mt x_mt a_t = 0,

which can be rewritten as

R_m^n d_m^n = s_m^n,   (22)

where R_m^n = Σ_{t=1}^n λ^{n−t} p_mt a_t a_t^T and s_m^n = Σ_{t=1}^n λ^{n−t} p_mt x_mt a_t. Therefore, d_m^n can be found as

d_m^n = (R_m^n)† s_m^n.   (23)

When R_m^n is not invertible, this gives the least-norm solution for d_m^n. We now show how (22) can be updated recursively. First, we rewrite

R_m^n = λ R_m^{n−1} + p_mn a_n a_n^T,   (24)
s_m^n = λ s_m^{n−1} + p_mn x_mn a_n.   (25)
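The row-wise recursions (24)-(25) are what Algorithm 1 implements through the rank-one updates of (R_m^n)†. The following minimal Python transcription of Algorithm 1 is a sketch: the forgetting factor λ, the initialization scale δ and the random D_0 are placeholder choices, and it can be run directly on the synthetic (Y, P) generated in Section II-A.

```python
import numpy as np

def petrels(Y, P, r, lam=0.98, delta=1.0, seed=0):
    """Algorithm 1 (PETRELS) applied to the columns of Y, observed on the mask P."""
    M, N = Y.shape
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((M, r))               # random initialization D_0
    Rinv = np.tile(delta * np.eye(r), (M, 1, 1))  # (R_m^0)^dagger = delta * I_r for every row
    X_hat = np.zeros((M, N))

    for n in range(N):
        y = Y[:, n]
        obs = P[:, n].astype(bool)                # observation pattern p_n
        Dobs = D[obs, :]
        a = np.linalg.pinv(Dobs.T @ Dobs) @ (Dobs.T @ y[obs])   # step 2: coefficient a_n
        X_hat[:, n] = D @ a                       # step 3: x_hat_n = D_{n-1} a_n

        Rinv /= lam                               # lambda^{-1} (R_m^{n-1})^dagger for every row
        for m in np.flatnonzero(obs):             # rows with p_mn = 1 get the rank-one correction
            v = Rinv[m] @ a                       # step 6: v_m^n
            beta = 1.0 + a @ v                    # step 5: beta_m^n
            Rinv[m] -= np.outer(v, v) / beta      # step 7: (R_m^n)^dagger
            D[m] += (y[m] - a @ D[m]) * (Rinv[m] @ a)   # step 8: row update d_m^n
    return D, X_hat
```

Per time step, the cost is dominated by one r×r pseudo-inverse for the coefficient estimate and |Ω_n| rank-one updates of r×r matrices, and the storage (D_n and the M matrices (R_m^n)†) does not grow with n, matching the desired properties listed in Section II-A.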
