ebook img

Data Streams & Communication Complexity PDF

102 Pages·2012·0.52 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Data Streams & Communication Complexity

Data Streams & Communication Complexity Lecture 1: Simple Stream Statistics in Small Space Andrew McGregor, UMass Amherst 1/25 (cid:73) Goal: Compute some function of stream, e.g., number of distinct elements, frequent items, longest increasing sequence, a clustering, graph connectivity properties, ... (cid:73) Catch: 1. Limited working memory, sublinear in n and m 2. Access data sequentially 3. Process each element quickly (cid:73) Origins in seventies but has become popular in last ten years... Data Stream Model (cid:73) Stream: m elements from universe of size n, e.g., (cid:104)x ,x ,...,x (cid:105) = (cid:104)3,5,3,7,5,4,...(cid:105) 1 2 m 2/25 (cid:73) Catch: 1. Limited working memory, sublinear in n and m 2. Access data sequentially 3. Process each element quickly (cid:73) Origins in seventies but has become popular in last ten years... Data Stream Model (cid:73) Stream: m elements from universe of size n, e.g., (cid:104)x ,x ,...,x (cid:105) = (cid:104)3,5,3,7,5,4,...(cid:105) 1 2 m (cid:73) Goal: Compute some function of stream, e.g., number of distinct elements, frequent items, longest increasing sequence, a clustering, graph connectivity properties, ... 2/25 2. Access data sequentially 3. Process each element quickly (cid:73) Origins in seventies but has become popular in last ten years... Data Stream Model (cid:73) Stream: m elements from universe of size n, e.g., (cid:104)x ,x ,...,x (cid:105) = (cid:104)3,5,3,7,5,4,...(cid:105) 1 2 m (cid:73) Goal: Compute some function of stream, e.g., number of distinct elements, frequent items, longest increasing sequence, a clustering, graph connectivity properties, ... (cid:73) Catch: 1. Limited working memory, sublinear in n and m 2/25 3. Process each element quickly (cid:73) Origins in seventies but has become popular in last ten years... Data Stream Model (cid:73) Stream: m elements from universe of size n, e.g., (cid:104)x ,x ,...,x (cid:105) = (cid:104)3,5,3,7,5,4,...(cid:105) 1 2 m (cid:73) Goal: Compute some function of stream, e.g., number of distinct elements, frequent items, longest increasing sequence, a clustering, graph connectivity properties, ... (cid:73) Catch: 1. Limited working memory, sublinear in n and m 2. Access data sequentially 2/25 (cid:73) Origins in seventies but has become popular in last ten years... Data Stream Model (cid:73) Stream: m elements from universe of size n, e.g., (cid:104)x ,x ,...,x (cid:105) = (cid:104)3,5,3,7,5,4,...(cid:105) 1 2 m (cid:73) Goal: Compute some function of stream, e.g., number of distinct elements, frequent items, longest increasing sequence, a clustering, graph connectivity properties, ... (cid:73) Catch: 1. Limited working memory, sublinear in n and m 2. Access data sequentially 3. Process each element quickly 2/25 Data Stream Model (cid:73) Stream: m elements from universe of size n, e.g., (cid:104)x ,x ,...,x (cid:105) = (cid:104)3,5,3,7,5,4,...(cid:105) 1 2 m (cid:73) Goal: Compute some function of stream, e.g., number of distinct elements, frequent items, longest increasing sequence, a clustering, graph connectivity properties, ... (cid:73) Catch: 1. Limited working memory, sublinear in n and m 2. Access data sequentially 3. Process each element quickly (cid:73) Origins in seventies but has become popular in last ten years... 2/25 (cid:73) Theoretical Appeal: (cid:73) Easy to state problems but hard to solve. (cid:73) Links to communication complexity, compressed sensing, metric embeddings, pseudo-random generators, approximation... Why’s it become popular? (cid:73) Practical Appeal: (cid:73) Faster networks, cheaper data storage, ubiquitous data-logging results in massive amount of data to be processed. (cid:73) Applications to network monitoring, query planning, I/O efficiency for massive data, sensor networks aggregation... 3/25 Why’s it become popular? (cid:73) Practical Appeal: (cid:73) Faster networks, cheaper data storage, ubiquitous data-logging results in massive amount of data to be processed. (cid:73) Applications to network monitoring, query planning, I/O efficiency for massive data, sensor networks aggregation... (cid:73) Theoretical Appeal: (cid:73) Easy to state problems but hard to solve. (cid:73) Links to communication complexity, compressed sensing, metric embeddings, pseudo-random generators, approximation... 3/25 (cid:73) Problems: What can we approximate in sub linear space? (cid:73) Frequency moments: Fk =(cid:80)ifik. (cid:73) Max frequency: F∞ =maxifi. (cid:73) Number of distinct element: F0 =(cid:80)ifi0 (cid:73) Median: j such that f1+f2+...+fj ≈m/2 Algorithms are often randomized and guarantees will be probabilistic. (cid:73) Keep things simple: Could consider f’s being increased or decreased i but for this talk we’ll focus on unit increments. Will also assume algorithms have an unlimited store of random bits. This Lecture: Basic Numerical Statistics (cid:73) Given a stream of m elements from universe [n]={1,2,...,n}, e.g., (cid:104)x ,x ,...,x (cid:105) = (cid:104)3,5,3,7,5,4,...(cid:105) 1 2 m let f ∈Nn be the frequency vector where f is the frequency of i. i 4/25

Description:
Data Streams & Communication Complexity. Lecture 1: Simple Stream Statistics in Small Space. Andrew McGregor, UMass Amherst. 1/25
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.