Introduction to Machine Learning CMU-10701 19. Clustering and EM Barnabás Póczos Contents Clustering K-means Mixture of Gaussians Expectation Maximization Variational Methods Many of these slides are taken from • Aarti Singh, • Eric Xing, • Carlos Guetrin 2 Clustering 3 K- means clustering What is clustering? Clustering: The process of grouping a set of objects into classes of similar objects –high intra-class similarity –low inter-class similarity –It is the commonest form of unsupervised learning 4 K- means clustering What is Similarity? Hard to define! But we know it when we see it The real meaning of similarity is a philosophical question. We will take a more pragmatic approach: think in terms of a distance (rather than similarity) between random variables. 5 The K- means Clustering Problem 6 K-means Clustering Problem K -means clustering problem: Partition the n observations into K sets (K ≤ n) S = {S , S , …, S } 1 2 K such that the sets minimize the within-cluster sum of squares: K=3 7 K-means Clustering Problem K -means clustering problem: Partition the n observations into K sets (K ≤ n) S = {S , S , …, S } 1 2 K such that the sets minimize the within-cluster sum of squares: How hard is this problem? The problem is NP hard, but there are good heuristic algorithms that seem to work well in practice: • K–means algorithm • mixture of Gaussians 8 K-means Clustering Alg: Step 1 • Given n objects. • Guess the cluster centers k , k , k (They were µ ,…,µ in the previous slide) 1 2 3. 1 3 9 K-means Clustering Alg: Step 2 • Build a Voronoi diagram based on the cluster centers k , k , k 1 2 3. • Decide the class memberships of the n objects by assigning them to the nearest cluster centers k , k , k . 10 1 2 3
Description: