
Power Iteration Clustering - Fast and Cost-Effective Method for Spectral Clustering
Power Iteration Clustering (PIC) is a fast, low-cost alternative to traditional spectral clustering. This presentation covers the method, its runtime and accuracy results, and comparisons with other techniques such as Normalized Cut and NJW.
Presentation Transcript
Power Iteration Clustering. Frank Lin and William W. Cohen, School of Computer Science, Carnegie Mellon University. ICML 2010, 2010-06-23, Haifa, Israel.
Overview: Preview, Motivation, Power Iteration Clustering, Power Iteration, Stopping, Results, Related Work.
Preview: Spectral clustering methods are nice, but they are rather expensive (slow). Power iteration clustering can provide a similar solution at a very low cost (fast).
Preview: Runtime (chart comparing Normalized Cut, a faster Normalized Cut implementation, and PIC; the annotations read "pretty fast" and "ran out of memory (24GB)").
Preview: Accuracy (scatter plot; points in the upper triangle mean PIC does better, points in the lower triangle mean NCut or NJW does better).
Overview: Preview, Motivation, Power Iteration Clustering, Power Iteration, Stopping, Results, Related Work.
k-means: a well-known clustering method. 3-cluster examples (figures).
Spectral Clustering: instead of clustering data points in their original (Euclidean) space, cluster them in the space spanned by the significant eigenvectors of a (Laplacian) affinity matrix. Affinity matrix: a matrix A where A_ij is the similarity between data points i and j.
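To make the affinity matrix concrete, here is a minimal numpy sketch (not from the talk) of a Gaussian (RBF) affinity matrix; the function name rbf_affinity and the bandwidth parameter sigma are illustrative choices of mine.

```python
import numpy as np

def rbf_affinity(X, sigma=1.0):
    """Affinity matrix A with A[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    A = np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)   # zero self-similarity, a common convention
    return A
```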
Spectral Clustering: Network = Graph = Matrix (figure: a small graph on nodes A through J shown alongside its adjacency matrix, with a 1 in entry (i, j) whenever nodes i and j are connected).
Spectral Clustering (figures): the dataset with Normalized Cut results, clusters labeled 1 to 3; the clustering space is given by plotting, for each point index, its value in the 2nd smallest and 3rd smallest eigenvectors.
Spectral Clustering. A typical spectral clustering algorithm:
1. Choose k and a similarity function s.
2. Derive the affinity matrix A from s, and transform A into a (normalized) Laplacian matrix W.
3. Find the eigenvectors and corresponding eigenvalues of W.
4. Pick the k eigenvectors of W with the smallest corresponding eigenvalues as the significant eigenvectors.
5. Project the data points onto the space spanned by these vectors.
6. Run k-means on the projected data points.
Spectral Clustering. The Normalized Cut algorithm (Shi & Malik 2000):
1. Choose k and a similarity function s.
2. Derive A from s and let W = I - D^{-1}A, where I is the identity matrix and D is a diagonal matrix with D_ii = \sum_j A_ij.
3. Find the eigenvectors and corresponding eigenvalues of W.
4. Pick the eigenvectors of W with the 2nd to kth smallest corresponding eigenvalues as the significant eigenvectors.
5. Project the data points onto the space spanned by these vectors.
6. Run k-means on the projected data points.
Finding the eigenvectors and eigenvalues of a matrix is very slow in general: O(n^3), as sketched below.
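As a rough illustration of the pipeline above, here is a dense numpy/scikit-learn sketch of Normalized Cut as described on this slide; it assumes a full eigendecomposition is affordable (the O(n^3) step), and the helper name ncut_cluster is mine.

```python
import numpy as np
from sklearn.cluster import KMeans

def ncut_cluster(A, k):
    """Cluster using eigenvectors of W = I - D^{-1} A (Shi & Malik style)."""
    d_inv = 1.0 / A.sum(axis=1)
    W = np.eye(A.shape[0]) - d_inv[:, None] * A     # I - D^{-1} A
    evals, evecs = np.linalg.eig(W)                 # the expensive O(n^3) step
    evals, evecs = evals.real, evecs.real           # eigenvalues of D^{-1} A are real
    order = np.argsort(evals)                       # ascending eigenvalues
    V = evecs[:, order[1:k]]                        # 2nd through kth smallest
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(V)
```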
Hmm... Can we find a low-dimensional embedding for clustering, as in spectral clustering, but without calculating these eigenvectors?
Overview: Preview, Motivation, Power Iteration Clustering, Power Iteration, Stopping, Results, Related Work.
The Power Iteration. The power iteration (or power method) is a simple iterative method for finding the dominant eigenvector of a matrix:

v^{t+1} = c W v^t

where W is a square matrix, v^t is the vector at iteration t (v^0 is typically a random vector), and c is a normalizing constant that keeps v^t from getting too large or too small. It typically converges quickly, and is fairly efficient if W is a sparse matrix.
The Power Iteration. What if we let W = D^{-1}A (similar to Normalized Cut)? The short answer is that the iteration converges to a constant vector, because the dominant eigenvector of a row-normalized matrix is always a constant vector. Not very interesting. However...
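A small numpy sketch of the update v^{t+1} = c W v^t (my own illustration, not from the talk); here c is taken to be 1/||W v^t||_1, one reasonable choice of normalizer. Run on a row-normalized W = D^{-1}A, the iterate flattens toward a constant vector, as the slide says.

```python
import numpy as np

def power_iteration(W, n_iter=100, seed=0):
    """Repeatedly apply v <- c W v, with c rescaling v to unit L1 norm."""
    rng = np.random.default_rng(seed)
    v = rng.random(W.shape[0])
    v /= np.abs(v).sum()
    for _ in range(n_iter):
        v = W @ v
        v /= np.abs(v).sum()      # c = 1 / ||W v||_1 keeps the entries bounded
    return v

# With a row-normalized affinity W = D^{-1} A, the entries of v end up
# (nearly) equal: the dominant eigenvector of W is a constant vector.
A = np.random.default_rng(1).random((6, 6)); A = (A + A.T) / 2.0
W = A / A.sum(axis=1, keepdims=True)
print(power_iteration(W))
```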
Power Iteration Clustering. It turns out that, if there is some underlying cluster structure in the data, PI will quickly converge locally within clusters, then slowly converge globally to a constant vector. The locally converged vector, which is a linear combination of the top eigenvectors, will be nearly piece-wise constant, with each piece corresponding to a cluster.
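Here is a minimal sketch of the whole PIC loop as I read this slide: run the power iteration on W = D^{-1}A, stop while the vector is locally but not yet globally converged, and hand the one-dimensional embedding to k-means. The exact stopping rule is covered later in the talk; the acceleration-style test and the tolerance below are my own assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def pic(A, k, max_iter=1000, tol=1e-5, seed=0):
    """Power Iteration Clustering sketch: 1-D embedding via power iteration, then k-means."""
    n = A.shape[0]
    W = A / A.sum(axis=1, keepdims=True)          # row-normalized affinity, W = D^{-1} A
    rng = np.random.default_rng(seed)
    v = rng.random(n)
    v /= np.abs(v).sum()
    delta_prev = np.full(n, np.inf)
    for _ in range(max_iter):
        v_new = W @ v
        v_new /= np.abs(v_new).sum()
        delta = np.abs(v_new - v)
        v = v_new
        # Stop once the per-element change has itself nearly stopped changing:
        # locally converged within clusters, not yet collapsed to a constant.
        if np.max(np.abs(delta - delta_prev)) < tol / n:
            break
        delta_prev = delta
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(v.reshape(-1, 1))
```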
Power Iteration Clustering (figure): the one-dimensional embedding, ordered from smaller to larger values, with colors corresponding to what k-means would take to be clusters in this one-dimensional embedding.
Power Iteration Clustering. Recall the power iteration update:

v^t = W v^{t-1} = W^2 v^{t-2} = ... = W^t v^0
    = c_1 W^t e_1 + c_2 W^t e_2 + ... + c_n W^t e_n
    = c_1 \lambda_1^t e_1 + c_2 \lambda_2^t e_2 + ... + c_n \lambda_n^t e_n

where e_i is the eigenvector corresponding to \lambda_i, the ith largest eigenvalue of W, and c_i is the ith coefficient of v^0 when projected onto the space spanned by the eigenvectors of W.
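A quick numerical check of this expansion (my own, with a made-up small matrix): writing v^0 in the eigenbasis of W and rebuilding W^t v^0 from the c_i \lambda_i^t e_i terms reproduces the directly iterated vector.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 5)); A = (A + A.T) / 2.0        # symmetric toy affinity
W = A / A.sum(axis=1, keepdims=True)               # row-normalized, W = D^{-1} A

evals, evecs = np.linalg.eig(W)                    # columns of evecs are the e_i
v0 = rng.random(5)
c = np.linalg.solve(evecs, v0)                     # coefficients of v0 in the eigenbasis

t = 10
direct = np.linalg.matrix_power(W, t) @ v0         # W^t v^0
via_eigen = evecs @ (c * evals ** t)               # sum_i c_i * lambda_i^t * e_i
print(np.max(np.abs(direct - via_eigen)))          # should be ~0 (numerical noise only)
```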
Power Iteration Clustering. Group the c_i \lambda_i^t e_i terms, and define pic^t(a, b) to be the absolute difference between elements a and b of v^t:

pic^t(a, b) = | c_1 \lambda_1^t [e_1(a) - e_1(b)] + \sum_{i=2}^{k} c_i \lambda_i^t [e_i(a) - e_i(b)] + \sum_{j=k+1}^{n} c_j \lambda_j^t [e_j(a) - e_j(b)] |

The first term is 0 because the first (dominant) eigenvector is a constant vector. As t gets bigger, the last term goes to 0 quickly. We are left with the middle term, which signals the clusters corresponding to the significant eigenvectors!
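To see the three terms at work, here is a toy calculation with hypothetical eigenpairs (numbers invented for illustration, not eigenpairs of any particular W): the constant e_1 contributes nothing, the tail term dies out as t grows, and the "cluster" term persists.

```python
import numpy as np

lams = np.array([1.0, 0.95, 0.30])        # lambda_1 = 1, a cluster eigenvalue, a tail eigenvalue
E = np.array([[1.0,  1.0,  0.2],          # column i is e_i; e_1 is the constant vector
              [1.0,  1.0, -0.7],
              [1.0, -1.0,  0.5]])
c = np.array([0.5, 0.4, 0.3])             # coefficients of v^0 in this basis
a, b = 0, 2                               # two points that lie in different clusters

for t in (1, 5, 20):
    terms = c * lams ** t * (E[a] - E[b])
    print(t, terms)   # first entry stays 0, last entry shrinks, middle entry persists
```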
Power Iteration Clustering. The 2nd to kth eigenvectors of W = D^{-1}A are roughly piece-wise constant with respect to the underlying clusters, each separating a cluster from the rest of the data (Meila & Shi 2001). A linear combination of piece-wise constant vectors is also piece-wise constant!
Spectral Clustering (figures, revisited): the dataset with Normalized Cut results, the clustering space, and the 2nd and 3rd smallest eigenvectors plotted by index, illustrating their piece-wise constant structure.