Understanding Parallel K-Means Clustering

Dive into the world of parallel K-Means clustering, a method that partitions data points into clusters based on their similarities. Learn about strategies for parallelizing the computation, tackling issues like false sharing, and optimizing the clustering process.

  • Clustering
  • K-Means
  • Parallelization
  • Machine Learning
  • Optimization

Presentation Transcript


  1. Parallel K-Means

  2. Problem description: Clustering is the task of assigning a set of objects to groups (clusters) so that objects in the same cluster are more similar (in some sense) to each other than to those in other clusters. k-means clustering is a clustering method that aims to partition n data points into k clusters (n >> k) such that each observation belongs to the cluster with the nearest mean. Nearness is evaluated by a distance function.
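
The assignment rule on this slide can be written compactly. The notation here is an assumption for illustration: points x_i, cluster means mu_j, and a squared Euclidean distance over d dimensions. The slides do not fix a particular distance function.

```latex
c(x_i) = \arg\min_{j \in \{1,\dots,k\}} \operatorname{dist}(x_i, \mu_j),
\qquad
\operatorname{dist}(x, \mu) = \sum_{t=1}^{d} (x_t - \mu_t)^2
```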

  3. Example: The algorithm is iterative. Stopping criterion: the clusters are stable, or the maximum number of iterations has been reached.
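
A minimal sequential sketch of the iteration and its two stopping criteria may help here. It assumes squared Euclidean distance and caller-supplied initial centroids. The names `squared_distance`, `kmeans`, and `max_iter` are hypothetical, not taken from the slides.

```python
def squared_distance(a, b):
    """Squared Euclidean distance (an assumed choice of distance function)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, max_iter=100):
    """Sequential k-means. Returns (centroids, labels, iterations used)."""
    centroids = [list(c) for c in centroids]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to the cluster with the nearest mean.
        new_labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Stopping criterion 1: clusters are stable (no label changed).
        if new_labels == labels:
            return centroids, labels, iteration
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its old centroid (assumption)
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Stopping criterion 2: maximum number of iterations reached.
    return centroids, labels, max_iter
```

For example, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])` stops in the second iteration, once the labels `[0, 0, 1, 1]` repeat.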

  4. K-means parallelization: Naïve. For large dimensions (d), we can try a simple parallelization of the distance function computation: split the DIMENSIONS between threads. This is suitable only for shared-memory systems and carries a risk of false sharing.
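
The dimension split can be sketched as follows, assuming squared Euclidean distance. The function names and the `num_threads` parameter are hypothetical. In standard CPython the global interpreter lock prevents these threads from running simultaneously, so the sketch shows the decomposition rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_squared_distance(a, b, start, stop):
    """Contribution of dimensions [start, stop) to the squared distance."""
    return sum((a[t] - b[t]) ** 2 for t in range(start, stop))


def squared_distance_by_dimension(a, b, num_threads=4):
    """Split the d dimensions into contiguous ranges, one per worker thread."""
    d = len(a)
    chunk = max(1, (d + num_threads - 1) // num_threads)
    ranges = [(s, min(s + chunk, d)) for s in range(0, d, chunk)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Each worker returns its own partial sum; the final reduction is
        # done by the calling thread after all workers have finished.
        partials = pool.map(lambda r: partial_squared_distance(a, b, *r), ranges)
        return sum(partials)
```

Each worker returns a private partial sum instead of writing into a shared array. In a compiled shared-memory implementation, writing the partial sums into adjacent slots of one array is where the false-sharing risk would arise.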

  5. BTW: false sharing? False sharing is a performance issue on SMP systems, where each processor has a local cache. It occurs when threads on different processors modify variables that reside on the same cache line. The threads are not actually sharing access to the same variable, but each write invalidates the cached copy of the line held by the other processors, forcing a memory update to maintain cache coherency.
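
A small arithmetic sketch shows why adjacent per-thread accumulators collide. It assumes 64-byte cache lines, 8-byte accumulators, and an array that starts on a line boundary. These are illustrative values, not figures from the slides, and the function names are hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumption: a common line size; check the target hardware
ELEMENT_BYTES = 8      # assumption: one double-precision accumulator per thread


def cache_line_of(index, stride_bytes=ELEMENT_BYTES, line_bytes=CACHE_LINE_BYTES):
    """Cache line holding slot `index`, assuming the array starts on a line boundary."""
    return (index * stride_bytes) // line_bytes


def threads_sharing_lines(num_threads, stride_bytes=ELEMENT_BYTES):
    """Group thread ids by the cache line their accumulator slot falls on."""
    groups = {}
    for tid in range(num_threads):
        groups.setdefault(cache_line_of(tid, stride_bytes), []).append(tid)
    return groups
```

With tightly packed slots, `threads_sharing_lines(8)` puts all eight threads on line 0, so every write invalidates the line for the other seven. Padding each slot to a full line, as in `threads_sharing_lines(8, stride_bytes=64)`, gives every thread its own line.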

  6. K-means parallelization: Better. Split the DATA (points) between threads, assign points to clusters within each thread, and update the centroids after all points have been clustered.
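
One iteration of the data-split scheme might look like the sketch below. It assumes squared Euclidean distance and that an empty cluster keeps its previous centroid. All function names are hypothetical. As with any threaded pure-Python code, the CPython global interpreter lock limits real speedup, so this illustrates the structure only.

```python
from concurrent.futures import ThreadPoolExecutor


def squared_distance(a, b):
    """Squared Euclidean distance (an assumed choice of distance function)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_chunk(chunk, centroids):
    """Assign the points of one chunk; return per-cluster sums and counts
    held privately by this worker (no shared writes during assignment)."""
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        j = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
        counts[j] += 1
        for t in range(d):
            sums[j][t] += p[t]
    return sums, counts


def parallel_kmeans_step(points, centroids, num_threads=4):
    """Split the points between threads, then update centroids once all are assigned."""
    size = max(1, (len(points) + num_threads - 1) // num_threads)
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(lambda ch: assign_chunk(ch, centroids), chunks))
    k, d = len(centroids), len(centroids[0])
    new_centroids = []
    for j in range(k):
        count = sum(counts[j] for _, counts in results)
        if count == 0:
            new_centroids.append(list(centroids[j]))  # assumption: keep old centroid
        else:
            new_centroids.append(
                [sum(sums[j][t] for sums, _ in results) / count for t in range(d)]
            )
    return new_centroids
```

Each worker accumulates into its own local sums and counts, and the merge happens only after every chunk is done. That matches the order on the slide: assign in threads, then update the centroids.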
