Understanding Parallel K-Means Clustering


Dive into the world of parallel K-Means clustering, a method that partitions data points into clusters based on their similarities. Learn about strategies for parallelizing the computation, tackling issues like false sharing, and optimizing the clustering process.

Uploaded by maga_27 on May 30, 2025


Presentation Transcript

Parallel K-Means

Problem description: Clustering is the task of assigning a set of objects to groups (clusters) so that objects in the same cluster are more similar, in some chosen sense, to each other than to objects in other clusters. k-means clustering aims to partition n data points into k clusters (n >> k) such that each data point belongs to the cluster with the nearest mean. Nearness is evaluated by a distance function.
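The "nearest mean" rule can be sketched as follows. This is a minimal illustration, not the presenter's code: the slides leave the distance function open, so squared Euclidean distance is assumed here, and the names `Point`, `squared_distance`, and `nearest_centroid` are hypothetical.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;  // one data point with d coordinates

// Squared Euclidean distance (an assumption; any distance function could be used).
// Both points are expected to have the same number of dimensions.
double squared_distance(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) {
        const double diff = a[j] - b[j];
        sum += diff * diff;
    }
    return sum;
}

// Returns the index of the centroid (cluster mean) nearest to p.
// Ties go to the lowest index.
std::size_t nearest_centroid(const Point& p, const std::vector<Point>& centroids) {
    std::size_t best = 0;
    double best_dist = std::numeric_limits<double>::infinity();
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        const double dist = squared_distance(p, centroids[c]);
        if (dist < best_dist) {
            best_dist = dist;
            best = c;
        }
    }
    return best;
}
```

The square root is skipped because it does not change which centroid is nearest.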

Example: The algorithm is iterative. Stopping criterion: the clusters are stable, or the maximum number of iterations is reached.
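A sequential sketch of the iteration with both stopping criteria might look like this. It assumes squared Euclidean distance, initial centroids supplied by the caller, and that an empty cluster keeps its previous centroid; none of these choices come from the slides, and `KMeansResult` and `kmeans` are hypothetical names.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

struct KMeansResult {
    std::vector<Point> centroids;
    std::vector<std::size_t> assignment;  // cluster index for each point
    std::size_t iterations;               // number of assignment passes performed
};

KMeansResult kmeans(const std::vector<Point>& points,
                    std::vector<Point> centroids,  // initial centroids, chosen by the caller
                    std::size_t max_iter) {
    const std::size_t n = points.size();
    const std::size_t k = centroids.size();
    const std::size_t d = k ? centroids[0].size() : 0;
    std::vector<std::size_t> assignment(n, k);  // k means "not assigned yet"
    std::size_t iter = 0;
    while (iter < max_iter) {  // stopping criterion 2: max iterations reached
        ++iter;
        bool changed = false;
        // Assignment step: attach every point to its nearest centroid.
        for (std::size_t i = 0; i < n; ++i) {
            std::size_t best = 0;
            double best_dist = std::numeric_limits<double>::infinity();
            for (std::size_t c = 0; c < k; ++c) {
                double dist = 0.0;
                for (std::size_t j = 0; j < d; ++j) {
                    const double diff = points[i][j] - centroids[c][j];
                    dist += diff * diff;
                }
                if (dist < best_dist) { best_dist = dist; best = c; }
            }
            if (assignment[i] != best) { assignment[i] = best; changed = true; }
        }
        if (!changed) break;  // stopping criterion 1: clusters are stable
        // Update step: move each centroid to the mean of its points.
        std::vector<Point> sums(k, Point(d, 0.0));
        std::vector<std::size_t> counts(k, 0);
        for (std::size_t i = 0; i < n; ++i) {
            ++counts[assignment[i]];
            for (std::size_t j = 0; j < d; ++j) sums[assignment[i]][j] += points[i][j];
        }
        for (std::size_t c = 0; c < k; ++c) {
            if (counts[c] == 0) continue;  // empty cluster keeps its old centroid
            for (std::size_t j = 0; j < d; ++j) centroids[c][j] = sums[c][j] / counts[c];
        }
    }
    return {centroids, assignment, iter};
}
```

With four 2-D points forming two well-separated pairs and one initial centroid in each pair, the loop stops on its second pass, since that pass changes no assignment.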

K-means parallelization: Naive. For large dimensions (d), we can try a simple parallelization of the distance function computation, that is, split the DIMENSIONS between threads. This is suitable only for shared-memory systems, and it carries a risk of false sharing.
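Splitting the dimensions between threads could be sketched with `std::thread` as below. This is an illustration under assumptions, not the presenter's implementation: squared Euclidean distance is assumed, and `squared_distance_parallel` and its `num_threads` parameter are hypothetical. Each thread accumulates into a local variable and writes its slot of `partial` only once, which limits (but does not remove) the false sharing that adjacent slots can cause.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Squared Euclidean distance between a and b, with the d dimensions
// divided into contiguous chunks, one chunk per thread.
double squared_distance_parallel(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 std::size_t num_threads) {
    const std::size_t d = a.size();
    // Use at least one thread and never more threads than dimensions.
    num_threads = std::max<std::size_t>(1, std::min(num_threads, d));
    const std::size_t chunk = (d + num_threads - 1) / num_threads;  // ceiling division

    std::vector<double> partial(num_threads, 0.0);  // one result slot per thread
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(d, t * chunk);
        const std::size_t end = std::min(d, begin + chunk);
        workers.emplace_back([&a, &b, &partial, t, begin, end] {
            double local = 0.0;  // thread-private accumulator
            for (std::size_t j = begin; j < end; ++j) {
                const double diff = a[j] - b[j];
                local += diff * diff;
            }
            partial[t] = local;  // single write to shared memory per thread
        });
    }
    for (auto& w : workers) w.join();

    double total = 0.0;
    for (double p : partial) total += p;  // sequential reduction of partial sums
    return total;
}
```

Creating threads for every distance call is expensive, so a split like this can only pay off when d is very large; for small d the thread start-up cost dominates.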

BTW: false sharing? False sharing is a performance issue on SMP systems where each processor has a local cache. It occurs when threads on different processors modify variables that reside on the same cache line. The threads are not actually sharing a variable; each works on its own variable, but in its own cached copy of the same line. When one thread writes, the line is invalidated in the other caches, forcing a memory update to maintain cache coherency.
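One common way to avoid false sharing is to pad or align per-thread data so that each item occupies its own cache line. The sketch below contrasts a packed and a padded counter layout. It assumes a 64-byte cache line, which is typical but not universal, and the names `kCacheLine`, `kThreads`, `PackedCounter`, `PaddedCounter`, and `count_in_parallel` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

constexpr std::size_t kCacheLine = 64;  // assumed cache-line size in bytes
constexpr std::size_t kThreads = 4;

// Packed layout: 8-byte counters sit next to each other, so several
// threads' counters can land on one cache line (false sharing possible).
struct PackedCounter {
    std::uint64_t value = 0;
};

// Padded layout: alignas rounds the size up to a full cache line,
// so each thread's counter lives on its own line.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value = 0;
};

// Runs kThreads threads, each incrementing only its own counter,
// and returns the sum of all counters.
template <typename Counter>
std::uint64_t count_in_parallel(std::uint64_t increments_per_thread) {
    std::array<Counter, kThreads> counters{};
    std::vector<std::thread> workers;
    workers.reserve(kThreads);
    for (std::size_t t = 0; t < kThreads; ++t) {
        workers.emplace_back([&counters, t, increments_per_thread] {
            for (std::uint64_t i = 0; i < increments_per_thread; ++i) {
                counters[t].value += 1;  // no data race: one slot per thread
            }
        });
    }
    for (auto& w : workers) w.join();

    std::uint64_t total = 0;
    for (const auto& c : counters) total += c.value;
    return total;
}
```

Both `count_in_parallel<PackedCounter>` and `count_in_parallel<PaddedCounter>` return the same total; only the memory layout, and potentially the run time, differs. An optimizing compiler may collapse the increment loop into a single addition, so any timing comparison needs care. C++17 also defines `std::hardware_destructive_interference_size` in `<new>` as a portable hint for the padding size, though compiler support varies.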