Clustering Methods for Exploratory Data Mining

Explore the world of clustering in data mining with a focus on wine data analysis. Learn about various methods like MDS, Isomap, Kmeans, Ncut, and more. Understand the motivation behind clustering and its real-world applications. Dive into the dimensions of wine data and different distance calculations to enhance object separation. Discover the importance of Mahalanobis distance and delve into 3D MDS visualization.

Uploaded by bet_we on Mar 22, 2025

Presentation Transcript

Wine Clustering
Ling Lin

Contents: Motivation; Data; Dimensionality Reduction (MDS, Isomap); Clustering (K-means, Ncut, Ratio Cut, SCC); Conclusion; Reference

Motivation: Clustering is a main task of exploratory data mining. Applications include market segmentation and marketing strategies, document clustering, targeting appropriate treatment to patients with similar response patterns, and image segmentation. Goal: apply clustering methods to a real data set.

Data: Wine data. Source of the data set: UCI Machine Learning Repository, University of California, Irvine. Sample size: 178 observations in 3 classes (three different cultivars), stored as 14 columns: the cultivar label plus 13 variables. Variables: 1) Alcohol 2) Malic acid 3) Ash 4) Alcalinity of ash 5) Magnesium 6) Total phenols 7) Flavanoids 8) Nonflavanoid phenols 9) Proanthocyanins 10) Color intensity 11) Hue 12) OD280/OD315 of diluted wines 13) Proline

MDS: Can I separate the objects better? Idea: change the way the distances between objects are computed.
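The slides do not say which MDS variant or software was used. As a minimal sketch, assuming classical (Torgerson) MDS on a precomputed distance matrix and that NumPy is available, the code below shows that changing the distance only changes the input matrix D; the embedding step stays the same. `classical_mds` and `pairwise_distances` are hypothetical helper names.

```python
import numpy as np

def pairwise_distances(X, metric):
    """Hypothetical helper: n x n matrix of metric(X[i], X[j]) for any distance function."""
    n = len(X)
    return np.array([[metric(X[i], X[j]) for j in range(n)] for i in range(n)])

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed n objects in k dimensions from distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J, with J = I - (1/n) 11^T
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # B is symmetric; eigh returns eigenvalues in ascending order, so keep the k largest
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]
    # Negative eigenvalues (possible for non-Euclidean distances) are clipped to zero
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))

# Usage: three collinear points; a 1-D embedding reproduces their Euclidean distances
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = pairwise_distances(X, lambda a, b: np.linalg.norm(a - b))
Y = classical_mds(D, k=1)
```

For 3D or 2D plots one would use k = 3 or k = 2; passing a Cosine or Mahalanobis distance function instead of the Euclidean lambda changes only D.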

[Figure panels: City-block (L1) distance, Chebychev distance, Cosine distance, Mahalanobis distance]

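Distances

Euclidean distance: the straight-line distance between two points x and y in n dimensions:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

City-block distance (L1 distance): the sum of the absolute differences between the two points in each coordinate dimension:

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$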
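A minimal pure-Python sketch of the two formulas above, checked on a right triangle with legs 3 and 4. The function names are illustrative, not from the slides.

```python
import math

def euclidean(x, y):
    # Straight-line distance: square root of the summed squared coordinate differences
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def cityblock(x, y):
    # L1 distance: sum of the absolute coordinate differences
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q))  # 5.0
print(cityblock(p, q))  # 7.0
```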

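Distances

Chebychev distance (chessboard distance): the greatest difference between the two points in any single coordinate dimension:

$$d(x, y) = \max\left(|x_1 - y_1|, |x_2 - y_2|, \ldots, |x_n - y_n|\right)$$

Cosine distance: based on the cosine of the angle θ between the two vectors,

$$\cos(\theta) = \frac{x \cdot y}{\|x\| \, \|y\|}$$

and commonly taken as 1 − cos(θ), so that vectors pointing in the same direction have distance 0.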
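A pure-Python sketch of the Chebychev and cosine distances. The cosine version assumes the common convention that the distance is 1 − cos θ (SciPy's `scipy.spatial.distance.cosine` uses the same convention). Because it depends only on direction, two observations whose variables are proportional get distance close to 0. Function names are illustrative.

```python
import math

def chebychev(x, y):
    # Chessboard distance: largest absolute difference over all coordinates
    return max(abs(xi - yi) for xi, yi in zip(x, y))

def cosine_distance(x, y):
    # Assumed convention: 1 - cos(theta), cos(theta) = x.y / (||x|| ||y||); undefined for zero vectors
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return 1.0 - dot / (norm_x * norm_y)

print(chebychev((0, 0), (3, 4)))        # 4
print(cosine_distance((1, 0), (0, 1)))  # 1.0 (orthogonal vectors)
same_direction = cosine_distance((1, 2), (2, 4))  # close to 0: same direction, different length
```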

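Distances

Mahalanobis distance: the dissimilarity of two vectors, where S is the covariance matrix:

$$d(x, y) = \sqrt{(x - y)^{T} S^{-1} (x - y)}$$

[Figure: two points joined by a right triangle with legs a and b, hypotenuse c, and angle θ]
Euclidean distance = c
City-block distance = a + b
Chebychev distance = max(a, b) = a
Cosine distance: based on cos(θ)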
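A NumPy sketch of the Mahalanobis formula, assuming S is estimated as the sample covariance of the data (rows are observations). For the wine data S would be 13 × 13, which lets the distance account for variables measured on very different scales. The small data matrix below is made up for illustration. With S equal to the identity the distance reduces to the Euclidean one.

```python
import numpy as np

def mahalanobis(x, y, S):
    # sqrt((x - y)^T S^{-1} (x - y)); solve() avoids forming S^{-1} explicitly
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ np.linalg.solve(S, diff)))

# Hypothetical 2-variable sample; S is its sample covariance (rowvar=False: columns are variables)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]])
S = np.cov(X, rowvar=False)

print(mahalanobis([0, 0], [3, 4], np.eye(2)))  # 5.0, identical to the Euclidean distance
d02 = mahalanobis(X[0], X[2], S)               # distance between two sample rows under S
```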

MDS in 3D

MDS in 2D

Isomap [two figure slides, each with panels: Cosine distance, Mahalanobis distance]
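The slides do not give the Isomap settings. A minimal NumPy sketch of the standard Isomap steps, assuming a precomputed distance matrix (for example Cosine or Mahalanobis) and a hypothetical neighbourhood size `n_neighbors`: build a k-nearest-neighbour graph, approximate geodesic distances by shortest paths (Floyd–Warshall here), then apply classical MDS to those geodesic distances.

```python
import numpy as np

def isomap(D, n_neighbors=5, k=2):
    """Isomap sketch: kNN graph -> shortest-path (geodesic) distances -> classical MDS."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # 1. Connect each point to its n_neighbors nearest neighbours (edges made symmetric)
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        d = D[i].copy()
        d[i] = np.inf  # exclude the point itself
        nbrs = np.argsort(d)[:n_neighbors]
        G[i, nbrs] = D[i, nbrs]
        G[nbrs, i] = D[i, nbrs]
    # 2. Floyd-Warshall: shortest paths through the graph approximate geodesic distances
    for m in range(n):
        G = np.minimum(G, G[:, [m]] + G[[m], :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")
    # 3. Classical MDS on the geodesic distance matrix
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))

# Usage: 11 points along an L-shaped path; the path length between the two ends is 10
X = np.array([[x, 0.0] for x in range(6)] + [[5.0, y] for y in range(1, 6)])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = isomap(D, n_neighbors=2, k=1)
```

In the usage example the straight-line distance between the two ends is about 7.07, but the 1-D Isomap coordinates place them 10 apart, because distances are measured along the neighbourhood graph.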

K-means Clustering: error rate = 0.03
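The slides do not state the initialisation, iteration count, or how the error rate was computed. A minimal NumPy sketch of Lloyd's K-means algorithm, plus one common way to score a clustering against known classes: the error rate under the best one-to-one matching of clusters to classes. `kmeans`, `error_rate`, and the toy data are illustrative assumptions.

```python
import itertools
import numpy as np

def _assign(X, centers):
    # Index of the nearest centre for each row of X (squared Euclidean distance)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centre assignment and centre update."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # start from k distinct data points
    for _ in range(n_iter):
        labels = _assign(X, centers)
        # New centre = mean of its assigned points; an empty cluster keeps its old centre
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _assign(X, centers), centers

def error_rate(true_labels, cluster_labels, k):
    """Misclassification rate under the best one-to-one mapping of clusters to classes."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    # perm[j] is the class assigned to cluster j; try all k! mappings (6 for k = 3)
    return min(float(np.mean(np.asarray(perm)[cluster_labels] != true_labels))
               for perm in itertools.permutations(range(k)))

# Usage: two well-separated groups of three points
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, centers = kmeans(X, k=2)
```

Matching clusters to classes is needed because K-means numbers its clusters arbitrarily; for three cultivars there are only 3! = 6 mappings to check.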

Clustering Comparison [figure panels: True labels, K-means clustering, Normalized Cut, Ratio Cut, SCC]

Conclusion: Dimensionality Reduction. Different methods for calculating distances and reducing dimension were compared on the wine data.
[Table: 3D MDS, 2D MDS, and Isomap, each marked ✓ or ✗ under Cosine distance and Mahalanobis distance]
With Isomap, the Mahalanobis distance gives a better display.

Conclusion: Clustering. K-means = Rcut, SCC, Ncut. Ncut and Rcut consider both inter- and intra-cluster connections; however, in this data set the intra-cluster connections are weak.
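To make the Ncut/Rcut comparison concrete, a minimal NumPy sketch that evaluates both objectives for a given partition of a weighted similarity graph W, in the sum-over-clusters form RatioCut = Σ cut(A_i, Ā_i) / |A_i| and Ncut = Σ cut(A_i, Ā_i) / vol(A_i) (some texts add a factor 1/2). RatioCut normalises each cluster's cut weight by its size, while Ncut normalises by vol(A), the total edge weight attached to A, which includes the intra-cluster weights. The graph below is a made-up example, not the wine data.

```python
import numpy as np

def cut_objectives(W, labels):
    """Return (ratio_cut, ncut) for a partition of a symmetric, non-negative weight matrix W."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    degrees = W.sum(axis=1)
    ratio_cut = ncut = 0.0
    for c in np.unique(labels):
        inside = labels == c
        # Total weight of edges leaving cluster c: cut(A, complement of A)
        cut = W[np.ix_(inside, ~inside)].sum()
        ratio_cut += cut / inside.sum()      # RatioCut: normalise by cluster size |A|
        ncut += cut / degrees[inside].sum()  # Ncut: normalise by vol(A) = sum of degrees in A
    return ratio_cut, ncut

# Usage: two triangles (edge weight 1) joined by one weak edge (weight 0.1)
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 0.1
rc, nc = cut_objectives(W, [0, 0, 0, 1, 1, 1])    # natural split along the weak edge
rc2, nc2 = cut_objectives(W, [0, 1, 1, 1, 1, 1])  # poor split: one node cut off
```

Both objectives are lower for the natural split, but they weight clusters differently: Ncut rewards clusters whose members are strongly connected to each other, so weak intra-cluster connections make its denominator less informative.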
