Clustering Methods for Exploratory Data Mining


Explore the world of clustering in data mining with a focus on wine data analysis. Learn about various methods like MDS, Isomap, Kmeans, Ncut, and more. Understand the motivation behind clustering and its real-world applications. Dive into the dimensions of wine data and different distance calculations to enhance object separation. Discover the importance of Mahalanobis distance and delve into 3D MDS visualization.

  • Clustering Methods
  • Data Mining
  • Wine Data Analysis
  • Distance Calculations
  • Mahalanobis Distance


Presentation Transcript


  1. Wine Clustering Ling Lin

  2. Contents: Motivation; Data; Dimensionality Reduction (MDS, Isomap); Clustering (Kmeans, Ncut, Ratio Cut, SCC); Conclusion; Reference

  3. Motivation: Clustering is a main task of exploratory data mining. Applications include market segmentation and marketing strategy, document clustering, targeting appropriate treatment to patients with similar response patterns, and image segmentation. Goal: apply clustering methods to a real data set.

  4. Data: Wine data. Source of the data set: Machine Learning Repository, University of California, Irvine. Data sample size: 14 variables (the class label plus the 13 attributes listed below) and 178 observations in 3 classes, each a different cultivar. Variables: 1) Alcohol 2) Malic acid 3) Ash 4) Alcalinity of ash 5) Magnesium 6) Total phenols 7) Flavanoids 8) Nonflavanoid phenols 9) Proanthocyanins 10) Color intensity 11) Hue 12) OD280/OD315 of diluted wines 13) Proline

  5. MDS: Can I separate objects better? ---> Change the way the distances are computed.

  6. Cityblock (L1) Distance, Chebychev Distance, Cosine Distance, Mahalanobis Distance

  7. Distances. Euclidean Distance: straight-line distance between two points, $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$. City-block Distance (L1 Distance): sum of the absolute differences between two points over all coordinate dimensions, $d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$.

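  8. Distances. Chebychev Distance (Chessboard Distance): the greatest difference between two points in any single coordinate dimension, $d(x, y) = \max(|x_1 - y_1|, |x_2 - y_2|, \ldots, |x_n - y_n|)$. Cosine Distance: based on the cosine of the angle $\theta$ between two vectors, $\cos(\theta) = \frac{x \cdot y}{\|x\| \, \|y\|}$ (as a dissimilarity, this is commonly taken as $1 - \cos(\theta)$).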

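  9. Distances. Mahalanobis Distance: the dissimilarity of two vectors, where S is the covariance matrix, $d(x, y) = \sqrt{(x - y)^{T} S^{-1} (x - y)}$. [Figure: right triangle with legs a and b (a longer than b), hypotenuse c, and angle $\theta$.] Euclidean Distance = c; City-block Distance = a + b; Chebychev Distance = max(a, b) = a; Cosine Distance = cos($\theta$).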

  10. MDS in 3D

  11. MDS in 2D

  12. Isomap: Cosine, Mahalanobis

  13. Isomap: Cosine, Mahalanobis

  14. Kmeans Clustering: error rate = 0.03

  15. Clustering Comparison. [Figure panels: True labels, Kmeans Clustering, Normalized Cut, Ratio Cut, SCC]

  16. Conclusion. Dimensionality Reduction: different methods for calculating distances and reducing dimension, applied to the wine data. 3D MDS: Cosine Distance ✓, Mahalanobis ✗. 2D MDS: Cosine Distance ✓, Mahalanobis ✗. Isomap gives a better display for the Mahalanobis distance.

  17. Conclusion. Clustering: Kmeans = Rcut SCC Ncut. Ncut and Rcut consider both inter-cluster and intra-cluster connections. However, in this dataset, the intra-cluster connections are weak.
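
The distance definitions on slides 7 to 9 can be sketched in plain Python as below. This is a minimal illustration, not the code used for the presentation: the function names, the sample points, and the choice to pass the inverse covariance matrix as `s_inv` are all assumptions made for the example.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cityblock(x, y):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def chebychev(x, y):
    """Chessboard distance: largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def mahalanobis(x, y, s_inv):
    """sqrt((x - y)^T S^{-1} (x - y)).

    s_inv is the inverse of the covariance matrix S, given as a nested
    list (an assumption of this sketch; S itself is not estimated here).
    """
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    quad = sum(diff[i] * s_inv[i][j] * diff[j]
               for i in range(n) for j in range(n))
    return math.sqrt(quad)


# Hypothetical points forming the right triangle with legs a = 4, b = 3.
p, q = [0.0, 0.0], [4.0, 3.0]
d_euclid = euclidean(p, q)   # hypotenuse c = 5.0
d_city = cityblock(p, q)     # a + b = 7.0
d_cheb = chebychev(p, q)     # max(a, b) = 4.0

# With an identity inverse covariance, Mahalanobis reduces to Euclidean.
identity = [[1.0, 0.0], [0.0, 1.0]]
d_mahal = mahalanobis(p, q, identity)  # 5.0
```

With a non-identity `s_inv`, each coordinate difference is rescaled by the spread of the data along that direction, which is why the Mahalanobis distance can produce a different layout from the other four distances on the same observations. A cosine dissimilarity can be formed as `1 - cosine_similarity(x, y)`.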
