Understanding Clustering in Data Analysis

Clustering is a vital tool used across fields such as the social sciences, biology, astronomy, and computer science to analyze large data sets. Despite its growing popularity, the abstract nature of clustering and the lack of consensus on its definition pose challenges. Clustering is inherently ambiguous: a data set may admit multiple reasonable clusterings, and different algorithms exhibit radically different input-output behaviour. Users often struggle to choose among the wide variety of clustering algorithms, and selection is frequently driven by cost considerations rather than input-output behaviour.

  • Clustering Analysis
  • Data Structure
  • Algorithm Selection
  • Data Science
  • Ambiguity


Presentation Transcript


  1. Weighted Clustering Margareta Ackerman Work with Shai Ben-David, Simina Branzei, and David Loker

  2. The Theory-Practice Gap. Clustering is one of the most widely used tools for exploratory data analysis. The social sciences, biology, astronomy, and computer science all apply clustering to gain a first understanding of the structure of large data sets.

  3. The Theory-Practice Gap. "While the interest in and application of cluster analysis has been rising rapidly, the abstract nature of the tool is still poorly understood" (Wright, 1973). "There has been relatively little work aimed at reasoning about clustering independently of any particular algorithm, objective function, or generative data model" (Kleinberg, 2002). Both statements still apply today.

  4. Inherent Obstacles: Clustering is ill-defined. Clustering aims to assign data into groups of similar items. Beyond that, there is very little consensus on the definition of clustering.

  5. Inherent Obstacles. Clustering is inherently ambiguous: there may be multiple reasonable clusterings, and there is usually no ground truth. There are many clustering algorithms with different (often implicit) objective functions, and different algorithms have radically different input-output behaviour.

  6. Differences in Input/Output Behavior of Clustering Algorithms

  7. Differences in Input/Output Behavior of Clustering Algorithms

  8. Clustering Algorithm Selection. There is a wide variety of clustering algorithms, which can produce very different clusterings. How should a user decide which algorithm to use for a given application?

  9. Clustering Algorithm Selection. Users rely on cost-related considerations: running times, space usage, software purchasing costs, etc. There is inadequate emphasis on input-output behaviour.

  10. Our Framework for Algorithm Selection. We propose a framework that lets a user utilize prior knowledge to select an algorithm: identify properties that distinguish between the input-output behaviour of different clustering paradigms. The properties should be: 1) intuitive and user-friendly, and 2) useful for distinguishing clustering algorithms.

  11. Our Framework for Algorithm Selection. In essence, our goal is to understand fundamental differences between clustering methods, and convey them formally, clearly, and as simply as possible.

  12. Previous Work: Axiomatic perspective. Impossibility result: Kleinberg (NIPS, 2003). Consistent axioms for quality measures: Ackerman & Ben-David (NIPS, 2009). Axioms in the weighted setting: Wright (Pattern Recognition, 1973).

  13. Previous Work: Characterizations of Single Linkage. Partitional setting: Bosagh Zadeh and Ben-David (UAI, 2009). Hierarchical setting: Jardine and Sibson (Mathematical Taxonomy, 1971) and Carlsson and Memoli (JMLR, 2010). Characterizations of Linkage-Based Clustering. Partitional setting: Ackerman, Ben-David, and Loker (COLT, 2010). Hierarchical setting: Ackerman & Ben-David (IJCAI, 2011).

  14. Previous Work: Classifications of clustering methods. Fischer and Van Ness (Biometrika, 1971). Ackerman, Ben-David, and Loker (NIPS, 2010).

  15. What's Left To Be Done? Despite much work on clustering properties, some basic questions remain unanswered. Consider some of the most popular clustering methods: k-means, single linkage, average linkage, etc. What are the advantages of k-means over other methods? Previous classifications are missing key properties.

  16. Our Contributions (at a high level). We identify three fundamental categories that clearly delineate some essential differences between common clustering methods. The strength of these categories lies in their simplicity, and we hope they give insight into core differences between popular clustering methods. To define these categories, we first present the weighted clustering setting.

  17. Outline: Formal framework; Categories and classification; A result from each category; Conclusions and future work.

  18. Weighted Clustering. Every element is associated with a real-valued weight, representing its mass or importance. This generalizes the notion of element duplication. Algorithm design, particularly the design of approximation algorithms, is often done in this framework.

  19. Other Reasons to Add Weight: An Example. Apply clustering to facility allocation, such as the placement of police stations in a new district. The distribution of stations should enable quick access to most areas in the district. Quick access to a station may matter more for some institutions than for others, and the weighted setting provides a convenient way to prioritize certain landmarks.

  20. Algorithms in the Weighted Clustering Setting. Traditional clustering algorithms can be readily translated into the weighted setting by considering their behavior on data containing element duplicates.
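To make the duplication idea concrete, here is a minimal Python sketch (my own illustration, not code from the talk) that emulates integer weights by duplicating points and then calls an arbitrary unweighted partitional routine; the name unweighted_algorithm and its (data, k) -> labels signature are assumptions.

def expand_by_weight(points, weights):
    """Replace each point by weights[i] identical copies (integer weights assumed)."""
    expanded = []
    for p, w in zip(points, weights):
        expanded.extend([p] * w)
    return expanded

def weighted_via_duplication(unweighted_algorithm, points, weights, k):
    """unweighted_algorithm: any routine mapping (data, k) -> a list of cluster labels."""
    labels = unweighted_algorithm(expand_by_weight(points, weights), k)
    out, idx = [], 0
    for w in weights:          # the first copy of each original point carries its label
        out.append(labels[idx])
        idx += w
    return out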

  21. Formal Setting. For a finite domain set X, a weight function w : X → R+ defines the weight of every element, and a distance function d : X × X → R+ ∪ {0} defines the distance between domain points.

  22. Formal Setting: Partitional Clustering Algorithm. (X, d) denotes unweighted data; (w[X], d) denotes weighted data. A partitional algorithm maps the input (w[X], d, k) to an output k-partition (k-clustering) of X.

  23. Formal Setting: Hierarchical Clustering Algorithm. A hierarchical algorithm maps the input (w[X], d) to an output dendrogram of X. A dendrogram of (X, d) is a strictly binary tree whose leaves correspond to elements of X. A clustering C appears in A(w[X], d) if each of its clusters is the set of leaves of some subtree of the dendrogram.
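The following small Python sketch (my own representation; the names Node, all_clusters, and appears_in are hypothetical, not from the talk) encodes a dendrogram as a strictly binary tree and checks whether a clustering appears in it, i.e. whether each of its clusters is the leaf set of some subtree.

class Node:
    """A dendrogram node: a leaf holds a point; an internal node has exactly two children."""
    def __init__(self, left=None, right=None, point=None):
        self.left, self.right, self.point = left, right, point

    def leaves(self):
        if self.point is not None:
            return frozenset([self.point])
        return self.left.leaves() | self.right.leaves()

def all_clusters(root):
    """Every subtree of the dendrogram corresponds to a cluster (its set of leaves)."""
    stack, clusters = [root], set()
    while stack:
        node = stack.pop()
        clusters.add(node.leaves())
        if node.point is None:
            stack.extend([node.left, node.right])
    return clusters

def appears_in(clustering, root):
    """A clustering appears in the dendrogram if each cluster is some subtree's leaf set."""
    clusters = all_clusters(root)
    return all(frozenset(c) in clusters for c in clustering)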

  24. Our Contributions. We utilize the weighted framework to identify three fundamental categories describing how algorithms respond to weight, classify traditional algorithms according to these categories, and fully characterize when different algorithms react to weight.

  25. Towards Basic Categories: Range(X, d). Partitional: Range(A(X, d, k)) = {C | ∃w s.t. C = A(w[X], d, k)}, the set of clusterings that A outputs on (X, d) over all possible weight functions. Hierarchical: Range(A(X, d)) = {D | ∃w s.t. D = A(w[X], d)}, the set of dendrograms that A outputs on (X, d) over all possible weight functions.

  26. Outline: Formal framework; Categories and classification; A result from each category; Conclusions and future work.

  27. Categories: Weight Robust. A is weight-robust if for all (X, d), |Range(X, d)| = 1. A never responds to weight.

  28. Categories: Weight Sensitive. A is weight-sensitive if for all (X, d), |Range(X, d)| > 1. A always responds to weight.

  29. Categories: Weight Considering. An algorithm A is weight-considering if 1) there exists (X, d) where |Range(X, d)| = 1, and 2) there exists (X, d) where |Range(X, d)| > 1. A responds to weight on some data sets, but not on others.

  30. Summary of Categories. Range(A(X, d, k)) = {C | ∃w such that A(w[X], d, k) = C} (partitional); Range(A(X, d)) = {D | ∃w such that A(w[X], d) = D} (hierarchical). Weight-robust: for all (X, d), |Range(X, d)| = 1. Weight-sensitive: for all (X, d), |Range(X, d)| > 1. Weight-considering: 1) there exists (X, d) where |Range(X, d)| = 1, and 2) there exists (X, d) where |Range(X, d)| > 1.
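Because Range quantifies over all weight functions, it cannot be computed by sampling; still, a rough empirical probe is sometimes handy, since finding two weightings with different outputs witnesses weight sensitivity on that particular data set (no finite number of samples can certify robustness). A hedged Python sketch, assuming a weighted algorithm with signature weighted_algorithm(points, weights, k) -> labels:

import random

def probe_weight_response(weighted_algorithm, points, k, trials=20, seed=0):
    """Return the set of distinct clusterings produced over random weight functions.
    More than one element witnesses weight sensitivity on this data set; a single
    element is merely consistent with (not proof of) weight robustness."""
    rng = random.Random(seed)
    outputs = set()
    for _ in range(trials):
        weights = [rng.uniform(0.1, 10.0) for _ in points]
        labels = weighted_algorithm(points, weights, k)
        # Canonicalize so that relabelled copies of the same partition compare equal.
        canon = frozenset(frozenset(i for i, lab in enumerate(labels) if lab == c)
                          for c in set(labels))
        outputs.add(canon)
    return outputs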

  31. Connecting To Applications. The desired category depends on the application. In the facility allocation example above, a weight-sensitive algorithm may be preferred. In phylogeny, where sampling procedures can be highly biased, some degree of weight robustness may be desired.

  32. Classification. Weight robust: Min Diameter and K-center (partitional); Single Linkage and Complete Linkage (hierarchical). Weight sensitive: K-means, k-medoids, k-median, and min-sum (partitional); Ward's Method and Bisecting K-means (hierarchical). Weight considering: Ratio Cut (partitional); Average Linkage (hierarchical). For the weight-considering algorithms, we fully characterize when they are sensitive to weight.

  33. Outline: Formal framework; Categories and classification; A result from each category; Classification of heuristics; Conclusions and future work.

  34. Classification. Weight robust: Min Diameter and K-center (partitional); Single Linkage and Complete Linkage (hierarchical). Weight sensitive: K-means, k-medoids, k-median, and min-sum (partitional); Ward's Method and Bisecting K-means (hierarchical). Weight considering: Ratio Cut (partitional); Average Linkage (hierarchical).

  35. Zooming Into: Weight Sensitive Algorithms. We show that k-means is weight-sensitive. A is weight-separable if for any data set (X, d) and any subset S of X with at most k points, there exists a weight function w so that A(w[X], d, k) separates all points of S. Fact: every algorithm that is weight-separable is also weight-sensitive.

  36. K-means is Weight-Sensitive. Theorem: k-means is weight-sensitive. Proof: show that k-means is weight-separable. Consider any (X, d) and any S ⊆ X with at most k points, and increase the weight of the points in S until each belongs to a distinct cluster.
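To illustrate the proof idea concretely, here is a small brute-force Python sketch (my own toy example, not from the talk) that evaluates the weighted k-means objective over every assignment of a four-point data set: with unit weights the nearby pairs are grouped together, but once two chosen points are made heavy enough, the optimal 2-clustering separates them.

from itertools import product
import numpy as np

def weighted_kmeans_cost(X, w, labels, k):
    """Sum over clusters of w(x) times the squared distance from x to its weighted cluster mean."""
    cost = 0.0
    for j in range(k):
        mask = labels == j
        if mask.any():
            centre = np.average(X[mask], axis=0, weights=w[mask])
            cost += float((w[mask] * ((X[mask] - centre) ** 2).sum(axis=1)).sum())
    return cost

def optimal_weighted_kmeans(X, w, k):
    """Exhaustive search over all label assignments (only feasible for tiny toy data)."""
    X, w = np.asarray(X, dtype=float), np.asarray(w, dtype=float)
    best_cost, best_labels = None, None
    for assignment in product(range(k), repeat=len(X)):
        labels = np.array(assignment)
        c = weighted_kmeans_cost(X, w, labels, k)
        if best_cost is None or c < best_cost:
            best_cost, best_labels = c, labels
    return best_labels

pts = [[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]]
print(optimal_weighted_kmeans(pts, [1, 1, 1, 1], k=2))      # nearby pairs end up together
print(optimal_weighted_kmeans(pts, [1e6, 1e6, 1, 1], k=2))  # the two heavy points are split apart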

  37. Zooming Into: Weight Considering Algorithms. We show that Average Linkage is weight-considering and characterize the precise conditions under which it is sensitive to weight. Recall: an algorithm A is weight-considering if 1) there exists (X, d) where |Range(X, d)| = 1, and 2) there exists (X, d) where |Range(X, d)| > 1.

  38. Average Linkage. Average Linkage is a hierarchical algorithm. It starts by creating a leaf for every element, and then repeatedly merges the closest clusters using the following linkage function: the average weighted distance between clusters.
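The linkage formula itself did not survive the transcript; the slide text gives only the phrase "average weighted distance between clusters". A natural formalization, which is my assumption rather than a quote from the talk, averages the pairwise distances with weights w(a)·w(b): ℓ(A, B) = (Σ over a in A, b in B of w(a)·w(b)·d(a, b)) / (Σ over a in A, b in B of w(a)·w(b)). A short Python sketch of weighted average linkage built on that assumption (all names are mine):

def weighted_avg_linkage(A, B, w, d):
    """A, B: clusters as lists of point ids; w: id -> weight; d: (id, id) -> distance."""
    num = sum(w[a] * w[b] * d(a, b) for a in A for b in B)
    den = sum(w[a] * w[b] for a in A for b in B)
    return num / den

def average_linkage_merges(points, w, d):
    """Greedy agglomeration: repeatedly merge the pair of clusters with the smallest linkage.
    Returns the sequence of merges, a simple stand-in for the dendrogram."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        i, j = min(((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
                   key=lambda ab: weighted_avg_linkage(clusters[ab[0]], clusters[ab[1]], w, d))
        merges.append((clusters[i], clusters[j]))
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [clusters[i] + clusters[j]]
    return merges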

  39. Average Linkage is Weight Considering: an (X, d) where |Range(X, d)| = 1. [Figure: four points A, B, C, D and the resulting dendrogram.] The same dendrogram is output for every weight function.

  40. Average Linkage is Weight Considering: an (X, d) where |Range(X, d)| > 1. [Figure: five points A, B, C, D, E; two different weight functions yield two different dendrograms.]

  41. When is Average Linkage Sensitive to Weight? We showed that Average Linkage is weight-considering. Can we show when it is sensitive to weight? We provide a complete characterization of when Average Linkage is sensitive to weight, and when it is not.

  42. Nice Clustering. A clustering is nice if every point is closer to all points within its cluster than to all other points. [Figure: a nice clustering.]

  43. Nice Clustering. A clustering is nice if every point is closer to all points within its cluster than to all other points. [Figure: another nice clustering.]

  44. Nice Clustering. A clustering is nice if every point is closer to all points within its cluster than to all other points. [Figure: a clustering that is not nice.]
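The nice property translates directly into code; the following small Python check (my own helper with hypothetical names, not from the talk) verifies it for a given clustering and distance function:

def is_nice(clustering, d):
    """clustering: a list of clusters (lists of point ids); d: (id, id) -> distance.
    Nice means every point is strictly closer to all points in its own cluster
    than to any point in another cluster."""
    for cluster in clustering:
        others = [q for other in clustering if other is not cluster for q in other]
        for x in cluster:
            within = [d(x, y) for y in cluster if y != x]
            if within and others and max(within) >= min(d(x, z) for z in others):
                return False
    return True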

  45. Characterizing When Average Linkage is Sensitive to Weight. A dendrogram is nice if all of its clusterings are nice. Theorem: |Range(AL(X, d))| = 1 if and only if (X, d) has a nice dendrogram.

  46. Characterizing When Average Linkage is Sensitive to Weight: Proof. Theorem: |Range(AL(X, d))| = 1 if and only if (X, d) has a nice dendrogram. Proof: show that 1) if there is a nice dendrogram for (X, d), then Average Linkage outputs it, and 2) if a clustering that is not nice appears in the dendrogram AL(w[X], d) for some w, then |Range(AL(X, d))| > 1.

  47. Characterizing When Average Linkage is Sensitive to Weight: Proof (cont.). Lemma: if there is a nice dendrogram for (X, d), then Average Linkage outputs it. Proof sketch: 1) assume that (X, d) has a nice dendrogram; 2) main idea: show that every nice clustering of the data appears in AL(w[X], d); 3) for that, we show that each cluster in a nice clustering is formed by the algorithm.

  48. Characterizing When Average Linkage is Sensitive to Weight: Proof (cont.). Given a nice clustering C, it can be shown that for any clusters Ci and Cj of C, any disjoint subsets Y and Z of Ci, and any subset W of Cj, Y and Z are closer (under the linkage function) than Y and W. This implies that C appears in the dendrogram.

  49. Characterizing When Average Linkage Responds to Weight: Proof (cont.). Lemma: if a clustering C that is not nice appears in AL(w[X], d), then |Range(AL(X, d))| > 1. Proof: since C is not nice, there exist points x, y, and z such that x and y belong to the same cluster in C, x and z belong to different clusters, and yet d(x, z) < d(x, y). If x, y, and z are made sufficiently heavier than all other points, then x and z will be merged before x and y, so C will not be formed.

  50. Characterizing When Average Linkage is Sensitive to Weight. Theorem: |Range(AL(X, d))| = 1 if and only if (X, d) has a nice dendrogram. Average Linkage is robust to weight whenever there is a dendrogram of (X, d) consisting of only nice clusterings, and it is sensitive to weight otherwise.
