
Understanding Recursive Clustering Algorithms for Data Analysis
Explore the concept of recursive clustering algorithms for identifying tight-knit groups in data sets. Learn about sparse cuts, dense submatrices, and network flow techniques used in community finding. Discover how these algorithms help in efficient data clustering.
Uploaded on | 0 Views
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
June 2017 HIGH DENSITY CLUSTERS HIGH DENSITY CLUSTERS 1
Idea Shift Density-Based Clustering VS Center-Based. 2
Main Objective Objective: find a clustering of tight knit groups in G. 3
Outline Clustering Algorithm: Recursive Algorithm based on Sparse Cuts Finding Dense Submatrices Community Finding: Network Flow 4
Recursive Clustering-Sparse Cuts For two disjoint sets of nodes S,T, we will define: |????? ???? ? ?? ?| |????? ???????? ?? ? ?? ?| ?,? = S T ? ?,? =? ? 6
Recursive Clustering-Sparse Cuts For a set S, we will define: ? ? = ????(?) S ? ? = 3 7
Recursive Clustering-Sparse Cuts 2 3 S ? ? ?,?\? = ?? 4 1 6 11 8 7 5 10 12 9 8
Let ? be ? ? ??: ? ?,? =? |S| = 3 |W| = 6 ?? ? 4 2 ?? ?? 6 1 3 5 7 ?? ??: : ? ?,? =? |S| = |S| = 3 3 |W| = |W| = 9 9 ? 8 9 9
Recursive Clustering-Sparse Cuts Clusters(G)- List of current clusters Initiailization: Clusters(G) = {V} (One cluster- the graph) Let ? > 0 Rec_Clustering(G, ?, ????????(?)) for each cluster W in Clusters(G) do if (? ? ?.? ? ?,?\? ? |? ? | 1 2|? ? |) do ???????? ? = ???????? ? \{W} ?\S, ? Rec_Clustering(G, ?, ????????(?)) 10
Recursive Clustering-Sparse Cuts Theorem Theorem 7.9 the total number of edges between vertices in different clusters is at most ??log? 7.9 At the termination of Recursive Clustering, Recursive Clustering, 11
Dense Submatrices - Different Approach Let n data points in d-space be represented as a ? ? Matrix (We will assume that A is non negative). Example: The Document-Term matrix. Let D1 be the statement I really really like Clustering and Let D2 be the statement I love Clustering I 1 1 like 1 0 love 0 1 really 2 0 Clustering 1 1 D1 D2 13
????(?????????) ???????(?????) I ??,? ??? ?? ???????? Like 1 T= {2,4} ? = {1,2} Love 2 Really Clustering 14
Dense Submatrices Say we look at A as a bipartite graph, where one side represents Rows(A) and the other Col(A), where the edge (i , j) is given weight ??,? We want ? ????,? ??????? s.t : ????????: ? ?,? ??,? ? ?,? ? 15
Dense Submatrices ?(?,?) ? |?| First Try: ???????? (The average size in the submatrix) ?(?,?) Second Try: ???????? ? |?| ?(?,?) ? |?|, and let Define:? ?,? = ? ? = max ?,??(?,?) the density of A. 16
Dense Submatrices I Like Love Clustering D1 1 1 0 1 D2 1 0 1 1 3 3 2 3 3 2 ? ?????? = = ? ?????? = = 1 4 2 2 17
Dense Submatrices Theorem 7.10 Let A be a ? ? Matrix with entries in (0,1) then ?1(?) ?1(?) ?(?) 4log?log? ?1? Furthermore, we can find S,T such that ? ?,? singular vector 4 log ? log ?using the top 18
Dense Submatrices Special Case: Similarity of the Set ? ? ???? ??? ??????? ??? ????????? ? ? ???? ??? ?,??,??{0,1} For S subgroup of V, What does D(S,S) represent? 1 2 3 4 5 0 1 1 0 0 1 1 0 1 0 0 2 1 1 0 1 1 Let S= {3,4,5} 3 0 0 1 0 1 4 0 0 1 1 0 5 20
Dense Submatrices 1 2 3 4 5 0 1 1 0 0 1 2 1 0 1 0 0 2 1 3 1 1 0 1 1 3 4 5 0 0 1 0 1 4 S= Green ? ?= ? ? ?,? = 0 0 1 1 0 5 21
Community Finding- Similarity of the Set Similarity of the Set Goal: Find the subgraph with maximum average degree in graph G. 22
Community Finding Let G=(V,E) a weighted graph. We define: ? ?,? = ??,? ???,??? Where S,T are two sets of nodes. The density of S will be: ?(?,?) |?| ? ?,? = What are we looking for in terms of density? 23
Flow technique Sub-Problem Let ? > 0,Find a subgraph with ??????? ?????? of at least ?. (Or claim it does not exist!) S= Green ? ?= ? ? ?,? = 24
Flow technique u ? (u,v) 1 v ? 1 s t ? (v,w) w 1 ? x (w,x) edges vertices What kinds of Cuts exist in H? 25
Type 1 Cut u ? (u,v) 1 v ? 1 s t ? (v,w) w 1 ? x (w,x) edges vertices C(S,T) = |E| 26
Type 2 Cut u ? (u,v) 1 v ? 1 s t ? (v,w) w 1 ? x (w,x) edges vertices C(S,T) = ?|V| 27
Type 3 Cut u ? (u,v) 1 v ? 1 s t ? (v,w) w 1 ? x (w,x) edges vertices C(S,T) = ? ??+ ??? 28
Flow technique Theorem: ? ? ???? ??????? ? ??????? ??? ?? ?? ???? ? 29
Flow technique Algorithm: ? ?+(? ?) ? Start with ? = Build Network, and run MaxFlow If we get Type 3 Cut: Look for bigger ? Else: Look for a smaller ? ? ?+ ( Complexity: log ?? ? ) ? (???? ?? ???????) 30
Flow technique - Questions When do we stop? ??? ?1??? ?2?? ??? ????????? ?????? ?? ? ? ?????? ?,then: ??1 ??1 ??2 ??2=??1??2 ??2??1 1. ?1 ?2= ??1??2 2. ??1??2 ?2 3. ??1??2 ??2??1> 0 (different stages of algorithm) and whole, ??1??2 ??2??1 1 ? ?? ?? ?? 31