Enumeration Framework for Clique and Subgraph Analysis

1 / 54

Embed Share

Explore the output-sensitive enumeration framework for analyzing cliques, maximal cliques, and dense subgraphs. Utilizing reverse search algorithms and family tree realization techniques to efficiently identify solutions. Discover the complexities and alternative output strategies for optimal computation times.

kblak Follow

Uploaded on Jun 09, 2025 | 1 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

Download Presentation

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript

Output Sensitive Enumeration 4. Reverse Search Clique & Others Framework Maximal Clique Pseudo Clique (Dense Subgraph) Base of Polytope

4-1. Reverse Search and Maximal Clique Enumeration

Reverse Search Avis & Fukuda 96 modified by U For every solution except for several, define its parent, so that any solution is not its proper ancestor(acyclic) The parent-child relation induces a tree (or a forest) Traverse the tree by depth-first search # of iterations is equal to # of solutions Computation time per solution is that per iteration

Realization Depth-first search on induced tree (called, family tree) no need to store the tree in the memory (or disk) Algorithm for finding all children of a parent is sufficient Particularly, it is better to have an algorithm that finds the (i+1)- th child by giving i-th child ReverseSearch (S) 1. output S 2. for eachchild S of S 3.call ReverseSearch (S ) 4. end for

Complexity each iteration = each solution (#iterations = #solutions) If finding a (next) child takes O(X) time, the computation time per iteration is O(X) ( finding one child = children enumeration time / #children ) the computation time per solution is O(X) Output polynomial if X is polynomial Space = memory usage of iterations and height of the family tree Using (find (i+1)-th child), height is eliminated O(X) delay by alternative output

Alternative Output Uno 2002 Alternative output is a technique for reducing the delay (avoid long path (going up) with no output) Suppose that an enumeration algorithm takes O(X) time in each iteration, and always outputs a solution AlternativeOutput (S) 1. if depth is even output S 2. for eachchild S of S callAlternativeOutput (S ) 3. if depth is odd output S Delay is O(X)

4-2. Maximal Clique Enumeration

Clique Enumeration Makino & Uno 04 Clique: a subgraph that is a complete graph (any two vertices are connected Finding a maximum size is NP-complete Bipartite clique enumeration is converted to clique enumeration Finding a maximal clique is easy ( O(|E|) time ) Many researches and many applications, with many models

Monotone Set of cliques is monotone, since any subset of a clique is also a clique 111 1 Backtracking works cliques 000 0 The check being a clique takes O(|E|) time, and at most |V| recursive calls 1,2,3,4 1,2,3 1,2,4 1,3,4 2,3,4 O(|V| |E|) per clique 1,3 1,4 2,3 2,4 3,4 1,2 1 3 4 2

Motivations Real-world graphs are usually sparse, thus clique sizes are small On the other hand, large cliques also exist #cliques explodes Enumeration of maximal ones looks better + the number reduces to 1/10 1/1000 + no information loss (any clique is included in some maximal) + maximal cliques are complete in some sense, and non-maximals are incomplete, thus good for modeling Clique

Difficulty on the Search Maximal cliques are tops of the mountains Impossible to move to each other, only with simple operation 111 1 No maximal near by start Backtrack doesn t work cliques Introduce more sophisticated adjacency on maximal cliques 000 0

Adjacency on Maximal Cliques C(K) := lexicographically smallest maximal clique including K (greedily add vertices from the smallest index) For maximal clique K, remove vertices iteratively, from largest index At the beginning C(K) = K, but at some point C(K) original K Define the parent P(K) of K by the maximal clique (uniquely defined) . The lexicographically smallest maximal clique (= root) has no parent P(K) is always lexicographically smaller than K the parent-child relation is acyclic, thereby induces tree

Example The parent-child relation on the left graph 1, 2 3 4 1, 3, 5 2, 4, 6, 8 1 2 7 3 10 11 5 4 6 3, 5, 7, 9, 12 7 6, 8, 10 4, 8, 11 8 9 10 11 12 11 11 12 9, 11 8, 10, 11 10, 12

Finding Children K[v] The maximal clique obtained by adding vertex v to K, remove vertices not adjacent to v, and take C() K[v] := C(K N(v) {v}) K is a child of K K = K[v]for some v K[v] for all v are sufficient to check For each K[v], we compute P(K[v]) If it is equal to K to, K[v] is a child of K All children of K can be found by at most |V| checks, thus an iteration takes O(|V| |E|) time O(|V| |E|) per maximal clique Note that C(K) and P(K) can be computed in O(|E|) time

Example The parent-child relation on the left graph The red-lines are moves by K[v] 1, 2 3 4 1, 3, 5 2, 4, 6, 8 1 2 7 3 10 11 5 4 6 3, 5, 7, 9, 12 7 6, 8, 10 4, 8, 11 8 9 10 11 12 11 11 12 9, 11 8, 10, 11 10, 12

Finding Children Quickly K[v] The maximal clique obtained by adding vertex v to K, remove vertices not adjacent to v, and take C() K[v] := C(K N(v) {v}) K is a child of K K = K[v]for some v v is adjacent to no vertex in K K[v]= C({v}) P(K[v]) is root if K root, v is adjacent to none of K K[v] is not a child We have to check only the vertices adjacent to some of K, that are at most ( +1)2

Pseudo Code for Maximal Clique EnumMaxcliq (K) 1. output K 2. for eachvertex v not in K and adjacent to a vertex of K 3. K := K[v]( = C( K N(v) v ) ) 4. if P(K ) = Kthen call EnumMaxcliq (K ) 5. end for

Computing C(K) CAND(K) the set of vertices adjacent all vertices in K To compute C(K), we add to K the minimum index among CAND(K), until CAND(K) = CAND(K v) thus computable in O( ) time ( :maximum degree) CAND(K) N(v) Repetitions (=maximum clique size) is at most , the total time is O( 2) C(K) can be computed in O( 2) time

Computing P(K) Let r(v) be #vertices in K adjacent to v r(v) = |K| addible to K Delete vertices in K from maximum index, and update r(v) for all necessary v deletion of u needs O( (u))time for update) If a vertex v satisfies r(v) = |K| after deleting u, compare v and u If v < u, C(K-{, ,v}) never include u, thus it is the parent P(K) can be computed in O( 2) time

Two Routines CompCK (K={v1, ,vk}) 1. K := K, CAND := N(v1) N(vk) 2.ifCAND = then returnK 3.v :=minimum vertex in CAND 4.K := K v, CAND := CAND N(v) 5. go to 2. CompPK (K) 1. for each vertex v, r(v) := 0 2. for each vertex v in K, for each vertex u in N(v)-K, r(u) := r(u)+1 3. remove v from K that has maximum index 4.for each vertex u in N(v), r(u) := r(u)-1 5.find minimum u among vertices r(u) = |K| 6.ifu < v or K = then returnC(K) 7. go to 3. compute r(v) use buckets and update them for this

Complexity Analysis EnumMaxcliq (K) 1. outputK 2.for each vertex v adjacent to some vertices of K 3.K := K[v] ( = C( K N(v) v ) ) 4.ifP(K ) = Kthen callEnumMaxcliq (K) 5.end for O( 2) repetitions O( ) time O( 2) time O( 2) time Taken together, each iteration takes O( 4) time

4-3. Pseudo Cliques

Considering Ambiguity Real world network data often includes missing-edges, thus communities/clusters to be found are decomposed to many small cliques Subgraphs having many edges but few missing edges would include or approximate those communities/clusters By finding pseudo cliques, we can find better incite for approximation of communities/clusters

Definition of Pseudo Clique ave. ratio of vertices adjacent to a vertex For a vertex setK, the density of K is (# of edges connecting vertices inK) (|K|-1)|K| /2 -K is a clique density is 1 -K is an independent set if density is high, K is nearly a clique maximum # of edges in S density is 0 For given , K is a pseudo clique (density of K) We want to solve the problem of enumerating all pseudo cliqus of the given graph

Existing Results Easy to find one pseudo clique two connected vertices always form a pseudo clique Finding a pseudo clique of size k is NP-complete Reducing k-clique problem by setting = 1 Approximation algorithms for maximizing the density for size k - O(|V|1/3- ) approximation algorithm - O((n/k) ) approx. if optimal solution is dense [Tokuyama el al.] - PTASif (n2) edges [Arora et al.] Many heuristic algorithms in data mining, data engineering, natural sciences

Monotonicity The set system of pseudo cliques is not a monotone set (independent set system) A missing edge can be a subset of a pseudo clique, but it is not a pseudo clique so, simple backtracking doesn t work Moreover, binary partition doesn t work in general, because of the extension hardness

Simple Binary Search A straightforward approach is branch and bound In each iteration, divide the problem into two non-empty problems by the inclusion of a vertex v1 v1 v1, v2 v1, v2 v1, v2 v1, v2 The existence of pseudo clique is NP-comp.

Proof of Extension Hardness Theorem 1 For given graph G, threshold , and vertex set U, the problem of checking the existence of a pseudo clique including U is NP-complete Proof: reducing the problem of clique of k vertices Add 2|V|2 vertices as U |V|2 -1 |V|2 = + input graph G=(V,E) only (U+ clique) is pseudo clique density increases by increase of pseudo clique size setting so that clique of size at least k induces a pseudo clique density =|V|2 -1 |V|2

Is This Really Hard? We proved NP-hardness for "very dense graphs" unclear for middle dense graph possibility for polynomial time enumeration = 1 hard easy ????? easy = 0

Reverse Search: Def. Parent v*(K) : min. deg. min. index vertex in G[K] The parent of pseudo clique K K v*(K) The parent of K K Density of K= ave. degree G[K] / (|K|-1) The parent is the removal of most "sparse" vertex from K, thus is a pseudo clique The parent is smaller than its child acyclic relation

Ex. Enumeration Tree threshold = .7 1 2 3 4 5 6 7

Finding Children A child is obtained by adding a vertex to the parent degK(v): #vertices in K adjacent to v (can be maintained in O( )time for vertex addition) K v is a child of K K v is a pseudo clique v*(K v) = v lower bound for degK(v) upper bound for degK(v) - degK(v)< min. deg. of K - degK(v) > min. deg. of K+1 K v is always a child K v never be a child min. deg. of K or+1 degK(v) next slide

Detailed Condition S(K): sequence of vertices in K in the order of (degree, index) v is a child v is the top of S(K v) top of S(K) is v*(K) v is child only ifv is adjacent to all vertices preceding to v in S(K) For each vertex, find the first "non-adjacent vertex" in S(K) This can be done in O( 2) time Computation time for one iteration is O( 2 + log |V|) ( O( k+ log |V|) if k-degenerate)

Pseudo Code for Pseudo Clique EnumPseudoClique (K) 1. output K 2. for eachvertex v not in K ,such that K v is a pseudo clique and v is the minimum index minimum degree vertex in K v 3. call EnumPseudoClique (K v) 4. end for

Implementation Code is a simple version - update |degK(vi)| at each addition adding u to K takes O(deg(u)) time - to find children, vi satisfying |K|(|K|+1) - (#edges in K) | degK(vi)| d*(K)+1 O( C d*(K)) = O(|E|) time + O(1) time for each C := #vertices vi, | degK(vi)| = d*(K), d*(K)+1 Seems to be not large for # of children

Problem Instances Pentium M 1.1GHz, 256MB memory, Cygwin,C, gcc Test instances are: - random graphs (make edge with probability p), - locally dense random graphs (vertex i is adjacent to vertices from i-k to i+k with probability 1/2 - graphs generated from real-world data (co-author graph)

Random Graphs p= 0.1, #vertices = 200 to 2000, threshold 0.8, 0.9 random graph p=0.1 random graph p=0.1 #clique time per 1M clique time clique #p-clique 0.9 time per 1M 0.9 time 0.9 #p-clique 0.8 time per 1M 0.8 time 0.8 1000000000 100000000 10000000 1000000 100000 10000 time (sec) & #cliques time (sec) & #cliques 1000 100 10 1 0.1 0.01 #vertices 200 282 400 565 800 1131 1600 2262 3200 4524 6400 Computation time linearly increase as ave. degree

Locally Dense Random Graph make edge from a vertex to its neighbors with p=0.5 #vertices 100 to 25600, threshold 0.8, 0.9 locally dense random graph 1000000000 #clique time per 1M clique 100000000 10000000 time clique 1000000 #p-clique 0.9 100000 time (sec) & #cliques time per 1M 0.9 10000 1000 time 0.9 100 #p-clique 0.8 10 time per 1M 0.8 1 0.1 time 0.8 0.01 3E+05 1000 4000 16000 64000 #vertices 10 times slower than clique enumeration computation time per one clique does not change

Randomly Generated Scale Free Graph Add vertices of degree 10 iteratively, to a clique of 10 vertices Vertices to be connected are chosen according to their current degrees 10000000 1000000 100000 #clique time per 1M clique time clique #p-clique 0.9 time per 1M 0.9 time 0.9 #p-clique 0.8 time per 1M 0.8 time 0.8 10000 1000 100 10 time & #cliques 1 0.1 0.01 1000 2000 4000 8000 16000 32000 64000 128000 256000 #vertices Computation time increases quite slowly

Real-world Instance co-author graph of academic paper database #vertices = 30,000, #edges = 125,000, scale free 1000000000 real-world data 10000000 100000 #p-clique time time per 1M time & #p-cliques 1000 10 0.1 1 1 0.9 0.98 0.95 0.93 0.88 0.85 0.83 threshold Computation time for one pseudo clique does not depend on threshold

Bottom-wideness Why good in practice? The algorithm generates several recursive calls recursion tree expands exponentially by going down computation time is dominated by the lowest levels On lower levels, small degree vertices are added fast! Long time Short time When pseudo cliques are sufficiently large (over 5?) min. degree is small on average computation time is short on average at lower levels

Weakly Accessible Set System A set system I is called weakly accessible if for any X I, there is an element e X such that X e I The parent of pseudo clique satisfies this, so the collection of pseudo cliques is weakly accessible For any weakly accessible set system, we can construct a reverse search algorithm for the enumeration of all its members

Weakly Accessible Set System A set system I is called weakly accessible if for any X I, there is an element e X such that X e I The parent of pseudo clique satisfies this, so the collection of pseudo cliques is weakly accessible For any weakly accessible set system, we can construct a reverse search algorithm for the enumeration of all its members For example, we define the parent of X I by X that e is minimum index among all those removable elements, i.e., e = min {e | e X, X e I } e such that

Pseudo Code for Weakly Accessible EnumWeaklyAccessible (X) 1. output X 2. for each element e not in X 3. if X e is a member then 4. if e = min {e | e X e, (X e) call EnumWeaklyAccessible (X e) 5. end for e I }then

4-4. Bases of Polytopes

Vertex Enumeration Vertex enumeration is the problem of outputting all vertices of the given polytope, represented by a set of linear inequalities a1x b1 a2x b2 ax b Existence of output-polynomial time enumeration algorithm is a long- standing open problem

Feasible Base Enumeration A vertex is represented by a set of d inequations (hyperplanes) (d is the dimension of the polytope) Enumerate all sets of d inequations that are corresponding to vertices of the polytope (feasible bases) (a1x b1, a2x b2) (aix bi, ajx bj) Actually, different problem; a vertex can correspond to exponentially many feasible bases

Difficulty by Binary Partition Even though linear programming can find a vertex the binary partition approach is hard For binary partition, we have to determine whether there is a vertex satisfying + inequations of S with equation + inequations of X not with equation Actually, this problem is NP-complete in general

Simplex Method Feasible bases are the base idea of simplex method; it starts from some, and it climbs the polytope to the optimal solution, by iteratively exchanging an inequality of the feasible base and an inequality outside by the exchange, we can traverse the feasible bases (completely) Actually, different problem; a vertex can correspond to exponentially many feasible bases

Parent-Child Relation Avis & Fukuda 96 The movement of simplex method, from a feasible bases to another, naturally gives a parent-child relation (move from a child to its parent) (Notice: the parent is defined as a result of computation!!!) The move is done in polynomial time, and the number of children is limited, since it exchange two inequalities Thus, output linear time The depth of the recursion can be exponential, but alternative output attains polynomial time delay

Enumeration Framework for Clique and Subgraph Analysis

Download Presentation

Presentation Transcript

Related

More Related Content