Purnamrita Sarkar, Deepayan Chakrabarti, and Andrew W. Moore's Collaborative Research

Slide Note

This collaborative research project involves Purnamrita Sarkar from Carnegie Mellon, Deepayan Chakrabarti from Yahoo! Research, and Andrew W. Moore from Google, Inc. Their combined expertise offers a unique perspective on the subject matter, promising innovative insights and solutions in the field. With diverse backgrounds and experiences, this team is poised to make significant contributions to the academic and industrial landscapes.

weis_lip Follow

Uploaded on Feb 27, 2025 | 0 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

Download Presentation

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript

Purnamrita Sarkar (Carnegie Mellon) Deepayan Chakrabarti (Yahoo! Research) Andrew W. Moore (Google, Inc.)

Which pair of nodes {i,j} should be connected? Variant: node i is given Alice Bob Charlie Friend suggestion in Facebook Movie recommendation in Netflix

Predict link between nodes With the minimum number of hops With max common neighbors (length 2 paths) Alice Prolific common friends Less evidence 1000 followers Bob Less prolific Much more evidence 8 followers Charlie The Adamic/Adar score gives more weight to low degree common neighbors.

Predict link between nodes With the minimum number of hops With more common neighbors (length 2 paths) With larger Adamic/Adar With more short paths (e.g. length 3 paths )

Especially if the graph is sparse How do we justify these observations? Link prediction accuracy* Random Shortest Common Neighbors Adamic/Adar Ensemble of short paths Path *Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

Raftery et al.s Model: Points close in this space are more likely to be connected. Unit volume universe Nodes are uniformly distributed in a latent space The problem of link prediction is to find the nearest neighbor who is not currently linked to the node. Equivalent to inferring distances in the latent space 6

Two sources of randomness Point positions: uniform in D dimensional space Linkage probability: logistic with parameters , r , r and D are known Higher probability of linking determines the steepness 1 radius r 7

Especially if the graph is sparse Link prediction accuracy Random Shortest Path Common Neighbors Adamic/Adar Ensemble of short paths *Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

j i Pr2(i,j) = Pr(common neighbor|dij) = Pr (i, j) Pr( ~ d | ) Pr( ~ d | ) d ( P d , d | ) d d i k j k 2 ik jk ik jk ij ik jk Product of two logistic probabilities, integrated over a volume determined by dij As Logistic Step function Much easier to analyze!

Everyone has same radius r Unit volume universe j i =Number of common neighbors = Pr (i, j) A(r, r, d ) 2 ij Empirical Bernstein + = P A(r, r, d ) 1 2 ij N N V(r)=volume of radius r in D dims / 1 / 2 D D + Bounds on distance /N /N 2 1 d 2 1 r r ij V(r) V(r) 10

OPT = node closest to i MAX = node with max common neighbors with i Theorem: w.h.p dOPT dMAX dOPT + 2[ /V(1)]1/D = c1 (varN/N) + c2/(N-1) Common neighbors is an asymptotically optimal heuristic as N D=dimensionality

Node k has radius rk . i k if dik rk (Directed graph) rk captures popularity of node k Type 2: i k j Type 1: i k j k k j rj j rk i i rk ri A(ri , rj ,dij) A(rk , rk ,dij) 12

Example graph: N1 nodes of radius r1 and N2 nodes of radius r2 r1 << r2 2 ~ Bin[N2 , A(r2, r2, dij)] 1 ~ Bin[N1 , A(r1, r1, dij)] k i j Maximize Pr[ 1 , 2 | dij] = product of two binomials w(r1)E[ 1|d*] + w(r2) E[ 2|d*] = w(r1) 1 + w(r2) 2 RHS LHS d*

Jacobian A d = w(r) ij A(1 { Variance const A) Adamic/Adar Small variance Presence is more surprising const w(r) Small variance Absence is more surprising 1 r deg D 1/r r is close to max radius Real world graphs generally fall in this range

Especially if the graph is sparse Link prediction accuracy Random Shortest Path Common Neighbors Adamic/Adar Ensemble of short paths *Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

Common neighbors = 2 hop paths Analysis of longer paths: two components 1. Bounding E( l | dij). [ l= # l hop paths] Bounds Prl (i,j) by using triangle inequality on a series of common neighbor probabilities. 2. l E( l | dij) Triangulation

Common neighbors = 2 hop paths Analysis of longer paths: two components 1. Bounding E( l | dij) [ l= # l hop paths] Bounds Prl (i,j) by using triangle inequality on a series of common neighbor probabilities. 2. l E( l | dij) Bounded dependence of lon position of each node Can use McDiarmid s inequality to bound | l -E( l|dij)|

Bound dij as a function of l using McDiarmids inequality. - 1 ( ) + dij r ( 1)r g , N, For l lwe need l >> l to obtain similar bounds Also, we can obtain much tighter bounds for long paths if shorter paths are known to exist.

1 Factor weak bound for Logistic d + + A e (V A) Pr (i, j) A c(r, D) , ij 1 1 4 2 2 Can be made tighter, as logistic approaches the step function.

Three key ingredients 1. Closer points are likelier to be linked. Small World Model- Watts, Strogatz, 1998, Kleinberg 2001 2. Triangle inequality holds necessary to extend to l hop paths 3. Points are spread uniformly at random Otherwise properties will depend on location as well as distance

In sparse graphs, length 3 or more paths help in prediction. Differentiating between different degrees is important Link prediction accuracy* For large dense graphs, common neighbors are enough The number of paths matters, not the length Random Shortest Common Neighbors Adamic/Adar Ensemble of short paths Path *Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

Link Prediction Heuristics Generative model A few properties Most likely neighbor of node i ? node b node a Compare Can justify the empirical observations We also offer some new prediction algorithms 23

Combine bounds from different radii But there might not be enough data to obtain individual bounds from each radius New sweep estimator Qr= Fraction of nodes w. radius r, which are common neighbors. 2/D c 1 E(Q ) A(r, r, d ) d 2r 1 Q V(r) N r ij ij r r Higher Qr smaller dij w.h.p

Qr= Fraction of nodes w. radius r, which are common neighbors larger Qr smaller dij w.h.p TR: = Fraction of nodes w. radius R, which are common neighbors. 1/D c 1 + T A(R, R, d ) d 2R 1 T V(R) N R ij ij R R Smaller TR large dij w.h.p

Number of common neighbors of a given radius r TR = Fraction of nodes with radius R which are common neighbors Qr = Fraction of nodes with radius r which are common neighbors Small TR large dij Large Qr small dij

Purnamrita Sarkar, Deepayan Chakrabarti, and Andrew W. Moore's Collaborative Research

Download Presentation

Presentation Transcript

Related

More Related Content