
Node Similarity and Collaboration Metrics
Explore various metrics and algorithms for measuring node similarity and predicting collaborations in a network. From Jaccard's Coefficient to Rooted PageRank, learn how to analyze network structures effectively.
Presentation Transcript
Graph distance: (negated) length of the shortest path between x and y. In the example graph: score(A, C) = −2, score(C, D) = −2, score(A, E) = −3.

Common neighbors: score(x, y) := |Γ(x) ∩ Γ(y)|, where Γ(x) denotes the set of neighbors of x. In the example, A and C have 2 common neighbors, so they are more likely to collaborate.
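As a sketch, the two scores above can be computed on a small adjacency-dict graph. The graph below is illustrative, not the exact example from the slides:

```python
from collections import deque

# Toy undirected graph as an adjacency dict (illustrative).
graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"A", "C", "E"},
    "E": {"D"},
}

def graph_distance(g, x, y):
    """Negated length of the shortest path between x and y (via BFS)."""
    if x == y:
        return 0
    seen, frontier, dist = {x}, deque([x]), {x: 0}
    while frontier:
        node = frontier.popleft()
        for nbr in g[node]:
            if nbr not in seen:
                seen.add(nbr)
                dist[nbr] = dist[node] + 1
                if nbr == y:
                    return -dist[nbr]
                frontier.append(nbr)
    return float("-inf")  # y unreachable from x

def common_neighbors(g, x, y):
    """score(x, y) = |Γ(x) ∩ Γ(y)|"""
    return len(g[x] & g[y])

print(graph_distance(graph, "A", "C"))    # -2 (shortest path A-B-C)
print(common_neighbors(graph, "A", "C"))  # 2 (B and D)
```

Negating the distance makes both metrics "larger is better", so all predictors can be ranked the same way.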
Jaccard's coefficient: the same idea as common neighbors, adjusted for degree: score(x, y) := |Γ(x) ∩ Γ(y)| / |Γ(x) ∪ Γ(y)|.

Adamic/Adar: weights rarer common neighbors more heavily: score(x, y) := Σ_{z ∈ Γ(x) ∩ Γ(y)} 1 / log |Γ(z)|.
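A minimal sketch of both degree-adjusted scores on the same toy adjacency dict (an illustrative graph, not the slide's):

```python
import math

# Toy undirected graph as an adjacency dict (illustrative).
graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"A", "C", "E"},
    "E": {"D"},
}

def jaccard(g, x, y):
    """score(x, y) = |Γ(x) ∩ Γ(y)| / |Γ(x) ∪ Γ(y)|"""
    return len(g[x] & g[y]) / len(g[x] | g[y])

def adamic_adar(g, x, y):
    """score(x, y) = Σ_{z ∈ Γ(x) ∩ Γ(y)} 1 / log |Γ(z)|"""
    # A shared neighbor z is adjacent to both x and y, so |Γ(z)| ≥ 2
    # and the logarithm is never zero.
    return sum(1.0 / math.log(len(g[z])) for z in g[x] & g[y])

print(jaccard(graph, "A", "C"))      # 1.0: Γ(A) and Γ(C) are both {B, D}
print(adamic_adar(graph, "A", "C"))  # 1/log 2 + 1/log 3
```

The Adamic/Adar sum rewards low-degree shared neighbors: meeting through a rare collaborator is stronger evidence than meeting through a hub.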
Preferential attachment: the probability that a new collaboration involves x is proportional to |Γ(x)|, the number of current neighbors of x: score(x, y) := |Γ(x)| · |Γ(y)|.
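This score is just a product of degrees, as in the following sketch on the same illustrative adjacency dict:

```python
# Toy undirected graph as an adjacency dict (illustrative).
graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"A", "C", "E"},
    "E": {"D"},
}

def preferential_attachment(g, x, y):
    """score(x, y) = |Γ(x)| · |Γ(y)|"""
    return len(g[x]) * len(g[y])

print(preferential_attachment(graph, "A", "D"))  # 2 * 3 = 6
```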
Katz measure: score(x, y) := Σ_{ℓ≥1} β^ℓ · |paths⟨ℓ⟩_{x,y}|, where paths⟨ℓ⟩_{x,y} is the set of length-ℓ paths between x and y, and β is chosen to be a very small value (for dampening).
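A hedged sketch of a truncated Katz score, assuming the standard trick that entries of the ℓ-th adjacency-matrix power count length-ℓ walks; the graph, β = 0.05, and the truncation length are all illustrative choices:

```python
# Truncated Katz score on a toy graph (illustrative, not the slide's graph).
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D"), ("D", "E")]
idx = {node: i for i, node in enumerate(nodes)}
n = len(nodes)

# Build the symmetric adjacency matrix.
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[idx[u]][idx[v]] = A[idx[v]][idx[u]] = 1

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def katz(x, y, beta=0.05, max_len=6):
    """score(x, y) ≈ Σ_{ℓ=1..max_len} β^ℓ · (#walks of length ℓ from x to y)"""
    score, power = 0.0, A
    for _ in range(max_len):
        score += 0.0  # placeholder removed below
        break
    score, power = 0.0, A
    for l in range(1, max_len + 1):
        score += (beta ** l) * power[idx[x]][idx[y]]
        power = matmul(power, A)  # A^(l+1): walks of length l+1
    return score

print(katz("A", "B"), katz("A", "C"))  # the adjacent pair scores higher
```

Because β is tiny, short connections dominate: a direct edge contributes β, while each length-2 path contributes only β².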
Hitting time: H_{x,y} is the expected number of steps for a random walk starting at x to reach y. Commute time: C_{x,y} := H_{x,y} + H_{y,x}. If y has a large stationary probability π_y, H_{x,y} is small regardless of x; to counterbalance, we can normalize by π_y. Rooted PageRank: to cut down on long random walks, the walk can return to x with a probability α at every step.
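Rooted PageRank can be sketched with a Monte-Carlo walk: restart at the root x with probability α each step, otherwise move to a uniform random neighbor, and score each node y by the fraction of time spent there. This is an illustrative estimate on a toy graph, not the exact linear-algebra solution:

```python
import random

# Toy undirected graph as an adjacency dict (illustrative).
graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"A", "C", "E"},
    "E": {"D"},
}

def rooted_pagerank(g, x, alpha=0.15, steps=200_000, seed=0):
    """Estimate score(x, y) as the visit frequency of y in a restarting walk."""
    rng = random.Random(seed)
    visits = dict.fromkeys(g, 0)
    cur = x
    for _ in range(steps):
        visits[cur] += 1
        # With probability alpha jump back to the root, else step to a
        # uniformly random neighbor (sorted for deterministic ordering).
        cur = x if rng.random() < alpha else rng.choice(sorted(g[cur]))
    return {node: count / steps for node, count in visits.items()}

scores = rooted_pagerank(graph, "A")
print(scores)  # root A highest; distant E lowest
```

Nodes close to the root accumulate probability mass, so the estimate ranks candidate partners of x by walk proximity rather than raw distance.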
Defined by a recursive definition (as in SimRank): two nodes are similar to the extent that they are joined to similar neighbors.
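The recursion can be solved by fixed-point iteration, SimRank-style. The decay constant C = 0.8 and the iteration count below are assumed for illustration, not taken from the slides:

```python
# Toy undirected graph as an adjacency dict (illustrative).
graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"A", "C", "E"},
    "E": {"D"},
}

def simrank(g, C=0.8, iters=10):
    """Iterate: sim(a, b) = C * average of sim over neighbor pairs of a and b."""
    nodes = sorted(g)
    sim = {(a, b): (1.0 if a == b else 0.0) for a in nodes for b in nodes}
    for _ in range(iters):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                elif g[a] and g[b]:
                    total = sum(sim[(u, v)] for u in g[a] for v in g[b])
                    new[(a, b)] = C * total / (len(g[a]) * len(g[b]))
                else:
                    new[(a, b)] = 0.0
        sim = new
    return sim

sim = simrank(graph)
print(round(sim[("A", "C")], 3))  # high: A and C share both neighbors
```

Each node is maximally similar to itself, and similarity propagates outward through shared (and recursively similar) neighborhoods.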
Idea: delete tenuous (sparse) edges with a clustering procedure, then run the predictors on the cleaned-up subgraphs.
Even the best predictor (the Katz measure) is correct on only 16% of predictions. How good is that? Perhaps more information about the meaning of nodes and edges is required.
In sparse graphs, paths of length 3 or more help in prediction, and differentiating between different degrees is important. For large dense graphs, common neighbors are enough: the number of paths matters, not their length.
User–item rating matrix ("?" marks a missing rating to be predicted; the same matrix appears on three consecutive slides, highlighting different cells):

         Item 1  Item 2  Item 3  Item 4  Item 5
User 1     8       1       ?       2       7
User 2     2       ?       5       7       5
User 3     5       ?       7       4       7
User 4     7       1       7       3       8
User 5     1       7       4       6       ?
User 6     8       3       8       3       ?
Restricting attention to the columns for items 3, 4, and 5: how similar are items 3 and 5? How do we calculate their similarity?
Only consider users who have rated both items. One simple measure: for each such user, calculate the difference in ratings for the two items, then take the average of this difference over the users. Alternatively, take the cosine of the two rating vectors. For items 3 and 5, the users who rated both give the vectors (5, 7, 7) and (5, 7, 8):

sim(item 3, item 5) = cosine((5, 7, 7), (5, 7, 8))
                    = (5·5 + 7·7 + 7·8) / (√(5² + 7² + 7²) · √(5² + 7² + 8²))

We can also use Pearson correlation coefficients, as in user-based approaches.
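A sketch of the cosine computation over co-rating users. The three co-rated entries follow the slide's worked example for items 3 and 5; the remaining column entries are assumptions made to complete the example:

```python
import math

# Rating columns for two items; a user appears only if they rated the item.
# Users 2, 3, 4 rated both items, matching the slide's vectors (5,7,7), (5,7,8).
item3 = {"User 2": 5, "User 3": 7, "User 4": 7, "User 5": 4, "User 6": 8}
item5 = {"User 1": 7, "User 2": 5, "User 3": 7, "User 4": 8}

def cosine_sim(a, b):
    """Cosine similarity restricted to users who rated both items."""
    users = sorted(set(a) & set(b))
    va = [a[u] for u in users]
    vb = [b[u] for u in users]
    dot = sum(x * y for x, y in zip(va, vb))
    return dot / (math.sqrt(sum(x * x for x in va)) *
                  math.sqrt(sum(y * y for y in vb)))

print(cosine_sim(item3, item5))
# (5·5 + 7·7 + 7·8) / (√(5²+7²+7²) · √(5²+7²+8²)) ≈ 0.998
```

Restricting to co-rating users keeps the two vectors aligned component-by-component, which is what makes the cosine meaningful.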
r̂(user1, item3) = α · [ r(user1, item1) · sim(item1, item3)
                       + r(user1, item2) · sim(item2, item3)
                       + r(user1, item4) · sim(item4, item3)
                       + r(user1, item5) · sim(item5, item3) ]

where α is a normalization factor, α = 1 / Σᵢ sim(itemᵢ, item3).
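The similarity-weighted prediction can be sketched as follows. User 1's known ratings for items 1, 2, 4, 5 are taken from the example matrix; the sim(itemᵢ, item3) values are hypothetical placeholders, not computed from the data:

```python
# Predict User 1's missing rating of item 3 as a similarity-weighted
# average of User 1's other ratings.
r = {"Item 1": 8, "Item 2": 1, "Item 4": 2, "Item 5": 7}
# Hypothetical item-to-item-3 similarities (placeholders for illustration).
sim_to_item3 = {"Item 1": 0.3, "Item 2": 0.2, "Item 4": 0.6, "Item 5": 0.9}

alpha = 1.0 / sum(sim_to_item3.values())            # normalization factor
prediction = alpha * sum(r[i] * sim_to_item3[i] for i in r)
print(round(prediction, 2))  # ≈ 5.05 with these placeholder similarities
```

The normalization by the total similarity keeps the prediction on the same rating scale as the inputs.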
P1, P2, P3: feature vectors with class labels — [1, 2, 0, …, 1] → +1; [0, 0, 1, …, 1] → −1
Objective: learn an embedding f of the nodes that maximizes the likelihood of each node's sampled neighborhood:

    max_f Σ_{u ∈ V} log Pr(N_S(u) | f(u))

where f(u) is the embedding of node u and N_S(u) is the neighborhood of u generated by the sampling strategy S.
BFS and DFS explorations of the neighborhood of u give two extremes:
N_BFS(u) = {s1, s2, s3} (local, breadth-first view)
N_DFS(u) = {s4, s5, s6} (deeper, depth-first view)
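A sketch of how size-k BFS and DFS neighborhoods differ, on a hypothetical graph where u has immediate neighbors s1–s3 and a chain s4–s6 hangs off s1 (not the slide's exact graph):

```python
from collections import deque

# Hypothetical graph: u's local neighbors plus a deeper chain through s1.
graph = {
    "u":  {"s1", "s2", "s3"},
    "s1": {"u", "s4"},
    "s2": {"u"},
    "s3": {"u"},
    "s4": {"s1", "s5"},
    "s5": {"s4", "s6"},
    "s6": {"s5"},
}

def bfs_neighborhood(g, u, k):
    """First k nodes reached breadth-first from u (u excluded)."""
    seen, order, dq = {u}, [], deque([u])
    while dq and len(order) < k:
        node = dq.popleft()
        for nbr in sorted(g[node]):
            if nbr not in seen and len(order) < k:
                seen.add(nbr)
                order.append(nbr)
                dq.append(nbr)
    return order

def dfs_neighborhood(g, u, k):
    """First k nodes reached depth-first from u (u excluded)."""
    seen, order = {u}, []
    stack = sorted(g[u], reverse=True)
    while stack and len(order) < k:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        stack.extend(sorted(g[node], reverse=True))
    return order

print(bfs_neighborhood(graph, "u", 3))  # ['s1', 's2', 's3'] — stays local
print(dfs_neighborhood(graph, "u", 3))  # ['s1', 's4', 's5'] — walks away
```

BFS fills the neighborhood with u's immediate surroundings (micro view), while DFS spends its budget walking down one branch (macro view).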
Jure Leskovec, Stanford University
Biased random walk strategy S that, given a node u, generates a neighborhood N_S(u). Two parameters:
Return parameter p: controls returning back to the previous node.
In-out parameter q: controls moving outwards (DFS) vs. inwards (BFS); intuitively, q is the ratio of BFS vs. DFS.
The walker came over edge (s1, w) and is now at w; where does it go next? The unnormalized transition probabilities are: 1/p back to s1 (the previous node), 1 to s2 (a neighbor at the same distance from s1), and 1/q to s3 (a neighbor farther from s1). Here p is the return parameter (low p → BFS-like walk) and q is the walk-away parameter (low q → DFS-like walk). N_S(u) are the nodes visited by the walker.
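The biased second-order walk can be sketched as below, assuming the standard reading of the unnormalized weights (1/p to return to the previous node, 1 to a neighbor equidistant from it, 1/q to a farther neighbor); the graph is illustrative:

```python
import random

# Illustrative graph (not the slide's): u, its neighbors, and one deeper node.
graph = {
    "u":  {"s1", "s2"},
    "s1": {"u", "s2", "s3"},
    "s2": {"u", "s1"},
    "s3": {"s1"},
}

def biased_walk(g, start, length, p=1.0, q=1.0, seed=0):
    """Second-order walk biased by return parameter p and in-out parameter q."""
    rng = random.Random(seed)
    walk = [start]
    prev, cur = None, start
    for _ in range(length - 1):
        nbrs = sorted(g[cur])
        if prev is None:
            nxt = rng.choice(nbrs)          # first step: uniform
        else:
            weights = []
            for x in nbrs:
                if x == prev:
                    weights.append(1.0 / p)  # return to previous node
                elif x in g[prev]:
                    weights.append(1.0)      # same distance from prev
                else:
                    weights.append(1.0 / q)  # move farther away
            nxt = rng.choices(nbrs, weights=weights, k=1)[0]
        walk.append(nxt)
        prev, cur = cur, nxt
    return walk

print(biased_walk(graph, "u", 8, p=0.5, q=2.0))  # BFS-leaning walk near u
```

Low p makes the walk hover around its starting region (BFS-like neighborhoods); low q pushes it outward (DFS-like neighborhoods), matching the two extremes above.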