Node Similarity and Collaboration Metrics

Explore various metrics and algorithms for measuring node similarity and predicting collaborations in a network. From Jaccard's Coefficient to Rooted PageRank, learn how to analyze network structures effectively.

  • Collaboration Metrics
  • Network Analysis
  • Node Similarity
  • Graph Algorithms
  • Predictive Modeling


Presentation Transcript


  1. (figure: two nodes x and y whose future link we want to score)

  2. Graph distance: the (negated) length of the shortest path between x and y. Example scores from the figure: (A, C) = -2, (C, D) = -2, (A, E) = -3. Common neighbors: A and C have 2 common neighbors, so they are more likely to collaborate: score(x, y) := |Γ(x) ∩ Γ(y)|, where Γ(x) denotes the neighbors of x.
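Both metrics are easy to sketch in code. The 5-node graph below is only an assumed stand-in for the slide's figure (its exact edge set is not recoverable), so the printed scores illustrate the definitions rather than the slide's numbers:

```python
from collections import deque

# Toy undirected graph standing in for the slide's 5-node figure
# (the edge set is an assumption, not taken from the slide).
graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"A", "C", "E"},
    "E": {"C", "D"},
}

def graph_distance(g, x, y):
    """Negated length of the shortest path from x to y (BFS)."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        node = queue.popleft()
        if node == y:
            return -dist[node]
        for nbr in g[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return float("-inf")  # y unreachable from x

def common_neighbors(g, x, y):
    """score(x, y) = |Γ(x) ∩ Γ(y)|."""
    return len(g[x] & g[y])

print(graph_distance(graph, "A", "C"))    # -2 (shortest path A-B-C)
print(common_neighbors(graph, "A", "C"))  # 2  (B and D)
```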

  3. Jaccard's coefficient: the same idea as common neighbors, adjusted for degree: score(x, y) := |Γ(x) ∩ Γ(y)| / |Γ(x) ∪ Γ(y)|. Adamic/Adar: weights rarer neighbors more heavily: score(x, y) := Σ_{z ∈ Γ(x) ∩ Γ(y)} 1 / log |Γ(z)|.
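Both refinements reuse the neighbor sets from the common-neighbors score. The toy graph here is the same illustrative assumption as before:

```python
import math

# Same assumed toy graph as in the earlier sketch.
graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"A", "C", "E"},
    "E": {"C", "D"},
}

def jaccard(g, x, y):
    """Common neighbors normalized by the size of the combined neighborhoods."""
    return len(g[x] & g[y]) / len(g[x] | g[y])

def adamic_adar(g, x, y):
    """Shared neighbors weighted by rarity: low-degree neighbors count more."""
    return sum(1.0 / math.log(len(g[z])) for z in g[x] & g[y])

print(jaccard(graph, "A", "C"))      # 2/3: {B, D} out of {B, D, E}
print(adamic_adar(graph, "A", "C"))  # 1/ln 2 + 1/ln 3 ≈ 2.353
```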

  4. Preferential attachment: the probability that a new collaboration involves x is proportional to |Γ(x)|, the number of current neighbors of x: score(x, y) := |Γ(x)| · |Γ(y)|.
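A minimal sketch of this score, again on the assumed toy graph:

```python
# Preferential attachment: nodes that already collaborate widely are likely
# to collaborate again. The graph is an illustrative assumption.
graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"A", "C", "E"},
    "E": {"C", "D"},
}

def preferential_attachment(g, x, y):
    """score(x, y) = |Γ(x)| * |Γ(y)|."""
    return len(g[x]) * len(g[y])

print(preferential_attachment(graph, "A", "C"))  # 2 * 3 = 6
```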

  5. Katz score: counts the paths between x and y of every length, exponentially damped: score(x, y) := Σ_{l=1}^∞ β^l · |paths of length l between x and y|, where β is chosen to be a very small value (for dampening).
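A truncated version of the Katz sum can be computed from powers of the adjacency matrix, since entry (x, y) of A^l counts length-l walks. A sketch on the same assumed toy graph:

```python
# Truncated Katz score: sum beta**l times the number of length-l walks,
# read off successive powers of the adjacency matrix.
graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"A", "C", "E"},
    "E": {"C", "D"},
}
nodes = sorted(graph)
idx = {v: i for i, v in enumerate(nodes)}
n = len(nodes)
A = [[1 if b in graph[a] else 0 for b in nodes] for a in nodes]

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def katz(x, y, beta=0.05, max_len=5):
    score, power, damp = 0.0, A, beta
    for _ in range(max_len):
        score += damp * power[idx[x]][idx[y]]  # power = A^l at this step
        power = mat_mul(power, A)
        damp *= beta
    return score

# 2 walks of length 2 (A-B-C, A-D-C) and 1 of length 3 (A-D-E-C):
print(katz("A", "C", beta=0.05, max_len=3))  # 0.05**2 * 2 + 0.05**3 * 1
```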

  6. Hitting time: the expected number of steps H(x, y) for a random walk starting at x to reach y. Commute time: C(x, y) := H(x, y) + H(y, x). If y has a large stationary probability π(y), then H(x, y) is small no matter what x is; to counterbalance, we can normalize by the stationary probability, e.g. score(x, y) := -H(x, y) · π(y). Rooted PageRank: to cut down on long random walks, the walk can return to x with a probability α at every step.
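Of these walk-based measures, rooted PageRank is the easiest to sketch with a plain power iteration; the toy graph and the choice α = 0.15 are assumptions for illustration:

```python
# Rooted PageRank sketch (power iteration): at every step the walk teleports
# back to the root x with probability alpha, cutting down long random walks;
# score(x, y) is y's probability in the resulting stationary distribution.
graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"A", "C", "E"},
    "E": {"C", "D"},
}

def rooted_pagerank(g, x, alpha=0.15, iters=200):
    prob = {v: (1.0 if v == x else 0.0) for v in g}
    for _ in range(iters):
        # Teleport mass goes entirely to the root x.
        nxt = {v: (alpha if v == x else 0.0) for v in g}
        for v in g:
            share = (1.0 - alpha) * prob[v] / len(g[v])
            for nbr in g[v]:
                nxt[nbr] += share
        prob = nxt
    return prob

scores = rooted_pagerank(graph, "A")
print(round(sum(scores.values()), 6))  # 1.0: the scores form a distribution
```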

  7. SimRank, defined by a recursive definition: two nodes are similar to the extent that they are joined by similar neighbors: sim(x, y) := C · Σ_{a ∈ Γ(x)} Σ_{b ∈ Γ(y)} sim(a, b) / (|Γ(x)| · |Γ(y)|), with sim(x, x) := 1.
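The recursion can be solved by fixed-point iteration; the decay constant C = 0.8 and the toy graph below are assumptions for illustration:

```python
# Fixed-point iteration for the recursive similarity (the SimRank form):
# two nodes are similar to the extent that their neighbors are similar.
graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"A", "C", "E"},
    "E": {"C", "D"},
}

def simrank(g, C=0.8, iters=10):
    # Start from the identity: sim(x, x) = 1, everything else 0.
    sim = {(a, b): (1.0 if a == b else 0.0) for a in g for b in g}
    for _ in range(iters):
        new = {}
        for a in g:
            for b in g:
                if a == b:
                    new[(a, b)] = 1.0
                else:
                    total = sum(sim[(u, v)] for u in g[a] for v in g[b])
                    new[(a, b)] = C * total / (len(g[a]) * len(g[b]))
        sim = new
    return sim

sim = simrank(graph)
print(sim[("A", "C")] > 0)  # True: A and C are joined by shared neighbors
```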

  8. Idea: delete tenuous (sparse) edges with a clustering procedure, then run the predictors on the cleaned-up subgraphs.

  9. Even the best predictor (Katz) is correct on only 16% of predictions. How good is that? Perhaps more information about the meaning of nodes and edges is required.

  10. In sparse graphs, paths of length 3 or more help in prediction, and differentiating between degrees is important. For large, dense graphs, common neighbors are enough: the number of paths matters, not their length.

  11. User-item rating matrix (rows are users, columns are Items 1-5; "?" marks a rating to predict):
      User 1: 8 1 2 7 ? 5
      User 2: 2 7 5 ? 4
      User 3: 5 7 4 7
      User 4: 7 1 7 3 8
      User 5: 1 7 4 6 ? 7
      User 6: 8 3 8 3

  12. (figure-only slide)

  13. Predicted rating: a weighted average of the ratings of the n most similar users, pred(a, i) := κ · Σ_u w(a, u) · r(u, i), where the w(a, u) are the weights of the n similar users and κ is a normalizer, 1 / Σ_u w(a, u).

  14. w(a, i) := 1 if i ∈ neighbors(a), 0 else.
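Putting slides 13 and 14 together: with the 0/1 weights, the weighted average collapses to a plain average over the neighbors who rated the item. The neighbor sets and ratings below are hypothetical, not taken from the slides' matrix:

```python
# Slide 13's weighted average with slide 14's 0/1 neighbor weight.
# All data here is hypothetical, for illustration only.
neighbors = {"a": {"u1", "u2", "u3"}}
ratings = {("u1", "item5"): 7, ("u2", "item5"): 8}  # u3 has not rated item5

def weight(a, u, neighbors):
    """Slide 14: w(a, u) = 1 if u is a neighbor of a, else 0."""
    return 1.0 if u in neighbors[a] else 0.0

def predict(a, item, ratings, neighbors):
    """Slide 13: kappa * sum of w(a, u) * r(u, item) over similar users."""
    users = [u for u in neighbors[a] if (u, item) in ratings]
    norm = sum(weight(a, u, neighbors) for u in users)  # 1/kappa
    if norm == 0:
        return None  # no neighbor rated the item
    total = sum(weight(a, u, neighbors) * ratings[(u, item)] for u in users)
    return total / norm

print(predict("a", "item5", ratings, neighbors))  # 7.5: average of 7 and 8
```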

  15. (the user-item rating matrix of slide 11, shown again)

  16. (the user-item rating matrix of slide 11, shown again)

  17. How similar are items 3 and 5? How do we calculate their similarity? The Item 3 / Item 4 / Item 5 columns of the rating matrix:
      2 7 ?
      5 7 5
      7 4 7
      7 3 8
      4 6 ?
      7 8 3

  18. Only consider users who have rated both items. For each such user, calculate the difference in ratings for the two items, then take the average of this difference over the users. Alternatively, use the cosine of the two rating vectors: sim(item 3, item 5) = cosine((5, 7, 7), (5, 7, 8)) = (5·5 + 7·7 + 7·8) / (sqrt(5² + 7² + 7²) · sqrt(5² + 7² + 8²)). One can also use Pearson correlation coefficients, as in user-based approaches.
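The cosine step can be checked directly in code; the two vectors are the ones from the slide:

```python
import math

# Cosine similarity between the item-3 and item-5 rating vectors,
# restricted to the users who rated both items (as on the slide).
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

sim = cosine((5, 7, 7), (5, 7, 8))
print(round(sim, 4))  # 0.9978: the two items are rated very similarly
```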

  19. pred(user 1, item 3) = κ · { r(user 1, item 1) · sim(item 1, item 3) + r(user 1, item 2) · sim(item 2, item 3) + r(user 1, item 4) · sim(item 4, item 3) + r(user 1, item 5) · sim(item 5, item 3) }, where κ is a normalization factor, 1 / [the sum of all sim(item_i, item 3)].
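A sketch of this prediction rule; the similarity values below are hypothetical placeholders, not the ones computed on the previous slide:

```python
# Item-based prediction: similarity-weighted average of the user's other
# ratings, with normalizer kappa = 1 / sum of the similarities.
# Ratings and similarities are hypothetical placeholders.
def predict_item_based(user_ratings, sims):
    norm = sum(sims[i] for i in user_ratings)  # 1/kappa
    return sum(r * sims[i] for i, r in user_ratings.items()) / norm

user1 = {"item1": 8, "item2": 1, "item4": 7, "item5": 5}
sim_to_item3 = {"item1": 0.2, "item2": 0.3, "item4": 0.6, "item5": 0.9}

# (8*0.2 + 1*0.3 + 7*0.6 + 5*0.9) / (0.2 + 0.3 + 0.6 + 0.9)
print(round(predict_item_based(user1, sim_to_item3), 2))  # 5.3
```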

  20. Nodes P1, P2, P3 (figure); feature vectors with class labels: [1, 2, 0, ..., 1] → +1, [0, 0, 1, ..., 1] → -1.

  21. The Fundamental Challenge

  22. (figure) Jure Leskovec, Stanford University

  23. Given G = (V, E), the goal is to learn a mapping f : u → ℝ^d and maximize Σ_{u ∈ V} log Pr(N_S(u) | f(u)), where N_S(u) is the network neighborhood of node u.

  24. max_f Σ_{u ∈ V} log Pr(N_S(u) | f(u)): find the embedding f that makes each node's observed neighborhood most probable.

  25. Neighborhood N_S(u) from walks starting at u (nodes s1-s9 in the figure): N_BFS(u) = {s1, s2, s3}, N_DFS(u) = {s4, s5, s6}.

  26. (figure: walks from node u)

  27. Biased random walk S that, given a node u, generates its neighborhood N_S(u). Two parameters: the return parameter p (return back to the previous node) and the in-out parameter q (moving outwards, DFS-like, vs. inwards, BFS-like). Intuitively, q is the ratio of BFS vs. DFS.

  28. (figure) The walker moved from s1 to w; among w's neighbors, s1 is closer to u, s2 is at the same distance, and s3 is farther from u.

  29. 1, 1/p, and 1/q are unnormalized transition probabilities: from w, return to s1 with weight 1/p, step to a same-distance node (s2) with weight 1, and walk farther away (s3) with weight 1/q. p and q model the transition probabilities: p is the return parameter, q the walk-away parameter.

  30. The same unnormalized transition probabilities 1, 1/p, 1/q, annotated: p is the return parameter (low p = BFS-like walks), q is the walk-away parameter (low q = DFS-like walks). N_S(u) are the nodes visited by the walker.
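The per-step weighting can be sketched as a small function; the graph and the chosen values of p and q are assumptions for illustration:

```python
# One step of the biased walk: the walker sits at cur, having arrived from
# prev. Each neighbor gets weight 1/p (return), 1 (same distance from prev),
# or 1/q (move farther out). Graph and parameters are assumed.
graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"A", "C", "E"},
    "E": {"C", "D"},
}

def step_weights(g, prev, cur, p, q):
    weights = {}
    for nxt in g[cur]:
        if nxt == prev:
            weights[nxt] = 1.0 / p  # return to the previous node
        elif nxt in g[prev]:
            weights[nxt] = 1.0      # stays at distance 1 from prev (BFS-like)
        else:
            weights[nxt] = 1.0 / q  # moves outward (DFS-like)
    return weights

# Walker went A -> B; from B it can return to A (1/p) or continue outward
# to C (1/q, since C is not a neighbor of A).
w = step_weights(graph, "A", "B", p=2.0, q=0.5)
print(w["A"], w["C"])  # 0.5 2.0
```

Normalizing these weights over w's neighbors gives the actual transition distribution; low p pulls the walk back (BFS-like), low q pushes it outward (DFS-like).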
