Understanding PageRank and HITS Algorithms

mathematical view of pagerank hits n.w
1 / 44
Embed
Share

Explore the mathematical view of PageRank and HITS algorithms, covering concepts like random surfing, Markov random walks, and linear algebra calculations for ranking web pages based on hyperlink information. Discover the challenges and solutions in link analysis for web page popularity.

  • PageRank
  • HITS
  • Algorithms
  • Web Ranking
  • Link Analysis

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Mathematical View of PageRank & HITs Apr 6, 2012 Heegook Jun

  2. Outline Introduction Google s PageRank HITs Conclusion Discussion 2

  3. Introduction Idea: mine hyperlink information in the Web Assumptions Links often connect related pages A link between pages is a recommendation Query-independent ordering Using link counts as simple measures of popularity 3

  4. PageRank PageRank Markov Random Walk Random Surfing PageRank query Assumption Link 4

  5. PageRank Algorithm Imagine a browser doing a random walk on web pages: Start at a random page At each step, follow one of the n links on that page, each with 1/n probability Do this repeatedly. Use the long-term visit rate as the page s score 5

  6. PageRank PageRank 1 ?? ??= ? ?? ? ? r1= 10 r4= 1 2 10 + 1 3 30 + 1 2 40 r2= 30 r3= 40 6

  7. PageRank: Linear Algebra P = (1 ??) j 1 2 3 4 1 2 i 1 1 0 0 0 1 1 1 2 0 0 1 2 0 0 0 0 0 1 2 3 0 0 4 4 3 7

  8. PageRank: Linear Algebra 8

  9. PageRank: Linear Algebra r = PT y i j 9

  10. PageRank: Linear Algebra r = (P P)T y i j 10

  11. PageRank: Linear Algebra r = (P P P)T y i j 11

  12. PageRank: Linear Algebra For all page s rank: y = (0, 0, 1, .., 0, 0) u = (1, 1, 1, .. , 1, 1) Summation r = I u + PT u + ( P P )T u + ( P P P )T u By Von Neumann Diffusion Kernel r = ( (I P)T)-1 u Recursive expression: ri= PT ri-1 12

  13. Rank Sink Problem Unreachable! j i 13

  14. Rank Sink Problem There is no out edge! j i 14

  15. Random Surfer Model The random surfer simply keeps clicking on successive links at random. 15

  16. Random Surfer Model If a real web surfer ever gets into a small loop of web pages, it is unlikely that the surfer will continue in the loop forever. j i 16

  17. Random Surfer Model Instead, the surfer will jump to some other page. j i i 17

  18. Random Surfer Model Apply random surfer model Damping factor 0 < 1 ( google uses = 0.85 ) ri= PTri-1 ri= PTri-1 + ( 1 ) ri-1 18

  19. Example U = { 'http://www.aaa.com , 'http://www.bbb.com , 'http://www.ccc.com , 'http://www.ddd.com , 'http://www.eee.com , 'http://www.fff.com } d a b c f e 19

  20. Example P d a i/ j a b 1 2 c d e f 1 2 a 0 0 0 0 1 2 1 2 1 3 b b 0 0 0 0 1 3 1 3 c 0 0 0 1 1 d 0 0 0 0 0 c e 0 0 0 0 0 0 1 1 f 0 0 0 0 0 f e 20

  21. Example ri= PTri-1 + ( 1 ) ri-1 ri= 0.85 PTri-1 + 0.15 ri-1 21

  22. Example ri= PTri-1 + ( 1 ) ri-1 ri= 0.85 PTri-1 + 0.15 ri-1 22

  23. Example ri= PTri-1 + ( 1 ) ri-1 ri= 0.85 PTri-1 + 0.15 ri-1 23

  24. Example PageRank 0.35 a b c d e f 0.3498 0.1807 0.0934 0.1256 0.0322 0.2129 0.3 0.25 0.2 0.15 0.1 0.05 0 a b c d e f ??= ? ? ? 24

  25. HITs (Hyperlink Induced Topic Search) PAGERANK HITS 25

  26. HITs (Hyperlink Induced Topic Search) Authoritative Sources in a Hyperlinked Environment Jon M. Kleinberg, Journal of ACM, 1999 Uses hubs and authorities to define a recursive relationship between web pages An authority is a page that many hubs link to A hub is a page that links to many authoriites 26

  27. HITs (Hyperlink Induced Topic Search) The basic idea: A page is a good authoritative page with respect to a given query if it is referenced (i.e., pointed to) by many (good hub) pages that are related to the query. A page is a good hub page with respect to a given query if it point s to many good authoritative pages with respect to the query. Good authoritative pages (authorities) and good hub pages (hubs) reinforce each other. 27

  28. Example: Authority B A Authority 28

  29. Example: Hub A Hub B 29

  30. HITs Algorithm Operation I: for each page p: q1 a(p) = q: (q, p) E h(q) q2 p q3 Hubs Authority 30

  31. HITs Algorithm Operation O: for each page p: q1 h(p) = q: (p, q) E a(q) p q2 q3 Hubs Authority 31

  32. HITs: Linear Algebra Matrix representation of operations I and O A: Adjacency matrix of Subgraph hi: vector of hub scores after i iterations ai: the vector of authority scores after i iterations Operation I : ai= AThi-1 = AT A ai-1 = ( AT A )i a0 Operation O: hi= A ai = A AT hi-1 = ( A AT )i h0 32

  33. HITs: Linear Algebra Authority Score : ai= ( AT A ) ai-1 Hub Score : hi= ( A AT ) hi-1 After each iteration of applying operations, normalize all authority and hub scores ?( ? ) ?( ? ) ? ? = ? ? = ? ?? ? ? ?? ? ? ? Repeat until the scores for each page converge the convergence is guaranteed 33

  34. HITs: Linear Algebra Authority Score ?( ??) ?? ? ?(?? ?) ? ?? = = ? ?? ? ? Hub Score ?( ??) ? ?? ? ?? = = ?(?? ?) ? ?? ? ? 34

  35. Example d a i/ j a b c d e f a 0 0 0 0 1 1 b b 0 0 1 1 0 0 c 0 0 0 1 1 1 d 0 0 0 0 0 1 c e 0 0 0 0 0 0 f 0 0 0 0 0 1 f e 35

  36. Example Hub Score Authority Score ?( ??) ?( ??) ?? ? ?(?? ?) ? ?? = = ? ?? ?(?? ?) ? ?? = = ? ? ?? ? ? ? ?? ? 36

  37. Example Hub Score Authority Score ?( ??) ?( ??) ?? ? ?(?? ?) ? ?? = = ? ?? ?(?? ?) ? ?? = = ? ? ?? ? ? ? ?? ? 37

  38. Example Authority 0.7 a b c d e f 0.0001 0.2041 0.2041 0.6124 0.4082 0.6124 0.6 0.5 0.4 0.3 0.2 0.1 0 a b c d e f 38

  39. Example Hub 1 a b c d e f 0.4082 0.4082 0.8165 0.0001 0.0000 0.0001 0.8 0.6 0.4 0.2 0 a b c d e f 39

  40. Example PageRank 0.4 0.2 d a 0 a b c d e f b Authority 1 0.5 0 c a b c d e f Hub 1 f e 0.5 0 a b c d e f 40

  41. Conclusion PageRank HITs Hyperlink Ranking Method PageRank Markov Random Walk Random Surfing HITs Authority Hub Link Linear Algebra 41

  42. Discussion Strong point geometry . Weak point Rank Link 42

  43. References Jon M. Kleinberg, Authoritative Sources in a Hyperlinked Environment, Journal of ACM, 1999 Ayman Farahat Et al, Authority Rankings from HITS, PageRank, and SALSA: Existence, Uniqueness, and Effect of Initialization, SIAM J. Scientific Computing, 2006 Wikipedia, http://en.wikipedia.org/wiki/HITS_algorithm#Normalization The Stanford Natural Language Processing Group, http://nlp.stanford.edu/IR- book/html/htmledition/hubs-and-authorities-1.html Dept of Mathematics, Cornell Univ, http://www.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture4/lectur e4.html 43

  44. Thank you

More Related Content