Introduction to PageRank Algorithm in Web Graph Analysis

Introduction to PageRank Algorithm in Web Graph Analysis
Slide Note
Embed
Share

Dive into the fundamentals of the PageRank algorithm in web graph analysis, exploring how website importance is estimated based on links, probabilities of surfer arrival, matrices representing contributions, and key properties of the algorithm. Understand the significance of site connections and contributions to PageRank calculations.

  • PageRank Algorithm
  • Web Analysis
  • Website Importance
  • Graph Theory
  • Link-Based Ranking

Uploaded on Feb 19, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Reid Andersen, Christian Borgs, Jennifer Chayes, John Hopcraft, Vahab S. Mirrokni and Shang-Hua Teng Omer Rotem

  2. Introduction - PageRank The web as a graph, each website is a vertex, a directed edge ? ? represents a link from u to v

  3. Introduction - PageRank Estimate importance of website. More important websites receive more links. Estimate probability that random- click-surfer arrives at particular page

  4. Introduction PageRank, Formally A:=adjacency matrix ????:=Diagonal matrix of out-degrees of the websites (vertices) 1? ? ???? For a constant ?, PR satisfies: ???= ? 1 + 1 ? ??? ?

  5. Introduction - PageRank ???= ? 1 + 1 ? ??? ? Namely: ?? ?1 = ? + (1 ? )(?? ?2 ?? ?? ?????? + + ) ?????2 The more websites have links to website v, the more important those sites are, and the less links they have the higher v s PR is

  6. Introduction - PageRank ? = 0.15

  7. Introduction - PageRank ? = 0.1

  8. Introduction PR of v greatly depends on sites that point to v Each site contributes some to v s PR.

  9. Introduction Definitions Denote: ????: the matrix in which (?,?) entry is the contribution of u to v s PR ????(?): the vector, where the ?? entry is the contribution of v to i (the v row of PRM) ????(?): the vector, where the ?? entry is the contribution of i to v(the v column of PRM)

  10. Introduction Properties ????(?) = ? ??+ 1 ? ????(?) ? ?? is vector with 1 in the ?? entry, 0 elsewhere 1 ????= ???

  11. Introduction Supporting Set Our goal: Find set of vertices (websites) that contribute most to v s PR What is contribute the most to the PR of v ? At least ?-fraction of the PR Top k vertices

  12. Approximation of PR Contributions We say a vector ? is ?-approximation of ? = ????(?) if ? 0 and for all vertices u ? ? ? ???? ? ? ?(?) We say a vector ? is an absolute- ?-approximation of c if for all vertices u ? ? ? ? ? ?(?) We say a set is ?-contributing to v if every vertex in the set contributes at least ? ???(?) to v s PR

  13. Main Results A local algorithm that gives ? ????????????? to the contribution vector of a vertex v by examining ?(1 ?) vertices 2. A local algorithm that returns all the ?-contributing vertices and ?(1 vertices 1. ?) vertices from the ? 2-contributing

  14. Motivation: Spam Detection Consider webpage whose PageRank has increased suspiciously. Which pages contribute most to the webpage s PR? (may help us detect spam)

  15. Motivation: Spam Detection

  16. Approximation of PR Contributions Fact: ????(?) = ? ?=0 Definition: ????? ? =< ????? ,??> Combining the 2 we get: 1 ?? (????). 1 ?? ????,??> ????? ? =<? ?=0 =< ??,? ?=0 ???? is the k-step random walk distribution starting from u 1 ?? ???? ?>

  17. Approximation of PR Contributions From the previous equalities we get: ????(?) = ? ?=0 1 ??( ?? ?? ?) Expensive .

  18. Approximation of PR Contributions on SUBSETS For some subset ? ? denote: ??= ? ??? Notation abuse: ????? = ? ?=0 To further abuse notation: For non-negative vector v: 1 ??(? ?? ?) 1 ??(???? ?) ????? = ? ?=0 Note: ?????1+ ?2 = ?????1 + ????(?2)

  19. Plan We want to compute ?-absolute approximation of the contribution vector of v Number of vertices examined depends on ???? ,?,? (independent of |V|) Performs a series pushbacks: performed on a single vertex, and runtime is proportional to the in-degree of this vertex we want to place an upper bound on the number of pushbacks

  20. Approximation of PR Contributions Intuition: We know v s PR comes from its in vertices. 2 vectors p,r. p approximates ????(?), r what we have left. Each step, we add move ?-fraction of v s mass in r to p. Also, we move the rest of its mass to r s entries corresponding to its in-vertices.

  21. Approximation of PR Contributions Algorithm ApproxContributions(?,?,?,????): Output: a vector ? s.t. 0 ? ? and either: 1. ? is an ?-absolute approximation of ????(?) 2. ? 1 ???? Goal: Limit number of pushbacks P s.t. ? min ???? ,???? + 1 ??

  22. Approximation of PR Contributions Algorithm: Initialize two vectors p,r s.t. p=0, ? = ??. Pushback operation on v does the following: Want to modify p,r, increasing ? 1 while keeping ? + ????(?) = ????(?)

  23. Approximation of PR Contributions ??? ???? ? : ? = ?,? = ? except: ? ? = ? ? + ?? ? ? ? = 0 ?????? ? ? ?? ? ? = ? ? + (1 ?) ?(?) ????(?)

  24. Approximation of PR Contributions: Proof Proof: At the end of the pushback, we have: ? = ? + ?? ? ?? ? = ? ? ? ??+ 1 ? ? ? ?? We use the fact: ????(?) = ?? + 1 ? ????(?)?? = ?? + 1 ? ????(???)

  25. Approximation of PR Contributions ? = ? + ?? ? ?? ? = ? ? ? ??+ 1 ? ? ? ?? Want to show ? + ????? = ? + ????(? ) ????? = ????? ? ? ?? + ????? ? ?? = = ????? ? ? ?? + ?? ? ?? + ???? 1 ? ? ? ????= = ????? ? ? ??+ 1 ? ? ? ????+ ?? ? ?? = ????? + ? ?

  26. Approximation of PR Contributions ApproxContributions(?,?,?,????): 1. Initialize p=0, ? = ?? 2. While ? ? > ? for some vertex u: a. Pick a vertex u s.t. ? ? > ? b. pushback(u) 1 ???? halt and output ? = ? 3. Output ? = ? c. If ?

  27. Approximation of PR Contributions T:= number of pushbacks, ??,??:= state of p,t after t pushbacks. Want to show a bound on T: ?? 1 satisfies ?? 1 algorithm decided to continue Recall that ? = ? + ?? ? ??, hence ??+1 1< ????,???(?) because 1 ?? 1+ ?? ?? ? 1 min( ????? ?? 1 1 1,????)

  28. Approximation of PR Contributions In case algorithm terminates because ?? ?-absolute-approximation of ????(?): ????? ? ?, p is = ?????? ?? ? ?????? each row of M sums to 1 ?? because ?? is non-negative and

  29. Approximation of PR Contributions In order to find an ?-approximation of ????(?) given ???? we simply need to call ApproxContributes(?,?,? ???? ,???(?)) Number of pushback operations: 1 ??+ 1

  30. Finding the ?-significant contributing set call ? = ApproxContributions(?,?,? ???? ,???? ) 2. Return the vertices in ? that are at least ? ? ???(?). Clearly, we return all vertices that contribute at least ? ???(?) Number of pages not in the ?-supporting set of v and are included is at most ? ? 1. 1

  31. Almost There So far assumed ??(?) is given What if it s not ? Approximate!

  32. Conclusion We saw algorithm for finding contributing set of v Main Idea: Go backward instead of forward Keep going until approximation good enough

More Related Content