Efficient Graph Algorithms in MapReduce: Overview and Examples

Learn about MapReduce, a programming model for processing large datasets efficiently, through examples. Understand the components like Mappers, Reducers, Combiners, and how they work together in parallel to analyze data in a distributed system. Explore concepts like message passing graphs and PageRank.

  • Graph Algorithms
  • MapReduce
  • Data Processing
  • Parallel Computing
  • Distributed Systems

Presentation Transcript


  1. Design Patterns for Efficient Graph Algorithms in MapReduce Jimmy Lin and Michael Schatz (Slides by Tyler S. Randolph)

  2. What is MapReduce? Definition: a programming model and an associated implementation for processing and generating large datasets with a parallel, distributed algorithm on a cluster. Two main parts: Mapper and Reducer. Two sub-parts: Combiner and Partitioner.

  3. What is MapReduce? 1) Mappers are applied to the input. 2) Combiners perform local aggregation. 3) Partitioners send data to reducers. 4) Reducers aggregate results. Very parallelizable.

  4. Example Step through a MapReduce job that counts how many times each word length appears in the following sentence: We should all take summer classes this year. Write and label the outputs of the mapper, combiner, and reducer (no need for a partitioner with an example this small).

  5. Example (continued) We should all take summer classes this year. Mapper- 2: We 6: should 3: all 4: take 6: summer 7: classes 4: this 4: year

  6. Example (continued) We should all take summer classes this year. Mapper (output sorted by key)- 2: We 3: all 4: take 4: this 4: year 6: should 6: summer 7: classes

  7. Example (continued) We should all take summer classes this year. Combiner- 2: [We] 3: [all] 4: [take, this, year] 6: [should, summer] 7: [classes]

  8. Example (continued) We should all take summer classes this year. Reducer- 2: 1 3: 1 4: 3 6: 2 7: 1
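
The word-length example on the slides above can be traced in a few lines of Python. This is only a single-process sketch, not a real MapReduce job: the function names `mapper`, `combiner`, and `reducer` are illustrative, and the combiner simply groups words by length the way the combiner slide does. Note that `len("should")` is 6, so the sketch reports two words of length 6 and none of length 5.

```python
from collections import defaultdict

SENTENCE = "We should all take summer classes this year."


def mapper(sentence):
    """Emit one (word_length, word) pair per word, ignoring punctuation."""
    for word in sentence.split():
        word = word.strip(".,;:!?")
        if word:
            yield len(word), word


def combiner(pairs):
    """Group words by length (local aggregation), sorted by key."""
    groups = defaultdict(list)
    for length, word in pairs:
        groups[length].append(word)
    return dict(sorted(groups.items()))


def reducer(groups):
    """Collapse each group of words to a count."""
    return {length: len(words) for length, words in groups.items()}


mapped = list(mapper(SENTENCE))
combined = combiner(mapped)
counts = reducer(combined)
print(counts)  # {2: 1, 3: 1, 4: 3, 6: 2, 7: 1}
```

In a real cluster each mapper would see only part of the input, and the partitioner would decide which reducer receives each key; with one sentence there is nothing to partition.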

  9. Message Passing Graphs G = (V, E): Graph = (Vertices, Edges); directed graphs. In-degree: how many vertices point to me. Out-degree: how many vertices I point to. Metadata.
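
As a rough illustration of in-degree and out-degree, the sketch below stores a directed graph as an adjacency list and counts both. The dictionary layout and the function name `degrees` are assumptions made for this example; the four-vertex graph is the one used in the Basic Example slide later in the deck.

```python
# Directed graph as an adjacency list: vertex -> vertices it points to.
graph = {
    "A": ["B"],
    "B": ["C", "A"],
    "C": ["A"],
    "D": ["A", "B", "C"],
}


def degrees(adjacency):
    """Return (in_degree, out_degree) dictionaries for a directed graph."""
    # Out-degree: how many vertices do I point to.
    out_degree = {v: len(targets) for v, targets in adjacency.items()}
    # In-degree: how many vertices point to me.
    in_degree = {v: 0 for v in adjacency}
    for targets in adjacency.values():
        for target in targets:
            in_degree[target] = in_degree.get(target, 0) + 1
    return in_degree, out_degree


in_degree, out_degree = degrees(graph)
print(in_degree)   # {'A': 3, 'B': 2, 'C': 2, 'D': 0}
print(out_degree)  # {'A': 1, 'B': 2, 'C': 1, 'D': 3}
```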

  10. PageRank Definition: Google's main algorithm, which counts the number and quality of links to a page to produce a rough estimate of how important the website is. Assumption: really one big popularity contest. Graph topology: the physical layout of the graph (what points to what).

  11. PageRank At each iteration: - Computations occur at every vertex as a function of the vertex's internal state and the LOCAL graph structure. - Partial results, in the form of messages, are passed via DIRECTED edges to each vertex's neighbors. - Computations occur at every vertex based on incoming partial results, potentially altering the vertex's internal state.

  12. PageRank

  13. Basic PageRank Algorithm

  14. Basic Example Say A has a link to B; B has links to C and A; C has a link to A; and D has links to A, B, and C.

  15. Basic Example (continued) Each page has a starting rank of 0.25. PR(A) = (0.25 / L(B)) + (0.25 / L(C)) + (0.25 / L(D)). B has 2 links, C has 1 link, D has 3 links. PR(A) = (0.25 / 2) + (0.25 / 1) + (0.25 / 3) = 0.125 + 0.25 + 0.0833 ≈ 0.4583
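
The arithmetic in this example can be checked with a short sketch. It applies one step of the basic (undamped) update to the four-page graph from the previous slide; the name `simple_pagerank_step` and the dictionary layout are illustrative choices, not taken from the slides.

```python
# Link structure from the Basic Example: page -> pages it links to.
links = {
    "A": ["B"],
    "B": ["C", "A"],
    "C": ["A"],
    "D": ["A", "B", "C"],
}

# Each page starts with rank 1/4.
rank = {page: 0.25 for page in links}


def simple_pagerank_step(links, rank):
    """One undamped update: each page splits its rank evenly over its out-links."""
    new_rank = {page: 0.0 for page in links}
    for page, targets in links.items():
        share = rank[page] / len(targets)  # PR(page) / L(page)
        for target in targets:
            new_rank[target] += share
    return new_rank


new_rank = simple_pagerank_step(links, rank)
print(round(new_rank["A"], 4))  # 0.4583
```

Nothing links to D, so its rank falls to 0 after this step, which is one reason the basic update needs the corrections discussed on the following slides.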

  16. Complications Need a way to deal with: - Random hops - Sinks

  17. Dampening Factor Probability that, at any step, the surfer will continue on as he has been (following links): 0.85. Random-hop term: (1 - 0.85) / N, where N is the number of pages.
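
One common way to combine these two quantities is PR(p) = (1 - d) / N + d * (sum of PR(q) / L(q) over pages q that link to p), with d = 0.85. The slide shows only the values 0.85 and (1 - 0.85) / N, so treating this as the intended formula is an assumption. The sketch below applies it to page A from the Basic Example; `damped_rank` is a hypothetical helper name.

```python
def damped_rank(link_sum, n_pages, d=0.85):
    """Damped PageRank for one page (assumed form of the formula).

    link_sum is the sum of PR(q) / L(q) over pages q linking to this page,
    n_pages is N, and d is the dampening factor.
    """
    return (1 - d) / n_pages + d * link_sum


# Page A receives 0.25/2 from B, 0.25/1 from C, and 0.25/3 from D.
link_sum_a = 0.25 / 2 + 0.25 / 1 + 0.25 / 3
pr_a = damped_rank(link_sum_a, n_pages=4)
print(round(pr_a, 4))  # 0.4271
```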

  18. Dampening Factor

  19. Tying It All Together Why MapReduce: - Good for this type of calculation - Exploit the shuffle and sort phase to aid information passing. Parallelization of PageRank: - Only care about local topology and the dampening factor - No need to worry about the entire picture - Create an adjacency list representation of the graph, where the key is the id of the vertex and the value is the vertex's structure and metadata - Metadata probably includes out-degree and internal state
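
A minimal single-process sketch of one PageRank iteration in this style follows. It is an illustration under stated assumptions, not the authors' implementation: the record layout (`rank`, `neighbors`), the message tags, and the function names are all hypothetical, the `shuffle` function stands in for the framework's shuffle and sort phase, and sinks are not handled.

```python
from collections import defaultdict


def pagerank_mapper(vertex_id, vertex):
    """Emit the vertex's structure plus a rank share for each neighbor."""
    # Pass the adjacency list along so the reducer can rebuild the vertex.
    yield vertex_id, ("structure", vertex["neighbors"])
    if vertex["neighbors"]:
        share = vertex["rank"] / len(vertex["neighbors"])
        for neighbor in vertex["neighbors"]:
            yield neighbor, ("rank", share)


def shuffle(pairs):
    """Stand-in for shuffle and sort: group all values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def pagerank_reducer(vertex_id, messages, n_vertices, d=0.85):
    """Sum incoming shares, apply the dampening factor, rebuild the vertex."""
    neighbors, total = [], 0.0
    for kind, payload in messages:
        if kind == "structure":
            neighbors = payload
        else:
            total += payload
    new_rank = (1 - d) / n_vertices + d * total
    return vertex_id, {"rank": new_rank, "neighbors": neighbors}


def pagerank_iteration(graph, d=0.85):
    """Run mapper, shuffle, and reducer once over the whole graph."""
    pairs = [kv for vid, v in graph.items() for kv in pagerank_mapper(vid, v)]
    grouped = shuffle(pairs)
    return dict(
        pagerank_reducer(vid, msgs, len(graph), d) for vid, msgs in grouped.items()
    )


# Adjacency list from the Basic Example: key = vertex id, value = structure + metadata.
graph = {
    "A": {"rank": 0.25, "neighbors": ["B"]},
    "B": {"rank": 0.25, "neighbors": ["C", "A"]},
    "C": {"rank": 0.25, "neighbors": ["A"]},
    "D": {"rank": 0.25, "neighbors": ["A", "B", "C"]},
}
result = pagerank_iteration(graph)
print({v: round(data["rank"], 4) for v, data in result.items()})
```

Each mapper call looks only at one vertex and its out-links, and each reducer call looks only at the messages addressed to one vertex, which is why the computation parallelizes without any worker needing the whole graph.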

  20. Bibliography "PageRank." Wikipedia. Wikimedia Foundation, 26 Apr. 2015. Web. 03 May 2015. "MapReduce." Wikipedia. Wikimedia Foundation, 01 May 2015. Web. 03 May 2015. Lin, Jimmy, and Michael Schatz. "Design Patterns for Efficient Graph Algorithms in MapReduce." Thesis. University of Maryland, College Park, 2010. Https://cs.wmich.edu. Web. 1 May 2015. <https://cs.wmich.edu/gupta/teaching/cs5950/sumII10cloudComputing/graphAlgo%20in%20mapReduce%20paper%20p78-lin.pdf>.

  21. Questions?
