Overview of Pregel, a Large-Scale Distributed Analytics Framework

Overview of Pregel, a Large-Scale Distributed Analytics Framework
Slide Note
Embed
Share

Pregel is a powerful distributed analytics framework designed for large-scale graph processing. It offers high scalability, fault tolerance, and flexibility in expressing graph algorithms through a message-passing programming model. Inspired by the Bulk Synchronous Parallel model, Pregel operates in super-steps across distributed machines, enabling efficient graph computations.

  • Distributed systems
  • Analytics frameworks
  • Graph processing
  • Fault tolerance
  • Scalability

Uploaded on Feb 28, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Distributed Systems CS 15-440 Pregel Lecture 19, October 18, 2022 Mohammad Hammoud

  2. Today Last Session: Hadoop Today s Session: Pregel Announcements: PS4 is due on Oct 23 by midnight Quiz II is on Tuesday, Oct 25 P3 is due on Oct 27 by midnight

  3. Distributed Analytics Frameworks Pregel Architectural & Scheduling Models Introduction & Execution Model Programming Model

  4. Googles Pregel MapReduce is a good fit for a wide array of large-scale applications but ill-suited for graph processing Pregel is a large-scale graph-parallel distributed analytics framework Some Characteristics: oIn-Memory across iterations (or super-steps) oHigh scalability oAutomatic fault-tolerance oFlexibility in expressing graph algorithms oMessage-Passing programming model oTree-style, master-slave architecture oSynchronous Pregel is inspired by Valiant s Bulk Synchronous Parallel (BSP) model

  5. The BSP Model Iterations Data Data Data Data CPU 1 CPU 1 CPU 1 Data Data Data Data Data Data Data Data CPU 2 CPU 2 CPU 2 Data Data Data Data Data Data Data Data CPU 3 CPU 3 CPU 3 Data Data Data Data Data Data Data Data Barrier Barrier Barrier 5 Super-Step 1 Super-Step 2 Super-Step 3

  6. Googles Pregel: A Birds Eye View The input to Pregel is a directed graph, which can be stored on a distributed storage layer (e.g., GFS) The input graph is partitioned (e.g., using hashpartitioning) and distributed across cluster machines Execution is pursued in super-steps and final output can be stored again in a distributed storage layer HDFS BLK HDFS BLK A Master Machine To HDFS Dataset HDFS BLK HDFS HDFS BLK

  7. Distributed Analytics Frameworks Pregel Architectural & Scheduling Models Introduction & Execution Model Programming Model

  8. Workflow and the Programming Model A user-defined function (sayF) is executed at each vertex (sayV) F can read messages sent to V in super-step S 1and send messages to other vertices, which will receive them at super-step S + 1 Machines F can modify the state of V and its outgoing edges Local Computations Communication Barrier Synchronization F can alter the topology of the graph Vertical Structure of a Super-Step Messages in F are explicitly sent/received by programmers Hence, Pregel employs a message-passing programming model

  9. When Does a Pregel Program Terminate? A Pregel program works as follows: At super-step 0, every vertex is active ONLY active vertices in any super-step perform computations A vertex deactivates itself by voting to halt It, subsequently, enters an inactive state A vertex can return to an active state if it receives an external message Vote to Halt Active Inactive Message Received Vertex State Machine A Pregel program terminates when: All vertices are inactive And, there are no messages in transit

  10. Example: Find Max Value 3 6 2 1 Blue Arrows are messages S1 Blue vertices have voted to halt 3 6 6 6 2 2 1 6 S2 6 6 6 2 6 6 6 S3 6 6 6 6 6

  11. Distributed Analytics Frameworks Pregel Architectural & Scheduling Models Introduction & Execution Model Programming Model

  12. The Architectural and Scheduling Models Pregel assumes a tree-style network topology and a master-slave architecture Core Switch Rack Switch Rack Switch Master Worker4 Worker3 Worker1 Worker5 Worker2 Push work (i.e., partitions) to all workers 12 Send Completion Signals When the master receives the completion signal from every worker in super-step S, it starts super-step S + 1

  13. Googles Pregel: Summary Aspect Google s Pregel Programming Model Message-Passing Execution Model Synchronous Architectural Model Master-Slave Scheduling Model Push-Based Suitable Applications Strongly-Connected Applications

  14. MapReduce vs. Pregel Aspect Hadoop MapReduce Google s Pregel Programming Model Shared-Based Message-Passing Execution Model Synchronous Synchronous Architectural Model Master-Slave Master-Slave Scheduling Model Pull-Based Push-Based Suitable Applications Loosly-Connected Strongly-Connected

  15. Next Class Caching Part I 15

Related


More Related Content