Reducing Communication in PIM-Based Graph Processing

Reducing Communication in PIM-Based Graph Processing
Slide Note
Embed
Share

This study introduces GraphP, a framework that aims to reduce communication overhead for Processing-In-Memory (PIM)-based graph processing by implementing efficient data partitioning. It addresses the challenges of high bandwidth requirements and minimal computation per vertex, offering insights into the drawbacks of existing solutions and presenting GraphP's evaluation within the ALCHEM platform.

  • Graph Processing
  • PIM
  • Data Partitioning
  • Communication Reduction
  • Efficiency

Uploaded on Apr 04, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. GraphP GraphP: Reducing Communication for PIM-based Graph Processing with Efficient Data Partition Mingxing Zhang, Youwei Zhuo (equal contribution), Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis, Xuehai Qian Tsinghua University University of Southern California Stanford University

  2. Outline Motivation Graph applications Processing-In-Memory The drawbacks of the current solution GraphP Evaluation ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  3. Graph Applications Social network analytics Recommendation system Bioinformatics ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  4. Challenges High bandwidth requirement Small amount of computation per vertex Data movement overhead comp L1 L2 L3 mem ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  5. PIM: Processing-In-Memory Idea: Computation logic inside memory Advantage: High memory bandwidth Example: Hybrid Memory Cubes (HMC) mem .. mem mem 320GB/s intra-cube mem 4x120GB/s inter-cube comp ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  6. HMC: Hybrid Memory Cubes 120 120 320 Intra-cube Bottleneck: Inter-cube communication Inter-cube Inter-group bandwidth (GB/s) ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  7. Outline Motivation Graph applications Processing-In-Memory The drawbacks of the current solution GraphP Evaluation ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  8. Current Solution: Tesseract First PIM-based graph processing architecture Programming model Vertex program Partition Based on vertex program Ahn, J., Hong, S., Yoo, S., Mutlu, O., & Choi, K. A scalable processing-in- memory accelerator for parallel graph processing. In 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA). ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  9. PageRank in Vertex Program for (v: vertices) { update = 0.85 * v.rank / v.out_degree; for (w: edges.destination) { put(w.id, function{ w.next_rank += update; }); } } barrier(); ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  10. Graph Partition hmc0 0 1 2 0 1 2 5 3 4 intra edge vertex 1 3 4 5 inter edge hmc1 comm put(w.id, function{ w.next_rank += update; }); communication = # of cross-cube edges ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  11. Drawback of Tesseract Excessive data communication Why? Data Programming Model Graph Partition Communication Tesseract ? ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  12. Outline Motivation GraphP Evaluation ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  13. GraphP Consider graph partition first. Graph Partition Source-Cut Programming model Two-phase vertex program Reduces inter-cube communication ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  14. Source-Cut Partition 0 1 0 2 1 2 hmc0 5 3 4 intra edge vertex 1 2 inter edge replica 2 5 4 3 hmc1 ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  15. Two-Phase Vertex Program for (r: replicas) { r.next_rank = 0.85 * r.next_rank / r.out_degree; } //apply updates from previous iterations 2 5 3 4 ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  16. Two-Phase Vertex Program for (v: vertices) { for (u: edges.sources) { update += u.rank; } 2 5 3 4 ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  17. Two-Phase Vertex Program for (r: replicas) { put(r.id, function { r.next_rank = update}); } 0 2 } barrier(); 5 3 4 3 4 ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  18. Benefits Strictly less data communication Enables architecture optimizations ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  19. Less Communication 2 2 2 4 5 5 4 Tesseract GraphP ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  20. Broadcast Optimization for (r: replicas) { put(r.id, function { r.next_rank = update}); } } barrier(); broadcast 4 4 4 4 ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  21. Nave Broadcast 15 point to point messages src dst dst dst dst ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  22. Hierarchical communication 3 intergroup messages src dst dst dst dst ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  23. Other Optimizations Computation/communication overlap Leveraging low-power state of SerDes Please see the paper for more details ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  24. Outline Motivation GraphP Evaluation ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  25. Evaluation Methodology Simulation Infrastructure zSim with HMC support ORION for NOC Energy modeling Configurations Same as Tesseract 16 HMCs Interconnection: Dragonfly and Mesh2D 512 CPUs Single-issue in-order cores Frequency: 1GHz ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  26. Workloads 4 graph algorithms Breadth First Search Single Source Shortest Path Weakly Connected Component PageRank 5 real-world graphs Wiki-Vote (WV) ego-Twitter (TT) Soc-Slashdot0902 (SD) Amazon0302 (AZ) ljournal-2008 (LJ) ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  27. Performance 20 <1.1x 15 data partition Speedup 1.7x 10 5 memory bandwidth 0 SOTA Tesseract DDR3 GraphP-SC GraphP-SC -BRD ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  28. Communication Amount 100% Intra-group Inter-group Normalized to Tesseract 75% 51.8% 50% 25% 48.2% 0.4% 1.7% 7.1% 7.0% 0% Tesseract GraphP-SC GraphP-SC -BRD ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  29. Energy consumption 100% 100.0% Normalized to Tesseract 75% 50% 24.9% 25% 15.9% 0% Tesseract GraphP-SC GraphP-SC -BRD ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  30. Other results Bandwidth utilization Scalability Replication overhead Please see the paper for more details ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  31. Conclusions We propose GraphP A new PIM-based graph processing framework Key contributions Data partition as first-order design consideration Source-cut partition Two-phase vertex program Enable additional architecture optimizations GraphP drastically reduces inter-cube communication and improves energy efficiency. ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  32. GraphP GraphP: Reducing Communication for PIM-based Graph Processing with Efficient Data Partition Mingxing Zhang, Youwei Zhuo (equal contribution), Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis, Xuehai Qian Tsinghua University University of Southern California Stanford University

  33. Workload Size & Capacity 128 GB (16 * 8GB) ~16 billion edges ~400 million edges (SNAP) ~7 billion edges (WebGraph) https://snap.stanford.edu/data/ http://law.di.unimi.it/datasets.php ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

  34. Two-phase vertex program Equivalent Expressiveness as vertex programs ALCHEM alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

Related


More Related Content