A Modular Heterogeneous Stack for Deploying FPGAs and CPUs in the Data Center

This presentation discusses the deployment of FPGA and CPU clusters in data centers, focusing on communication architecture models. It covers advances in data-center FPGA technology at Microsoft and Amazon, and the communication between accelerators and CPUs in a cluster, shedding light on the evolution of reconfigurable computing and its impact on data center performance.

  • FPGA Clusters
  • Data Center
  • Communication Architecture
  • Reconfigurable Computing
  • CPU Accelerators


Presentation Transcript


  1. A Modular Heterogeneous Stack for Deploying FPGAs and CPUs in the Data Center. Nariman Eskandari, Naif Tarafdar, Daniel Ly-Ma, Paul Chow. High-Performance Reconfigurable Computing Group, University of Toronto.

  2-3. FPGAs in Clouds and Data Centers? Microsoft Catapult v1 (2014): 95% higher throughput for 10% more power. Catapult v2 (2017). Brainwave (2017). Amazon AWS F1 (2017).

  4-8. Background: Heterogeneous Communication. Two architecture models for FPGA and CPU clusters: the Slave Model, where CPUs sit on the network and each FPGA is attached to a host CPU, and the Peer Model, where FPGAs and CPUs all sit directly on the network as equals. Communication in the peer model: there is no different way to communicate between accelerators and CPUs, which is easier, and accelerators can connect to each other directly.

  9. Background: System Orchestration. A user issues a request to the heterogeneous cloud provider; the provider connects the requested resources (FPGAs and CPUs) on the network and gives the user a network handle for the cluster.

  10-11. Contributions. Galapagos: a rearchitecting of the work in [FPGA 2017]¹ to focus on modularity. The previous system was one large monolithic layer; modularity lets users experiment with the design space for heterogeneous clusters. The rearchitecting also addresses scalability issues in [FPGA 2017]. HUMboldt (Heterogeneous Uniform Messaging): a communication layer that is heterogeneous (multi-FPGA and CPU), uses the same high-level code for software and hardware (portable), and is easy to use for building scalable applications on a CPU/FPGA cluster. ¹ Naif Tarafdar et al., Enabling Flexible Network FPGA Clusters in a Heterogeneous Cloud Data Center, FPGA 2017.

  12-13. Outline: Galapagos, HUMboldt, Results, Conclusion, Future Work.

  14-15. Heterogeneous Abstraction Stack. (Diagrams: [FPGA 2017] used a single monolithic orchestration layer; the new stack splits this into layers, with HUMboldt sitting above Galapagos.)

  16-19. Galapagos: Middleware Layer. The user defines an FPGA cluster using cluster description files and AXI-Stream kernels; the tool flow takes these files and maps the kernels, connected by AXI-Stream, onto VMs and FPGAs on the network. (Diagram: three kernels linked by AXI-Stream across FPGA 1-3 and VMs.) A kernel at this level might look like the sketch below.
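A minimal sketch of such an AXI-Stream kernel, assuming a 64-bit payload; the kernel name and the trivial increment logic are illustrative, not from the talk:

    // Sketch of an AXI-Stream kernel of the kind Galapagos wires together.
    // Name, data width, and computation are illustrative assumptions.
    #include <hls_stream.h>
    #include <ap_axi_sdata.h>

    typedef ap_axiu<64, 1, 1, 1> axis_word;   // 64-bit data plus side channels

    void example_kernel(hls::stream<axis_word> &in,
                        hls::stream<axis_word> &out) {
    #pragma HLS INTERFACE axis port=in
    #pragma HLS INTERFACE axis port=out
    #pragma HLS INTERFACE ap_ctrl_none port=return
        axis_word w = in.read();   // blocking stream read
        w.data += 1;               // trivial per-word computation
        out.write(w);              // forward downstream
    }

The tool flow would then place kernels like this on FPGAs or VMs and route the streams between them according to the cluster description file.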

  20-22. Galapagos Hypervisor. The Hypervisor abstracts all the I/O interfaces on the FPGA (PCIe and its CPU driver, 2x DDR, and the network) away from the Application Region. Limitations of the base infrastructure: it is closed source (proprietary IPs) and not easily portable to other boards (heterogeneity). The Hypervisor was therefore redesigned using publicly available Xilinx IPs, and it supports higher levels of the network stack: the IP layer and the transport layer (e.g., TCP).

  23-24. Galapagos: Hypervisor. (Diagrams comparing the [FPGA 2017] hypervisor with the Galapagos hypervisor around the Application Region.)

  25-27. Galapagos: Application Region. (Diagrams highlighting the Router, Network Bridge, and Comm Bridge components of the Application Region, compared with [FPGA 2017].)

  28. Outline: Base infrastructure, Galapagos, HUMboldt, Results, Conclusion, Future Work.

  29. HUMboldt (Heterogeneous Uniform Messaging) Communication Layer. A message-passing communication layer implementing a minimal subset of MPI: only blocking sends and receives. It is both a software and a hardware library, with exactly the same source code for both, giving functional portability. A sketch of this style follows.
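Because the layer exposes only blocking point-to-point calls from the MPI subset, a kernel pair can be sketched in standard MPI terms. This example uses the stock MPI API (which MPICH implements) rather than HUMboldt's own headers, whose exact names the talk does not give:

    // A blocking send/receive pair in the minimal MPI subset that HUMboldt
    // mirrors. Written against standard MPI as a stand-in for HUMboldt's API.
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        int payload = 42;
        if (rank == 0) {
            MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  // blocking send
        } else if (rank == 1) {
            MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);                           // blocking receive
            std::printf("rank 1 received %d\n", payload);
        }
        MPI_Finalize();
        return 0;
    }

Restricting the API to blocking sends and receives is what lets the same kernel body be synthesized for hardware or compiled for software without change.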

  30. HUMboldt Hardware. All functions are implemented as High-Level Synthesis (HLS) functions, provided as a library for users to integrate into their HLS code. This gives functional portability and ease of use; the underlying protocol is handled by Galapagos.

  31. HUMboldt Software. Uses standard socket-programming libraries over TCP and Ethernet. Software kernels communicate with each other through a mature software MPI library (MPICH). At runtime, the layer parses the cluster description files to choose the right protocol per destination: HUMboldt for hardware nodes, MPICH for software nodes. A sketch of this dispatch appears below.
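A hedged sketch of that runtime choice; the enum, function name, and string values are assumptions modeled on the <type> and <comm> fields of the cluster description, not HUMboldt's actual internals:

    // Illustrative only: picking a transport per destination kernel after
    // parsing the cluster description. All names here are hypothetical.
    #include <string>

    enum class Transport { MPICH, HUMBOLDT_TCP, HUMBOLDT_ETH };

    Transport transport_for(const std::string &type,   // <type> field: "sw"/"hw"
                            const std::string &comm) { // <comm> field: "tcp"/"eth"
        if (type == "sw")
            return Transport::MPICH;                   // software peer: use MPICH
        return (comm == "tcp") ? Transport::HUMBOLDT_TCP
                               : Transport::HUMBOLDT_ETH;  // hardware peer
    }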

  32-34. System Tool Flow. HUMboldt has two branches for creating the entire cluster: one for software kernels and one for hardware kernels. The same source code can be used for both, as sketched below.
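One common way to express a single source file that both branches can consume is an HLS guard macro; this is a sketch of the idea under that assumption, not HUMboldt's actual build flow (Vivado HLS defines __SYNTHESIS__ during synthesis):

    // Sketch: one kernel body consumed by both tool-flow branches. The
    // hardware branch runs it through HLS; the software branch compiles it
    // with a normal C++ compiler and links against MPICH.
    void kernel_body(int *buf, int n) {
    #ifdef __SYNTHESIS__
        // Hardware branch: HLS-only pragmas or interfaces would go here.
    #endif
        for (int i = 0; i < n; ++i)
            buf[i] += 1;          // identical functional code in both branches
    }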

  35-36. System Tool Flow. The cluster description file is the key to ease of use: changing the underlying protocol, or changing a kernel from software to hardware, requires only editing fields such as <comm> and <type> in this file:

    <cluster>
      <node>
        <type> sw </type>
        <kernel> 0 </kernel>
        <mac_addr> ac:c4:7a:88:c0:47 </mac_addr>
        <ip_addr> 10.1.2.152 </ip_addr>
      </node>
      <node>
        <appBridge>
          <name> Humboldt_bridge </name>
        </appBridge>
        <board> adm-8k5-debug </board>
        <type> hw </type>
        <comm> eth </comm>
        <kernel> 1 </kernel>
        . . .
        <kernel> 16 </kernel>
        <mac_addr> fa:16:3e:55:ca:02 </mac_addr>
        <ip_addr> 10.1.2.101 </ip_addr>
      </node>
    </cluster>

  37. Outline: Goals, Contributions, Previous Works, Galapagos, HUMboldt, Results, Conclusion, Future Work.

  38. Results: Testbed. The testbed is a cluster of Intel Xeon E5-2650 CPUs (2.20 GHz, 12 physical cores, 24 threads) and Alpha Data ADM-PCIE-8K5 boards with Xilinx KU115 UltraScale devices.

  39-43. Galapagos/HUMboldt Resource Utilization

    Abstraction Layer IP                   LUTs     Flip-Flops   BRAMs
    I)   Hypervisor                        14.4 %   9.1 %        11.8 %
    II)  Network Bridge TCP                4.4 %    2.4 %        0.1 %
    III) Network Bridge Ethernet           0.1 %    0.1 %        0.1 %
    IV)  HUMboldt Bridge                   0.1 %    0.1 %        0.05 %
    V)   Router                            0.8 %    0.5 %        0.05 %
    Total TCP (I + II + IV + V)            19.7 %   12.1 %       15.9 %
    Total Ethernet (I + III + IV + V)      15.3 %   9.7 %        12.0 %

  44. Results: Microbenchmarks. Five kernel-to-kernel configurations were measured:
  • Hardware kernel to hardware kernel, same FPGA (HUMboldt)
  • Hardware kernel to hardware kernel, different FPGAs (HUMboldt)
  • Hardware kernel to software kernel, FPGA to CPU (HUMboldt)
  • Software kernel to hardware kernel, CPU to FPGA (HUMboldt)
  • Software kernel to software kernel, CPU to CPU (MPICH)

  45-48. Results: Throughput. (Throughput plots for the Ethernet and TCP configurations; figures not included in the transcript.)

  49-50. Results: Latency. Zero-payload packets. Relevant comparison: Microsoft Catapult v2, with 40G Ethernet and a lightweight transport layer, reports a 2.88 µs round trip.

    Microbenchmark                             Ethernet (µs)   TCP (µs)
    Hardware to hardware (same node)           0.2             0.2
    Hardware to hardware (different node)      5.7             15.2
    Software to hardware                       27.5            48.8
    Hardware to software                       34.7            113.6
