R2C2: A Network Stack for Rack-scale Computers

This presentation covers R2C2, a network stack for rack-scale computers that targets predictable and efficient performance in datacenter networks. It explores direct-connected topologies, adaptive routing, congestion control, and load balancing; by exploiting cheap intra-rack broadcast and global traffic knowledge, R2C2 aims to improve network performance in multi-tenant clusters.

  • Network Stack
  • Rack-scale Computers
  • Datacenter Networks
  • Predictable Performance
  • Multi-tenant Clusters




Presentation Transcript


  1. R2C2: A Network Stack for Rack-scale Computers. Paolo Costa, Hitesh Ballani, Kaveh Razavi, Ian Kash, Microsoft Research Cambridge. EECS 582 W16

  2. Authors: Microsoft Research Cambridge, Systems and Networking group. Prior work towards predictable datacenter networks: Oktopus [SIGCOMM '11]. Context: datacenters for multi-tenant services, and rack-scale computers in multi-tenant settings.

  3. Rack-scale computing: a building block of future datacenters, with a high-bandwidth, low-latency, direct-connected network. Example: HP Moonshot (~$10K), SoC-based micro-servers with hundreds of cores, 1 TB of RAM, and SSDs. Not a single giant machine but an array of simple nodes.

  4. Rack-scale network topology: a distributed switch, where each node also acts as a switch. Path diversity raises the questions of load balancing and congestion control. Typical datacenter topology: fat-tree; typical rack-scale topology: 3D torus, as in HPC systems.

  5. Motivation: direct-connected topologies in multi-tenant clusters face unpredictable workloads and network flows, so no single routing scheme fits all. The throughput of different routing algorithms (e.g., Valiant Load Balancing) varies with the traffic pattern.
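The Valiant Load Balancing scheme named on this slide can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the ring (1-D torus) topology, node count, and function names are all assumptions. The idea is to route each packet via a uniformly random intermediate node, trading extra hops for load that stays uniform regardless of the traffic pattern.

```python
import random

def ring_shortest_path(src, dst, n):
    """Shortest path on an n-node ring (1-D torus), as a node list."""
    forward = (dst - src) % n
    step = 1 if forward <= n - forward else -1
    path, cur = [src], src
    while cur != dst:
        cur = (cur + step) % n
        path.append(cur)
    return path

def valiant_route(src, dst, n, rng=random):
    """Valiant Load Balancing: route via a random intermediate node,
    so load is spread uniformly for any traffic pattern."""
    mid = rng.choice([v for v in range(n) if v not in (src, dst)])
    leg1 = ring_shortest_path(src, mid, n)
    leg2 = ring_shortest_path(mid, dst, n)
    return leg1 + leg2[1:]  # drop the duplicated intermediate node

path = valiant_route(0, 3, 8)
```

Under tornado-like adversarial traffic the random intermediate hop is what keeps any single link from becoming the bottleneck, at the cost of roughly doubling the hop count.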

  6. Tornado traffic: an adversarial traffic pattern that requires adaptive load balancing.

  7. Problem statement: given a direct-connected network and unpredictable traffic patterns, provide adaptive routing and congestion control, decoupling the routing and congestion-control planes within the rack.

  8. Big idea: within a rack, broadcast is cheap. With global knowledge of the active flows (plus the topology, the routing protocol, and current allocations), each node can locally compute a fair sending rate for congestion control.

  9. R2C2 broadcasts when a flow starts, so every node maintains the same global traffic matrix. Example: Flow1 routed 2-4 at rate 10 and Flow2 routed 1-3-4 at rate 10; the sender broadcasts a new flow routed 1-2-4 to Nodes 1-4.

  10. After the broadcast, every node recomputes the rates: Flow1 (route 2-4) drops from 10 to 5, Flow2 (route 1-3-4) stays at 10, and the new Flow3 (route 1-2-4) is allocated 5.

  11. R2C2 also broadcasts when a flow finishes, so the freed capacity is reclaimed and the remaining flows' rates are raised again (e.g., an allocation going from 5 back to 10).
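The broadcast-and-recompute mechanism on these three slides can be sketched as follows. This is a minimal sketch, not the paper's algorithm: it assumes a uniform link capacity of 10 (the slides' units) and approximates the allocation by splitting each link evenly among the flows crossing it, which happens to reproduce the rates shown (Flow1 and Flow3 at 5, Flow2 at 10).

```python
LINK_CAPACITY = 10  # assumed uniform capacity, in the slides' units

class FlowTable:
    """Per-node copy of the global traffic matrix, updated whenever a
    flow-start or flow-finish broadcast arrives."""

    def __init__(self):
        self.flows = {}  # flow_id -> list of directed links on its route

    @staticmethod
    def _links(route):
        # A route like [1, 2, 4] uses links (1, 2) and (2, 4).
        return list(zip(route, route[1:]))

    def on_start(self, flow_id, route):
        self.flows[flow_id] = self._links(route)

    def on_finish(self, flow_id):
        self.flows.pop(flow_id, None)

    def rates(self):
        """Rate = min over the flow's links of capacity / sharing flows."""
        sharers = {}
        for links in self.flows.values():
            for link in links:
                sharers[link] = sharers.get(link, 0) + 1
        return {f: min(LINK_CAPACITY / sharers[link] for link in links)
                for f, links in self.flows.items()}

t = FlowTable()
t.on_start("Flow1", [2, 4])
t.on_start("Flow2", [1, 3, 4])
t.on_start("Flow3", [1, 2, 4])
```

Because every node applies the same deterministic computation to the same table, all nodes agree on the rates without exchanging any further control messages.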

  12. Broadcast route: a shortest-path tree rooted at each node. Since 95% of traffic comes from fewer than 4% of flows and each broadcast message is only 16 bytes, broadcast traffic is about 1.3% of total traffic.

  13. Rate computation: the routing protocol dictates the relative rates across a flow's paths. Plain max-min fair allocation can leave the network under-utilized (example with two flows, f1 and f2: optimal vs. max-min fair allocation).
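For reference, single-resource max-min fairness can be computed by progressive filling (water-filling). This sketch only illustrates the allocation the slide is critiquing; the slide's point is that applying it network-wide can under-utilize links. The demands, capacity, and function name are illustrative assumptions.

```python
def max_min_fair(demands, capacity):
    """Progressive filling: give every unsatisfied flow an equal share
    of the leftover capacity until demands are met or capacity is gone."""
    alloc = {f: 0.0 for f in demands}
    active = set(demands)
    remaining = float(capacity)
    while active and remaining > 1e-9:
        share = remaining / len(active)
        remaining = 0.0
        for f in list(active):
            grant = min(share, demands[f] - alloc[f])
            alloc[f] += grant
            remaining += share - grant  # return unused share to the pool
            if alloc[f] >= demands[f] - 1e-9:
                active.discard(f)
    return alloc

alloc = max_min_fair({"f1": 2, "f2": 5, "f3": 8}, capacity=12)
```

Here f1's small demand is fully met and the leftover is split evenly between f2 and f3; on a single link this is optimal, but across a network it can starve links that another allocation would keep busy.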

  15. Routing selection: a genetic algorithm with aggregate throughput as the fitness function, plus a statistical heuristic for adaptive routing; the adaptation period is a few seconds to minutes.
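A genetic algorithm over route choices, with aggregate throughput as the fitness function, might look like the following toy sketch. The candidate routes, link capacity, population size, and mutation rate are all assumptions for illustration, not the paper's parameters; fitness here splits each link's capacity evenly among the flows using it.

```python
import random

CAPACITY = 10.0
# Hypothetical candidate routes per flow; each route is a tuple of links.
CANDIDATES = {
    "f1": [((1, 2), (2, 4)), ((1, 3), (3, 4))],
    "f2": [((2, 4),), ((2, 3), (3, 4))],
    "f3": [((1, 3), (3, 4)), ((1, 2), (2, 4))],
}
FLOWS = sorted(CANDIDATES)

def fitness(genome):
    """Aggregate throughput when each link splits evenly among users."""
    routes = {f: CANDIDATES[f][g] for f, g in zip(FLOWS, genome)}
    load = {}
    for links in routes.values():
        for link in links:
            load[link] = load.get(link, 0) + 1
    return sum(min(CAPACITY / load[link] for link in links)
               for links in routes.values())

def evolve(pop_size=20, gens=30, seed=0):
    rng = random.Random(seed)
    n = len(FLOWS)
    pop = [[rng.randrange(len(CANDIDATES[f])) for f in FLOWS]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)   # rank by throughput
        survivors = pop[:pop_size // 2]       # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)         # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:            # point mutation
                i = rng.randrange(n)
                child[i] = rng.randrange(len(CANDIDATES[FLOWS[i]]))
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```

Because fitness is evaluated on the whole traffic matrix, a run like this only needs to happen at the slide's coarse timescale (seconds to minutes), with the statistical heuristic handling faster-moving decisions.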

  16. Evaluation method: simulation, validated against an emulated rack server. Questions: 1. throughput/latency; 2. broadcast overhead; 3. rate-computation period; 4. centralized vs. decentralized design. Baseline: TCP with ECMP-style single-path routing; ideal case: PFQ (random packet spraying).

  17. Throughput/latency: R2C2 congestion control improves network load balancing, for both long flows and short flows.

  18. Broadcast overhead: the more hops traffic traverses on average, the lower the relative broadcast overhead.

  19. Rate-computation interval: the overhead is the compute time. The median rate update takes < 10% of the compute time at a 25 µs interval. Figures: deviation from the ideal rate, and CPU overhead for a 512-node rack with a 1 µs flow inter-arrival time.

  20. Decentralized design: centralized control increases network traffic. What about short flows running concurrently with a long flow?

  21. Discussion: Is the control plane really decoupled from the routing plane, given that control decisions are dictated by the routing protocol? Broadcast would not scale beyond a rack; does this assume racks themselves will not scale? Is single-path TCP an unfair baseline?

  22. Conclusion: by maintaining a global traffic matrix at every node, R2C2 optimizes decisions while separating routing from congestion control. Simulation allows extensive parameterization of the experiments.
