Lossless Network Technologies and AFD Algorithm

Explore lossless network technologies such as RoCEv2, how they compare with lossy transports such as TCP and iWARP, and the AFD algorithm for Active Queue Management. Discover how these technologies ensure data integrity, mitigate packet loss, and manage congestion in network environments.

  • Lossless Network
  • AFD Algorithm
  • RoCEv2
  • TCP
  • iWARP

Presentation Transcript


  1. Lossless Network Jun Liu (johnliu@cisco.com)

  2. Lossy vs Lossless
     Lossy (examples: TCP, iWARP, NVMe/TCP): tolerates packet drops in the transport; TCP retransmits dropped packets; traffic loss can be reduced by the network.
     Lossless (example: RoCE/RoCEv2): does not tolerate traffic drops and requires a lossless network transport; in case of a drop, the application retransmits the whole block/SCSI exchange; a lossless transport requires the network to support lossless Ethernet.
     Storage protocol, network protocol, and resulting network transport:
     • iSCSI over TCP: lossy
     • NVMe-oF over TCP: lossy
     • NVMe-oF over RoCE/RoCEv2: lossless
     • NVMe-oF over iWARP (TCP): lossy
     • NFS over TCP: lossy
     • SMB over RoCE/RoCEv2: lossless
     • SMB over iWARP: lossy

  3. RoCEv2 End-to-End Lossless Behavior
     Technologies used for lossless networks (RoCEv2): flow control and congestion control.
     • To preserve lossless capabilities, the network requires PFC to be enabled for RoCEv2 traffic.
     • Traffic requires preservation of priority when transferred between the Layer 2 and Layer 3 network.
     • Configure ETS (802.1Qaz).
     • Packet/flow identification follows standard practices of IP/Ethernet networks (i.e., DSCP/802.1Q).
     • Use ECN for congestion notification.
     • Consider AFD (Cisco).
     • Consider DPP (Cisco).
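The priority-preservation point above can be illustrated with a small sketch. The Python snippet below is only a model of the idea: RoCEv2 traffic should land in the same no-drop class whether it is identified by 802.1Q CoS at Layer 2 or by DSCP at Layer 3. The CoS 3 / DSCP 26 markings are assumptions chosen for illustration, not values taken from the slides.

```python
# Sketch: keep RoCEv2 traffic in the same no-drop class whether it is
# identified by 802.1Q CoS (Layer 2) or DSCP (Layer 3).
# The marking values below are illustrative assumptions; real deployments
# choose their own markings and must match the PFC-enabled priority.

ROCE_COS = 3      # assumed 802.1p priority with PFC enabled (no-drop)
ROCE_DSCP = 26    # assumed DSCP mapped to the same no-drop class

def classify(packet: dict) -> str:
    """Return the traffic class for a packet based on whichever marking is present."""
    if packet.get("cos") == ROCE_COS or packet.get("dscp") == ROCE_DSCP:
        return "no-drop"          # PFC protects this class end to end
    return "best-effort"          # lossy classes rely on retransmission

def rederive_cos(dscp: int) -> int:
    """A routed hop rewrites the L2 header, so CoS must be re-derived from DSCP
    to preserve the priority across the Layer 2 / Layer 3 boundary."""
    return ROCE_COS if dscp == ROCE_DSCP else 0

print(classify({"dscp": 26}))   # -> no-drop
print(classify({"cos": 0}))     # -> best-effort
```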

  4. Approximate Fair Drop (AFD)
     AFD increases headroom and reduces latency: it maintains throughput while minimizing buffer consumption by elephant flows, keeping the buffer state as close to the ideal as possible.
     • Higher-bandwidth elephants: higher AFD drop probability.
     • Lower-bandwidth elephants: lower AFD drop probability.
     • Non-elephants: no AFD.
     AFD distinguishes elephant flows from other flows at queue admission; packets are buffered normally unless AFD is enabled, the flow is an elephant, and the ideal queue depth is exceeded.
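A minimal sketch of the admission decision described above, assuming a single queue with an ideal depth and a per-flow drop probability supplied by AFD; the names and the example values are illustrative only.

```python
import random

def admit(flow_is_elephant: bool, queue_depth_bytes: int,
          ideal_depth_bytes: int, afd_drop_prob: float) -> bool:
    """Queue-admission decision: buffer the packet unless AFD applies to the
    flow and the queue is already past its ideal depth."""
    if not flow_is_elephant:
        return True                       # mice are never dropped by AFD
    if queue_depth_bytes <= ideal_depth_bytes:
        return True                       # below the ideal depth: just buffer
    return random.random() >= afd_drop_prob   # probabilistic drop for elephants

# Example: an elephant flow with 40% drop probability on a congested queue
print(admit(True, 200_000, 150_000, 0.4))   # True roughly 60% of the time
```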

  5. AFD Overview
     AFD is an Active Queue Management (AQM) algorithm that acts on long-lived large flows (elephant flows) during congestion and does not impact short flows (mice flows). As a result, it improves flow-based fairness, lowers average queue occupancy, and increases headroom for micro-bursts.
     The drop probability depends on the arrival rate of a flow measured at ingress. Under congestion, the AFD algorithm maintains the queue occupancy at the configured queue-desired value by probabilistically dropping packets from the large elephant flows while not impacting the small mice flows.
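The slides do not give the exact hardware formula, but one simple model consistent with the description is to drop just enough of each elephant flow to bring its accepted rate down to the computed fair rate; the sketch below is that model, not the switch's actual implementation.

```python
def afd_drop_probability(arrival_rate_bps: float, fair_rate_bps: float) -> float:
    """Drop probability for one elephant flow: dropping a fraction
    (1 - fair/arrival) of its packets brings the accepted rate down to
    roughly the fair rate; flows at or below the fair rate are untouched."""
    if arrival_rate_bps <= fair_rate_bps:
        return 0.0
    return 1.0 - fair_rate_bps / arrival_rate_bps

# Example: a 9 Gbps elephant competing for a 3 Gbps fair share
print(afd_drop_probability(9e9, 3e9))   # ~0.67, so roughly 3 Gbps is admitted
```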

  6. Elephant Trap
     The elephant trap is the mechanism used to identify large-volume flows; flows are identified based on the 5-tuple and measured over a 10 msec interval. The trap threshold is byte-count-based: when the bytes received in a flow exceed the number specified by the threshold, the flow is considered an elephant flow. Only elephant flows are submitted to the AFD dropping algorithm; mice flows are protected and not subject to AFD drops. The arriving data rate is measured on ingress and compared against a calculated fair rate on the egress port to decide whether packets may be dropped.
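A sketch of such a byte-count trap, assuming an illustrative threshold and treating the 10 msec figure as the measurement window; beyond what the slide states (5-tuple key, byte-count threshold), the specifics here are assumptions.

```python
from collections import defaultdict

ELEPHANT_THRESHOLD_BYTES = 1_000_000   # assumed threshold, for illustration only

byte_count = defaultdict(int)          # keyed by (src_ip, dst_ip, proto, sport, dport)
elephants = set()

def on_packet(five_tuple, length_bytes):
    """Count bytes per 5-tuple; past the threshold the flow becomes an elephant."""
    byte_count[five_tuple] += length_bytes
    if byte_count[five_tuple] > ELEPHANT_THRESHOLD_BYTES:
        elephants.add(five_tuple)      # only these flows face AFD drops

def on_window_expiry():
    """Reset at the end of each measurement window so idle flows age out."""
    byte_count.clear()
    elephants.clear()
```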

  7. AFD Comparison to WRED
     Both AFD and WRED are Active Queue Management algorithms. In case of congestion, WRED computes a random drop probability and drops packets indiscriminately across all the flows in a class of traffic. AFD computes the drop probability based on the arrival rate of incoming flows, compares it with the computed fair rate, and drops packets from the elephant flows while not impacting the mice flows.
     Recommended queue-desired values for different port speeds:
     • 10G: 150 kbytes
     • 40G: 600 kbytes
     • 100G: 1500 kbytes
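To make the contrast concrete, here is a sketch of the standard WRED drop curve, which depends only on the average queue depth and is therefore flow-blind, alongside the slide's queue-desired table. The WRED min/max thresholds and maximum probability are generic parameters, not values from this deck.

```python
def wred_drop_probability(avg_queue_depth, min_th, max_th, max_p):
    """WRED: drop probability is a function of average queue depth only,
    so every flow in the class sees the same probability (flow-blind)."""
    if avg_queue_depth < min_th:
        return 0.0
    if avg_queue_depth >= max_th:
        return 1.0
    return max_p * (avg_queue_depth - min_th) / (max_th - min_th)

# Queue-desired values recommended in the slide, per port speed (kbytes)
AFD_QUEUE_DESIRED_KBYTES = {"10G": 150, "40G": 600, "100G": 1500}
```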

  8. DPP Introduction
     Thanks to AFD, mice flows (micro-bursts) can use the large buffer headroom. But this is still within their original queue, which is typically not a strict-priority queue, since it holds all kinds of flows, including elephants. There are also other queues that can have equal or higher priority. DPP dynamically classifies the first packets of a mice flow into a strict-priority queue.

  9. DPP (Dynamic Packet Prioritization)
     DPP separates flows into short and long: short flows are put in the high-priority queue (the express lane) and long flows are put in the low-priority queue (the normal lane). Prioritization guarantees packet order. The goal is to prevent drops of mice, incast, and burst traffic.
     Benefits:
     • For a short burst: it is 100% prioritized.
     • For a longer flow: the first few packets get prioritized. This is still useful because the first packets of a TCP flow establish the TCP connection, increase the TCP window, etc.
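A minimal sketch of this express-lane/normal-lane split, assuming the prioritized prefix of each flow is bounded by a per-flow packet count; the limit of 120 packets is an assumed example value, not a figure from the slides.

```python
from collections import defaultdict

DPP_PACKET_LIMIT = 120                 # assumed per-flow limit, for illustration

pkts_seen = defaultdict(int)           # packets observed so far per 5-tuple

def select_queue(five_tuple) -> str:
    """The first few packets of every flow ride the strict-priority express
    lane; the rest of a long flow falls back to the normal lane."""
    pkts_seen[five_tuple] += 1
    if pkts_seen[five_tuple] <= DPP_PACKET_LIMIT:
        return "express-lane"
    return "normal-lane"
```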

  10. DPP Looks for Any Burst: TCP, UDP, Multicast, ...
      The elephant trap and the DPP algorithm are not tracking only TCP sessions. The algorithm is 5-tuple based, which means it can find TCP, UDP, unicast, and multicast bursts. A very long-lived session that is quiet and then bursts will be prioritized for that burst, and traffic arriving due to a link failure will be prioritized as well.
      [Diagram: one long-lived session between source and destination, seen as multiple flowlets.]

  11. Testing Case
      Traditional QoS is static, and it is hard to come up with a good QoS policy. Nexus 9K-EX provides out-of-the-box behavior where a one-line QoS config protects bursty traffic in real time, without having to define a config that statically identifies large flows versus small flows. Burst traffic is protected thanks to DPP. This works in real time and adapts to the traffic patterns and rates; the fair share adjusts dynamically based on the incoming rate.
      [Test topology: senders 1-3 on 10G links into a 40/100G fabric toward a 10G receiver; sender 1 offers 8 Gbps and is throttled to 7 Gbps, sender 2 offers 3 Gbps, sender 3 sends bursty traffic.]
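The 7 Gbps figure in the topology is what a max-min fair allocation of the 10G receiver link produces for the two steady senders. The sketch below only models that arithmetic; the switch itself measures arrival rates and adjusts the fair rate dynamically rather than running this computation.

```python
def max_min_fair_share(demands_gbps, capacity_gbps):
    """Progressive filling: flows demanding less than the fair share keep
    their demand; the remaining flows split the leftover capacity equally."""
    alloc = {}
    remaining = capacity_gbps
    unsatisfied = dict(demands_gbps)
    while unsatisfied:
        share = remaining / len(unsatisfied)
        small = {f: d for f, d in unsatisfied.items() if d <= share}
        if not small:
            return {**alloc, **{f: share for f in unsatisfied}}
        for f, d in small.items():
            alloc[f] = d
            remaining -= d
            del unsatisfied[f]
    return alloc

# Topology from the slide: 10G receiver link, sender 1 offers 8G, sender 2 offers 3G
print(max_min_fair_share({"sender1": 8, "sender2": 3}, 10))
# sender 2 keeps its 3 Gbps; sender 1 is throttled to the remaining 7 Gbps
```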

  12. Better Application Performance in an Incast Environment
      [Chart: average flow completion time (msec) vs. traffic load (20-95% of line rate) for a data-mining workload, comparing Nexus 92160 and Nexus 9272Q against a merchant-silicon switch (BCOM Dune, 9GB).]
      https://miercom.com/cisco-systems-speeding-applications-in-data-center-networks/

  13. Better Application Performance in an Incast Environment
      [Chart: flow completion time (msec) for flows under 100 KB vs. traffic load (20-95% of line rate) for the same data-mining workload, comparing Nexus 92160 and Nexus 9272Q against the merchant-silicon switch (BCOM Dune, 9GB).]
      https://miercom.com/cisco-systems-speeding-applications-in-data-center-networks/

  14. Better Application Performance in an Incast Environment
      [Chart: flow completion time (msec) for flows larger than 10 MB vs. traffic load (20-95% of line rate) for the same data-mining workload, comparing Nexus 92160 and Nexus 9272Q against the merchant-silicon switch (BCOM Dune, 9GB).]
      https://miercom.com/cisco-systems-speeding-applications-in-data-center-networks/

  15. Thank you!
