Fine-grained TCP Retransmissions for Datacenter Communication


"Learn about fine-grained TCP retransmissions for efficient datacenter communication, addressing issues like TCP throughput collapse and incast occurrences in high-bandwidth, low-latency networks."

  • Datacenter Communication
  • TCP Retransmissions
  • High-bandwidth Networks
  • Low-latency
  • Datacenter Networking


Presentation Transcript


  1. Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication. Presented by Daehyeok Kim. Authors: Vijay Vasudevan, Amar Phanishayee, Hiral Shah, Elie Krevat, David Andersen, Greg Ganger, Garth Gibson, Brian Mueller* (Carnegie Mellon University, *Panasas Inc.). 6/9/2025

  2. TCP request-response in the datacenter: high-bandwidth, low-latency links; small switch buffers; barrier-synchronized workloads. [Animation: a client and Servers 1-4; labels: RTOmin = 200 ms, TCP timeout.] *Animation from Mohammad Alizadeh's lecture slides.

  3. Preconditions for TCP incast: (1) high-bandwidth, low-latency networks with small switch buffers; (2) clients issuing barrier-synchronized requests in parallel; (3) servers returning a relatively small amount of data per request. Root cause of incast: the imbalance between low link latency (microseconds) and the RTO (milliseconds).

  4. TCP throughput collapse. Setup: 1 Gbps Ethernet, 100 µs delay, 200 ms RTO, 1 MB block size. Nagle et al.* called this problem "incast" and provided an application-level solution. *The Panasas ActiveScale Storage Cluster: Delivering Scalable High Bandwidth Storage, SC 2004.
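To see why a single timeout is so costly with the parameters on this slide, here is a rough back-of-the-envelope sketch. It assumes a decimal 1 MB block, ignores protocol overhead, and treats the whole transfer as stalled for each RTO; the function name `goodput_mbps` is hypothetical and the model is far simpler than the real experiments.

```python
def goodput_mbps(block_bytes, link_bps, timeouts, rto_s):
    """Goodput in Mbit/s for one block when the transfer stalls for `timeouts` RTOs.

    Simplified model (an assumption, not the experimental setup):
    transfer time = serialization time + timeouts * RTO.
    """
    transfer_s = block_bytes * 8 / link_bps
    total_s = transfer_s + timeouts * rto_s
    return block_bytes * 8 / total_s / 1e6


BLOCK_BYTES = 1_000_000   # 1 MB block (decimal megabyte assumed)
LINK_BPS = 1e9            # 1 Gbps Ethernet
RTO_S = 0.200             # 200 ms RTO

# Without loss the block takes about 8 ms, so the link is fully used.
print(goodput_mbps(BLOCK_BYTES, LINK_BPS, 0, RTO_S))
# One 200 ms timeout stretches 8 ms to 208 ms: goodput drops to roughly 38 Mbit/s.
print(goodput_mbps(BLOCK_BYTES, LINK_BPS, 1, RTO_S))
```

Under this model, one timeout per block is enough to lose more than 95% of the link capacity, which is consistent with the collapse the slide describes.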

  5. Incast really happens. [Figures: Microsoft datacenter, with ~300 ms idle; Google datacenter*.] *Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network, SIGCOMM 2015.

  6. Recap: RTO estimation. Jacobson's TCP RTO estimator: RTO = SRTT + 4 × RTTVAR, where SRTT is the smoothed RTT and RTTVAR is the RTT variation. Actual RTO timer = max(RTOmin, RTO). Minimum RTO bound (RTOmin) = 200 ms.
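The estimator above can be sketched in a few lines. This is a minimal illustration, assuming the standard smoothing gains of 1/8 and 1/4 from RFC 6298 and ignoring the clock-granularity term; the class name `RtoEstimator` and its methods are hypothetical, not taken from the slides or from any kernel.

```python
class RtoEstimator:
    """Sketch of Jacobson-style RTO estimation with a minimum bound."""

    ALPHA = 1 / 8  # gain for the smoothed RTT (RFC 6298 default, assumed here)
    BETA = 1 / 4   # gain for the RTT variation (RFC 6298 default, assumed here)

    def __init__(self, rto_min=0.200):
        self.rto_min = rto_min  # seconds; 200 ms as on the slide
        self.srtt = None
        self.rttvar = None

    def update(self, rtt):
        """Feed one RTT sample (seconds) and return the resulting timer value."""
        if self.srtt is None:
            # First sample initializes both estimates.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated before SRTT, using the old SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        return self.timer()

    def raw_rto(self):
        """RTO = SRTT + 4 * RTTVAR, before the minimum bound is applied."""
        return self.srtt + 4 * self.rttvar

    def timer(self):
        """Actual RTO timer = max(RTOmin, RTO)."""
        return max(self.rto_min, self.raw_rto())


est = RtoEstimator()
for _ in range(20):
    est.update(100e-6)  # steady 100 microsecond RTT samples
print(est.raw_rto(), est.timer())
```

With steady 100 µs samples the raw RTO settles well under a millisecond, yet the timer stays pinned at the 200 ms bound, which is the imbalance the earlier slides identify as the root cause of incast.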

  7. Solution: eliminate RTOmin. Idea: allow TCP retransmissions to fire within microseconds. Problem: RTT is still measured in milliseconds, and millisecond retransmissions are not enough.

  8. Solution++: microsecond RTO. Idea: measure RTT in microseconds and use high-resolution kernel timers (HPET).
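The effect of timer granularity can be illustrated with a small sketch. It assumes a timer that can only fire on tick boundaries, and a 1 ms tick for the coarse case; both the tick value and the function name `earliest_fire_us` are illustrative assumptions, not figures from the slides.

```python
import math


def earliest_fire_us(timeout_us, tick_us):
    """Earliest time (in microseconds) a timer armed at t=0 can fire
    when the clock only advances in steps of `tick_us`."""
    return math.ceil(timeout_us / tick_us) * tick_us


# A 100 microsecond timeout on a 1 ms tick cannot fire before a full tick has passed.
coarse = earliest_fire_us(100, 1000)
# With microsecond resolution the same timeout fires when requested.
fine = earliest_fire_us(100, 1)
print(coarse, fine)
```

In this simplified model, removing RTOmin alone still leaves the timeout an order of magnitude above the 100 µs delay, which is why the slide pairs microsecond RTT measurement with high-resolution timers.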

  9. More optimization: RTO desynchronization. Add some delay to avoid synchronized retransmissions.
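One way to add such a delay is to scale each timeout by a random factor, so that senders that lost packets at the same moment do not all retransmit at the same moment. The sketch below assumes a jitter of up to 50% of the RTO and standard exponential backoff; the slide gives no formula, so the jitter range and the function name `desynchronized_timeout` are assumptions.

```python
import random


def desynchronized_timeout(rto, backoff=0, rng=random):
    """Return a jittered timeout: (RTO + U(0, 0.5) * RTO) * 2**backoff.

    `rto` is in seconds, `backoff` counts consecutive timeouts, and `rng`
    is any object with a `uniform(a, b)` method (the `random` module by default).
    """
    jitter = rng.uniform(0.0, 0.5) * rto
    return (rto + jitter) * (2 ** backoff)


# Four senders with the same 1 ms RTO end up with different retransmission times.
rng = random.Random(42)
print([desynchronized_timeout(0.001, rng=rng) for _ in range(4)])
```

Passing a seeded `random.Random` instance makes the jitter reproducible in tests, while the default uses the module-level generator.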

  10. Discussion. Spurious retransmissions: do microsecond timeouts harm wide-area throughput? Experiments showed no noticeable difference in performance. Interaction with delayed ACKs: 10-15% loss in throughput, so delayed ACKs should be avoided when possible.

  11. Summary. Analyzed the incast problem using simulation and testbed experiments. Microsecond RTOs can improve datacenter application response time and throughput. A simple yet effective solution. Influenced many later works in datacenter networking, including DCTCP, ICTCP, pFabric, and Fastpass.
