
Identifying Performance Bottlenecks in Content Delivery Networks
Learn how to identify and address performance bottlenecks in Content Delivery Networks (CDNs) through TCP-level monitoring. The presentation covers common bottlenecks, how to react to each one, why previous techniques fall short, how TCP stats reveal bottlenecks, a measurement framework, and how a bottleneck classifier works. It closes with insights from an experiment on CoralCDN, which serves one million clients per day.
Presentation Transcript
Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring
  Peng Sun, Minlan Yu, Michael J. Freedman, Jennifer Rexford
  Princeton University
  August 19, 2011
Performance Bottlenecks
  Path: CDN servers (server application, server OS) -> Internet -> clients
  Server application: writes too slowly
  Server OS: insufficient send buffer, or small initial congestion window
  Internet: network congestion
  Client: insufficient receive buffer
Reaction to Each Bottleneck
  Server application is the bottleneck: debug the application
  Server OS is the bottleneck: tune the buffer size, or upgrade the server
  Internet is the bottleneck: circumvent the congested part of the network
  Client is the bottleneck: notify the client to change
Previous Techniques Are Not Enough
  Application logs: no details of network activities
  Packet sniffing: expensive to capture
  Active probing: extra load on the network
  Transport-layer stats: directly reveal performance bottlenecks
  [Diagram labels: server application, packet sniffer, server OS]
How TCP Stats Reveal Bottlenecks
  Path: CDN servers (applications, network stack) -> Internet (network path) -> clients
  CDN server applications: insufficient data in the send buffer
  Server network stack: send buffer full, or initial congestion window too small
  Network path: packet loss
  Clients: receive window too small
Measurement Framework
  Collect TCP statistics
    Web100 kernel patch
    Extract useful TCP stats for analyzing performance
  Analysis tool
    Bottleneck classifier for individual connections
    Cross-connection correlation at the AS level
      Map connections to ASes based on RouteViews
      Correlate bottlenecks to drive CDN decisions
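The step of mapping a connection to its AS can be sketched as a longest-prefix match over a prefix-to-AS table. This is only an illustration: the table below is hypothetical (documentation prefixes and reserved AS numbers), and in practice it would be built from a RouteViews BGP dump, whose parsing is not shown. The names PREFIX_TO_AS, build_table, and lookup_as are assumptions introduced for this sketch.

```python
import ipaddress

# Hypothetical prefix-to-AS table. The prefixes are reserved documentation
# ranges and the AS numbers are reserved for documentation; a real table
# would be derived from a RouteViews BGP dump.
PREFIX_TO_AS = {
    "203.0.113.0/24": 64500,
    "203.0.113.128/25": 64501,
    "198.51.100.0/24": 64502,
}


def build_table(prefix_to_as):
    """Parse prefixes once and sort longest-first, so the first hit is the most specific."""
    table = [(ipaddress.ip_network(prefix), asn) for prefix, asn in prefix_to_as.items()]
    table.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return table


def lookup_as(table, client_ip):
    """Return the AS number of the most specific prefix covering client_ip, or None."""
    addr = ipaddress.ip_address(client_ip)
    for network, asn in table:
        if addr in network:
            return asn
    return None


table = build_table(PREFIX_TO_AS)
print(lookup_as(table, "203.0.113.200"))  # covered by both prefixes; the /25 wins
```

A linear scan is fine for a sketch; a table with hundreds of thousands of prefixes would normally use a trie or a similar structure instead.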
How the Bottleneck Classifier Works
  Small initial Cwin -> slow start limits performance -> network stack is the bottleneck
  BytesInSndBuf = Rwin -> Rwin limits sending -> client is the bottleneck
  Cwin drops greatly and packet loss -> network path is the bottleneck
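The three rules on this slide can be sketched as a small per-sample classifier. This is a minimal illustration, not the authors' implementation: the TcpSample fields are hypothetical stand-ins for the collected TCP stats, and the slide gives no numeric thresholds, so the 50% congestion-window drop and the slow-start test are assumptions.

```python
from dataclasses import dataclass


@dataclass
class TcpSample:
    """One polled snapshot of a connection (field names are hypothetical)."""
    cwin: int              # congestion window, in bytes
    prev_cwin: int         # congestion window at the previous poll, in bytes
    rwin: int              # receiver-advertised window, in bytes
    bytes_in_snd_buf: int  # bytes currently held in the send buffer
    lost_packets: int      # packets lost since the previous poll
    in_slow_start: bool    # whether the sender is still in slow start


# Assumption: "Cwin drops greatly" is read as "fell to half or less".
CWIN_DROP_RATIO = 0.5


def classify(sample):
    """Label one sample with the bottleneck suggested by the slide's rules."""
    # Cwin drops greatly and packet loss -> network path is the bottleneck.
    if sample.lost_packets > 0 and sample.cwin <= sample.prev_cwin * CWIN_DROP_RATIO:
        return "network path"
    # BytesInSndBuf = Rwin -> Rwin limits sending -> client is the bottleneck.
    if sample.bytes_in_snd_buf == sample.rwin:
        return "client"
    # Small initial Cwin -> slow start limits performance -> network stack.
    # Assumption: "small" means data is waiting while Cwin is still ramping up.
    if sample.in_slow_start and sample.bytes_in_snd_buf > sample.cwin:
        return "network stack"
    return "unclassified"


print(classify(TcpSample(4380, 2920, 65535, 30000, 0, True)))  # prints "network stack"
```

The order of the checks is itself a design choice of this sketch; the slide does not say which rule takes priority when several hold at once.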
CoralCDN Experiment
  CoralCDN serves 1 million clients per day
  Experiment environment
    Deployment: a Clemson PlanetLab node
    Polling interval: 50 ms
    Traces shown: Feb 19th-25th, 2011
    Total number of connections: 209K
    After removing cache-miss connections: 137K (2008 ASes in total)
  Log space overhead: < 200 MB per Coral server per day
What Are the Major Bottlenecks for Individual Clients?
  We calculate the fraction of each connection's lifetime spent under each bottleneck.

  Bottleneck              % of conn. with bottleneck for >40% of lifetime
  Server application      10.75%
  Server network stack    18.72%
  Network path             3.94%
  Clients                  1.27%

  Server application
    Reason: slow CPU or scarce disk resources of the PlanetLab node
    Our suggestion: use more powerful PlanetLab machines
  Server network stack
    Reason: congestion window rises too slowly for short connections (>80% of the connections last <1 second)
    Our suggestion: use a larger initial congestion window
  Network path
    Reason: spotty network (discussed in next slide)
  Clients
    Reason: receive buffer too small (most of them are <30 KB)
    Our suggestion: filter them out of decision making
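The per-connection metric on this slide can be sketched as follows, assuming each connection is polled at a fixed interval and each poll yields one bottleneck label (or None when no bottleneck is detected). With a fixed interval, the share of samples approximates the share of lifetime. The function names and the label strings are hypothetical.

```python
from collections import Counter

# From the slide: a bottleneck counts as major if it holds for >40% of the lifetime.
LIFETIME_THRESHOLD = 0.40


def bottleneck_fractions(labels):
    """Fraction of polled samples spent under each bottleneck.

    labels holds one entry per poll; None means no bottleneck was detected.
    """
    if not labels:
        return {}
    counts = Counter(label for label in labels if label is not None)
    return {name: count / len(labels) for name, count in counts.items()}


def major_bottlenecks(labels, threshold=LIFETIME_THRESHOLD):
    """Bottlenecks that cover more than `threshold` of the connection's lifetime."""
    fractions = bottleneck_fractions(labels)
    return sorted(name for name, fraction in fractions.items() if fraction > threshold)


# Ten polls: five under the network stack, two under the client, three with none.
polls = ["network stack"] * 5 + [None] * 3 + ["client"] * 2
print(major_bottlenecks(polls))  # prints ['network stack']
```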
AS-Level Correlation
  CDNs make decisions at the AS level, e.g., change server selection for 1.1.1.0/24
  Explore at the AS level:
    Filter out non-network bottlenecks
    Whether network problems exist
    Whether the problem is consistent
Filtering Out Non-Network Bottlenecks
  CDNs change server selection if clients have low throughput
  Non-network factors can limit throughput
  236 out of 505 low-throughput ASes are limited by non-network bottlenecks
  Filtering is helpful:
    Don't worry about things CDNs cannot control
    Produce more accurate estimates of performance
Network Problem at AS Level
  CDNs make decisions at the AS level
  Do connections in the same AS share a common network problem?
  For 7.1% of the ASes, half of the connections have a >10% packet loss rate
  Network problems are significant at the AS level
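The AS-level check on this slide can be sketched as a simple grouping step. The input format, (AS number, packet loss rate) pairs with one pair per connection, is an assumption, as is reading "half of the connections" as "at least half"; the function name is hypothetical.

```python
from collections import defaultdict


def ases_with_common_loss(conn_records, loss_threshold=0.10, conn_fraction=0.5):
    """Return the ASes in which at least `conn_fraction` of the connections
    have a packet loss rate above `loss_threshold`.

    conn_records is an iterable of (asn, loss_rate) pairs, one per connection.
    """
    loss_by_as = defaultdict(list)
    for asn, loss_rate in conn_records:
        loss_by_as[asn].append(loss_rate)

    flagged = []
    for asn, rates in loss_by_as.items():
        lossy = sum(1 for rate in rates if rate > loss_threshold)
        if lossy / len(rates) >= conn_fraction:
            flagged.append(asn)
    return sorted(flagged)


# Made-up records using AS numbers reserved for documentation.
records = [
    (64500, 0.15), (64500, 0.20), (64500, 0.01),
    (64501, 0.00), (64501, 0.12), (64501, 0.02),
]
print(ases_with_common_loss(records))  # prints [64500]
```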
Consistent Packet Loss of AS
  CDNs care about the predictive value of measurements
  Analyze the variance of average packet loss rates
    Each epoch (1 min) has a nonzero average loss rate
    Loss rate is consistent across epochs (standard deviation < mean)

  Analysis length                      # of ASes with consistent packet loss
  One week                             377 / 2008
  One day (Feb 21st)                   122 / 739
  One hour (Feb 21st, 18:00~19:00)     19 / 121
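The two consistency criteria on this slide can be sketched as a single predicate over an AS's per-epoch average loss rates. The slide does not say whether the standard deviation is the population or the sample one; the population form is assumed here, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def has_consistent_loss(epoch_loss_rates):
    """Check the slide's two criteria for one AS.

    epoch_loss_rates holds the average packet loss rate of each 1-minute epoch.
    """
    if not epoch_loss_rates:
        return False
    # Criterion 1: every epoch has a nonzero average loss rate.
    if any(rate <= 0 for rate in epoch_loss_rates):
        return False
    # Criterion 2: standard deviation < mean.
    # Assumption: population standard deviation (pstdev).
    return pstdev(epoch_loss_rates) < mean(epoch_loss_rates)


print(has_consistent_loss([0.05, 0.06, 0.04]))  # steady loss: prints True
print(has_consistent_loss([0.05, 0.00, 0.04]))  # one loss-free epoch: prints False
```

Comparing the standard deviation to the mean keeps the test scale-free: an AS with a steady 1% loss rate and one with a steady 10% loss rate both pass.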
Conclusion & Future Work
  Use TCP-level stats to detect performance bottlenecks
  Identify major bottlenecks for a production CDN
  Discuss how to improve a CDN's operation with our tool
  Future work
    Automatic and real-time analysis combined into CDN operation
    Detect the problematic AS on the path
    Combine TCP-level stats with application logs to debug online services
Thanks! Questions?