
Unified Profiling Infrastructure for Datacenters: Overcoming Challenges with Instrumentation Sampling
"Learn about the need for instant profiling in datacenter applications, the limitations of traditional profiling methods, and how Google-wide profiling offers a solution. Explore goals for a unified profiling infrastructure, the role of instrumentation sampling, and the issues with basic implementation in profiling datacenters."
Presentation Transcript
Instant Profiling: Instrumentation Sampling for Profiling Datacenter Applications. Hyoun Kyu Cho (University of Michigan), Tipp Moseley (Google), Richard Hank (Google), Derek Bruening (Google), Scott Mahlke (University of Michigan).
Datacenter Applications: In 2010, US datacenters consumed 70 to 90 billion kWh [Koomey`11]. Datacenter application performance is critical, and profiling can help. (Image source: http://googleblog.blogspot.com)
Traditional Profiling (diagram): source code; instrumentation and build; instrumented binary; input data; training run; profile data. Challenges for datacenters: need to run on live traffic; difficult to isolate; overheads (value profiling: 3.8x slowdown [Calder`99]; path profiling: 31%, edge profiling: 16% [Ball`96]); binary management (many programs, multiple versions).
Google-Wide Profiling [Ren et al.`10]: Continuous profiling infrastructure for datacenters. Negligible overhead: sampling based; aggregated profiling overhead less than 0.01%. Limitations: relies heavily on Performance Monitoring Units; limited flexibility and portability.
Goals: A unified profiling infrastructure for datacenters, with flexible types of profile data and portability across heterogeneous datacenters, while maintaining low overhead and not burdening binary management. Approach: dynamic binary instrumentation combined with sampling.
Instrumentation Sampling (diagram): application; system call gateway; operating system; hardware.
Instrumentation Sampling (diagram, DynamoRIO [Bruening`04]): application; instrumentation engine with dispatch, client, context switch, and code cache; operating system; hardware.
Instrumentation Sampling (diagram): application; shepherding thread issuing "start profiling" and "stop profiling"; instrumentation engine with dispatch, client, and code cache; operating system; hardware.
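The start/stop role of the shepherding thread can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the `Shepherd` class, the `run` function, and the millisecond-stepped simulated clock are all hypothetical names introduced here, and no real instrumentation engine is involved.

```python
from dataclasses import dataclass


@dataclass
class Shepherd:
    """Hypothetical model of a shepherding thread's schedule."""
    profile_ms: int  # length of each profiling window
    period_ms: int   # time between the starts of consecutive windows

    def wants_profiling(self, now_ms: int) -> bool:
        # Profiling is on for the first profile_ms of every period.
        return (now_ms % self.period_ms) < self.profile_ms


def run(shepherd: Shepherd, total_ms: int):
    """Step a simulated clock and record when profiling starts and stops.

    Returns the list of ("start" | "stop", time_ms) events and the total
    number of simulated milliseconds spent under instrumentation.
    """
    events = []
    instrumented_ms = 0
    active = False
    for now in range(total_ms):
        want = shepherd.wants_profiling(now)
        if want and not active:
            events.append(("start", now))
        elif active and not want:
            events.append(("stop", now))
        active = want
        if active:
            instrumented_ms += 1
    return events, instrumented_ms


if __name__ == "__main__":
    events, instrumented_ms = run(Shepherd(profile_ms=2, period_ms=1000), 3000)
    print(events)
    print(instrumented_ms, "ms instrumented out of 3000 ms")
```

In this toy model the application runs uninstrumented except during the short windows the shepherd opens, which is the source of the low aggregate overhead.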
Problems with Basic Implementation: Unbounded profiling periods due to fragment linking; latency degradation due to initial instrumentation; multi-threaded programs.
Temporal Unlinking/Relinking of Fragments (diagram): context switch; dispatch; code cache holding BB1 and BB2 with the link BB2->BB1.
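Why linked fragments can keep execution inside the code cache indefinitely, and how temporarily unlinking them returns control to dispatch, can be shown with a toy model. Everything here is an illustrative assumption: `Fragment`, `CodeCache`, and `run_in_cache` are hypothetical names, and the model is a simplified stand-in rather than DynamoRIO's actual data structures or API.

```python
class Fragment:
    """A cached basic block with a single successor (toy model)."""

    def __init__(self, name, successor):
        self.name = name
        self.successor = successor  # name of the block executed next
        self.linked = True          # True: jump straight to the successor


class CodeCache:
    def __init__(self):
        self.fragments = {}

    def add(self, name, successor):
        self.fragments[name] = Fragment(name, successor)

    def unlink_all(self):
        # Temporarily route every fragment exit back to dispatch.
        for frag in self.fragments.values():
            frag.linked = False

    def relink_all(self):
        # Restore direct fragment-to-fragment jumps.
        for frag in self.fragments.values():
            frag.linked = True


def run_in_cache(cache, start, max_blocks):
    """Execute blocks until control exits to dispatch or the budget runs out.

    Returns (blocks_executed, where_control_ended_up).
    """
    executed = 0
    current = start
    while executed < max_blocks:
        frag = cache.fragments[current]
        executed += 1
        if not frag.linked:
            return executed, "dispatch"
        current = frag.successor
    return executed, "code cache"


if __name__ == "__main__":
    cache = CodeCache()
    cache.add("BB1", "BB2")
    cache.add("BB2", "BB1")  # the BB2->BB1 link closes a loop
    print(run_in_cache(cache, "BB1", 1000))  # linked: never leaves the cache
    cache.unlink_all()
    print(run_in_cache(cache, "BB1", 1000))  # unlinked: exits after one block
    cache.relink_all()
```

With the loop linked, the model never reaches dispatch within its budget, so a stop request would go unnoticed; after unlinking, the first block exit reaches dispatch, where the engine would get a chance to end the profiling period.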
S/W Code Cache Pre-population: Latency degradation remains for the initial instrumentation phases. (Diagram: application; shepherding thread; instrumentation engine with dispatch, client, and code cache; operating system; hardware.)
Multithreaded Program Support: Sampling makes it possible to miss thread operations. Forces Instant Profiling's signal handler for every thread. Enumerates all threads and sends a profiling start signal to each thread.
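The enumerate-and-notify step can be sketched as follows. This is a loose analogy under stated assumptions, not the mechanism used by Instant Profiling: a per-thread `threading.Event` stands in for a per-thread signal handler, and `ProfiledThread` and `start_profiling_all` are hypothetical names. A native implementation would deliver real signals to each thread instead.

```python
import threading


class ProfiledThread(threading.Thread):
    """Worker carrying its own 'profiling requested' flag.

    The flag is a stand-in for a per-thread signal handler.
    """

    def __init__(self, release_event):
        super().__init__(daemon=True)
        self.profiling = threading.Event()
        self.release_event = release_event

    def run(self):
        # Stand-in for application work: block until released.
        self.release_event.wait()


def start_profiling_all():
    """Enumerate live threads and deliver a start request to each worker.

    Returns the names of the threads that were notified.
    """
    notified = []
    for thread in threading.enumerate():
        if isinstance(thread, ProfiledThread):
            thread.profiling.set()
            notified.append(thread.name)
    return notified


if __name__ == "__main__":
    release = threading.Event()
    workers = [ProfiledThread(release) for _ in range(3)]
    for worker in workers:
        worker.start()
    print("notified:", start_profiling_all())
    release.set()
    for worker in workers:
        worker.join()
```

Enumerating at the moment profiling starts means threads created while sampling was off are still reached, which is the concern the slide raises about missed thread operations.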
Experimental Setup: 6-core Intel Xeon 2.67GHz w/ 12MB L3; 12GB main memory; Linux kernel 2.6.32; gcc 4.4.3 w/ -O3; benchmarks: SPEC INT2006, BigTable, Web search; edge profiling client.
Naïve Edge Profiling (bar chart): slowdown, on a 0 to 50 axis, for 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 462.libquantum, 464.h264ref, 473.astar, bigtable, web search, and a.mean.
Profiling Overhead (bar chart): normalized execution time, on a 0.90 to 1.30 axis, for 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 462.libquantum, 464.h264ref, 473.astar, bigtable, web search, and a.mean, under the sampling configurations 2ms/4s, 1ms/1s, 2ms/1s, 4ms/1s, and 2ms/250ms.
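The configuration labels can be turned into duty cycles with simple arithmetic. The sketch below assumes each label reads as "profiling duration / sampling interval" (for example, 2 ms of profiling every 4 s); the slide itself does not spell this out, and `to_ms` and `duty_cycle` are hypothetical helper names.

```python
def to_ms(text):
    """Convert a label such as '2ms' or '4s' to milliseconds."""
    text = text.strip()
    if text.endswith("ms"):
        return float(text[:-2])
    if text.endswith("s"):
        return float(text[:-1]) * 1000.0
    raise ValueError(f"unrecognized time unit in {text!r}")


def duty_cycle(label):
    """Fraction of wall-clock time spent profiling for a 'duration/interval' label."""
    duration, interval = label.split("/")
    return to_ms(duration) / to_ms(interval)


if __name__ == "__main__":
    for label in ["2ms/4s", "1ms/1s", "2ms/1s", "4ms/1s", "2ms/250ms"]:
        print(f"{label}: {100 * duty_cycle(label):.2f}% of time instrumented")
```

Under this reading the five configurations keep the application instrumented for 0.05%, 0.1%, 0.2%, 0.4%, and 0.8% of the time respectively, which is the knob behind the overhead/information tradeoff.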
S/W Code Cache Prepopulation (line chart): cumulative number of samples, on a 0 to 3,500,000 axis, over sampling phases 0 to 9, with and without pre-population.
Profiling Accuracy (bar chart): profiling accuracy, on a 0 to 100 axis, for 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 462.libquantum, 464.h264ref, 473.astar, bigtable, web search, and a.mean, under the sampling configurations 2ms/4s, 1ms/1s, 2ms/1s, 4ms/1s, and 2ms/250ms.
Asymptotic Accuracy (line chart): cumulative accuracy, on a 0 to 100 axis, over sampling phases 0 to 140, for bigtable and web search.
Conclusion: Low-overhead, portable, flexible profiling is needed. Instant Profiling combines sampling and DBI, pre-populates the S/W code cache, offers a tunable tradeoff between overhead and information, and provides eventual profiling accuracy. Less than 5% overhead and more than 80% accuracy for the naïve edge profiling client.
Thank you!