Unified Profiling Infrastructure for Datacenters: Overcoming Challenges with Instrumentation Sampling


"Learn about the need for instant profiling in datacenter applications, the limitations of traditional profiling methods, and how Google-wide profiling offers a solution. Explore goals for a unified profiling infrastructure, the role of instrumentation sampling, and the issues with basic implementation in profiling datacenters."

  • Datacenters
  • Profiling
  • Instrumentation Sampling
  • Google-wide Profiling
  • Infrastructure


Presentation Transcript


  1. Instant Profiling: Instrumentation Sampling for Profiling Datacenter Applications
     Hyoun Kyu Cho (1), Tipp Moseley (2), Richard Hank (2), Derek Bruening (2), Scott Mahlke (1)
     (1) University of Michigan, (2) Google

  2. Datacenter Applications
     • In 2010, US datacenters consumed 70 to 90 billion kWh [Koomey '11]
     • Datacenter application performance is critical
     • Profiling can help
     • http://googleblog.blogspot.com

  3. Traditional Profiling
     • Diagram: source code -> instrumentation build -> instrumented binary; instrumented binary + input data -> training run -> profile data
     • Challenges for datacenters
       • Need to run on live traffic; difficult to isolate
       • Overheads: value profiling 3.8x slowdown [Calder '99]; path profiling 31%, edge profiling 16% [Ball '96]
       • Binary management: many programs, multiple versions

  4. Google-Wide Profiling [Ren et al. '10]
     • Continuous profiling infrastructure for datacenters
     • Negligible overhead: sampling based; aggregated profiling overhead less than 0.01%
     • Limitations: relies heavily on Performance Monitoring Units; limited flexibility and portability

  5. Goals
     • Unified profiling infrastructure for datacenters
     • Flexible types of profile data
     • Portable across heterogeneous datacenters
     • While maintaining low overhead and not burdening binary management
     • Dynamic binary instrumentation; sampling

  6. Instrumentation Sampling
     • Diagram, layers top to bottom: application; system call gateway; operating system; hardware

  7. Instrumentation Sampling
     • Diagram, layers top to bottom: application; DynamoRIO [Bruening '04] instrumentation engine (dispatch, client, context switch, code cache); operating system; hardware

  8. Instrumentation Sampling
     • Diagram, layers top to bottom: application; shepherding thread (start profiling, stop profiling) alongside the instrumentation engine (dispatch, client, code cache); operating system; hardware
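The start/stop control on this slide can be pictured as a duty-cycled loop run by the shepherding thread. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: `ProfilingSwitch`, `shepherd`, and the timing parameters are hypothetical names introduced here, and a real system would redirect the application into the DBI engine rather than toggle a flag.

```python
import threading
import time


class ProfilingSwitch:
    """Hypothetical stand-in for the engine's start/stop profiling controls."""

    def __init__(self):
        self.active = threading.Event()  # set while the application runs instrumented
        self.phases = 0                  # number of profiling phases started so far

    def start_profiling(self):
        self.active.set()
        self.phases += 1

    def stop_profiling(self):
        self.active.clear()


def shepherd(switch, profile_s, period_s, n_phases, sleep=time.sleep):
    """Enable profiling for profile_s out of every period_s, n_phases times.

    `sleep` is injectable so the schedule can be inspected without waiting.
    """
    for _ in range(n_phases):
        switch.start_profiling()
        sleep(profile_s)             # application executes from the code cache
        switch.stop_profiling()
        sleep(period_s - profile_s)  # application executes natively


if __name__ == "__main__":
    switch = ProfilingSwitch()
    thread = threading.Thread(target=shepherd, args=(switch, 0.002, 0.05, 3))
    thread.start()
    thread.join()
    print("phases:", switch.phases, "still profiling:", switch.active.is_set())
```

Keeping the schedule in a separate thread means the application itself needs no timing logic; it only observes whether profiling is currently on.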

  9. Problems with Basic Implementation
     • Unbounded profiling periods due to fragment linking
     • Latency degradation due to initial instrumentation
     • Multi-threaded programs

  10. Temporal Unlinking/Relinking of Fragments
     • Diagram: dispatch and context switch next to a code cache holding fragments BB1 and BB2, with the direct link BB2->BB1
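Why unlinking bounds the profiling period can be shown with a toy simulation. This sketch assumes that linked fragments jump straight to each other without passing through dispatch, and that a stop request can only be honored in dispatch; `run_profiled` and its parameters are hypothetical and do not correspond to any DynamoRIO API.

```python
def run_profiled(successors, entry, stop_at, unlink_on_stop, max_blocks=1000):
    """Return how many blocks execute instrumented before profiling stops.

    successors     -- dict mapping each block to the block that follows it
    stop_at        -- block count at which the stop request arrives
    unlink_on_stop -- whether the stop request also unlinks all fragments
    """
    linked = True
    stop_requested = False
    block = entry
    executed = 0
    while executed < max_blocks:
        # Dispatch: the only place where the engine can honor a stop request.
        if stop_requested:
            linked = True  # relink so the next profiling phase runs at full speed
            return executed
        # Code cache: linked fragments chain without returning to dispatch.
        while executed < max_blocks:
            executed += 1
            if executed == stop_at:      # asynchronous stop request arrives
                stop_requested = True
                if unlink_on_stop:
                    linked = False       # exits now lead back to dispatch
            block = successors[block]
            if not linked:
                break
    return executed


if __name__ == "__main__":
    loop = {"BB1": "BB2", "BB2": "BB1"}  # the BB2->BB1 back edge from the slide
    print("with unlinking:   ", run_profiled(loop, "BB1", 10, True))
    print("without unlinking:", run_profiled(loop, "BB1", 10, False))
```

In the linked loop the stop request is never seen until the simulation's block budget runs out, which is the unbounded profiling period named on the previous slide.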

  11. S/W Code Cache Pre-population
     • Still have latency degradation for initial instrumentation phases
     • Diagram, layers top to bottom: application; shepherding thread alongside the instrumentation engine (dispatch, client, code cache); operating system; hardware

  12. Multithreaded Program Support
     • Sampling makes it possible to miss thread operations
     • Forces Instant Profiling's signal handler for every thread
     • Enumerates all threads and sends profiling start signal to each thread
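The enumerate-and-signal step can be illustrated with a small Python analogue. This is only a sketch: it replaces the OS signal and signal handler with a per-thread `threading.Event`, and `ThreadRegistry`, `worker`, and `start_all_threads` are hypothetical names. It shows the bookkeeping (every thread registers, then the start notification goes to each one), not the real signal delivery.

```python
import threading


class ThreadRegistry:
    """Tracks every thread so that a profiling start can reach all of them."""

    def __init__(self):
        self._lock = threading.Lock()
        self._flags = {}  # thread ident -> Event acting as that thread's "signal"

    def register(self):
        """Called by each thread; analogous to installing the signal handler."""
        flag = threading.Event()
        with self._lock:
            self._flags[threading.get_ident()] = flag
        return flag

    def broadcast_start(self):
        """Enumerate all registered threads and notify each one."""
        with self._lock:
            flags = list(self._flags.values())
        for flag in flags:
            flag.set()
        return len(flags)


def worker(registry, ready, started):
    flag = registry.register()
    ready.wait()            # barrier: every thread is registered past this point
    flag.wait(timeout=5)    # block until the start notification arrives
    if flag.is_set():
        started.append(threading.get_ident())


def start_all_threads(n_threads):
    """Return (threads notified, threads that observed the notification)."""
    registry = ThreadRegistry()
    ready = threading.Barrier(n_threads + 1)
    started = []
    threads = [threading.Thread(target=worker, args=(registry, ready, started))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    ready.wait()
    notified = registry.broadcast_start()
    for t in threads:
        t.join()
    return notified, len(started)


if __name__ == "__main__":
    print(start_all_threads(4))
```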

  13. Experimental Setup
     • 6-core Intel Xeon 2.67GHz w/ 12MB L3
     • 12GB main memory
     • Linux kernel 2.6.32
     • gcc 4.4.3 w/ -O3
     • SPEC INT2006, BigTable, Web search
     • Edge profiling client
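For readers unfamiliar with the edge profiling client named above, edge profiling counts how often control transfers from one basic block to another. The sketch below is a generic illustration in Python rather than the client used in the experiments; `edge_profile` and the block names are hypothetical.

```python
from collections import Counter


def edge_profile(block_trace):
    """Count each (source, destination) pair of consecutively executed blocks."""
    return Counter(zip(block_trace, block_trace[1:]))


if __name__ == "__main__":
    trace = ["BB1", "BB2", "BB1", "BB2", "BB3"]
    for (src, dst), count in sorted(edge_profile(trace).items()):
        print(f"{src} -> {dst}: {count}")
```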

  14. Naïve Edge Profiling
     • Bar chart: slowdown (y-axis, 0 to 50) per benchmark for 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 462.libquantum, 464.h264ref, 473.astar, bigtable, web search, and a.mean

  15. Profiling Overhead
     • Bar chart: normalized execution time (y-axis, 0.90 to 1.30) per benchmark for 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 462.libquantum, 464.h264ref, 473.astar, bigtable, web search, and a.mean
     • Series: 2ms/4s, 1ms/1s, 2ms/1s, 4ms/1s, 2ms/250ms
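The configuration labels can be turned into duty cycles with simple arithmetic. This sketch assumes that a label such as 2ms/4s means profiling for 2 ms out of every 4 s, which the transcript does not state explicitly. The `normalized_time` model and the 10x slowdown passed to it are illustrative assumptions, not measured values; the model ignores the cost of building instrumented code and of switching in and out of the code cache.

```python
CONFIGS = {
    "2ms/4s": (0.002, 4.0),
    "1ms/1s": (0.001, 1.0),
    "2ms/1s": (0.002, 1.0),
    "4ms/1s": (0.004, 1.0),
    "2ms/250ms": (0.002, 0.250),
}


def duty_cycle(profile_s, period_s):
    """Fraction of wall-clock time spent profiling."""
    return profile_s / period_s


def normalized_time(duty, slowdown):
    """Execution time relative to native under a simple steady-state model.

    During the profiled fraction `duty` of wall-clock time the program
    progresses at 1/slowdown of native speed, so the average progress rate is
    (1 - duty) + duty / slowdown and the time needed is its reciprocal.
    """
    return 1.0 / ((1.0 - duty) + duty / slowdown)


if __name__ == "__main__":
    assumed_slowdown = 10.0  # hypothetical instrumented-code slowdown
    for label, (profile_s, period_s) in CONFIGS.items():
        d = duty_cycle(profile_s, period_s)
        print(f"{label:>10}: duty cycle {d:.4%}, "
              f"modeled time {normalized_time(d, assumed_slowdown):.5f}")
```

Under this model, lengthening the profiling window or shortening the period raises both the amount of profile data collected and the modeled cost, which is the tradeoff the five configurations explore.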

  16. S/W Code Cache Prepopulation
     • Line chart: cumulative number of samples (y-axis, 0 to 3,500,000) over sampling phases (x-axis, 0 to 9)
     • Series: w/ pre-population, w/o pre-population

  17. Profiling Accuracy
     • Bar chart: profiling accuracy (y-axis, 0 to 100) per benchmark for 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 462.libquantum, 464.h264ref, 473.astar, bigtable, web search, and a.mean
     • Series: 2ms/4s, 1ms/1s, 2ms/1s, 4ms/1s, 2ms/250ms

  18. Asymptotic Accuracy
     • Line chart: cumulative accuracy (y-axis, 0 to 100) over sampling phases (x-axis, 0 to 140)
     • Series: bigtable, web search
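How accuracy can grow as sampling phases accumulate is sketched below. The transcript does not say which accuracy metric the authors used; this example assumes overlap percentage (the sum, over all edges, of the smaller of the two normalized frequencies), a metric often used to compare sampled profiles with full ones. The function names and the toy counts are hypothetical.

```python
from collections import Counter


def overlap_percent(reference, sampled):
    """Overlap between two edge profiles, as a percentage from 0 to 100."""
    ref_total = sum(reference.values())
    smp_total = sum(sampled.values())
    if ref_total == 0 or smp_total == 0:
        return 0.0
    return 100.0 * sum(
        min(reference[edge] / ref_total, sampled.get(edge, 0) / smp_total)
        for edge in reference
    )


def cumulative_accuracy(reference, phases):
    """Accuracy after merging the samples of each successive phase."""
    merged = Counter()
    result = []
    for phase in phases:
        merged.update(phase)  # add this phase's edge counts to the running profile
        result.append(overlap_percent(reference, merged))
    return result


if __name__ == "__main__":
    full = {"a": 50, "b": 30, "c": 20}            # toy full edge profile
    phases = [{"a": 4}, {"b": 3, "c": 1}]         # toy per-phase samples
    print(cumulative_accuracy(full, phases))
```

Each phase sees only a slice of the program's behavior, so a single phase may score poorly while the merged profile approaches the full one.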

  19. Conclusion
     • Low-overhead, portable, flexible profiling is needed
     • Instant Profiling
       • Combines sampling and DBI
       • Pre-populates S/W code cache
       • Tunable tradeoff between overhead and information
       • Provides eventual profiling accuracy
     • Less than 5% overhead, more than 80% accuracy for naïve edge profiling client

  20. Thank you!
