Practical Runtime Support for Expressing Parallelism

Michael Bond, Milind Kulkarni, Man Cao, Minjia Zhang

Parallel programs need practical runtime support that helps express parallelism better, eliminates concurrency errors, diagnoses production bugs, and deals with nondeterminism. The presentation covers atomicity checking, data race detection, record and replay, transactional memory, DRF/SC enforcement, and deterministic execution.

  • Parallelism
  • Runtime Support
  • Concurrency
  • Determinism
  • Programming


Presentation Transcript


  1. Michael Bond, Milind Kulkarni, Man Cao, Minjia Zhang, Meisam Fathi Salmi, Swarnendu Biswas, Aritra Sengupta, Jipeng Huang. Purdue; Ohio State.

  2. Parallel programming is mainstream: shared memory with locks. Challenge: performance & correctness.

  3. Need practical runtime support: help express parallelism better; eliminate concurrency errors; diagnose production bugs; deal with nondeterminism.

  4. Need practical runtime support: atomicity checking; data race detection; record & replay; transactional memory; DRF/SC enforcement; deterministic execution.

  5. Need practical runtime support: atomicity checking; data race detection; record & replay; transactional memory; DRF/SC enforcement; deterministic execution. These either track dependences or control dependences at each write (o.f = ...) and each read (... = o.f).

  6. Need practical runtime support: atomicity checking; data race detection; record & replay; transactional memory; DRF/SC enforcement; deterministic execution. These either track dependences or control dependences at each write (o.f = ...) and each read (... = o.f).

  7. Need practical runtime support: atomicity checking; data race detection; record & replay; transactional memory; DRF/SC enforcement; deterministic execution. These either track dependences or control dependences at each write (o.f = ...) and each read (... = o.f). Commodity (software-only) approaches slow programs by several times.

  8. Need practical runtime support: atomicity checking; data race detection; record & replay; transactional memory; DRF/SC enforcement; deterministic execution. These either track dependences or control dependences by inserting a check before each write (check; o.f = ...) and each read (check; ... = o.f). Commodity (software-only) approaches slow programs by several times.

  9. Need practical runtime support: atomicity checking; data race detection; record & replay; transactional memory; DRF/SC enforcement; deterministic execution. These either track dependences or control dependences by inserting a check before each write (check; o.f = ...) and each read (check; ... = o.f). Any access could race, so these approaches add synchronization at every access.
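The "synchronization at every access" approach from slide 9 can be sketched roughly as follows. This is a minimal illustration, not code from the presentation: the class PessimisticField, its lastAccessor metadata, and the syncOps counter are hypothetical names, and a Java monitor stands in for whatever synchronization a real system would use.

```java
/**
 * Hypothetical illustration of pessimistic tracking: the analysis "check"
 * and the field access run atomically under a lock on every access,
 * whether or not another thread ever touches the field.
 */
final class PessimisticField {
    private int value;             // stands in for o.f
    private int lastAccessor = -1; // per-object metadata kept by the analysis
    private long syncOps = 0;      // number of lock acquisitions paid so far

    // check + write, made atomic by the monitor
    synchronized void write(int thread, int v) {
        syncOps++;
        lastAccessor = thread;
        value = v;
    }

    // check + read, made atomic by the monitor
    synchronized int read(int thread) {
        syncOps++;
        lastAccessor = thread;
        return value;
    }

    synchronized long syncOps() { return syncOps; }

    synchronized int lastAccessor() { return lastAccessor; }
}
```

Even when a single thread performs every access, each read and write still pays for one lock acquisition, which is the cost the slides attribute to commodity software-only approaches.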

  10. Octet. Framework for runtime support: happens-before (HB) edges for all dependences; atomicity of analysis & access. Concurrency control mechanism: synchronization only at cross-thread dependences; qualitative performance improvement.

  11. Octet. Framework for runtime support: happens-before (HB) edges for all dependences; atomicity of analysis & access. Proofs! Concurrency control mechanism: synchronization only at cross-thread dependences; qualitative performance improvement.

  12. Octet tracks ownership. Each object's state is one of { WrEx_T, RdEx_T, RdSh_c }.
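One way to picture the three per-object states on slide 12 is the small value class below. It is a sketch under assumptions: the names OctetState, Kind, allowsRead, and allowsWrite are hypothetical, thread ids are assumed to be non-negative integers, and c is modeled as a plain counter because the slides do not say how it is used.

```java
/** Hypothetical model of the per-object ownership states { WrEx_T, RdEx_T, RdSh_c }. */
final class OctetState {
    enum Kind { WR_EX, RD_EX, RD_SH }

    final Kind kind;
    final int owner;    // owning thread T for WR_EX / RD_EX; -1 for RD_SH
    final long counter; // the "c" of RdSh_c; 0 for the exclusive states

    private OctetState(Kind kind, int owner, long counter) {
        this.kind = kind;
        this.owner = owner;
        this.counter = counter;
    }

    static OctetState wrEx(int thread) { return new OctetState(Kind.WR_EX, thread, 0); }
    static OctetState rdEx(int thread) { return new OctetState(Kind.RD_EX, thread, 0); }
    static OctetState rdSh(long c)     { return new OctetState(Kind.RD_SH, -1, c); }

    // A write needs no state change only if this thread holds the object write-exclusive.
    boolean allowsWrite(int thread) {
        return kind == Kind.WR_EX && owner == thread;
    }

    // A read needs no state change if the object is read-shared or owned by this thread.
    // Assumption: any use of the counter c on reads is ignored in this sketch.
    boolean allowsRead(int thread) {
        return kind == Kind.RD_SH || owner == thread;
    }
}
```

The point of the design is that the common case, an access by the current owner, reduces to a comparison against the state with no synchronization.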

  13. o's state = WrEx_T1. Timeline (T1, T2). T1: write check, wr o.f.

  14. o's state = WrEx_T1. Timeline (T1, T2). T1: write check, wr o.f. T2: read check.

  15. o's state = WrEx_T1. Timeline (T1, T2). T1: write check, wr o.f. T2: read check.

  16. o's state = WrEx_T1. Timeline (T1, T2). T1: write check, wr o.f. T2: read check. T1: safe point.

  17. o's state = WrEx_T1. Timeline (T1, T2). T1: write check, wr o.f. T2: read check. T1: safe point. Annotation: implicit safe point.

  18. o's state = WrEx_T1. Timeline (T1, T2). T1: write check, wr o.f. T2: read check. T1: safe point.

  19. o's state = RdEx_T2. Timeline (T1, T2). T1: write check, wr o.f. T2: read check. T1: safe point.

  20. o's state = RdEx_T2. Timeline (T1, T2). T1: write check, wr o.f. T2: read check. T1: safe point. T2: rd o.f.

  21. o's state = RdEx_T2. Timeline (T1, T2, T3, T4). T1: write check, wr o.f. T2: read check. T1: safe point. T2: rd o.f.

  22. o's state = RdEx_T2. Timeline (T1, T2, T3, T4). T1: write check, wr o.f. T2: read check. T1: safe point. T2: rd o.f. T3: read check.

  23. o's state = RdSh_c. Timeline (T1, T2, T3, T4). T1: write check, wr o.f. T2: read check. T1: safe point. T2: rd o.f. T3: read check, rd o.f.

  24. o's state = RdSh_c. Timeline (T1, T2, T3, T4). T1: write check, wr o.f. T2: read check. T1: safe point. T2: rd o.f. T3: read check, rd o.f. T4: read check.

  25. o's state = RdSh_c. Timeline (T1, T2, T3, T4). T1: write check, wr o.f. T2: read check. T1: safe point. T2: rd o.f. T3: read check, rd o.f. T4: read check, rd o.f.

  26. o's state = RdSh_c. Same timeline as slide 25, annotated with related work: sharing detection [von Praun & Gross 01]; distributed shared memory: Shasta [Scales et al. 96]; biased locking [Kawachiya et al. 02] [Russell & Detlefs 06] [Hindman & Grossman 06]. Comparison in our paper.
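The ownership walk-through in slides 13 to 25 can be replayed with a small single-threaded model. This is a sketch under stated assumptions, not the Jikes RVM implementation: OctetSim and all of its members are hypothetical names, coordination with the old owner at its safe point is only counted rather than performed, and the sketch assumes that an object starts write-exclusive to its allocating thread and that a read of an object held read-exclusive by another thread moves it to read-shared without coordination.

```java
import java.util.ArrayList;
import java.util.List;

/** Single-threaded model of one object's ownership state; coordination is counted, not performed. */
final class OctetSim {
    enum Kind { WR_EX, RD_EX, RD_SH }

    private Kind kind = Kind.WR_EX;
    private int owner;            // meaningful for WR_EX and RD_EX only
    private long counter;         // the "c" of RdSh_c
    private long nextCounter = 1; // hypothetical source of c values
    int coordinations = 0;        // transitions that had to wait for a safe point
    final List<String> log = new ArrayList<>();

    OctetSim(int allocatingThread) { owner = allocatingThread; }

    /** Read check followed by the read itself. */
    void read(int t) {
        if (kind == Kind.RD_SH || owner == t) {
            log.add("T" + t + " rd: state unchanged");
        } else if (kind == Kind.WR_EX) {
            coordinations++; // old owner must reach a safe point first
            log.add("T" + t + " rd: coordinate with T" + owner);
            kind = Kind.RD_EX;
            owner = t;
        } else { // RD_EX held by another thread: assumed to need no coordination
            log.add("T" + t + " rd: upgrade to read-shared");
            kind = Kind.RD_SH;
            owner = -1;
            counter = nextCounter++;
        }
    }

    /** Write check followed by the write itself. */
    void write(int t) {
        if (kind == Kind.WR_EX && owner == t) {
            log.add("T" + t + " wr: state unchanged");
            return;
        }
        if (!(kind == Kind.RD_EX && owner == t)) {
            coordinations++; // another owner, or all readers, must reach a safe point
        }
        log.add("T" + t + " wr: take write-exclusive ownership");
        kind = Kind.WR_EX;
        owner = t;
    }

    String state() {
        switch (kind) {
            case WR_EX: return "WrEx_T" + owner;
            case RD_EX: return "RdEx_T" + owner;
            default:    return "RdSh_" + counter;
        }
    }
}
```

Replaying the slides as new OctetSim(1), write(1), read(2), read(3), read(4) takes the state from WrEx_T1 to RdEx_T2 to RdSh_1, and only the first read by T2 increments coordinations.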

  27. Practical runtime support: atomicity checking; data race detection; record & replay; transactional memory; DRF/SC enforcement; deterministic execution. These track dependences or control dependences. Octet serves as a framework for runtime support and as a concurrency control mechanism.

  28. Dependence recorder: records happens-before edges. Timeline (T1, T2, T3, T4). T1: write check, wr o.f. T2: read check. T1: safe point. T2: rd o.f. T3: read check, rd o.f. T4: read check, rd o.f.
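A dependence recorder of the kind named on slide 28 could log one edge per ownership transfer, roughly as sketched below. Everything here is an assumption made for illustration: the names DependenceRecorder, Edge, tick, and onOwnershipTransfer are hypothetical, and identifying a program point by a per-thread event count is a stand-in for whatever the real recorder stores.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical recorder: logs a happens-before edge whenever ownership moves between threads. */
final class DependenceRecorder {
    /** Edge from a point in the source thread to a point in the sink thread. */
    static final class Edge {
        final int source, sink;
        final long sourceEvent, sinkEvent;

        Edge(int source, long sourceEvent, int sink, long sinkEvent) {
            this.source = source;
            this.sourceEvent = sourceEvent;
            this.sink = sink;
            this.sinkEvent = sinkEvent;
        }

        @Override
        public String toString() {
            return "T" + source + "@" + sourceEvent + " -> T" + sink + "@" + sinkEvent;
        }
    }

    private final long[] events; // per-thread count of executed events (assumed notion of time)
    private final List<Edge> edges = new ArrayList<>();

    DependenceRecorder(int threadCount) { events = new long[threadCount]; }

    /** Advance one thread's event count, e.g. at each instrumented access. */
    synchronized void tick(int thread) { events[thread]++; }

    /** Called when a state change hands the object from one thread to another. */
    synchronized void onOwnershipTransfer(int from, int to) {
        edges.add(new Edge(from, events[from], to, events[to]));
    }

    /** Snapshot of the edges recorded so far. */
    synchronized List<Edge> edges() { return new ArrayList<>(edges); }
}
```

Because edges are recorded only when ownership changes hands, accesses by the current owner add nothing to the log.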

  29. Implementation in Jikes RVM. Publicly available: http://jikesrvm.org/Research+Archive. Parallel programs: DaCapo Benchmarks 2006 & 2009; SPEC JBB 2000 & 2005. Parallel platform: 32 cores (AMD Opteron 6272).

  30. [Chart: overhead (%) of pessimistic tracking; y-axis 0 to 1000, with off-scale values labeled 3,000% and 34,600%.]

  31. [Chart: overhead (%); y-axis 0 to 120; series: Octet w/o coord.]

  32. [Chart: overhead (%); y-axis 0 to 120; series: Octet, Octet w/o coord.]

  33. [Chart: overhead (%); y-axis 0 to 120; series: Recorder, Octet, Octet w/o coord.]

  34. Octet helps enable practical runtime support for reliable, scalable concurrency. Framework for runtime support: happens-before (HB) edges for all dependences; atomicity of analysis & access. Concurrency control mechanism: synchronization only at cross-thread dependences; qualitative performance improvement.
