
Practical Runtime Support for Expressing Parallelism
Explore the need for practical runtime support to enhance parallelism expression, eliminate concurrency errors, diagnose production bugs, and address nondeterminism. Learn about atomicity checking, data race detection, record and replay, transactional memory, DRF/SC enforcement, and deterministic execution in parallel programming.
Presentation Transcript
Michael Bond, Milind Kulkarni, Man Cao, Minjia Zhang, Meisam Fathi Salmi, Swarnendu Biswas, Aritra Sengupta, Jipeng Huang. Affiliations: Purdue, Ohio State
Parallel programming is mainstream: shared memory with locks. Challenge: performance and correctness.
Need practical runtime support to help express parallelism better, eliminate concurrency errors, diagnose production bugs, and deal with nondeterminism.
Need practical runtime support for atomicity checking, data race detection, record & replay, transactional memory, DRF/SC enforcement, and deterministic execution. Each of these must track or control dependences between threads' accesses to shared fields, such as a write o.f = ... in one thread and a read ... = o.f in another, so the runtime inserts a check before every such access.
Because any access could race, commodity (software-only) approaches add synchronization at every access and slow programs by several times.
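Pessimistic instrumentation, where any access could race and therefore every access synchronizes, can be sketched in Java as follows. This is a minimal illustration under stated assumptions: a hypothetical per-object lock guards both the analysis metadata and the field, and the class and field names (PessimisticField, lastWriter, lockAcquisitions) are invented for this sketch, not taken from Octet or Jikes RVM.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical pessimistic instrumentation for one field o.f.
// A per-object lock makes each analysis check atomic with its access,
// so every access synchronizes, whether or not another thread is involved.
final class PessimisticField {
    private final ReentrantLock lock = new ReentrantLock();
    private int value;               // stands in for o.f
    private long lastWriter = -1;    // id of the last writing thread; -1 means none yet
    private long lockAcquisitions;   // synchronization operations performed by accesses
    private long crossThreadDeps;    // dependences on a write by a different thread

    // Instrumented "o.f = v" executed by thread threadId.
    void write(long threadId, int v) {
        lock.lock();
        try {
            lockAcquisitions++;
            if (lastWriter != -1 && lastWriter != threadId) {
                crossThreadDeps++;   // check: write-after-write across threads
            }
            lastWriter = threadId;
            value = v;               // the program's own access
        } finally {
            lock.unlock();
        }
    }

    // Instrumented "... = o.f" executed by thread threadId.
    int read(long threadId) {
        lock.lock();
        try {
            lockAcquisitions++;
            if (lastWriter != -1 && lastWriter != threadId) {
                crossThreadDeps++;   // check: read-after-write across threads
            }
            return value;            // the program's own access
        } finally {
            lock.unlock();
        }
    }

    long lockAcquisitions() {
        lock.lock();
        try { return lockAcquisitions; } finally { lock.unlock(); }
    }

    long crossThreadDeps() {
        lock.lock();
        try { return crossThreadDeps; } finally { lock.unlock(); }
    }
}
```

A single thread that performs 1,000 writes and 1,000 reads still pays 2,000 lock acquisitions even though crossThreadDeps stays 0. That per-access cost, paid even when no dependence crosses threads, is the overhead the slides attribute to commodity approaches.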
Octet. Framework for runtime support: happens-before (HB) edges for all dependences, and atomicity of each analysis check with its access (proven in the paper). Concurrency control mechanism: synchronization scales with cross-thread dependences rather than with every access, a qualitative performance improvement.
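Octet tracks ownership. Each object's state is one of { WrEx_T, RdEx_T, RdSh_c }: write-exclusive for thread T, read-exclusive for thread T, or read-shared with counter c.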
Example (time flows downward; a check precedes every access to o.f):
1. o's state = WrEx_T1. T1's write check succeeds and T1 writes o.f (wr o.f).
2. T2's read check finds o write-exclusive to T1, so T2 must coordinate with T1, which responds at a safe point; the safe point may be implicit (for example, while T1 is blocked).
3. o's state = RdEx_T2. T2 reads o.f (rd o.f).
4. Threads T3 and T4 now also read o. T3's read check changes o's state to RdSh_c, and T3 reads o.f.
5. T4's read check finds o already RdSh_c, and T4 reads o.f.
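The T1 to T4 example can be modeled as a small state machine. The single-threaded Java sketch below applies the transitions shown in the example (WrEx_T1 to RdEx_T2 on T2's read, RdEx_T2 to RdSh_c on T3's read, no change on T4's read) and labels how each check is satisfied. The class name, the outcome labels FAST_PATH, UPGRADE, and COORDINATION, the global counter, and the assumption that a new object starts write-exclusive to its allocating thread all belong to this sketch; they loosely follow our reading of the Octet paper's transition categories and are not Octet's implementation. Real Octet performs these transitions with atomic operations and safe-point coordination inside the VM, and it also performs a per-thread check on RdSh_c reads that this sketch omits.

```java
// Hypothetical, single-threaded model of Octet-style per-object ownership states.
// It only classifies and applies transitions; no real synchronization happens here.
final class OctetObjectModel {
    enum Kind { WR_EX, RD_EX, RD_SH }

    // How a check was satisfied (labels are this sketch's own).
    enum Outcome { FAST_PATH, UPGRADE, COORDINATION }

    private static int globalRdShCounter = 0; // source of c for RdSh_c (assumed global)

    private Kind kind;
    private int owner;   // thread id, meaningful for WR_EX and RD_EX
    private int counter; // c, meaningful for RD_SH

    // Assumption: a new object starts write-exclusive to its allocating thread.
    OctetObjectModel(int allocatingThread) {
        this.kind = Kind.WR_EX;
        this.owner = allocatingThread;
    }

    // Check performed before "o.f = ..." by thread t.
    Outcome writeCheck(int t) {
        if (kind == Kind.WR_EX && owner == t) {
            return Outcome.FAST_PATH;            // already owned: no synchronization
        }
        Outcome outcome = (kind == Kind.RD_EX && owner == t)
                ? Outcome.UPGRADE                // RdEx_T -> WrEx_T by the same thread
                : Outcome.COORDINATION;          // other threads must respond first
        kind = Kind.WR_EX;
        owner = t;
        return outcome;
    }

    // Check performed before "... = o.f" by thread t.
    Outcome readCheck(int t) {
        switch (kind) {
            case WR_EX:
            case RD_EX:
                if (owner == t) {
                    return Outcome.FAST_PATH;    // owner may read without synchronization
                }
                if (kind == Kind.RD_EX) {        // RdEx_T2 -> RdSh_c on another thread's read
                    kind = Kind.RD_SH;
                    counter = ++globalRdShCounter;
                    return Outcome.UPGRADE;
                }
                kind = Kind.RD_EX;               // WrEx_T1 -> RdEx_T2 needs T1's response
                owner = t;
                return Outcome.COORDINATION;
            default:
                return Outcome.FAST_PATH;        // RdSh_c: readable by any thread here
        }
    }

    String state() {
        switch (kind) {
            case WR_EX: return "WrEx_T" + owner;
            case RD_EX: return "RdEx_T" + owner;
            default:    return "RdSh_" + counter;
        }
    }
}
```

Replaying the example, new OctetObjectModel(1) followed by writeCheck(1), readCheck(2), readCheck(3), and readCheck(4) yields FAST_PATH, COORDINATION, UPGRADE, and FAST_PATH, ending in a RdSh state. Only T2's read needs a round trip to another thread; repeated accesses in an unchanged state take the fast path.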
Related approaches (comparison in our paper): sharing detection [von Praun & Gross 01]; distributed shared memory, e.g., Shasta [Scales et al. 96]; biased locking [Kawachiya et al. 02] [Russell & Detlefs 06] [Hindman & Grossman 06].
Practical runtime support (atomicity checking, data race detection, record & replay, transactional memory, DRF/SC enforcement, deterministic execution) must track or control dependences. Octet serves both as a framework for runtime support and as a concurrency control mechanism.
Dependence recorder: records happens-before edges. (Figure: the same T1 to T4 example of write/read checks, the safe point, and accesses to o.f.)
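To show what a dependence recorder logs, here is a minimal Java sketch that records a happens-before edge for every cross-thread write-to-read, write-to-write, and read-to-write dependence on a shared object, using a per-thread logical clock to name each access. The class, record names, and clock scheme are hypothetical. The slides present the recorder as built on Octet's ownership transitions; this sketch instead inspects every access directly, so it illustrates what is recorded, not how Octet keeps recording cheap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical dependence recorder: logs a happens-before edge for every
// cross-thread dependence on a shared object (write->read, write->write, read->write).
final class DependenceRecorder {
    // The time-th access performed by a thread (per-thread logical clock).
    record Event(int thread, long time) { }

    // An edge a replayer would need to enforce: "from" happens before "to".
    record Edge(Event from, Event to) { }

    private final Map<Integer, Long> clocks = new HashMap<>();
    private final Map<Object, Event> lastWrite = new IdentityHashMap<>();
    private final Map<Object, List<Event>> readsSinceWrite = new IdentityHashMap<>();
    private final List<Edge> edges = new ArrayList<>();

    private Event tick(int thread) {
        long now = clocks.merge(thread, 1L, Long::sum); // advance this thread's clock
        return new Event(thread, now);
    }

    void read(int thread, Object o) {
        Event e = tick(thread);
        Event w = lastWrite.get(o);
        if (w != null && w.thread() != thread) {
            edges.add(new Edge(w, e));                   // write -> read
        }
        readsSinceWrite.computeIfAbsent(o, k -> new ArrayList<>()).add(e);
    }

    void write(int thread, Object o) {
        Event e = tick(thread);
        Event w = lastWrite.get(o);
        if (w != null && w.thread() != thread) {
            edges.add(new Edge(w, e));                   // write -> write
        }
        for (Event r : readsSinceWrite.getOrDefault(o, List.of())) {
            if (r.thread() != thread) {
                edges.add(new Edge(r, e));               // read -> write
            }
        }
        lastWrite.put(o, e);
        readsSinceWrite.remove(o);
    }

    List<Edge> edges() {
        return List.copyOf(edges);
    }
}
```

For the T1 to T4 example, a write by thread 1 followed by reads in threads 2, 3, and 4 produces three write-to-read edges, the first from Event(1, 1) to Event(2, 1); a second write by thread 1 then adds three read-to-write edges. IdentityHashMap is used so that objects are tracked by identity, as a runtime would, even if a class overrides equals.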
Implementation in Jikes RVM, publicly available at http://jikesrvm.org/Research+Archive. Parallel programs: DaCapo Benchmarks 2006 and 2009; SPEC JBB 2000 and 2005. Parallel platform: 32 cores (AMD Opteron 6272).
(Figure: run-time overhead (%) of the Pessimistic approach on a 0 to 1000% scale, with two off-scale bars labeled 34,600% and 3,000%.)
(Figure: run-time overhead (%) of Octet w/o coord, Octet, and Recorder on a 0 to 120% scale.)
Octet helps enable practical runtime support for reliable, scalable concurrency. As a framework for runtime support, it provides happens-before (HB) edges for all dependences and executes each analysis check atomically with its access. As a concurrency control mechanism, its synchronization scales with cross-thread dependences rather than with every access, a qualitative performance improvement.