Scalable Hybrid Transactional Memory Solutions

reduced hardware norec a safe and scalable hybrid n.w
1 / 27
Embed
Share

Explore innovative solutions for hybrid transactional memory systems, addressing challenges such as failure points, conflicts, and concurrency issues. Learn about lock elision techniques and the benefits and drawbacks of hardware-software concurrency in transactional memory. Discover how hybrid transactional memory bridges the gap between hardware and software for improved performance.

  • Transactional memory
  • Scalable solutions
  • Hardware-software concurrency
  • Lock elision
  • Hybrid memory

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory Alexander Matveev Nir Shavit MIT

  2. Good: Hardware Transactional Memory (HTM) Bad: The HTM is best-effort HTM may always fail due to: 1. L1 cache capacity 2. Interrupt 3. Unsupported instruction

  3. A Possible Solution is: Lock Elision Thread 1 Thread 2 1. HTM Start 1. Lock 1. HTM Start 1. Lock 2. Read lock and check it is free 2. Read lock and check it is free 3. ... code 3. ... code 4. HTM Commit 4. HTM Commit 2. Unlock 2. Unlock No conflict HTMs commit concurrently

  4. A Possible Solution is: Lock Elision Thread 1 Thread 2 Thread 3 1. HTM Start Wait for Lock 1. HTM Start Wait for Lock 1. HTM Start 1. Acquire Lock 2. Read lock and check it is free 2. Read lock and check it is free 2. Read lock and check it is free 2. ... code 3. ... code 4. ... CONFLICT HTM Restart 3. ... code 4. ... CONFLICT HTM Restart 3. ... code 3. ... FAIL HTM Restart 3. Release Lock No concurrency between hardware and software

  5. A Possible Solution is: Lock Elision Good Simple: No need to instrument reads and writes Bad: Serial fallback: A software fallback grabs the global lock and aborts all hardware transactions

  6. Another Approach is: Hybrid Transactional Memory Thread 1 Thread 2 Thread 3 1. HTM Start 1. HTM Start 1. HTM Start 1. STM Start 2. Read lock and check it is free 2. Read lock and check it is free 2. Read lock and check it is free 2. ... code 3. ... code 3. ... code 3. ... code 3. ... FAIL HTM Restart 3. more code 4. ... more code 4. ... more code STM and HTM execute concurrently

  7. Another Approach is: Hybrid Transactional Memory Good Hardware-Software Concurrency Bad: Complex: 1. Hard to coordinate hardware and software Our focus 2. Hard to apply to code due to instrumentation GCC C/C++ TM helps here a lot

  8. Hybrid TM History 2006: First Hybrid TM [DamronFedorovaLevLuchangcoMoirNussbaum] Key Idea: Use per location metadata version- locks to coordinate hardware and software Bad: Hardware is slow: on each read/write must read the version-lock and execute a branch condition check

  9. Hybrid TM History 2007: Phased TM [LevMoirNussbaum] Key Idea: Use HTM mode or STM mode, but not HTM and STM at the same time Bad: Expensive to switch modes: a single fallback must stop all hardware

  10. Hybrid TM History 2011: Hybrid Norec (state-of-the-art) [DalessandroCarougeWhiteLevMoirScottSpear] Key Idea: No metadata + global clock for coordination

  11. Hybrid NOrec Good No metadata: Efficient for low concurrency Bad: Limited Scalability: too much aborts due to global clock updates A software write must abort all hardware A hardware write must abort all software

  12. Hybrid NOrec Fast-Path: Hardware Slow-Path: Software Read clock Read clock Read X (verify clock) Read X (pure) Lock clock ABORT X = 4 Unlock clock Read clock Read clock Read X Update clock Read X: check clock => changed => restart/revalidate RESTART

  13. A Possible Solution 2011: Hybrid NOrec 2 [RiegelMarlierNowackFelberFetzer] Key Idea: Use non-speculative reads inside HTM to verify the global clock and avoid unnecessary aborts Bad: HTM of Intel and IBM has no support for non- speculative reads

  14. A Recent Approach 2014: Invyswell Hybrid [CalciuGottschlichShpeismanPokamHerlihy] Key Idea: Allow unsafe concurrency between hardware and software, and use the HTM sandboxing to detect and handle errors

  15. Invyswell Fast-Path: Hardware Slow-Path: Software Lock clock X = 4 (NEW) NO ABORT Read X (NEW) Read Y (OLD) Func(X, Y): Unsafe Hopes HTM aborts FUTURE Update clock Y = 8 (NEW) Unlock clock

  16. Invyswell Good Much less aborts than Hybrid Norec Bad: Unfortunately, HTM sandboxing may miss errors, so a corrupted transactions may commit and crash the system: This problem was shown in a recent work: Pitfalls of Lazy Subscription by [DiceHarrisKoganLevMoir]

  17. Our New Approach 2015: RH NOrec [MatveevShavit] Key Idea: Use a mixed fallback path, that uses both software and short hardware transactions

  18. RH NOrec Slow-Path: Software Mixed Slow-Path Fast-Path: Hardware Lock clock X = 4 (NEW) X = 4 (HIDDEN) Read X (NEW) Read X (OLD) Read Y (OLD) Read Y (OLD) A Writes are speculative (invisible) HTM Func(X, Y) Func(X, Y): Unsafe Hopes HTM aborts Safe! Update clock Y = 8 (NEW) Y = 8 (HIDDEN) X and Y both OLD or both NEW not a mix Unlock clock

  19. RH NOrec Key Point 1: Execute software writes in a short hardware transaction No need to abort hardware transactions Full safety In practice this works well Due to the 80:20 rule: a typical operation has 80% reads and 20% writes

  20. RH NOrec Key Point 2: Execute a maximal amount of initial software reads in a read-only hardware transaction Allows to defer the global clock read, and significantly reduce the software restarts/revalidations

  21. Fast-Path: Hardware Mixed Path Read clock HTM start reads/writes reads in software (verifies clock) Update clock HTM commit RESTART Read some X: check clock => changed => restart/revalidate

  22. Fast-Path: Hardware Mixed Path HTM start HTM start reads/writes reads in HTM (pure/direct) HTM Prefix Update clock HTM commit Read clock HTM commit NO ABORT NO ABORT

  23. HTM start HTM start reads/writes reads in HTM (pure/direct) HTM Prefix Update clock HTM commit Read clock HTM commit reads in software Lock clock HTM start HTM start writes in HTM HTM Postfix HTM commit reads/writes Unlock clock Update clock NO ABORT NO ABORT HTM commit

  24. Throughput on 8-core Intel (GCC C/C++) 7.00E+08 Red-Black Tree (10K) 10% mutations 6.00E+08 5.00E+08 Lock Elision RH-NORec 4.00E+08 TL2 HY-NORec NORec 3.00E+08 2.00E+08 1.00E+08 0.00E+00 1 2 4 6 8 10 12 14 16 4.50E+08 Red-Black Tree (10K) 40% mutations 4.00E+08 3.50E+08 Lock Elision RH-NORec TL2 HY-NORec 3.00E+08 2.50E+08 2.00E+08 1.50E+08 NORec 1.00E+08 5.00E+07 0.00E+00 1 2 4 6 8 10 12 14 16

  25. 1.40E+06 4.00E+06 Vacation Database (STAMP - Low) Intruder Detection (STAMP) 3.50E+06 1.20E+06 3.00E+06 1.00E+06 2.50E+06 8.00E+05 2.00E+06 6.00E+05 1.50E+06 4.00E+05 1.00E+06 2.00E+05 5.00E+05 0.00E+00 0.00E+00 1 2 4 6 8 10 12 14 16 1 2 4 6 8 10 12 14 16 Lock Elision RH-NORec TL2 Lock Elision RH-NORec TL2 HY-NORec NORec HY-NORec NORec 1.00E+06 4.50E+06 Genome Sequencing (STAMP) SSCA2 (STAMP) 9.00E+05 4.00E+06 8.00E+05 3.50E+06 7.00E+05 3.00E+06 6.00E+05 2.50E+06 5.00E+05 2.00E+06 4.00E+05 1.50E+06 3.00E+05 1.00E+06 2.00E+05 5.00E+05 1.00E+05 0.00E+00 0.00E+00 1 2 4 6 8 10 12 14 16 1 2 4 6 8 10 12 14 16 Lock Elision RH-NORec TL2 Lock Elision RH-NORec TL2 HY-NORec NORec HY-NORec NORec

  26. Conclusion RH Norec: a new Hybrid TM that is safe and scalable Key Idea: Use a mixed fallback path that uses two short hardware transactions: 1. HTM Prefix: Executes a maximal amount of initial reads defers the global clock read 2. HTM Postfix: Executes the software writes preserves safety and allows hardware- software concurrency

  27. Thank You

Related


More Related Content