Understanding Multicore Processing Challenges

mohamed m saad mohamed a mohamedin prof binoy n.w
1 / 29
Embed
Share

Discover the reasons behind the adoption of multicore processors, the evolving landscape of software to support multithreading, and the development of innovative solutions like Transactional Memory and HydraVM.

  • Multicore Processing
  • Software Development
  • Multithreading
  • Transactional Memory
  • HydraVM

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Mohamed. M. Saad Mohamed A. Mohamedin& Prof. Binoy Ravindran VT-MENA Program Electrical & Computer Engineering Department Virginia Polytechnic Institute and State University

  2. Motivation & Objectives Background Transactional Memory Jikes RVM Program Reconstruction Architecture Profiler, Builder & Runtime Future Work

  3. Why Multicores? Difficult to make single-core clock frequencies even higher Deeply pipelined circuits heat problems speed of light problems difficult design and verification large design teams necessary server farms need expensive air- conditioning

  4. No fast CPUs any more, just more cores! Trend is using multi-core & hyper-threading

  5. At 2005, Sun Niagara (8 cores with HT run 32 HWT) At 2010, Supermicro (48-core AMD Opteron). Now, Sun make boxes with between 128-512 hardware threads (16 HWT/core, 8 cores/CPU) !! What About Software!!! Are we ready for this HW ?!

  6. Many applications are designed to use few threads Legacy systems were designed to run at a single processor Multi-threading programming is headache for developers (race situations, concurrent access, ) HydraVM: Java Virtual Machine Prototype based on Jikes RVM and targets utilizing large number of cores through detecting automatically possible parallel portions of code

  7. Transactional Memory Jikes RVM (Adaptive Online Architecture)

  8. Atomicity: An operation (or set of operations) appears to the rest of the system to occur instantaneously Example (Money Transfer): synchronized { from = from - amount to = to + amount } Example (Money Transfer): account1.lock() account2.lock() from = from - amount to = to + amount account1.unlock() account2.unlock() Y account1 account2 X

  9. Drawbacks Deadlock Livelock Starvation Priority Inversion Non-composable Cost of managing the lock Non-scalable on multiprocessors Y A B X

  10. Simplifies parallel programming by allowing a group of load and store instructions to execute in an atomic way using additional primitives account1y account2y Y Example (Money Transfer): START-TRANSACTION from = from - amount to = to + amount END-TRANSACTION account1 account2 account1x account2x X Commit Commit or or Rollback & Retry Rollback & Retry

  11. Each transaction has ReadSet & WriteSet Transactions conflict if have the same variable(s) at ReadSet / WriteSet Conflict Resolution using Contention Manager that employs different policies (Aggressive, Polite, Back-Off, Random, ..) Aborted code undo changes (if required) and retries again

  12. Transactions may be nested (multiple levels) Inner transaction share the ReadSet/WriteSet of parent Inner transactions conflicts with each other and with other higher level transactions Aborting parent transaction forces abort for children Inner transactions changes are visible to parents once commit successfully, but hidden from outside world till commit of highest level

  13. Hardware Transactional Memory Modifications in processors, cache and bus protocol ex; unbounded HTM, TCC, . Software Transactional Memory Software runtime library or the programming language support Minimal hardware support; CAS, LL/SC ex; RSTM, DSTM, ESTM, .. Hybrid Transactional Memory Exploits HTM support to achieve hardware performance for transactions that do not exceed the HTM s limitations, and STM otherwise ex; LogTM, HyTM, Distributed Transactional Memory Extends transaction primitives to distributed environment (network of multiple machines) ex; HyFlow, DecentSTM, GenSTM,

  14. Mature modular open source Java virtual machine designed for research purposes. Unlike most other JVMs it is written in Java! Adaptive Online System

  15. We view program as a set of basic building blocks Each block is a set of instructions Block has single entry and multiple exists Blocks may access the same memory (variables) It is possible to reconstruct the program from these blocks by rearranging it differently with some changes to the control instructions. It is even possible to assign each set of blocks to different thread

  16. 0: iconst_0 1: istore_1 2: iconst_0 3: istore_2 4: goto 29 7: invokestatic #13; 10: ldc2_w #19; 13: dcmpl 14: ifle 23 17: iinc 1, 1 20: goto 26 23: iinc 1, -1 26: iinc 2, 1 29: iload_2 30: bipush 12 32: if_icmplt 7 35: return 35: return 0: iconst_0 1: istore_1 2: iconst_0 3: istore_2 4: goto 29 7: invokestatic #13; 10: ldc2_w #19; 13: dcmpl 14: ifle 23 17: iinc 1, 1 20: goto 26 23: iinc 1, -1 26: iinc 2, 1 29: iload_2 30: bipush 12 32: if_icmplt 7 int counter = 0; for(int i=0; i<2; i++) if(Math.random()>0.3) counter++; else counter--;

  17. public class Test{ public static void foo(){ int counter = 0; for(int i=0; i<12; i++) if(Math.random()>0.3) counter++; else counter--; } public static void zoo(){ System.out.println("hi"); } public static void main(String[] args){ int i=6; if(i<10) foo(); else zoo(); } }

  18. Split code into Basic Block Inject loaded classes with additional instructions to monitor: Program Flow (Which Basic Blocks are accessed and in what order?) Memory accessed by each Basic Block Which Basic Block is doing I/O ?

  19. 0: iconst_0 1: istore_1 2: iconst_0 3: istore_2 write J write C visit B1 4: goto 29 7: invokestatic #13; 10: ldc2_w #19; 13: dcmpl read K write K visit B2 14: ifle 23 17: iinc 1, 1 read C write C visit B3 20: goto 26 23: iinc 1, -1 read C write C visit B4 26: iinc 2, 1 read J write J visit B5 29: iload_2 30: bipush 12 visit B6 read J 32: if_icmplt 7 35: return Example: int C = 0; for(int J=0; J<2; J++) if(Math.random()>0.3) C++; else C--; 0: iconst_0 1: istore_1 2: iconst_0 3: istore_2 4: goto 29 7: invokestatic #13; 10: ldc2_w #19; 13: dcmpl 14: ifle 23 17: iinc 1, 1 20: goto 26 23: iinc 1, -1 26: iinc 2, 1 29: iload_2 30: bipush 12 32: if_icmplt 7 35: return

  20. Recompile the Java class bytecode into machine-code Replace and reload class definition at memory

  21. Running the profiled code Collecting flow & memory access information and store it at the knowledge repository

  22. Analyze knowledge repository information and know: Which Blocks can be grouped together Which groups of blocks can be parallelized

  23. Program can be represented as a string (each character is a basic block). Example: for (Integer i = 0; i < DIMx; i++) { for (Integer j = 0; j < DIMx; j++) { } } for (Integer k = 0; k < DIMy; k++) { C[i][j] += A[i][k] * B[k][j]; } abjbhcfefghcfefghijbhcfefghcfefghijk ab(jb(hcfefg)2hi)2jk

  24. ab(jb(hcfefg)2hi)2k Externalize common blocks patterns as methods Generated methods may be nested Reconstruct the program as producer-consumer pattern Collector Provides Executor with suitable blocks as Tasks to execute according to flow up-to time Executor Allocates core threads Assign tasks to threads Requests Collector for more blocks based on program flow, after all threads complete

  25. Problems Threads may conflict when access the same variables Threads may finish out of normal order Collector may generate invalid tasks Lets represents each Thread as Transaction When two transactions conflicts abort one that has newer blocks relative to normal execution Transaction will not commit unless its preceding one in timeline is finished Transaction timeout if not reachable

  26. Collects which transactions conflicts and commit rate We can refine the constructed program Builder re-organize generated blocks and recompile the code again

  27. Complete the implementation of HydraVM Profiling by monitoring memory instead of generating new instructions Automatically uses of Java NIO to handle I/O operations and generate callbacks to process it Using thread scheduling techniques instead of TM Formal verification of reconstructed programs matches desired semantics

  28. www.hydravm.org msaad@vt.edu

Related


More Related Content