Optimizing Ruby Interpreter with Superinstructions

how to select superinstructions for ruby n.w
1 / 24
Embed
Share

Explore how to select and apply superinstructions for optimizing the Ruby interpreter. Learn about the benefits, effects, and successful implementations of superinstructions in reducing dispatch overhead. Discover why superinstructions may not be ideal for Ruby due to specific characteristics of VM operations and dispatch overhead.

  • Ruby
  • Superinstructions
  • Interpreter Optimization
  • Dispatch Overhead
  • Performance

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. How to select superinstructions for Ruby ZAKIROV Salikh*, CHIBA Shigeru*, and SHIBAYAMA Etsuya** * Tokyo Institute of Technology, dept. of Mathematical and Computing Sciences ** Tokyo University, Information Technology Center

  2. Ruby Dynamic language Becoming popular recently Numeric benchmarks 100 1000 times slower than equivalent program in C Numeric benchmarks marked in red * http://shootout.alioth.debian.org/ 2

  3. Interpreter optimization efforts Many techniques to optimize interpreter were proposed Threaded interpretation Stack top caching Pipelining Superinstructions Superinstructions Merge code of operations executed in sequence Focus of this presentation 3

  4. Superinstructions (contrived example) Optimizations applied PUSH: // put <imm> argument on stack stack[sp++] = *pc++; goto **pc++; PUSH_ADD: // add <imm> to stack top stack[sp-1] += *pc++; goto **pc++; ADD: // add two topmost values on stack sp--; stack[sp-1] += stack[sp]; goto **pc++; PUSH_ADD: // add <imm> to stack top stack[sp++] = *pc++; //goto **pc++; sp--; stack[sp-1] += stack[sp]; goto **pc++; Dispatch eliminated 4

  5. Superinstructions (effects) Effects 1. Reduce dispatch overhead a. Eliminate some jumps b. Provide more context for indirect branch predictorby replicating indirect jump instructions 2. Allow more optimizations within VM op 5

  6. Good for reducing dispatch overhead Superinstructions help when: VM operations are small (~10 hwop/vmop) Dispatch overhead is high (~50%) Examples of successful use in prior research ANSI C interpreter: 2-3 times improvement (Proebsting 1995) Ocaml: more than 50% improvement (Piumarta 1998) Forth: 20-80% improvement (Ertl 2003) 6

  7. Ruby does not fit well Superinstructions help when: VM operations are small (~10 hwop/vmop) Dispatch overhead is high (~50%) BUT Only 1-3% 60-140 hardware ops per VM op misprediction overhead Hardware profiling data on Intel Core 2 Duo on interpreter dispatch 7

  8. Superinstructions for Ruby We experimentally evaluated effect of naive superinstructions on Ruby Superinstructions are selected statically Frequently occurring in training run combinations of length 2 selected as superinstructions Training run uses the same benchmark Superinstructions constructed by concatenating C source code, C compiler optimizations applied 8

  9. Naive superinstructions effect on Ruby Limited benefit 4 benchmarks Normalized execution time Unpredictable effects Number of superinstructions used 9

  10. Branch mispredictions 2 benchmarks: mandelbrot and spectral_norm Normalized execution time Number of superinstructions used 10

  11. Branch mispredictions, reordered 2 benchmarks: mandelbrot and spectral_norm Normalized execution time Number of superinstructions used, reordered by execution time 11

  12. So why Ruby is slow? Profile of numeric benchmarks Garbage collection takes significant time Boxed floating point values dominate allocation 12

  13. Floating point value boxing Typical Ruby 1.9 VM operation OPT_PLUS: VALUE a = *(sp-2); VALUE b = *(sp-1); /* ... */ if (CLASS_OF(a) == Float && CLASS_OF(b) == Float) { sp--; *(sp-1) = NEW_FLOAT(DOUBLE_VALUE(a) + DOUBLE_VALUE(b)); } else { CALL(1/*argnum*/, PLUS, a); } goto **pc++; New box object is allocated on each operation 13

  14. Proposal: use superinstructions for boxing optimization 2 operation per allocation instead of 1 Boxing of intermediate result eliminated OPT_MULT_OPT_PLUS: VALUE a = *(sp-3); VALUE b = *(sp-2); VALUE c = *(sp-1); /* ... */ if (CLASS_OF(a) == Float && CLASS_OF(b) == Float && CLASS_OF(c) == Float) { sp-=2; *(sp-1) = NEW_FLOAT(DOUBLE_VALUE(a) + DOUBLE_VALUE(b)*DOUBLE_VALUE(c)); } else { CALL(1/*argnum*/, MULT/*method*/, b/*receiver*/); CALL(1/*argnum*/, PLUS/*method*/, a/*receiver*/); } goto **pc++; 14

  15. Implementation VM operations that handle floating point values directly: opt_plus opt_minus opt_mult opt_div opt_mod We implemented all 25 combinations of length 2 Based on Ruby 1.9.1 Using existing Ruby infrastructure for superinstructions with some modifications 15

  16. Limitations Coding style-sensitive Not applicable to other types (e.g. Fixnum, Bignum, String) Fixnum is already unboxed Bignum and String cannot be unboxed Sequences of 3 arithmetic instructions or longer virtually non-existent No occurrences in the benchmarks 16

  17. Evaluation Methodology median time of 30 runs Reduction in allocation 17

  18. Results Up to 22% benefit on numeric benchmarks No slowdown on other benchmarks 18

  19. Example: mandelbrot tweak Slight modification produces 20% difference in performance 4 of 9 arithmetic instructions get merged into 2 superinstructions 24% reduction in float allocation Normalized execution time ITER.times do tr = zrzr - zizi + cr tr = cr + (zrzr - zizi) ti = 2.0*zr*zi + ci ti = ci + 2.0*zr*zi - + - + 19

  20. Discussion of alternative approaches Faster GC would improve performance as well Superinstructions still apply, but with reduced benefit Type inference Would allow to specialize expressions and eliminate boxing Interoperability with dynamic code is an issue Dynamic specialization Topic for further research 20

  21. Related work: Tagged values Use lower bits of pointers to trigger alternative handling Embed floating point value into higher bits Limited to 64-bit platforms, as Ruby uses double precision 64 bit floating point arithmetic Our approach has same effect on 32 and 64 bit platforms Allows to eliminate majority of boxed floats Provides 28-35% benefit (on the same benchmarks) * Sasada 2008 21

  22. Related work: Lazy boxing Java-like language with generics over value- types Boxing needed to avoid duplication of template instantiation code for primitive types Lazy optimization works by allocating boxed objects in the stack frame, and moving to heap as needed Relies on static compiler analysis for escape path detection, and runtime checks * Owen 2004 22

  23. Related work:Superinstructions Superinstructions used for code compression ANSI C hybrid compiler-interpreter Trimedia code compression system Superinstructions chosen statically to minimize code size * Proebsting 1995 * Hoogerbrugge 1999 Superinstructions used to reduce dispatch overhead Forth, Ocaml Superinstructions chosen dynamically * Piumarta 1998 * Ertl 2003 23

  24. Conclusion Naive approach to superinstructions does not produce substantial benefit for Ruby Floating point values boxing overhead is a problem of Ruby Superinstructions provide some help (up to 22%) Future work Eliminate float boxing further Specializing computation loop 24

Related


More Related Content