Transforming Machine Code Synthesis for Speed and Efficiency

Transforming Machine Code Synthesis for Speed and Efficiency
Slide Note
Embed
Share

Pervasive software usage demands efficient rewriting of binary code without access to source code or documentation. Explore the motivation behind binary rewriting, superoptimization, program repair, obfuscation, and more to enhance the speed of McSynth in synthesizing machine code.

  • Machine Code Synthesis
  • Binary Rewriting
  • Superoptimization
  • Program Repair
  • Optimization

Uploaded on Apr 08, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Speeding up Machine-Code Synthesis Venkatesh Srinivasan, Tushar Sharma, Thomas Reps {venk, tsharma, reps}@cs.wisc.edu 1

  2. Software is Pervasive Binaries are [Motivation] Software is everywhere Everyday systems: phones, laptops, watches, cars, etc. Critical systems: aircrafts, space shuttles, medical devices, etc. Binaries are 01000101011 11110101010 10010001010 11111101010 Binary 00101010101 11010101010 10010001010 11111101010 No source No source code code No No documentation documentation Analyze Binaries Rewrite Binaries 2

  3. Binary Rewriting [Motivation] Superoptimization 0100010 Binary 1011101 Partial Evaluation 01000101011 11110101010 10010001010 11111101010 Binary 00101010101 11010101010 10010001010 11111101010 Program Repair 1101010 Binary 0010111 Binary Rewriter Obfuscation 0111101 Binary 0010111 Partial Evaluation Slicing 3

  4. Inside a Semantics-Based Binary Rewriter * Binary Analyses [Motivation] Disassembly of section .text: Rewritten Binary Rewritten Binary Offline binary optimization 080482e0 <.text>: 80482e0: xor ebp,ebp 80482e2: pop esi 80482e3: mov ecx,esp 80482e5: and esp, 0xfffffff0 80482e8: push eax 80482e9: push esp 80482ea: push edx 80482eb: push 0x80483d0 80482f0: push 0x80483e0 80482f5: push ecx 80482f6: push esi 80482f7: push 0x80483a5 80482fc: call 80482c4 . . 8048394: push ebp 8048395: mov ebp, esp 8048397: sub esp, 0x10 804839a: mov eax, [ebp + 8] 804839d: mov [ebp-4], eax 80483a0: mov eax, [ebp-4] . . 80483c0: mov eax, [ebp-4] 80483c3: leave 80483c4: ret 080482e0 <.text>: 80482e0: xor ebp,ebp 80482e2: pop esi 80482e3: mov ecx,esp 80482e5: and esp, 0xfffffff0 80482e8: push 1 80482e9: push 2 80482ea: push 3 80482eb: push 0x80483d0 80482f0: push 0x80483e0 80482f5: push ecx 80482f6: push esi 80482f7: push 0x80483a5 80482fc: call 80482c4 . . 8048394: lea esp, [esp 4] 8048395: mov ebp, esp 8048397: sub esp, 0x10 804839a: mov eax, 10 804839d: mov [ebp-4], eax 80483a0: mov eax, 20 . . 80483c0: mov [ebp-4], eax 80483c3: leave 80483c4: ret Partial evaluation Logical Formula Transformed Formula Binary obfuscation Slicing McSynth Transform Binary repair Logical Formula Transformed Formula McSynth Transform * V. Srinivasan and T. Reps. Synthesis of machine code from semantics. In PLDI, 2015 4

  5. McSynth is not fast enough! [Motivation] Rewriter makes Rewriter makes many calls to many calls to McSynth McSynth Each call to Each call to McSynth takes ~ McSynth takes ~ minutes minutes Superoptimization 0100010 Binary 1011101 We need to speed-up McSynth! Partial Evaluation 01000101011 11110101010 10010001010 11111101010 Binary 00101010101 11010101010 10010001010 11111101010 Binary Rewriter Program Repair 1101010 Binary 0010111 Obfuscation Analysis Transformation McSynth McSynth++ Synthesis 0111101 Binary 0010111 Partial Evaluation Time taken to Time taken to rewrite a binary rewrite a binary several hours or several hours or days! days! Slicing 5

  6. Outline Motivation Intel x86 (IA-32) Primer McSynth McSynth++ Improvements to Master Improvements to Slave Experiments Conclusion 6

  7. Intel x86 (IA-32) Primer [Background] Registers 20 EBX 1000 ESP 20 EAX 996 1000 20 40 Stack push eax mov ebx, eax mov [esp], 40 lea esp, [esp+4] 7

  8. Outline Motivation Intel x86 (IA-32) Primer McSynth McSynth++ Improvements to Master Improvements to Slave Experiments Conclusion 8

  9. Synthesis of Machine Code* [McSynth] Synthesize machine-code instructions from a semantic specification QFBV formula Parameterized by ISA ESP = ESP 4 Mem = Mem[ESP 4 EAX] push eax ESP = ESP 4 Mem = Mem[ESP 4 EAX] EBP = EBP EAX = EAX EBX = EBX ... CF = CF OF = OF McSynth lea esp, [esp 4] push eax mov [esp], eax ESP = ESP 4 Mem = Mem[ESP 4 EAX] push eax lea esp, [esp 4];mov [esp], eax ESP = ESP 4 Mem = Mem[ESP 4 EAX] * V. Srinivasan and T. Reps. Synthesis of machine code from semantics. In PLDI, 2015 9

  10. Enumerative Synthesis Challenges [McSynth] 43,000 unique IA-32 instruction schemas Exponential cost of enumeration + = Synthesis of an instruction Synthesis of an instruction sequence of length 2 takes sequence of length 2 takes several hours or days! several hours or days!

  11. Methodology [McSynth] m,1 'm,1 Slave I1 T I'1 Slave I . . . . . m-1,1 1,1 m,2 . . . I2 I'2 'm,2 Slave Master 1,2 m-1,k m,n In Slave 11

  12. Divide-and-Conquer in Master [McSynth] : EAX = Mem(ESP + 4) EBX = ((EAX * 2) >> 2) + EAX Mem = Mem[ESP EAX] 1: EAX = Mem(EAX + 4) EBX = ((EAX * 2) >> 2) + EAX 1: Mem = Mem[ESP EAX] 1 : EBX = ((EAX * 2) >> 2) + EAX Non-aliasing Flow Legal Illegal split split locations independence 2: EAX = Mem(ESP + 4) 2: EAX = Mem(ESP + 4) 2: Mem = Mem[ESP EAX] EBX = ((EAX * 2) >> 2) + EAX Mem = Mem[ESP EAX] KILL( 1) USE( 2) = { } 12

  13. Slave Synthesizer [McSynth] Slave Synthesizer Footprint-Based Search-Space Pruning 1 I Instruction Enumerator CEGIS Prunes useless candidates Instantiation of counterexample- guided inductive synthesis framework for machine code Search Space (Instruction sequences) 13

  14. McSynth is still slow! [McSynth] imul ebx, eax, 2 shr ebx, 2 lea ebx, [ebx + eax] 1 : EBX = ((EAX * 2) >> 2) + EAX Slave Master 2: EAX = Mem(ESP + 4) Mem = Mem[ESP EAX] mov [esp], eax mov eax, [esp + 4] Slave Decision procedure for flow dependence treats memory as single unit Overall synthesis task takes Overall synthesis task takes several hours several hours few days! No splitting of deep terms few days! 14

  15. Outline Motivation Intel x86 (IA-32) Primer McSynth McSynth++ Improvements to Master Improvements to Slave Experiments Conclusion 15

  16. McSynth Vs. McSynth++ [McSynth++] McSynth McSynth++ Improved Master: More and finer- grained splits (n m) Master+ + Master . . . . . . . . Slave Slave Slave Slave++ Slave++ Slave++ Improved Slave: Faster enumerative synthesis I1 I2 Im I1 I2 In 16

  17. Improved Decision Procedure for Flow Dependence [Master++] : EAX = Mem(ESP + 4) EBX = ((EAX * 2) >> 2) + EAX Mem = Mem[ESP EAX] Flow Legal Split independence 2: EAX = Mem(ESP + 4) EBX = ((EAX * 2) >> 2) + EAX DP in McSynth++ can reason about individual memory locations Decision procedure for Decision procedure for flow independence flow independence treats memory as single treats memory as single unit unit 3 : EBX = ((EAX * 2) >> 2) + EAX 1: Mem = Mem[ESP EAX] 4: EAX = Mem(ESP + 4) Mem(ESP) and Mem(ESP +4) could never alias 17

  18. Flattening Deep Terms [Master++] 3 : EBX = ((EAX * 2) >> 2) + EAX Equisatisfiable Scratch Locations (Dead Registers) = {EBX} McSynth does not split a McSynth does not split a conjunct of conjunct of any further further 3 : m = EAX * 2 n = m >> 2 EBX = n + EAX any 3 : EBX = EAX * 2 EBX = EBX >> 2 EBX = EBX + EAX 6 : EBX = EBX + EAX 5 : EBX = EAX * 2 6 : EBX = EBX >> 2 18

  19. Master Vs. Master++ [Master++] Master Master++ Master+ + Master 4 : EAX = Mem(ESP + 4) 1 : Mem = Mem[ESP EAX] 2: EAX = Mem(ESP + 4) Mem = Mem[ESP EAX] 1 : EBX = ((EAX * 2) >> 2) + EAX 7 : EBX = EBX + EAX 5 : EBX = EAX * 2 6 : EBX = EBX >> 2 19

  20. Bits-Lost-Based Pruning [Slave++] 7 requires pre-state bits in EBX loses the pre-state bits in EBX when it transforms the state Prunes a candidate if does not retain enough bits to implement 7 Retains a candidate if pre-state bits required to implement 7 are possibly latent in 7 : EBX = EBX + EAX Instruction Enumerator Slave mov ebx, eax sub ebx, eax : EBX = EAX : EBX = EBX - EAX 20

  21. Move-to-Front Heuristic [Slave++] Pragmatic heuristic Moves instructions that occur in synthesized code to the front of instruction pool in next synthesis task Certain instructions tend to be used more frequently than others to implement Prioritize useful instructions Master+ + . . . . Slave++ Slave++ Slave++ Move To Front I1 I2 In 21

  22. McSynth Vs. McSynth++ [McSynth++] McSynth McSynth++ Master+ + Master Overall synthesis task Overall synthesis task takes under a minute takes under a minute over four orders of over four orders of magnitude magnitude speedup! Overall synthesis Overall synthesis task takes several task takes several hours hours few days! few days! 4 : EAX = Mem(ESP + 4) 1 : Mem = Mem[ESP EAX] speedup! 2: EAX = Mem(ESP + 4) Mem = Mem[ESP EAX] 1 : EBX = ((EAX * 2) >> 2) + EAX 5 : EBX = EAX * 2 7 : EBX = EBX + EAX Slave++ Slave++ 6 : EBX = EBX >> 2 Slave++ Slave++ Slave Slave mov [esp], eax mov eax, [esp+4] Slave++ add ebx, eax imul ebx, eax, 2 imul ebx, eax, 2 shr ebx, 2 lea ebx, [ebx + eax] mov [esp], eax mov eax, [esp + 4] 22 shr ebx. 2

  23. Outline Motivation Intel x86 (IA-32) Primer McSynth McSynth++ Improvements to Master Improvements to Slave Experiments Conclusion 23

  24. Test Suite [Experiments] Important instruction-sequences from real-world programs SPECINT 2006 benchmarks Instruction-sequences of length 1 through 10 Top 5 most frequently-occurring 50 instruction-sequences in total I Symbolic Symbolic Execution Execution I McSynth++ McSynth++ No restriction on source of QFBV formula 24

  25. Effect of Improvements [Experiments] Synthesis Time - Effect of all improvements Synthesis time - baseline McSynth + all 100000 No. of timeouts: 14 Vs. 2 (out of 50) Avg. speedup for formulas that timed out in McSynth: over 1981X Avg. speedup for remaining formulas: 3X improvements (seconds) 10000 1000 100 10 1 1 10 100 1000 10000 100000 Synthesis time - baseline McSynth (seconds) 25

  26. Effect on a Client [Experiments] WiPEr: Machine-code partial evaluator * Total time to synthesize residual code: McSynth Vs. McSynth++ Application LOC No. of calls to synthesizer McSynth (seconds) Synthesis time using Synthesis time using McSynth++ (seconds) Speedup power 19 6 16 13.5 1.19 interpreter 71 19 30 22.8 1.32 sha1 140 23 25.4 21 1.21 filter 107 212 241 177 1.36 dotproduct 29 306 312 267 1.17 Average speedup: 1.25X * V. Srinivasan and T. Reps. Partial Evaluation of machine code. In OOPSLA, 2015 26

  27. Conclusion Improvements to speed-up machine-code synthesis McSynth++: improved synthesizer for IA-32 Master++ Improved decision procedure for flow dependence Flattening deep terms Slave++ Bits-lost-based pruner Move-to-front heuristic Over 1981X speedup for formulas that timed out in McSynth 3X speedup for remaining formulas 27

  28. Questions? McSynth McSynth++ Improved Master: More and finer- grained splits (n m) Master+ + Master . . . . . . . . Slave Slave Slave Slave++ Slave++ Slave++ Improved Slave: Faster enumerative synthesis I1 I2 Im I1 I2 In 28

More Related Content