Synthesis of Machine Code: Semantics to Implementation
This comprehensive study delves into the transformation of semantic specifications into machine code instructions using advanced synthesis techniques. Featuring innovative approaches like CEGIS loop and divide-and-conquer strategy, the research showcases significant advancements in generating optimized machine code sequences efficiently.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Synthesis of Machine Code from Semantics Venkatesh Srinivasan Thomas Reps {venk, reps}@cs.wisc.edu 1
McSynth Synthesize a straight-line machine-code instruction sequence from a semantic specification QFBV Formula ESP = ESP 4 Specification Mem = Mem[ESP 4 EAX] McSynth Machine Code push eax 2
McSynth Core synthesis loop CEGIS Footprint-based search- space pruning heuristics Divide-and-conquer strategy Tested on corpus of 50 QFBV formulas Orders of magnitude faster than a baseline enumerative synthesizer ESP = ESP 4 Mem = Mem[ESP 4 EAX] McSynth CEGIS Pruning Search space Divide-and-Conquer push eax 3
Motivation Binary Rewriting No source code and/or documentation Superoptimization 0100010 Binary 1011101 Partial Evaluation 01000101011 11110101010 10010001010 11111101010 Binary 00101010101 11010101010 10010001010 11111101010 11111101010 01000101011 11110101010 10010001010 11111101010 Binary 00101010101 11010101010 10010001010 Obfuscation 1101010 Binary 0010111 Binary Rewriter Tamper-proofing 0111101 Binary 0010111 Partial Evaluation Slicing 5
Motivation A Semantics-Based Binary-Rewriting Framework Offline binary optimization Partial evaluation Transformations to semantics Completed VSA [1], DUA [2], Rewriting In Progress Analysis Live Registers and Flags, Slicing Binary obfuscation Future Work 010100001100 001000010000 010001010111 Rewritten Binary 000010101010 101010100010 011111000000 011111000000 010100001100 001000010000 010001010111 Optimized Binary 000010101010 101010100010 001000010000 010001010111 111010101010 Binary 001010101011 101010101010 101010100010 Mc- Synth Optimized Transformed Semantics Semantics Transform Optimize Semantics [1] G. Balakrishnan and T. Reps. WYSINWYX: What You See Is Not What You eXecute. TOPLAS, 32(6), 2010. [2] J. Lim and T. Reps. TSL: A system for generating abstract interpreters and its application to machine-code analysis. TOPLAS, 35(4), 2013 6
Outline Motivation and uses Problem statement Related work Synthesis algorithm Divide-and-conquer Footprint-based search-space pruning CEGIS loop Experiments Extensions Future Work 7
Synthesis of Machine Code Synthesize machine- code instructions from a semantic specification QFBV formula Desired properties Correctness Parameterized by ISA ESP = ESP 4 ESP = ESP 4 Mem = Mem[ESP 4 EAX] push eax R1 = R1 4 Mem = Mem[R1 4 R7] Mem = Mem[ESP 4 EAX] EBP = EBP EAX = EAX EBX = EBX ... CF = CF Synth <IA32> Synth <PPC32> McSynth OF = OF push eax stwu r7, -4(r1) lea esp, [esp 4] mov [esp], eax ESP = ESP 4 Mem = Mem[ESP 4 EAX] push eax lea esp, [esp 4];mov [esp], eax ESP = ESP 4 Mem = Mem[ESP 4 EAX] 8
Related Work Instruction-sequence to QFBV Reinterpretation of instruction semantics [1] Superoptimization [2, 3] SuperOpt: InstrSeq InstrSeq Synth: QFBVFormula InstrSeq Component-based synthesis [4] ESP = ESP 4 Mem = Mem[ESP 4 EAX] push eax SuperOptimizer McSynth InstrToQFBV McSynth InstrToQFBV push eax ESP = ESP 4 Mem = Mem[ESP 4 EAX] [1] J. Lim, A. Lal, and T. Reps. Symbolic analysis via semantic reinterpretation. STTT, 13(1):61-87, 2011. peephole superoptimizers. In ASPLOS, 2006. [3] E. Schkufza, R. Sharma, and A. Aiken. Stochastic superoptimization. In ASPLOS, 2013. [2] S. Bansal and A. Aiken. Automatic generation of [4] S. Gulwani, S. Jha, A. Tiwari, and R. Venkatesan. Synthesis of loop-free programs. In PLDI, 2011. 9
Outline Motivation and uses Problem statement Related work Synthesis algorithm Divide-and-conquer Footprint-based search-space pruning CEGIS loop Experiments Extensions Future Work 10
Challenges Input QFBV formula Synthesized Code Instruction- Sequence Enumerator Equivalence Check 43,000 unique IA-32 instruction schemas Exponential cost of enumeration + =
Methodology m,1 'm,1 Synth I1 T I'1 Synth I . . . . . m-1,1 1,1 m,2 . . . I2 I'2 'm,2 Synth Divide- and- conquer 1,2 m-1,k m,n In Synth 12
Divide-and-Conquer : ESP = ESP 4 EBP = ESP 4 Mem = Mem[ESP 4 EBP] 1: ESP = ESP 4 1: Mem = Mem[ESP 4 EBP] Legal Illegal split split 2: EBP = ESP 4 3: Mem = Mem[ESP 4 EBP] 3: ESP = ESP 4 KILL( 1) USE( 2) = { } KILL( 1) USE( 3) = { } 13
Slave Synthesizer Slave Synthesizer Footprint-Based Search-Space Pruning 1 I Instruction Enumerator CEGIS Search Space (Instruction sequences) 1: Mem = Mem[ESP 4 EBP] 14
Footprint-Based Search-Space Pruning 1: Mem = Mem[ESP 4 EBP] Abstract semantic Footprint Sound abstraction of locations that might be used or modified by a QFBV formula SFP#use SFP#use( 1) = {ESP, EBP} SFP#kill( 1) = {Mem } SFP#kill Instruction Enumerator SFP#use( ) = { } SFP#kill( ) = {EAX } SFP#kill( ) = {EAX , EBX } mov eax, <imm> mov esp, <imm> add esp, <imm> mov eax, <imm> mov ebx, ecx SFP#use( ) = {ECX} push eax mov eax, <imm> : EAX = imm : EAX = imm ^ EBX = ECX Useless Prefix 15
Counter-Example Guided Inductive Synthesis (CEGIS) Check if an instance of a templatized instruction- sequence implements a QFBV formula [1] 1: Mem = Mem[ESP 4 EBP] = [EBP 300] [ESP 100], [], [] ' = [EBP 300] [ESP 100], [], [96 300] Counterexample Instruction Enumerator Works for k tests ? Equivalence 1 : Mem = Mem [ESP 4 EBP] = [EBP 200] [ESP 100], [], [] ' = [EBP 200] [ESP 100], [],[96 200] YES Return C FAIL mov [esp <i1>], <i2> mov [esp <i1>], ebp : Mem = Mem[ESP i1 i2] C : Mem = Mem [ESP 4 200] [ESP 4 EBP] C : Mem = Mem = [EBP 300] [ESP 100], [], [] ' = [EBP 300] [ESP 100], [],[96 300] i1 4 i1 4 i2 200 16 [1] A. Solar-Lezama. Program Synthesis by Sketching. PhD thesis, Univ. of Calif., Berkeley, CA, 2008
Outline Motivation and uses Problem statement Related work Synthesis algorithm Divide-and-conquer Footprint-based search-space pruning CEGIS loop Experiments Extensions Future Work 17
Experiments Important instruction-sequences from real-world programs SPECINT 2006 benchmarks Instruction-sequences of length 1 through 10 Top 5 most frequently-occurring 50 instruction-sequences in total Instruction To QFBV I I McSynth No restriction on source of QFBV formula EAX + EBX = EAX + EBX + 4 18
Synthesis Time Synthesis time 10000 1000 Time (seconds) 100 10 1 0.1 0.01 QFBV Formula 10000 Total Time 1000 Time (seconds) 100 10 1 Slave Time 0.1 QFBV Formula McSynth (certain instruction-sequences of length <= 10): ~ mins x86 superoptimizer [1] (length <= 3): ~ hours 19 [1] S. Bansal and A. Aiken. Automatic generation of peephole superoptimizers. In ASPLOS, 2006
Effect of Pruning Divide-and-conquer turned off No. of Candidates Enumerated Synthesis Time 100000 10000 No. of candidates enumerated 1000 10000 Time (seconds) 100 Without Pruning 1000 10 100 1 With Pruning 10 0.1 0.01 1 1_1 1_2 QFBV Formula 1_3 1_4 1_5 1_1 1_2 QFBV Formula 1_3 1_4 1_5 G.M. of ??? ??? ??????? G.M. of ??? ??? ??????? ??? ??????? = 473 ??? ??????? = 273 20
Extensions Scratch registers for synthesis Biased Synthesis Evaluation function Synthesis of code that satisfies properties Mem = Mem[ESP 4 Mem(EBP 4)] EAX + EBX = EAX + EBX + 4 Scratch = {ESI} McSynth McSynth mov esi, [ebp 4]; mov [esp 4], esi lea eax, [eax + 4] 22
Future Work IA-32 partial evaluator Specialize binaries w.r.t. common inputs Extract a feature from a bloated binary IA-32 slicer Extract components from binaries Obfuscate/de-obfuscate instruction sequences Synthesis of non-straight-line, but non-looping code ite (if-then-else) terms CFG skeleton Synthesize code for basic blocks More accurate legality check in divide-and-conquer QFBV formulas with memory accesses and updates Completed In Progress 23
Conclusion ISA-parameterized algorithm for machine-code synthesis McSynth synthesizer for IA-32 New instantiation of CEGIS Footprint-based search-space pruning Divide-and-conquer Orders of magnitude faster than a baseline enumerative synthesizer IA-32 Partial Evaluator and Slicer Less synthesis overhead in practice 24