Mapping Source Code to x86 for Compiler Construction
This content covers mapping source code to x86 for compiler construction in the context of MiniJava, focusing on variables, basic statements, expressions, and code generation techniques. It discusses translating language constructs, handling variables in stack frames and objects, assembly code translation templates, conventions in examples, and code generation strategies including dealing with constants and assignment statements.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
CSE P501 Compiler Construction Mapping source code to x86 Basic statements and expressions Objects, method calls, and dynamic dispatch Spring 2014 Jim Hogg - UW - CSE - P501 K-1
Review: Variables For MiniJava, data is held in either A stack frame - for variables local to the method An object - for fields Local variables accessed via ebp ('above' ebp) eg: mov eax, [ebp - 12] Fields accessed via object address in a register Instead of ebp pointing to start of frame, use ecx pointing to start of object Details later Spring 2014 Jim Hogg - UW - CSE - P501 K-2
Basic Statements and Expressions How to translate language constructs from MiniJava into assembly code? X = Y, X op Y, IF-THEN, IF-THEN-ELSE, WHILE, etc If you know assembler code for any chip, the translation templates should be obvious If you don't know assembler code, imagine translating MiniJava constructs into a primitive language whose only control flow is GOTO Spring 2014 Jim Hogg - UW - CSE - P501 K-3
Conventions in Examples Examples show code snippets in isolation Real code generator needs to deal with things like: Which registers are in use at which point in the program Which registers to spill into memory (pushed onto stack or stored in stack frame) when a new register is needed and no free ones are available Register eax used below as a generic example Rename as needed for more complex code Examples include a few peephole optimizations Spring 2014 Jim Hogg - UW - CSE - P501 K-4
Code Generation for Constants Source 17 Idea Realize constant value in a register x86 mov eax, 17 Peephole Optimization if constant is 0, then: xor eax, eax Spring 2014 Jim Hogg - UW - CSE - P501 K-5
Assignment Statement Source var = exp; // var is a local variable x86 <evaluate exp into, say, eax> mov [ebp-offsetvar], eax Spring 2014 Jim Hogg - UW - CSE - P501 K-6
Unary Minus Source -exp x86 <evaluate exp into eax> neg eax Optimization Collapse -(-exp) to exp Unary plus is a no-op Spring 2014 Jim Hogg - UW - CSE - P501 K-7
Binary + Source exp1 + exp2 x86 <evaluate exp1 into eax> <evaluate exp2 into edx> add eax, edx Optimizations If exp2 is a simple variable or constant: add eax, exp2 Change exp1 + (-exp2) into exp1 - exp2 If exp2 is 1 then: inc eax Spring 2014 Jim Hogg - UW - CSE - P501 K-8
Binary -, * Same as + Use sub for (but not commutative!) Use imul for * Optimizations Use left shift to multiply by powers of 2 If your multiplier is really slow or you ve got free scalar units and multiplier is busy, consider: 10*x = (8*x)+(2*x) But this may perform worse on a different micro-architecture Use x+x instead of 2*x, etc (faster) Use dec for x-1 Spring 2014 Jim Hogg - UW - CSE - P501 K-9
Integer Division Slow on x86 Divides 64-bit integer in edx:eax by a 32-bit integer (eg, in ebx) ~50 clock latency Source exp1 / exp2 x86 <evaluate exp1 into eax> <evaluate exp2 into ebx> cdq idiv ebx ; extend eax into edx:eax, clobbers edx ; => quotient in eax; remainder in edx Spring 2014 Jim Hogg - UW - CSE - P501 K-10
Control Flow Basic idea: decompose higher level operation into jmps and conditional jmps In following slides, jfalse is used to mean jump when a condition is false No such instruction on x86! Will have to use appropriate instructions to set condition codes, followed by conditional jumps Normally wouldn t actually generate the value true or false in a register Spring 2014 Jim Hogg - UW - CSE - P501 K-11
While Source while (cond) stm x86 loop: done: <evaluate cond> jfalse done <code for stm> jmp loop Note In generated asm code we ll need to create a unique label name for each loop, conditional statement, etc. eg: L123: Spring 2014 Jim Hogg - UW - CSE - P501 K-12
Optimization for While Put the test at the end loop: test: jmp test <code for stm> <evaluate cond> jtrue loop Why bother? Pulls one jmp instruction out of the loop May avoid pipeline stall on jmp on each iteration But many chips assume backwards-branch is always taken Hardware branch-predictor will 'tune in' to the pattern Easy to do from AST or other IR; not so easy if generating code on- the-fly (eg, recursive-descent, 1-pass compiler) Spring 2014 Jim Hogg - UW - CSE - P501 K-13
Do-While Source do stm while (cond); x86 loop: <code for stm> <evaluate cond> jtrue loop Spring 2014 Jim Hogg - UW - CSE - P501 K-14
IF-THEN Source if (cond) stm x86 <evaluate cond> jfalse skip <code for stm> skip: Spring 2014 Jim Hogg - UW - CSE - P501 K-15
IF-THEN-ELSE Source if (cond) stm1 else stm2 x86 else: done: <evaluate cond> jfalse else <code for stm1> jmp done <code for stm2> Spring 2014 Jim Hogg - UW - CSE - P501 K-16
Jump Chaining Observation: na ve code gen can will produce jumps to jumps Optimization: if a jump targets an unconditional jump, change target of first jump to the target of the second Repeat until no further changes Spring 2014 Jim Hogg - UW - CSE - P501 K-17
IF-ELSEIF-ELSE x86 <evaluate cond1> jfalse elseif <code for stm1> jmp done elseif: <evaluate cond2> jfalse else <code for stm2> jmp done else: <code for stm3> jmp done done: Source if (cond1) stm1 else if (cond2) stm2 else stm3 Note: jump-to-next redundancy Spring 2014 Jim Hogg - UW - CSE - P501 K-18
Boolean Expressions What do we do with this? x > y Expression evaluates to true or false Could generate the value in a register (eg: 0|1) But normally we don t want/need the value; we re only trying to decide whether to jump Eg: if (exp1 > exp2) stm <evaluate exp1 to eax> <evaluate exp2 to edx> cmp eax, edx jle L123 <code for stm> L123: Spring 2014 Jim Hogg - UW - CSE - P501 K-19
Boolean Operators: ! Source ! exp eg: if (!exp) stm To compile: <evaluate exp into eax as 0 or 1> jnz L123, where L123 is label after stm Spring 2014 Jim Hogg - UW - CSE - P501 K-20
Boolean Operators: && and || In C/C++/Java/C#, these are short-circuit operators Right operand is evaluated only if needed if (cond1 || cond2) if cond1 evaluates to true, then no need to evaluate cond2 if (cond1 && cond2) if cond1 evaluates to false, then no need to evaluate cond2 if (p != null && p.x == 2) . . . this check relies upon short-circuit evaluation! Basically, generate if statements that jump appropriately and only evaluate operands when needed Spring 2014 Jim Hogg - UW - CSE - P501 K-21
Example: Code for && Source if (exp1 && exp2) stm x86 skip: <evaluate exp1> jfalse skip <evaluate exp2> jfalse skip <code for stm> Spring 2014 Jim Hogg - UW - CSE - P501 K-22
Example: Code for || Source if (exp1 || exp2) stm x86 doit: skip: <evaluate exp1> jtrue doit <evaluate exp2> jfalse skip <code for stm> Spring 2014 Jim Hogg - UW - CSE - P501 K-23
Realizing Boolean Values If a boolean value needs to be stored in a variable or method call parameter, generate code to create result Typical representations: 0 for false, +1 or -1 for true C specifies 0 and 1 if stored; we ll use that Best choice can depend on machine instructions; normally some convention is established during the primeval history of the architecture (eg: VAX blbc) Spring 2014 Jim Hogg - UW - CSE - P501 K-24
Boolean Values: Example Source var = bexp ; x86 genFalse: storeIt: <code for bexp> jfalse genFalse mov eax, 1 jmp storeIt mov eax, 0 mov [ebp+offsetvar], eax Spring 2014 Jim Hogg - UW - CSE - P501 K-25
Better, If Enough Registers Source var = bexp ; x86 storeIt: xor eax, eax <code for bexp> jfalse storeIt inc eax mov [ebp+offsetvar], eax Or use conditional move instruction; avoids pipeline stalls mov eax, 1 ; assume bexp true <code for bexp> cmovz eax, 0 mov [ebp+offsetvar], eax Spring 2014 Jim Hogg - UW - CSE - P501 K-26
Alternatively: setcc Source var = x < y; x86 mov cmp setl movzx mov eax, [ebp+offsetx] eax, [ebp+offsety] al eax,al [ebp+offsetvar],eax ; load x ; compare to y ; set AL to 0|1 if less than ; zero-extend to 32 bits ; generated by asg stm Gnu mnemonic for movzx (byte->dbl word) is movzbl Or use conditional move (cmovcc) instruction for sequences like: x = y < z ? y : z Spring 2014 Jim Hogg - UW - CSE - P501 K-27
Other Control Flow: switch Na ve: generate a chain of nested if-then-else s Better: switch is intended to allow an O(1) selection, provided the set of switch values is reasonably compact Idea: create a 1-D array of jumps or labels and use the switch expression to select the right one Also need to generate an IF check that expression lies within bounds Spring 2014 Jim Hogg - UW - CSE - P501 K-28
Switch X86-ish <evaluate exp into eax> if (eax < 0 || eax > 2) jmp done mov eax, swtab[eax * 4] jmp eax L0: <code for stms0> jmp done L1: <code for stms1> jmp done L2: <code for stms2> jmp done done: Source switch (exp) { case 0: stms0; case 1: stms1; case 2: stms2; } .data swtab dd L0 dd L1 dd L ; switch table Assume no "fallthru" Spring 2014 Jim Hogg - UW - CSE - P501 K-29
Arrays C/C++/Java have 1-dimensional arrays 0-origin array with n elements contains variables: a[0] a[n-1] C/C++ n-dimensional arrays Multiple dimensions; row major order Java n-dimensional "jagged" arrays Spring 2014 Jim Hogg - UW - CSE - P501 K-30
0-Origin 1-D Integer Arrays Source exp1[exp2] x86 <evaluate exp1 into eax> <evaluate exp2 into edx> address is [eax+4*edx] ; 4 bytes per element contiguous cells in memory 0 1 2 3 4 Spring 2014 Jim Hogg - UW - CSE - P501 K-31
int[][] array = {{3, 4, 5}, {77, 50}}; 2-D Arrays C, etc (the curly-brace languages) use row-major order 0,0 0,1 0,2 0,3 0,4 1,0 1,1 1,2 1,3 1,4 ... Fortran (and Excel) uses column-major order Exercises: What is the layout? How do you calculate location of a[i, j]? What happens when you pass array references between Fortran and C code? Java has "jagged" arrays - array of arrays Java jagged arrays int[][] a = { {1,5,-4,23}, {1,2,6,3}, {99} } 1 5 -4 23 1 2 6 3 99 Spring 2014 Jim Hogg - UW - CSE - P501 K-32
How to find A[row, col] column row-major layout 0,0 1,0 2,0 3,0 0,1 1,1 2,1 3,1 0,2 1,2 2,2 3,2 0,3 1,3 2,3 3,3 0,4 1,4 2,4 3,4 row 0,0 0,1 0,2 0,3 0,4 2,0 2,1 2,2 2,3 2,4 1,0 1,1 1,2 1,3 1,4 int A[row, col] = @A + (row * numcols + col) * 4 // 0-based A[row, col] = @A + (row rowmin) * (colmax colmin + 1) * elemsize + (col colmin) * elemsize Spring 2014 Jim Hogg - UW - CSE - P501 K-33
Next Code Generation for Objects Representation Method calls Inheritance and overriding Strategies for implementing code generators Code improvement optimization Spring 2014 Jim Hogg - UW - CSE - P501 K-34