Compiler Construction Optimization: Redundancy Elimination and Value Numbering

"Learn about the goals of compiler optimization, including eliminating redundancy and value numbering in control flow graphs. Explore optimization scopes, dominators, code improvement strategies, and examples of optimizing array references and layouts. Discover key concepts in compiler construction and code efficiency."

Presentation Transcript


  1. CSE P501 Compiler Construction. Optimization: goals; eliminating redundancy & value numbering; control flow graphs (CFGs); optimization scopes: local, regional, global, whole-program; dominators. See Cooper & Torczon, chapters 8 and 9. Spring 2014 Jim Hogg - UW - CSE - P501 Q-1

  2. Code Improvement. Pick a better algorithm(!) Eg: Quicksort rather than BubbleSort; compilers ain't smart enough to choose better algorithms for you. Use machine resources effectively: instruction selection & scheduling, register allocation. More about these later. Spring 2014 Jim Hogg - UW - CSE - P501 Q-2

  3. Optimization. No optimizations are actually optimal. The best we can do is to improve things, most (much? some?) of the time. Realistically: try to do better for common idioms, both in the code and on the machine, and never make it worse! Spring 2014 Jim Hogg - UW - CSE - P501 Q-3

  4. Example Optimization. Classic example: array references in a loop.
     for (k = 0; k < n; k++) a[k] = 0;
     Naive codegen for a[k] = 0 in the loop body:
       mov  eax, 4                  // elemsize = 4 bytes
       imul eax, [ebp+offsetk]      // k * elemsize
       add  eax, [ebp+offseta]      // &a[0] + k * elemsize
       mov  [eax], 0                // a[k] = 0
     Better!
       mov  eax, [ebp+offseta]      // &a[0], once-off
       mov  [eax], 0                // a[0]=0, a[1]=0, etc
       add  eax, 4                  // eax = &a[1], &a[2], etc
     Note: pointers allow a user to do this directly in C or C++. Eg: for (p = a; p < a + n; ) *p++ = 0;
     Spring 2014 Jim Hogg - UW - CSE - P501 Q-4

  5. Example Optimization: A[row, col]. [Figure: a 2-D array and its row-major layout - the elements of row 0 (0,0 .. 0,4) are contiguous in memory, followed by the elements of row 1, then row 2, and so on.]
     For a 0-based array with 4-byte elements:
       &A[row, col] = @A + (row * numcols + col) * 4
     In general:
       &A[row, col] = @A + (row - rowmin) * (colmax - colmin + 1) * elemsize + (col - colmin) * elemsize
     Spring 2014 Jim Hogg - UW - CSE - P501 Q-5
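
A minimal sketch of this addressing arithmetic, in Python for clarity; the names (base, elem_size, row_min, etc.) are illustrative choices, not from the slides:

    # Row-major address computation, following the two formulas above.
    def addr_0_based(base, row, col, numcols, elem_size=4):
        # 0-based indexing: &A[row, col] = @A + (row * numcols + col) * elemsize
        return base + (row * numcols + col) * elem_size

    def addr_general(base, row, col, row_min, col_min, col_max, elem_size=4):
        # General bounds: one "row" occupies (colmax - colmin + 1) * elemsize bytes
        row_width = (col_max - col_min + 1) * elem_size
        return base + (row - row_min) * row_width + (col - col_min) * elem_size

    # Example: a 3x5 array of 4-byte ints starting at address 1000
    assert addr_0_based(1000, 2, 3, numcols=5) == 1000 + (2 * 5 + 3) * 4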

  6. Optimize A[,] = 0
     for (i = 0; i < nrows; i++) { for (j = 0; j < ncols; j++) { A[i, j] = 0; } }
     where &A[i, j] = @A + (i * ncols + j) * 4
     Naive codegen for A[i, j] = 0:
       mov  eax, [ebp+offseti]
       imul eax, [ebp+offsetncols]
       add  eax, [ebp+offsetj]
       imul eax, 4
       lea  edx, [ebp+offsetA]
       add  eax, edx
       mov  [eax], 0
     Better! (Strength reduction - replace the per-element address computation with a running pointer.)
       lea  eax, [ebp+offsetA]      // &A[0, 0], once-off
       mov  [eax], 0                // A[i, j] = 0
       add  eax, 4
     C programmers can do this with pointers.
     Spring 2014 Jim Hogg - UW - CSE - P501 Q-6
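
A small sketch contrasting the two addressing strategies over a flat, row-major buffer (pure illustration; the names mem, base, nrows, ncols are assumptions, not from the slides):

    # Strength reduction sketch: zero an nrows x ncols array stored row-major in mem.
    nrows, ncols = 3, 5
    mem = [None] * (nrows * ncols)
    base = 0

    # Naive: recompute the full address arithmetic on every iteration.
    for i in range(nrows):
        for j in range(ncols):
            mem[base + i * ncols + j] = 0

    # Strength-reduced: keep a running address and bump it once per element.
    addr = base
    for _ in range(nrows * ncols):
        mem[addr] = 0
        addr += 1

    assert all(v == 0 for v in mem)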

  7. Optimization Phase: Goal. Discover, at compile time, information about the runtime behavior of the program, and use that information to improve the generated code. Spring 2014 Jim Hogg - UW - CSE - P501 Q-7

  8. Common Problems in Optimization. The strategy is typical of most compiler optimizations: discover opportunities through program analysis, then modify the IR to take advantage of those opportunities. Historically, the goal was usually to decrease execution time. Other possibilities: reduce space, power, etc. Spring 2014 Jim Hogg - UW - CSE - P501 Q-8

  9. Issues (1): Safety. A transformation must not change the program's meaning: it must generate correct results, and it can't generate spurious errors (ie, report an error when there is none). Optimizations must be conservative, and always correct; never 'take a risk': 99% != 100%. Eg: fetch &o.m and remember m's address; a later call to m may call the remembered address, so long as obj.vtbl == o.vtbl. A large part of the analysis goes towards proving safety. However, it's not black & white; eg: hoisting a bounds-check may elide an exception through DCE. Spring 2014 Jim Hogg - UW - CSE - P501 Q-9

  10. Issues (2): Profitability. If a transformation is possible, is it profitable? Example: loop unrolling can increase efficiency by reducing loop overhead (fewer tests & branches) and can eliminate duplicate operations done on separate iterations, but it increases code size. Spring 2014 Jim Hogg - UW - CSE - P501 Q-10
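
For instance, unrolling by 4 trades loop overhead for code size; a schematic, source-level sketch of the transformation (the function name and the unroll factor are illustrative, not from the slides):

    # Loop unrolling sketch: sum an array with the loop body unrolled by 4.
    def sum_unrolled(a):
        n = len(a)
        total = 0
        i = 0
        while i + 4 <= n:                 # 4 adds per test-and-branch instead of 1
            total += a[i] + a[i + 1] + a[i + 2] + a[i + 3]
            i += 4
        while i < n:                      # cleanup loop for the leftover 0..3 elements
            total += a[i]
            i += 1
        return total

    assert sum_unrolled(list(range(10))) == sum(range(10))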

  11. Issues (3): Downside Risks. Even if a transformation is worthwhile, consider its side-effects. Eg: the transformation might need more temporaries, increasing register pressure; increased code size could cause cache misses; increased code size could increase paging (a disaster). Spring 2014 Jim Hogg - UW - CSE - P501 Q-11

  12. Eliminating Redundancy. An expression x+y is redundant at a program point iff, along every path from the procedure's entry, it has been evaluated and its constituent sub-expressions (x and y) have not been re-defined (ie, assigned new values) since that evaluation. If the compiler can prove the expression is redundant, it can store the result of the earlier evaluation (t123 = x + y) and replace the redundant computation with a reference to the earlier, stored result (... = t123). Spring 2014 Jim Hogg - UW - CSE - P501 Q-12

  13. Value Numbering. A technique for eliminating redundant expressions: assign an identifying number VN(n) to each expression, where VN(x+y) == VN(j) if x+y and j have the same value. Use hashing over value numbers for efficiency, but hashing is not crucial to the algorithm. An old idea (Balke 1968, Ershov 1954), invented for low-level, linear IRs; equivalent methods exist for tree IRs, eg: build a DAG. Spring 2014 Jim Hogg - UW - CSE - P501 Q-13

  14. Redundancies? Basic Block:
        1  a = x * y
        2  b = x * y
        3  a = 7
        4  c = x * y
        5  x = 3
        6  d = x * y
      Are any of these IR instructions "redundant"? In 2, can I re-use a in place of b? In 4, can I re-use a in place of c? In 6, can I re-use a in place of d?
      Spring 2014 Jim Hogg - UW - CSE - P501 Q-14

  15. Local Value Numbering (LVN). Annotate each name and expression with its value number (the trailing digit here; shown as a superscript on the original slide):
        a3 = x1 * y2
        b3 = x1 * y2
        a4 = 74
        c3 = x1 * y2
        x5 = 35
        d6 = x5 * y2
      The VN of a variable is not the actual value of that variable at runtime! Two variables with the same VN are guaranteed to have the same value at runtime. Two variables with different VNs may have the same value at runtime, by happenstance, but it's not guaranteed.
      Spring 2014 Jim Hogg - UW - CSE - P501 Q-15

  16. Transform Code: Remove Redundancies.
      VN'd:
        a3 = x1 * y2
        b3 = x1 * y2
        a4 = 74
        c3 = x1 * y2
        x5 = 35
        d6 = x5 * y2
      Rewritten:
        a3 = x1 * y2
        b3 = a3
        a4 = 74
        c3 = a3          // Oops! a3 was overwritten
        x5 = 35
        d6 = x5 * y2
      Spring 2014 Jim Hogg - UW - CSE - P501 Q-16

  17. Solution - SSA. We could remember a3 into a temporary. But realize that a4 is really a different variable from a3. Eg: we could store a3 in register R7 and a4 in a different register R2; they are totally unrelated. Use a subscript to denote different instances of a variable, and bump the lifetime subscript each time we assign into that variable. This is the essence of SSA form - see later.
      Fixed - a03 is still alive (the digit after the name is its SSA instance; the final digit is its VN):
        a03 = x01 * y02
        b03 = a03
        a14 = 74
        c03 = a03
        x15 = 35
        d06 = x15 * y02
      Spring 2014 Q-17

  18. Extensions to LVN.
      Constant Folding: note whether each operand is a constant; evaluate any arithmetic on constants at compile time; replace the instruction with a load-immediate (eg: mov eax, 42).
      Algebraic Identities: x + 0 = x, x - x = 0, etc, etc. Humans don't write these, but they may be created by index calculation, etc.
      Commutativity: recognize that a + b = b + a, so that x8 = a5 + b7 and y9 = b7 + a5 get the same value number; eg: order operands by increasing VN.
      (Running example, as before: a03 = x01 * y02; b03 = a03; a14 = 74; c03 = a03; x15 = 35; d06 = x15 * y02)
      Spring 2014 Q-18
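
A sketch of the two easiest extensions, commutativity canonicalization and constant folding, applied to a single three-address instruction (the tuple format and helper names are assumptions for illustration):

    # Canonicalize/fold one instruction of the form dst = src1 op src2.
    COMMUTATIVE = {"+", "*"}

    def canonicalize(op, vn1, vn2):
        # Commutativity: order operands by increasing VN so a+b and b+a hash alike.
        if op in COMMUTATIVE and vn2 < vn1:
            vn1, vn2 = vn2, vn1
        return (vn1, op, vn2)

    def try_fold(op, src1, src2):
        # Constant folding: if both operands are literal constants, evaluate now.
        if isinstance(src1, int) and isinstance(src2, int):
            return {"+": src1 + src2, "-": src1 - src2, "*": src1 * src2}[op]
        return None          # not foldable; leave for LVN proper

    assert try_fold("+", 40, 2) == 42
    assert canonicalize("+", 7, 5) == canonicalize("+", 5, 7)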

  19. LVN in Detail (1). Basic Block:
        1  g = x + y
        2  h = u - v
        3  i = x + y
        4  u = 3
        5  x = u - v
        6  u = g + h
        7  v = i + x
        8  w = u + v
      Are any of these IR instructions "redundant"? In 3, can I re-use the already-calculated value of g? What about 5 - can I replace x with a re-use of h from 2? Any others?
      Spring 2014 Jim Hogg - UW - CSE - P501 Q-19

  20. LVN in Detail (2). After processing the first three instructions, the Expression -> ValueNumber table holds:
        x -> 1,  y -> 2,  <1,+,2> -> 3,  g -> 3
        u -> 4,  v -> 5,  <4,-,5> -> 6,  h -> 6
        i -> 3   (i = x + y re-uses VN 3, so it is redundant with g)
      Basic Block:
        g = x + y
        h = u - v
        i = x + y
        u = 3
        x = u - v
        u = g + h
        v = i + x
        w = u + v
      The VN of a variable is not the actual value of that variable at runtime! Two variables with the same VN are guaranteed to have the same value at runtime. Two variables with different VNs may have the same value at runtime, by happenstance, but it's not guaranteed.
      Spring 2014 Jim Hogg CSE P 501 Q-20

  21. LVN in Detail (3). After also processing u = 3: the constant 3 gets a new VN, 7, and u is re-bound to VN 7 (its earlier binding to VN 4 is overwritten). The table is now:
        x -> 1,  y -> 2,  <1,+,2> -> 3,  g -> 3,  i -> 3
        u -> 7 (was 4),  v -> 5,  <4,-,5> -> 6,  h -> 6,  3 -> 7
      Spring 2014 Jim Hogg CSE P 501 Q-21

  22. LVN in Detail (4). After processing the remaining instructions (x = u - v; u = g + h; v = i + x; w = u + v), the table also holds:
        <7,-,5> -> 8,   x -> 8 (was 1)
        <3,+,6> -> 9,   u -> 9 (was 7)
        <3,+,8> -> 10,  v -> 10 (was 5)
        <9,+,10> -> 11, w -> 11
      Spring 2014 Jim Hogg CSE P 501 Q-22

  23. LVN in Detail: Summary.
      Exp -> VN:
        x -> 1, 8    y -> 2    g -> 3    i -> 3    u -> 4, 7, 9    v -> 5, 10    h -> 6    3 -> 7    w -> 11
        <1,+,2> -> 3    <4,-,5> -> 6    <7,-,5> -> 8    <3,+,6> -> 9    <3,+,8> -> 10    <9,+,10> -> 11
      Basic Block:
        g = x + y
        h = u - v
        i = x + y
        u = 3
        x = u - v
        u = g + h
        v = i + x
        w = u + v
      Spring 2014 Jim Hogg CSE P 501 Q-23

  24. LVN in Detail - Superscript Notation. The same block, with each name's value number written as a superscript (shown here as a trailing digit):
        g3 = x1 + y2
        h6 = u4 - v5
        i3 = x1 + y2
        u7 = 37
        x8 = u7 - v5
        u9 = g3 + h6
        v10 = i3 + x8
        w11 = u9 + v10
      The Exp -> VN table is as on the previous slide.
      Spring 2014 Jim Hogg CSE P 501 Q-24

  25. LVN Algorithm.
      foreach instr in BasicBlock, where instr is lhs = <rhs1, op, rhs2>
        lookup rhs1 in EVN-Table, giving vn1
        if vn1 < 0 then { vn1 = ++V; insert <rhs1, vn1> into EVN-Table }
        lookup rhs2 in EVN-Table, giving vn2
        if vn2 < 0 then { vn2 = ++V; insert <rhs2, vn2> into EVN-Table }
        lookup <vn1, op, vn2> in EVN-Table, giving vn
        if vn < 0 then { vn = ++V; insert <<vn1, op, vn2>, vn> into EVN-Table }
        else the expression is redundant: rewrite the instruction to re-use the earlier result
        update EVN-Table so that lhs maps to vn
      next
      V is a sequence counter, starting at 1. For efficiency, use a hash table for the EVN-Table. (Eg: after the running block's first instructions, the table holds g -> 3, h -> 6, <1,+,2> -> 3, <3,+,6> -> 9, ...)
      Spring 2014 Jim Hogg CSE P 501 Q-25
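
A runnable sketch of this algorithm over three-address tuples, in Python; the encoding (op is None for a plain copy or constant assignment) and the holder map are illustrative choices, not from the slides:

    # Local Value Numbering over a basic block of (lhs, rhs1, op, rhs2) tuples.
    def lvn(block):
        table = {}       # names, constants, and (vn1, op, vn2) keys -> value number
        holder = {}      # value number -> a variable currently holding that value
        counter = 0
        def vn_of(operand):
            nonlocal counter
            if operand not in table:
                counter += 1
                table[operand] = counter
            return table[operand]
        out = []
        for lhs, rhs1, op, rhs2 in block:
            if op is None:                         # constant or copy: lhs = rhs1
                vn = vn_of(rhs1)
                out.append(f"{lhs} = {rhs1}")
            else:
                key = (vn_of(rhs1), op, vn_of(rhs2))
                vn = table.get(key)
                prior = holder.get(vn)
                if vn is not None and prior is not None and table.get(prior) == vn:
                    out.append(f"{lhs} = {prior}")             # redundant: re-use earlier result
                else:
                    if vn is None:
                        counter += 1
                        vn = table[key] = counter
                    out.append(f"{lhs} = {rhs1} {op} {rhs2}")  # keep the computation
            table[lhs] = vn
            holder[vn] = lhs
        return out

    block = [("a", "x", "*", "y"), ("b", "x", "*", "y"), ("a", "7", None, None),
             ("c", "x", "*", "y"), ("x", "3", None, None), ("d", "x", "*", "y")]
    print("\n".join(lvn(block)))   # a = x * y; b = a; a = 7; c = b; x = 3; d = x * y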

  26. Basic Block. A basic block is a maximal-length sequence of straight-line code: a branch-free sequence of instructions. Properties: statements are executed sequentially; if any statement executes, they all do (barring exceptions). In a linear IR, the first instruction of a basic block is called a leader: a procedure entry, a jump target, or the instruction following a jump or call. Implementation: an array of quads (d = x op y), or a linked list of quads. Spring 2014 Jim Hogg - UW - CSE - P501 Q-26
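
A sketch of splitting a linear IR into basic blocks by finding leaders; the instruction encoding (label/jump/op tuples) is an assumption for illustration:

    # Split a linear IR into basic blocks. Instructions are tuples: ("label", L),
    # ("jump", L), ("cjump", L), or ("op", text). Leaders: the first instruction
    # (procedure entry), any jump target, and any instruction following a jump.
    def basic_blocks(code):
        targets = {ins[1] for ins in code if ins[0] in ("jump", "cjump")}
        leaders = {0}
        for i, ins in enumerate(code):
            if ins[0] == "label" and ins[1] in targets:
                leaders.add(i)
            if ins[0] in ("jump", "cjump") and i + 1 < len(code):
                leaders.add(i + 1)
        order = sorted(leaders)
        return [code[start:end] for start, end in zip(order, order[1:] + [len(code)])]

    code = [("op", "i = 1"), ("label", "L1"), ("op", "a[i] = 0"),
            ("op", "i = i + 1"), ("cjump", "L1"), ("op", "halt")]
    for b in basic_blocks(code):
        print(b)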

  27. Reminder: Flowgraph. A flowgraph is a directed graph of nodes and edges: Node = Basic Block; Edge = any possible flow of control between nodes. [Figure: an example flowgraph with blocks B0 - B6.]
      Spring 2014 Jim Hogg - UW - CSE - P501 Q-27

  28. Real Example Flowgraph. A Control Flow Graph ("CFG", again!):
        ENTRY -> B1
        B1: i = 1
        B2: j = 1
        B3: t1 = 10 * i
            t2 = t1 + j
            t3 = 8 * t2
            t4 = t3 - 88
            a[t4] = 0
            j = j + 1
            if j <= 10 goto B3
        B4: i = i + 1
            if i <= 10 goto B2
        B5: i = 1
        B6: t5 = i - 1
            t6 = 88 * t5
            a[t6] = 1
            i = i + 1
            if i <= 10 goto B6
        B6 -> EXIT
      3 loops total; 2 of the loops are nested. Most of the execution is likely spent in the loop bodies; that's where to focus optimization efforts.
      Jim Hogg - UW - CSE - P501 Q-28

  29. EBBs: Formal Definition. A set of blocks b1, b2, ..., bn where b1 has multiple predecessors and each of the remaining blocks bi (2 ≤ i ≤ n) has only bi-1 as its unique predecessor. The EBB is entered only at b1, but may have multiple exits. A single block bi can be the head of multiple EBBs (these EBBs form a tree rooted at bi). Spring 2014 Jim Hogg - UW - CSE - P501 Q-29
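
A sketch that enumerates the EBBs of a CFG given its successor map; the graph below reproduces the B0-B6 example of the next slide, but its exact edge set is an assumption inferred from the listed EBBs:

    # Enumerate EBBs as paths: start from the entry and from each block with more
    # than one predecessor, and extend through successors that have exactly one.
    def ebbs(succ, entry):
        preds = {n: 0 for n in succ}
        for n, ss in succ.items():
            for s in ss:
                preds[s] = preds.get(s, 0) + 1
        heads = [n for n in succ if n == entry or preds.get(n, 0) != 1]
        result = []
        def walk(path, node):
            ext = [s for s in succ.get(node, []) if preds.get(s, 0) == 1]
            if not ext:
                result.append(path)
            for s in ext:
                walk(path + [s], s)
        for h in heads:
            walk([h], h)
        return result

    succ = {"B0": ["B1", "B2"], "B1": ["B5"], "B2": ["B3", "B4"],
            "B3": ["B5"], "B4": ["B6"], "B5": ["B6"], "B6": []}
    print(ebbs(succ, "B0"))   # [[B0,B1], [B0,B2,B3], [B0,B2,B4], [B5], [B6]]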

  30. Example Extended Basic Blocks (EBBs). Intuition: a maximal straight-line sequence of blocks, with no join points. EBBs cannot represent loops! For the example flowgraph (blocks B0 - B6), the EBBs are: { B0, B1 }, { B0, B2, B3 }, { B0, B2, B4 }, { B5 }, { B6 }. Spring 2014 Jim Hogg - UW - CSE - P501 Q-30

  31. Optimization Scopes. Local: within a Basic Block. Regional: bigger than a block, smaller than a function; eg: a loop or an Extended Basic Block (EBB). Global: the whole function. Whole Program: spans two or more functions, maybe a large part of a program (excludes libraries). BasicBlock < Region < Function < Program. Spring 2014 Jim Hogg - UW - CSE - P501 Q-31

  32. Local Optimization. Local optimizations work within a Basic Block: algebraic simplifications (eg: 2*x = x+x; x+0 = x); constant folding (eg: FeetPerMile/MinutesPerHour = 88); common sub-expression elimination (CSE); dead code elimination (DCE) of unreachable or unused code; specializing a computation based on context; etc. Spring 2014 Jim Hogg - UW - CSE - P501 Q-32

  33. Regional Optimization. Bigger than a Basic Block; less than the whole function. Eg: loops - hoist invariant code out of the loop; unroll-and-jam (unroll the outer loop; merge the inners); loop interchange, for better caching; array-index strength reduction. Eg: Extended Basic Blocks (EBBs) - no-join paths - Super-Local Value Numbering (SVN). Etc. Spring 2014 Jim Hogg - UW - CSE - P501 Q-33

  34. Global Optimization. A misnomer - it really means whole-function. Global common sub-expression elimination (CSE). Register allocation: choose which variables are enregistered, and for how long; choosing which registers matters less with 'orthogonal' ISAs. Result memoization / caching: remember the last 50, say, results calculated and re-use them where possible; eg: this can speed up a naive Fibonacci calculation 10x or more. Etc. Spring 2014 Jim Hogg - UW - CSE - P501 Q-34
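
As a quick illustration of result memoization (a hand-written sketch of the idea, not something a compiler emits literally):

    from functools import lru_cache

    def fib_naive(n):
        return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

    @lru_cache(maxsize=50)          # remember recent results and re-use them
    def fib_memo(n):
        return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

    # fib_naive(30) makes over a million recursive calls;
    # fib_memo(30) evaluates its body only 31 times.
    assert fib_naive(20) == fib_memo(20) == 6765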

  35. Whole-Program Optimization (WPO), a.k.a. Inter-Procedural Optimization (IPO). Spans all, or much, of a program; certainly more than one function. At odds with "separate compilation". Inlining (assisted by profile 'training'). Hot/cold code separation. Whole-program DCE, or "tree shaking": no program uses an entire library! Remove unused classes and/or methods; this can make drastic reductions in program size. Spring 2014 Jim Hogg - UW - CSE - P501 Q-35

  36. Super-Local Value Numbering (SVN). Local Value Numbering works 1 block at a time: strong local results, but no cross-block effects.
        A: m = a + b
           n = a + b
        B: p = c + d
           r = c + d
        C: q = a + b
           r = c + d
        D: e = b + 18
           s = a + b
           u = e + f
        E: e = a + 17
           t = c + d
           u = e + f
        F: v = a + b
           w = c + d
           x = e + f
        G: y = a + b
           z = c + d
      LVN finds the redundancies within a block (eg: n = a + b in A, r = c + d in B); it misses the cross-block ones (eg: q = a + b in C, and the recomputations in D, E, F, G).
      Spring 2014 Jim Hogg - UW - CSE - P501 Q-36

  37. Value Numbering over EBBs. Apply LVN to the EBBs: {A,B}, {A,C,D}, {A,C,E}. The final info from A is the initial info for B and C; the final info from C is the initial info for D and E. Avoid re-analyzing A and C. This doesn't help with F and G. [Figure: the same code as on the previous slide, with annotations marking which redundancies LVN finds, which SVN adds, and which are still missed.]
      Spring 2014 Jim Hogg - UW - CSE - P501 Q-37
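
A sketch of SVN using a chain of scoped tables over the EBB tree (the tiny CFG and the block encoding are illustrative; a real implementation would also use the SSA naming fix described earlier to stay safe across redefinitions):

    from collections import ChainMap
    from itertools import count

    def svn(blocks, ebb_children, root):
        counter = count(1)
        def walk(name, table):
            scope = table.new_child()              # fresh scope for this block
            def vn_of(x):
                if x not in scope:
                    scope[x] = next(counter)
                return scope[x]
            for lhs, rhs1, op, rhs2 in blocks[name]:
                key = (vn_of(rhs1), op, vn_of(rhs2))
                if key in scope:                   # visible from this block or an ancestor
                    print(f"{name}: {lhs} = {rhs1} {op} {rhs2}  -> redundant (VN {scope[key]})")
                    scope[lhs] = scope[key]
                else:
                    scope[key] = scope[lhs] = next(counter)
            for child in ebb_children.get(name, []):
                walk(child, scope)                 # child starts from the parent's table
        walk(root, ChainMap())

    blocks = {"A": [("m", "a", "+", "b"), ("n", "a", "+", "b")],
              "B": [("p", "c", "+", "d"), ("r", "c", "+", "d")],
              "C": [("q", "a", "+", "b"), ("r", "c", "+", "d")]}
    svn(blocks, {"A": ["B", "C"]}, "A")   # redundant: n (in A), r (in B), q (in C)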

  38. SSA Name Space (from before).
        a03 = x01 * y02
        b03 = a03
        a14 = 74
        c03 = a03
        x15 = 35
        d06 = x15 * y02
      Unique name for each definition. The subscript denotes the variable's SSA number - the variable's "generation". c03 = a03 now works, since a1 = 7 happened to a 'different' variable.
      Spring 2014 Q-38

  39. SSA Name Space: Two Principles. Each name is defined (assigned-to) by exactly one instruction. Each operand refers to exactly one definition. Need to deal with merge points: add φ ("phi") functions at merge points to reconcile names. Use subscripts on variable names for uniqueness. Spring 2014 Jim Hogg - UW - CSE - P501 Q-39

  40. φ Functions. If block A defines a3 and block B defines a6, and both flow into block C, then C starts with a7 = φ(a3, a6). The result of a7 = φ(a3, a6) is either a3 or a6, depending on whether we reached node C from node A or from node B. It is not really a mathematical function at all. Spring 2014 Jim Hogg - UW - CSE - P501 Q-40

  41. SVN with Bells & Whistles. Insert φ functions at the join nodes F and G. Little extra cost. Still does nothing for F and G.
        A: m0 = a0 + b0
           n0 = a0 + b0
        B: p0 = c0 + d0
           r0 = c0 + d0
        C: q0 = a0 + b0
           r1 = c0 + d0
        D: e0 = b0 + 18
           s0 = a0 + b0
           u0 = e0 + f0
        E: e1 = a0 + 17
           t0 = c0 + d0
           u1 = e1 + f0
        F: e2 = φ(e0, e1)
           u2 = φ(u0, u1)
           v0 = a0 + b0
           w0 = c0 + d0
           x0 = e2 + f0
        G: r2 = φ(r0, r1)
           y0 = a0 + b0
           z0 = c0 + d0
      Spring 2014 Jim Hogg - UW - CSE - P501 Q-41

  42. Larger Scopes. We still have not helped F and G. The problem: multiple predecessors. We must decide what facts hold on entry to F and to G. For G, combine B & F? Merging states is expensive; fall back on what we know. [Figure: the same SSA-form code for blocks A - G as on the previous slide.]
      Spring 2014 Jim Hogg - UW - CSE - P501 Q-42

  43. Dominators. Definition: x dominates y iff every path from the entry of the flowgraph to y includes x. By definition, x dominates x. Associate a Dom set with each node: |Dom(x)| ≥ 1. Many uses in analysis and transformation: finding loops, building SSA form, code motion. Spring 2014 Jim Hogg - UW - CSE - P501 Q-43
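
A sketch of the classic iterative computation of Dom sets. The predecessor map below is the A-G example from these slides, with edges inferred from the EBBs and φ functions; treat that edge list as an assumption, although it reproduces the Dom table on the next slide:

    # Dom(entry) = {entry}; Dom(n) = {n} union intersection of Dom(p) over preds p.
    # Iterate to a fixed point.
    def dominators(preds, entry):
        nodes = set(preds) | {entry}
        dom = {n: set(nodes) for n in nodes}
        dom[entry] = {entry}
        changed = True
        while changed:
            changed = False
            for n in nodes - {entry}:
                new = {n} | set.intersection(*(dom[p] for p in preds[n]))
                if new != dom[n]:
                    dom[n] = new
                    changed = True
        return dom

    preds = {"B": ["A"], "C": ["A"], "D": ["C"], "E": ["C"],
             "F": ["D", "E"], "G": ["B", "F"]}
    for n, d in sorted(dominators(preds, "A").items()):
        print(n, sorted(d))
    # A:{A}  B:{A,B}  C:{A,C}  D:{A,C,D}  E:{A,C,E}  F:{A,C,F}  G:{A,G}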

  44. Dominator Sets. For the A - G flowgraph:
        Dom(A) = {A};  Dom(B) = {A,B};  Dom(C) = {A,C};  Dom(D) = {A,C,D};
        Dom(E) = {A,C,E};  Dom(F) = {A,C,F};  Dom(G) = {A,G}
      [Figure: the same SSA-form code for blocks A - G as before.]
      Spring 2014 Jim Hogg - UW - CSE - P501 Q-44

  45. Dominator Tree & IDom. The closest strict dominator of a node is called its Immediate Dominator, denoted IDom(n).
        Dom(A) = {A},      IDom(A) = -
        Dom(B) = {A,B},    IDom(B) = A
        Dom(C) = {A,C},    IDom(C) = A
        Dom(D) = {A,C,D},  IDom(D) = C
        Dom(E) = {A,C,E},  IDom(E) = C
        Dom(F) = {A,C,F},  IDom(F) = C
        Dom(G) = {A,G},    IDom(G) = A
      The dominator tree: A is the root, with children B, C, G; C has children D, E, F.
      Spring 2014 Jim Hogg - UW - CSE - P501 Q-45
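
IDom(n) can be read off the Dom sets: among the strict dominators of n, it is the one closest to n, ie, the one with the largest Dom set. A self-contained sketch using the Dom sets above (function name is illustrative):

    def idoms(dom, entry):
        result = {}
        for n, ds in dom.items():
            if n == entry:
                continue
            strict = ds - {n}
            # Strict dominators form a chain; the closest one has the largest Dom set.
            result[n] = max(strict, key=lambda d: len(dom[d]))
        return result

    dom = {"A": {"A"}, "B": {"A", "B"}, "C": {"A", "C"}, "D": {"A", "C", "D"},
           "E": {"A", "C", "E"}, "F": {"A", "C", "F"}, "G": {"A", "G"}}
    print(idoms(dom, "A"))   # {'B': 'A', 'C': 'A', 'D': 'C', 'E': 'C', 'F': 'C', 'G': 'A'}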

  46. Dominator Value Numbering. Still looking for a way to handle F and G. Idea: use the info from IDom(x) to start the analysis of x - use C's table for F, and A's table for G. This is the Dominator VN Technique (DVNT). [Figure: the same SSA-form code for blocks A - G as before.]
      Spring 2014 Jim Hogg - UW - CSE - P501 Q-46

  47. DVN Algorithm. Use SVN on EBBs; use scoped hash tables & the SSA name space as before. Start each node with the EVN-Table from its IDom. No values flow along back edges (ie, around loops). Constant folding and algebraic identities as before. Spring 2014 Jim Hogg - UW - CSE - P501 Q-47
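
A sketch of the dominator-tree walk; the only change from the SVN sketch earlier is that each block's starting table comes from its immediate dominator rather than from its EBB predecessor (φ functions and most of the example blocks are omitted, and the names are illustrative):

    from collections import ChainMap
    from itertools import count

    # Dominator-based value numbering: preorder walk of the dominator tree,
    # starting each block from its IDom's final table.
    def dvnt(blocks, dom_children, root):
        counter = count(1)
        def walk(name, inherited):
            scope = inherited.new_child()
            def vn_of(x):
                if x not in scope:
                    scope[x] = next(counter)
                return scope[x]
            for lhs, rhs1, op, rhs2 in blocks[name]:
                key = (vn_of(rhs1), op, vn_of(rhs2))
                if key in scope:
                    print(f"{name}: {lhs} = {rhs1} {op} {rhs2}  -> redundant")
                    scope[lhs] = scope[key]
                else:
                    scope[key] = scope[lhs] = next(counter)
            for child in dom_children.get(name, []):
                walk(child, scope)        # child starts from IDom's table
        walk(root, ChainMap())

    blocks = {"A": [("m0", "a0", "+", "b0")], "C": [("q0", "a0", "+", "b0")],
              "F": [("v0", "a0", "+", "b0")], "G": [("y0", "a0", "+", "b0")]}
    dvnt(blocks, {"A": ["C", "G"], "C": ["F"]}, "A")   # q0, v0, y0 all found redundant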

  48. Dominator Value Numbering. Advantages: finds more redundancies; little extra cost. Shortcomings: misses some opportunities (common calculations in ancestors that are not IDoms); doesn't handle loops or other back edges. [Figure: the same A - G code, annotated with which redundancies LVN, SVN, and DVN each find, and which are still missed.]
      Spring 2014 Jim Hogg - UW - CSE - P501 Q-48

  49. The Story So Far. Local algorithm: LVN. Superlocal extension: SVN - some local methods extend cleanly to superlocal scopes. Dominator VN Technique (DVN). All of these propagate along forward edges; none are global (they can't cope with loops). Spring 2014 Jim Hogg - UW - CSE - P501 Q-49

  50. Next. Dataflow analysis: a global solution to redundant-expression analysis. Transformations: a catalog of transforms a compiler can do with dataflow analysis. Spring 2014 Jim Hogg - UW - CSE - P501 Q-50
