Instruction Prefetching and Cache Optimization
Instruction prefetching is a crucial technique in computer architecture to enhance performance by predicting and fetching memory addresses before they are needed. By leveraging patterns in cache access sequences and exploiting spatial and temporal locality in programs, prefetching can reduce the impact of cache misses and improve system efficiency. This article delves into various prefetching strategies, such as next-line prefetching and Markov prefetching, to optimize cache performance and minimize instruction fetch delays.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Implementing Subprograms What actions must take place when subprograms are called and when they terminate? calling a subprogram has several associated actions: calling subprogram s environment must be saved subprogram s local variables, execution status, return location handle parameter passing to called subprogram allocation and storage of called subprogram s local variables transfer control to subprogram subprogram termination then has the following actions: return parameters that are out-mode deallocation of local variables return value from subprogram (if it is a function) restore calling subprogram s environment transfer control to calling subprogram at the proper place Generically, this is known as subprogram linkage and in most languages, this is performed through the run-time stack by using activation record instances
FORTRAN I-77 We first examine FORTRAN first since it is simplest early FORTRAN had no recursion, all memory allocation was set up at compile time so all variables /parameters memory locations are known at compile time subprogram call save execution status of current program unit carry out parameter passing pass by copy required copying parameters pass by reference required substituting an address for the value pass return address to called subprogram start execution of the subprogram subprogram return if pass-by-result parameters, copy current values to corresponding actual parameters and if subprogram is a function, value is moved to a place accessible by the caller execution status of caller is restored control is transferred back to caller
Activation Records FORTRAN compiler generates ARs for each subprogram which stores local variables, parameters return address functional value (for functions) With each AR generated, the compiler is able to determine the amount of storage space needed for each subprogram parameter passing is a matter of copying values from one area of memory to another transferring control is merely a matter of changing the PC based on subprogram start addresses or calling subprogram return addresses memory access is efficient because all addresses are known at compile time
ALGOL-like Activation Records ALGOL used the run-time stack for ARs permits recursion unlike in FORTRAN where there was only a single AR for each subprogram the run-time stack was used in FORTRAN only to denote where to return to after subprograms terminate The ALGOL approach was to have the compiler generate an AR template for every subprogram upon a subprogram call, an instance of the template is generated by the run-time environment, pushed onto the run- time stack, and the stack pointer adjusted to point at the new top of stack an activation record instance (ARI) will contain local variables parameters return location return value (if a function) links to connect to rest of run-time stack
ALGOL ARI Static Link points to bottom of ARI for static parent used for non-local variable access Dynamic Link points to top of ARI of caller used to destroy current ARI end of subprogram Parameters Store space for every parameter (whether a value or a pointer) Local variables store space for each local variable stack-dynamic storage allocated at run- time but compile-time type checked recursion available by creating an instance for each recursion call Local variables Parameters Dynamic Link Static Link Return Address NOTE: In C-languages, the static link will point to main s data since there are no other static parents
Example of ARI for C function void sub(float total, int part) { int list[4]; float sum; } 2 parameters (a float and an int) 2 variables, a 4-element int array and a float Return address (where in the code to return to when sub terminates) Dynamic link points to next ARI on stack so that this ARI can be popped off Static link points to static parent (main) for non-local variable references void function so no space needed for return value
Example Without Recursion void A(int x) { int y; ... C(y); ... } void B(float r) { int s, t; ... A(s); ... } void C(int q) { ... } void main() { float p; ... B(p); ... } Stack after: main calls B B calls A A calls C
Example With Recursion (part I) int factorial(int n) { if(n<=1) return 1; else return n*factorial(n-1); } Point 1 Point 2 void main( ) { int value; value = factorial(3); } Point 3 Stack contents at point 1 during each recursive call
Example With Recursion (part II) Stack contents at point 2 as each recursive call completes Stack contents at point 3
Non Local Variable References Assume in some subprogram, a reference is made to a non-local variable how do we determine what is being referenced? non-local variables will be stored either in static memory (if the variable is global or declared static) or on the run-time stack if on the run-time stack, which ARI do we check? a top-down search through the ARIs would be inefficient, the compiler can determine where the variable is stored using scope rules, and set up a pointer directly in C/C++/Java, subprograms are not nested so non-local references would be global variables stored in static memory in non-C languages (notably Ada/Pascal-like languages), subprograms can be nested and the nestedness of the subprograms provides the information needed to find the non- local variable Two methods: Static Chains, Displays
Static Chains The compiler can determine for any given subprogram which subprogram is it s static parent the static link in the ARI points to this static parent The compiler can then determine how many static links must be taken to track down a given reference for instance, assume Main contains SubA so SubA has a static link to Main SubA contains SubB so SubB has a static link to SubA assume Main has declared x and neither SubA nor SubB has declared x if SubB references x, then x is found by following 2 static links (from SubB to SubA and from SubA to Main) to reach Main s ARI A static chain is the information needed to resolve the reference and consists of: chain offset the number of static links to follow (determined by static scope and nestedness) local offset the position in the subprogram s ARI of this variable (starting from the bottom of this ARI)
The stack at position 1 Ada Example program Main_2; var X : integer; procedure Bigsub; var A, B, C : integer; procedure Sub1; var A, D : integer; begin { Sub1 } A := B + C; <----------1 end; { Sub1 } procedure Sub2(X : integer); var B, E : integer; procedure Sub3; var C, E : integer; begin { Sub3 } Sub1; E := B + A: <--------2 end; { Sub3 } begin { Sub2 } Sub3; A := D + E; <----------3 end; { Sub2 } begin { Bigsub } SUB2(7); end; { Bigsub } begin Bigsub; end. { Main_2 } Static chains: Position 1: A = (0, 3) B = (1, 4) C = (1, 5) Position 2: E = (0, 4) B = (1, 4) A = (2, 3) Position 3: A = (1, 3) D = error E = (0, 5) NOTE: Main_2 calls Bigsub which calls Sub2 which calls Sub3 which calls Sub1
Blocks and Efficiency Blocks can have their own local variables a good compiler can optimize the AR based on the scope of the local variables declared in blocks in the example code below, a/b and g/f are used in different blocks and so can share the same stack space int main() { int x, y, z; while() { while() { } int c, d, e; while() { int a, b; } int g, f; } }
Displays Static chains are easy to generate at compile-time but they are inefficient at run-time because the number of static links that might need to be followed to access a variable is strictly based on the degree of nestedness of the subprogram this could be any arbitrary amount in a language like Ada An alternative approach is to use displays: collect all static links into an array at any time, the contents of this array is the addresses of accessible ARIs on the stack a display offset value is used to link to the correct ARI and then a local offset is used to find the location of the variable every subprogram call and return requires modification of the display to reflect the new static scope situation this approach is also costly at run-time but only requires modification when a subprogram is called or terminates non-local references can be performed by following only one link
Display Example Using our previous code, we see how the Display and run-time stack change during the execution of the program Main_2 calls Bigsub calls Sub2 Sub2 calls Sub1 hides Sub2 Return to Sub2, Sub2 calls Sub3 From point 1, A = B + C A (0, 3) B (2, 3) (down 2 in Display) C (1, 3) (down 1 in Display) Now, assume that Sub1 contains a nested subprogram, Sub4 where Main_2 calls Bigsub calls Bus2 callsSub3 calls Sub1 calls Sub4 Before Sub1 calls Sub4 and after Sub1 calls Sub4 Dotted line means pointer currently inactive (unavailable)
Implementing Dynamic Scoping Reference to non-local variables is determined based on the order of subprogram calls, not their spatial relationship as in static scoping two implementation methods Deep Access similar to static chains except dynamic links are followed there are no static links, and the distance traversed cannot be determined at compile time Shallow Access in dynamic scoping, if variables share the same name, only the most recently declared one is currently active shallow access uses a separate stack, one for each variable name where the given variable stack is modified after each function call/return and access to the variable is always from the top of its run-time stack don t confuse deep/shallow access and deep/shallow binding
Dynamic Scope Example void sub3( ) { int x, z; x = u + v; } Assume main calls sub2 which calls sub1 which calls sub2 which calls sub3 The run-time stack using deep access is shown to the right whereas the shallow access is shown below at the point where sub3 is active void sub2( ) { int w, x; } We see that v is from sub1 (the most recent sub1) and w is from sub2 void sub1( ) { int v, w; } void main( ) { int v, u; }