Memory Hierarchy

memory hierarchy n.w
1 / 76
Embed
Share

Explore the concept of memory hierarchy in computer systems, including RAM types and CPU-memory gaps. Learn how memory organization enhances system performance and efficiency through levels of cache memory.

  • Memory Hierarchy
  • RAM
  • CPU
  • Cache Memory
  • Computer Systems

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Memory hierarchy

  2. Memory Operating system and CPU memory management unit gives each process the illusion of a uniform, dedicated memory space i.e. 0x0 0xFFFFFFFFFFFFFFFF for x86-64 Allows easy multitasking Hides underlying memory organization 2

  3. Random-Access Memory (RAM) Non-persistent program code and data stored in two types of memory Static RAM (SRAM) Each cell stores bit with a six-transistor circuit. Retains value as long as it is kept powered. Faster and more expensive than DRAM. Dynamic RAM (DRAM) Each cell stores bit with a capacitor and transistor. Value must be refreshed every 10-100 ms. Slower and cheaper than SRAM. 3

  4. CPU-Memory gap DRAM Metric 1985 1990 1995 2000 2005 2010 2015 2015:1985 access (ns) typical size (MB) 0.256 200 100 4 70 16 60 64 50 2,000 40 8,000 20 16.000 10 62,500 SRAM Metric 1985 1990 1995 2000 2005 2010 2015 2015:1985 access (ns) 150 35 15 3 2 1.5 1.3 115 CPU Metric 1985 1990 1995 2000 2005 2010 2015 2015:1985 CPU Clock rate (MHz) 6 Cycle time (ns) Cores Effective cycle time (ns) 80286 80386 20 50 1 50 Pentium P-4 150 6 1 6 Core 2 2,000 0.50 2 0.25 Core i7(n)Core i7(h) 2,500 3,000 0.4 0.33 4 4 0.10 0.08 Disparate improvement (Memory bottleneck) 3,300 0.30 1 0.30 500 500 4 2,075 166 1 166 Similar cycle times (Memory not a bottleneck) Inflection point in computer history when designers hit the Power Wall (n) Nehalem processor (h) Haswell processor 4

  5. Addressing CPU-Memory Gap Observation Fast memory can better keep up with CPU speeds Cost more per byte and thus will have less capacity Use smaller, faster, SRAM-based memories closer to CPU Leads to an approach for organizing memory and storage systems in a memory hierarchy. For each level of the hierarchy k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1. Faster memory holds more frequently accessed blocks of main memory Managed automatically in hardware. 5

  6. Memory hierarchy example CPU registers hold words retrieved from L1 cache. Registers L0 Smaller Faster L1 cache holds cache lines retrieved from the L2 cache memory. Level 1 Cache (per-core split) More Expensive L1 L2 Level 2 Cache (per-core unified) L2 cache holds cache lines retrieved from L3 Larger Slower Cheaper L3 Level 3 Cache (shared) L4 Main Memory (off chip) L5 Local Secondary Storage (HD,Flash) Remote Secondary Storage 6

  7. Hierarchy requires Locality Key to make this work is for programs to exhibit locality Data needed in the near future kept in fast memory near CPU Programs designed to reuse data and instructions near those they have used recently, or that were recently referenced themselves. Kinds of Locality: Temporal locality: Recently referenced items are likely to be referenced in the near future. Spatial locality: Items with nearby addresses tend to be referenced close together in time. 7

  8. Locality in code sum = 0; for (i = 0; i < n; i++) sum += a[i]; return sum; Give an example of spatial locality for data access Reference array elements in succession (stride-1 pattern) Give an example of temporal locality for data access Reference sum each iteration Give an example of spatial locality for instruction access Reference instructions in sequence Give an example of temporal locality for instruction access Cycle through loop repeatedly 8

  9. Locality example Which function has better locality? int sumarrayrows(int a[M][N]) { int i, j, sum = 0; for (i = 0; i < M; i++) for (j = 0; j < N; j++) sum += a[i][j]; return sum } int sumarraycols(int a[M][N]) { int i, j, sum = 0; for (j = 0; j < N; j++) for (i = 0; i < M; i++) sum += a[i][j]; return sum } 9

  10. Practice problem 6.7 Permute the loops so that the function scans the 3-d array a[] with a stride-1 reference pattern (and thus has good spatial locality)? int sumarray3d(int a[M][N][N]) { int i, j, k, sum = 0; for (i = 0; i < N; i++) for (j = 0; j < N; j++) for (k = 0; k < M; k++) sum += a[k][i][j]; return sum } 10

  11. Practice problem 6.7 Permute the loops so that the function scans the 3-d array a[] with a stride-1 reference pattern (and thus has good spatial locality)? int sumarray3d(int a[M][N][N]) { int i, j, k, sum = 0; for (k = 0; k < M; k++) for (i = 0; i < N; i++) for (j = 0; j < N; j++) sum += a[k][i][j]; return sum } 11

  12. Practice problem 6.8 Rank-order the functions with respect to the spatial locality enjoyed by each void clear1(point *p, int n) { int i,j; for (i=0; i<n ; i++) { for (j=0;j<3;j++) p[i].vel[j]=0; for (j=0; j<3 ; j++) p[i].acc[j]=0; } Best #define N 1000 typedef struct { int vel[3]; int acc[3]; } point; point p[N]; void clear2(point *p, int n) { int i,j; for (i=0; i<n ; i++) { for (j=0; j<3 ; j++) { p[i].vel[j]=0; p[i].acc[j]=0; } } void clear3(point *p, int n) { int i,j; for (j=0; j<3 ; j++) { for (i=0; i<n ; i++) p[i].vel[j]=0; for (i=0; i<n ; i++) p[i].acc[j]=0; } Worst 12

  13. General Cache Operation Direct-mapped Smaller, faster, more expensive memory caches a subset of the blocks Cache 8 4 9 14 10 3 Data is copied in block-sized transfer units 4 10 Larger, slower, cheaper memory viewed as partitioned into blocks Memory 0 1 2 3 4 4 5 6 7 8 9 10 10 11 12 13 14 15 13

  14. Direct mapped caches Simplest of cache to implement One block or cache line per set Each location in memory maps to a single location in cache E.g. Block i at level k+1 placed in block (i mod 4) at level k. Cache (k) 8 4 9 14 10 3 4 10 Memory (k+1) 0 1 2 3 4 4 5 6 7 8 9 10 10 11 12 13 14 15 14

  15. Simple Direct Mapped Cache Implemented by using least-significant bits to index set Assume: cache block size 1 byte Address of byte: v tag t bits 0 01 v tag find set S = 2s sets v tag tag v 15

  16. Simple Direct Mapped Cache After set found, examine block to see if valid Address of byte: valid? + match: assume yes = hit t bits 0 01 v tag tag If tag doesn t match: old line is evicted and replaced 16

  17. Simple direct-mapped cache example (S=4) Cache Tag Main memory Index Data 0000 0x00 0001 0x11 0010 0x22 0011 0x33 0100 0x44 0101 0x55 0110 0x66 0111 0x77 1000 0x88 1001 0x99 1010 0xAA 1011 0xBB 1100 0xCC 1101 0xDD 1110 0xEE 1111 0xFF 00 01 10 11 M = size of memory space = 16 m = number of bits in address = log2(16) = 4 S = number of sets = 4 s = log2(S) = log2(4) = 2 log2(# of cache lines) Index is 2 LSB of address Tag is rest of address = m s = 2 bits 17 t (2 bits) s (2 bits)

  18. Simple direct-mapped cache example (S=4) Main memory Cache Tag Index Data 0000 0x00 0001 0x11 0010 0x22 0011 0x33 0100 0x44 0101 0x55 0110 0x66 0111 0x77 1000 0x88 1001 0x99 1010 0xAA 1011 0xBB 1100 0xCC 1101 0xDD 1110 0xEE 1111 0xFF 00 01 10 11 Access pattern: 0000 0011 1000 0011 1000 18 t (2 bits) s (2 bits)

  19. Simple direct-mapped cache example (S=4) Main memory Cache Tag Index Data 0000 0x00 0001 0x11 0010 0x22 0011 0x33 0100 0x44 0101 0x55 0110 0x66 0111 0x77 1000 0x88 1001 0x99 1010 0xAA 1011 0xBB 1100 0xCC 1101 0xDD 1110 0xEE 1111 0xFF 00 00 0x00 01 10 11 Access pattern: 0000 0011 1000 0011 1000 Cache miss 19 t (2 bits) s (2 bits)

  20. Simple direct-mapped cache example (S=4) Main memory Cache Tag Index Data 0000 0x00 0001 0x11 0010 0x22 0011 0x33 0100 0x44 0101 0x55 0110 0x66 0111 0x77 1000 0x88 1001 0x99 1010 0xAA 1011 0xBB 1100 0xCC 1101 0xDD 1110 0xEE 1111 0xFF 00 00 0x00 01 10 11 00 0x33 Access pattern: 0000 0011 1000 0011 1000 Cache miss 20 t (2 bits) s (2 bits)

  21. Simple direct-mapped cache example (S=4) Main memory Cache Tag Index Data 0000 0x00 0001 0x11 0010 0x22 0011 0x33 0100 0x44 0101 0x55 0110 0x66 0111 0x77 1000 0x88 1001 0x99 1010 0xAA 1011 0xBB 1100 0xCC 1101 0xDD 1110 0xEE 1111 0xFF 00 00 0x00 01 10 11 00 0x33 Access pattern: 0000 0011 1000 0011 1000 Cache miss (Tag mismatch) 21 t (2 bits) s (2 bits)

  22. Simple direct-mapped cache example (S=4) Main memory Cache Tag Index Data 0000 0x00 0001 0x11 0010 0x22 0011 0x33 0100 0x44 0101 0x55 0110 0x66 0111 0x77 1000 0x88 1001 0x99 1010 0xAA 1011 0xBB 1100 0xCC 1101 0xDD 1110 0xEE 1111 0xFF 00 10 0x88 01 10 11 00 0x33 Access pattern: 0000 0011 1000 0011 1000 Cache miss (Tag mismatch) 22 t (2 bits) s (2 bits)

  23. Simple direct-mapped cache example (S=4) Main memory Cache Tag Index Data 0000 0x00 0001 0x11 0010 0x22 0011 0x33 0100 0x44 0101 0x55 0110 0x66 0111 0x77 1000 0x88 1001 0x99 1010 0xAA 1011 0xBB 1100 0xCC 1101 0xDD 1110 0xEE 1111 0xFF 00 10 0x88 01 10 11 00 0x33 Access pattern: 0000 0011 1000 0011 1000 Cache hit 23 t (2 bits) s (2 bits)

  24. Simple direct-mapped cache example (S=4) Main memory Cache Tag Index Data 0000 0x00 0001 0x11 0010 0x22 0011 0x33 0100 0x44 0101 0x55 0110 0x66 0111 0x77 1000 0x88 1001 0x99 1010 0xAA 1011 0xBB 1100 0xCC 1101 0xDD 1110 0xEE 1111 0xFF 00 10 0x88 01 10 11 00 0x33 Access pattern: 0000 0011 1000 0011 1000 Cache hit Miss rate = 3/5 = 60% 24 t (2 bits) s (2 bits)

  25. Identify issue #1: (S=4) Main memory Cache Tag Index Data 0000 0x00 0001 0x11 0010 0x22 0011 0x33 0100 0x44 0101 0x55 0110 0x66 0111 0x77 1000 0x88 1001 0x99 1010 0xAA 1011 0xBB 1100 0xCC 1101 0xDD 1110 0xEE 1111 0xFF 00 01 10 11 Access pattern: 0000 0001 0010 0011 0100 0101 Miss rate = ? 25 t (2 bits) s (2 bits)

  26. Block size and spatial locality Larger block sizes can take advantage of spatial locality by loading data from not just one address, but also nearby addresses, into the cache. Increase hit rate for sequential access Sequential bytes in memory are stored together in a block (or cache line) Use least significant bits of address as block offset B = number of bytes in block Number of bits = log2(# of bytes in block) Now, set index uses the next least significant bits 26

  27. Example: Direct Mapped Cache (B=8) Direct mapped: One line per set Cache block size 8 bytes Address of byte: v tag 0 1 2 3 4 5 6 7 100 t bits 0 01 v tag 0 1 2 3 4 5 6 7 find set S = 2s sets v tag 0 1 2 3 4 5 6 7 tag v 0 1 2 3 4 5 6 7 27

  28. Example: Direct Mapped Cache (B=8) Direct mapped: One line per set Cache block size 8 bytes Address of byte: valid? + match: assume yes = hit 100 t bits 0 01 v tag tag 0 1 2 3 4 5 6 7 block offset 28

  29. Example: Direct Mapped Cache (B=8) Direct mapped: One line per set Cache block size 8 bytes Address of byte: valid? + match: assume yes = hit 100 t bits 0 01 v tag 0 1 2 3 4 5 6 7 block offset Byte and next 3 bytes loaded for an int access If tag doesn t match: old line is evicted and replaced 29

  30. Direct-mapped cache (B=2) Main memory Cache Tag Index Data1 Data0 0000 0x00 0001 0x11 0010 0x22 0011 0x33 0100 0x44 0101 0x55 0110 0x66 0111 0x77 1000 0x88 1001 0x99 1010 0xAA 1011 0xBB 1100 0xCC 1101 0xDD 1110 0xEE 1111 0xFF 0 1 Access pattern: 0000 0001 0010 0011 0100 0101 30 t (2 bits) s (1 bit) b (1 bit)

  31. Direct-mapped cache (B=2) Main memory Cache Tag Index Data1 Data0 0000 0x00 0001 0x11 0010 0x22 0011 0x33 0100 0x44 0101 0x55 0110 0x66 0111 0x77 1000 0x88 1001 0x99 1010 0xAA 1011 0xBB 1100 0xCC 1101 0xDD 1110 0xEE 1111 0xFF 0 00 0x11 0x00 1 Access pattern: 0000 0001 0010 0011 0100 0101 Cache miss 31 t (2 bits) s (1 bit) b (1 bit)

  32. Direct-mapped cache (B=2) Main memory Cache Tag Index Data1 Data0 0000 0x00 0001 0x11 0010 0x22 0011 0x33 0100 0x44 0101 0x55 0110 0x66 0111 0x77 1000 0x88 1001 0x99 1010 0xAA 1011 0xBB 1100 0xCC 1101 0xDD 1110 0xEE 1111 0xFF 0 00 0x11 0x00 1 Access pattern: 0000 0001 0010 0011 0100 0101 Cache hit 32 t (2 bits) s (1 bit) b (1 bit)

  33. Direct-mapped cache (B=2) Main memory Cache Tag Index Data1 Data0 0000 0x00 0001 0x11 0010 0x22 0011 0x33 0100 0x44 0101 0x55 0110 0x66 0111 0x77 1000 0x88 1001 0x99 1010 0xAA 1011 0xBB 1100 0xCC 1101 0xDD 1110 0xEE 1111 0xFF 0 00 0x11 0x00 1 00 0x33 0x22 Access pattern: 0000 0001 0010 0011 0100 0101 Cache miss 33 t (2 bits) s (1 bit) b (1 bit)

  34. Direct-mapped cache (B=2) Main memory Cache Tag Index Data1 Data0 0000 0x00 0001 0x11 0010 0x22 0011 0x33 0100 0x44 0101 0x55 0110 0x66 0111 0x77 1000 0x88 1001 0x99 1010 0xAA 1011 0xBB 1100 0xCC 1101 0xDD 1110 0xEE 1111 0xFF 0 00 0x11 0x00 1 00 0x33 0x22 Access pattern: 0000 0001 0010 0011 0100 0101 Cache hit 34 t (2 bits) s (1 bit) b (1 bit)

  35. Direct-mapped cache (B=2) Main memory Cache Tag Index Data1 Data0 0000 0x00 0001 0x11 0010 0x22 0011 0x33 0100 0x44 0101 0x55 0110 0x66 0111 0x77 1000 0x88 1001 0x99 1010 0xAA 1011 0xBB 1100 0xCC 1101 0xDD 1110 0xEE 1111 0xFF 0 01 0x55 0x44 1 00 0x33 0x22 Access pattern: 0000 0001 0010 0011 0100 0101 Cache miss 35 t (2 bits) s (1 bit) b (1 bit)

  36. Direct-mapped cache (B=2) Main memory Cache Tag Index Data1 Data0 0000 0x00 0001 0x11 0010 0x22 0011 0x33 0100 0x44 0101 0x55 0110 0x66 0111 0x77 1000 0x88 1001 0x99 1010 0xAA 1011 0xBB 1100 0xCC 1101 0xDD 1110 0xEE 1111 0xFF 0 01 0x55 0x44 1 00 0x33 0x22 Access pattern: 0000 0001 0010 0011 0100 0101 Cache hit Miss rate = ? 36 t (2 bits) s (1 bit) b (1 bit)

  37. Practice problem Consider a direct-mapped cache with 2 sets and 4 bytes per block and a 5-bit address (S=2, B=4) Derive values for number of address bits used for the block offset (b), the set index (s), and the tag (t) using the formula m = t + s + b s = 1 b = 2 t = 2 Draw a diagram of which bits of the address are used for the tag, the set index and the block offset t (2 bits) s (1 bit) b (2 bits) Draw a diagram of the cache s=0 tag (2 bits) data (4 bytes) s=1 tag (2 bits) data (4 bytes) 37

  38. Practice problem Using this cache (S=2, B=4) t (2 bits) s (1 bit) b (2 bits) s=0 tag (2 bits) data (4 bytes) s=1 tag (2 bits) data (4 bytes) Derive the miss rate for the following access pattern 00000 00001 00011 00100 00111 01000 01001 10000 10011 00000 38

  39. Practice problem Consider the same cache (S,E,B,m) = (2,1,4,5) t (2 bits) s (1 bit) b (2 bits) s=0 00 Data from 00000 to 00011 s=1 tag (2 bits) data (4 bytes) Derive the miss rate for the following access pattern 00000 Miss 00001 00011 00100 00111 01000 01001 10000 10011 00000 39

  40. Practice problem Consider the same cache (S,E,B,m) = (2,1,4,5) t (2 bits) s (1 bit) b (2 bits) s=0 00 Data from 00000 to 00011 s=1 tag (2 bits) data (4 bytes) Derive the miss rate for the following access pattern 00000 Miss 00001 Hit 00011 00100 00111 01000 01001 10000 10011 00000 40

  41. Practice problem Consider the same cache (S,E,B,m) = (2,1,4,5) t (2 bits) s (1 bit) b (2 bits) s=0 00 Data from 00000 to 00011 s=1 tag (2 bits) data (4 bytes) Derive the miss rate for the following access pattern 00000 Miss 00001 Hit 00011 Hit 00100 00111 01000 01001 10000 10011 00000 41

  42. Practice problem Consider the same cache (S,E,B,m) = (2,1,4,5) t (2 bits) s (1 bit) b (2 bits) s=0 00 Data from 00000 to 00011 s=1 00 Data from 00100 to 00111 Derive the miss rate for the following access pattern 00000 Miss 00001 Hit 00011 Hit 00100 Miss 00111 01000 01001 10000 10011 00000 42

  43. Practice problem Consider the same cache (S,E,B,m) = (2,1,4,5) t (2 bits) s (1 bit) b (2 bits) s=0 00 Data from 00000 to 00011 s=1 00 Data from 00100 to 00111 Derive the miss rate for the following access pattern 00000 Miss 00001 Hit 00011 Hit 00100 Miss 00111 Hit 01000 01001 10000 10011 00000 43

  44. Practice problem Consider the same cache (S,E,B,m) = (2,1,4,5) t (2 bits) s (1 bit) b (2 bits) s=0 01 Data from 01000 to 01011 s=1 00 Data from 00100 to 00111 Derive the miss rate for the following access pattern 00000 Miss 00001 Hit 00011 Hit 00100 Miss 00111 Hit 01000 Miss 01001 10000 10011 00000 44

  45. Practice problem Consider the same cache (S,E,B,m) = (2,1,4,5) t (2 bits) s (1 bit) b (2 bits) s=0 01 Data from 01000 to 01011 s=1 00 Data from 00100 to 00111 Derive the miss rate for the following access pattern 00000 Miss 00001 Hit 00011 Hit 00100 Miss 00111 Hit 01000 Miss 01001 Hit 10000 10011 00000 45

  46. Practice problem Consider the same cache (S,E,B,m) = (2,1,4,5) t (2 bits) s (1 bit) b (2 bits) s=0 10 Data from 10000 to 10011 s=1 00 Data from 00100 to 00111 Derive the miss rate for the following access pattern 00000 Miss 00001 Hit 00011 Hit 00100 Miss 00111 Hit 01000 Miss 01001 Hit 10000 Miss 10011 00000 46

  47. Practice problem Consider the same cache (S,E,B,m) = (2,1,4,5) t (2 bits) s (1 bit) b (2 bits) s=0 10 Data from 10000 to 10011 s=1 00 Data from 00100 to 00111 Derive the miss rate for the following access pattern 00000 Miss 00001 Hit 00011 Hit 00100 Miss 00111 Hit 01000 Miss 01001 Hit 10000 Miss 10011 Hit 00000 47

  48. Practice problem Consider the same cache (S,E,B,m) = (2,1,4,5) t (2 bits) s (1 bit) b (2 bits) s=0 00 Data from 00000 to 00011 s=1 00 Data from 00100 to 00111 Derive the miss rate for the following access pattern 00000 Miss 00001 Hit 00011 Hit 00100 Miss 00111 Hit 01000 Miss 01001 Hit 10000 Miss 10011 Hit 00000 Miss 48

  49. Block sizing Typically 8 to 64 bytes Advantages and disadvantages of large block size? Higher hit rates Memory latency on miss for loading line Inefficiency in loading data not subsequently used (consider instruction access) 49

  50. Identify issue #2: (S,E,B,m) = (4,1,1,4) Cache Tag Main memory Index 00 01 Data 0000 0x00 0001 0x11 0010 0x22 0011 0x33 0100 0x44 0101 0x55 0110 0x66 0111 0x77 1000 0x88 1001 0x99 1010 0xAA 1011 0xBB 1100 0xCC 1101 0xDD 1110 0xEE 1111 0xFF 10 11 Access pattern: 0010 0110 0010 0110 0010 0110 Each access results in a cache miss and a load into cache block 2. Only one out of four cache blocks is used. Miss rate = 100% 50 t (2 bits) s (2 bits)

Related


More Related Content