
Understanding Memory Hierarchy Technology in Storage Devices
Explore the organized hierarchy of memory storage devices in computing systems, from registers to backup storage. Learn about the cost differences, access times, capacities, and characteristics of each level in the memory hierarchy.
Memory Hierarchy Technology
Storage devices (registers, cache, main memory, disk drives, and backup storage devices) are organized in a hierarchy. The cost of processors is decreasing faster than the cost of these storage devices.
[Figure: the memory hierarchy pyramid. Level 0: registers in the CPU. Level 1: cache (SRAM). Level 2: main memory (DRAM). Level 3: disk storage (solid state, magnetic). Level 4: backup storage (magnetic tape, optical disk). Capacity and access time increase toward the outer levels, while cost per bit increases toward the inner levels.]
For each level i of the hierarchy, define:
- $t_i$ : access time, the time required by the CPU to access memory at level i
- $s_i$ : memory size in bytes or words at level i
- $c_i$ : cost per byte at level i, so that $c_i s_i$ is the total cost of level i
- $b_i$ : transfer bandwidth between adjacent levels
- $x_i$ : unit of transfer (grain size) for data transfer between levels i and i+1
Memory devices at a lower level are faster to access, more expensive per byte, smaller in size, and have a higher bandwidth and a smaller unit of transfer than devices at a higher level.
That is, $t_{i-1} < t_i$, $s_{i-1} < s_i$, $c_{i-1} > c_i$, $b_{i-1} > b_i$, and $x_{i-1} < x_i$ for $i = 1, 2, 3, 4$.
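As a quick sanity check, here is a minimal Python sketch that encodes these per-level parameters and verifies the inequalities. The numeric values are illustrative assumptions (loosely based on the worked example later in this section), not figures from the text:

```python
# Hypothetical per-level parameters: t = access time (ns), s = size (bytes),
# c = cost per byte ($), b = bandwidth (MB/s), x = unit of transfer (bytes).
levels = [
    {"name": "cache", "t": 25,   "s": 512 * 2**10, "c": 1.2e-4, "b": 800, "x": 32},
    {"name": "main",  "t": 1250, "s": 32 * 2**20,  "c": 2e-5,   "b": 200, "x": 4096},
    {"name": "disk",  "t": 4e6,  "s": 40 * 2**30,  "c": 2e-8,   "b": 20,  "x": 512 * 2**10},
]

for inner, outer in zip(levels, levels[1:]):
    assert inner["t"] < outer["t"]  # lower levels are faster to access,
    assert inner["s"] < outer["s"]  # smaller in size,
    assert inner["c"] > outer["c"]  # more expensive per byte,
    assert inner["b"] > outer["b"]  # higher in bandwidth,
    assert inner["x"] < outer["x"]  # and use a smaller unit of transfer.
```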
Registers and Cache
Registers are part of the processor, and register transfers are conducted at processor speed, i.e. in one clock cycle. They are therefore not considered in comparison with the other levels of memory. The cache is controlled by the MMU and is programmer-transparent. Depending on speed and application requirements, the cache can be implemented at one or multiple levels; it generally resides at the first level.
Main Memory
Main memory is the primary memory of a computer system. It is larger than the cache, is managed by the MMU, and resides at the second level.
Disk Drives and Backup Storage
Disk storage is the highest level of on-line memory; it holds system programs such as the OS and compilers, as well as user programs and data. Disk drives are called on-line memory because they are active at all times the system is up. Backup storage is called off-line memory and is mostly used for archival purposes.
Inclusion, Coherence, and Locality Properties
Information stored in a memory hierarchy satisfies three properties: inclusion, coherence, and locality. The cache at the innermost level M1 communicates directly with the registers of the CPU. All information words are stored at the outermost level Mn.
The set inclusion relationship is $M_1 \subseteq M_2 \subseteq M_3 \subseteq \cdots \subseteq M_n$. This shows that all information items are originally stored at the outermost level $M_n$. At the time of processing, subsets of $M_n$ are copied into $M_{n-1}$; similarly, subsets of $M_{n-1}$ are copied into $M_{n-2}$, and so on.
If an information word is found in $M_i$, then copies of the same word can also be found in all outer levels $M_{i+1}, M_{i+2}, \ldots, M_n$, whereas a word found in $M_{i+1}$ may not be found in $M_i$.
A word miss in $M_i$ implies that the word is also missing from all inner levels $M_{i-1}, M_{i-2}, \ldots, M_1$. The highest level is the backup storage, where everything can be found. Information transfer between the CPU and the cache is in terms of words (4 or 8 bytes). The cache ($M_1$) is divided into cache blocks, also called cache lines, of 32 bytes each (e.g. blocks a and b in the figure below).
[Figure: inclusion property and data transfer between adjacent levels of the memory hierarchy. (1) The CPU accesses the cache $M_1$ by word (4 or 8 bytes) from a 32-byte cache block, e.g. block a. (2) The cache accesses main memory $M_2$ by block (32 bytes) from a memory page of 32 blocks, e.g. block b from page B. (3) Main memory accesses disk storage $M_3$ by page (1 KB) from a file segment, e.g. page A from segment F. (4) The magnetic tape unit $M_4$ (backup storage) holds all segments, e.g. segments F and G.]
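To make the inclusion property concrete, here is a small sketch that models each level as a Python set of (hypothetical) words and checks that a hit at an inner level implies a copy at every outer level:

```python
# Model each level as the set of words it currently holds.
# Inclusion requires M1 ⊆ M2 ⊆ ... ⊆ Mn.
M1 = {"a"}
M2 = {"a", "b"}
M3 = {"a", "b", "c", "d"}  # the outermost level holds everything
hierarchy = [M1, M2, M3]

def satisfies_inclusion(levels):
    return all(inner <= outer for inner, outer in zip(levels, levels[1:]))

assert satisfies_inclusion(hierarchy)
# A word found in M1 also has copies in M2 and M3 ...
assert all("a" in level for level in hierarchy)
# ... but a word in M2 need not be in M1; a miss in M2
# implies a miss in M1, not the other way around.
assert "b" not in M1 and "b" in M2
```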
Main memory ($M_2$) is divided into pages (4 KB each), with each page holding 128 blocks. Pages are the unit of transfer between main memory and disk storage. Scattered pages are organized into segments in disk memory; the size of a segment varies depending on user needs.

[Figure: units of transfer between adjacent levels: per-word transfer (4-8 bytes) between the CPU registers and the cache, block transfer (32 bytes) between the cache and main memory, page transfer (4 KB) between main memory and disk storage, and file transfer (512 KB).]
Coherence Property
Copies of the same information item at successive levels must be consistent: if a word is modified in the cache, its copies at all higher levels must be updated, either immediately or eventually. The hierarchy should be maintained in such a way that frequently used information is available at the lower levels, to minimize access time.
Coherence is maintained in two ways:
i. Write-through (WT), which demands an immediate update in $M_{i+1}$ when a word in $M_i$ is updated.
ii. Write-back (WB), which delays the update of $M_{i+1}$ until the modified word in $M_i$ is replaced.
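A minimal two-level sketch of the two policies, with plain dictionaries standing in for $M_1$ (cache) and $M_2$ (main memory); the helper names are hypothetical, not from any real cache API:

```python
cache, memory = {}, {"x": 0, "y": 0}
dirty = set()  # blocks modified in cache but not yet written back (WB only)

def write_through(addr, value):
    cache[addr] = value
    memory[addr] = value   # M2 is updated immediately on every write

def write_back(addr, value):
    cache[addr] = value
    dirty.add(addr)        # M2 update is deferred

def evict(addr):
    if addr in dirty:      # write-back propagates only at replacement time
        memory[addr] = cache[addr]
        dirty.discard(addr)
    cache.pop(addr, None)

write_through("y", 7)
assert memory["y"] == 7    # WT: M2 consistent at all times

write_back("x", 42)
assert memory["x"] == 0    # WB: M2 is stale until the block is replaced
evict("x")
assert memory["x"] == 42   # consistency restored at eviction
```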
Locality of Reference Property
During execution, a program accesses a particular portion of the memory address space frequently within any given time window, e.g. an innermost loop. The locality property has three dimensions: 1. temporal locality, 2. spatial locality, and 3. sequential locality.
1. Temporal Locality: Recently referenced items are likely to be referenced again in the near future, e.g. by loops, stacks, temporary variables, or subroutines. Once a loop is entered or a subroutine is called, a small code segment is referenced many times repeatedly. Temporal locality is thus clustered in recently used areas.
2. Spatial Locality: The tendency of a process to access items whose addresses are near one another, e.g. operations on tables or arrays involve accesses to clustered areas. Program segments containing subroutines and macros are kept together in a neighbourhood of the memory space.
3. Sequential Locality: Instructions in a program are executed in sequential order unless branch instructions create out-of-order execution. The ratio of in-order execution to out-of-order execution is generally 5 to 1. (A short code sketch illustrating all three patterns follows.)
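Here is a short sketch of how the three patterns show up in ordinary code, using a nested-loop matrix sum. (Python lists of lists are not actually contiguous in memory; the access-pattern argument carries over literally only to contiguous arrays, as in C or NumPy.)

```python
N = 256
A = [[1] * N for _ in range(N)]

# Temporal locality: `total` and the loop indices are reused every
# iteration, so they stay at the fastest level.
# Spatial locality: row-major order touches A[i][0], A[i][1], ...,
# which are adjacent, so each fetched block is fully used.
# Sequential locality: the loop body is a straight run of
# instructions executed in order.
total = 0
for i in range(N):
    for j in range(N):
        total += A[i][j]   # good: consecutive addresses within a row

# Column-major order (A[0][j], A[1][j], ...) strides N elements at a
# time, wasting most of each fetched block in a contiguous layout.
total_bad = sum(A[i][j] for j in range(N) for i in range(N))
assert total == total_bad == N * N
```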
Memory Design Implications
Temporal locality motivates the least recently used (LRU) replacement algorithm. Spatial locality determines the size of the unit of data transfer between adjacent memory levels. Sequential locality affects grain packing for optimal scheduling, and prefetching is heavily used.
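Below is a minimal LRU replacement sketch of the kind of policy that temporal locality motivates, built on Python's OrderedDict; the class name, capacity, and block names are arbitrary choices for illustration:

```python
from collections import OrderedDict

class LRUCache:
    """Holds the `capacity` most recently used blocks; evicts the LRU one."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, block):
        hit = block in self.blocks
        if hit:
            self.blocks.move_to_end(block)       # mark most recently used
        else:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used
            self.blocks[block] = True
        return hit

cache = LRUCache(capacity=2)
assert [cache.access(b) for b in "abab"] == [False, False, True, True]
assert cache.access("c") is False  # fills the cache by evicting 'a'
assert cache.access("a") is False  # 'a' was evicted, so this misses
```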
Working Set
The working set is the chunk of program/data that is currently in use. Within a time window $(t, t + \Delta t)$, this set is kept at the lowest level of the memory hierarchy, which yields a higher hit ratio at that level. The cache size is determined by the working set.
[Figure: memory reference patterns of three software processes a, b, and c plotted against the virtual address space over time; each process's references cluster into a working set that drifts slowly, observed through the window $(t, t + \Delta t)$.]
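A small sketch that computes the working set of a hypothetical page-reference trace over a sliding window of size $\Delta t$, showing how the set drifts over time:

```python
def working_set(trace, t, delta_t):
    """Distinct pages referenced in the window [t, t + delta_t)."""
    return set(trace[t:t + delta_t])

# Hypothetical page-reference trace of one process:
trace = [1, 2, 1, 1, 3, 2, 2, 2, 5, 6, 5]

print(working_set(trace, t=0, delta_t=4))  # {1, 2}
print(working_set(trace, t=4, delta_t=4))  # {2, 3}
print(working_set(trace, t=8, delta_t=3))  # {5, 6}: the set drifts over time
```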
Memory Capacity Planning
Memory hierarchy performance is determined by the effective access time $T_{eff}$, which depends on the hit ratios and access frequencies at the various levels.
Hit Ratio
An access is a hit if the desired information is found in $M_i$, and a miss if it is not. The hit ratio $h_i$ at $M_i$ is the probability that the information item will be found in $M_i$; the miss ratio is $(1 - h_i)$. The hit ratios of successive levels are a function of memory capacity, program behaviour, and similar factors. We assume $h_0 = 0$ (the CPU always accesses $M_1$ first) and $h_n = 1$ (an access to the outermost level always hits).
Access Frequency
The access frequency to $M_i$ is $f_i = (1-h_1)(1-h_2) \cdots (1-h_{i-1}) h_i$, the probability of successfully accessing $M_i$ when there are $i-1$ misses at the lower levels followed by a hit at $M_i$.
Note that $\sum_{i=1}^{n} f_i = 1$ and $f_1 = h_1$. Due to the locality property, the access frequencies decrease rapidly, i.e. $f_1 \gg f_2 \gg \cdots \gg f_n$, which indicates that the inner memory levels are accessed more frequently than the outer levels.
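A short sketch that computes the $f_i$ from assumed hit ratios (the ones used in the worked example at the end of this section, with $h_3 = 1$) and checks that the frequencies sum to 1 and decrease monotonically:

```python
def access_frequencies(hit_ratios):
    """f_i = (1 - h_1)(1 - h_2)...(1 - h_{i-1}) * h_i."""
    f, missed_so_far = [], 1.0
    for h in hit_ratios:
        f.append(missed_so_far * h)
        missed_so_far *= 1 - h
    return f

h = [0.98, 0.99, 1.0]   # h_n = 1: the outermost level always hits
f = access_frequencies(h)
print(f)                # [0.98, 0.0198, 0.0002]
assert abs(sum(f) - 1.0) < 1e-12
assert f == sorted(f, reverse=True)  # f_1 >= f_2 >= ... >= f_n
```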
Effective Access Time
A miss at the cache is called a block miss; a miss at main memory is called a page fault. The time penalty for a page fault is much larger than for a block miss: a cache miss is 2 to 4 times as costly as a cache hit, but a page fault is 1000 to 10000 times as costly as a page hit.
The effective access time is
$$T_{eff} = \sum_{i=1}^{n} f_i t_i = h_1 t_1 + (1-h_1) h_2 t_2 + (1-h_1)(1-h_2) h_3 t_3 + \cdots + (1-h_1)(1-h_2) \cdots (1-h_{n-1}) t_n$$
Even so, the effective access time depends on program behaviour and the memory hierarchy design.
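Continuing the previous sketch, $T_{eff}$ is just the frequency-weighted sum of the access times; the times below are taken from the worked example that follows:

```python
t = [25e-9, 1250e-9, 4e-3]  # access times t_i in seconds
f = [0.98, 0.0198, 0.0002]  # access frequencies from the sketch above

T_eff = sum(fi * ti for fi, ti in zip(f, t))
print(f"T_eff = {T_eff * 1e9:.2f} ns")  # ~849 ns: dominated by t1 and the
                                        # rare but expensive disk accesses
```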
Hierarchical Optimization
The total cost is $C_{total} = \sum_{i=1}^{n} c_i s_i$. Since the costs satisfy $c_1 > c_2 > \cdots > c_n$, we must choose capacities $s_1 < s_2 < \cdots < s_n$. The memory hierarchy should result in an effective access time $T_{eff}$ close to the $t_1$ of $M_1$.
The optimization should minimize the effective access time $T_{eff} = \sum_{i=1}^{n} f_i t_i$, keeping it close to $t_1$, subject to the constraint that $C_{total}$ remain below $C_0$, the ceiling on the total cost. There are tradeoffs at all levels, and the design can be formulated as a linear programming problem (LPP).
Example: Design a three-level memory hierarchy for the following parameters:

Memory Level | Access Time  | Capacity     | Cost / KB
Cache        | t1 = 25 ns   | s1 = 512 KB  | c1 = $0.12
Main memory  | t2 = unknown | s2 = 32 MB   | c2 = $0.02
Disk         | t3 = 4 ms    | s3 = unknown | c3 = $0.00002

The goal is to achieve an effective access time t = 850 ns with a cache hit ratio h1 = 0.98 and a main memory hit ratio h2 = 0.99, subject to a total cost ceiling C0 of $1500.
From the effective access time $t = h_1 t_1 + (1-h_1) h_2 t_2 + (1-h_1)(1-h_2) h_3 t_3$:
$$850 \times 10^{-9} = 0.98 \times 25 \times 10^{-9} + (1-0.98) \times 0.99 \times t_2 + (1-0.98)(1-0.99) \times 1 \times 4 \times 10^{-3}$$
which gives $t_2 \approx 1250$ ns.

From the cost constraint:
$$1500 = 0.12 \times 512 + 0.02 \times 32 \times 10^{3} + 0.00002 \times s_3$$
which gives $s_3 \approx 40$ GB at most.
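A quick numerical check of both results. Solving the access-time equation exactly gives $t_2 \approx 1288$ ns; the rounded $t_2 = 1250$ ns used above yields $T_{eff} \approx 849$ ns, which meets the 850 ns goal:

```python
# Solve 850 ns = h1*t1 + (1-h1)*h2*t2 + (1-h1)*(1-h2)*h3*t3 for t2:
h1, h2, h3 = 0.98, 0.99, 1.0
t1, t3, t_goal = 25e-9, 4e-3, 850e-9
t2 = (t_goal - h1 * t1 - (1 - h1) * (1 - h2) * h3 * t3) / ((1 - h1) * h2)
print(f"t2 = {t2 * 1e9:.0f} ns")  # 1288 ns exactly; ~1250 ns after rounding

# Solve 1500 = c1*s1 + c2*s2 + c3*s3 for s3 (sizes in KB, costs in $/KB):
c1, c2, c3 = 0.12, 0.02, 0.00002
s1, s2, budget = 512, 32e3, 1500
s3 = (budget - c1 * s1 - c2 * s2) / c3
print(f"s3 = {s3 / 1e6:.1f} million KB")  # ~39.9 GB, i.e. about 40 GB max
```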
ACA Assignment 3
Q.1 Explain message passing parallel programming with an example. What is SPMD programming?
Q.2 Discuss any three optimizing/vectorizing functions designed for optimizing compilers.
Q.3 Explain the use of synchronization primitives in parallel programming.
Q.4 What are the important features of neuro-computing? Explain how a neural network can be used for distributed parallel computing.
Q.5 Compare grid computing and cluster computing.
Q.6 Explain the following terms w.r.t. grid computing: i) Middleware ii) OGSA iii) OGSI
Q.7 Explain the important features of quantum computing. How can these architectures be used for distributed parallel processing?
Q.8 State the standard constructs and compiler directives used in High Performance Fortran (HPF) or Fortran-90 as a data parallel programming language.
Q.9 Explain the three-tier architecture of cloud computing.
Q.10 Explain the GPU parallel architecture.
Q.11 What operating system support is provided for parallel program execution?
Q.12 Compare the PVM and MPI message passing libraries.