Architecting Volatile STT-RAM Caches for Enhanced Performance

Spin-Torque Transfer RAM (STT-RAM) combines the speed of SRAM, the density of DRAM, and the non-volatility of Flash memory, making it attractive for on-chip cache hierarchies. This presentation proposes reducing the retention time of STT-RAM to lower its write latency, improving both performance and energy efficiency. It covers how to choose the retention time and how to architect the resulting volatile STT-RAM caches for Chip Multiprocessors (CMPs).

Uploaded by mack207 on Mar 10, 2025

Presentation Transcript

Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs
Adwait Jog, Asit K. Mishra, Cong Xu, Yuan Xie, N. Vijaykrishnan, Ravi Iyer, Chita R. Das
The Pennsylvania State University
Intel Corporation

Slide 2: STT-RAM as Emerging Memory Technology

Spin-Torque Transfer RAM (STT-RAM) combines the speed of SRAM, the density of DRAM, and the non-volatility of Flash memory, making it attractive for on-chip cache hierarchies.
STT-RAM caches suffer from long write latency and higher write energy consumption when compared to traditional SRAM caches.

Slide 3: SRAM vs. STT-RAM

Parameter                   1 MB SRAM    4 MB STT-RAM
Area (mm^2)                 2.61         3.00
Read Energy (nJ)            0.578        1.035
Write Energy (nJ)           0.578        1.066
Leakage Power (mW)          4542         2524
Read Latency (ns)           1.012        0.998
Write Latency (ns)          1.012        10.61
Read @ 2 GHz (cycles)       2            2
Write @ 2 GHz (cycles)      2            22

STT-RAM relative to SRAM:
~3-4x denser (capacity benefit)
1.8x lower leakage energy
Comparable read latency
~11x higher write latency (@ 2 GHz)
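The headline ratios on this slide can be re-derived from the table values. The short sketch below does that arithmetic; the dictionary layout and function name are illustrative choices, not anything taken from the presentation.

```python
# Values copied from the SRAM vs. STT-RAM comparison table above.
sram = {"capacity_mb": 1, "area_mm2": 2.61, "leakage_mw": 4542.0, "write_cycles": 2}
sttram = {"capacity_mb": 4, "area_mm2": 3.00, "leakage_mw": 2524.0, "write_cycles": 22}


def area_per_mb(cache):
    """Silicon area needed per megabyte of capacity."""
    return cache["area_mm2"] / cache["capacity_mb"]


# Density: how much less area STT-RAM needs per MB than SRAM.
density_gain = area_per_mb(sram) / area_per_mb(sttram)
# Leakage: SRAM leakage power divided by STT-RAM leakage power.
leakage_ratio = sram["leakage_mw"] / sttram["leakage_mw"]
# Write latency: STT-RAM write cycles divided by SRAM write cycles at 2 GHz.
write_latency_ratio = sttram["write_cycles"] / sram["write_cycles"]

print(f"density gain:        {density_gain:.2f}x")
print(f"leakage reduction:   {leakage_ratio:.2f}x")
print(f"write latency ratio: {write_latency_ratio:.1f}x")
```

The computed density gain of about 3.5x falls inside the slide's "~3-4x" range, and the other two ratios match the 1.8x and ~11x figures.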

Slide 4: Proposal: Reduce Retention Time

Years of data-retention time may not be required for STT-RAM.
Trade off retention time for lower STT-RAM write latency.
Challenge: architecting volatile STT-RAM caches.
Advantage: performance and energy benefits.

Slide 5: How to Calculate Optimal Retention Time?

(1) Device Constraints: The retention time of STT-RAM can be reduced only down to a certain limit.
(2) Application Needs: Application characteristics show that a data-retention time in the range of milliseconds is sufficient to make STT-RAM caches effective for CMPs.

Both device constraints and application needs should be considered for optimal results.

Slide 6: How to Reduce STT-RAM Write Latency?

[Figure: write current (uA) versus write pulse width (ns), with operating points for retention times of 10 years, 1 sec, and 10 ms. Write current goes down with reduction in retention time.]

Retention Time of STT-RAM    Write Latency @ 2 GHz
10 years                     22 cycles
1 second                     12 cycles
10 milliseconds              6 cycles
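To make the trade-off concrete, the sketch below converts the slide's cycle counts into nanoseconds at 2 GHz and into a reduction factor relative to the 10-year design. Only the three cycle counts and the clock frequency come from the slide; the variable names are illustrative, and the nanosecond values are cycle-quantized rather than raw device latencies.

```python
# Write latency in cycles at 2 GHz for each retention time, copied from the slide.
CLOCK_GHZ = 2.0
write_cycles = {"10 years": 22, "1 second": 12, "10 milliseconds": 6}

baseline_cycles = write_cycles["10 years"]
summary = {}
for retention, cycles in write_cycles.items():
    summary[retention] = {
        # Cycle-quantized latency, not the raw device figure.
        "latency_ns": cycles / CLOCK_GHZ,
        # How many times shorter the write is than with 10-year retention.
        "reduction": baseline_cycles / cycles,
    }

for retention, row in summary.items():
    print(f"{retention:>16}: {row['latency_ns']:.1f} ns, "
          f"{row['reduction']:.2f}x lower write latency")
```

At 10 ms retention the write takes 6 cycles, roughly 3.7x shorter than the 22-cycle write of the 10-year design.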

Slide 7: How much non-volatility can be traded off?

[Figure: Inter-Write Time (Refresh Time) distributions of multi-threaded and multi-programmed benchmarks. Y-axis: percentage of blocks (0% to 100%). Stacked categories: 5 ms, 10 ms, 20 ms, 30 ms, 40 ms, 40+ ms. PARSEC panel: frrt., fluid., x264, AVG. SPEC2006 panel: libq., gcc, namd, AVG.]

A majority (> 50%) of L2 cache blocks get refreshed within 10 ms.
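An inter-write time distribution of this kind could be computed from a write trace roughly as follows. This is a minimal sketch under stated assumptions: the trace format (block address mapped to sorted write timestamps in milliseconds) and the sample data are hypothetical, and the sketch counts inter-write intervals rather than reproducing the authors' exact per-block methodology. Only the bucket boundaries are taken from the slide's legend.

```python
from bisect import bisect_left

# Bucket boundaries taken from the figure legend; the last bucket is open-ended.
BUCKET_EDGES_MS = [5, 10, 20, 30, 40]
BUCKET_LABELS = ["5 ms", "10 ms", "20 ms", "30 ms", "40 ms", "40+ ms"]


def inter_write_histogram(write_times_ms):
    """Return the fraction of inter-write intervals falling in each bucket.

    write_times_ms maps a block address to its sorted write timestamps (ms).
    A gap lands in the first bucket whose upper edge is >= the gap.
    """
    counts = [0] * len(BUCKET_LABELS)
    for times in write_times_ms.values():
        for earlier, later in zip(times, times[1:]):
            counts[bisect_left(BUCKET_EDGES_MS, later - earlier)] += 1
    total = sum(counts)
    return {
        label: (count / total if total else 0.0)
        for label, count in zip(BUCKET_LABELS, counts)
    }


# Hypothetical trace: three blocks with a handful of writes each.
trace = {0x100: [0, 3, 9, 30], 0x140: [0, 50], 0x180: [2, 10]}
histogram = inter_write_histogram(trace)
within_10ms = histogram["5 ms"] + histogram["10 ms"]
print(histogram)
print(f"rewritten within 10 ms: {within_10ms:.0%}")
```

With a 10 ms retention time, every interval in the first two buckets is a write that arrives before the data would expire, so those blocks need no explicit refresh.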

Slide 8: Volatile STT-RAM Based Last-Level Cache Design

How can the remaining 50% of the blocks be saved? Answer: use a selective refresh policy. Only refresh cache blocks that are in MRU slots.

[Figure: a 16-way cache set (way IDs 0 to 15) with a block state per way, divided into IMP blocks and NON-IMP blocks. Labels: "Dying Blocks (Refresh)" and "Dying Blocks (Do not Refresh)".]
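The selective refresh idea can be sketched as a filter over the recency order of one cache set. The function names and the choice of four MRU slots are assumptions made for illustration; the slide only states that blocks in MRU slots are the ones refreshed.

```python
NUM_WAYS = 16
NUM_MRU_SLOTS = 4  # hypothetical: the slide does not say how many MRU slots count as important


def classify_ways(recency_order, num_mru_slots=NUM_MRU_SLOTS):
    """Split the ways of one set into IMP and NON-IMP groups.

    recency_order lists way IDs from most to least recently used.
    """
    important = set(recency_order[:num_mru_slots])
    non_important = set(recency_order[num_mru_slots:])
    return important, non_important


def ways_to_refresh(expiring_ways, recency_order, num_mru_slots=NUM_MRU_SLOTS):
    """Of the ways whose retention time is about to run out, keep only IMP ones."""
    important, _ = classify_ways(recency_order, num_mru_slots)
    return sorted(way for way in expiring_ways if way in important)


# Hypothetical recency order for a 16-way set, most recently used first.
recency_order = [7, 2, 15, 0, 9, 4, 11, 1, 3, 5, 6, 8, 10, 12, 13, 14]
expiring = [0, 2, 5, 14]
print(ways_to_refresh(expiring, recency_order))
```

In this example ways 0 and 2 sit in the top four MRU slots and are refreshed, while ways 5 and 14 are allowed to expire.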

Slide 9: How to refresh?

[Figure: flowchart over a 16-way cache set (way IDs 0 to 15, block state, IMP and NON-IMP blocks). Decision nodes: "Is Buffer Full?" and "Dirty?". Actions: "COPY", "COPY BACK", and "Write-back to DRAM".]
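The transcript preserves only the node labels of this flowchart, not its arrows, so the following is one plausible reading rather than the authors' exact scheme. It assumes that an expiring IMP block is copied into a small buffer when space is available and later copied back (which rewrites it and restarts its retention time), and that any other expiring block is written back to DRAM if dirty and simply dropped otherwise. All class and function names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Block:
    tag: int
    dirty: bool
    important: bool  # True if the block sits in an MRU slot (IMP)


class RefreshBuffer:
    """Small holding area for IMP blocks that are about to expire."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def is_full(self):
        return len(self.entries) >= self.capacity


def handle_expiring_block(block, buffer):
    """Decide what happens to a block whose retention time is running out."""
    if block.important and not buffer.is_full():
        buffer.entries.append(block)  # "COPY": park the block in the buffer
        return "copy_to_buffer"
    if block.dirty:
        return "write_back_to_dram"   # avoid losing modified data
    return "drop"                     # clean data can be refetched later


def copy_back(buffer):
    """"COPY BACK": rewrite buffered blocks into the cache, emptying the buffer."""
    restored = list(buffer.entries)
    buffer.entries.clear()
    return restored


buffer = RefreshBuffer(capacity=1)
actions = [
    handle_expiring_block(Block(tag=0xA, dirty=False, important=True), buffer),
    handle_expiring_block(Block(tag=0xB, dirty=True, important=True), buffer),
    handle_expiring_block(Block(tag=0xC, dirty=False, important=False), buffer),
]
print(actions)
print([hex(block.tag) for block in copy_back(buffer)])
```

The buffer capacity of one entry is chosen only to exercise the "buffer full" path in a three-block example.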

Slide 10: Results: Speedup Improvement

On average, 18% performance improvement for PARSEC multithreaded benchmarks.
On average, 10% improvement in instruction throughput for multi-programmed workloads.

[Figure: normalized speedup for PARSEC benchmarks (dedup, freq., rtvw., swpts., x264, frrt., fcsim., vips, fluid., AVG.) under the configurations S-1MB, S-4MB (Ideal), M-4MB, Volatile M-4MB(1sec), Volatile M-4MB(10ms), and Revived-M-4MB(10ms); instruction throughput and weighted speedup for SPEC benchmarks.]

Slide 11: Results: Energy Improvements

Nominal increase in dynamic energy (4%) over M-4MB because of the buffer scheme.
60% reduction in leakage energy over SRAM designs.

[Figure: normalized dynamic energy and normalized leakage energy for dedup, fcsim., freq., rtvw., and AVG., under the configurations S-1MB, M-4MB, Volatile M-4MB(1sec), Volatile M-4MB(10ms), and Revived M-4MB(10ms).]

Slide 12: Summary

STT-RAM is a promising technology with high density, low leakage, and read latency competitive with SRAM.
High write latency and write energy are impeding its widespread adoption.
Reducing retention time can directly reduce the write latency and write energy of STT-RAM.
A simple buffering scheme is presented to refresh important blocks that are about to expire (dying blocks).
