AMD Opteron Architecture Overview
The presentation covers the architecture of the AMD Opteron microprocessor, highlighting key features such as 64-bit support, an on-chip DDR memory controller, and multiple HyperTransport links. It explains microarchitecture details, including the execution units and a pipeline structure designed for high clock frequency and efficient instruction processing. The overview also addresses support for legacy 32-bit applications and the doubled register count, which improves performance across a wide range of applications.
Presentation Transcript
AMD OPTERON ARCHITECTURE
Omar Aragon, Abdel Salam Sayyad
(This presentation is missing the references used.)
Outline
- Features
- Block diagram
- Microarchitecture
- Pipeline
- Cache
- Memory controller
- HyperTransport
- InterCPU connections
Features
- 64-bit x86-based microprocessor
- On-chip double-data-rate (DDR) memory controller [low memory latency]
- Three HyperTransport links [connect to other devices without support chips]
- Out-of-order, superscalar processor
- Adds 64-bit addressing (48-bit virtual and 40-bit physical) and expands the number of registers
- Supports legacy 32-bit applications without modification or recompilation
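As a rough sanity check on the 48-bit virtual and 40-bit physical addressing mentioned above, the short C sketch below works out the resulting address-space sizes. The program is purely illustrative and not part of the original presentation.

    /* Address-space sizes implied by AMD64's 48-bit virtual and
     * 40-bit physical addressing (illustrative calculation only). */
    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint64_t virt_bytes = 1ULL << 48;   /* 48-bit virtual address space  */
        uint64_t phys_bytes = 1ULL << 40;   /* 40-bit physical address space */

        printf("virtual : %llu TiB\n", (unsigned long long)(virt_bytes >> 40));  /* 256 TiB */
        printf("physical: %llu TiB\n", (unsigned long long)(phys_bytes >> 40));  /* 1 TiB   */
        return 0;
    }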
Features (continued)
- Double the number of registers:
  - 16 integer general-purpose registers (GPRs)
  - 16 Streaming SIMD Extension (SSE) registers
- Satisfies the register allocation needs of more than 80% of the functions appearing in a typical program
- Connected to memory through an integrated memory controller
- High-performance I/O subsystem via the HyperTransport bus
Microarchitecture
- Works with fixed-length micro-ops, which are dispatched into two independent schedulers: one for integer, and one for floating point and multimedia (MMX, 3DNow!, SSE, and SSE2)
- Load and store micro-ops go to the load/store unit
- Can issue 11 micro-ops each cycle (3 + 3 + 3 + 2) to the following execution resources:
  - Three integer execution units
  - Three address-generation units
  - Three floating-point and multimedia units
  - Two load/store operations to the data cache
Pipeline
- Long enough for high frequency, and short enough for good IPC (instructions per cycle)
- Fully integrated from instruction fetch through DRAM access
- The execute pipeline is typically 12 stages for integer and 17 stages for floating point
- Data cache access occurs in stage 11; on an L1 cache miss, the pipeline accesses the L2 cache and, in parallel, sends the request to the system request queue
- The DRAM pipeline runs at the same frequency as the core
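The arithmetic below simply turns the quoted stage counts into pipeline fill times. The 2.0 GHz core clock is an assumed example value, not a figure from the presentation.

    /* Pipeline-depth arithmetic for the stage counts quoted above.
     * The 2.0 GHz clock is an assumed example frequency. */
    #include <stdio.h>

    int main(void) {
        const double clock_ghz  = 2.0;              /* assumed core clock     */
        const double cycle_ns   = 1.0 / clock_ghz;  /* one pipeline stage     */
        const int    int_stages = 12;               /* integer pipeline depth */
        const int    fp_stages  = 17;               /* floating-point depth   */

        printf("cycle time          : %.2f ns\n", cycle_ns);
        printf("integer pipe fill   : %.1f ns\n", int_stages * cycle_ns);
        printf("floating-point fill : %.1f ns\n", fp_stages * cycle_ns);
        return 0;
    }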
Cache
- Separate L1 instruction and data caches; each is 64 Kbytes, 2-way set associative, with 64-byte cache lines
- L2 cache (data and instructions): 1 Mbyte, 16-way set associative, using a pseudo-least-recently-used (pseudo-LRU) replacement policy
- Independent L1 and L2 translation look-aside buffers (TLBs):
  - The L1 TLB is fully associative and stores thirty-two 4-Kbyte page translations and eight 2-Mbyte/4-Mbyte page translations
  - The L2 TLB is four-way set associative with 512 4-Kbyte entries
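The geometry these parameters imply can be worked out directly. The sketch below assumes physically tagged caches over the 40-bit physical address and a 64-byte L2 line; both points are assumptions rather than statements from the slides.

    /* Cache geometry implied by the sizes above.
     * Assumes physical tags over a 40-bit address and 64-byte L2 lines. */
    #include <stdio.h>

    static void describe(const char *name, unsigned size, unsigned ways, unsigned line) {
        unsigned sets = size / (ways * line);
        unsigned offset_bits = 0, index_bits = 0;
        for (unsigned v = line; v > 1; v >>= 1) offset_bits++;   /* log2(line) */
        for (unsigned v = sets; v > 1; v >>= 1) index_bits++;    /* log2(sets) */
        unsigned tag_bits = 40 - index_bits - offset_bits;       /* 40-bit physical address */

        printf("%s: %4u sets, %u offset bits, %u index bits, %u tag bits\n",
               name, sets, offset_bits, index_bits, tag_bits);
    }

    int main(void) {
        describe("L1 (64 KB, 2-way) ", 64 * 1024, 2, 64);
        describe("L2 (1 MB, 16-way) ", 1024 * 1024, 16, 64);
        return 0;
    }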
Onboard Memory Controller
- 128-bit memory bus; latency reduced and bandwidth doubled
- In multiprocessor systems, each processor has its own memory interface and its own memory, so available memory scales with the number of processors
- DDR SDRAM only; up to 8 registered DDR DIMMs per processor
- Memory bandwidth of up to 5.3 Gbytes/s per processor
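The 5.3 Gbytes/s figure is consistent with the 128-bit interface running DDR memory at 333 MT/s. The calculation below assumes DDR-333 (PC2700) as the speed grade, which the slide does not state explicitly.

    /* Peak memory bandwidth of a 128-bit (16-byte) interface at an
     * assumed 333 MT/s (DDR-333 / PC2700). */
    #include <stdio.h>

    int main(void) {
        const double bus_bytes     = 128.0 / 8.0;  /* 128-bit memory bus     */
        const double transfer_rate = 333.0;        /* assumed DDR-333, MT/s  */
        const double bandwidth     = bus_bytes * transfer_rate / 1000.0;

        printf("peak memory bandwidth: %.1f GB/s per processor\n", bandwidth);  /* ~5.3 */
        return 0;
    }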
HyperTransport
- Bidirectional, serial/parallel, scalable, high-bandwidth, low-latency bus
- Packet based: 32-bit words regardless of physical link width
- Facilitates power management and low latencies
HyperTransport in the Opteron
- 16-bit CAD HyperTransport links (CAD = Command, Address, Data)
- Processor-to-processor and processor-to-chipset bandwidth of up to 6.4 GB/s (per HT port)
- 8-bit-wide HyperTransport for components such as standard I/O hubs
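The 6.4 GB/s per-port figure is consistent with a 16-bit link using double-data-rate signalling, counted over both directions. The 800 MHz link clock (1600 MT/s) used below is assumed from launch-era Opteron parts rather than stated on the slide.

    /* HyperTransport link bandwidth for a 16-bit CAD link at an assumed
     * 800 MHz clock with DDR signalling (1600 MT/s). */
    #include <stdio.h>

    int main(void) {
        const double link_bytes = 16.0 / 8.0;  /* 16-bit link width            */
        const double transfers  = 1600.0;      /* 800 MHz x 2 (DDR), in MT/s   */
        const double one_way    = link_bytes * transfers / 1000.0;

        printf("per direction  : %.1f GB/s\n", one_way);                    /* 3.2 GB/s */
        printf("both directions: %.1f GB/s per HT port\n", 2.0 * one_way);  /* 6.4 GB/s */
        return 0;
    }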
InterCPU Connections
- Multiple CPUs are connected through a proprietary extension running over additional HyperTransport interfaces
- This supports a cache-coherent, non-uniform memory access (ccNUMA) multi-CPU memory access protocol
- Non-uniform memory access (NUMA): each processor has its own local memory, and memory access time depends on the memory's location (local access is faster than non-local access)
- Cache coherence: maintains the integrity of data stored in the local caches of a shared resource
- Each CPU can access the main memory of another processor, transparently to the programmer
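To make the local-versus-remote distinction concrete in software, the sketch below uses Linux's libnuma to place one buffer on the local node and one on the highest-numbered node. libnuma is not mentioned in the presentation; it is used here only as one common way to control memory placement on a NUMA system.

    /* Minimal NUMA-aware allocation sketch using Linux's libnuma.
     * Build with: gcc numa_demo.c -lnuma */
    #include <stdio.h>
    #include <stdlib.h>
    #include <numa.h>

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA is not available on this system\n");
            return EXIT_FAILURE;
        }

        size_t size = 64UL * 1024 * 1024;      /* 64 MB test buffer          */
        int last = numa_max_node();            /* highest NUMA node number   */

        void *local  = numa_alloc_local(size);         /* memory on the calling node      */
        void *remote = numa_alloc_onnode(size, last);  /* memory on the highest-numbered node */

        if (local && remote)
            printf("allocated one buffer locally and one on node %d\n", last);

        /* On a multi-socket Opteron, accesses to 'remote' cross a
         * HyperTransport hop to the other processor's memory controller,
         * so they see higher latency than accesses to 'local'. */

        if (local)  numa_free(local, size);
        if (remote) numa_free(remote, size);
        return EXIT_SUCCESS;
    }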