
High-Performance FPGA Floating Point Processing for Variable Precision DSP Blocks
Unlock the power of FPGA-based floating point processing with 28nm silicon enhancements, efficient data paths, and versatile design capabilities. Achieve blazing fast matrix operations and customize matrix and vector sizes for optimal performance. Discover the roadmap to advanced floating point IP in this cutting-edge hardware solution.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Hardware Based Floating Point Processing All the ingredients for FPGA based floating point 28 nm Variable Precision DSP Block 28nm silicon enhancements for floating point IP: Fused Datapath: efficient floating point in Altera FPGAs Tools: DSP Builder Advanced Blockset Matlab/Simulink native support is floating point Cholesky decomposition / matrix inversion design Programmable in both matrix size and vector size Ex: 20x20 with vector size of 10, 240x240 with vector size of 60 Up to 1 Million matrices per second in a single core for time interleaved 20x20 matrices, multiple cores possible in single device Floating point roadmap Silicon / Tools / IP MIT Lincoln Laboratory 999999-1 XYZ 3/11/2005 1
Single Channel Results: Single Core Single Channel Cholesky Inversion Core Vector Size Utilization 60 90.1% 50 88.3% Matrix Size 240x240 200x200 Dot Product Throughput (matrices/sec) 3282 4619 Latency (us) 489 349 100x100 50 57.4% 17,700 97.8 75x75 50 40.7% 24,900 66.7 50x50 50 24.8% 37,400 41.3 100x100 25 70.3% 14,400 110.8 75x75 25 52.5% 22,800 70.2 50x50 25 31.2% 37,200 41.5 25x25 25 12.9% 73,800 19.4 Multiple single precision Cholesky cores may be implemented within a single FPGA MIT Lincoln Laboratory 999999-2 XYZ 3/11/2005 2
Multi-channel Results: Single Core Multi-Channel Cholesky Inversion Core Matrix Size Vector Size Number of channels 6 Dot Product Utilization Throughput (matrices/sec) Latency (us) 100x100 50 96.8% 29,900 380 50x50 50 20 98.3% 148,000 270 75x75 25 10 98.3% 42,700 392 50x50 25 20 98.7% 118,000 304 25x25 25 20 95.6% 526,000 71 20x20 20 50 97.8% 1,000,000 96 Multiple single precision Cholesky cores may be implemented within a single FPGA MIT Lincoln Laboratory 999999-3 XYZ 3/11/2005 3
Cholesky Throughput per FPGA device Cholesky Throughput per Stratix V FPGA Matrix Size Vector Size Device: Stratix V 5SGSD8 single die, 700 kLE, 4000 mults Number of cores per FPGA 18 Throughput per core (matrices/sec) 1M Throughput per FPGA device (matrices/sec) 18 Million 20x20 20 240x240 60 6 3,000 18 Thousand Multiple Cholesky cores implemented within a single FPGA MIT Lincoln Laboratory 999999-4 XYZ 3/11/2005 4