Unified Memory in CUDA: Key Concepts and Implementation Details

"Explore the concept of Unified Memory in CUDA, enabling seamless data access between CPU and GPU without explicit copying. Learn about page migration, system-wide operations, and GPU architecture advancements. Dive into examples and best practices for efficient memory management."

  • CUDA
  • Unified Memory
  • GPU Architecture
  • Page Migration
  • System-wide Operations


Presentation Transcript


  1. Unified CUDA Memory. Rui (Ray) Wu, raywu1990@nevada.unr.edu

  2. Outline: Profile; Unified Memory; Ideas about Unified Vector Dot Product; How to add vectors longer than the maximum thread number?; PA2

  3. Profile. What is nvprof? It is NVIDIA's command-line profiler. Usage: nvprof ./PA0 <argv>. nvprof does not require cudaEvent_t timing code and reports more detailed information.

  4. Unified Memory

  5. Unified Memory. Key idea: allocate and access data that can be used by code running on any processor in the system, CPU or GPU. There is no need for explicit copies in either direction (cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost). It works across multiple GPUs and multiple CPUs. Read more details: https://devblogs.nvidia.com/unified-memory-cuda-beginners/

  6. Unified Memory

  7. Unified Memory: Vector Addition. Example: https://devblogs.nvidia.com/unified-memory-cuda-beginners/ Call cudaDeviceSynchronize() before the CPU accesses the data, because kernel launches are asynchronous!
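A minimal sketch of the vector addition pattern this slide refers to, written in the spirit of the linked post rather than copied from it. The helper name run_vector_add, the block size of 256, and the fill values are assumptions made for illustration, and error checking of the CUDA calls is omitted for brevity.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one pair of elements; the guard covers n not divisible by the block size.
__global__ void add(int n, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = x[i] + y[i];
}

// Hypothetical helper (not from the slides): runs the addition and returns
// the largest deviation from the expected value 3.0f.
float run_vector_add(int n) {
    float *x = nullptr, *y = nullptr;
    // Managed allocations are visible to both CPU and GPU; no cudaMemcpy is needed.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));

    for (int i = 0; i < n; ++i) {  // initialize on the CPU
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    int blockSize = 256;  // assumed block size
    int numBlocks = (n + blockSize - 1) / blockSize;
    add<<<numBlocks, blockSize>>>(n, x, y);

    // Kernel launches are asynchronous: wait before the CPU reads y.
    cudaDeviceSynchronize();

    float maxError = 0.0f;
    for (int i = 0; i < n; ++i) maxError = fmaxf(maxError, fabsf(y[i] - 3.0f));

    cudaFree(x);
    cudaFree(y);
    return maxError;
}

int main() {
    printf("Max error: %f\n", run_vector_add(1 << 20));
    return 0;
}
```

Compile with nvcc (for example, nvcc add.cu -o add). Removing the cudaDeviceSynchronize() call would let the CPU loop read y while the kernel may still be running, which is the hazard the slide warns about.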

  8. Unified Memory. How does it work? Data is stored in pages: Unified Memory is able to automatically migrate data at the level of individual pages between host and device memory, moving pages between CPU memory and GPU memory. In code, explicit cudaMemcpy transfers are replaced by allocating with cudaMallocManaged. Paging is similar to a cache: it performs better if the loaded data is used multiple times. Read: three methods to avoid page faults.
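One way to reduce page migration, sketched here under assumptions, is to touch the data on the GPU first by initializing it in a kernel, so the pages do not start out in CPU memory. The kernel name init, the helper run_gpu_init_add, and the launch configuration are illustrative, and whether this is faster depends on the GPU architecture and driver.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Initialize both vectors on the GPU so the first touch of each page happens on the device.
__global__ void init(int n, float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }
}

__global__ void add(int n, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = x[i] + y[i];
}

// Hypothetical helper: returns the largest deviation from the expected value 3.0f.
float run_gpu_init_add(int n) {
    float *x = nullptr, *y = nullptr;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));

    int blockSize = 256;  // assumed block size
    int numBlocks = (n + blockSize - 1) / blockSize;
    init<<<numBlocks, blockSize>>>(n, x, y);  // data is first touched on the GPU
    add<<<numBlocks, blockSize>>>(n, x, y);   // same stream, so it runs after init
    cudaDeviceSynchronize();                  // wait before the CPU reads y

    float maxError = 0.0f;
    for (int i = 0; i < n; ++i) maxError = fmaxf(maxError, fabsf(y[i] - 3.0f));

    cudaFree(x);
    cudaFree(y);
    return maxError;
}

int main() {
    printf("Max error: %f\n", run_gpu_init_add(1 << 20));
    return 0;
}
```

Profiling this variant and a CPU-initialized variant with nvprof is a reasonable way to check whether the add kernel's time actually changes on a given machine.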

  9. Unified Memory. When the GPU accesses any absent pages, it stalls execution of the accessing threads, and the Page Migration Engine migrates the pages to the device before resuming the threads. Pre-Pascal GPUs lack hardware page faulting, so coherence can't be guaranteed. On those GPUs, an access from the CPU while a kernel is running will cause a segmentation fault! Pascal and Volta GPUs support system-wide atomic memory operations. That means you can atomically operate on values anywhere in the system from multiple GPUs. What are Pascal and Volta? See https://en.wikipedia.org/wiki/CUDA

  10. Unified Memory. 49-bit virtual addressing and on-demand page migration. 49-bit virtual addresses are sufficient to enable GPUs to access the entire system memory plus the memory of all GPUs in the system. How many GB does 49 bits correspond to? To be discussed in the next class. More reading materials: https://devblogs.nvidia.com/unified-memory-in-cuda-6/
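As a worked answer to the question on this slide, assuming byte-addressable memory (one address per byte), a 49-bit address space covers:

```latex
\[
2^{49}\ \text{bytes}
  = 2^{19} \times 2^{30}\ \text{bytes}
  = 524{,}288\ \text{GiB}
  = 512\ \text{TiB}
  \approx 5.63 \times 10^{14}\ \text{bytes}
  \approx 562{,}950\ \text{GB (decimal)}.
\]
```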

  11. Ideas about Unified Vector Dot Product. Step 1: compute the product of each pair of elements within one block (this serves PA2). Step 2: call __syncthreads() to synchronize the threads in this block. Step 3: sum reduction.

  12. Ideas about Unified Vector Dot Product: Sum Reduction

  13. Ideas about Unified Vector Dot Product: Sum Reduction. Call __syncthreads() to synchronize the threads in this block. Page 80 of the book introduces how to do this using shared memory. Shared memory: old version
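The three steps of the dot product (pairwise products, __syncthreads(), sum reduction) can be sketched as follows for a single block with unified memory. This is one possible layout, not the course's reference solution: the names dot_one_block and run_dot, the block size of 256, and the fill values are assumptions, and the tree reduction shown requires the block size to be a power of two.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int THREADS = 256;  // assumed block size; must be a power of two for this reduction

__global__ void dot_one_block(const float *a, const float *b, float *result, int n) {
    __shared__ float cache[THREADS];
    int tid = threadIdx.x;

    // Step 1: each thread accumulates the products of its pairs
    // (strided by blockDim.x so n may exceed the block size).
    float sum = 0.0f;
    for (int i = tid; i < n; i += blockDim.x) sum += a[i] * b[i];
    cache[tid] = sum;

    // Step 2: make every partial sum visible to the whole block.
    __syncthreads();

    // Step 3: tree reduction in shared memory, halving the active threads each round.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) cache[tid] += cache[tid + s];
        __syncthreads();
    }
    if (tid == 0) *result = cache[0];
}

// Hypothetical helper: dot product of a vector of 1.0f with a vector of 2.0f.
float run_dot(int n) {
    float *a = nullptr, *b = nullptr, *result = nullptr;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&result, sizeof(float));
    for (int i = 0; i < n; ++i) {
        a[i] = 1.0f;
        b[i] = 2.0f;
    }

    dot_one_block<<<1, THREADS>>>(a, b, result, n);
    cudaDeviceSynchronize();  // wait before the CPU reads *result

    float value = *result;
    cudaFree(a);
    cudaFree(b);
    cudaFree(result);
    return value;
}

int main() {
    printf("dot = %f\n", run_dot(1024));
    return 0;
}
```

The __syncthreads() inside the loop sits outside the if so that every thread in the block reaches it, including the threads that are idle in that round.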

  14. How to add vectors longer than the maximum thread number? (Figure: shows how the indices relate to each other; drawn on the board.)

  15. How to add vectors longer than the maximum thread number?
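A common answer to this question is a grid-stride loop, in which each thread handles several elements spaced one full grid apart. The sketch below assumes that approach (the slide's figure is not available in this transcript); the names add_strided and run_strided_add and the deliberately small launch of 32 blocks of 256 threads are illustrative.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Each thread starts at its global index and jumps ahead by the total
// number of launched threads until the whole vector is covered.
__global__ void add_strided(int n, const float *x, float *y) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;  // total threads in the grid
    for (int i = index; i < n; i += stride) y[i] = x[i] + y[i];
}

// Hypothetical helper: returns the largest deviation from the expected value 3.0f.
float run_strided_add(int n) {
    float *x = nullptr, *y = nullptr;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    // Only 32 * 256 = 8192 threads, far fewer than n elements.
    add_strided<<<32, 256>>>(n, x, y);
    cudaDeviceSynchronize();  // wait before the CPU reads y

    float maxError = 0.0f;
    for (int i = 0; i < n; ++i) maxError = fmaxf(maxError, fabsf(y[i] - 3.0f));

    cudaFree(x);
    cudaFree(y);
    return maxError;
}

int main() {
    printf("Max error: %f\n", run_strided_add(1 << 20));
    return 0;
}
```

With n = 1 << 20 and 8192 threads, each thread processes 128 elements, so the kernel stays correct for any launch size.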

  16. PA2: Matrix Multiplication. Now we know how to compute a vector dot product with one block. How about matrix multiplication? (Graph drawn on the board.)
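As a starting point for thinking about the question, here is a naive sketch in which each thread computes one output element as a row-by-column dot product. It is not the PA2 reference solution: the square row-major layout, the 16 x 16 block shape, and the names matmul_naive and run_matmul are all assumptions made for illustration.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// C = A * B for square n x n matrices stored in row-major order.
// Each thread computes one element of C.
__global__ void matmul_naive(const float *A, const float *B, float *C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k) sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }
}

// Hypothetical helper: multiplies a matrix of 1.0f by a matrix of 2.0f and
// returns the largest deviation from the expected value 2 * n.
float run_matmul(int n) {
    size_t bytes = static_cast<size_t>(n) * n * sizeof(float);
    float *A = nullptr, *B = nullptr, *C = nullptr;
    cudaMallocManaged(&A, bytes);
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (int i = 0; i < n * n; ++i) {
        A[i] = 1.0f;
        B[i] = 2.0f;
    }

    dim3 block(16, 16);  // assumed block shape
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    matmul_naive<<<grid, block>>>(A, B, C, n);
    cudaDeviceSynchronize();  // wait before the CPU reads C

    float maxError = 0.0f;
    for (int i = 0; i < n * n; ++i) maxError = fmaxf(maxError, fabsf(C[i] - 2.0f * n));

    cudaFree(A);
    cudaFree(B);
    cudaFree(C);
    return maxError;
}

int main() {
    printf("Max error: %f\n", run_matmul(64));
    return 0;
}
```

This version uses one thread per output element across many blocks; restricting the work to a single block, or combining it with the shared-memory reduction from the dot product slides, is a different design that the assignment may be asking for.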

  17. Thank you! Questions?
