
Accelerated Image Processing with GPU Technology
"Efficiently process large pathology images using GPU acceleration. Utilize CUDA functions for memory-efficient batch-wise processing and integration with Python for orchestration. Explore the pipeline overview and GPU acceleration techniques in detail. Get insights into the dataset statistics and the development of key C functions for GPU processing."
Presentation Transcript
Final Project: GPU-Accelerated Processing of Large-Scale SVS Images
Zahra Sadeghi Adl, Fall 2024
Introduction
Objective of the project: efficiently process large pathology images by utilizing GPU acceleration.
Image Reading: images in SVS format are loaded and divided into smaller 256x256 windows with a 128x128 frame size.
GPU Processing: windows are sent in batches to the GPU for computation, which includes:
- Gaussian Filtering to smooth the images.
- 2D FFT (Fast Fourier Transform) to analyze frequency components.
- Histogram Calculation for statistical analysis.
Processed data is transferred back to the CPU for aggregation and final reporting.
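To make the windowing step concrete, here is a minimal C sketch of how the top-left coordinates of fixed-size windows could be enumerated for an image of known dimensions. The function name, the Coord struct, and the stride parameter are illustrative assumptions, not the project's actual code.

    /* Sketch (assumption): enumerate the top-left (x, y) coordinates of
     * fixed-size windows covering an image of width x height pixels.
     * The window size and stride are parameters; the slides use 256x256
     * windows. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { int x; int y; } Coord;

    /* Returns a malloc'd array of top-left coordinates and stores the count. */
    Coord *generate_window_coords(int width, int height,
                                  int win, int stride, int *count)
    {
        int nx = (width  >= win) ? (width  - win) / stride + 1 : 0;
        int ny = (height >= win) ? (height - win) / stride + 1 : 0;
        Coord *coords = (Coord *)malloc((size_t)nx * ny * sizeof(Coord));
        int k = 0;
        for (int y = 0; y + win <= height; y += stride)
            for (int x = 0; x + win <= width; x += stride) {
                coords[k].x = x;
                coords[k].y = y;
                k++;
            }
        *count = k;
        return coords;
    }

    int main(void)
    {
        int n = 0;
        /* Example: a small 1024x768 region, 256x256 windows, stride 256. */
        Coord *c = generate_window_coords(1024, 768, 256, 256, &n);
        printf("%d windows, first at (%d, %d)\n", n, c[0].x, c[0].y);
        free(c);
        return 0;
    }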
Dataset Overview
Dataset stats:
- Total images: 3,505
- Average image size: 2 billion pixels
- Average number of 256x256 windows per image: 131,000
- Largest number of windows: 320,000
- Smallest number of windows: 3,000
Pipeline Overview
Preparing Windows: loading the image (nedc_image_tools), extracting the dimensions, and generating the top-left coordinates of the 256x256 windows.
Reading Batches of Windows: windows are read in batches (default: 2,000), using Python's yield to produce one batch at a time.
GPU Processing: the batch is sent to the GPU using C (cudaMalloc and cudaMemcpy), the calculations are performed (Gaussian filtering, FFT computation, histogram), the processed data is retrieved back to the CPU using cudaMemcpy, and the GPU memory is freed.
Result Aggregation: computing the average and standard deviation for the image and printing the results.
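The GPU processing step can be summarized in a host-side CUDA sketch. The function and kernel names below are assumptions, but the sequence (cudaMalloc, cudaMemcpy to the device, kernel launch, cudaMemcpy back, cudaFree) follows the pipeline as described.

    // Sketch of the host-side batch flow: allocate, copy in, compute, copy
    // out, free. Not the project's actual API.
    #include <cuda_runtime.h>

    #define WIN 256                      // window side length in pixels
    #define WIN_PIXELS (WIN * WIN)

    // Hypothetical kernel defined elsewhere; stands in for the Gaussian /
    // FFT-magnitude / histogram stages run on every window in the batch.
    __global__ void process_windows(const float *in, float *out, int n_windows);

    int process_batch_on_gpu(const float *h_batch, float *h_result, int n_windows)
    {
        size_t bytes = (size_t)n_windows * WIN_PIXELS * sizeof(float);
        float *d_in = NULL, *d_out = NULL;

        // 1. Allocate device memory for the whole batch.
        if (cudaMalloc((void **)&d_in, bytes) != cudaSuccess) return -1;
        if (cudaMalloc((void **)&d_out, bytes) != cudaSuccess) {
            cudaFree(d_in);
            return -1;
        }

        // 2. Copy the batch of windows host -> device.
        cudaMemcpy(d_in, h_batch, bytes, cudaMemcpyHostToDevice);

        // 3. Run the per-window computations on the GPU: 16x16 threads per
        //    block and a 16x16 grid of blocks covers one 256x256 window;
        //    blockIdx.z selects the window within the batch.
        dim3 threads(16, 16);
        dim3 blocks(16, 16, n_windows);
        process_windows<<<blocks, threads>>>(d_in, d_out, n_windows);
        cudaDeviceSynchronize();

        // 4. Copy processed data device -> host for aggregation on the CPU.
        cudaMemcpy(h_result, d_out, bytes, cudaMemcpyDeviceToHost);

        // 5. Free GPU memory before the next batch.
        cudaFree(d_in);
        cudaFree(d_out);
        return 0;
    }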
GPU Acceleration Using C
C functions developed:
- Memory allocation (cudaMalloc).
- Gaussian filter kernel execution.
- FFT computation using cuFFT.
- Magnitude and histogram calculations.
- Memory deallocation (cudaFree).
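The device-side functions listed above could look roughly like the following sketch. The 3x3 Gaussian weights, the 256-bin histogram, and the real-to-complex cuFFT plan are illustrative choices, not the project's exact implementations.

    #include <cuda_runtime.h>
    #include <cufft.h>

    #define WIN 256
    #define BINS 256

    // Gaussian smoothing of one 256x256 window with a fixed 3x3 kernel.
    __global__ void gaussian3x3(const float *in, float *out)
    {
        const float k[3][3] = { {1.f, 2.f, 1.f},
                                {2.f, 4.f, 2.f},
                                {1.f, 2.f, 1.f} };   // weights sum to 16
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= WIN || y >= WIN) return;

        float acc = 0.f;
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++) {
                int sx = min(max(x + dx, 0), WIN - 1);   // clamp at the border
                int sy = min(max(y + dy, 0), WIN - 1);
                acc += k[dy + 1][dx + 1] * in[sy * WIN + sx];
            }
        out[y * WIN + x] = acc / 16.f;
    }

    // 256-bin histogram of pixel values assumed to lie in [0, 256).
    __global__ void histogram256(const float *img, unsigned int *hist)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx >= WIN * WIN) return;
        int bin = min((int)img[idx], BINS - 1);
        atomicAdd(&hist[bin], 1u);                   // threads update bins safely
    }

    // 2D FFT of one window with cuFFT (real-to-complex plan).
    void fft2d_window(cufftReal *d_img, cufftComplex *d_spectrum)
    {
        cufftHandle plan;
        cufftPlan2d(&plan, WIN, WIN, CUFFT_R2C);
        cufftExecR2C(plan, d_img, d_spectrum);
        cufftDestroy(plan);
    }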
Python Integration
Python's role:
- Orchestration of the CUDA functions.
- Batch-wise processing using yield for memory efficiency.
Integration: Python calls the C functions via ctypes.
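Only the C side of the integration is sketched below: an extern "C" entry point compiled into a shared library that Python could load with ctypes and call once per batch. The symbol name, signature, and build command are assumptions, and the body reuses the hypothetical process_batch_on_gpu sketch shown earlier.

    // Sketch of the C-side entry point that Python could call through ctypes.
    // A plain extern "C" function (no name mangling) compiled into a shared
    // library, e.g.:
    //   nvcc -Xcompiler -fPIC -shared gpu_pipeline.cu -o libgpu_pipeline.so
    int process_batch_on_gpu(const float *h_batch, float *h_result, int n_windows);

    extern "C" int process_batch(const float *windows,   // flattened batch of 256x256 windows
                                 int n_windows,          // number of windows in the batch
                                 float *out_data)        // processed output written back for Python
    {
        // Internally this runs the cudaMalloc / kernel / cudaMemcpy sequence
        // sketched earlier and returns 0 on success.
        return process_batch_on_gpu(windows, out_data, n_windows);
    }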
Choosing the Thread Block Configuration
- Tested different configurations of threads and blocks.
- Evaluated using 5 images (average processing time per image).
- Selected 16 threads per block and 16 blocks per dimension.
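A small CUDA-event timing harness like the one below is one way such configurations could be compared. The candidate sizes and the stand-in kernel are assumptions, not the actual benchmark used for the 5-image evaluation; note that 16x16 threads per block with a 16x16 grid of blocks gives exactly one thread per pixel of a 256x256 window.

    #include <cuda_runtime.h>
    #include <stdio.h>

    #define WIN 256

    __global__ void dummy_window_op(float *img)    // stand-in for the real per-pixel work
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < WIN && y < WIN) img[y * WIN + x] *= 2.0f;
    }

    int main(void)
    {
        float *d_img;
        cudaMalloc((void **)&d_img, WIN * WIN * sizeof(float));

        int candidate_threads[] = { 8, 16, 32 };   // threads per block dimension
        for (int i = 0; i < 3; i++) {
            int t = candidate_threads[i];
            dim3 threads(t, t);
            dim3 blocks((WIN + t - 1) / t, (WIN + t - 1) / t);

            cudaEvent_t start, stop;
            cudaEventCreate(&start);
            cudaEventCreate(&stop);
            cudaEventRecord(start);
            dummy_window_op<<<blocks, threads>>>(d_img);
            cudaEventRecord(stop);
            cudaEventSynchronize(stop);

            float ms = 0.0f;
            cudaEventElapsedTime(&ms, start, stop);
            printf("%2dx%-2d threads, %2dx%-2d blocks: %.4f ms\n",
                   t, t, blocks.x, blocks.y, ms);
            cudaEventDestroy(start);
            cudaEventDestroy(stop);
        }
        cudaFree(d_img);
        return 0;
    }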
Results
- Average processing time per image: 93.81 seconds (measured over 100 images), with an average of 146,000 windows per image.
- Average time per window: 0.0006 seconds.
- Approximate time for the whole dataset: total number of windows across all images x 0.0006 seconds per window ≈ 292,000 seconds, or roughly 80 hours.
Challenges and Solutions
- Memory Management: initial segmentation faults were resolved with careful memory allocation and deallocation.
- Multi-GPU Utilization: an implementation was added but faced system visibility issues.
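Two patterns relevant to these challenges are sketched below as illustration rather than the project's actual fix: checking every CUDA call so allocation failures surface immediately instead of as later segmentation faults, and querying how many GPUs are visible before assigning batches to devices. The macro, the function, and the round-robin policy are assumptions.

    #include <cuda_runtime.h>
    #include <stdio.h>

    // Check a CUDA call and report the error with its location.
    #define CUDA_CHECK(call)                                              \
        do {                                                              \
            cudaError_t err = (call);                                     \
            if (err != cudaSuccess) {                                     \
                fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                        cudaGetErrorString(err), __FILE__, __LINE__);     \
                return -1;                                                \
            }                                                             \
        } while (0)

    int assign_batch_to_gpu(int batch_index)
    {
        int n_devices = 0;
        CUDA_CHECK(cudaGetDeviceCount(&n_devices));   // how many GPUs are visible
        if (n_devices == 0) return -1;

        // Round-robin: batch i goes to device i % n_devices (illustrative policy).
        CUDA_CHECK(cudaSetDevice(batch_index % n_devices));
        return 0;
    }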
Conclusion
- Developed an efficient GPU-accelerated pipeline.
- Processed large-scale histological images with stable performance.
- Achieved an average processing time of 93.81 seconds per image (over 100 images).
Thank you!