Efficient GPU-Accelerated Gaussian Filter Implementation for Image Processing


Explore the optimization of a GPU-accelerated Gaussian filter implementation for image processing, focusing on library usage, preprocessing strategies, GPU kernel design, timing results, and proposed improvements for enhanced performance.

  • GPU Computing
  • Image Processing
  • Optimization
  • Gaussian Filter
  • Numba


Presentation Transcript


  1. ECE 4822: Final Project (Luke Dewees)

  2. Part 0: Libraries Used
     • Numba: CUDA kernels + grid striding
       • For all kernels, I used the @cuda.jit decorator
       • Considered @guvectorize for Gaussian Blur
     • CuPy: GPU-optimized NumPy features
       • Used CuPy arrays to store image data on the GPU
     • Others tested: Dask -> 2D FFT function
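Grid striding can be hard to picture from a bullet point, so here is a minimal CPU-only sketch of the access pattern in plain Python. It is an emulation, not the project's kernel: `grid_stride_indices` and `emulate_kernel` are hypothetical names, and the "double every element" workload is a placeholder. In a Numba `@cuda.jit` kernel the two loop quantities would come from `cuda.grid(1)` and `cuda.gridsize(1)`.

```python
def grid_stride_indices(thread_id, total_threads, n):
    """Indices a single thread visits in a 1D grid-stride loop over n items."""
    return list(range(thread_id, n, total_threads))


def emulate_kernel(data, blocks, threads_per_block):
    """Double every element, visiting it the way a grid-stride kernel would."""
    out = [None] * len(data)
    total_threads = blocks * threads_per_block  # cuda.gridsize(1) in Numba
    for thread_id in range(total_threads):      # cuda.grid(1) in Numba
        for i in grid_stride_indices(thread_id, total_threads, len(data)):
            out[i] = 2 * data[i]
    return out


# 4 emulated threads cover 10 elements; each element is written exactly once.
result = emulate_kernel(list(range(10)), blocks=2, threads_per_block=2)
```

The point of the pattern is that the launch configuration no longer has to match the data size: a fixed grid can process an array of any length.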

  3. Part 1: Gaussian Filter - Preprocessing
     • Original plan: process multiple images with one kernel call
     • Why won't this work?
       • Images in the database are very large, even when compressed (~0.38 GB each)
       • Expanded into a NumPy array, I found that these can exceed 17 GB
     • Modified plan: process as many windows as possible with one GPU kernel
     • Before any preprocessing, I first sent the Gaussian filter kernel to the GPU
     • After this, for each image:
       1. Extract 5000 windows and convert to grayscale with NumPy
       2. Convert windows to CuPy arrays (load onto GPU)
       3. Launch the filter kernel
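Step 1 of the per-image loop can be sketched with NumPy alone. This is an illustration under stated assumptions, not the project's code: `extract_gray_windows` is a hypothetical helper, the windows are taken as non-overlapping tiles, the window size is arbitrary, and the grayscale weights are the common BT.601 luma coefficients (the slides do not say which formula was used).

```python
import numpy as np

# Assumed BT.601 luma weights; the slides do not specify the grayscale formula.
LUMA = np.array([0.299, 0.587, 0.114], dtype=np.float32)


def extract_gray_windows(image, window, max_windows):
    """Cut non-overlapping window x window tiles from an H x W x 3 image.

    Returns a float32 array of shape (n, window, window) with n <= max_windows.
    """
    h, w, _ = image.shape
    coords = [(top, left)
              for top in range(0, h - window + 1, window)
              for left in range(0, w - window + 1, window)][:max_windows]
    # (window, window, 3) @ (3,) collapses the color axis to one gray value.
    tiles = [image[t:t + window, l:l + window].astype(np.float32) @ LUMA
             for t, l in coords]
    return np.stack(tiles)


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 96, 3), dtype=np.uint8)  # toy image
batch = extract_gray_windows(image, window=32, max_windows=5000)
```

The resulting `(n, window, window)` batch is the kind of array that step 2 would hand to `cupy.asarray` to place it on the GPU.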

  4. Part 1: Gaussian Filter - GPU Kernel
     • Stored Gaussian filter coefficients and the current batch of windows in shared memory
     • Grid striding: X and Y track the current location in the window, Z tracks the current window being processed
     • Used padding for the edges of each window
     • Each thread computes one value of the blurred window
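As a reference for what the kernel computes, here is a NumPy sketch of the same arithmetic on the CPU. Each output element `out[z, y, x]` corresponds to the value one GPU thread would produce. The function names, kernel size, sigma, and the `"edge"` padding mode are all assumptions; the slides only state that padding was used.

```python
import numpy as np


def gaussian_kernel(size, sigma):
    """Normalized size x size Gaussian coefficients (size must be odd)."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def blur_windows(windows, kernel):
    """Blur a (n, h, w) batch; the output has the same shape as the input."""
    r = kernel.shape[0] // 2
    # Pad only the two spatial axes, never the window (Z) axis.
    padded = np.pad(windows, ((0, 0), (r, r), (r, r)), mode="edge")
    n, h, w = windows.shape
    out = np.zeros((n, h, w), dtype=np.float64)
    # Accumulate one shifted, weighted copy of the batch per coefficient.
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[:, dy:dy + h, dx:dx + w]
    return out


kernel = gaussian_kernel(size=5, sigma=1.0)
flat = np.full((2, 8, 8), 3.0)
blurred = blur_windows(flat, kernel)  # a constant window stays constant
```

A CPU reference like this is also handy for checking a GPU kernel's output on a small batch before timing it.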

  5. Part 1: Gaussian Filter - Timing Results
     • Timing results weren't great
     • A very large amount of time was spent on loading the image, preprocessing, and transferring data to the GPU
       • Time for a single image varied by minutes
     • Significantly less time was spent on calculations (about 0.00037x as much)
     • Runtimes varied from ~5 minutes to 9 minutes for a single image
     • [Figure: example Linux time output for 1 image]
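The Linux `time` command reports only totals, so a per-stage breakdown needs timers inside the program. Below is a minimal sketch using the standard library; the `stage` helper and the stage names are hypothetical, and the workloads are stand-ins rather than the project's pipeline.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def stage(name):
    """Accumulate wall-clock seconds spent inside the with-block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


# Stand-in workloads; real code would load an image, copy it to the GPU,
# and launch the filter kernel inside these blocks.
with stage("load+preprocess"):
    data = [i * 0.5 for i in range(100_000)]
with stage("compute"):
    total = sum(data)

# Fraction of the measured time spent in each stage.
share = {name: t / sum(timings.values()) for name, t in timings.items()}
```

One caveat when timing GPU work this way: kernel launches are asynchronous, so the timed block should end with a synchronization call (for example `cuda.synchronize()` in Numba), otherwise the timer may stop before the kernel finishes.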

  6. Part 1: What I Would Change
     1. Optimize preprocessing and GPU loading
        • Streaming data?
        • Bigger chunks of data
     2. Remove redundant filter calculations
     3. Flatten image chunks before applying the filter
        • ~7x speedup without GPU
        • Would this multiplier increase or decrease with GPU usage?
     4. Implement Numba's @guvectorize
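One standard way to cut per-pixel work in a Gaussian filter is to exploit separability: the 2D kernel is the outer product of two 1D kernels, so a row pass followed by a column pass gives the same result with fewer multiply-adds. This may or may not be what item 2 refers to; the sketch below is offered only as an illustration, with hypothetical function names and an assumed `"edge"` padding mode.

```python
import numpy as np


def gaussian_1d(size, sigma):
    """Normalized 1D Gaussian coefficients (size must be odd)."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()


def blur_2d(img, g):
    """Direct 2D filter: size * size multiply-adds per pixel."""
    r = len(g) // 2
    k = np.outer(g, g)
    p = np.pad(img, r, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(len(g)):
        for dx in range(len(g)):
            out += k[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out


def blur_separable(img, g):
    """Row pass then column pass: 2 * size multiply-adds per pixel."""
    r = len(g) // 2
    h, w = img.shape
    p = np.pad(img, r, mode="edge")
    rows = np.zeros((h + 2 * r, w))
    for dx in range(len(g)):          # horizontal pass over the padded rows
        rows += g[dx] * p[:, dx:dx + w]
    out = np.zeros((h, w))
    for dy in range(len(g)):          # vertical pass over the row results
        out += g[dy] * rows[dy:dy + h, :]
    return out


rng = np.random.default_rng(0)
img = rng.random((16, 20))
g = gaussian_1d(size=5, sigma=1.0)
same = np.allclose(blur_2d(img, g), blur_separable(img, g))
```

For a 5x5 kernel this drops the work from 25 to 10 multiply-adds per pixel, and the gap widens as the kernel grows.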

  7. Part 2: 2D FFT
     • Using CuPy to store image data on the GPU made this really simple
       • CuPy has a built-in 2D FFT function
       • This was very fast
     • Dask's 2D FFT performed slightly better computation-wise in my tests
       • However, this requires making a Dask array version of the blurred image data
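The batched 2D FFT step can be sketched with NumPy, whose `numpy.fft.fft2` interface `cupy.fft.fft2` mirrors. The array below is random stand-in data, not the project's blurred windows, and the batch shape is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
blurred = rng.random((4, 32, 32))  # stand-in for a batch of blurred windows

# axes=(-2, -1) transforms each window independently; the batch axis is untouched.
spectrum = np.fft.fft2(blurred, axes=(-2, -1))
magnitude = np.abs(spectrum)

# Sanity check: the zero-frequency term of each window equals the sum of its pixels.
dc_matches = np.allclose(spectrum[:, 0, 0].real, blurred.sum(axis=(1, 2)))
```

When the data already lives in a CuPy array, the same call on `cupy.fft` runs on the GPU without a round trip through host memory, which is presumably why this step was simple.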

  8. Part 3: Histogram Computation
     • Utilized grid striding
     • FFT data in shared memory
     • Each thread processes a chunk of the data and creates its own histogram
     • Threads are synced and each chunk's histograms are combined
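The "private histogram per chunk, then combine" scheme can be emulated on the CPU to show why it is correct: as long as every chunk uses the same bin edges, the partial counts add up to the global histogram. This is a sketch with a hypothetical `chunked_histogram` helper and random stand-in values; the bin count and value range are assumptions.

```python
import numpy as np


def chunked_histogram(values, bins, value_range, n_chunks):
    """Histogram each chunk separately, then sum the partial counts."""
    total = np.zeros(bins, dtype=np.int64)
    for chunk in np.array_split(values.ravel(), n_chunks):
        # Fixed bins and range give every chunk identical bin edges.
        partial, _ = np.histogram(chunk, bins=bins, range=value_range)
        total += partial
    return total


rng = np.random.default_rng(0)
data = rng.random(10_000)  # stand-in for FFT magnitudes scaled to [0, 1)
combined = chunked_histogram(data, bins=16, value_range=(0.0, 1.0), n_chunks=8)
reference, _ = np.histogram(data, bins=16, range=(0.0, 1.0))
```

On a GPU, the combining step is commonly done with atomic additions into a shared result (Numba exposes `cuda.atomic.add`), since many threads update the same bins concurrently.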

  9. Timing Results
     • Code with the histogram is still running
     • For one image, timing results from test runs were not great
     • Computation speeds are decent (<0.7 seconds for Gaussian Blur, FFT was very fast)
     • Fastest time I have received so far is 542.784 seconds for Gaussian Blur and the 2D FFT

  10. Thanks
     • Credits: This presentation template was created by Slidesgo, and includes icons by Flaticon, infographics & images by Freepik, and contents by Swetha Tandri
