
Intruders & Intrusion Detection Systems
In the realm of cybersecurity, intruders come in various forms - from unauthorized individuals breaking into systems to legitimate users misusing their privileges. Intrusion detection systems play a crucial role in identifying and thwarting these threats. Hackers, both criminal and otherwise, pose a significant risk to organizations, highlighting the importance of cybersecurity measures such as CERTs and software patching to combat vulnerabilities.
Presentation Transcript

Slide 1: 4F Optical Neural Network Acceleration: An Architecture Perspective
Puneet Gupta (PUNEETG@UCLA.EDU), Shurui Li
Department of Electrical and Computer Engineering, University of California, Los Angeles
In collaboration with GWU (Sorger Volker, M. Miscuglio, Z. Hu, J. George, R. Capanna, P. Barde, Maria Solyanik-Gorgone)

Slide 2: Why 4F Optics for Neural Networks?
Convolution is expensive for digital systems, and most CNNs are compute bound.
Convolution is essentially free in Fourier optics, which makes it a good fit for accelerating CNNs: the Fourier transform happens at time of flight.
Convolution in the space domain equals pointwise multiplication in the Fourier domain.
A 4F system requires:
  an input modulation source, which can be an SLM or a DMD;
  a set of lenses to perform the Fourier transforms;
  a phase mask or an SLM/DMD to perform the pointwise product;
  a photodetector or camera to receive the output.
[Figure: 4F system layout. Input modulation (X), lens, F(X), phase mask or SLM/DMD carrying F(W), F(X) F(W), lens, X*W, photodetector (camera) recording |X*W|^2.]
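
The convolution theorem behind this slide can be checked numerically. The following is a minimal sketch, not the authors' simulation model: it assumes an idealized, noise-free system, models both lenses with NumPy FFTs (using an inverse FFT for the second lens, which ignores the image flip a physical lens introduces), and treats the camera as a plain squaring of the field. The function names are hypothetical.

```python
import numpy as np

def circular_conv2d(x, w):
    """Reference circular 2D convolution, built from shifted copies of x."""
    out = np.zeros_like(x, dtype=float)
    for a in range(w.shape[0]):
        for b in range(w.shape[1]):
            out += w[a, b] * np.roll(x, shift=(a, b), axis=(0, 1))
    return out

def simulate_4f(x, w):
    """Idealized 4F pass: FFT, pointwise product, inverse FFT, intensity."""
    w_padded = np.zeros(x.shape)
    w_padded[:w.shape[0], :w.shape[1]] = w             # kernel on the input grid
    fourier_plane = np.fft.fft2(x) * np.fft.fft2(w_padded)  # F(X) * F(W)
    field = np.fft.ifft2(fourier_plane)                # X * W (convolution)
    return np.abs(field) ** 2                          # camera records |X*W|^2

rng = np.random.default_rng(0)
x = rng.random((16, 16))            # non-negative input image
w = rng.standard_normal((3, 3))     # signed kernel
intensity = simulate_4f(x, w)
reference = circular_conv2d(x, w)
print(np.allclose(intensity, reference ** 2))
```

The comparison is against the squared reference because the detector only sees intensity, which is the limitation the later slides return to.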

Slide 3: Modulation/Pointwise Product
The CNN context:
  Each neural network layer has 100s to 1000s of filters, for a total of 10000s of convolutions.
  CNN acceleration is not about executing one filter fast; it is all about throughput.
  There is no way to scale to executing all filters simultaneously, so the hardware must switch filters quickly for the same input and execute each filter (convolution) quickly.
The optical context:
  One filter execution has very low latency: the Fourier transform, pointwise multiplication, and inverse Fourier transform all happen at time of flight.
  But switching filters is troublesome, so throughput suffers.
Options for the pointwise product:
  Fixed phase masks: the weights cannot be modified after fabrication, so there is no flexibility and it is tough to scale to realistic CNNs.
  Liquid crystal spatial light modulators (SLMs): can modify weights, but are extremely slow (around 60 Hz).
  Digital micromirror arrays (DMDs): can modify weights and are much faster than SLMs. Commercial DMDs run at 15-30 kHz, and research devices can operate in the MHz range. Resolution can reach 4K, which gives potential for parallel processing.
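
A back-of-the-envelope calculation shows why the filter update rate dominates. This sketch assumes 10,000 convolutions per inference (the low end of the "10000s" above), one filter per modulator update, and a 1 MHz figure standing in for "MHz range"; these numbers and the function name are illustrative assumptions, not measurements from the slides.

```python
def seconds_for_filters(num_convolutions, update_rate_hz, convs_per_update=1):
    """Time to step through all filters when the modulator update is the bottleneck."""
    return num_convolutions / (update_rate_hz * convs_per_update)

# Update rates quoted on the slide; the 1 MHz entry is an assumed value.
rates_hz = {
    "liquid crystal SLM (60 Hz)": 60,
    "commercial DMD (15 kHz)": 15_000,
    "commercial DMD (30 kHz)": 30_000,
    "research DMD (assumed 1 MHz)": 1_000_000,
}

for name, rate in rates_hz.items():
    t = seconds_for_filters(10_000, rate)
    print(f"{name}: {t:.4f} s for 10,000 convolutions")
```

Raising `convs_per_update` models the parallelization discussed later, where one frame carries several inputs or channels.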

Slide 4: Digital Micromirror Array
A DMD modulates intensity by flipping its mirrors to reflect away a portion of the input light signal.
It can operate at different bitwidths (2-8).
It offers high throughput: high resolution at a relatively high frequency.
However, the DMD still has drawbacks:
1. It modulates intensity only, so it cannot support complex numbers by default. Workarounds are to use an encoding or a Mach-Zehnder interferometer (MZI) approach to support complex numbers, or to train with real-valued frequency filters.
2. 15-30 kHz is still not fast enough compared to digital devices, so the high resolution needs to be fully utilized.

Slide 5: A DMD-based Amplitude-Only 4F CNN
DMDs are used for both input modulation and the dot product implementation.
The filter DMD uses binary mode for a higher operating frequency.
The filter weights are trained directly in the Fourier domain as binary values.
The system processes 1 input and 1 filter at a time, so the DMDs are not fully utilized.
[Figure: hardware setup with components labeled DMD 1, DMD 2, two lenses, and a high-speed camera.]

Slide 6: System Training
Training has two steps: a normal training step followed by a fine-tuning step that uses hardware outputs.
First, train the Fourier kernels using the simulation model.
Then, generate hardware convolution outputs using the learned Fourier kernels.
Finally, retrain the fully connected layer weights with the hardware outputs.

Slide 7: Inference Accuracy Evaluation
[Figure: Inference Accuracy of Different Setups. Bar chart of accuracy (0-100) on MNIST and CIFAR for space-domain convolution, 4F-Simulation, 4F-Hardware without fine-tuning, and 4F-Hardware with fine-tuning.]

Slide 8: Bottlenecks and Limitations
1. Intensity-only modulation cannot accelerate conventional neural networks.
2. Photodetection applies a square function to the partial sums, so weights need to be positive for correct results.
3. The DMDs are heavily under-utilized, which leads to low throughput.

Slide 9: Improving Throughput via Parallelization
Use input tiling for parallelization (25-49 inputs in parallel), so filters can still be trained in the Fourier domain with binarized values.
This gives up to a 49x throughput improvement.
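
Input tiling can be illustrated with a small simulation. This is a sketch under simplifying assumptions that differ from the system described here: it uses a small space-domain kernel and adds a guard band of zeros around each input so the outputs do not overlap at all, whereas the slides train full-frame binary Fourier kernels and have to deal with crosstalk. It also returns the field before the camera's squaring. The function names and sizes are hypothetical.

```python
import numpy as np

def full_conv2d(x, w):
    """Reference full linear 2D convolution (output side n + k - 1)."""
    n0, n1 = x.shape
    k0, k1 = w.shape
    out = np.zeros((n0 + k0 - 1, n1 + k1 - 1))
    for a in range(k0):
        for b in range(k1):
            out[a:a + n0, b:b + n1] += w[a, b] * x
    return out

def tiled_4f_pass(inputs, w, grid):
    """Convolve grid*grid inputs with one kernel in a single simulated pass."""
    n = inputs.shape[1]
    k = w.shape[0]
    cell = n + k - 1   # k - 1 guard zeros keep neighbouring outputs apart
    frame = np.zeros((grid * cell, grid * cell))
    for idx, x in enumerate(inputs):
        r, c = divmod(idx, grid)
        frame[r * cell:r * cell + n, c * cell:c * cell + n] = x
    w_frame = np.zeros_like(frame)
    w_frame[:k, :k] = w
    field = np.real(np.fft.ifft2(np.fft.fft2(frame) * np.fft.fft2(w_frame)))
    # Cut the output frame back into one result per input.
    return np.stack([
        field[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell]
        for r in range(grid) for c in range(grid)
    ])

rng = np.random.default_rng(1)
grid, n, k = 5, 8, 3                       # 5 x 5 = 25 inputs in one frame
inputs = rng.random((grid * grid, n, n))
w = rng.random((k, k))
outputs = tiled_4f_pass(inputs, w, grid)
expected = np.stack([full_conv2d(x, w) for x in inputs])
print(np.allclose(outputs, expected))
```

One pass over the tiled frame replaces 25 separate passes, which is where the throughput gain comes from.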

Slide 10: Crosstalk and High-pass Filter
One issue is crosstalk between inputs: the tiled inputs are essentially convolved with a large space-domain kernel of the same size.
During training, use a high-pass filter to remove low frequencies, i.e., those corresponding to spatial scales larger than an individual input.
After the Fourier transform, low-frequency components are located around the center, so setting the center region of the Fourier kernels to 0 removes them.
In actual training, the center 3x3 pixels are set to 0 for a 32x32 filter.
Hardware inference accuracy on CIFAR-10 improves from 35% to 51% after adding the high-pass filter.
[Figure: Fourier kernel with the HPF (zero block) marked.]
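
The zero block is simple to express in code. The sketch below assumes the Fourier kernel is stored with zero frequency at the center of the array (the `np.fft.fftshift` layout, matching the optical Fourier plane) and uses a random binary kernel as a stand-in for a trained one; the function name is hypothetical.

```python
import numpy as np

def zero_center_block(kernel_f, block=3):
    """Zero a block x block region around the centre of a Fourier kernel.

    Assumes zero frequency sits at index size // 2 (np.fft.fftshift layout).
    """
    out = kernel_f.copy()
    rows, cols = out.shape
    r0 = rows // 2 - block // 2
    c0 = cols // 2 - block // 2
    out[r0:r0 + block, c0:c0 + block] = 0
    return out

rng = np.random.default_rng(2)
kernel_f = rng.integers(0, 2, size=(32, 32)).astype(float)  # stand-in binary kernel
filtered = zero_center_block(kernel_f, block=3)

# With the DC term removed, the equivalent space-domain kernel sums to ~0.
space_kernel = np.fft.ifft2(np.fft.ifftshift(filtered))
print(abs(space_kernel.sum()))
```

For a 32x32 kernel this zeroes rows and columns 15 to 17, centered on the zero-frequency bin at index 16.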

Slide 11: Inference Accuracy Gen1-5
[Figure: Inference Accuracy of Different Setups. Bar chart of accuracy (0-100) on MNIST, CIFAR, and Google QuickDraw for simulation accuracy, experimental accuracy, and experimental accuracy without HPF.]

Slide 12: Photodetection Challenge
All optical systems require photodetection to receive the outputs.
Photodetection applies a square function to the output, removing the sign information.
All individual channel outputs are therefore positive-only, and they are accumulated to generate the full convolution outputs.
Negative inputs can be handled by a pseudo-negative approach, which splits the input into positive and negative parts and halves the throughput.
This significantly limits the learning capability of optical systems.
If the optical system is used to accelerate normal convolution (and hence needs to generate the same output as spatial convolution), the weights need to be positive-only.
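
The effect of squaring, and the pseudo-negative workaround, can be sketched as follows. This is an illustration under assumptions: the detector is modeled as an ideal noise-free square, and the recovery step (square root of each detected pass, then subtraction) is one plausible way to combine the two passes, not a procedure stated in the slides. It relies on the weights being non-negative, as the slide requires.

```python
import numpy as np

def conv_full(x, w):
    """Full linear 2D convolution via zero-padded FFTs."""
    shape = (x.shape[0] + w.shape[0] - 1, x.shape[1] + w.shape[1] - 1)
    spectrum = np.fft.fft2(x, s=shape) * np.fft.fft2(w, s=shape)
    return np.real(np.fft.ifft2(spectrum))

def detect(field):
    """Ideal photodetector: records intensity, so the sign is lost."""
    return field ** 2

rng = np.random.default_rng(3)
x = rng.standard_normal((8, 8))   # signed input
w = rng.random((3, 3))            # positive-only weights

# Sign loss: x and -x give the same detector reading.
same_reading = np.allclose(detect(conv_full(x, w)), detect(conv_full(-x, w)))

# Pseudo-negative: two passes with non-negative inputs, combined electronically.
x_pos = np.maximum(x, 0.0)
x_neg = np.maximum(-x, 0.0)
recovered = (np.sqrt(detect(conv_full(x_pos, w)))
             - np.sqrt(detect(conv_full(x_neg, w))))
print(same_reading, np.allclose(recovered, conv_full(x, w)))

# Per-channel detection with signed weights does not give the squared sum.
w1, w2 = rng.standard_normal((2, 3, 3))
per_channel = detect(conv_full(x_pos, w1)) + detect(conv_full(x_neg, w2))
summed_first = detect(conv_full(x_pos, w1) + conv_full(x_neg, w2))
print(np.allclose(per_channel, summed_first))
```

The two passes per input are the source of the halved throughput mentioned above.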

Slide 13: A Quick Refresher on CNNs
Filters in a CNN are 3D: the results of multiple 2D convolutions on different channels (e.g., the RGB channels of an image) are added to get the final result.
4F systems perform only the 2D convolutions, so the partial results need to be added electronically later.
The photodetector and ADC square and quantize the partial results, which loses information.
Can we add the different channel results optically, in full precision, before squaring and quantizing, to improve accuracy? The answer is channel tiling.
[Figure source: https://towardsdatascience.com]

Slide 14: Introduction to Channel Tiling
Channel tiling is a parallelization technique for accelerating normal CNNs that improves throughput while addressing the positive detection challenge.
It requires 4F systems that support complex-valued multiplication in the Fourier domain (achievable with MZIs or SLMs).
The procedure:
  Tile all channels of the inputs and weights in the space domain.
  Load the tiled inputs directly on the input DMD (or other optical modulator).
  Perform a Fourier transform on the tiled weights and load the result on the kernel DMD (or other optical modulator).
  Use the 4F system to compute the convolution between the tiled inputs and weights.
  The center part of the output is the correct 3D (multi-channel) convolution result.

Slide 15: Channel Tiling Visualization
Tile the input channels with a separation the size of a kernel.
Tile the filter channels (kernels) so that each kernel matches its corresponding input channel.
This effectively simulates a moving-window convolution.
The convolution results of the individual channels are inherently summed in the optical domain, before photodetection.

Slide 16: Channel Tiling Visualization
(a) Tiled inputs and weights.
(b) Start of the sliding-window convolution.
(c) Separation between inputs avoids unwanted overlaps.
(d) Invalid region: a kernel is not convolving with its corresponding input channel, so the outputs are invalid.
(e) Valid output: only the center part of the output is valid; it is the full convolution output with the individual channel outputs summed.
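
The channel tiling idea can be checked numerically. The sketch below is an idealized illustration, not the authors' implementation: it tiles channels along one axis only, uses true (flipped) convolution so the kernel tiles are laid out in reverse order to line up with their inputs, and models the optics with zero-padded FFTs. Function names and sizes are hypothetical.

```python
import numpy as np

def full_conv2d(x, w):
    """Reference full linear 2D convolution (output side n + k - 1)."""
    n0, n1 = x.shape
    k0, k1 = w.shape
    out = np.zeros((n0 + k0 - 1, n1 + k1 - 1))
    for a in range(k0):
        for b in range(k1):
            out[a:a + n0, b:b + n1] += w[a, b] * x
    return out

def fft_conv2d(x, w):
    """Full linear convolution via zero-padded FFTs (stand-in for the 4F optics)."""
    shape = (x.shape[0] + w.shape[0] - 1, x.shape[1] + w.shape[1] - 1)
    return np.real(np.fft.ifft2(np.fft.fft2(x, s=shape) * np.fft.fft2(w, s=shape)))

def channel_tiled_conv(xs, ws):
    """Sum of per-channel convolutions from one pass over tiled frames."""
    num_ch, n, _ = xs.shape
    k = ws.shape[1]
    stride = n + k                      # inputs separated by one kernel width
    x_frame = np.zeros((n, (num_ch - 1) * stride + n))
    w_frame = np.zeros((k, (num_ch - 1) * stride + k))
    for c in range(num_ch):
        x_frame[:, c * stride:c * stride + n] = xs[c]
        d = num_ch - 1 - c              # reversed order for true convolution
        w_frame[:, d * stride:d * stride + k] = ws[c]
    field = fft_conv2d(x_frame, w_frame)
    start = (num_ch - 1) * stride       # matched channel pairs all land here
    return field[:, start:start + n + k - 1]   # valid centre region

rng = np.random.default_rng(4)
xs = rng.random((3, 8, 8))              # 3 input channels
ws = rng.standard_normal((3, 3, 3))     # signed weights, one kernel per channel
tiled = channel_tiled_conv(xs, ws)
reference = sum(full_conv2d(x, w) for x, w in zip(xs, ws))
print(np.allclose(tiled, reference))
detected = tiled ** 2                   # squaring happens once, after summation
```

Regions outside the cropped center hold products of mismatched input and kernel channels, which correspond to the invalid region in panel (d).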

Slide 17: Advantages of Channel Tiling
It improves system utilization and throughput.
Convolution results of individual channels are summed in the optical domain, before photodetection applies the square function.
Far less information is removed during photodetection, and photodetection can be treated as an activation function.
Weights can be negative and still generate correct convolution results.
Channel summation is carried out at full precision; previously it was carried out after the ADC, so the summation used a reduced bitwidth.
The system becomes less susceptible to photodetection noise.
However, it requires tiling in the space domain, so the optical system needs to process both amplitude and phase.

Slide 18: Channel Tiling Accuracy Results on VGG-16
For input/kernel tiling, the network is trained by applying the square function to each individual channel output.
For channel tiling, the network is trained normally but uses the square function as the activation function (weights can be both positive and negative).
[Figure: VGG-16 Training Accuracy. Bar chart of accuracy (0-100) on FashionMNIST, SVHN, and CIFAR10 for input/kernel tiling and channel tiling.]

Slide 19: Impact of Camera Bitwidth and Sensing Noise
[Figure: Impact of camera bitwidth and sensing noise on network accuracy. Accuracy (0-100) at SNR = 30, SNR = 20, and SNR = 15 for ideal channel tiling, channel tiling 12-bit, channel tiling 8-bit, pseudo-negative 12-bit, and pseudo-negative 8-bit.]

Slide 20: Performance Scaling
The optical system has the potential to be much faster:
  Analog micromirror arrays can operate in the MHz range.
  Concepts for GHz SLMs have been published.
  A high-speed camera, or a photodetector with a high-speed ADC, can operate in the GHz range.
  Overall throughput can be orders of magnitude better than the current generation with 15-30 kHz DMDs.
Speedup compared to GPU: assuming a 2 MHz 4F system at 4K resolution, it can be up to 61.7x faster than an Nvidia RTX-2080Ti GPU on real networks and datasets.
[Figure: speedup over GPU (axis 0-70) for SRCNN, DeconvNet, VGG16-SpaceNet7, AlexNet-ImageNet, VGG16-ImageNet, and VGG16-CIFAR10.]
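
The raw modulator numbers behind this scaling argument are easy to work out. The sketch below only compares frame rates and pixel throughput; it assumes "4K" means 3840 x 2160 pixels and does not attempt to reproduce the 61.7x GPU speedup, which comes from the authors' evaluation on real networks. Function names are hypothetical.

```python
def frame_rate_gain(new_rate_hz, old_rate_hz):
    """How many times more frames per second the faster modulator delivers."""
    return new_rate_hz / old_rate_hz

def pixels_per_second(rate_hz, width=3840, height=2160):
    """Modulated pixels per second; 3840 x 2160 is an assumed meaning of 4K."""
    return rate_hz * width * height

for old_rate in (15_000, 30_000):
    gain = frame_rate_gain(2_000_000, old_rate)
    print(f"2 MHz vs {old_rate / 1000:.0f} kHz: {gain:.1f}x more frames per second")

print(f"{pixels_per_second(2_000_000):.3e} modulated pixels per second at 2 MHz")
```

A 2 MHz modulator would deliver roughly 67 to 133 times more frames per second than today's 15-30 kHz DMDs.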

Slide 21: Conclusions
4F optics hold a lot of promise for accelerating CNNs.
Free-space optics have the advantage of large-format DMDs/SLMs.
Several challenges need to be addressed:
  Preserving amplitude, phase, and sign information in the system for equivalence with electronic convolutions.
  Improving modulation speeds: MHz 4-8 bit SLMs/DMDs are needed to be competitive with electronic systems.
  Rethinking neural network architectures so that they are well suited to optics, e.g., favoring large filter sizes over deep networks to get field of view.
Our ongoing work:
  Investigate further Fourier-domain parallelization.
  Investigate training methods and CNN model architectures for optics.
  Build smaller-resolution but high-speed photonic on-chip 4F-style systems.