Block-grained Scaling of Deep Neural Networks for Mobile Vision


This presentation explores the challenges of optimizing Deep Neural Networks (DNN) for mobile vision systems due to their large size and high energy consumption. The LegoDNN framework introduces a block-grained scaling approach to reduce memory access energy consumption by compressing DNNs. The agenda includes discussions on deep compression, limitations of model-grained methods, LegoDNN architecture, contributions, evaluations, and related work, emphasizing the importance of energy-efficient mobile vision applications.





Presentation Transcript


  1. LegoDNN: Block-grained Scaling of Deep Neural Networks for Mobile Vision. Mina Gabriel, CIS 5639 (Wireless Network Communication, Dr. Yu Wang)

  2. Agenda 1. Introduction to the DNN problem in edge computing 2. Deep Compression 3. Limitations of model-grained approaches 4. Introduction to the LegoDNN architecture 5. Paper contributions 6. Evaluation (LegoDNN vs. baseline models) 7. Related Work 8. Conclusion 9. Questions

  3. Deep Neural Network (DNN) Deep Neural Networks (DNNs) generally optimize for performance in terms of accuracy. As a result, DNNs are huge and have parameters on the order of millions. The very popular AlexNet has around 62 million parameters, and a trained AlexNet takes around 200 MB of space. Credit: Xu et al., 2019; Wikipedia, AlexNet
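A quick sanity check of those numbers (a minimal sketch; the 4-bytes-per-float32-weight assumption is mine, and an on-disk checkpoint also stores metadata, so the ~200 MB figure is plausible):

```python
# Back-of-the-envelope size of AlexNet's weights, assuming 32-bit floats.
num_params = 62_000_000          # ~62 million parameters, as cited on the slide
bytes_per_param = 4              # float32
size_mb = num_params * bytes_per_param / (1024 ** 2)
print(f"~{size_mb:.0f} MB of raw weights")   # ~237 MB, in line with the 200-240 MB figures cited here
```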

  4. Mobile vision systems are ubiquitous. Mobile vision applications: image classification, semantic segmentation, object detection, anomaly detection, pose estimation, action recognition. The emergence of AI chipsets (e.g., the Intel Movidius VPU) enables on-device deep learning on mobile vision systems. Credit: https://github.com/LINC-BIT/legodnn

  5. Mobile vision systems are ubiquitous. Two issues: memory and battery. The emergence of AI chipsets (e.g., the Apple A11 Bionic Neural Engine and the Intel Movidius VPU) enables on-device deep learning on mobile vision systems. Credit: Wikipedia: Apple A11, Raspberry Pi, Intel Movidius VPU

  6. Motivation Energy consumption is dominated by memory access. Energy per operation: 32-bit int ADD, 0.1 pJ; 32-bit float ADD, 0.9 pJ; 32-bit SRAM cache access, 5 pJ; 32-bit DRAM memory access, 640 pJ. Large networks do not fit in on-chip storage and hence require the more costly DRAM accesses. If the deep model were small enough to fit in SRAM, energy consumption would drop accordingly. Han et al., Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, ICLR 2016
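The ratio behind the "fit it in SRAM" argument, computed directly from the per-operation energies above (a minimal sketch; real savings depend on access patterns and caching):

```python
# Per-32-bit-word energy cost of off-chip DRAM vs. on-chip SRAM, from the table above.
dram_pj, sram_pj = 640.0, 5.0
print(f"A DRAM access costs {dram_pj / sram_pj:.0f}x more energy than an SRAM access")  # 128x
```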

  7. Deep Compression A 35x decrease in AlexNet size, from 240 MB to 6.9 MB. Song Han et al., Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, ICLR 2016

  8. Deep Compression: Network Pruning Three steps: (1) Train the model: start by learning the connectivity via normal network training. (2) Prune the small-weight connections: all connections with weights below a threshold are removed from the network. (3) Retrain: finally, retrain the network to learn the final weights for the remaining sparse connections. Song Han et al., Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, ICLR 2016
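A minimal sketch of the prune step, assuming a plain NumPy weight matrix and a fixed magnitude threshold (the actual pipeline interleaves this with retraining):

```python
import numpy as np

def prune_small_weights(weights: np.ndarray, threshold: float) -> np.ndarray:
    """Remove (zero out) all connections whose absolute weight is below the threshold."""
    mask = np.abs(weights) >= threshold
    return weights * mask

# Example: prune one randomly initialized layer and report the resulting sparsity.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 512)).astype(np.float32)
w_pruned = prune_small_weights(w, threshold=1.0)
sparsity = 1.0 - np.count_nonzero(w_pruned) / w_pruned.size
print(f"sparsity after pruning: {sparsity:.1%}")   # roughly 68% of weights removed at this threshold
```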

  9. Deep Compression: Weight Sharing The weights of each filter are partitioned into k clusters using k-means, reducing the weight representation to a small set of shared values. Song Han et al., Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, ICLR 2016
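A minimal sketch of this weight-sharing step, assuming scikit-learn's KMeans: each weight is replaced by its cluster centroid, so a layer only needs a k-entry codebook plus a log2(k)-bit index per weight:

```python
import numpy as np
from sklearn.cluster import KMeans

def share_weights(weights: np.ndarray, k: int = 16):
    """Cluster a layer's weights into k shared values (codebook) plus per-weight indices."""
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(flat)
    codebook = km.cluster_centers_.ravel()           # k shared weight values
    indices = km.labels_.reshape(weights.shape)      # log2(k)-bit index per weight
    return codebook[indices], codebook, indices      # reconstructed layer, codebook, indices

w = np.random.default_rng(0).normal(size=(64, 64)).astype(np.float32)
w_shared, codebook, idx = share_weights(w, k=16)     # 16 clusters -> 4-bit indices instead of 32-bit floats
```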

  10. Deep Compression: Quantization and Huffman Coding In their experiments, the authors found no loss of accuracy when weights were quantized down to 8 bits instead of 32 bits. Huffman coding: frequent weight values are assigned fewer bits, while rare values are assigned more bits. The full compression pipeline saves 35x to 49x in parameter storage with no loss of accuracy. Song Han et al., Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, ICLR 2016
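A minimal sketch of the Huffman stage over already-quantized values, using Python's heapq (variable names are mine): frequent symbols end up with short codes, rare ones with long codes.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} of a Huffman code for the given symbol stream."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap items: (subtree frequency, tie-breaker, {symbol: depth so far}).
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}   # every symbol in the merge goes one level deeper
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

# Skewed distribution of quantized weight indices: common values get 1-2 bits, rare ones 3 bits.
print(huffman_code_lengths([0] * 70 + [1] * 20 + [2] * 7 + [3] * 3))   # {0: 1, 1: 2, 2: 3, 3: 3}
```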

  11. Limitation of model-grained scaling We need to be able to scale the model size based on the available resources, but the previous techniques suffer severe accuracy loss once the model size is changed. Online scaling: to provide online scaling, these techniques must first transform a DNN model into several descendant models of smaller sizes offline, and then select one of them to upgrade or downgrade the model size at run time.

  12. Limitation of model-grained scaling (a) It incurs large accuracy loss; (b) it leaves considerable proportions of resources unused.

  13. Limitation of model-grained scaling An ideal scaling that fully uses the available resources requires a dauntingly large number of descendant models, which gives rise to two major challenges in practice. First challenge: construct a large number of descendant models (e.g., 1 million) in a moderate amount of time, such as hours. Second challenge: find the right combination of these descendant models to maximize accuracy given the available resources.

  14. LegoDNN LegoDNN is a framework built on block structures (e.g., ResNet blocks); the training environment captures per-block information. https://github.com/LINC-BIT/legodnn

  15. LegoDNN Blocks are independent of the other parts of the DNN, and dynamic selection of descendant blocks is supported at runtime. https://github.com/LINC-BIT/legodnn

  16. Contributions of this paper Fast generation of a large scaling space via block re-training: descendant blocks are re-trained independently of the whole network. Fine-grained DNN scaling via block switching: run-time optimal selection of descendant blocks for maximum accuracy; unlike filter pruning, there is no need to retrain the entire DNN. Implementation and evaluation: the authors design and develop TensorFlow applications and evaluate them against state-of-the-art techniques (the presenter also found a PyTorch repository, created three months earlier). Summary of experimental results: (1) implemented and tested on smartphones (Huawei) and embedded devices (Raspberry Pi); (2) compared LegoDNN to other models: (a) 1,296x to 279,936x more descendant models, (b) 76% less training time than other techniques, (c) 31.74% increase in accuracy, (d) 71.07% energy saved on switching models; (3) tested multi-DNN scenarios; (4) tested on multiple applications.

  17. Section 2: Motivation - How to enlarge the model exploration space through blocks? Limitation 1: limited exploration of model choices. Traditional model-grained scaling techniques treat a DNN as a whole and transform it into several descendant models of different sizes. In three popular DNNs (VGG16, YOLOv3, and ResNet18), block-grained scaling provides as many as 7K to 1.6M model sizes by transforming the original blocks into just 5 to 40 descendant blocks.
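To see where thousands-to-millions of model sizes come from: if a DNN has B scalable blocks and each block can either stay original or be swapped for one of k pruned descendant variants, the configurations multiply across blocks. A small counting sketch (the "(k + 1) choices per block" convention is my assumption; the paper's exact counting may differ):

```python
# Block-grained scaling space: each of B blocks independently picks one of (k + 1) variants.
def num_configurations(num_blocks: int, descendants_per_block: int) -> int:
    return (descendants_per_block + 1) ** num_blocks

print(num_configurations(5, 4))   # 5 blocks x 4 descendants each -> 3,125 configurations
print(num_configurations(8, 6))   # 8 blocks x 6 descendants each -> 5,764,801 configurations
```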

  18. Section 2: Motivation - How to accelerate the training of block-grained models? Limitation 2: resource-intensive and time-consuming training. Traditional model-grained scaling achieves acceptable accuracy, but it suffers from long block generation times. When handling large datasets such as ImageNet or COCO2017, the generation of a block takes several hours.

  19. Section 2: Motivation - How to optimally assemble descendant blocks on the fly? Limitation 3: focus on offline structured pruning. Given the large search space of descendant models, an efficient scaling approach is required to quickly find the optimal combination of blocks under run-time resource dynamics.

  20. Section 3: Design of LegoDNN Overview: offline vs. online. https://github.com/LINC-BIT/legodnn

  21. Section 3: Design of LegoDNN Overview: Offline stage. Block Extractor: extracts blocks from the original model's convolution layers. Descendant Block Generator: prunes each block to generate descendant blocks of multiple sizes. Block Retrainer: retrains descendant blocks to improve their accuracy (more on this in the next slide). Block Profiler: collects statistical information about each block: size, memory, accuracy, latency. https://github.com/LINC-BIT/legodnn
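One way to picture the Block Profiler's output: a per-descendant-block record carrying the statistics the online stage will need. This is a hypothetical structure with field names of my own choosing, not the framework's actual API:

```python
from dataclasses import dataclass

@dataclass
class BlockProfile:
    """Offline-collected statistics for one descendant block (illustrative fields only)."""
    block_id: int          # which original block this variant descends from
    variant_id: int        # which pruned variant of that block
    size_mb: float         # storage size of the block's weights
    memory_mb: float       # peak memory when the block is loaded
    accuracy_drop: float   # accuracy loss vs. the original block, measured offline
    latency_ms: float      # latency contribution, re-profiled per device in the online stage

profiles = [
    BlockProfile(block_id=0, variant_id=1, size_mb=3.2, memory_mb=5.1,
                 accuracy_drop=0.004, latency_ms=2.7),   # made-up numbers for illustration
]
```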

  22. Section 3: Design of LegoDNN Overview: Offline - Block Retrainer

  23. Section 3: Design of LegoDNN Overview: Online. Latency Profiler: computes the latency reduction percentage of blocks on edge devices. Scaling Optimizer: selects blocks to maximize accuracy while satisfying the latency and memory constraints (next slides). Block Switcher: switches blocks at runtime according to the optimizer's decisions (following slides). https://github.com/LINC-BIT/legodnn

  24. Section 3: Design of LegoDNN Single-DNN optimization: at runtime, the scaling optimizer finds the optimal combination of descendant blocks under four constraints: accuracy, latency, model size, and block scale.
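A minimal brute-force sketch of that selection problem, reusing per-block profile tuples like the ones above: choose one variant per block so that total latency and memory stay within budget while the summed accuracy drop is minimized. The additive-accuracy-drop approximation and the exhaustive enumeration are simplifications; the paper uses a proper optimization algorithm:

```python
from itertools import product

def select_blocks(variants_per_block, latency_budget_ms, memory_budget_mb):
    """variants_per_block: per block, a list of (accuracy_drop, latency_ms, memory_mb) options."""
    best, best_drop = None, float("inf")
    for combo in product(*variants_per_block):                 # one variant chosen per block
        drop = sum(v[0] for v in combo)
        latency = sum(v[1] for v in combo)
        memory = sum(v[2] for v in combo)
        if latency <= latency_budget_ms and memory <= memory_budget_mb and drop < best_drop:
            best, best_drop = combo, drop
    return best, best_drop

# Two blocks, each with an original (no drop, heavy) and a pruned variant (small drop, light).
blocks = [
    [(0.00, 10.0, 8.0), (0.02, 6.0, 4.0)],
    [(0.00, 12.0, 9.0), (0.03, 7.0, 5.0)],
]
print(select_blocks(blocks, latency_budget_ms=16.0, memory_budget_mb=10.0))
```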

  25. Section 3: Design of LegoDNN Overview: Online. Multi-DNN optimization: when considering n DNNs (n > 1), the optimization objective can be extended as shown below.
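The equation on this slide did not survive the transcript. A plausible form of the extension, reconstructed from the single-DNN constraints rather than transcribed from the paper, is to maximize summed accuracy across the n co-running DNNs under a shared memory budget and per-DNN latency budgets:

$$
\max_{\{b_j\}} \; \sum_{j=1}^{n} \mathrm{Acc}_j(b_j)
\quad \text{s.t.} \quad
\sum_{j=1}^{n} \mathrm{Mem}_j(b_j) \le M,
\qquad
\mathrm{Lat}_j(b_j) \le T_j \;\; \text{for } j = 1, \dots, n,
$$

where b_j denotes the block combination selected for the j-th DNN, M the shared memory budget, and T_j that DNN's latency budget.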

  26. Section 3: Design of LegoDNN Overview: Online - Block exchange (Block Switcher). Filter pruning and low-rank decomposition directly replace the entire DNN (large I/O overhead). NestDNN designed descendant-model exchange to address this issue but introduced two other problems: (1) generation time increases by 50% to 200% compared with filter pruning and low-rank decomposition; (2) when the model is loaded into memory, new layers must be constructed using deep learning libraries. LegoDNN only needs to exchange blocks rather than the entire DNN, and all blocks can be integrated without extra operations (you can't mix and match descendant blocks with other blocks).
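A minimal sketch of why block-level exchange is cheap, using a toy model kept as a dict of independently swappable blocks (hypothetical structure, not the framework's actual API): upgrading or downgrading touches only the block being swapped, while the rest of the network stays loaded.

```python
class BlockedModel:
    """Toy model made of named, independently swappable blocks."""
    def __init__(self, blocks):
        self.blocks = dict(blocks)              # name -> callable block

    def swap_block(self, name, new_block):
        """Replace a single block in place; no other layers are rebuilt or reloaded."""
        self.blocks[name] = new_block

    def forward(self, x):
        for block in self.blocks.values():
            x = block(x)
        return x

model = BlockedModel({"b1": lambda x: x + 1, "b2": lambda x: x * 2, "b3": lambda x: x - 3})
model.swap_block("b2", lambda x: x * 1.5)       # e.g. load a smaller descendant variant from disk
print(model.forward(10))                        # 10 -> 11 -> 16.5 -> 13.5
```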

  27. Section 4: Evaluation Experimental setup. Testbeds: tested on four different phones and embedded platforms. Image recognition: VGG16 and ResNet18, using CIFAR-10 (60K images, 10 classes) and ImageNet2012 (1.2M images, 1,000 classes), respectively. Real-time object detection: YOLOv3 on COCO2017 (0.33M images, 1.5M objects, 80 classes); metric is mAP@0.5, i.e., mean average precision at an IoU threshold of 0.5. Multi-DNN scenarios: three ranges: small (1-6 running applications), medium (2-8 running applications), large (3-10 running applications). Baseline models: four state-of-the-art model-grained scaling techniques: filter pruning, NestDNN, low-rank decomposition, and knowledge distillation*. * Yong-Deok Kim, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu Yang, and Dongjun Shin. Compression of deep convolutional neural networks for fast and low power mobile applications. ICLR 2016.

  28. Section 4: Evaluation Evaluation of LegoDNN components. Evaluation of offline model/block generation: LegoDNN has the shortest generation time; the baseline models would take at least several years to generate the same number of optional models as LegoDNN. Times are reported in hours; VGG16, ResNet18, and YOLOv3 have 5, 5, and 8 blocks, respectively.

  29. Section 4: Evaluation Evaluation of LegoDNN components. Evaluation of the accuracy/latency tradeoff: first, LegoDNN consistently achieves higher accuracy (an increase of 31.74%) than the baseline approaches at every available resource level; second, on average it achieves 52.66% higher accuracy when the available memory is smaller than 30 MB.

  30. Comparison of inference accuracies between LegoDNN and model-scaling approaches

  31. Comparison of energy consumption in model scaling

  32. Section 4: Evaluation Evaluation of the multi-DNN scenario. Two mobile devices were used: a CPU-based smartphone and a GPU-based embedded device. An application's available resources decrease when more co-running applications exist. LegoDNN provides block-level scaling by switching just 3 blocks, thus still maintaining accurate models (high prediction confidence) under tighter resources. Each application is randomly selected from VGG16, ResNet18, and YOLOv3. Because object detection and recognition use different accuracy metrics, the authors report the percentage of loss in average accuracy.

  33. Section 4: Evaluation Evaluation of the multi-DNN scenario. Performance under different loads: LegoDNN achieves considerable improvements over the baseline approaches by delivering either lower latencies (minimum-latency scenario) or smaller accuracy losses (maximum-accuracy scenario). Overall, the approach achieves 21.62% reductions (at most 2x) in latency and 69.71% reductions in accuracy losses.

  34. Performance comparison under the maximum-accuracy and minimum-latency scenarios

  35. Section 5: Discussions Applicability of LegoDNN to DNNs: LegoDNN is the first framework that supports block-grained scaling of DNNs for mobile vision systems. It can be generalized to support most state-of-the-art DNNs (the paper reports on 7 different DNN categories, e.g., ResNet-style variations in depth (ResNet152) and width (ResNeXt29)). LegoDNN is inapplicable when any filter in a DNN cannot be pruned.

  36. Applicability of LegoDNN to seven categories of DNNs

  37. Section 6: Related Work Approaches similar to LegoDNN. Model-grained DNN scaling: applies structured pruning in a different way and mainly generates a limited number of models to fulfil requirements. Scheduling of multiple DNNs: modern mobile vision systems need to execute multiple DNNs while optimizing their latency, accuracy, and energy (a similar goal to LegoDNN's, pursued with a different approach). Relationship to device-specific DNN generation and search: this line of work studies the generation of optimal DNNs for specific devices; in other words, DNN models are pre-generated according to the features of specific devices, unlike LegoDNN, which adapts at run time.

  38. Section 7: Conclusion Paper conclusion: LegoDNN was presented in this paper as a way to open up a large exploration space of DNN models for mobile vision systems. Future work will explore the re-training of descendant blocks in a federated learning framework; such re-training would allow a descendant block to learn from multiple participants' DNNs to increase its accuracy and generalization.

  39. Questions?
