
Efficient Vision Transformers: Scaling and Training Insights
Explore recent advances in vision transformer models, focusing on efficient scaling techniques and novel training approaches. See how PartialFormer improves accuracy and transfers to a range of dense prediction tasks, and how its training is being extended to autonomous driving and tumor segmentation.
Presentation Transcript
Weekly Report. Xuan-Thuy Vo (xthuy@islab.ulsan.ac.kr). September 12, 2023.
Activities (last week):
- Prepare for Saturday Seminar (Sept. 12). Title: Scale-Aware Modulation Meets Transformer
- Write the PartialFormer paper: thesis + CVPR 2024 (due: November 03). Title: Efficient Vision Transformers with Partial Attention
  - Improvements:
    - Foreground tokens: mixed multi-head self-attention
    - Background tokens: single-query attention, where the queries are abstract tokens that learn informative features from the foreground tokens (see the code sketch after the Partial Attention slides below)
  - Top-1 accuracy: 77.1 → 79.3 (+2.2%)
Activities: Partial Attention (two figure slides; the diagrams themselves are not captured in this transcript).
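Since the figures are not in the transcript, here is a minimal PyTorch-style sketch of the partial-attention idea described above: foreground tokens get full multi-head self-attention, while learned abstract tokens gather foreground features and then act as the few queries that summarize the background tokens. The class and parameter names, the token-scoring rule, and the way the background summary is merged back are all assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical sketch of partial attention; NOT the authors' implementation.
import torch
import torch.nn as nn


class PartialAttention(nn.Module):
    def __init__(self, dim, num_heads=4, num_abstract_tokens=1, fg_ratio=0.5):
        super().__init__()
        self.fg_ratio = fg_ratio
        # Per-token score used to split tokens into foreground / background.
        self.score = nn.Linear(dim, 1)
        # Foreground branch: ordinary multi-head self-attention
        # (the slides call it "mixed" MHSA; the exact mixing is not described).
        self.fg_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learned abstract tokens that summarize informative foreground features.
        self.abstract = nn.Parameter(torch.zeros(1, num_abstract_tokens, dim))
        # Cross-attention reused for both "abstract <- foreground" and
        # "abstract (as queries) <- background" steps, to keep the sketch short.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                                            # x: (B, N, C)
        B, N, C = x.shape
        k = max(1, int(N * self.fg_ratio))

        # Split tokens: top-k scoring tokens are foreground, the rest background.
        idx = self.score(x).squeeze(-1).topk(k, dim=1).indices       # (B, k)
        fg = torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, C))  # (B, k, C)
        bg_mask = torch.ones(B, N, dtype=torch.bool, device=x.device)
        bg_mask.scatter_(1, idx, False)
        bg = x[bg_mask].view(B, N - k, C)                             # (B, N-k, C)

        # Foreground tokens: full self-attention among themselves.
        fg_out, _ = self.fg_attn(fg, fg, fg)

        # Abstract tokens gather informative features from the foreground ...
        q = self.abstract.expand(B, -1, -1)
        q, _ = self.cross_attn(q, fg_out, fg_out)
        # ... then act as the (few) queries that summarize the background tokens,
        # which keeps the background branch cheap ("single-query attention").
        bg_summary, _ = self.cross_attn(q, bg, bg)

        # One simple way to merge everything back: write the updated foreground
        # tokens into place and add the background summary to every token.
        out = x.clone()
        out.scatter_(1, idx.unsqueeze(-1).expand(-1, -1, C), fg_out)
        return out + bg_summary.mean(dim=1, keepdim=True)
```

A block like this would sit in place of standard self-attention inside each transformer block; because only a handful of abstract tokens query the background, the cost of the background branch grows roughly linearly with the number of background tokens instead of quadratically.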
Activities (last week): Partial Vision Transformers (PartialFormer)

Method                                   Top-1 Acc (%)   #param (M)   GFLOPs   imgs/s
EdgeViT-XXS                              74.4            4.1          0.6      3954
MobileViTV2-0.75                         75.6            2.9          1.0      4504
PartialFormer (BG unchanged)             76.0            8.09         0.5      5336
PartialFormer (BG tokens → one token)    77.1            8.22         0.5      4910
PartialFormer (+ abstract tokens)        79.3            8.52         0.7      4633
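For context, numbers like the #param, GFLOPs, and imgs/s columns above are usually obtained with a profiling utility plus a timed forward loop. The sketch below uses fvcore's FLOP counter and simple CUDA timing; the batch size, input resolution, and the use of fvcore are assumptions, not the report's actual measurement protocol.

```python
# Hypothetical measurement sketch (not the report's actual protocol):
# count parameters, count FLOPs with fvcore, and time forward passes for imgs/s.
# Batch size, input resolution, and the use of fvcore/CUDA are assumptions.
import time
import torch
from fvcore.nn import FlopCountAnalysis


@torch.no_grad()
def profile(model, resolution=224, batch_size=64, iters=50, device="cuda"):
    model = model.eval().to(device)
    dummy = torch.randn(1, 3, resolution, resolution, device=device)
    batch = torch.randn(batch_size, 3, resolution, resolution, device=device)

    params_m = sum(p.numel() for p in model.parameters()) / 1e6
    gflops = FlopCountAnalysis(model, dummy).total() / 1e9   # per-image FLOPs

    for _ in range(10):                  # warm-up iterations
        model(batch)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(batch)
    torch.cuda.synchronize()
    imgs_per_s = iters * batch_size / (time.time() - start)

    return params_m, gflops, imgs_per_s
```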
Activities (this week):
- Write the paper titled Efficient Vision Transformers with Partial Attention
- Scale the model to 0.1, 0.3, 0.7, 1, 2, 3, and 4 GFLOPs (a configuration sketch follows this list)
- Transfer the trained models to dense prediction tasks:
  - Detection, semantic/instance segmentation
  - Human detection, multiple object tracking, human pose estimation
- Try training PartialFormer in new fields (learning): autonomous driving, tumor segmentation
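As a rough illustration of how a ViT-style family is usually scaled across FLOPs budgets (by jointly growing width, depth, and heads per stage), here is a hypothetical configuration sketch; the variant names and all numbers are placeholders, not PartialFormer's actual configurations.

```python
# Hypothetical scaling configurations (placeholder values, not PartialFormer's).
# A model family is typically scaled by growing the per-stage embedding widths,
# the number of blocks per stage, and the number of attention heads together.
PARTIALFORMER_VARIANTS = {
    "XXS": dict(embed_dims=(24, 48, 96, 160),   depths=(1, 1, 2, 1), heads=(1, 2, 4, 5)),
    "XS":  dict(embed_dims=(32, 64, 128, 256),  depths=(1, 2, 3, 2), heads=(1, 2, 4, 8)),
    "S":   dict(embed_dims=(48, 96, 192, 384),  depths=(2, 2, 6, 2), heads=(2, 3, 6, 12)),
    "M":   dict(embed_dims=(64, 128, 256, 512), depths=(2, 2, 8, 2), heads=(2, 4, 8, 16)),
}


def build_variant(name):
    """Return keyword arguments for a (hypothetical) PartialFormer constructor."""
    return PARTIALFORMER_VARIANTS[name]
```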