Deep Learning Project on Persian Visual Question Answering


This presentation is the final project of a Spring 2020 deep learning course, by Alireza Asghari and Maryam Sadat Hashemi. It addresses Visual Question Answering (VQA) in Persian: given an image and a question about it, the system produces an answer. The slides cover the VQA task and its applications, the VQA v1 dataset, three methods (LSTM Q + norm I, SAN, and HieCoAttention), an evaluation of overfitting, and accuracy results broken down by answer type.

  • Deep Learning
  • Persian Language
  • Visual Recognition
  • AI Technology

Uploaded on Feb 16, 2025



Presentation Transcript


  1. Persian Visual Question Answering. Alireza Asghari, Maryam Sadat Hashemi. Final Project of Deep Learning Course, Spring 2020

  2. 1) VQA Task 2/16/2025 1

  3. 1) VQA Task. Question: "What color is the baby's shirt?"

  4. 1) VQA Task. Question: "What color is the baby's shirt?" VQA System

  5. 1) VQA Task. Question: "What color is the baby's shirt?" VQA System answer: Red

  6. 1) VQA Task. Persian VQA System

  7. 2) VQA Applications: an aid to visually impaired or blind persons

  8. 2) VQA Applications: an aid to visually impaired or blind persons; interacting with a robot

  9. 2) VQA Applications: an aid to visually impaired or blind persons; interacting with a robot; an aid to clinicians interpreting complex medical images

  10. 3) Dataset: VQA v1

      Split        Images    Questions   Annotations
      Train        82,783    248,349     2,483,490
      Validation   40,504    121,512     1,215,120
      Test         81,434    244,302     -

  12. 3) Dataset

  13. 3) Dataset

  14. 3) Dataset

  15. 4) Methods: LSTM Q + norm I, SAN, HieCoAttention

  16. 4) LSTM Q + norm I

  17. 4) Methods: LSTM Q + norm I, SAN, HieCoAttention

  18. 4) SAN (Stacked Attention Networks)

  19. 4) Methods: LSTM Q + norm I, SAN, HieCoAttention

  20. 4) HieCoAttention (Hierarchical Co-Attention)

  21. 4) Evaluation: overfitting and its solution

  22. Overall accuracy by patience:
      Soft overall acc: 52.90 (patience: 10 epochs)
      Soft overall acc: 52.85 (patience: 5 epochs)
      Hard overall acc: 52.82 (patience: 3 epochs)

  23. 5) Results

      Method                        Y/N     Num     Other   All
      Coattention_targoman          74.18   32.41   32.47   48.07
      lstmQ+VGG19(T)                75.58   32.61   33.53   49.15
      lstmQ+VGG19(baseline)         76.14   32.97   35.78   50.53
      SAN_LSTM_2_targoman           75.95   31.61   36.82   50.81
      SAN_CNN_2_Targoman            76.48   32.29   37.37   51.37
      Coattention_google            76.62   32.70   38.12   51.85
      BilstmQ+resNet152             76.46   31.63   38.60   51.89
      lstmQ+resNet152               76.83   31.75   38.77   52.13
      SAN_LSTM_1                    77.46   32.23   38.35   52.22
      SAN_LSTM_3_Google             77.12   32.56   38.62   52.27
      lstmQ+VGG19(en-paperToken)    78.43   33.70   37.99   52.58
      SAN_CNN_2_google              77.49   33.17   39.18   52.76
      lstmQ+VGG19 (en-kerasToken)   78.53   31.91   38.78   52.79
      cnnQ+resNet152                78.34   31.91   38.98   52.82
      SAN_LSTM_2_google             77.83   33.19   39.08   52.84

  24. 5) Results

      Method                        Y/N     Num     Other   All
      lstmQ+VGG19(T)                76.86   31.85   36.26   50.91
      lstmQ+VGG19(baseline)         76.74   32.50   36.98   51.30
      cnnQ+resNet152                78.38   32.36   38.99   52.90
      BilstmQ+resNet152             78.22   33.00   39.89   53.37
      lstmQ+resNet152               78.50   31.76   40.40   53.58
      lstmQ+VGG19 (en-kerasToken)   79.41   33.62   39.42   53.66
      lstmQ+VGG19(en-paperToken)    79.34   32.69   40.41   54.01

  25. Demo
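
Slide 22 reports "soft" and "hard" overall accuracy without defining either term. The sketch below shows one plausible reading, and both definitions are assumptions rather than the project's confirmed metric: "soft" is taken to be the standard VQA v1 consensus score, where a prediction earns min(matches / 3, 1) against the ten human answers collected per question (the annotation counts on slide 10 are ten times the question counts), and "hard" is taken to be an exact match with the most common human answer. The function names are hypothetical, and the sketch omits the answer-string normalization that an official evaluation would apply.

```python
from collections import Counter
from typing import Sequence


def soft_accuracy(predicted: str, human_answers: Sequence[str]) -> float:
    """Consensus score: full credit once at least 3 annotators gave this answer."""
    matches = sum(1 for answer in human_answers if answer == predicted)
    return min(matches / 3.0, 1.0)


def hard_accuracy(predicted: str, human_answers: Sequence[str]) -> float:
    """Assumed 'hard' score: 1.0 only if the prediction is the most common answer."""
    most_common, _count = Counter(human_answers).most_common(1)[0]
    return 1.0 if predicted == most_common else 0.0


# Ten made-up human answers for the example question on slide 3.
answers = ["red"] * 6 + ["pink"] * 2 + ["orange"] * 2

print(soft_accuracy("red", answers))   # 6 matches, capped at 1.0
print(soft_accuracy("pink", answers))  # 2 matches, partial credit of 2/3
print(hard_accuracy("pink", answers))  # not the majority answer, so 0.0
```

Under these assumptions the soft score can only be equal to or higher than the hard score for a majority answer, and it additionally rewards minority answers that several annotators agreed on.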

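Slides 21 and 22 pair the overfitting question with patience values of 3, 5, and 10 epochs, which suggests early stopping on a validation metric. The sketch below illustrates that mechanism in framework-free Python. The `EarlyStopping` class, its method names, and the accuracy sequence are hypothetical and are not taken from the project.

```python
class EarlyStopping:
    """Stop training when validation accuracy has not improved for `patience` epochs."""

    def __init__(self, patience: int):
        self.patience = patience
        self.best = float("-inf")
        self.best_epoch = -1
        self.stale_epochs = 0

    def update(self, epoch: int, val_acc: float) -> bool:
        """Record one epoch's result; return True when training should stop."""
        if val_acc > self.best:
            self.best = val_acc
            self.best_epoch = epoch
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


# Made-up validation accuracies, for illustration only.
val_accs = [48.0, 50.5, 52.1, 52.0, 51.8, 51.9, 51.5]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, acc in enumerate(val_accs):
    if stopper.update(epoch, acc):
        stopped_at = epoch
        break

print(stopper.best_epoch, stopper.best, stopped_at)
```

A larger patience lets training continue longer after a plateau, at the cost of more epochs. In practice the weights from the best epoch would be saved and restored, which this sketch leaves out.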