Understanding Backpropagation and Convolutional Neural Networks

This lecture covers backpropagation and convolutional neural networks in computer vision, with examples and explanations. It reviews how to find parameters that make the loss small, with gradient computation via backpropagation as the crucial step.

  • Backpropagation
  • Convolutional Neural Networks
  • Computer Vision
  • Optimization
  • Gradient Computation

Presentation Transcript


  1. CS5670: Computer Vision Noah Snavely Lecture 25: Backprop and convnets Image credit: Aphex34, [CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0)] Slides from Andrej Karpathy and Fei-Fei Li http://vision.stanford.edu/teaching/cs231n/

  2. Review: Setup. [Diagram: x → Function(θ^(1)) → h^(1) → Function(θ^(2)) → h^(2) → … → s; the score s is compared with the label y to produce the loss L.] Goal: find a value for the parameters (θ^(1), θ^(2), …) so that the loss L is small.

  3. Review: Setup. Toy example: the first function computes h^(1) = W^(1)x + b^(1) with parameters W^(1), b^(1); the remaining functions map h^(1) → h^(2) → … → s, and s is compared with y to produce the loss L.
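As a concrete companion to the toy example above, here is a minimal NumPy sketch of the forward pass; the layer sizes, random values, and squared-error loss are illustrative assumptions, not taken from the slides.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative sizes: a 4-d input, a 3-unit hidden layer, a 1-d score s.
    x = rng.normal(size=4)
    W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)   # parameters theta^(1)
    W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # parameters theta^(2)
    y = np.array([1.0])                             # target label

    h1 = W1 @ x + b1                  # first Function: h^(1) = W^(1) x + b^(1)
    s = W2 @ h1 + b2                  # second Function produces the score s
    L = 0.5 * np.sum((s - y) ** 2)    # loss compares s with y (squared error, assumed)
    print(L)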

  4. Review: Setup. Toy example: [plot of the loss L as a function of a single weight W^(1)_12, "a weight somewhere in the network".]

  5. Review: Setup. [Same toy-example plot: the loss L versus the weight W^(1)_12.]

  6. Review: Setup. [Same toy-example plot: the loss L versus the weight W^(1)_12.]

  7. Review: Setup. [Same toy-example plot: the loss L versus the weight W^(1)_12.]

  8. Review: Setup. Toy example: [on the plot of L versus W^(1)_12, the gradient ∂L/∂W^(1)_12 is marked as the slope of the loss at the current value of the weight.]

  9. Review: Setup. Toy (gradient) example: [the same plot, with the gradient ∂L/∂W^(1)_12 shown as the local slope of the loss curve.]

  10. Review: Setup. Toy (gradient) example: [using the gradient ∂L/∂W^(1)_12, take a step downhill on the loss curve.]

  11. Review: Setup. Toy (gradient) example: How do we get the gradient? Backpropagation. [Same plot of the loss L versus the weight W^(1)_12.]
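Before backpropagation is introduced, the toy gradient example can be made concrete with a finite-difference estimate of ∂L/∂W^(1)_12 followed by one downhill step. This is a sketch under the same assumed toy network; it also hints at why backprop is needed, since estimating every gradient this way costs extra forward passes per parameter.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=4)
    W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
    W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
    y = np.array([1.0])

    def loss(W1):
        """Forward pass through the assumed toy network for a given W^(1)."""
        h1 = W1 @ x + b1
        s = W2 @ h1 + b2
        return 0.5 * np.sum((s - y) ** 2)

    # Finite-difference estimate of dL/dW^(1)_12 (the entry at row 1, column 2).
    eps = 1e-6
    W_plus, W_minus = W1.copy(), W1.copy()
    W_plus[1, 2] += eps
    W_minus[1, 2] -= eps
    grad_12 = (loss(W_plus) - loss(W_minus)) / (2 * eps)

    # Take a step: nudge W^(1)_12 downhill (the step size 0.1 is an arbitrary choice).
    W1[1, 2] -= 0.1 * grad_12
    print(grad_12, loss(W1))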

  12. Backprop: it's just the chain rule.

  13. Backpropagation [Rumelhart, Hinton, Williams. Nature 1986]

  14. Chain rule recap. I hope everyone remembers the chain rule: ∂L/∂x = (∂L/∂h) · (∂h/∂x).

  15. Chain rule recap: ∂L/∂x = (∂L/∂h) · (∂h/∂x). Forward propagation computes x → h → L; backward propagation passes ∂L/∂h back through the same function to produce ∂L/∂x.

  16. Chain rule recap: ∂L/∂x = (∂L/∂h) · (∂h/∂x). Forward propagation: x → h → L; backward propagation: ∂L/∂h → ∂L/∂x. (Extends easily to multi-dimensional x and y.)
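A minimal sketch of the chain rule as paired forward/backward passes, using an assumed scalar composition (h = x² and L = sin h): the backward pass multiplies the incoming gradient ∂L/∂h by the local derivative ∂h/∂x.

    import numpy as np

    x = 0.7

    # Forward propagation: x -> h -> L
    h = x ** 2          # assumed intermediate function
    L = np.sin(h)       # assumed loss

    # Backward propagation: dL/dh -> dL/dx via the chain rule
    dL_dh = np.cos(h)        # derivative of L with respect to h
    dh_dx = 2 * x            # local derivative of h with respect to x
    dL_dx = dL_dh * dh_dx    # chain rule: dL/dx = dL/dh * dh/dx

    # Sanity check against a finite-difference estimate.
    eps = 1e-6
    numeric = (np.sin((x + eps) ** 2) - np.sin((x - eps) ** 2)) / (2 * eps)
    print(dL_dx, numeric)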

  17. Slide from Karpathy 2016

  18. Slide from Karpathy 2016

  19. Slide from Karpathy 2016

  20. Slide from Karpathy 2016

  21. Slide from Karpathy 2016

  22. Slide from Karpathy 2016

  23. Gradients add at branches. [Diagram: an activation is sent to two branches in the forward pass.]

  24. Gradients add at branches. [Diagram: a gradient comes back from each branch.]

  25. Gradients add at branches. [Diagram: the gradients returning from the branches are summed (+).]

  26. Gradients copy through sums. [Diagram: two activations are added (+) in the forward pass.]

  27. Gradients copy through sums. [Diagram: a gradient arrives at the output of the sum.]

  28. Gradients copy through sums. [Diagram: the gradient is copied back to each input of the sum.]

  29. Gradients copy through sums. The gradient flows through both branches at full strength.

  30. Symmetry between forward and backward: forward copy ↔ backward add; forward add ↔ backward copy.
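A tiny scalar sketch of this symmetry, with assumed values and functions: a value that is copied into two branches in the forward pass has the two incoming gradients added in the backward pass, while a sum node copies its incoming gradient to each input at full strength.

    # Forward: a is copied into two branches, whose outputs are then summed.
    a = 3.0
    b = 2.0 * a     # branch 1 (assumed function, local gradient db/da = 2)
    c = a + 5.0     # branch 2 (assumed function, local gradient dc/da = 1)
    L = b + c       # a sum node

    # Backward, starting from dL/dL = 1.
    dL_dL = 1.0
    # Sum node: the gradient is copied to both inputs at full strength.
    dL_db = dL_dL
    dL_dc = dL_dL
    # Branch point: a was used twice, so the gradients from both uses add.
    dL_da = dL_db * 2.0 + dL_dc * 1.0
    print(dL_da)    # 3.0, matching d(3a + 5)/da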

  31. Forward propagation: x → Function(θ^(1)) → h^(1) → … → Function(θ^(n)) → s → L.

  32. Forward propagation: x → Function(θ^(1)) → h^(1) → … → Function(θ^(n)) → s → L. Backward propagation:

  33. Backward propagation starts at the loss L.

  34. Backward propagation: compute ∂L/∂s.

  35. Backward propagation: from ∂L/∂s, the last function gives ∂L/∂θ^(n).

  36. Backward propagation: the gradient ∂L/∂h^(1) is also propagated back to the output of the first function.

  37. Backward propagation: finally, the first function gives ∂L/∂θ^(1) and ∂L/∂x.
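Putting forward and backward propagation together for the assumed two-layer toy network (squared-error loss, illustrative sizes): the backward pass produces ∂L/∂s, then ∂L/∂θ^(2) and ∂L/∂h^(1), then ∂L/∂θ^(1) and ∂L/∂x, working right to left as in the slides.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=4)
    W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)   # theta^(1)
    W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # theta^(2)
    y = np.array([1.0])

    # Forward propagation, caching the intermediate activations.
    h1 = W1 @ x + b1
    s = W2 @ h1 + b2
    L = 0.5 * np.sum((s - y) ** 2)

    # Backward propagation, right to left.
    dL_ds = s - y                  # gradient of the (assumed) squared-error loss
    dL_dW2 = np.outer(dL_ds, h1)   # dL/dtheta^(2)
    dL_db2 = dL_ds
    dL_dh1 = W2.T @ dL_ds          # gradient propagated to the previous layer
    dL_dW1 = np.outer(dL_dh1, x)   # dL/dtheta^(1)
    dL_db1 = dL_dh1
    dL_dx = W1.T @ dL_dh1          # gradient with respect to the input
    print(dL_dW1.shape, dL_dW2.shape, dL_dx.shape)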

  38. What to do for each layer

  39. [Diagram: h^(n−1) → Layer n (parameters θ^(n)) → h^(n) → Layer n+1 → … → L, with the gradients ∂L/∂θ^(n) and ∂L/∂h^(n) marked.]

  40. ∂L/∂θ^(n): this is what we want for each layer.

  41. ∂L/∂θ^(n) is what we want for each layer. To compute it, we need to propagate the gradient ∂L/∂h^(n) arriving from layer n+1.

  42. For each layer:

  43. For each layer: ∂L/∂θ^(n) = (∂L/∂h^(n)) · (∂h^(n)/∂θ^(n))  ← what we want.

  44. The same formula, written with the propagated gradient first: ∂L/∂θ^(n) = (∂L/∂h^(n)) · (∂h^(n)/∂θ^(n))  ← what we want.

  45. The factor ∂h^(n)/∂θ^(n) is just the local gradient of layer n.

  46. We also propagate a gradient to the previous layer: ∂L/∂h^(n−1) = (∂L/∂h^(n)) · (∂h^(n)/∂h^(n−1)).

  47. ∂L/∂h^(n−1) = (∂L/∂h^(n)) · (∂h^(n)/∂h^(n−1)), where ∂h^(n)/∂h^(n−1) is again a local gradient of layer n.

  48. Per layer, then: ∂L/∂θ^(n) = (∂L/∂h^(n)) · (∂h^(n)/∂θ^(n)) and ∂L/∂h^(n−1) = (∂L/∂h^(n)) · (∂h^(n)/∂h^(n−1)); both use only the upstream gradient ∂L/∂h^(n) and the local gradients of layer n.
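The per-layer recipe can be packaged as a small layer object whose backward method takes the propagated gradient ∂L/∂h^(n) from the right and returns both the parameter gradient (upstream gradient times local gradient) and the gradient ∂L/∂h^(n−1) to pass further left. A sketch for an assumed fully-connected layer h^(n) = W h^(n−1) + b:

    import numpy as np

    class Linear:
        """One layer h^(n) = W h^(n-1) + b, with its local backward rule."""

        def __init__(self, W, b):
            self.W, self.b = W, b

        def forward(self, h_in):
            self.h_in = h_in             # cache the input; needed for the local gradient
            return self.W @ h_in + self.b

        def backward(self, dL_dh_out):
            # Parameter gradients: upstream gradient times the local gradients.
            dL_dW = np.outer(dL_dh_out, self.h_in)
            dL_db = dL_dh_out
            # Gradient propagated to the left: upstream gradient times dh_out/dh_in = W.
            dL_dh_in = self.W.T @ dL_dh_out
            return (dL_dW, dL_db), dL_dh_in

    rng = np.random.default_rng(0)
    layer = Linear(rng.normal(size=(3, 4)), np.zeros(3))
    h_out = layer.forward(rng.normal(size=4))
    (dL_dW, dL_db), dL_dh_in = layer.backward(np.ones(3))   # pretend dL/dh^(n) is all ones
    print(dL_dW.shape, dL_dh_in.shape)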

  49. Summary. For each layer, we compute: propagated gradient to the left = (propagated gradient from the right) × (local gradient).

  50. Summary. For each layer, we compute: propagated gradient to the left = (propagated gradient from the right) × (local gradient). The local gradient is something we can compute immediately, from the layer alone.
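A quick way to confirm that the per-layer rule (propagated gradient from the right times the local gradient) gives correct results is to compare a backprop gradient against a numerical one; a sketch using the same assumed toy network as in the earlier examples:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=4)
    W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
    W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
    y = np.array([1.0])

    def loss(W1):
        h1 = W1 @ x + b1
        s = W2 @ h1 + b2
        return 0.5 * np.sum((s - y) ** 2)

    # Backprop gradient for W^(1): upstream gradient times local gradients, right to left.
    h1 = W1 @ x + b1
    s = W2 @ h1 + b2
    dL_dW1 = np.outer(W2.T @ (s - y), x)

    # Numerical gradient for one entry, for comparison.
    eps = 1e-6
    Wp, Wm = W1.copy(), W1.copy()
    Wp[1, 2] += eps
    Wm[1, 2] -= eps
    print(dL_dW1[1, 2], (loss(Wp) - loss(Wm)) / (2 * eps))   # should agree closely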
