Dual SVM Formulation & Interpretation: Insights into Machine Learning

Exploring the Lagrangian duality in Support Vector Machines (SVM) through dual formulations, focusing on the linearly separable and non-separable cases. Uncover the significance of learning the dual SVM, its sparsity interpretation, and the kernel trick for faster problem-solving.

  • Machine Learning
  • Support Vector Machines
  • Dual Formulation
  • Lagrangian Duality
  • Kernel Trick


Presentation Transcript


  1. ECE 5424: Introduction to Machine Learning. Topics: SVM, SVM dual & kernels. Readings: Barber 17.5. Stefan Lee, Virginia Tech

  2. Lagrangian Duality: on paper. (C) Dhruv Batra
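
The duality derivation is worked on paper in lecture; as a reference point for the slides that follow, this is the standard setup (a summary, not the slide content itself):

```latex
% Primal problem with inequality constraints:
\min_{x} f(x) \quad \text{s.t.} \quad g_i(x) \le 0, \; i = 1, \dots, m
% Lagrangian, with multipliers \lambda_i \ge 0:
L(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x)
% Dual function and dual problem:
q(\lambda) = \min_{x} L(x, \lambda), \qquad \max_{\lambda \ge 0} q(\lambda)
% Weak duality always holds: \max_{\lambda \ge 0} q(\lambda) \le \min_{x} f(x).
```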

  3. Dual SVM derivation (1): the linearly separable case. (C) Dhruv Batra. Slide Credit: Carlos Guestrin

  4. Dual SVM derivation (2): the linearly separable case. (C) Dhruv Batra. Slide Credit: Carlos Guestrin

  5. Dual SVM formulation: the linearly separable case. (C) Dhruv Batra. Slide Credit: Carlos Guestrin
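
The formulation on this slide is an image in the transcript; for reference, the standard dual quadratic program for the linearly separable case, with one multiplier per margin constraint, is:

```latex
\max_{\alpha} \;\; \sum_{i=1}^{n} \alpha_i \;-\; \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)
\quad \text{s.t.} \quad \alpha_i \ge 0, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0
% Recover the primal solution from the dual:
% w = \sum_i \alpha_i y_i x_i, and b from any support vector (any i with \alpha_i > 0).
```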

  6. Dual SVM formulation: the non-separable case. (C) Dhruv Batra. Slide Credit: Carlos Guestrin

  7. Dual SVM formulation: the non-separable case. (C) Dhruv Batra. Slide Credit: Carlos Guestrin
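
For reference (the slide itself is an image): in the non-separable case the dual objective is unchanged, and the only difference from the separable case is a box constraint on the multipliers, where C is the slack penalty from the primal:

```latex
\max_{\alpha} \;\; \sum_{i=1}^{n} \alpha_i \;-\; \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0
```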

  8. Why did we learn about the dual SVM? Builds character! Exposes structure about the problem. There are some quadratic programming algorithms that can solve the dual faster than the primal. The kernel trick!!! (C) Dhruv Batra. Slide Credit: Carlos Guestrin

  9. Dual SVM interpretation: Sparsity. (C) Dhruv Batra. Slide Credit: Carlos Guestrin
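
The sparsity picture on this slide is an image. As a minimal illustration (my own sketch, not from the lecture; assumes scikit-learn is installed), the fitted dual variables are zero for all but a handful of points, and only those support vectors carry weight:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs: a linearly separable toy problem.
X, y = make_blobs(n_samples=200, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# alpha_i > 0 only for the support vectors; every other training
# point has alpha_i = 0 and does not affect the decision boundary.
print("training points:", len(X))
print("support vectors:", len(clf.support_))          # indices with nonzero alpha
print("dual coefficients (y_i * alpha_i):", clf.dual_coef_)
```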

  10. Dual formulation only depends on dot-products, not on w! (C) Dhruv Batra
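
Concretely: both the dual objective above and the learned decision function can be written so that inputs appear only inside dot products, which is what makes the kernel substitution possible later:

```latex
% Decision function in terms of the dual variables:
f(x) = \operatorname{sign}\Big( \sum_{i} \alpha_i y_i \, (x_i \cdot x) + b \Big)
% Training data enter only through x_i \cdot x_j (training) and
% x_i \cdot x (prediction); w = \sum_i \alpha_i y_i x_i never needs
% to be formed explicitly.
```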

  11. Dot-product of polynomials: vector of monomials of degree m. (C) Dhruv Batra. Slide Credit: Carlos Guestrin
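
A small worked example of the identity this slide illustrates, for degree m = 2 and two input dimensions:

```latex
\phi(u) = \big( u_1^2, \;\; \sqrt{2}\, u_1 u_2, \;\; u_2^2 \big)
% The feature-space dot product collapses to a power of the input dot product:
\phi(u) \cdot \phi(v) = u_1^2 v_1^2 + 2 u_1 u_2 v_1 v_2 + u_2^2 v_2^2 = (u \cdot v)^2
% With suitable multinomial coefficients on the monomials, this
% generalizes to \phi(u) \cdot \phi(v) = (u \cdot v)^m for any degree m.
```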

  12. Higher order polynomials: the number of monomial terms grows fast! [Figure: number of monomial terms vs. number of input dimensions (d), for polynomial degrees m = 2, 3, 4.] For degree m = 6 and d = 100 input features, D = about 1.6 billion terms. (C) Dhruv Batra. Slide Credit: Carlos Guestrin
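
The count behind the 1.6 billion figure: the number of degree-m monomials in d variables is a binomial coefficient, and the arithmetic checks out for m = 6, d = 100:

```latex
\#\{\text{monomials of degree } m \text{ in } d \text{ variables}\} = \binom{d + m - 1}{m}
% For m = 6, d = 100:
\binom{105}{6} = \frac{105 \cdot 104 \cdot 103 \cdot 102 \cdot 101 \cdot 100}{6!} \approx 1.6 \times 10^{9}
```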

  13. Common kernels: polynomials of degree d; polynomials of degree up to d; Gaussian kernel / radial basis function; sigmoid. (C) Dhruv Batra. Slide Credit: Carlos Guestrin
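
The formulas on this slide do not survive in the transcript; the standard definitions of these four kernels are:

```latex
K(u, v) = (u \cdot v)^d                                  % polynomials of degree d
K(u, v) = (u \cdot v + 1)^d                              % polynomials of degree up to d
K(u, v) = \exp\!\big( -\|u - v\|^2 / 2\sigma^2 \big)     % Gaussian / radial basis function
K(u, v) = \tanh(\eta \, u \cdot v + \nu)                 % sigmoid
```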

  14. Kernel Demo: http://www.eee.metu.edu.tr/~alatan/Courses/Demo/AppletSVM.html (C) Dhruv Batra

  15. What is a kernel? $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$: any measure of similarity between two inputs. A Mercer kernel / positive semi-definite kernel is often just called a kernel. (C) Dhruv Batra
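
A Mercer (positive semi-definite) kernel yields a PSD Gram matrix on any finite set of points. A quick numpy check (my sketch, not from the slides) for the degree-3 polynomial kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))          # 20 arbitrary points in R^5

# Gram matrix for k(u, v) = (u . v + 1)^3, built elementwise from X X^T.
K = (X @ X.T + 1.0) ** 3

# PSD <=> all eigenvalues nonnegative (up to numerical error).
eigvals = np.linalg.eigvalsh(K)
print("min eigenvalue:", eigvals.min())
print("PSD:", eigvals.min() > -1e-8)
```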

  16.–19. [Figure-only slides; no transcript text.] (C) Dhruv Batra. Slide Credit: Blaschko & Lampert

  20. Finally: the kernel trick! Never represent features explicitly; compute dot products in closed form. Constant-time high-dimensional dot-products for many classes of features. Very interesting theory: Reproducing Kernel Hilbert Spaces. (C) Dhruv Batra. Slide Credit: Carlos Guestrin
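
A minimal numpy illustration (my example; `phi` is a hypothetical explicit map, matching the degree-2 feature map from slide 11) that the kernel computes the feature-space dot product without ever forming the features:

```python
import numpy as np

def phi(u):
    # Explicit degree-2 feature map for a 2-D input (hypothetical example).
    return np.array([u[0] ** 2, np.sqrt(2) * u[0] * u[1], u[1] ** 2])

u = np.array([1.0, 2.0])
v = np.array([3.0, 0.5])

explicit = np.dot(phi(u), phi(v))   # dot product in feature space
kernel = np.dot(u, v) ** 2          # kernel trick: (u . v)^2 in input space

print(explicit, kernel)             # both are 16.0
print(np.isclose(explicit, kernel)) # True
```

Here the feature space is only 3-dimensional, but the same equivalence holds for the billion-term maps from slide 12, where the kernel evaluation costs O(d) while the explicit map is intractable.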

  21. Kernels in Computer Vision. Features: x = histogram (of color, texture, etc.). Common kernels: intersection kernel, chi-square kernel. (C) Dhruv Batra
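
For histograms h, h', these are commonly defined as follows (the chi-square kernel appears in several variants; one common exponential form is shown):

```latex
K_{\cap}(h, h') = \sum_{i} \min(h_i, h'_i)   % intersection kernel
K_{\chi^2}(h, h') = \exp\!\Big( -\gamma \sum_{i} \frac{(h_i - h'_i)^2}{h_i + h'_i} \Big)   % chi-square kernel
```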

  22. What about at classification time? For a new input x, if we need to represent $\phi(x)$, we are in trouble! Recall the classifier: $\operatorname{sign}(w \cdot \phi(x) + b)$. Using kernels we are fine! (C) Dhruv Batra. Slide Credit: Carlos Guestrin
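
Why kernels make prediction fine: substituting the dual expansion $w = \sum_i \alpha_i y_i \phi(x_i)$ into the classifier removes $\phi(x)$ entirely:

```latex
\operatorname{sign}\big( w \cdot \phi(x) + b \big)
= \operatorname{sign}\Big( \sum_{i} \alpha_i y_i \, \phi(x_i) \cdot \phi(x) + b \Big)
= \operatorname{sign}\Big( \sum_{i} \alpha_i y_i \, k(x_i, x) + b \Big)
```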

  23. Kernels in logistic regression. Define weights in terms of support vectors: $w = \sum_i \alpha_i \phi(x_i)$. Derive a simple gradient descent rule on $\alpha_i$.
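
A minimal numpy sketch of that gradient descent rule (my own derivation under the slide's parameterization, not code from the lecture), working directly on the Gram matrix so $\phi$ never appears:

```python
import numpy as np

def kernel_logreg_gd(K, y, lr=0.01, steps=1000):
    """Gradient descent on dual weights alpha, where w = sum_j alpha_j phi(x_j).

    K is the n x n Gram matrix, K[i, j] = k(x_i, x_j); y holds labels in {-1, +1}.
    Minimizes the logistic loss sum_i log(1 + exp(-y_i f(x_i))) with f = K alpha.
    """
    n = K.shape[0]
    alpha = np.zeros(n)
    for _ in range(steps):
        f = K @ alpha                     # f(x_i) = sum_j alpha_j k(x_j, x_i)
        s = 1.0 / (1.0 + np.exp(y * f))   # sigma(-y_i f(x_i))
        grad = -K @ (y * s)               # d(loss)/d(alpha), since K is symmetric
        alpha -= lr * grad
    return alpha

# A new point x is then classified by sign(sum_j alpha_j k(x_j, x)).
```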

  24. Kernels: Kernel Logistic Regression, Kernel Least Squares, Kernel PCA. (C) Dhruv Batra
