Dual SVM Formulation & Interpretation: Insights into Machine Learning

Exploring the Lagrangian duality in Support Vector Machines (SVM) through dual formulations, focusing on the linearly separable and non-separable cases. Uncover the significance of learning the dual SVM, its sparsity interpretation, and the kernel trick for faster problem-solving.

  • Machine Learning
  • Support Vector Machines
  • Dual Formulation
  • Lagrangian Duality
  • Kernel Trick


Presentation Transcript


  1. ECE 5424: Introduction to Machine Learning. Topics: SVM, SVM dual & kernels. Readings: Barber 17.5. Stefan Lee, Virginia Tech

  2. Lagrangian Duality: on paper. (C) Dhruv Batra
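
The duality derivation is worked on paper in lecture; as a reference point for the slides that follow, this is the standard setup (a summary, not the slide content itself):

```latex
% Primal problem with inequality constraints:
\min_{x} f(x) \quad \text{s.t.} \quad g_i(x) \le 0, \; i = 1, \dots, m
% Lagrangian, with multipliers \lambda_i \ge 0:
L(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x)
% Dual function and dual problem:
q(\lambda) = \min_{x} L(x, \lambda), \qquad \max_{\lambda \ge 0} q(\lambda)
% Weak duality always holds: \max_{\lambda \ge 0} q(\lambda) \le \min_{x} f(x).
```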

  3. Dual SVM derivation (1): the linearly separable case. (C) Dhruv Batra. Slide Credit: Carlos Guestrin

  4. Dual SVM derivation (2): the linearly separable case. (C) Dhruv Batra. Slide Credit: Carlos Guestrin

  5. Dual SVM formulation: the linearly separable case. (C) Dhruv Batra. Slide Credit: Carlos Guestrin
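
The formulation on this slide is an image in the transcript; for reference, the standard dual quadratic program for the linearly separable case, with one multiplier per margin constraint, is:

```latex
\max_{\alpha} \;\; \sum_{i=1}^{n} \alpha_i \;-\; \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)
\quad \text{s.t.} \quad \alpha_i \ge 0, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0
% Recover the primal solution from the dual:
% w = \sum_i \alpha_i y_i x_i, and b from any support vector (any i with \alpha_i > 0).
```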

  6. Dual SVM formulation: the non-separable case. (C) Dhruv Batra. Slide Credit: Carlos Guestrin

  7. Dual SVM formulation: the non-separable case. (C) Dhruv Batra. Slide Credit: Carlos Guestrin
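
For reference (the slide itself is an image): in the non-separable case the dual objective is unchanged, and the only difference from the separable case is a box constraint on the multipliers, where C is the slack penalty from the primal:

```latex
\max_{\alpha} \;\; \sum_{i=1}^{n} \alpha_i \;-\; \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0
```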

  8. Why did we learn about the dual SVM? Builds character! Exposes structure about the problem. There are some quadratic programming algorithms that can solve the dual faster than the primal. The kernel trick!!! (C) Dhruv Batra. Slide Credit: Carlos Guestrin

  9. Dual SVM interpretation: Sparsity. (C) Dhruv Batra. Slide Credit: Carlos Guestrin
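
The sparsity picture on this slide is an image. As a minimal illustration (my own sketch, not from the lecture; assumes scikit-learn is installed), the fitted dual variables are zero for all but a handful of points, and only those support vectors carry weight:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs: a linearly separable toy problem.
X, y = make_blobs(n_samples=200, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# alpha_i > 0 only for the support vectors; every other training
# point has alpha_i = 0 and does not affect the decision boundary.
print("training points:", len(X))
print("support vectors:", len(clf.support_))          # indices with nonzero alpha
print("dual coefficients (y_i * alpha_i):", clf.dual_coef_)
```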

  10. Dual formulation only depends on dot-products, not on w! (C) Dhruv Batra
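
Concretely: both the dual objective above and the learned decision function can be written so that inputs appear only inside dot products, which is what makes the kernel substitution possible later:

```latex
% Decision function in terms of the dual variables:
f(x) = \operatorname{sign}\Big( \sum_{i} \alpha_i y_i \, (x_i \cdot x) + b \Big)
% Training data enter only through x_i \cdot x_j (training) and
% x_i \cdot x (prediction); w = \sum_i \alpha_i y_i x_i never needs
% to be formed explicitly.
```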

  11. Dot-product of polynomials: vector of monomials of degree m. (C) Dhruv Batra. Slide Credit: Carlos Guestrin
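
A small worked example of the identity this slide illustrates, for degree m = 2 and two input dimensions:

```latex
\phi(u) = \big( u_1^2, \;\; \sqrt{2}\, u_1 u_2, \;\; u_2^2 \big)
% The feature-space dot product collapses to a power of the input dot product:
\phi(u) \cdot \phi(v) = u_1^2 v_1^2 + 2 u_1 u_2 v_1 v_2 + u_2^2 v_2^2 = (u \cdot v)^2
% With suitable multinomial coefficients on the monomials, this
% generalizes to \phi(u) \cdot \phi(v) = (u \cdot v)^m for any degree m.
```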

  12. Higher order polynomials: the number of monomial terms grows fast! [Figure: number of monomial terms vs. number of input dimensions (d), for polynomial degrees m = 2, 3, 4.] For degree m = 6 and d = 100 input features, D = about 1.6 billion terms. (C) Dhruv Batra. Slide Credit: Carlos Guestrin
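
The count behind the 1.6 billion figure: the number of degree-m monomials in d variables is a binomial coefficient, and the arithmetic checks out for m = 6, d = 100:

```latex
\#\{\text{monomials of degree } m \text{ in } d \text{ variables}\} = \binom{d + m - 1}{m}
% For m = 6, d = 100:
\binom{105}{6} = \frac{105 \cdot 104 \cdot 103 \cdot 102 \cdot 101 \cdot 100}{6!} \approx 1.6 \times 10^{9}
```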

  13. Common kernels: polynomials of degree d; polynomials of degree up to d; Gaussian kernel / radial basis function; sigmoid. (C) Dhruv Batra. Slide Credit: Carlos Guestrin
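
The formulas on this slide do not survive in the transcript; the standard definitions of these four kernels are:

```latex
K(u, v) = (u \cdot v)^d                                  % polynomials of degree d
K(u, v) = (u \cdot v + 1)^d                              % polynomials of degree up to d
K(u, v) = \exp\!\big( -\|u - v\|^2 / 2\sigma^2 \big)     % Gaussian / radial basis function
K(u, v) = \tanh(\eta \, u \cdot v + \nu)                 % sigmoid
```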

  14. Kernel Demo: http://www.eee.metu.edu.tr/~alatan/Courses/Demo/AppletSVM.html (C) Dhruv Batra

  15. What is a kernel? $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$: any measure of similarity between two inputs. A Mercer kernel / positive semi-definite kernel is often just called a kernel. (C) Dhruv Batra
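
A Mercer (positive semi-definite) kernel yields a PSD Gram matrix on any finite set of points. A quick numpy check (my sketch, not from the slides) for the degree-3 polynomial kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))          # 20 arbitrary points in R^5

# Gram matrix for k(u, v) = (u . v + 1)^3, built elementwise from X X^T.
K = (X @ X.T + 1.0) ** 3

# PSD <=> all eigenvalues nonnegative (up to numerical error).
eigvals = np.linalg.eigvalsh(K)
print("min eigenvalue:", eigvals.min())
print("PSD:", eigvals.min() > -1e-8)
```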

  16.–19. [Figure-only slides; no transcript text.] (C) Dhruv Batra. Slide Credit: Blaschko & Lampert

  20. Finally: the kernel trick! Never represent features explicitly; compute dot products in closed form. Constant-time high-dimensional dot-products for many classes of features. Very interesting theory: Reproducing Kernel Hilbert Spaces. (C) Dhruv Batra. Slide Credit: Carlos Guestrin
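
A minimal numpy illustration (my example; `phi` is a hypothetical explicit map, matching the degree-2 feature map from slide 11) that the kernel computes the feature-space dot product without ever forming the features:

```python
import numpy as np

def phi(u):
    # Explicit degree-2 feature map for a 2-D input (hypothetical example).
    return np.array([u[0] ** 2, np.sqrt(2) * u[0] * u[1], u[1] ** 2])

u = np.array([1.0, 2.0])
v = np.array([3.0, 0.5])

explicit = np.dot(phi(u), phi(v))   # dot product in feature space
kernel = np.dot(u, v) ** 2          # kernel trick: (u . v)^2 in input space

print(explicit, kernel)             # both are 16.0
print(np.isclose(explicit, kernel)) # True
```

Here the feature space is only 3-dimensional, but the same equivalence holds for the billion-term maps from slide 12, where the kernel evaluation costs O(d) while the explicit map is intractable.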

  21. Kernels in Computer Vision. Features: x = histogram (of color, texture, etc.). Common kernels: intersection kernel, chi-square kernel. (C) Dhruv Batra
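
For histograms h, h', these are commonly defined as follows (the chi-square kernel appears in several variants; one common exponential form is shown):

```latex
K_{\cap}(h, h') = \sum_{i} \min(h_i, h'_i)   % intersection kernel
K_{\chi^2}(h, h') = \exp\!\Big( -\gamma \sum_{i} \frac{(h_i - h'_i)^2}{h_i + h'_i} \Big)   % chi-square kernel
```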

  22. What about at classification time? For a new input x, if we need to represent $\phi(x)$, we are in trouble! Recall the classifier: $\operatorname{sign}(w \cdot \phi(x) + b)$. Using kernels we are fine! (C) Dhruv Batra. Slide Credit: Carlos Guestrin
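
Why kernels make prediction fine: substituting the dual expansion $w = \sum_i \alpha_i y_i \phi(x_i)$ into the classifier removes $\phi(x)$ entirely:

```latex
\operatorname{sign}\big( w \cdot \phi(x) + b \big)
= \operatorname{sign}\Big( \sum_{i} \alpha_i y_i \, \phi(x_i) \cdot \phi(x) + b \Big)
= \operatorname{sign}\Big( \sum_{i} \alpha_i y_i \, k(x_i, x) + b \Big)
```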

  23. Kernels in logistic regression. Define weights in terms of support vectors: $w = \sum_i \alpha_i \phi(x_i)$. Derive a simple gradient descent rule on $\alpha_i$.
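
A minimal numpy sketch of that gradient descent rule (my own derivation under the slide's parameterization, not code from the lecture), working directly on the Gram matrix so $\phi$ never appears:

```python
import numpy as np

def kernel_logreg_gd(K, y, lr=0.01, steps=1000):
    """Gradient descent on dual weights alpha, where w = sum_j alpha_j phi(x_j).

    K is the n x n Gram matrix, K[i, j] = k(x_i, x_j); y holds labels in {-1, +1}.
    Minimizes the logistic loss sum_i log(1 + exp(-y_i f(x_i))) with f = K alpha.
    """
    n = K.shape[0]
    alpha = np.zeros(n)
    for _ in range(steps):
        f = K @ alpha                     # f(x_i) = sum_j alpha_j k(x_j, x_i)
        s = 1.0 / (1.0 + np.exp(y * f))   # sigma(-y_i f(x_i))
        grad = -K @ (y * s)               # d(loss)/d(alpha), since K is symmetric
        alpha -= lr * grad
    return alpha

# A new point x is then classified by sign(sum_j alpha_j k(x_j, x)).
```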

  24. Kernels: Kernel Logistic Regression, Kernel Least Squares, Kernel PCA. (C) Dhruv Batra
