Machine Learning Model Selection and Bias-Variance Tradeoff

Explore model selection, error decomposition, and the bias-variance tradeoff in machine learning. Learn about Naïve Bayes classification, how to diagnose overfitting and underfitting, and why choosing the right model class, hyperparameters, and features matters for good performance.

  • Machine Learning
  • Model Selection
  • Bias-Variance Tradeoff
  • Naive Bayes
  • Error Decomposition


Presentation Transcript


  1. ECE 5424: Introduction to Machine Learning. Topics: (finish) model selection; error decomposition; bias-variance tradeoff; classification: Naïve Bayes. Readings: Barber 17.1, 17.2, 10.1–10.3. Stefan Lee, Virginia Tech

  2. Administrivia. HW2 due Wed 09/28, 11:55pm: implement linear regression, Naïve Bayes, and logistic regression. Project proposal due tomorrow by 11:55pm!! (C) Dhruv Batra 2

  3. Recap of last time (C) Dhruv Batra 3

  4. Regression (C) Dhruv Batra 4

  5.–7. [Figure slides] (C) Dhruv Batra, Slide Credit: Greg Shakhnarovich

  8. What you need to know: linear regression model; least-squares objective; connection to maximum likelihood with a Gaussian conditional; robust regression with a Laplacian likelihood; ridge regression with priors; polynomial and general additive regression. (C) Dhruv Batra 8
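
As a quick reference for the recap above, here is a minimal NumPy sketch of ordinary least squares and ridge regression via their normal equations. The synthetic data and the variable names (Phi, lam) are illustrative additions, not from the course materials.

```python
import numpy as np

# Synthetic 1-D regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 1))
y = 2.0 * X[:, 0] + 0.3 * rng.normal(size=50)

# Add a bias column so the weight vector includes an intercept term
Phi = np.hstack([np.ones((X.shape[0], 1)), X])

# Ordinary least squares: w = (Phi^T Phi)^{-1} Phi^T y
w_ols = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Ridge regression: w = (Phi^T Phi + lam * I)^{-1} Phi^T y
lam = 0.1
w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

print("OLS weights:  ", w_ols)
print("Ridge weights:", w_ridge)
```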

  9. Plan for Today: (finish) model selection; overfitting vs. underfitting; bias-variance trade-off (aka modeling error vs. estimation error tradeoff); Naïve Bayes. (C) Dhruv Batra 9

  10. New Topic: Model Selection and Error Decomposition (C) Dhruv Batra 10

  11. Model Selection: How do we pick the right model class? Similar questions: How do I pick magic hyperparameters? How do I do feature selection? (C) Dhruv Batra 11

  12. Errors: expected loss/error; training loss/error; validation loss/error; test loss/error. Reporting training error (instead of test error) is CHEATING. Optimizing parameters on test error is CHEATING. (C) Dhruv Batra 12
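
To make the "reporting training error is cheating" point concrete, below is a minimal sketch of a train/validation/test protocol, with k-fold cross-validation used to pick a ridge penalty. The data, the helper names (ridge_fit, kfold_cv_error), and the candidate lambda values are illustrative assumptions, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
y = X @ w_true + rng.normal(scale=0.5, size=200)

# Hold out a test set ONCE and never touch it while tuning
n_test = 40
X_trainval, y_trainval = X[:-n_test], y[:-n_test]
X_test, y_test = X[-n_test:], y[-n_test:]

def ridge_fit(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def mse(X, y, w):
    return np.mean((X @ w - y) ** 2)

def kfold_cv_error(X, y, lam, k=5):
    """Average validation MSE over k folds."""
    folds = np.array_split(np.arange(len(y)), k)
    errs = []
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), val_idx)
        w = ridge_fit(X[train_idx], y[train_idx], lam)
        errs.append(mse(X[val_idx], y[val_idx], w))
    return np.mean(errs)

# Pick the hyperparameter using validation error, never test error
lams = [0.01, 0.1, 1.0, 10.0]
best_lam = min(lams, key=lambda lam: kfold_cv_error(X_trainval, y_trainval, lam))

# Report test error only once, with the chosen hyperparameter
w_final = ridge_fit(X_trainval, y_trainval, best_lam)
print("chosen lambda:", best_lam, "test MSE:", mse(X_test, y_test, w_final))
```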

  13.–17. [Figure slides] (C) Dhruv Batra, Slide Credit: Greg Shakhnarovich

  18. Overfitting: a learning algorithm overfits the training data if it outputs a solution w when there exists another solution w′ such that: error_train(w) < error_train(w′) but error_true(w) > error_true(w′). (C) Dhruv Batra, Slide Credit: Carlos Guestrin 18
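
A quick way to see overfitting in this sense: fit polynomials of increasing degree to a small training set and watch training error keep falling while held-out error rises. The sketch below is an illustrative addition (synthetic sine data, degrees chosen arbitrarily), not part of the original deck.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n):
    x = rng.uniform(-1, 1, size=n)
    y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

for degree in [1, 3, 9, 15]:
    # Polynomial least-squares fit of the given degree
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Training error decreases monotonically with degree, while test error eventually increases: the high-degree fit overfits in exactly the sense of the definition above.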

  19.–21. Error Decomposition [figure slides contrasting "Reality" with the model; slide 21 adds "Higher-Order Potentials"] (C) Dhruv Batra

  22. Error Decomposition. Approximation/modeling error: you approximated reality with a model. Estimation error: you tried to learn the model with finite data. Optimization error: you were lazy and couldn't/didn't optimize to completion. Bayes error: reality just sucks (i.e. there is a lower bound on error for all models, usually non-zero). (C) Dhruv Batra 22

  23. Bias-Variance Tradeoff. Bias: difference between what you expect to learn and the truth; measures how well you expect to represent the true solution; decreases with a more complex model. Variance: difference between what you expect to learn and what you learn from a particular dataset; measures how sensitive the learner is to the specific dataset; increases with a more complex model. (C) Dhruv Batra, Slide Credit: Carlos Guestrin 23

  24. Bias-Variance Tradeoff Matlab demo (C) Dhruv Batra 24
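
The Matlab demo itself is not included in the transcript. As a stand-in, here is a rough Python sketch in the same spirit: it refits polynomials of several degrees on many resampled training sets and estimates bias² and variance empirically. The setup (sine target, noise level, degrees) is assumed, not taken from the course demo.

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: np.sin(2 * np.pi * x)      # "true" function (illustrative choice)
x_grid = np.linspace(-1, 1, 100)          # points where we compare predictions to truth

def fit_once(degree, n=25, noise=0.3):
    """Draw one training set and return this fit's predictions on x_grid."""
    x = rng.uniform(-1, 1, size=n)
    y = f(x) + noise * rng.normal(size=n)
    coeffs = np.polyfit(x, y, deg=degree)
    return np.polyval(coeffs, x_grid)

for degree in [1, 3, 9]:
    preds = np.stack([fit_once(degree) for _ in range(200)])
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - f(x_grid)) ** 2)   # (average prediction - truth)^2
    variance = np.mean(preds.var(axis=0))             # spread of predictions across datasets
    print(f"degree {degree}: bias^2 {bias_sq:.3f}, variance {variance:.3f}")
```

As the degree grows, the estimated bias² shrinks while the variance grows, which is the tradeoff stated on the next slide.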

  25. Bias-Variance Tradeoff: choice of hypothesis class introduces learning bias. More complex class → less bias; more complex class → more variance. (C) Dhruv Batra, Slide Credit: Carlos Guestrin 25

  26. (C) Dhruv Batra Slide Credit: Greg Shakhnarovich 26

  27. Learning Curves: error vs. size of the dataset (worked on the board); high-bias curves; high-variance curves. (C) Dhruv Batra 27
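
The board work is not in the transcript; the sketch below (reusing the same synthetic polynomial setup as above, which is my own choice) prints train and validation error as the training set grows. A high-bias model has both errors converging to a high plateau; a high-variance model shows a large train/validation gap that shrinks only as more data arrives.

```python
import numpy as np

rng = np.random.default_rng(4)
f = lambda x: np.sin(2 * np.pi * x)

def errors_vs_n(degree, sizes, noise=0.3, n_val=500):
    # Fixed validation set; training set size varies
    x_val = rng.uniform(-1, 1, size=n_val)
    y_val = f(x_val) + noise * rng.normal(size=n_val)
    for n in sizes:
        x = rng.uniform(-1, 1, size=n)
        y = f(x) + noise * rng.normal(size=n)
        coeffs = np.polyfit(x, y, deg=degree)
        train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
        val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
        print(f"degree {degree}, n={n:3d}: train {train_err:.3f}, val {val_err:.3f}")

errors_vs_n(degree=1, sizes=[10, 40, 160])   # high bias: both errors plateau high
errors_vs_n(degree=9, sizes=[10, 40, 160])   # high variance: large train/val gap early on
```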

  28. Debugging Machine Learning: my algorithm doesn't work (high test error), what should I do? Options: more training data; a smaller set of features; a larger set of features; lower regularization; higher regularization. (C) Dhruv Batra 28

  29. What you need to know: generalization error decomposition (approximation, estimation, optimization, Bayes error); for squared losses, the bias-variance tradeoff; errors (difference between train, test, and expected error); cross-validation (and cross-validation error); NEVER EVER learn on test data; overfitting vs. underfitting. (C) Dhruv Batra 29

  30. New Topic: Naïve Bayes (your first probabilistic classifier). Classification: x → y, with y discrete. (C) Dhruv Batra 30

  31. Classification. Learn h: X → Y (X: features, Y: target classes). Suppose you know P(Y|X) exactly; how should you classify? Bayes classifier: h_Bayes(x) = argmax_y P(Y = y | X = x). Why? Slide Credit: Carlos Guestrin
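
A tiny worked example of this decision rule, with posterior probabilities invented purely for illustration:

```python
# Hypothetical posterior P(Y = y | X = x) over three classes (numbers made up)
posterior = {"cat": 0.2, "dog": 0.5, "bird": 0.3}

# Bayes classifier: predict the class with the largest posterior probability
y_hat = max(posterior, key=posterior.get)
print(y_hat)  # "dog", since P(dog | x) = 0.5 is the largest posterior
```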

  32. Optimal classification. Theorem: the Bayes classifier h_Bayes is optimal! That is, error_true(h_Bayes) ≤ error_true(h) for any classifier h. Proof: [on slide]. Slide Credit: Carlos Guestrin

  33. Generative vs. Discriminative. Generative approach: estimate p(x|y) and p(y); use Bayes rule to predict y. Discriminative approach: estimate p(y|x) directly, OR learn a discriminant function h(x). (C) Dhruv Batra 33

  34. Generative vs. Discriminative. Generative approach: assume some functional form for P(X|Y) and P(Y); estimate P(X|Y) and P(Y); use Bayes rule to calculate P(Y|X=x). Indirect computation of P(Y|X) through Bayes rule, but can generate a sample, since P(X) = Σ_y P(y) P(X|y). Discriminative approach: estimate p(y|x) directly, OR learn a discriminant function h(x). Direct, but cannot obtain a sample of the data, because P(X) is not available. (C) Dhruv Batra 34

  35. Generative vs. Discriminative. Generative, today: Naïve Bayes. Discriminative, next: logistic regression. NB and LR are related to each other. (C) Dhruv Batra 35

  36. How hard is it to learn the optimal classifier? Categorical data: how do we represent these distributions, and how many parameters do they need? Class prior P(Y): suppose Y is composed of k classes. Likelihood P(X|Y): suppose X is composed of d binary features. Complex model → high variance with limited data!!! Slide Credit: Carlos Guestrin
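
The parameter counts the slide asks for are not reproduced in the transcript. The standard counting argument gives k − 1 parameters for the class prior and k(2^d − 1) for a full joint likelihood over d binary features, i.e. exponential in d; the snippet below just does that arithmetic, and previews the Naïve Bayes count from slide 38 for comparison.

```python
# Parameter counts for k classes and d binary features (standard counting argument)

def full_model_params(k, d):
    prior = k - 1                  # P(Y): k probabilities constrained to sum to 1
    likelihood = k * (2 ** d - 1)  # P(X|Y=y): one distribution over 2^d outcomes per class
    return prior + likelihood

def naive_bayes_params(k, d):
    prior = k - 1
    likelihood = k * d             # one Bernoulli parameter per feature per class
    return prior + likelihood

print(full_model_params(k=2, d=30))   # 2147483647, roughly 2.1 billion parameters
print(naive_bayes_params(k=2, d=30))  # 61 parameters
```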

  37. Independence to the rescue (C) Dhruv Batra Slide Credit: Sam Roweis 37

  38. The Naïve Bayes assumption. Naïve Bayes assumption: features are independent given the class: P(X_1, X_2 | Y) = P(X_1 | Y) P(X_2 | Y). More generally: P(X_1, ..., X_d | Y) = ∏_i P(X_i | Y). How many parameters now? Suppose X is composed of d binary features: only d per class, i.e. kd in total (plus k − 1 for the prior), instead of k(2^d − 1). (C) Dhruv Batra, Slide Credit: Carlos Guestrin 38

  39. The Naïve Bayes Classifier. Given: class prior P(Y); d conditionally independent features X given the class Y; for each X_i, a likelihood P(X_i|Y). Decision rule: y* = h_NB(x) = argmax_y P(y) ∏_i P(x_i | y). If the assumption holds, NB is the optimal classifier! (C) Dhruv Batra, Slide Credit: Carlos Guestrin 39
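
To ground the decision rule, here is a minimal from-scratch Bernoulli Naïve Bayes in Python. It is a sketch of the technique, not course-provided code: the class name, the toy data, and the add-one (Laplace) smoothing are my own additions (smoothing is not mentioned on the slide but avoids zero probabilities).

```python
import numpy as np

class BernoulliNaiveBayes:
    """Naive Bayes for binary features, with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        n, d = X.shape
        self.log_prior = np.zeros(len(self.classes))
        self.log_theta = np.zeros((len(self.classes), d))      # log P(x_i = 1 | y)
        self.log_one_minus = np.zeros((len(self.classes), d))  # log P(x_i = 0 | y)
        for c_idx, c in enumerate(self.classes):
            Xc = X[y == c]
            self.log_prior[c_idx] = np.log(len(Xc) / n)
            theta = (Xc.sum(axis=0) + 1) / (len(Xc) + 2)        # add-one smoothing
            self.log_theta[c_idx] = np.log(theta)
            self.log_one_minus[c_idx] = np.log(1 - theta)
        return self

    def predict(self, X):
        # log P(y) + sum_i log P(x_i | y), computed for every class and sample
        log_joint = (self.log_prior[:, None]
                     + self.log_theta @ X.T
                     + self.log_one_minus @ (1 - X).T)
        return self.classes[np.argmax(log_joint, axis=0)]

# Toy data: 4 binary features, 2 classes (illustrative only)
X = np.array([[1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 1]])
y = np.array([0, 0, 1, 1])
print(BernoulliNaiveBayes().fit(X, y).predict(X))  # -> [0 0 1 1]
```

Working in log space turns the product ∏_i P(x_i | y) into a sum, which avoids numerical underflow when d is large.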
