Understanding Logistic Regression with Hung-yi Lee

Dive into the world of logistic regression with Hung-yi Lee in this comprehensive guide. Explore the fundamental concepts, steps, and applications of logistic regression, accompanied by visual aids and explanations. Enhance your knowledge of classification techniques and model evaluation through this informative content.

  • Logistic Regression
  • Hung-yi Lee
  • Classification
  • Data Science
  • Machine Learning


Presentation Transcript


  1. Classification: Logistic Regression Hung-yi Lee

  2. Step 1: Function Set. The function set includes all different w and b: $f_{w,b}(x) = P_{w,b}(C_1|x) = \sigma(z)$, where $z = w \cdot x + b = \sum_i w_i x_i + b$ and $\sigma(z) = \frac{1}{1 + \exp(-z)}$. If $P_{w,b}(C_1|x) \ge 0.5$, output class 1; if $P_{w,b}(C_1|x) < 0.5$, output class 2.
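
As a minimal sketch of this function set (not from the slides; the names `sigmoid`, `f_wb`, and `predict` and the NumPy-based signatures are my own choices):

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + exp(-z)), maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def f_wb(x, w, b):
    # f_{w,b}(x) = P_{w,b}(C1 | x) = sigma(w . x + b)
    return sigmoid(np.dot(w, x) + b)

def predict(x, w, b):
    # class 1 if P_{w,b}(C1 | x) >= 0.5, otherwise class 2
    return 1 if f_wb(x, w, b) >= 0.5 else 2
```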

  3. Step 1: Function Set. $z = w \cdot x + b = \sum_i w_i x_i + b$ and $P_{w,b}(C_1|x) = \sigma(z)$. (Figure: the sigmoid function $\sigma(z) = \frac{1}{1 + e^{-z}}$, which squashes any real $z$ into the interval $(0, 1)$.)

  4. Step 2: Goodness of a Function. Training data: $(x^1, C_1), (x^2, C_1), (x^3, C_2), \ldots, (x^N, \cdot)$. Assume the data is generated based on $f_{w,b}(x) = P_{w,b}(C_1|x)$. Given a set of w and b, what is its probability of generating the data? $L(w,b) = f_{w,b}(x^1)\, f_{w,b}(x^2)\, \left(1 - f_{w,b}(x^3)\right) \cdots f_{w,b}(x^N)$. The most likely $w^*$ and $b^*$ are the ones with the largest $L(w,b)$: $w^*, b^* = \arg\max_{w,b} L(w,b)$.

  5. Relabel the targets as $\hat{y}^n$: 1 for class 1, 0 for class 2, so here $\hat{y}^1 = 1$, $\hat{y}^2 = 1$, $\hat{y}^3 = 0$. Then $w^*, b^* = \arg\max_{w,b} L(w,b) = \arg\min_{w,b} \left(-\ln L(w,b)\right)$, and $-\ln L(w,b) = -\ln f_{w,b}(x^1) - \ln f_{w,b}(x^2) - \ln\left(1 - f_{w,b}(x^3)\right) - \cdots$, where every term can be written uniformly as $-\left[\hat{y}^n \ln f_{w,b}(x^n) + (1 - \hat{y}^n) \ln\left(1 - f_{w,b}(x^n)\right)\right]$.

  6. Step 2: Goodness of a Function. $-\ln L(w,b) = \sum_n -\left[\hat{y}^n \ln f_{w,b}(x^n) + (1 - \hat{y}^n) \ln\left(1 - f_{w,b}(x^n)\right)\right]$, with $\hat{y}^n$: 1 for class 1, 0 for class 2. Each term is the cross entropy between two Bernoulli distributions: distribution p with $p(x{=}1) = \hat{y}^n$, $p(x{=}0) = 1 - \hat{y}^n$, and distribution q with $q(x{=}1) = f(x^n)$, $q(x{=}0) = 1 - f(x^n)$; cross entropy $H(p, q) = -\sum_x p(x) \ln q(x)$.
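
A hedged sketch of this loss in code (the vectorised form and variable names are my own, not the slide's):

```python
import numpy as np

def neg_log_likelihood(X, y_hat, w, b):
    # -ln L(w,b) = sum_n -[ y_hat^n ln f(x^n) + (1 - y_hat^n) ln(1 - f(x^n)) ]
    # X: (N, d) feature matrix; y_hat: (N,) array with 1 for class 1, 0 for class 2
    f = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # f_{w,b}(x^n) for every example
    eps = 1e-12                              # guard against log(0)
    return -np.sum(y_hat * np.log(f + eps) + (1.0 - y_hat) * np.log(1.0 - f + eps))
```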

  7. Step 2: Goodness of a Function. Minimizing $-\ln L(w,b) = \sum_n -\left[\hat{y}^n \ln f_{w,b}(x^n) + (1 - \hat{y}^n) \ln\left(1 - f_{w,b}(x^n)\right)\right]$ means minimizing the cross entropy between the model output $f_{w,b}(x^n)$ and the ground truth; e.g. for $\hat{y}^n = 1$ the ground-truth distribution is $p(x{=}1) = 1.0$, $p(x{=}0) = 0.0$.

  8. Step 3: Find the best function. $\frac{\partial\left(-\ln L(w,b)\right)}{\partial w_i} = \sum_n -\left[\hat{y}^n \frac{\partial \ln f_{w,b}(x^n)}{\partial w_i} + (1 - \hat{y}^n) \frac{\partial \ln\left(1 - f_{w,b}(x^n)\right)}{\partial w_i}\right]$. With $f_{w,b}(x) = \sigma(z)$, $\sigma(z) = \frac{1}{1 + \exp(-z)}$, and $z = w \cdot x + b = \sum_i w_i x_i + b$: $\frac{\partial \ln f_{w,b}(x)}{\partial w_i} = \frac{\partial \ln \sigma(z)}{\partial z} \frac{\partial z}{\partial w_i}$, where $\frac{\partial \ln \sigma(z)}{\partial z} = \frac{1}{\sigma(z)} \sigma(z)\left(1 - \sigma(z)\right) = 1 - \sigma(z)$ and $\frac{\partial z}{\partial w_i} = x_i$.

  9. Step 3: Find the best function. For the other term, $\frac{\partial \ln\left(1 - f_{w,b}(x)\right)}{\partial w_i} = \frac{\partial \ln\left(1 - \sigma(z)\right)}{\partial z} \frac{\partial z}{\partial w_i}$, where $\frac{\partial \ln\left(1 - \sigma(z)\right)}{\partial z} = -\frac{1}{1 - \sigma(z)} \sigma(z)\left(1 - \sigma(z)\right) = -\sigma(z)$ and $\frac{\partial z}{\partial w_i} = x_i$.

  10. Step 3: Find the best function. Combining the two terms: $\frac{\partial\left(-\ln L(w,b)\right)}{\partial w_i} = \sum_n -\left[\hat{y}^n\left(1 - f_{w,b}(x^n)\right) - (1 - \hat{y}^n) f_{w,b}(x^n)\right] x_i^n = \sum_n -\left(\hat{y}^n - f_{w,b}(x^n)\right) x_i^n$. Gradient descent update: $w_i \leftarrow w_i - \eta \sum_n -\left(\hat{y}^n - f_{w,b}(x^n)\right) x_i^n$. The larger the difference between the target $\hat{y}^n$ and the output $f_{w,b}(x^n)$, the larger the update.
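
A sketch of one such update (assumed NumPy layout: `X` is the N-by-d feature matrix, `y_hat` holds the 0/1 labels; the names are mine):

```python
import numpy as np

def gradient_step(X, y_hat, w, b, eta=0.1):
    # One gradient-descent update for logistic regression with cross-entropy loss:
    #   w_i <- w_i - eta * sum_n -(y_hat^n - f(x^n)) x_i^n
    #   b   <- b   - eta * sum_n -(y_hat^n - f(x^n))
    f = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    error = y_hat - f                       # y_hat^n - f_{w,b}(x^n): larger difference, larger update
    w_new = w - eta * (-(X.T @ error))
    b_new = b - eta * (-np.sum(error))
    return w_new, b_new
```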

  11. Logistic Regression + Square Error. Step 1: $f_{w,b}(x) = \sigma\left(\sum_i w_i x_i + b\right)$. Training data: $(x^n, \hat{y}^n)$, $\hat{y}^n$: 1 for class 1, 0 for class 2. Step 2: $L(f) = \frac{1}{2} \sum_n \left(f_{w,b}(x^n) - \hat{y}^n\right)^2$. Step 3: $\frac{\partial\left(f_{w,b}(x) - \hat{y}\right)^2}{\partial w_i} = 2\left(f_{w,b}(x) - \hat{y}\right) f_{w,b}(x) \left(1 - f_{w,b}(x)\right) x_i$. If $\hat{y}^n = 1$: when $f_{w,b}(x^n) = 1$ (close to target), $\partial L / \partial w_i = 0$; but when $f_{w,b}(x^n) = 0$ (far from target), $\partial L / \partial w_i = 0$ as well.

  12. Logistic Regression + Square Error. With the same loss, if $\hat{y}^n = 0$: when $f_{w,b}(x^n) = 1$ (far from target), $\partial L / \partial w_i = 0$; when $f_{w,b}(x^n) = 0$ (close to target), $\partial L / \partial w_i = 0$. In both cases the square-error gradient vanishes, so training makes no progress even when the prediction is completely wrong.
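
A quick numerical illustration of why square error stalls (the helper below is my own, not the slide's):

```python
def square_error_grad(f, y_hat, x_i):
    # dL/dw_i for square error on a sigmoid output:
    # 2 * (f(x) - y_hat) * f(x) * (1 - f(x)) * x_i
    return 2.0 * (f - y_hat) * f * (1.0 - f) * x_i

print(square_error_grad(f=0.9999, y_hat=1, x_i=1.0))  # close to target: gradient ~ 0 (expected)
print(square_error_grad(f=0.0001, y_hat=1, x_i=1.0))  # far from target: gradient ~ 0 too (training stalls)
```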

  13. Cross Entropy v.s. Square Error. (Figure: the total loss plotted over parameters $w_1$ and $w_2$; the cross-entropy surface is steep far from the minimum, while the square-error surface is flat there.) Reference: http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf

  14. Logistic Regression v.s. Linear Regression. Step 1: logistic regression: $f_{w,b}(x) = \sigma\left(\sum_i w_i x_i + b\right)$, output between 0 and 1; linear regression: $f_{w,b}(x) = \sum_i w_i x_i + b$, output: any value. Steps 2 and 3 are compared on the next two slides.

  15. Logistic Regression v.s. Linear Regression. Step 1: as above. Step 2: logistic regression: training data $(x^n, \hat{y}^n)$ with $\hat{y}^n$: 1 for class 1, 0 for class 2; loss $L(f) = \sum_n C\left(f(x^n), \hat{y}^n\right)$ with cross entropy $C\left(f(x^n), \hat{y}^n\right) = -\left[\hat{y}^n \ln f(x^n) + (1 - \hat{y}^n) \ln\left(1 - f(x^n)\right)\right]$. Linear regression: training data $(x^n, \hat{y}^n)$ with $\hat{y}^n$: a real number; loss $L(f) = \frac{1}{2} \sum_n \left(f(x^n) - \hat{y}^n\right)^2$.

  16. Logistic Regression v.s. Linear Regression. Steps 1 and 2: as above. Step 3: both end up with the same update rule, $w_i \leftarrow w_i - \eta \sum_n -\left(\hat{y}^n - f_{w,b}(x^n)\right) x_i^n$; they differ only in what $\hat{y}^n$ and $f_{w,b}$ mean.

  17. Discriminative v.s. Generative. Both use $P(C_1|x) = \sigma(w \cdot x + b)$. Discriminative: directly find w and b (logistic regression). Generative: find $\mu^1$, $\mu^2$, $\Sigma^{-1}$ and set $w^T = \left(\mu^1 - \mu^2\right)^T \Sigma^{-1}$, $b = -\frac{1}{2}\left(\mu^1\right)^T \Sigma^{-1} \mu^1 + \frac{1}{2}\left(\mu^2\right)^T \Sigma^{-1} \mu^2 + \ln\frac{N_1}{N_2}$. Will we obtain the same set of w and b? The same model (function set), but a different function may be selected from the same training data.

  18. Generative v.s. Discriminative. Using all the features (HP, Att, SP Att, Def, SP Def, Speed): generative model 73% accuracy, discriminative model 79% accuracy.

  19. Generative v.s. Discriminative Example. Training data: 1 example of class 1 with features $(1, 1)$; 4 examples of class 2 with features $(1, 0)$; 4 examples of class 2 with features $(0, 1)$; 4 examples of class 2 with features $(0, 0)$. Testing data: $(1, 1)$ — class 1 or class 2? How about Naive Bayes? $P(x|C_i) = P(x_1|C_i)\, P(x_2|C_i)$.

  20. Generative v.s. Discriminative Example. From the training data above: $P(C_1) = \frac{1}{13}$, $P(C_2) = \frac{12}{13}$; $P(x_1{=}1|C_1) = 1$, $P(x_2{=}1|C_1) = 1$; $P(x_1{=}1|C_2) = \frac{1}{3}$, $P(x_2{=}1|C_2) = \frac{1}{3}$.

  21. For the testing data $(1, 1)$: $P(C_1|x) = \frac{P(x|C_1)\, P(C_1)}{P(x|C_1)\, P(C_1) + P(x|C_2)\, P(C_2)} = \frac{1 \times 1 \times \frac{1}{13}}{1 \times 1 \times \frac{1}{13} + \frac{1}{3} \times \frac{1}{3} \times \frac{12}{13}} \approx 0.43 < 0.5$, so the generative (Naive Bayes) model assigns the test point to class 2, even though the only $(1, 1)$ example in the training data belongs to class 1.
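
A short sketch of the computation on the previous two slides (variable names are mine):

```python
# Naive Bayes posterior for the test point (1, 1), using the estimates above.
p_c1, p_c2 = 1 / 13, 12 / 13                  # class priors
p_x_given_c1 = 1.0 * 1.0                      # P(x1=1|C1) * P(x2=1|C1)
p_x_given_c2 = (1 / 3) * (1 / 3)              # P(x1=1|C2) * P(x2=1|C2)
p_c1_given_x = (p_x_given_c1 * p_c1) / (p_x_given_c1 * p_c1 + p_x_given_c2 * p_c2)
print(p_c1_given_x)                           # ~0.43 < 0.5, so the generative model picks class 2
```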

  22. Generative v.s. Discriminative. People usually believe the discriminative model is better. Benefits of the generative model: with an assumption about the probability distribution, less training data is needed and the model is more robust to noise; also, priors and class-dependent probabilities can be estimated from different sources.

  23. Multi-class Classification (3 classes as example). $C_1$: $w^1, b_1$, $z_1 = w^1 \cdot x + b_1$; $C_2$: $w^2, b_2$, $z_2 = w^2 \cdot x + b_2$; $C_3$: $w^3, b_3$, $z_3 = w^3 \cdot x + b_3$. Softmax: $y_i = P(C_i|x) = \frac{e^{z_i}}{\sum_{j=1}^{3} e^{z_j}}$, so $1 > y_i > 0$ and $\sum_i y_i = 1$. Example: $z = (3, 1, -3)$ gives $e^{z} \approx (20, 2.7, 0.05)$ and $y \approx (0.88, 0.12, 0)$.
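
A minimal softmax sketch reproducing the slide's example (the function name and the max-subtraction trick are my own choices):

```python
import numpy as np

def softmax(z):
    # y_i = exp(z_i) / sum_j exp(z_j); subtracting max(z) keeps exp() from overflowing
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

print(softmax(np.array([3.0, 1.0, -3.0])))    # ~[0.88, 0.12, 0.00], matching the slide's example
```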

  24. Multi-class Classification (3 classes as example) [Bishop, P209-210]. $z_i = w^i \cdot x + b_i$ goes through the softmax to produce $y_i$. Loss: the cross entropy $-\sum_{i=1}^{3} \hat{y}_i \ln y_i$ between the softmax output $y$ and the one-hot target $\hat{y}$: if x belongs to class 1, $\hat{y} = (1, 0, 0)$; class 2, $\hat{y} = (0, 1, 0)$; class 3, $\hat{y} = (0, 0, 1)$.
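
A sketch of the softmax-plus-cross-entropy loss for a one-hot target (the names and the epsilon guard are my own):

```python
import numpy as np

def softmax_cross_entropy(z, target):
    # cross entropy -sum_i y_hat_i ln y_i between the softmax output y and a one-hot target y_hat
    e = np.exp(z - np.max(z))
    y = e / np.sum(e)
    return -np.sum(target * np.log(y + 1e-12))

# one-hot target (1, 0, 0) means "x belongs to class 1"
print(softmax_cross_entropy(np.array([3.0, 1.0, -3.0]), np.array([1.0, 0.0, 0.0])))  # ~0.13
```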

  25. Limitation of Logistic Regression. $z = w_1 x_1 + w_2 x_2 + b$; output class 1 if $z \ge 0$ ($y \ge 0.5$), class 2 if $z < 0$ ($y < 0.5$). Input features and labels: $(x_1, x_2) = (0, 0) \to$ class 2, $(0, 1) \to$ class 1, $(1, 0) \to$ class 1, $(1, 1) \to$ class 2. Can we? No single logistic regression works here: its decision boundary is a straight line in the $(x_1, x_2)$ plane, and no line puts $(0, 1)$ and $(1, 0)$ on one side and $(0, 0)$ and $(1, 1)$ on the other.

  26. Limitation of Logistic Regression. Feature transformation: let $x_1'$ = distance to $(0, 0)$ and $x_2'$ = distance to $(1, 1)$. Then $(0, 0) \to (0, \sqrt{2})$, $(0, 1) \to (1, 1)$, $(1, 0) \to (1, 1)$, $(1, 1) \to (\sqrt{2}, 0)$, and the two classes become linearly separable in the $(x_1', x_2')$ space (see the sketch below). Finding such a transformation is not always easy; domain knowledge can be helpful.
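
A small sketch of this transformation on the four labelled points (my own code, not the slide's):

```python
import numpy as np

# XOR-style data from the slide: (x1, x2) -> class
points = {(0, 0): 2, (0, 1): 1, (1, 0): 1, (1, 1): 2}

def transform(x):
    # x1' = distance to (0, 0), x2' = distance to (1, 1)
    x = np.asarray(x, dtype=float)
    return (np.linalg.norm(x - np.array([0.0, 0.0])),
            np.linalg.norm(x - np.array([1.0, 1.0])))

for x, label in points.items():
    print(x, "->", tuple(round(v, 2) for v in transform(x)), "class", label)
# Class 1 points both map to (1, 1); class 2 points map to (0, 1.41) and (1.41, 0),
# so a single straight line (e.g. x1' + x2' = 1.7) now separates the two classes.
```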

  27. Limitation of Logistic Regression. Cascading logistic regression models: a first pair of logistic regressions transforms the input $(x_1, x_2)$ into new features $(x_1', x_2')$ (feature transformation), and a final logistic regression on $(x_1', x_2')$ does the classification (bias terms are omitted in the figure).

  28. (Figure: a numeric example of the feature transformation. With weights of magnitude 2 and biases of $-1$ on the two cascaded logistic units, the outputs $x_1'$ and $x_2'$ take the values 0.73, 0.27, and 0.05 across the four corners of the input square.)

  29. (Figure: in the transformed space the four inputs become the points $(0.73, 0.05)$, $(0.27, 0.27)$, and $(0.05, 0.73)$, and a final logistic regression $z = w_1 x_1' + w_2 x_2' + b$ can now separate class 1 from class 2.)
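
A sketch of the cascade; the specific weights (magnitude 2) and biases (-1) below are my reading of the figures, chosen only so that sigma(1) ≈ 0.73, sigma(-1) ≈ 0.27, and sigma(-3) ≈ 0.05 reproduce the slides' numbers:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden(x1, x2):
    # two cascaded logistic units producing the transformed features (x1', x2')
    x1p = sigmoid(-2.0 * x1 + 2.0 * x2 - 1.0)
    x2p = sigmoid(2.0 * x1 - 2.0 * x2 - 1.0)
    return x1p, x2p

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), "->", tuple(round(v, 2) for v in hidden(x1, x2)))
# Class 1 inputs (0,1) and (1,0) map to (0.73, 0.05) and (0.05, 0.73);
# class 2 inputs (0,0) and (1,1) both map to (0.27, 0.27), so a final
# logistic regression on (x1', x2') can separate the two classes.
```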

  30. Deep Learning! Each cascaded logistic regression can be viewed as a neuron, and the cascade (feature transformation followed by classification) is a neural network. All the parameters of the logistic regressions are jointly learned.

  31. Reference: Bishop, Chapter 4.3

  32. Acknowledgement

  33. Appendix

  34. Three Steps. Step 1, Function Set (Model): $f_{w,b}(x) = P_{w,b}(C_1|x) = \sigma(w \cdot x + b)$, mapping a feature vector x to a class; if $P_{w,b}(C_1|x) > 0.5$, output y = class 1, otherwise output y = class 2; in the generative approach w and b are related to $\mu^1, \mu^2, \Sigma^1, \Sigma^2, \ldots$. Step 2, Goodness of a function: $L(f) = \sum_n C\left(f(x^n), \hat{y}^n\right)$, the cross-entropy loss summed over the training examples $(x^n, \hat{y}^n)$. Step 3, Find the best function: gradient descent.

  35. Step 2: Loss function. Relabel the targets: $\hat{y}^n = +1$ for class 1, $\hat{y}^n = -1$ for class 2, with the decision rule: class 1 if $z \ge 0$, class 2 if $z < 0$. Ideal loss: $L(f) = \sum_n \delta\left(g(x^n) \ne \hat{y}^n\right)$, the number of misclassified examples. Approximation: $L(f) = \sum_n l\left(f(x^n), \hat{y}^n\right)$, where $l$ is an upper bound of the ideal 0/1 loss. (Figure: the ideal loss plotted against $\hat{y}^n f(x^n)$.)

  36. Step 2: Loss function. Take $l\left(f(x^n), \hat{y}^n\right)$ to be the cross entropy, with ground truth $\hat{y}^n = +1$ or $\hat{y}^n = -1$. If $\hat{y}^n = +1$: $l\left(f(x^n), \hat{y}^n\right) = -\ln \sigma\left(f(x^n)\right) = \ln\left(1 + \exp\left(-f(x^n)\right)\right) = \ln\left(1 + \exp\left(-\hat{y}^n f(x^n)\right)\right)$. If $\hat{y}^n = -1$: $l\left(f(x^n), \hat{y}^n\right) = -\ln\left(1 - \sigma\left(f(x^n)\right)\right) = \ln\left(1 + \exp\left(f(x^n)\right)\right) = \ln\left(1 + \exp\left(-\hat{y}^n f(x^n)\right)\right)$. In both cases $l\left(f(x^n), \hat{y}^n\right) = \ln\left(1 + \exp\left(-\hat{y}^n f(x^n)\right)\right)$.
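
A quick numerical check of these two identities (my own sketch, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

f = 0.7  # f(x^n) = w . x^n + b; any value works for checking the identities
print(-np.log(sigmoid(f)),       np.log(1.0 + np.exp(-f)))  # y_hat = +1: the two values are equal
print(-np.log(1.0 - sigmoid(f)), np.log(1.0 + np.exp(f)))   # y_hat = -1: the two values are equal
```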

  37. Step 2: Loss function. $l\left(f(x^n), \hat{y}^n\right) = \ln\left(1 + \exp\left(-\hat{y}^n f(x^n)\right)\right)$ is an upper bound of the ideal 0/1 loss (the curve shown is divided by $\ln 2$ so that it passes through 1 at $\hat{y}^n f(x^n) = 0$). (Figure: ideal loss and the scaled cross-entropy loss plotted against $\hat{y}^n f(x^n)$.)
