
Machine Learning Concepts: From Basic Ideas to Case Studies
Dive into the world of machine learning with this comprehensive overview covering fundamental concepts, strategies, case studies like Pokémon vs. Digimon, and the role of unknown parameters in creating classifiers. Explore loss functions, optimization, model complexities, and training examples to enhance your understanding of ML processes.
Presentation Transcript
Review: Basic Idea of ML (https://youtu.be/Ye018rCVvOo, https://youtu.be/bHcJCp2Fyxs) — Step 1: define a function with unknown parameters. Step 2: define the loss. Step 3: optimization.
Review: Strategy (https://youtu.be/WeHM2xpYQpw) — More parameters make it easier to overfit. Why?
Case Study: Pokémon vs. Digimon (https://medium.com/@tyreeostevenson/teaching-a-computer-to-classify-anime-8c77bc89b881)
Pokémon/Digimon Classifier — We want to find a function that takes an image and outputs "Pokémon" or "Digimon". Step 1: determine a function with unknown parameters, based on domain knowledge.
Observation — Given an image, is it a Digimon or a Pokémon? What feature distinguishes them?
Observation — Apply edge detection to each image and count the edge pixels, e(x). One example image gives e(x) = 3558, another gives e(x) = 7389.
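The edge-count feature e(x) can be sketched in a few lines. The slides do not specify which edge detector was used, so this is a crude stand-in: count pixels whose horizontal or vertical intensity jump exceeds a threshold, on an image given as a nested list of grayscale values (the threshold value 30 and the toy image are assumptions for illustration).

```python
def edge_count(img, thresh=30):
    """e(x): count 'edge' pixels, i.e. pixels whose horizontal or vertical
    intensity jump exceeds `thresh`. A crude stand-in for a real edge
    detector (the slides' exact detector is not specified)."""
    h, w = len(img), len(img[0])
    count = 0
    for i in range(h):
        for j in range(w):
            dx = abs(img[i][j] - img[i][j - 1]) if j > 0 else 0
            dy = abs(img[i][j] - img[i - 1][j]) if i > 0 else 0
            if max(dx, dy) > thresh:
                count += 1
    return count

# A toy 4x4 "image": a bright square on a dark background.
toy = [[0,   0,   0, 0],
       [0, 255, 255, 0],
       [0, 255, 255, 0],
       [0,   0,   0, 0]]
print(edge_count(toy))  # → 7
```

On real sprites, images with more intricate line art would yield larger e(x), which is exactly the signal the classifier below thresholds on.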
Function with Unknown Parameters — If e(x) ≥ h, output "Digimon"; if e(x) < h, output "Pokémon". The threshold h is the unknown parameter. Its candidate values form the set H = {1, 2, …, 10000}; the number of candidate functions |H| is the model complexity.
Loss of a function (given data) — Given a dataset D = {(x¹, ŷ¹), (x², ŷ²), …, (x^N, ŷ^N)}, the loss of a threshold h on D is L(h, D) = (1/N) Σₙ l(h, xⁿ, ŷⁿ), the error rate, where l(h, xⁿ, ŷⁿ) outputs 1 if f_h(xⁿ) ≠ ŷⁿ and 0 otherwise. Don't like it? Of course, you can choose cross-entropy instead.
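The threshold classifier and its 0/1 loss translate directly into code. The example data below is hypothetical (not the slide's dataset); each example is a pair of an edge count e(x) and a label.

```python
def classify(e_x, h):
    """f_h(x): predict 'Digimon' if the edge count e(x) >= h, else 'Pokemon'."""
    return "Digimon" if e_x >= h else "Pokemon"

def loss(h, dataset):
    """L(h, D): error rate of threshold h on a dataset of
    (edge_count, label) pairs — the 0/1 loss from the slides."""
    errors = sum(1 for e_x, y in dataset if classify(e_x, h) != y)
    return errors / len(dataset)

# Hypothetical toy examples, not the real slide data.
D = [(3558, "Pokemon"), (7389, "Digimon"), (5200, "Digimon"), (2100, "Pokemon")]
print(loss(5000, D))  # → 0.0 (this threshold separates the toy data perfectly)
```

Swapping `loss` for cross-entropy would only require the classifier to output a probability instead of a hard label; nothing else in the argument changes.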
Training Examples — If we could collect all Pokémon and Digimon in the universe, D_all, we could find the best threshold h_all = arg minₕ L(h, D_all). In practice we only collect some examples D_train = {(x¹, ŷ¹), (x², ŷ²), …, (x^N, ŷ^N)} from D_all, with each (xⁿ, ŷⁿ) ~ D_all sampled independently and identically distributed (i.i.d.), and we find h_train = arg minₕ L(h, D_train). We hope L(h_train, D_all) and L(h_all, D_all) are close.
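Since H is finite (10,000 thresholds), arg minₕ L(h, D_train) can be found by brute force. A minimal sketch, again on made-up toy data:

```python
def loss(h, dataset):
    """L(h, D): 0/1 error rate; predict 'Digimon' when edge count >= h."""
    wrong = sum(1 for e_x, y in dataset
                if ("Digimon" if e_x >= h else "Pokemon") != y)
    return wrong / len(dataset)

def arg_min_loss(dataset, candidates=range(1, 10001)):
    """h_train = arg min over h in H of L(h, D_train), by brute force
    over all candidate thresholds (ties broken by the smallest h)."""
    return min(candidates, key=lambda h: loss(h, dataset))

# Toy training set (hypothetical numbers, not the slide data).
D_train = [(2100, "Pokemon"), (3558, "Pokemon"),
           (5200, "Digimon"), (7389, "Digimon")]
h_train = arg_min_loss(D_train)
print(h_train, loss(h_train, D_train))  # → 3559 0.0
```

`min` returns the first candidate achieving the minimum, so the reported h_train is the smallest threshold that separates the toy data.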
We hope L(h_train, D_all) and L(h_all, D_all) are close. Take all the Pokémon and Digimon we know as D_all: 819 Pokémon and 971 Digimon. (In most applications you cannot obtain D_all; the testing data D_test serves as its proxy.) Here h_all = 4824 and L(h_all, D_all) = 0.28. Source of Digimon: https://github.com/mrok273/Qiita. Source of Pokémon: https://www.kaggle.com/kvpratama/pokemon-images-dataset/data
Sample 200 Pokémon and Digimon as D_train1: h_train1 = 4727, with L(h_train1, D_train1) = 0.27 — even lower than L(h_all, D_all) = 0.28! On the full data, L(h_train1, D_all) = 0.28.
Sample another 200 as D_train2: h_train2 = 3642, with L(h_train2, D_train2) = 0.20 on the sample, but L(h_train2, D_all) = 0.37 on the full data.
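The lucky-sample / unlucky-sample phenomenon is easy to reproduce on synthetic data. Everything below is an assumption for illustration: the Gaussian edge-count distributions, the class sizes, the coarse threshold grid (step 100, to keep brute force fast), and the use of sampling without replacement as a stand-in for i.i.d. sampling.

```python
import random

def loss(h, data):
    """L(h, D): 0/1 error rate; predict 'Digimon' when edge count >= h."""
    return sum(("Digimon" if e >= h else "Pokemon") != y
               for e, y in data) / len(data)

def arg_min_loss(data, candidates=range(0, 10001, 100)):
    """Brute-force arg min of L(h, D) over a coarse threshold grid."""
    return min(candidates, key=lambda h: loss(h, data))

random.seed(0)
# Synthetic "universe" D_all with made-up, overlapping edge-count
# distributions (overlap is what makes the minimum loss nonzero).
D_all = ([(int(random.gauss(4000, 1500)), "Pokemon") for _ in range(800)] +
         [(int(random.gauss(6000, 1500)), "Digimon") for _ in range(900)])

h_all = arg_min_loss(D_all)
D_train = random.sample(D_all, 200)   # one sampled training set
h_train = arg_min_loss(D_train)

print(loss(h_all, D_all), loss(h_train, D_train), loss(h_train, D_all))
```

Two inequalities hold by construction, whatever the sample: L(h_train, D_train) ≤ L(h_all, D_train), since h_train minimizes the training loss, and L(h_all, D_all) ≤ L(h_train, D_all), since h_all minimizes the loss on D_all. The gap between the last two is exactly what the rest of the analysis bounds.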
So L(h_train, D_train) can be smaller than L(h_all, D_all). What do we want? We want L(h_train, D_all) − L(h_all, D_all) ≤ δ. What kind of D_train fulfills this? A sufficient condition: for all h ∈ H, |L(h, D_train) − L(h, D_all)| ≤ δ/2, i.e., D_train is a good proxy of D_all for evaluating the loss of any h.
Why is that condition sufficient? Chain the inequalities: L(h_train, D_all) ≤ L(h_train, D_train) + δ/2 (D_train is a good proxy for h_train) ≤ L(h_all, D_train) + δ/2 (h_train = arg minₕ L(h, D_train), so its training loss is no larger than h_all's) ≤ L(h_all, D_all) + δ/2 + δ/2 (good proxy again, for h_all) = L(h_all, D_all) + δ.
Let ε = δ/2. We want to sample a good D_train: for all h ∈ H, |L(h, D_train) − L(h, D_all)| ≤ ε. What is the probability of sampling a bad D_train?
Very General! The following discussion is model-agnostic: it makes no assumption about the data distribution, and it works for any loss function.
Probability of Failure — Picture the space of all possible training sets: each point is one possible D_train (e.g., D_train1, D_train2). Some points are good samples, others are bad. What is P(D_train is bad)? If a D_train is bad, at least one h ∈ H makes |L(h, D_train) − L(h, D_all)| > ε. So the bad region is the union, over all h ∈ H, of the events "D_train is bad due to h" (and these regions may overlap).
P(D_train is bad) = P(∪_{h∈H} [D_train is bad due to h]) ≤ Σ_{h∈H} P(D_train is bad due to h), by the union bound.
"D_train is bad due to h" means |L(h, D_train) − L(h, D_all)| > ε, where L(h, D_train) = (1/N) Σₙ l(h, xⁿ, ŷⁿ) is the average of the per-example losses l(h, xⁿ, ŷⁿ) over the sampled examples, and L(h, D_all) is the corresponding average over all of D_all.
By Hoeffding's Inequality, P(D_train is bad due to h) ≤ 2 exp(−2Nε²), provided the range of the loss l is [0, 1]; N is the number of examples in D_train.
Putting the pieces together: P(D_train is bad) ≤ Σ_{h∈H} P(D_train is bad due to h) ≤ Σ_{h∈H} 2 exp(−2Nε²) = |H| · 2 exp(−2Nε²). How do we make P(D_train is bad) smaller? Larger N and smaller |H|.
P(D_train is bad) ≤ |H| · 2 exp(−2Nε²). Larger N: each event "bad due to h" becomes less probable, so every piece of the bad region shrinks. Smaller |H|: there are fewer events to union over, so the bad region shrinks as well.
Example — H = {1, 2, …, 10000}, D_train = {(x¹, ŷ¹), …, (x^N, ŷ^N)}, and "good" means for all h ∈ H, |L(h, D_train) − L(h, D_all)| ≤ ε. Then P(D_train is bad) ≤ |H| · 2 exp(−2Nε²). With |H| = 10000, N = 100, ε = 0.1: P(D_train is bad) ≤ 2707 — a vacuous bound, so failure can usually happen (QQ). With N = 500: P(D_train is bad) ≤ 0.91. With N = 1000: P(D_train is bad) ≤ 0.00004.
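The three numbers on this slide come straight from plugging into the bound; a one-liner reproduces them:

```python
import math

def failure_bound(H_size, N, eps):
    """Upper bound on P(D_train is bad): |H| * 2 * exp(-2 * N * eps^2)."""
    return H_size * 2 * math.exp(-2 * N * eps ** 2)

for N in (100, 500, 1000):
    print(N, failure_bound(10000, N, 0.1))
# N=100  → about 2707 (vacuous: a probability bound above 1 says nothing)
# N=500  → about 0.91
# N=1000 → about 0.00004
```

Note the bound scales linearly in |H| but exponentially in N, which is why a modest increase in sample size beats a large decrease in model complexity here.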
Example — P(D_train is bad) ≤ |H| · 2 exp(−2Nε²). If we want P(D_train is bad) ≤ δ, how many training examples do we need? Setting |H| · 2 exp(−2Nε²) ≤ δ and solving gives N ≥ log(2|H|/δ) / (2ε²). With |H| = 10000, ε = 0.1, δ = 0.1: N ≥ 610.3, so 611 examples suffice.
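Solving the bound for N, as above, can be wrapped in a small helper:

```python
import math

def examples_needed(H_size, eps, delta):
    """Smallest integer N with |H| * 2 * exp(-2 * N * eps^2) <= delta,
    i.e. N >= log(2 * |H| / delta) / (2 * eps^2)."""
    return math.ceil(math.log(2 * H_size / delta) / (2 * eps ** 2))

print(examples_needed(10000, 0.1, 0.1))  # → 611 (the slide rounds to 610)
```

The logarithmic dependence on |H| means even a much larger hypothesis set only adds a handful of required examples.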
Model Complexity — P(D_train is bad) ≤ |H| · 2 exp(−2Nε²), where |H| is the number of possible functions you can select from. What if the parameters are continuous, making |H| infinite? Answer 1: everything that happens in a computer is discrete (parameters have finite precision). Answer 2: VC dimension (not covered in this course).
Model Complexity — Why don't we simply use a very small |H|? "D_train is good" means for all h ∈ H, |L(h, D_train) − L(h, D_all)| ≤ ε, which with ε = δ/2 guarantees L(h_train, D_all) − L(h_all, D_all) ≤ δ. But with fewer candidates, h_all = arg minₕ L(h, D_all) itself is worse: L(h_all, D_all) becomes larger.
Tradeoff of Model Complexity — We want L(h_train, D_all) − L(h_all, D_all) ≤ δ, which favors larger N and smaller |H|. Smaller |H|: the gap is small, so L(h_train, D_all) stays close to L(h_all, D_all), but L(h_all, D_all) itself is large. Larger |H|: L(h_all, D_all) is small, but the gap can be large, so L(h_train, D_all) may still be large. Can we get both a small L(h_all, D_all) and a small gap? Yes — Deep Learning.