Naive Bayes and Basic Probability Concepts

Explore the concepts of Naive Bayes, discriminative and generative models in supervised learning, probability principles such as total probability and Bayes Theorem, and an example illustrating the application of Bayes Theorem. Learn to estimate probabilities and apply the concepts in practical scenarios.

  • Naive Bayes
  • Probability Concepts
  • Supervised Learning
  • Bayes Theorem
  • Machine Learning

Presentation Transcript


  1. Naive Bayes (Geoff Hulten)

  2. Two Approaches to Supervised Learning
     Discriminative: model P(Y|X) directly (the posterior probability).
     Techniques: Logistic Regression, Decision Trees, Neural Networks.
     Generative: model P(X, Y) (the joint probability), then use Bayes Rule
     to get P(Y|X). Techniques: Naive Bayes, Bayesian Networks, Hidden
     Markov Models.
     https://ai.stanford.edu/~ang/papers/nips01-discriminativegenerative.pdf

  3. Some Probability Concepts
     [Figure: Venn diagram over a sample space with axes x1, x2, showing two
     disjoint regions A and B.]
     All possibilities: P(*) = 1.0. Sample a point: if it lands in a region,
     that event occurred; if it lands outside, it did not.
     Some event A: P(A) = .2. Some event B: P(B) = .1.
     Both events: P(A ∧ B) = 0 (the regions do not overlap).
     Either event: P(A ∨ B) = .3.
     Conditional events (the probability A occurs given that we know B
     occurred): P(A|B) = 0 and P(B|A) = 0.

  4. Some Probability Concepts
     [Figure: the same sample space with a third region C overlapping A;
     A ∧ C covers about 25% of A.]
     Events: P(A) = .2, P(B) = .1, P(C) = .1.
     Both events: P(A ∧ B) = 0, P(A ∧ C) = .05.
     Either event: P(A ∨ B) = .3, P(A ∨ C) = .25.
     Conditional events: P(A|B) = 0, P(B|A) = 0, P(A|C) = .5, P(C|A) = .25.
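
To make the conditional-probability arithmetic concrete, here is a minimal Python sketch (the values are the slide's; the helper name is my own) that recovers the slide's numbers from the definition P(A|C) = P(A ∧ C) / P(C):

    # Event probabilities from the slide's Venn diagram.
    p_a, p_b, p_c = 0.2, 0.1, 0.1
    p_a_and_b = 0.0   # A and B are disjoint
    p_a_and_c = 0.05  # overlap of A and C

    def conditional(p_joint, p_given):
        """P(X|Y) = P(X and Y) / P(Y)."""
        return p_joint / p_given

    print(conditional(p_a_and_b, p_b))  # P(A|B) = 0.0
    print(conditional(p_a_and_c, p_c))  # P(A|C) = 0.5
    print(conditional(p_a_and_c, p_a))  # P(C|A) = 0.25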

  5. Basic Rules of Probability
     Product Rule: P(A ∧ B) = P(A|B) P(B) = P(B|A) P(A)
     Sum Rule: P(A ∨ B) = P(A) + P(B) - P(A ∧ B)  (subtract the overlap so
     it isn't double counted)
     Theorem of Total Probability: if A1, ..., An are mutually exclusive and
     Σi P(Ai) = 1, then P(B) = Σi P(B|Ai) P(Ai).
     [Figure: Venn diagrams illustrating each rule.]
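
As a quick sanity check, here is a sketch of the sum rule using slide 4's numbers, plus the theorem of total probability on a made-up partition (the partition values are hypothetical, chosen only to sum to 1):

    # Sum rule with slide 4's numbers: P(A or C) = P(A) + P(C) - P(A and C).
    p_a, p_c, p_a_and_c = 0.2, 0.1, 0.05
    print(p_a + p_c - p_a_and_c)  # 0.25, matching the slide

    # Total probability: P(B) = sum_i P(B|A_i) P(A_i) over a mutually
    # exclusive, exhaustive partition A_1..A_n (hypothetical values).
    p_ai = [0.5, 0.3, 0.2]
    p_b_given_ai = [0.1, 0.4, 0.9]
    print(sum(pb * pa for pb, pa in zip(p_b_given_ai, p_ai)))  # 0.35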

  6. Bayes Theorem
     The Product Rule expands P(A ∧ B) two ways:
     P(A ∧ B) = P(A|B) P(B) = P(B|A) P(A)
     Setting the two expansions equal and dividing by P(B) gives
     Bayes Theorem: P(A|B) = P(B|A) P(A) / P(B)
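
The theorem is one line of code; this sketch (function name mine) checks it against slide 4's numbers, where P(C|A) = .25, P(A) = .2, and P(C) = .1 give back P(A|C) = .5:

    def bayes_rule(p_b_given_a, p_a, p_b):
        """P(A|B) = P(B|A) * P(A) / P(B)."""
        return p_b_given_a * p_a / p_b

    print(bayes_rule(0.25, 0.2, 0.1))  # 0.5, matching P(A|C) from slide 4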

  7. Example of using Bayes Theorem
     P(cancer|+) = P(+|cancer) P(cancer) / P(+)
     Given: P(cancer) = 0.008, P(¬cancer) = 0.992,
     P(+|cancer) = 0.98, P(-|cancer) = 0.02,
     P(+|¬cancer) = 0.03, P(-|¬cancer) = 0.97.
     By the theorem of total probability:
     P(+) = P(+|cancer) P(cancer) + P(+|¬cancer) P(¬cancer)
          = 0.98 × 0.008 + 0.03 × 0.992 = 0.0376
     So P(cancer|+) = (0.98 × 0.008) / 0.0376 ≈ .208
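
The same calculation as a runnable sketch, with all values taken from the slide:

    # Prior and test characteristics from the slide.
    p_cancer = 0.008
    p_not_cancer = 0.992
    p_pos_given_cancer = 0.98
    p_pos_given_not_cancer = 0.03

    # Theorem of total probability over the two mutually exclusive cases.
    p_pos = (p_pos_given_cancer * p_cancer
             + p_pos_given_not_cancer * p_not_cancer)
    print(p_pos)  # 0.0376

    # Bayes Theorem: P(cancer|+) = P(+|cancer) P(cancer) / P(+).
    print(p_pos_given_cancer * p_cancer / p_pos)  # ~0.208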

  8. Naive Bayes
     ŷ = argmax_y P(⟨x⟩|y) P(y)
     Direct estimate of P(⟨x⟩|y): for each value of y, record the # of
     times each possible ⟨x⟩, y occurs in training and estimate
     P(⟨x⟩|y) = (# times ⟨x⟩, y occurs) / (# of times y occurs).
     This won't scale to many xi.
     Independence assumption: assume each xi is independent of all the
     others given y, so that
     ŷ = argmax_y P(y) Πi P(xi|y)
     Estimate of P(xi = v|y): for each value of y and each value of xi,
     record the # of times xi = v occurs among the examples with that y.
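
The counting procedure maps directly to code. Below is a minimal sketch (the slides give no implementation, so the function and variable names are mine) that estimates P(y) and P(xi = v|y) by counting and classifies with the argmax above:

    from collections import Counter, defaultdict
    from math import prod

    def train_naive_bayes(examples, labels):
        """Estimate P(y) and P(x_i = v | y) by counting occurrences."""
        label_counts = Counter(labels)
        feature_counts = defaultdict(int)  # keyed by (i, v, y)
        for x, y in zip(examples, labels):
            for i, v in enumerate(x):
                feature_counts[(i, v, y)] += 1
        p_y = {y: c / len(labels) for y, c in label_counts.items()}
        def p_xi_given_y(i, v, y):
            return feature_counts[(i, v, y)] / label_counts[y]
        return p_y, p_xi_given_y

    def predict(x, p_y, p_xi_given_y):
        """y_hat = argmax_y P(y) * prod_i P(x_i | y)."""
        return max(p_y, key=lambda y: p_y[y] * prod(
            p_xi_given_y(i, v, y) for i, v in enumerate(x)))

Note these are raw MLE counts, matching this slide; the m-estimates on slide 10 would smooth away zero counts.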

  9. Naive Bayes Example
     ŷ = argmax_y P(y) Πi P(xi|y)
     Training data: 10 examples ⟨x1, x2⟩ with label y; 6 have y = 0 and
     4 have y = 1, so p(y=0) = 6/10 and p(y=1) = 4/10.
     Estimates for y = 0: p(x1=0) = 4/6, p(x1=1) = 2/6,
                          p(x2=0) = 2/6, p(x2=1) = 4/6
     Estimates for y = 1: p(x1=0) = 1/4, p(x1=1) = 3/4,
                          p(x2=0) = 1/4, p(x2=1) = 3/4
     Classify ⟨x1=0, x2=0⟩:
     y = 1: P(y=1) P(x1=0|y=1) P(x2=0|y=1) = 0.4 × 0.25 × 0.25 ≈ .025
     y = 0: P(y=0) P(x1=0|y=0) P(x2=0|y=0) = 0.6 × 0.66 × 0.33 ≈ .13
     So predict ŷ = 0.
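
Running the sketch from slide 8 on a training set consistent with this slide's tallies reproduces the scores (the slide's exact row order is not recoverable, so these are any 10 rows matching the counts):

    # 10 examples <x1, x2> chosen to match the slide's counts.
    X = [(0, 0), (0, 1), (0, 1), (0, 1), (1, 0), (1, 1),  # the six y = 0 rows
         (0, 1), (1, 0), (1, 1), (1, 1)]                  # the four y = 1 rows
    Y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

    p_y, p_xi_given_y = train_naive_bayes(X, Y)
    for y in (0, 1):  # score each label for <x1=0, x2=0>
        print(y, p_y[y] * p_xi_given_y(0, 0, y) * p_xi_given_y(1, 0, y))
    # y=0: ~0.133, y=1: 0.025, so:
    print(predict((0, 0), p_y, p_xi_given_y))  # 0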

  10. Quick Note about MLE vs MAP
      Maximum Likelihood Estimation (MLE): estimate the probability from
      observations of the data alone:
      P(heads) = #heads / #flips
      Maximum A Posteriori (MAP): estimate the probability from the data
      plus prior assumptions:
      P(heads) = (#heads + m·p) / (#flips + m)
      where p is the prior estimate of the probability and m is the number
      of virtual samples the prior is worth (an m-estimate).
      For assignments use m-estimates with uniform p and small m.
      Example with m = 10 and p = .5:
      Flip a coin 50 times, see 22 heads:
        MLE: 22/50 = 44%        MAP: (22 + 5)/(50 + 10) = 45%
      Flip a coin 3 times, see 1 head:
        MLE: 1/3 ≈ 33%          MAP: (1 + 5)/(3 + 10) ≈ 46%
      Flip a coin 0 times, see 0 heads:
        MLE: 0/0 = undefined    MAP: (0 + 5)/(0 + 10) = 50%
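
The m-estimate in code, reproducing the slide's three coin examples (function name mine):

    def m_estimate(heads, flips, p=0.5, m=10):
        """(#heads + m*p) / (#flips + m): p is the prior estimate and m is
        the number of virtual samples the prior is worth."""
        return (heads + m * p) / (flips + m)

    print(m_estimate(22, 50))  # 0.45   (MLE: 22/50 = 0.44)
    print(m_estimate(1, 3))    # ~0.46  (MLE: 1/3 = 0.33)
    print(m_estimate(0, 0))    # 0.5    (MLE: 0/0 is undefined)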

  11. Naive Bayes Summary
      Overview: a very simple probabilistic model.
      Structure: all xi are independent of each other (given the label y).
      Loss: nothing; estimate the parameters with MAP from training data
      plus a weak uniform prior.
      Optimization: none; just count the occurrences in the data.
      It is good for: an introduction to generative modeling, Bayesian
      thought, and ML in general.
      In practice: nothing (maybe as a baseline).
