Foundations of Probabilistic Models with Latent Variables


Explore the foundations of probabilistic models with latent variables in algorithms and machine learning. Learn about density estimation, latent variables, generative mixture models, and tasks in a mixture model. Discover concepts like unsupervised learning, sub-populations, and parameter estimation in the context of Gaussian mixture models.

  • Probabilistic Models
  • Latent Variables
  • Density Estimation
  • Generative Models
  • Machine Learning


Presentation Transcript


  1. Probabilistic Models with Latent Variables. Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya

  2. Density Estimation Problem. Learning from unlabeled data $x_1, x_2, \ldots, x_N$. Unsupervised learning, density estimation. Empirical distribution typically has multiple modes.

  3. Density Estimation Problem. [Figures from http://yulearning.blogspot.co.uk and http://courses.ee.sun.ac.za/Pattern_Recognition_813]

  4. Density Estimation Problem. Convex combination of unimodal pdfs gives a multimodal pdf: $p(x) = \sum_k \pi_k\, p_k(x)$ where $\sum_k \pi_k = 1$. Physical interpretation: sub-populations.
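
For concreteness, a hypothetical two-component instance of such a convex combination (values chosen purely for illustration):

$$
p(x) = 0.3\,\mathcal{N}(x \mid -2,\, 1) + 0.7\,\mathcal{N}(x \mid 3,\, 0.5),
\qquad 0.3 + 0.7 = 1,
$$

which can be bimodal even though each component is unimodal.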

  5. Latent Variables. Introduce new variable $z_i$ for each $x_i$. Latent / hidden: not observed in the data. Probabilistic interpretation. Mixing weights: $\pi_k \equiv p(z_i = k)$. Mixture densities: $p_k(x) \equiv p(x \mid z_i = k)$.

  6. Generative Mixture Model. For $i = 1, \ldots, N$: $z_i \sim \mathrm{Cat}(\pi)$, $x_i \sim p(x \mid z_i)$. Joint: $p(x_i, z_i) = \pi_{z_i}\, p(x_i \mid z_i)$. Marginalizing, $p(x_i) = \sum_{z_i} p(x_i, z_i)$ recovers the mixture distribution. Plate notation.
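
A minimal sketch of this ancestral sampling procedure for a one-dimensional Gaussian mixture; the weights, means, and standard deviations below are made-up illustrations, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters for a K = 3 component 1-D Gaussian mixture.
pi = np.array([0.5, 0.3, 0.2])      # mixing weights, sum to 1
mu = np.array([-4.0, 0.0, 3.0])     # component means
sigma = np.array([1.0, 0.5, 1.5])   # component standard deviations

N = 1000
z = rng.choice(len(pi), size=N, p=pi)        # z_i ~ Cat(pi)
x = rng.normal(loc=mu[z], scale=sigma[z])    # x_i ~ p(x | z_i)

# Keeping only x (discarding z) gives samples from the mixture
# p(x) = sum_k pi_k N(x | mu_k, sigma_k^2).
```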

  7. Tasks in a Mixture Model. Inference: $p(z \mid x) = \prod_i p(z_i \mid x_i)$. Parameter estimation: find parameters that, e.g., maximize the likelihood. Does not decouple according to classes. Non-convex, many local optima.

  8. Example: Gaussian Mixture Model. Model: for $i = 1, \ldots, N$, $z_i \sim \mathrm{Cat}(\pi)$, $x_i \mid z_i = k \sim \mathcal{N}(x \mid \mu_k, \Sigma_k)$. Inference: $p(z_i = k \mid x_i; \pi, \mu, \Sigma)$ has the form of a softmax function.
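
Spelling out the inference step via Bayes' rule, which is where the softmax form comes from:

$$
p(z_i = k \mid x_i; \pi, \mu, \Sigma)
= \frac{\pi_k\,\mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
       {\sum_{k'} \pi_{k'}\,\mathcal{N}(x_i \mid \mu_{k'}, \Sigma_{k'})}
= \operatorname{softmax}_k\!\bigl(\log \pi_k + \log \mathcal{N}(x_i \mid \mu_k, \Sigma_k)\bigr).
$$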

  9. Example: Gaussian Mixture Model. Log-likelihood: which training instance comes from which component? $\ell(\theta) = \sum_i \log p(x_i) = \sum_i \log \sum_k p(z_i = k)\, p(x_i \mid z_i = k)$. No closed-form solution for maximizing $\ell(\theta)$. Possibility 1: gradient descent, etc. Possibility 2: Expectation Maximization.
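
A hedged sketch of evaluating this log-likelihood numerically, using SciPy's logsumexp for stability; the data and parameters in the usage lines are placeholders:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def gmm_loglik(X, pi, mus, Sigmas):
    """l(theta) = sum_i log sum_k pi_k N(x_i | mu_k, Sigma_k)."""
    N, K = X.shape[0], len(pi)
    log_terms = np.empty((N, K))
    for k in range(K):
        log_terms[:, k] = np.log(pi[k]) + multivariate_normal.logpdf(X, mus[k], Sigmas[k])
    return logsumexp(log_terms, axis=1).sum()

# Placeholder usage with two 2-D components.
X = np.random.default_rng(1).normal(size=(100, 2))
print(gmm_loglik(X, [0.5, 0.5], [np.zeros(2), np.ones(2)], [np.eye(2), np.eye(2)]))
```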

  10. Expectation Maximization Algorithm. Observation: if the values of $z_i$ are known, maximization is easy. Key idea: iterative updates. Given parameter estimates, infer all $z_i$ variables. Given inferred $z_i$ variables, maximize with respect to the parameters. Questions: Does this converge? What does this maximize?

  11. Expectation Maximization Algorithm. Complete log-likelihood: $\ell_c(\theta) = \sum_i \log p(x_i, z_i) = \sum_i \log\bigl[\pi_{z_i}\, p(x_i \mid z_i)\bigr]$. Problem: $z_i$ not known. Possible solution: replace with the conditional expectation. Expected complete log-likelihood: $Q(\theta, \theta^{old}) = \mathbb{E}\bigl[\sum_i \log p(x_i, z_i)\bigr]$, taken with respect to $p(z \mid x, \theta^{old})$, where $\theta^{old}$ are the current parameters.

  12. Expectation Maximization Algorithm. $Q(\theta, \theta^{old}) = \mathbb{E}\bigl[\sum_i \log p(x_i, z_i)\bigr] = \sum_i \sum_k p(z_i = k \mid x_i, \theta^{old}) \log\bigl[\pi_k\, p(x_i \mid \theta_k)\bigr] = \sum_i \sum_k r_{ik} \log \pi_k + \sum_i \sum_k r_{ik} \log p(x_i \mid \theta_k)$, where $r_{ik} = p(z_i = k \mid x_i, \theta^{old})$. Compare with the likelihood for a generative classifier.

  13. Expectation Maximization Algorithm. E step: update $r_{ik}$ based on the current parameters, $r_{ik} = \dfrac{\pi_k\, p(x_i \mid \theta_k)}{\sum_{k'} \pi_{k'}\, p(x_i \mid \theta_{k'})}$. M step: maximize $Q(\theta, \theta^{old})$ with respect to the parameters. Overall algorithm: initialize all latent variables, then iterate the E step and the M step until convergence.

  14. Example: EM for GMM. E step: remains the same for all mixture models. M step: $\pi_k = \frac{1}{N}\sum_i r_{ik}$, $\mu_k = \frac{\sum_i r_{ik}\, x_i}{\sum_i r_{ik}}$, $\Sigma_k = \frac{\sum_i r_{ik}\,(x_i - \mu_k)(x_i - \mu_k)^T}{\sum_i r_{ik}}$. Compare with a generative classifier.
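
A compact sketch combining the E step of slide 13 with these M-step updates; the initialization, the fixed iteration count, and the small ridge added to each covariance are simplifications for illustration, not part of the slides:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50, seed=0):
    N, D = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, K, replace=False)].astype(float)   # K random points as initial means
    Sigmas = np.stack([np.cov(X.T) + 1e-6 * np.eye(D)] * K)
    for _ in range(n_iter):
        # E step: r_ik = pi_k N(x_i | mu_k, Sigma_k) / sum_k' pi_k' N(x_i | mu_k', Sigma_k')
        log_r = np.stack([np.log(pi[k]) + multivariate_normal.logpdf(X, mus[k], Sigmas[k])
                          for k in range(K)], axis=1)
        r = np.exp(log_r - logsumexp(log_r, axis=1, keepdims=True))
        # M step: closed-form updates from slide 14.
        Nk = r.sum(axis=0)
        pi = Nk / N
        mus = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (r[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
    return pi, mus, Sigmas, r
```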

  15. Analysis of EM Algorithm. The expected complete log-likelihood is a lower bound on the log-likelihood. EM iteratively maximizes this lower bound. Converges to a local maximum of the log-likelihood.

  16. Bayesian / MAP Estimation. EM with maximum likelihood can overfit. Possible to perform MAP instead of MLE in the M step. EM is partially Bayesian: posterior distribution over latent variables, point estimate of the parameters. The fully Bayesian approach is called Variational Bayes.

  17. (Lloyd's) K-Means Algorithm. Hard EM for a Gaussian mixture model: point estimate of the parameters (as usual), point estimate of the latent variables, spherical Gaussian mixture components ($\Sigma_k = \sigma^2 I$). Hard assignment: $z_i^* = \arg\max_k p(z_i = k \mid x_i, \theta) = \arg\min_k \|x_i - \mu_k\|^2$, where $\mu_k = \frac{1}{N_k}\sum_{i:\, z_i = k} x_i$. The most popular hard clustering algorithm.

  18. K-Means Problem. Given $\{x_i\}$, find $k$ means $(\mu_1, \ldots, \mu_k)$ and data assignments $(z_1, \ldots, z_N)$ such that $\mu^*, z^* = \arg\min_{\mu, z} \sum_i \|x_i - \mu_{z_i}\|^2$. Note: $z_i$ can equivalently be written as a $k$-dimensional binary (one-hot) vector.
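
A minimal sketch of Lloyd's iterations (slide 17) for this objective; the random initialization from the data points and the fixed iteration count are just one simple choice:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), k, replace=False)].astype(float)   # initial means: k random points
    for _ in range(n_iter):
        # Assignment step (hard E step): z_i = argmin_k ||x_i - mu_k||^2
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        z = d2.argmin(axis=1)
        # Update step (M step): mu_k = mean of the points currently assigned to cluster k
        for j in range(k):
            if np.any(z == j):
                mu[j] = X[z == j].mean(axis=0)
    return mu, z
```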

  19. Model Selection: Choosing K for GMM. Cross-validation: plot the likelihood on the training set and on a validation set for increasing values of $k$. Likelihood on the training set keeps improving; likelihood on the validation set drops after the optimal $k$. Does not work for k-means! Why?
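
One way to carry out this comparison in practice is with scikit-learn's GaussianMixture (not used in the slides; score() returns the average log-likelihood per sample, and the data below is a placeholder):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

X = np.random.default_rng(0).normal(size=(500, 2))          # placeholder data
X_train, X_val = train_test_split(X, test_size=0.3, random_state=0)

for k in range(1, 8):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X_train)
    print(k, gmm.score(X_train), gmm.score(X_val))           # training vs. validation log-likelihood
```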

  20. Principal Component Analysis: Motivation. Dimensionality reduction: reduces the number of parameters to estimate; data often resides in a much lower dimension, e.g., on a line in a 3D space; provides understanding. Mixture models are very restricted: the latent variable is restricted to a small discrete set. Can we relax the latent variable?

  21. Classical PCA: Motivation. Revisit K-means: $\min_{W,Z} J(W, Z) = \|X - WZ\|^2$, where $W$ is the matrix containing the means and $Z$ is the matrix containing the cluster-membership vectors. How can we relax $Z$ and $W$?

  22. Classical PCA: Problem. $\min_{W,Z} J(W, Z) = \|X - WZ\|^2$, with data matrix $X$ of size $D \times N$, arbitrary $Z$ of size $L \times N$, orthonormal $W$ of size $D \times L$.

  23. Classical PCA: Optimal Solution. Empirical covariance matrix of the scaled and centered data: $\hat{\Sigma} = \frac{1}{N}\sum_i x_i x_i^T$. Optimal solution: $\hat{W} = V_L$, where $V_L$ contains the $L$ eigenvectors for the $L$ largest eigenvalues of $\hat{\Sigma}$, and $\hat{z}_i = \hat{W}^T x_i$. Alternative solution via Singular Value Decomposition (SVD). $W$ contains the principal components that capture the largest variance in the data.
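
A sketch of both routes on placeholder data (eigendecomposition of the empirical covariance, and SVD of the centered data); the choice L = 2 is arbitrary:

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(200, 5))   # placeholder data, N x D (rows are x_i)
L = 2

Xc = X - X.mean(axis=0)                 # center the data
Sigma_hat = Xc.T @ Xc / len(Xc)         # empirical covariance (1/N) sum_i x_i x_i^T

# Route 1: eigenvectors for the L largest eigenvalues of Sigma_hat.
evals, evecs = np.linalg.eigh(Sigma_hat)   # eigh returns eigenvalues in ascending order
W = evecs[:, ::-1][:, :L]                  # D x L matrix of top-L principal directions
Z = Xc @ W                                 # row i is z_i = W^T x_i

# Route 2: SVD of the centered data spans the same subspace (up to sign).
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
W_svd = Vt[:L].T
```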

  24. Probabilistic PCA. Generative model: $p(z_i) = \mathcal{N}(z_i \mid \mu_0, \Sigma_0)$, $p(x_i \mid z_i) = \mathcal{N}(x_i \mid W z_i + \mu, \Psi)$, with $\Psi$ forced to be diagonal. Latent linear models: Factor Analysis. Special case: PCA with $\Psi = \sigma^2 I$.
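
A sketch of ancestral sampling from this latent linear model in the PPCA special case $\Psi = \sigma^2 I$, assuming a standard-normal prior on $z$ (all sizes and values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
D, L, N = 5, 2, 10000
W = rng.normal(size=(D, L))       # factor loading matrix
mu = rng.normal(size=D)           # offset
sigma2 = 0.1                      # isotropic noise variance, Psi = sigma2 * I

Z = rng.normal(size=(N, L))                                        # z_i ~ N(0, I)
X = Z @ W.T + mu + rng.normal(scale=np.sqrt(sigma2), size=(N, D))  # x_i ~ N(W z_i + mu, sigma2 I)

# Marginally cov[x] = W W^T + sigma2 I (see slide 26); compare with the sample covariance.
print(np.abs(np.cov(X.T) - (W @ W.T + sigma2 * np.eye(D))).max())  # should be small for large N
```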

  25. Visualization of Generative Process. [Figure from Bishop, PRML.]

  26. Relationship with Gaussian Density. Marginally, $\mathrm{cov}[x] = WW^T + \Psi$. Why does $\Psi$ need to be restricted? Intermediate low-rank parameterization of the Gaussian covariance matrix, between full rank and diagonal. Compare the number of parameters.
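
As a rough parameter count for the covariance alone in $D$ dimensions (ignoring identifiability corrections), which is what makes this parameterization "intermediate":

$$
\underbrace{\tfrac{D(D+1)}{2}}_{\text{full } \Sigma}
\quad\text{vs.}\quad
\underbrace{DL + D}_{WW^{T} + \Psi,\ \Psi\ \text{diagonal}}
\quad\text{vs.}\quad
\underbrace{D}_{\text{diagonal } \Sigma},
$$

so for $L \ll D$ the latent linear model sits between the two extremes.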

  27. EM for PCA: Rod and Springs. [Figure from Bishop, PRML.]

  28. Advantages of EM. Simpler than gradient methods with constraints. Handles missing data. Easy path for handling more complex models. Not always the fastest method.

  29. Summary of Latent Variable Models. Learning from unlabeled data. Latent variables: discrete (clustering / mixture models; GMM); continuous (dimensionality reduction; PCA). Summary / understanding of data. Expectation Maximization algorithm.
