Introduction to Expectation Maximization for General Latent Model Learning

An overview of Expectation Maximization for Gaussian Mixture Models and general latent model learning from ECE 5984, covering K-means clustering, the GMM generative assumption, and the EM algorithm. Readings: Barber 20.1-20.3. Slides by Dhruv Batra, Virginia Tech.

  • Machine Learning
  • Gaussian Mixture Models
  • Latent Models
  • K-means Clustering
  • Expectation Maximization


Presentation Transcript


  1. ECE 5984: Introduction to Machine Learning. Topics: Expectation Maximization, for GMMs and for general latent model learning. Readings: Barber 20.1-20.3. Dhruv Batra, Virginia Tech

  2. Administrivia. Poster Presentation: May 8, 1:30-4:00pm, 310 Kelly Hall (ICTAS Building). Print a poster (or a bunch of slides). Format: portrait, e.g. 2 feet (width) x 4 feet (height). Less text, more pictures. Best Project Prize! (C) Dhruv Batra 2

  3. Recap of Last Time (C) Dhruv Batra 3

  4. Some Data (C) Dhruv Batra Slide Credit: Carlos Guestrin 4

  5. K-means. 1. Ask user how many clusters they'd like (e.g. k=5). (C) Dhruv Batra Slide Credit: Carlos Guestrin 5

  6. K-means. 1. Ask user how many clusters they'd like (e.g. k=5). 2. Randomly guess k cluster center locations. (C) Dhruv Batra Slide Credit: Carlos Guestrin 6

  7. K-means. 1. Ask user how many clusters they'd like (e.g. k=5). 2. Randomly guess k cluster center locations. 3. Each datapoint finds out which center it's closest to. (Thus each center owns a set of datapoints.) (C) Dhruv Batra Slide Credit: Carlos Guestrin 7

  8. K-means. 1. Ask user how many clusters they'd like (e.g. k=5). 2. Randomly guess k cluster center locations. 3. Each datapoint finds out which center it's closest to. 4. Each center finds the centroid of the points it owns. (C) Dhruv Batra Slide Credit: Carlos Guestrin 8

  9. K-means. 1. Ask user how many clusters they'd like (e.g. k=5). 2. Randomly guess k cluster center locations. 3. Each datapoint finds out which center it's closest to. 4. Each center finds the centroid of the points it owns... 5. ...and jumps there. 6. Repeat until terminated! (C) Dhruv Batra Slide Credit: Carlos Guestrin 9

  10. K-means. Randomly initialize k centers: $\mu^{(0)} = \mu_1^{(0)}, \ldots, \mu_k^{(0)}$. Assign: assign each point $i \in \{1, \ldots, n\}$ to its nearest center: $C(i) \leftarrow \arg\min_j \|x_i - \mu_j\|^2$. Recenter: $\mu_j$ becomes the centroid of the points assigned to it: $\mu_j \leftarrow \frac{1}{|\{i : C(i)=j\}|} \sum_{i : C(i)=j} x_i$. (C) Dhruv Batra Slide Credit: Carlos Guestrin 10
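
To make the assign/recenter loop concrete, here is a minimal NumPy sketch of the same algorithm. It is not from the slides; the function name, array names, and the convergence check are illustrative choices.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate assign and recenter steps."""
    rng = np.random.default_rng(seed)
    # Randomly initialize k centers by picking k data points.
    mu = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign: each point goes to its nearest center.
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        C = dists.argmin(axis=1)
        # Recenter: each center moves to the centroid of the points it owns.
        new_mu = np.array([X[C == j].mean(axis=0) if np.any(C == j) else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu, C
```

Calling something like `kmeans(X, 5)` on the data from the earlier slides would run exactly steps 2-6 above.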

  11. K-means as Coordinate Descent. Optimize the objective function $F(\mu, C) = \sum_{i=1}^{n} \|x_i - \mu_{C(i)}\|^2$: fix $\mu$, optimize $a$ (or $C$). (C) Dhruv Batra Slide Credit: Carlos Guestrin 11

  12. K-means as Coordinate Descent. Optimize the objective function $F(\mu, C) = \sum_{i=1}^{n} \|x_i - \mu_{C(i)}\|^2$: fix $a$ (or $C$), optimize $\mu$. (C) Dhruv Batra Slide Credit: Carlos Guestrin 12
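
A short justification, not on the slides, of why alternating these two steps terminates: each half-step solves its subproblem exactly, so the objective never increases, and since there are only finitely many possible assignments the loop must reach a configuration where nothing changes (a local optimum).

```latex
F\bigl(\mu^{(t)},\, C^{(t+1)}\bigr) \le F\bigl(\mu^{(t)},\, C^{(t)}\bigr)
  \quad \text{(assign step: exact minimization over } C\text{)}

F\bigl(\mu^{(t+1)},\, C^{(t+1)}\bigr) \le F\bigl(\mu^{(t)},\, C^{(t+1)}\bigr)
  \quad \text{(recenter step: exact minimization over } \mu\text{)}
```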

  13. Object as a bag of "words". Slide Credit: Fei-Fei Li

  14. Clustered Image Patches Fei-Fei et al. 2005

  15. (One) bad case for K-means: clusters may overlap, and some clusters may be wider than others. GMM to the rescue! (C) Dhruv Batra Slide Credit: Carlos Guestrin 15

  16. GMM (C) Dhruv Batra Figure Credit: Kevin Murphy 16

  17. GMM (C) Dhruv Batra Figure Credit: Kevin Murphy 17
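
The density pictured in these two figures is the standard Gaussian mixture, restated here since the figures themselves are not in the transcript ($\pi_j$ denotes the mixing weights):

```latex
p(x) \;=\; \sum_{j=1}^{k} \pi_j \, \mathcal{N}\!\bigl(x \mid \mu_j, \Sigma_j\bigr),
\qquad \pi_j \ge 0, \quad \sum_{j=1}^{k} \pi_j = 1 .
```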

  18. K-means vs GMM. K-Means: http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html GMM: http://www.socr.ucla.edu/applets.dir/mixtureem.html (C) Dhruv Batra 18

  19. Hidden Data Causes Problems #1. Fully observed (log) likelihood factorizes; marginal (log) likelihood doesn't factorize: all parameters are coupled! (C) Dhruv Batra 19
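
Written out (not on the slide; $\theta = \{\pi_j, \mu_j, \Sigma_j\}$ collects all mixture parameters), the contrast is between a sum of independent per-point terms and a log of a sum that ties every parameter together:

```latex
% Fully observed: the log-likelihood splits into independent terms
\log p(x, z \mid \theta) \;=\; \sum_{i=1}^{n} \Bigl[ \log \pi_{z_i} + \log \mathcal{N}\bigl(x_i \mid \mu_{z_i}, \Sigma_{z_i}\bigr) \Bigr]

% Marginal: the log of a sum couples all parameters
\log p(x \mid \theta) \;=\; \sum_{i=1}^{n} \log \sum_{j=1}^{k} \pi_j \, \mathcal{N}\bigl(x_i \mid \mu_j, \Sigma_j\bigr)
```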

  20. Hidden Data Causes Problems #2: Identifiability (e.g., permuting the component labels leaves the likelihood unchanged, so the maximizer is not unique). (C) Dhruv Batra Figure Credit: Kevin Murphy 20

  21. Hidden Data Causes Problems #3. The likelihood has singularities if one Gaussian collapses onto a single data point: its variance shrinks toward zero and the likelihood blows up. (C) Dhruv Batra 21

  22. Special case: spherical Gaussians and hard assignments. If $P(X \mid Z=k)$ is spherical, with the same $\sigma$ for all classes: $P(x_i \mid z = j) \propto \exp\!\bigl(-\tfrac{1}{2\sigma^2}\|x_i - \mu_j\|^2\bigr)$. If each $x_i$ belongs to exactly one class $C(i)$ (hard assignment), the marginal likelihood becomes $\prod_{i=1}^{N} \sum_{j=1}^{k} P(x_i, y=j) \propto \prod_{i=1}^{N} \exp\!\bigl(-\tfrac{1}{2\sigma^2}\|x_i - \mu_{C(i)}\|^2\bigr)$. M(M)LE same as K-means!!! (C) Dhruv Batra Slide Credit: Carlos Guestrin 22
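
Taking the log of the hard-assignment likelihood above makes the connection explicit (a one-line derivation, not on the slide):

```latex
\log \prod_{i=1}^{N} \exp\!\Bigl(-\tfrac{1}{2\sigma^2}\|x_i - \mu_{C(i)}\|^2\Bigr)
\;=\; -\frac{1}{2\sigma^2} \sum_{i=1}^{N} \|x_i - \mu_{C(i)}\|^2 ,
```

so maximizing it over $\mu$ and $C$ is exactly minimizing the K-means objective $F(\mu, C)$ from slide 11.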

  23. The K-means GMM assumption. There are k components. Component i has an associated mean vector $\mu_i$. (C) Dhruv Batra Slide Credit: Carlos Guestrin 23

  24. The K-means GMM assumption. There are k components. Component i has an associated mean vector $\mu_i$. Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\sigma^2 I$. Each data point is generated according to the following recipe: (C) Dhruv Batra Slide Credit: Carlos Guestrin 24

  25. The K-means GMM assumption. There are k components. Component i has an associated mean vector $\mu_i$. Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\sigma^2 I$. Each data point is generated according to the following recipe: 1. Pick a component at random: choose component i with probability P(y=i). (C) Dhruv Batra Slide Credit: Carlos Guestrin 25

  26. The K-means GMM assumption. There are k components. Component i has an associated mean vector $\mu_i$. Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\sigma^2 I$. Each data point is generated according to the following recipe: 1. Pick a component at random: choose component i with probability P(y=i). 2. Generate a datapoint $x \sim \mathcal{N}(\mu_i, \sigma^2 I)$. (C) Dhruv Batra Slide Credit: Carlos Guestrin 26

  27. The General GMM assumption. There are k components. Component i has an associated mean vector $\mu_i$. Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\Sigma_i$. Each data point is generated according to the following recipe: 1. Pick a component at random: choose component i with probability P(y=i). 2. Generate a datapoint $x \sim \mathcal{N}(\mu_i, \Sigma_i)$. (C) Dhruv Batra Slide Credit: Carlos Guestrin 27
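
A minimal NumPy sketch of this two-step generative recipe (not from the slides; the mixing weights pi, means mus, and covariances Sigmas are assumed inputs, and the function name is illustrative):

```python
import numpy as np

def sample_gmm(n, pi, mus, Sigmas, seed=0):
    """Draw n points from a GMM: pick a component, then sample its Gaussian."""
    rng = np.random.default_rng(seed)
    k, d = mus.shape
    # Step 1: pick a component at random with probability P(y = i) = pi[i].
    y = rng.choice(k, size=n, p=pi)
    # Step 2: datapoint ~ N(mu_i, Sigma_i) for the chosen component.
    X = np.stack([rng.multivariate_normal(mus[j], Sigmas[j]) for j in y])
    return X, y
```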

  28. K-means vs GMM. K-Means: http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html GMM: http://www.socr.ucla.edu/applets.dir/mixtureem.html (C) Dhruv Batra 28

  29. EM: Expectation Maximization [Dempster et al. 1977]. Often looks like "soft" K-means. Extremely general, extremely useful algorithm; essentially THE go-to algorithm for unsupervised learning. Plan: EM for learning GMM parameters, then EM for general unsupervised learning problems. (C) Dhruv Batra 29

  30. EM for Learning GMMs: simple update rules. E-Step: estimate the posterior $P(z_i = j \mid x_i)$ for each point. M-Step: maximize the full (complete-data) likelihood, weighted by these posteriors. (C) Dhruv Batra 30
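
A compact sketch of one EM iteration for a GMM, to make the two update rules concrete. This is not the course's code: the function name is mine, and it uses scipy.stats.multivariate_normal for the Gaussian density.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm_step(X, pi, mus, Sigmas):
    """One EM iteration for a GMM: E-step responsibilities, M-step updates."""
    n, d = X.shape
    k = len(pi)
    # E-step: r[i, j] = P(z_i = j | x_i), the posterior responsibility.
    r = np.column_stack([pi[j] * multivariate_normal.pdf(X, mus[j], Sigmas[j])
                         for j in range(k)])
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters with each point weighted by its responsibility.
    Nk = r.sum(axis=0)                      # effective number of points per component
    pi_new = Nk / n
    mus_new = (r.T @ X) / Nk[:, None]
    Sigmas_new = np.empty_like(Sigmas)
    for j in range(k):
        diff = X - mus_new[j]
        Sigmas_new[j] = (r[:, j, None] * diff).T @ diff / Nk[j]
    return pi_new, mus_new, Sigmas_new
```

Iterating this step from some initialization (e.g. K-means centers) until the log-likelihood stops improving produces the kind of progression shown on the following slides.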

  31. Gaussian Mixture Example: Start (C) Dhruv Batra Slide Credit: Carlos Guestrin 31

  32. After 1st iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 32

  33. After 2nd iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 33

  34. After 3rd iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 34

  35. After 4th iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 35

  36. After 5th iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 36

  37. After 6th iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 37

  38. After 20th iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 38
