
Introduction to Expectation Maximization for General Latent Model Learning
This lecture from ECE 5984 (Introduction to Machine Learning) at Virginia Tech covers Expectation Maximization for Gaussian Mixture Models and for general latent model learning, with a recap of K-means clustering. Readings: Barber 20.1-20.3. Slides by Dhruv Batra.
Presentation Transcript
ECE 5984: Introduction to Machine Learning. Topics: Expectation Maximization, for GMMs, for general latent model learning. Readings: Barber 20.1-20.3. Dhruv Batra, Virginia Tech
Administrivia Poster Presentation: May 8, 1:30-4:00pm, 310 Kelly Hall (ICTAS Building). Print a poster (or a bunch of slides). Format: portrait, e.g. 2 feet (width) x 4 feet (height). Less text, more pictures. Best Project Prize! (C) Dhruv Batra 2
Recap of Last Time (C) Dhruv Batra 3
Some Data (C) Dhruv Batra Slide Credit: Carlos Guestrin 4
K-means 1. Ask user how many clusters they'd like. (e.g. k=5) (C) Dhruv Batra Slide Credit: Carlos Guestrin 5
K-means 1. Ask user how many clusters they'd like. (e.g. k=5) 2. Randomly guess k cluster Center locations (C) Dhruv Batra Slide Credit: Carlos Guestrin 6
K-means 1. Ask user how many clusters they'd like. (e.g. k=5) 2. Randomly guess k cluster Center locations 3. Each datapoint finds out which Center it's closest to. (Thus each Center owns a set of datapoints) (C) Dhruv Batra Slide Credit: Carlos Guestrin 7
K-means 1. Ask user how many clusters they'd like. (e.g. k=5) 2. Randomly guess k cluster Center locations 3. Each datapoint finds out which Center it's closest to. 4. Each Center finds the centroid of the points it owns (C) Dhruv Batra Slide Credit: Carlos Guestrin 8
K-means 1. Ask user how many clusters they'd like. (e.g. k=5) 2. Randomly guess k cluster Center locations 3. Each datapoint finds out which Center it's closest to. 4. Each Center finds the centroid of the points it owns 5. ...and jumps there 6. Repeat until terminated! (C) Dhruv Batra Slide Credit: Carlos Guestrin 9
K-means Randomly initialize k centers: $\mu^{(0)} = \mu_1^{(0)}, \ldots, \mu_k^{(0)}$. Assign: assign each point $i \in \{1, \ldots, n\}$ to the nearest center. Recenter: $\mu_j$ becomes the centroid of its points. (C) Dhruv Batra Slide Credit: Carlos Guestrin 10
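To make the loop concrete, here is a minimal NumPy sketch of the algorithm above; the function name, the initialization scheme (centers drawn from the data), and the convergence test are illustrative choices, not from the slides.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-means. X is (n, d); returns centers (k, d) and assignments (n,)."""
    rng = np.random.default_rng(seed)
    # Step 2: randomly guess k cluster center locations (here: k random datapoints).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3 (Assign): each datapoint finds out which center it's closest to.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Steps 4-5 (Recenter): each center jumps to the centroid of the points it owns.
        new_centers = np.stack([X[assign == j].mean(axis=0) if np.any(assign == j)
                                else centers[j] for j in range(k)])
        # Step 6: repeat until terminated (here: until the centers stop moving).
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign
```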
K-means as Co-ordinate Descent Optimize the objective function (see the reconstruction after these two slides): Fix $\mu$, optimize $a$ (or $C$) (C) Dhruv Batra Slide Credit: Carlos Guestrin 11
K-means as Co-ordinate Descent Optimize the objective function: Fix $a$ (or $C$), optimize $\mu$ (C) Dhruv Batra Slide Credit: Carlos Guestrin 12
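The objective on these two slides was an equation image that did not survive extraction; a standard reconstruction (writing $a_i$ for the cluster assignment of point $i$) is:

```latex
% K-means objective: total squared distance from points to their assigned centers.
F(\mu, a) \;=\; \sum_{i=1}^{n} \lVert x_i - \mu_{a_i} \rVert_2^2

% Fix \mu, optimize a (the Assign step): each point picks its nearest center.
a_i \;=\; \arg\min_{j} \, \lVert x_i - \mu_j \rVert_2^2

% Fix a, optimize \mu (the Recenter step): each center moves to its centroid.
\mu_j \;=\; \frac{1}{|\{i : a_i = j\}|} \sum_{i \,:\, a_i = j} x_i
```

Each of the two steps can only decrease $F$, so K-means converges, though only to a local optimum.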
Object Bag of words Fei-Fei Li
Clustered Image Patches Fei-Fei et al. 2005
(One) bad case for k-means Clusters may overlap Some clusters may be wider than others GMM to the rescue! (C) Dhruv Batra Slide Credit: Carlos Guestrin 15
GMM (C) Dhruv Batra Figure Credit: Kevin Murphy 16
GMM (C) Dhruv Batra Figure Credit: Kevin Murphy 17
K-means vs GMM K-Means: http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html GMM: http://www.socr.ucla.edu/applets.dir/mixtureem.html (C) Dhruv Batra 18
Hidden Data Causes Problems #1 Fully observed (log) likelihood factorizes. Marginal (log) likelihood doesn't factorize: all parameters coupled! (C) Dhruv Batra 19
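In symbols (a standard expansion of the slide's claim, not spelled out on the slide itself): with observed assignments the log-likelihood splits into decoupled terms, while the marginal log-likelihood has a sum inside the log.

```latex
% Fully observed: a sum of terms, each touching only one block of parameters.
\log p(x, z \mid \theta)
  = \sum_{i=1}^{N} \Big[ \log p(z_i \mid \pi) + \log p(x_i \mid z_i, \mu, \Sigma) \Big]

% Marginal: the sum over components sits inside the log, coupling all parameters.
\log p(x \mid \theta)
  = \sum_{i=1}^{N} \log \sum_{j=1}^{k} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)
```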
Hidden Data Causes Problems #2 Identifiability (C) Dhruv Batra Figure Credit: Kevin Murphy 20
Hidden Data Causes Problems #3 Likelihood has singularities if one Gaussian collapses (C) Dhruv Batra 21
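A one-line illustration of the collapse (my addition, not on the slide): center one component exactly on a datapoint and shrink its variance.

```latex
% Center component j on datapoint x_i and let its variance shrink to zero:
\mathcal{N}\!\left(x_i \mid \mu_j = x_i,\; \sigma_j^2 I\right)
  = \frac{1}{(2\pi\sigma_j^2)^{d/2}} \;\longrightarrow\; \infty
  \quad \text{as } \sigma_j \to 0
```

So the marginal likelihood is unbounded and the exact MLE is degenerate; in practice one adds a variance floor or a prior over the covariances.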
Special case: spherical Gaussians and hard assignments If $P(X \mid Z = k)$ is spherical, with the same $\sigma$ for all classes: $P(x_i \mid z = j) \propto \exp\!\left(-\tfrac{1}{2\sigma^2} \lVert x_i - \mu_j \rVert^2\right)$. If each $x_i$ belongs to one class $C(i)$ (hard assignment), the marginal likelihood is $\prod_{i=1}^{N} \sum_{j=1}^{k} P(x_i, y = j) \propto \prod_{i=1}^{N} \exp\!\left(-\tfrac{1}{2\sigma^2} \lVert x_i - \mu_{C(i)} \rVert^2\right)$. M(M)LE same as K-means! (C) Dhruv Batra Slide Credit: Carlos Guestrin 22
The K-means GMM assumption There are k components. Component i has an associated mean vector $\mu_i$. (C) Dhruv Batra Slide Credit: Carlos Guestrin 23
The K-means GMM assumption There are k components. Component i has an associated mean vector $\mu_i$. Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\sigma^2 I$. Each data point is generated according to the following recipe: (C) Dhruv Batra Slide Credit: Carlos Guestrin 24
The K-means GMM assumption There are k components. Component i has an associated mean vector $\mu_i$. Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\sigma^2 I$. Each data point is generated according to the following recipe: 1. Pick a component at random: choose component i with probability P(y=i) (C) Dhruv Batra Slide Credit: Carlos Guestrin 25
The K-means GMM assumption There are k components. Component i has an associated mean vector $\mu_i$. Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\sigma^2 I$. Each data point is generated according to the following recipe: 1. Pick a component at random: choose component i with probability P(y=i) 2. Datapoint $\sim N(\mu_i, \sigma^2 I)$ (C) Dhruv Batra Slide Credit: Carlos Guestrin 26
The General GMM assumption There are k components. Component i has an associated mean vector $\mu_i$. Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\Sigma_i$. Each data point is generated according to the following recipe: 1. Pick a component at random: choose component i with probability P(y=i) 2. Datapoint $\sim N(\mu_i, \Sigma_i)$ (C) Dhruv Batra Slide Credit: Carlos Guestrin 27
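A minimal NumPy sketch of this two-step generative recipe; the function name and argument layout are illustrative, not from the slides.

```python
import numpy as np

def sample_gmm(pi, mus, Sigmas, n, seed=0):
    """Draw n points from a GMM, following the two-step recipe on the slide."""
    rng = np.random.default_rng(seed)
    # 1. Pick a component at random: choose component i with probability P(y=i).
    ys = rng.choice(len(pi), size=n, p=pi)
    # 2. Datapoint ~ N(mu_i, Sigma_i) for the chosen component.
    X = np.array([rng.multivariate_normal(mus[y], Sigmas[y]) for y in ys])
    return X, ys
```

For example, `sample_gmm([0.7, 0.3], [np.zeros(2), 3 * np.ones(2)], [np.eye(2), 0.5 * np.eye(2)], n=500)` draws 500 points from a two-component mixture.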
K-means vs GMM K-Means: http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html GMM: http://www.socr.ucla.edu/applets.dir/mixtureem.html (C) Dhruv Batra 28
EM Expectation Maximization [Dempster et al., 1977]. Often looks like "soft" K-means. Extremely general, extremely useful algorithm: essentially THE go-to algorithm for unsupervised learning. Plan: EM for learning GMM parameters; EM for general unsupervised learning problems. (C) Dhruv Batra 29
EM for Learning GMMs Simple update rules. E-step: estimate $P(z_i = j \mid x_i)$. M-step: maximize full likelihood weighted by posterior. (C) Dhruv Batra 30
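A minimal NumPy/SciPy sketch of these two update rules for a full-covariance GMM; this is a generic EM implementation under my own naming, not code from the course. The small ridge added to the covariances guards against the collapse singularity from Problem #3 above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iters=50, seed=0):
    """Minimal EM for a full-covariance GMM. X is (n, d)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(k, 1.0 / k)                       # uniform mixing weights
    mus = X[rng.choice(n, size=k, replace=False)]  # init means at random datapoints
    Sigmas = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * k)
    for _ in range(n_iters):
        # E-step: r[i, j] = P(z_i = j | x_i) under the current parameters.
        r = np.column_stack([pi[j] * multivariate_normal.pdf(X, mus[j], Sigmas[j])
                             for j in range(k)])
        r /= r.sum(axis=1, keepdims=True)
        # M-step: maximize the full (complete-data) likelihood, weighted by r.
        Nj = r.sum(axis=0)             # effective number of points per component
        pi = Nj / n
        mus = (r.T @ X) / Nj[:, None]
        for j in range(k):
            diff = X - mus[j]
            # Ridge 1e-6*I keeps a covariance from collapsing onto one point.
            Sigmas[j] = (r[:, j, None] * diff).T @ diff / Nj[j] + 1e-6 * np.eye(d)
    return pi, mus, Sigmas, r
```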
Gaussian Mixture Example: Start (C) Dhruv Batra Slide Credit: Carlos Guestrin 31
After 1st iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 32
After 2nd iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 33
After 3rd iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 34
After 4th iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 35
After 5th iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 36
After 6th iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 37
After 20th iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 38