Introduction to Expectation Maximization for General Latent Model Learning

An overview of Expectation Maximization for Gaussian Mixture Models and general latent model learning from ECE 5984, covering K-means clustering, the GMM generative assumption, and the EM algorithm. Readings: Barber 20.1-20.3. Slides by Dhruv Batra, Virginia Tech.

  • Machine Learning
  • Gaussian Mixture Models
  • Latent Models
  • K-means Clustering
  • Expectation Maximization


Presentation Transcript


  1. ECE 5984: Introduction to Machine Learning. Topics: Expectation Maximization, for GMMs and for general latent model learning. Readings: Barber 20.1-20.3. Dhruv Batra, Virginia Tech

  2. Administrivia. Poster Presentation: May 8, 1:30-4:00pm, 310 Kelly Hall (ICTAS Building). Print a poster (or a bunch of slides). Format: portrait, e.g. 2 feet (width) x 4 feet (height). Less text, more pictures. Best Project Prize! (C) Dhruv Batra 2

  3. Recap of Last Time (C) Dhruv Batra 3

  4. Some Data (C) Dhruv Batra Slide Credit: Carlos Guestrin 4

  5. K-means. 1. Ask user how many clusters they'd like (e.g. k=5). (C) Dhruv Batra Slide Credit: Carlos Guestrin 5

  6. K-means. 1. Ask user how many clusters they'd like (e.g. k=5). 2. Randomly guess k cluster center locations. (C) Dhruv Batra Slide Credit: Carlos Guestrin 6

  7. K-means. 1. Ask user how many clusters they'd like (e.g. k=5). 2. Randomly guess k cluster center locations. 3. Each datapoint finds out which center it's closest to. (Thus each center owns a set of datapoints.) (C) Dhruv Batra Slide Credit: Carlos Guestrin 7

  8. K-means. 1. Ask user how many clusters they'd like (e.g. k=5). 2. Randomly guess k cluster center locations. 3. Each datapoint finds out which center it's closest to. 4. Each center finds the centroid of the points it owns. (C) Dhruv Batra Slide Credit: Carlos Guestrin 8

  9. K-means. 1. Ask user how many clusters they'd like (e.g. k=5). 2. Randomly guess k cluster center locations. 3. Each datapoint finds out which center it's closest to. 4. Each center finds the centroid of the points it owns... 5. ...and jumps there. 6. Repeat until terminated! (C) Dhruv Batra Slide Credit: Carlos Guestrin 9

  10. K-means. Randomly initialize k centers: $\mu^{(0)} = \mu_1^{(0)}, \ldots, \mu_k^{(0)}$. Assign: assign each point $i \in \{1, \ldots, n\}$ to its nearest center: $C(i) \leftarrow \arg\min_j \|x_i - \mu_j\|^2$. Recenter: $\mu_j$ becomes the centroid of the points assigned to it: $\mu_j \leftarrow \frac{1}{|\{i : C(i)=j\}|} \sum_{i : C(i)=j} x_i$. (C) Dhruv Batra Slide Credit: Carlos Guestrin 10
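
To make the assign/recenter loop concrete, here is a minimal NumPy sketch of the same algorithm. It is not from the slides; the function name, array names, and the convergence check are illustrative choices.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate assign and recenter steps."""
    rng = np.random.default_rng(seed)
    # Randomly initialize k centers by picking k data points.
    mu = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign: each point goes to its nearest center.
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        C = dists.argmin(axis=1)
        # Recenter: each center moves to the centroid of the points it owns.
        new_mu = np.array([X[C == j].mean(axis=0) if np.any(C == j) else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu, C
```

Calling something like `kmeans(X, 5)` on the data from the earlier slides would run exactly steps 2-6 above.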

  11. K-means as Coordinate Descent. Optimize the objective function $F(\mu, C) = \sum_{i=1}^{n} \|x_i - \mu_{C(i)}\|^2$: fix $\mu$, optimize $a$ (or $C$). (C) Dhruv Batra Slide Credit: Carlos Guestrin 11

  12. K-means as Coordinate Descent. Optimize the objective function $F(\mu, C) = \sum_{i=1}^{n} \|x_i - \mu_{C(i)}\|^2$: fix $a$ (or $C$), optimize $\mu$. (C) Dhruv Batra Slide Credit: Carlos Guestrin 12
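
A short justification, not on the slides, of why alternating these two steps terminates: each half-step solves its subproblem exactly, so the objective never increases, and since there are only finitely many possible assignments the loop must reach a configuration where nothing changes (a local optimum).

```latex
F\bigl(\mu^{(t)},\, C^{(t+1)}\bigr) \le F\bigl(\mu^{(t)},\, C^{(t)}\bigr)
  \quad \text{(assign step: exact minimization over } C\text{)}

F\bigl(\mu^{(t+1)},\, C^{(t+1)}\bigr) \le F\bigl(\mu^{(t)},\, C^{(t+1)}\bigr)
  \quad \text{(recenter step: exact minimization over } \mu\text{)}
```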

  13. Object as a bag of "words". Slide Credit: Fei-Fei Li

  14. Clustered Image Patches Fei-Fei et al. 2005

  15. (One) bad case for K-means: clusters may overlap, and some clusters may be wider than others. GMM to the rescue! (C) Dhruv Batra Slide Credit: Carlos Guestrin 15

  16. GMM (C) Dhruv Batra Figure Credit: Kevin Murphy 16

  17. GMM (C) Dhruv Batra Figure Credit: Kevin Murphy 17
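
The density pictured in these two figures is the standard Gaussian mixture, restated here since the figures themselves are not in the transcript ($\pi_j$ denotes the mixing weights):

```latex
p(x) \;=\; \sum_{j=1}^{k} \pi_j \, \mathcal{N}\!\bigl(x \mid \mu_j, \Sigma_j\bigr),
\qquad \pi_j \ge 0, \quad \sum_{j=1}^{k} \pi_j = 1 .
```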

  18. K-means vs GMM. K-Means: http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html GMM: http://www.socr.ucla.edu/applets.dir/mixtureem.html (C) Dhruv Batra 18

  19. Hidden Data Causes Problems #1. Fully observed (log) likelihood factorizes; marginal (log) likelihood doesn't factorize: all parameters are coupled! (C) Dhruv Batra 19
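
Written out (not on the slide; $\theta = \{\pi_j, \mu_j, \Sigma_j\}$ collects all mixture parameters), the contrast is between a sum of independent per-point terms and a log of a sum that ties every parameter together:

```latex
% Fully observed: the log-likelihood splits into independent terms
\log p(x, z \mid \theta) \;=\; \sum_{i=1}^{n} \Bigl[ \log \pi_{z_i} + \log \mathcal{N}\bigl(x_i \mid \mu_{z_i}, \Sigma_{z_i}\bigr) \Bigr]

% Marginal: the log of a sum couples all parameters
\log p(x \mid \theta) \;=\; \sum_{i=1}^{n} \log \sum_{j=1}^{k} \pi_j \, \mathcal{N}\bigl(x_i \mid \mu_j, \Sigma_j\bigr)
```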

  20. Hidden Data Causes Problems #2: Identifiability (e.g., permuting the component labels leaves the likelihood unchanged, so the maximizer is not unique). (C) Dhruv Batra Figure Credit: Kevin Murphy 20

  21. Hidden Data Causes Problems #3. The likelihood has singularities if one Gaussian collapses onto a single data point: its variance shrinks toward zero and the likelihood blows up. (C) Dhruv Batra 21

  22. Special case: spherical Gaussians and hard assignments. If $P(X \mid Z=k)$ is spherical, with the same $\sigma$ for all classes: $P(x_i \mid z = j) \propto \exp\!\bigl(-\tfrac{1}{2\sigma^2}\|x_i - \mu_j\|^2\bigr)$. If each $x_i$ belongs to exactly one class $C(i)$ (hard assignment), the marginal likelihood becomes $\prod_{i=1}^{N} \sum_{j=1}^{k} P(x_i, y=j) \propto \prod_{i=1}^{N} \exp\!\bigl(-\tfrac{1}{2\sigma^2}\|x_i - \mu_{C(i)}\|^2\bigr)$. M(M)LE same as K-means!!! (C) Dhruv Batra Slide Credit: Carlos Guestrin 22
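
Taking the log of the hard-assignment likelihood above makes the connection explicit (a one-line derivation, not on the slide):

```latex
\log \prod_{i=1}^{N} \exp\!\Bigl(-\tfrac{1}{2\sigma^2}\|x_i - \mu_{C(i)}\|^2\Bigr)
\;=\; -\frac{1}{2\sigma^2} \sum_{i=1}^{N} \|x_i - \mu_{C(i)}\|^2 ,
```

so maximizing it over $\mu$ and $C$ is exactly minimizing the K-means objective $F(\mu, C)$ from slide 11.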

  23. The K-means GMM assumption. There are k components. Component i has an associated mean vector $\mu_i$. (C) Dhruv Batra Slide Credit: Carlos Guestrin 23

  24. The K-means GMM assumption. There are k components. Component i has an associated mean vector $\mu_i$. Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\sigma^2 I$. Each data point is generated according to the following recipe: (C) Dhruv Batra Slide Credit: Carlos Guestrin 24

  25. The K-means GMM assumption. There are k components. Component i has an associated mean vector $\mu_i$. Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\sigma^2 I$. Each data point is generated according to the following recipe: 1. Pick a component at random: choose component i with probability P(y=i). (C) Dhruv Batra Slide Credit: Carlos Guestrin 25

  26. The K-means GMM assumption. There are k components. Component i has an associated mean vector $\mu_i$. Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\sigma^2 I$. Each data point is generated according to the following recipe: 1. Pick a component at random: choose component i with probability P(y=i). 2. Generate a datapoint $x \sim \mathcal{N}(\mu_i, \sigma^2 I)$. (C) Dhruv Batra Slide Credit: Carlos Guestrin 26

  27. The General GMM assumption. There are k components. Component i has an associated mean vector $\mu_i$. Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\Sigma_i$. Each data point is generated according to the following recipe: 1. Pick a component at random: choose component i with probability P(y=i). 2. Generate a datapoint $x \sim \mathcal{N}(\mu_i, \Sigma_i)$. (C) Dhruv Batra Slide Credit: Carlos Guestrin 27
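
A minimal NumPy sketch of this two-step generative recipe (not from the slides; the mixing weights pi, means mus, and covariances Sigmas are assumed inputs, and the function name is illustrative):

```python
import numpy as np

def sample_gmm(n, pi, mus, Sigmas, seed=0):
    """Draw n points from a GMM: pick a component, then sample its Gaussian."""
    rng = np.random.default_rng(seed)
    k, d = mus.shape
    # Step 1: pick a component at random with probability P(y = i) = pi[i].
    y = rng.choice(k, size=n, p=pi)
    # Step 2: datapoint ~ N(mu_i, Sigma_i) for the chosen component.
    X = np.stack([rng.multivariate_normal(mus[j], Sigmas[j]) for j in y])
    return X, y
```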

  28. K-means vs GMM. K-Means: http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html GMM: http://www.socr.ucla.edu/applets.dir/mixtureem.html (C) Dhruv Batra 28

  29. EM: Expectation Maximization [Dempster et al. 1977]. Often looks like "soft" K-means. Extremely general, extremely useful algorithm; essentially THE go-to algorithm for unsupervised learning. Plan: EM for learning GMM parameters, then EM for general unsupervised learning problems. (C) Dhruv Batra 29

  30. EM for Learning GMMs: simple update rules. E-Step: estimate the posterior $P(z_i = j \mid x_i)$ for each point. M-Step: maximize the full (complete-data) likelihood, weighted by these posteriors. (C) Dhruv Batra 30
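
A compact sketch of one EM iteration for a GMM, to make the two update rules concrete. This is not the course's code: the function name is mine, and it uses scipy.stats.multivariate_normal for the Gaussian density.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm_step(X, pi, mus, Sigmas):
    """One EM iteration for a GMM: E-step responsibilities, M-step updates."""
    n, d = X.shape
    k = len(pi)
    # E-step: r[i, j] = P(z_i = j | x_i), the posterior responsibility.
    r = np.column_stack([pi[j] * multivariate_normal.pdf(X, mus[j], Sigmas[j])
                         for j in range(k)])
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters with each point weighted by its responsibility.
    Nk = r.sum(axis=0)                      # effective number of points per component
    pi_new = Nk / n
    mus_new = (r.T @ X) / Nk[:, None]
    Sigmas_new = np.empty_like(Sigmas)
    for j in range(k):
        diff = X - mus_new[j]
        Sigmas_new[j] = (r[:, j, None] * diff).T @ diff / Nk[j]
    return pi_new, mus_new, Sigmas_new
```

Iterating this step from some initialization (e.g. K-means centers) until the log-likelihood stops improving produces the kind of progression shown on the following slides.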

  31. Gaussian Mixture Example: Start (C) Dhruv Batra Slide Credit: Carlos Guestrin 31

  32. After 1st iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 32

  33. After 2nd iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 33

  34. After 3rd iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 34

  35. After 4th iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 35

  36. After 5th iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 36

  37. After 6th iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 37

  38. After 20th iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin 38
