Understanding Maximum Likelihood Estimation in Gaussian Models

data mining the em algorithm n.w
1 / 21
Embed
Share

Explore the concept of Maximum Likelihood Estimation in Gaussian models for model-based clustering. Learn how parameters are optimized to fit data points, and understand the principles underlying the EM algorithm in data mining. Discover the importance of finding the best distribution model that fits the data by analyzing mixture models and Gaussian distributions.

  • Gaussian Models
  • Maximum Likelihood Estimation
  • EM Algorithm
  • Data Mining
  • Model-based Clustering

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. DATA MINING THE EM ALGORITHM Maximum Likelihood Estimation

  2. MIXTURE MODELS AND THE EM ALGORITHM

  3. Model-based clustering In order to understand our data, we will assume that there is a generative process (a model) that creates/describes the data, and we will try to find the model that best fits the data. Models of different complexity can be defined, but we will assume that our model is a distribution from which data points are sampled Example: the data is the height of all adults in Greece In most cases, a single distribution is not good enough to describe all data points: different parts of the data follow a different distribution Example: the data is the height of all adults and children in Greece We need a mixture model Different distributions correspond to different clusters in the data.

  4. Gaussian Distribution Example: the data is the height of all adults in Greece Experience has shown that this data follows a Gaussian (Normal) distribution Reminder: Normal distribution: 2??? ? ?2 1 2?2 ? ? = ? = mean, ? = standard deviation

  5. Gaussian Model What is a model? A Gaussian distribution is fully defined by the mean ? and the standard deviation ? We define our model as the pair of parameters ? = (?,?) This is a general principle: a model is defined as a vector of parameters ?

  6. Fitting the model We want to find the normal distribution that best fits our data Find the best values for ? and ? But what does best fit mean?

  7. Maximum Likelihood Estimation (MLE) Find the most likely parameters given the data. Given the data observations ?, find ? that maximizes ?(?|?) Problem: We do not know how to compute ? ? ? Using Bayes Rule: ? ? ? =? ? ? ?(?) ?(?) If we have no prior information about ?, or X, we can assume uniform. Maximizing ? ? ? is now the same as maximizing ? ? ?

  8. Maximum Likelihood Estimation (MLE) We have a vector ? = (?1, ,??) of values and we want to fit a Gaussian ?(?,?) model to the data Our parameter set is ? = (?,?) Probability of observing point ?? given the parameters ? We cheated a little here. More accurately we look at: ?(?? ? ??+ ??) 2??? ?? ?2 1 2?2 ? ??|? = Probability of observing all points (assume independence) ? ? 2??? ?? ?2 1 2?2 ? ?|? = ? ??|? = ?=1 ?=1 We want to find the parameters ? = (?,?) that maximize the probability ?(?|?)

  9. Maximum Likelihood Estimation (MLE) The probability ?(?|?) as a function of ? is called the Likelihood function ? 2??? ?? ?2 1 2?2 ?(?) = ?=1 It is usually easier to work with the Log-Likelihood function ? ?? ?2 2?2 1 ?? ? = 2?log2? ?log? ?=1 Maximum Likelihood Estimation Find parameters ?,? that maximize ??(?) ? ? 1 ? ?=1 1 ? ?=1 2 ?2= (?? ?)2 = ?? ? = ??= ?? Sample Mean Sample Variance

  10. Mixture of Gaussians Suppose that you have the heights of adults and children, and the distribution looks like the figure below

  11. Mixture of Gaussians In this case the data is the result of the mixture of two Gaussians One for Adults, and one for Children Identifying for each value which Gaussian is most likely to have generated it will give us a clustering.

  12. Mixture model A value ?? is generated according to the following process: First select the age group With probability ?? select Adult, with probability ?? select Child (?? + ?? = 1) We can also think of this as a Hidden Variable Z that takes two values: Adult and Child ??= ? ? = Adult ,??= ? ? = Child Given the nationality, generate the point from the corresponding Gaussian ? ???? ~ ? ??,??if Adult ? ???? ~ ? ??,??if Child ??: parameters of the Adult distribution ??: parameters of the Child distribution Using the Hidden Variable Z: ? ??? = Adult = ? ???? ~ ? ??,?? ? ??? = Child =? ???? ~ ? ??,??

  13. Mixture Model Our model has the following parameters = (??,??,??,??,??,??) ?A: parameters of the Adult distribution Mixture probabilities ??: parameters of the Child distribution

  14. Mixture Model Our model has the following parameters = (??,??,??,??,??,??) Mixture probabilities Distribution Parameters For value ??, we have: ? ??| = ??? ???? + ???(??|??) ?1, ,?? For all values ? = ? ? ?| = ?(??| ) ?=1 We want to estimate the parameters that maximize the Likelihood of the data

  15. Mixture Models Once we have the parameters = (??,??,??,??,??,??) we can estimate the membership probabilities ? ? ?? and ? ? ??for each point ??: This is the probability that point ?? belongs to the Adult or the Child population (cluster) Using Bayes Rule: Given from the Gaussian distribution ?(??,??) for Greek ? ??? ?(?) ? ? ?? = ? ??? ? ? + ? ??? ?(?) ? ?????? ? ??????+ ? ?????? =

  16. EM (Expectation Maximization) Algorithm Initialize the values of the parameters in to some random values Repeat until convergence E-Step: Given the parameters estimate the membership probabilities ? ? ?? and ? ? ?? M-Step: Compute the parameter values that (in expectation) maximize the data likelihood ?? = ??log ??? ???? + ??? ???? ? ? ??=1 Fraction of population in G,C ??=1 ? ?=1 ?(?|??) ? ?=1 ?(?|??) ? ? 1 1 ??= ? ? ???? ??= ? ?? ? ? ???? ? ?? MLE Estimates if ? s were fixed ?=1 ? ?=1 ? 1 1 2= 2 2= 2 ?? ? ? ?? ?? ?? ?? ? ?? ? ? ?? ?? ?? ? ?? ?=1 ?=1

  17. Relationship to K-means E-Step: Assignment of points to clusters K-means: hard assignment, EM: soft assignment M-Step: Computation of centroids K-means assumes common fixed variance (spherical clusters) EM: can change the variance for different clusters or different dimensions (ellipsoid clusters) If the variance is fixed then both minimize the same error function

Related


More Related Content