
Introduction to Estimation Methods in Probability and Statistics
Learn about estimation methods such as Maximum Likelihood and Bayes in the context of probability and statistical models. Understand the process of making inferences about unknown parameters using data samples. Explore techniques like AIC, BIC, and Bayesian Estimation. Enhance your understanding of model comparisons and the convergence of estimators. Join Summer Institutes 2020 for in-depth sessions on estimation concepts.
Presentation Transcript
Estimation (Summer Institutes 2020, Module 1, Session 4)
Estimation
Probability and statistical models depend on parameters. Parameters are properties of the population and are typically unknown. The process of taking a sample of data to make inferences about these parameters is referred to as estimation. There are a number of different estimation methods; we will study two of them:
Maximum likelihood (ML)
Bayes
Maximum Likelihood
Problem: unknown model parameter, θ.
Set-up: write the probability of the data, Y, in terms of the model parameter and the data: P(Y | θ).
Solution: choose as your estimate the value of the unknown parameter that makes your data look as likely as possible, i.e. pick the θ that maximizes P(Y | θ).
L(θ) = the likelihood, as a function of the parameter θ; based on P(Y | θ).
l(θ) = log(L(θ)), the log-likelihood.
S(θ) = dl(θ)/dθ = the score. Set dl(θ)/dθ = 0 and solve for θ to find the MLE, θ̂.
I(θ) = -d²l(θ)/dθ² = the information. The inverse of the expected information gives the variance of θ̂:
Var(θ̂) = E[I(θ)]⁻¹ (in most cases)
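As a concrete sketch (not from the slides), the likelihood/score/information recipe can be worked through for a binomial model; the counts N and R here are hypothetical, and the function names are mine.

```python
import math

# Hypothetical binomial data: R successes in N trials, unknown parameter p.
N, R = 50, 18

def log_lik(p):
    # l(p) = R log(p) + (N - R) log(1 - p), dropping the binomial constant.
    return R * math.log(p) + (N - R) * math.log(1 - p)

def score(p):
    # S(p) = dl/dp = R/p - (N - R)/(1 - p)
    return R / p - (N - R) / (1 - p)

# Setting S(p) = 0 and solving gives the closed-form MLE p_hat = R/N.
p_hat = R / N

# Expected information I(p) = N / (p(1-p)); its inverse approximates Var(p_hat).
info = N / (p_hat * (1 - p_hat))
var_hat = 1 / info

print(p_hat)   # 0.36
```

The same pattern (write l, differentiate, solve the score equation, invert the information) carries over to the extra problems at the end of the session.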
Maximum Likelihood
Maximum likelihood estimates (MLEs) are always based on a probability model for the data.
Maximum likelihood can be used even when there are multiple unknown parameters, in which case θ has several components.
In complex problems it may not be possible to find the MLE analytically; in that case we use numerical optimization to search for the value of θ that maximizes the likelihood.
Model Comparisons: AIC, BIC
AIC (Akaike's Information Criterion) = 2·l(θ̂) - 2k
BIC (Bayes Information Criterion) = 2·l(θ̂) - k·log(N)
l(θ̂) = maximized log-likelihood
k = number of parameters
N = sample size
Use AIC or BIC to compare a series of models; pick the model with the largest AIC or BIC.
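A small sketch (my own, using the slide's larger-is-better sign convention) of comparing two hypothetical fitted models:

```python
import math

def aic(loglik, k):
    # Slide's convention: AIC = 2*loglik - 2*k, so larger is better.
    return 2 * loglik - 2 * k

def bic(loglik, k, n):
    # Slide's convention: BIC = 2*loglik - k*log(n), so larger is better.
    return 2 * loglik - k * math.log(n)

# Hypothetical fits: model B spends one extra parameter for only a small
# gain in log-likelihood, so both criteria still prefer model A here.
n = 100
ll_a, k_a = -212.4, 2
ll_b, k_b = -211.9, 3

print(aic(ll_a, k_a), aic(ll_b, k_b))
print(bic(ll_a, k_a, n), bic(ll_b, k_b, n))
```

Note that BIC penalizes extra parameters more heavily than AIC whenever log(N) > 2, i.e. for N > 7 or so.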
Bayes Estimation
Bayes' theorem in complete generality:
P(θ | X) = P(X | θ)P(θ) / P(X), where P(X) = Σθ' P(X | θ')P(θ')
P(X | θ) is the likelihood function, as before.
P(θ) is called the prior distribution of θ.
P(θ | X) is called the posterior distribution of θ.
Based on P(θ | X) we can define a number of possible estimators of θ.
The Bayesian procedure provides a convenient way of combining external information or previous data (through the prior distribution) with the current data (through the likelihood) to create a new estimate.
As N increases, the data (through the likelihood) overwhelm the prior, and the Bayes estimator typically converges to the MLE.
Controversy arises when P(θ) is used to incorporate subjective beliefs or opinions.
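The prior-overwhelmed-by-data behavior can be seen in a conjugate beta-binomial sketch (my own example; the Beta(19, 40) prior and the counts are hypothetical):

```python
# Conjugate update: with a Beta(a, b) prior and R successes in N binomial
# trials, the posterior is Beta(a + R, b + N - R).
a, b = 19, 40   # prior mean a/(a+b), roughly 0.32

def posterior_mean(R, N):
    # Mean of the Beta(a + R, b + N - R) posterior.
    return (a + R) / (a + b + N)

# Hold the empirical rate at 0.36 and grow N: the posterior mean moves
# from near the prior mean toward the MLE R/N = 0.36.
for R, N in [(18, 50), (180, 500), (1800, 5000)]:
    print(N, round(posterior_mean(R, N), 3), R / N)
```

With N = 50 the prior still pulls the estimate noticeably below 0.36; by N = 5000 the posterior mean is essentially the MLE.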
Extra Problems
1. Redo the previous problem assuming the man has 2 children who both have the A1 paternal allele.
2. Suppose 197 animals are distributed into five categories with frequencies (95, 30, 18, 20, 34). A genetic model for the population predicts the following frequencies for the categories: (.5, .25p, .25(1-p), .25(1-p), .25p). Use maximum likelihood to estimate p (Hint: use the multinomial distribution).
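For problem 2, the multinomial log-likelihood collapses to a binomial-style form in p, so the MLE has a closed form; a short check (my own code):

```python
# Counts for the five categories, with model frequencies
# (.5, .25p, .25(1-p), .25(1-p), .25p).
counts = (95, 30, 18, 20, 34)

# Dropping terms that do not involve p, the log-likelihood is
#   l(p) = (30 + 34) log(p) + (18 + 20) log(1 - p) + const,
# and setting the score to zero gives p_hat = 64/102.
in_p = counts[1] + counts[4]        # 64
in_1mp = counts[2] + counts[3]      # 38
p_hat = in_p / (in_p + in_1mp)
print(round(p_hat, 3))              # 0.627, matching the solutions slide
```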
Extra Problems
3. Suppose we are interested in estimating the recombination fraction, θ, from the following experiment. We do a series of crosses, AB/ab x AB/ab, and measure the frequency of the various phases in the gametes (assume we can do this). If the recombination fraction is θ, then we expect the following probabilities:
phase   probability (x4)
AB      3 - 2θ + θ²
Ab      2θ - θ²
aB      2θ - θ²
ab      1 - 2θ + θ²
Suppose we observe (AB, Ab, aB, ab) = (125, 18, 20, 34). Use maximum likelihood to estimate θ and find the variance of the estimate. (This will require a numerical solution.)
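Problem 3 has no closed form, so a numerical sketch (my own code) maximizes the log-likelihood by bisecting the score over (0, 1/2), using the phase probabilities (x4) from the table above:

```python
import math

counts = {"AB": 125, "Ab": 18, "aB": 20, "ab": 34}

def log_lik(t):
    # Phase probabilities (x4): AB: 3-2t+t^2, Ab and aB: 2t-t^2, ab: 1-2t+t^2.
    return (counts["AB"] * math.log(3 - 2*t + t*t)
            + (counts["Ab"] + counts["aB"]) * math.log(2*t - t*t)
            + counts["ab"] * math.log(1 - 2*t + t*t))

def score(t, h=1e-6):
    # Numerical derivative of the log-likelihood.
    return (log_lik(t + h) - log_lik(t - h)) / (2 * h)

# Bisection: the score is positive left of the maximum, negative right of it.
lo, hi = 1e-3, 0.499
for _ in range(60):
    mid = (lo + hi) / 2
    if score(mid) > 0:
        lo = mid
    else:
        hi = mid
theta_hat = (lo + hi) / 2

# Observed information -d2l/dt2 at the MLE; its inverse estimates the variance.
h = 1e-4
info = -(log_lik(theta_hat + h) - 2 * log_lik(theta_hat)
         + log_lik(theta_hat - h)) / h**2
print(round(theta_hat, 3), round(1 / info, 5))
```

Substituting ψ = (1 - θ)² reduces the score equation to a quadratic in ψ, which is one way to check the numerical answer (θ̂ comes out near 0.21).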
Extra Problems
4. Every human being can be classified into one of four blood groups: O, A, B, AB. Inheritance of these blood groups is controlled by one gene with three alleles, O, A and B, where O is recessive to A and B. Suppose the frequencies of these alleles are r, p, and q, respectively (p + q + r = 1). If we observe (O, A, B, AB) = (176, 182, 60, 17), use maximum likelihood to estimate r, p and q.
Extra Problems
5. Suppose we wish to estimate the recombination fraction (θ) for a particular locus. We observe N = 50 and R = 18. Several previously published studies of the recombination fraction in nearby loci (that we believe should have similar recombination fractions) have shown recombination fractions between .22 and .44. We decide to model this prior information as a beta distribution (see http://en.wikipedia.org/wiki/Beta_distribution) with parameters a = 19 and b = 40:
P(θ) = [Γ(a + b) / (Γ(a)Γ(b))] θ^(a-1) (1 - θ)^(b-1)
Find the MLE and Bayesian MAP estimators of the recombination fraction. Also find a 95% confidence interval (for the MLE) and a 95% credible interval (for the MAP).
Session 4 Solutions: Extra Problems
1. X = (A1, A1)
P(X | θ = A1A1) = 1
P(X | θ = A1A2) = .25
P(X | θ = A2A2) = 0
P(X) = .00505
P(θ = A1A1 | X) = 1 x .0001 / .00505 = .02
P(θ = A1A2 | X) = .25 x .0198 / .00505 = .98
P(θ = A2A2 | X) = 0
2. The estimate of p is .627.
3. The probability (likelihood) of the data, ignoring constants, is
L(θ) ∝ (3 - 2θ + θ²)^125 (2θ - θ²)^38 (1 - 2θ + θ²)^34
Session 4 Solutions: Extra Problems
4. First, we use basic genetics to find the probability of the observed phenotypes in terms of the unknown parameters. Assuming random mating, we have:
Genotype  prob.   Phenotype  prob.
OO        r²      O          r²
AA        p²      A          p² + 2pr
AO        2pr
BB        q²      B          q² + 2qr
BO        2qr
AB        2pq     AB         2pq
Pr(data | p, q, r) ∝ (r²)^O (p² + 2pr)^A (q² + 2qr)^B (2pq)^AB
l(p, q, r) = 2·O·log(r) + A·log(p² + 2pr) + B·log(q² + 2qr) + AB·log(p) + AB·log(q) (plus a constant)
To estimate p, q and r, we need to maximize l(p, q, r) subject to the constraint p + q + r = 1. This constraint makes the problem a bit harder; one approach is to just put r = 1 - p - q in the likelihood so we have just two parameters, p and q. For (O, A, B, AB) = (176, 182, 60, 17), this gives
p = .264, q = .093, r = .642
Further analysis would take second derivatives to find the information and, therefore, the variances of the estimates.
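The constrained maximization can be checked numerically; a crude grid search over (p, q) with r = 1 - p - q (my own code, not the course's) recovers the estimates quoted above.

```python
import math

O, A, B, AB = 176, 182, 60, 17  # observed blood-group counts

def log_lik(p, q):
    # l(p, q, r) from the solution, with r = 1 - p - q substituted in.
    r = 1 - p - q
    if p <= 0 or q <= 0 or r <= 0:
        return float("-inf")
    return (2 * O * math.log(r)
            + A * math.log(p * p + 2 * p * r)
            + B * math.log(q * q + 2 * q * r)
            + AB * math.log(2 * p * q))

# Grid search on a 1/500 lattice over the simplex p + q < 1.
best = max(((log_lik(i / 500, j / 500), i / 500, j / 500)
            for i in range(1, 500) for j in range(1, 500 - i)),
           key=lambda t: t[0])
_, p_hat, q_hat = best
print(p_hat, q_hat, round(1 - p_hat - q_hat, 3))
```

A grid search is wasteful but transparent; in practice one would use a general-purpose optimizer or the EM (gene-counting) algorithm for this model.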
Session 4 Solutions: Extra Problems
5. The data follow a binomial distribution with N = 50, R = 18, and the prior information is captured by a beta distribution with parameters a = 19, b = 40:
P(θ) = [Γ(a + b) / (Γ(a)Γ(b))] θ^(a-1) (1 - θ)^(b-1)
P(X | θ) = [N! / (R!(N - R)!)] θ^R (1 - θ)^(N-R)
Working through Bayes' theorem, we find
P(θ | X) = [Γ(N + a + b) / (Γ(R + a)Γ(N - R + b))] θ^(R+a-1) (1 - θ)^(N-R+b-1)
which is another beta distribution, with parameters (a + R) and (N - R + b). The mode of the beta distribution with parameters α and β is (α - 1)/(α + β - 2), so
θ_MAP = (R + a - 1)/(N + a + b - 2) = 36/107 = .336
Also, we can find the 2.5th and 97.5th percentiles of the posterior distribution (95% credible interval): [.23, .40].
For comparison, the MLE is 18/50 = .36, with a 95% confidence interval of [.23, .49].
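The MAP and MLE comparison above can be reproduced in a few lines (my own code; the confidence interval uses the usual Wald normal approximation, and the beta quantiles for the credible interval would need a stats library, so they are omitted here):

```python
import math

a, b, N, R = 19, 40, 50, 18          # prior Beta(19, 40); data R of N

# Posterior Beta(R + a, N - R + b); MAP = mode = (alpha - 1)/(alpha + beta - 2).
alpha, beta = R + a, N - R + b       # Beta(37, 72)
theta_map = (alpha - 1) / (alpha + beta - 2)   # 36/107

# MLE and a 95% Wald confidence interval for comparison.
theta_mle = R / N
se = math.sqrt(theta_mle * (1 - theta_mle) / N)
ci = (theta_mle - 1.96 * se, theta_mle + 1.96 * se)

print(round(theta_map, 3))                     # 0.336
print(theta_mle, [round(x, 2) for x in ci])    # 0.36 [0.23, 0.49]
```

The MAP sits below the MLE because the prior mean (19/59, about 0.32) pulls the estimate down, and the credible interval is narrower than the confidence interval because the prior contributes information.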