Introduction to Empirical Methods in Statistical Computations

This chapter explores various empirical methods used in statistical computations, serving as alternatives or enhancements to classical statistical methods. It covers topics such as the Jackknife Method, Bootstrap Methods, Expectation Maximization Algorithm, and Markov Chain Monte Carlo. These methods offer solutions for scenarios where traditional approaches may not be suitable, providing nonparametric estimation and bias reduction techniques. Practical examples and computer demonstrations are included to aid in understanding and application.

  • Statistical methods
  • Empirical methods
  • Jackknife Method
  • Bootstrap
  • Nonparametric estimation


Presentation Transcript


  1. Ch13 Empirical Methods

  2. CHAPTER CONTENTS
     13.1 Introduction
     13.2 The Jackknife Method
     13.3 An Introduction to Bootstrap Methods (this semester ends here)
     13.4 The Expectation Maximization Algorithm
     13.5 Introduction to Markov Chain Monte Carlo
     13.6 Chapter Summary
     13.7 Computer Examples
     Projects for Chapter 13
     Objective of this chapter: to introduce several empirical methods that are increasingly used in statistical computations as an alternative to, or an improvement on, classical statistical methods.

  3. 13.1 Introduction

  4. 13.2 The Jackknife Method
     The jackknife method:
     • is also called the Quenouille-Tukey jackknife method;
     • was invented by Maurice Quenouille in 1956;
     • is used for testing hypotheses and finding confidence intervals where traditional methods are not applicable or not well suited;
     • can also be used with multivariate data;
     • is very useful when outliers are present in the data or the dispersion of the distribution is wide;
     • is used to estimate the variability of a statistic from the variability of that statistic between subsamples;
     • avoids the parametric assumptions that we used in obtaining the sampling distribution of the statistic to calculate the standard error;
     • can be considered a nonparametric estimate of the parameter;
     • was introduced for bias reduction (thus improving a given estimator) and is a useful method for variance estimation.
     (Image: https://pixabay.com/en/knife-pocket-knife-jackknife-3006931/)
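For reference, since the formula slides themselves are not reproduced in this transcript, the usual leave-one-out (pseudo-value) formulation is sketched below; the notation follows common usage and is an addition here, not text from the slides. Write θ̂ for the estimate from the full sample X1, . . ., Xn and θ̂(−i) for the estimate computed with the i-th observation deleted:

    \tilde{\theta}_i = n\,\hat{\theta} - (n-1)\,\hat{\theta}_{(-i)}, \qquad i = 1, \dots, n

    \hat{\theta}_J = \frac{1}{n} \sum_{i=1}^{n} \tilde{\theta}_i, \qquad
    \widehat{\mathrm{SE}}_J = \sqrt{ \frac{1}{n(n-1)} \sum_{i=1}^{n} \bigl( \tilde{\theta}_i - \hat{\theta}_J \bigr)^2 }

An approximate 100(1 − α)% jackknife confidence interval can then be taken as θ̂_J ± t_{α/2, n−1} · SE_J.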

  5. Note that θ here is any parameter; it need not be the population mean.

  6. EXAMPLE 13.2.1
     A random sample of n = 6 from a given population resulted in the following data:
     7.2  5.7  4.9  6.2  8.5  2.8
     (a) Find a jackknife point estimate of the population mean μ.
     (b) Construct a 95% jackknife confidence interval for the population mean μ.
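A minimal Python sketch of parts (a) and (b), assuming the pseudo-value formulation and the t-based interval with n − 1 degrees of freedom shown above; NumPy and SciPy are used only for convenience.

    # Jackknife point estimate and approximate 95% CI for the mean (Example 13.2.1).
    import numpy as np
    from scipy import stats

    x = np.array([7.2, 5.7, 4.9, 6.2, 8.5, 2.8])
    n = len(x)

    theta_hat = x.mean()                                        # full-sample estimate
    loo = np.array([np.delete(x, i).mean() for i in range(n)])  # leave-one-out estimates
    pseudo = n * theta_hat - (n - 1) * loo                      # pseudo-values

    theta_jack = pseudo.mean()                        # jackknife point estimate
    se_jack = pseudo.std(ddof=1) / np.sqrt(n)         # jackknife standard error
    t_crit = stats.t.ppf(0.975, df=n - 1)             # 97.5th percentile of t(n-1)

    print("jackknife estimate:", round(theta_jack, 3))
    print("95% CI:", (round(theta_jack - t_crit * se_jack, 3),
                      round(theta_jack + t_crit * se_jack, 3)))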

  7. 13.3 An Introduction to Bootstrap Methods
     The bootstrap method
     When to use? When the statistical distribution is unknown or the assumptions of normality are not satisfied.
     For what? For estimating sampling distributions.
     What it does: provides a simple method for obtaining an approximate sampling distribution of the statistic, conditional on the observed data.
     Features: easier to calculate; simulation based; based on one sample.
     What we are creating is not what happened, but rather what could have happened in the past from what did happen.

  8. Empirical (or sample) cumulative distribution function
     F: the population probability distribution. F is unknown.
     Let X1, . . ., Xn be a random sample from a probability distribution F with μ = E(Xi) and σ² = V(Xi).
     The empirical (or sample) cumulative distribution function is
         F̂(x) = #{Xi ≤ x} / n = proportion of the Xi's that are ≤ x.
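The empirical CDF is simple to compute directly; a short sketch follows (the helper name ecdf and the evaluation point 270 are illustrative choices, not from the slides).

    # Empirical CDF: F_hat(x) = (# of X_i <= x) / n.
    import numpy as np

    def ecdf(sample, x):
        """Proportion of sample values that are <= x."""
        sample = np.asarray(sample)
        return np.mean(sample <= x)

    # Example with the ozone data used later in this chapter:
    ozone = [269, 246, 388, 354, 266, 303, 295, 259, 274, 249, 271, 254]
    print(ecdf(ozone, 270))   # proportion of observations at or below 270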

  9. Notation: a superscript * denotes a bootstrap quantity.

  10. Subsampling in the bootstrap algorithm
      Must the size of the repeated subsamples equal n (the size of the original sample)? Not necessarily, but that choice leads to the best result.
      Subsampling is done with replacement.

  11. EXAMPLE 13.3.2 The following data represent the total ozone levels measured in Dobson units at randomly selected locations on Earth on a particular day. 269 246 388 354 266 303 295 259 274 249 271 254 Generate N = 6 bootstrap samples of size 12 each and find the bootstrap mean and standard deviation (standard error).
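A sketch of the requested computation: N = 6 resamples of size 12 are drawn with replacement, and the mean and sample standard deviation of the six resample means give the bootstrap mean and standard error. The random seed is an arbitrary choice.

    # Bootstrap mean and standard error for Example 13.3.2.
    import numpy as np

    rng = np.random.default_rng(1)    # arbitrary seed for reproducibility
    ozone = np.array([269, 246, 388, 354, 266, 303, 295, 259, 274, 249, 271, 254])

    N = 6
    boot_means = np.array([rng.choice(ozone, size=ozone.size, replace=True).mean()
                           for _ in range(N)])

    print("resample means:", np.round(boot_means, 2))
    print("bootstrap mean:", round(boot_means.mean(), 2))
    print("bootstrap standard error:", round(boot_means.std(ddof=1), 2))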

  12. Let θ̂ = θ̂(X1, . . . , Xn) be a sample statistic that estimates the parameter θ of an unknown distribution F using some procedure. We wish to estimate the standard error of θ̂ using the bootstrap procedure, which is summarized next.

  13. The accuracy of the bootstrap approximation depends on the accuracy of F̂ as an estimate of F and on how large a bootstrap sample is used to estimate the standard error of θ̂.
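The procedure slide referred to above is not reproduced in this transcript, but the resampling loop it describes is short; the sketch below estimates the standard error of a general statistic (illustrated with the median) from B resamples drawn with replacement. The helper name bootstrap_se, the choice B = 1000, and the use of the median are assumptions for illustration.

    # Bootstrap estimate of the standard error of a general statistic.
    import numpy as np

    def bootstrap_se(sample, stat, B=1000, rng=None):
        """Standard deviation of the statistic over B bootstrap resamples."""
        if rng is None:
            rng = np.random.default_rng()
        sample = np.asarray(sample)
        replicates = np.array([stat(rng.choice(sample, size=sample.size, replace=True))
                               for _ in range(B)])
        return replicates.std(ddof=1)

    ozone = [269, 246, 388, 354, 266, 303, 295, 259, 274, 249, 271, 254]
    print("bootstrap SE of the median:", round(bootstrap_se(ozone, np.median), 2))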

  14. 13.3.1 BOOTSTRAP CONFIDENCE INTERVALS
      PROCEDURE TO FIND A BOOTSTRAP CONFIDENCE INTERVAL FOR THE MEAN
      1. Draw N (sub)samples (N will be in the hundreds, and if the software allows, in the thousands) from the original sample with replacement.
      2. For each of the (sub)samples, find the (sub)sample mean.
      3. Arrange these (sub)sample means in order of magnitude.
      4. To obtain, say, a 95% confidence interval, find the middle 95% of the sample means. For this, find the means at the 2.5th and 97.5th percentiles. The 2.5th percentile will be at position (0.025)(N+1), and the 97.5th percentile will be at position (0.975)(N+1). If either of these numbers is not an integer, round to the nearest integer. The values at these positions are the lower and upper limits of the 95% bootstrap interval for the true mean.
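A sketch of steps 1–4, written for a general statistic so that the same function also covers the median procedure given below; the helper name bootstrap_ci and the choice N = 1000 are illustrative assumptions.

    # Percentile bootstrap confidence interval following steps 1-4 above.
    import numpy as np

    def bootstrap_ci(sample, stat=np.mean, N=1000, level=0.95, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        sample = np.asarray(sample)
        reps = np.sort([stat(rng.choice(sample, size=sample.size, replace=True))
                        for _ in range(N)])                 # ordered resample statistics
        alpha = (1 - level) / 2
        lo = int(round(alpha * (N + 1))) - 1                # position (0.025)(N+1), 0-based
        hi = int(round((1 - alpha) * (N + 1))) - 1          # position (0.975)(N+1), 0-based
        return reps[lo], reps[hi]

    ozone = [269, 246, 388, 354, 266, 303, 295, 259, 274, 249, 271, 254]
    print("95% bootstrap CI for the mean:", bootstrap_ci(ozone, np.mean))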

  15. EXAMPLE 13.3.4 For the data given in Example 13.3.2, obtain a 95% bootstrap confidence interval for μ.
      Because the bootstrap methods are more in tune with nonparametric methods, it sometimes makes sense to obtain a confidence interval about the median rather than the mean. With a slight modification of the procedure that we have described for the bootstrap confidence interval for the mean, we can obtain the bootstrap confidence interval for the median.
      Compared with the classical confidence interval obtained in Example 5.5.9, which is (257.81, 313.59), the bootstrap confidence interval of Example 13.3.4 has smaller length, and thus less variability. In addition, we saw in Example 5.5.9 that the normality assumption necessary for the confidence interval there was suspect. In the bootstrap method, we did not have any distributional assumptions.

                                   Classical CI   Bootstrap CI
      Variability                  Higher         Lower
      Distributional assumptions   Yes            No

  16. PROCEDURE TO FIND A BOOTSTRAP CONFIDENCE INTERVAL FOR THE MEDIAN
      1. Draw N samples (N will be in the hundreds, and if the software allows, in the thousands) from the original sample with replacement.
      2. For each of the samples, find the sample median.
      3. Arrange these sample medians in order of magnitude.
      4. To obtain, say, a 95% confidence interval, find the middle 95% of the sample medians. For this, find the medians at the 2.5th and 97.5th percentiles. The 2.5th percentile will be at position (0.025)(N+1), and the 97.5th percentile will be at position (0.975)(N+1). If either of these numbers is not an integer, round to the nearest integer. The values at these positions are the lower and upper limits of the 95% bootstrap interval for the median.
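Since the only change from the mean procedure is the statistic computed on each resample, the bootstrap_ci sketch given after the mean procedure (a hypothetical helper, not part of the text) can be reused directly:

    print("95% bootstrap CI for the median:", bootstrap_ci(ozone, np.median))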

  17. In practice, how many bootstrap samples should be taken? The answer depends on two things: 1) how much the result matters, and 2) what type of computing power is available. In general, it is better to start with 1000 subsamples.

  18. 13.4 The Expectation Maximization Algorithm

  19. 13.5 Introduction to Markov Chain Monte Carlo
      A brief introduction to Markov chain Monte Carlo (MCMC) methods.
      • MCMC is a computational simulation method.
      • MCMC is enormously useful for realistic statistical modeling.
      • MCMC uses two standard algorithms: the Metropolis algorithm and the Gibbs sampler.
      • MCMC is used to generate random variables having certain target distributions with pdf p(x).
      • MCMC is especially useful when the functional form of p(x) is not known.
      • Basic idea: find a Markov chain with a stationary distribution that is the same as the desired probability distribution p(x).
      • The probability that the chain is in state x ≈ the probability that the discrete random variable equals x.

  20. The objective of MCMC techniques is to generate random variables having certain distributions called target distributions with pdf p(x). The simulation of standard distributions is readily available in many statistical software packages, such as Minitab. In cases where the functional form of p(x) is not known, MCMC techniques become very useful. The basic idea of MCMC methods is to find a Markov chain with a stationary distribution that is the same as the desired probability distribution p(x); this is the target distribution. Run the Markov chain for a long time (say, K iterations) and observe in which state the chain is after these K iterations. The probability that the chain is in state x will be approximately the same as the probability that the discrete random variable equals x.
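The Metropolis algorithm named in the next slide is one standard way to construct such a chain. The following is a minimal random-walk Metropolis sketch for a one-dimensional continuous target; the target density (a standard normal known only up to its normalizing constant), the proposal width, the chain length K, and the seed are all illustrative assumptions, not details taken from the slides.

    # Random-walk Metropolis: the chain's stationary distribution matches the
    # target p(x), which only needs to be known up to a normalizing constant.
    import numpy as np

    rng = np.random.default_rng(0)

    def p_unnormalized(x):
        return np.exp(-0.5 * x**2)     # standard normal target, up to a constant

    K = 10_000            # "run the Markov chain for a long time"
    x = 0.0               # starting state
    chain = np.empty(K)
    for k in range(K):
        proposal = x + rng.normal(scale=1.0)        # symmetric random-walk proposal
        accept_prob = min(1.0, p_unnormalized(proposal) / p_unnormalized(x))
        if rng.random() < accept_prob:              # accept with probability min(1, ratio)
            x = proposal
        chain[k] = x                                # otherwise the chain stays put

    print("sample mean:", round(chain.mean(), 3), " sample sd:", round(chain.std(), 3))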

  21. 13.5.1 METROPOLIS ALGORITHM

  22. 13.5.2 THE METROPOLIS-HASTINGS ALGORITHM

  23. 13.5.3 GIBBS ALGORITHM

  24. 13.5.4 MCMC ISSUES

  25. 13.6 Chapter Summary

  26. 13.7 Computer Examples

  27. Projects for Chapter 13
