Using Simulation to Enhance Statistical Techniques

using simulation to evaluate statistical n.w
1 / 49
Embed
Share

Explore the application of simulation in evaluating statistical methods, covering confidence intervals, robustness, power, sample size, and p-values. Learn through examples and visual representations to enhance understanding.

  • Simulation
  • Statistical Techniques
  • Confidence Intervals
  • Robustness
  • Power

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Using Simulation to Evaluate Statistical Techniques. 1. Coverage of Confidence Intervals 2. Robustness 3. Power and Sample Size 4. Simulating p-values 1

  2. 1. Examine coverage for confidence intervals 2

  3. Suppressing output created by by group processing %macro %macro ODSOff /* from the do loop, adapted*/ options nonotes; ods graphics off; ods exclude all; ods noresults; %mend %mend; %macro %macro ODSOn(); /* Call after BY-group processing */ /* from the do loop, adapted*/ options notes; ods graphics on; ods exclude none; ods results; %mend %mend; ODSOff; /* Call prior to BY-group processing */ 3

  4. Confidence Interval for a Mean, Normal Data 4

  5. Generate samples from a N(0,1) distribution. %let obs = 50; %let reps = 10000; %let seed=54321; data data Normal(keep=rep x); call streamianit(&seed); do rep = 1 1 to &reps; do i = 1 1 to &obs; x = rand("Normal"); output; end; end; run run; 5

  6. Compute c.i. for each sample proc proc means means data=Normal noprint; by rep; var x; output out=OutStats mean=Meanx lclm=Lower uclm=Upper; run run; 6

  7. Graph the first 100 c.i.s proc proc sgplot sgplot data=OutStats(obs=100 title "95% Confidence Intervals for the Mean"; scatter x=rep y=Meanx; highlow x=rep low=Lower high=Upper / legendlabel="95% CI"; refline 0 0 / axis=y; yaxis display=(nolabel); run run; 100); 7

  8. Calculate coverage data data OutStats; set OutStats; In_CI = (Lower<0 0 & Upper>0 0); proc proc freq freq data=OutStats; tables in_ci / nocum; run run; 8

  9. Non-normal data -- Coverage for exponential data 9

  10. Generate exponential samples %let obs = 10; %let reps = 10000; %let seed=54321; data data Exp(keep=rep x); call streaminit(&seed); do rep = 1 1 to &reps; do i = 1 1 to &obs; x = rand("Expo") ; output; end; end; run run; 10

  11. Obtain confidence limits for each sample. proc proc means means data=Exp noprint; by rep; var x; output out=OutStats mean=Meanx lclm=Lower uclm=upper; run run; 11

  12. Calculate coverage data data OutStats; set OutStats; In_CI = (Lower<1 1 & Upper>1 1); run run; proc proc freq tables In_CI / nocum; run run; freq data=OutStats; 12

  13. Calculate coverage with IML 13

  14. Mean and Std functions in IML proc proc iml quit quit; iml; x=shape(1 1:12 m=mean(x); s=std(x); print x, m, s; 12,6 6); 14

  15. %let obs = 50; %let reps = 10000; %let seed=54321; proc proc iml iml; call randseed(&seed); x = j(&obs, &reps);/* each column is a sample*/ call randgen(x, "Normal"); SampleMean = mean(x);/* mean of each column*/ s = std(x);/* std dev of each column*/ talpha = quantile("t", 0.975 0.975, &obs-1 1); Lower = SampleMean - talpha * s / sqrt(&obs); Upper = SampleMean + talpha * s / sqrt(&obs); ParamInCI = (Lower<0 0 & Upper>0 0);/*indicator variable*/ PctInCI = ParamInCI[:];/* pct that contain parameter */ print PctInCI; quit quit; 15

  16. 2. Examine Robustness 16

  17. Assessing two-sample t Test Robustness to unequal variances 17

  18. ) ( ( ) x x 1 2 ( 1 2 ) se x x 1 2 Where: 1 n 1 n = + ( ) se x x s 1 2 p 1 1) 2 2 + + 2 1 2 2 ( ) n s ( n n s = 2 p s 1 2 n 1 2 18

  19. Generate two sample data for two cases, equal and unequal variance. %let n1 = 10; %let n2 = 10; %let reps = 10000; %let seed=54321; data data twosamp(drop=i); label x1 = "Normal data, same variance" x2 = "Normal data, different variance"; call streaminit(&seed); do rep = 1 1 to &reps; c = 1 1; do i = 1 1 to &n1; x1 = rand("Normal"); x2 = rand("Normal"); output; end; c = 2 2; do i = 1 1 to &n2; x1 = rand("Normal"); x2 = rand("Normal", 0 0, 10 output; end; end; run run; 10); 19

  20. Examine t-test output ods trace on; proc proc ttest ttest data=fram.frex4 plots=none; class male; var chol; run run; ods trace off; 20

  21. Get probability from t-test assuming equal variance. %ODSOff ODSOff proc proc ttest ttest data=twosamp; by rep; class c; /* compare c=1 to c=2 */ var x1 x2; ods output ttests=TTests(where=(method="Pooled")); run run; %ODSOn ODSOn proc proc print print data=ttests (obs=10 run run; 10); 21

  22. Calculate proportion rejected. data data Results; set TTests; RejectH0 = (Probt <= 0.05 run run; 0.05); proc proc sort by Variable; run run; sort data=Results; proc proc freq by Variable; tables RejectH0 / nocum; run run; freq data=Results; 22

  23. Assessing t test in IML 23

  24. Two Sample t-test Robustness to non-normal populations 24

  25. /*Assessing t test in IML*/ %let n1 = 10; %let n2 = 10; %let reps = 10000;/* number of samples */ %let seed=54321; proc proc iml call randseed(&seed); x = j(&n1, &reps);/* allocate space for Group 1 */ y = j(&n2, &reps);/* allocate space for Group 2 */ call randgen(x, "Normal",0 0,10 call randgen(y, "exponential",10 /* 2. Compute the t statistics; VAR operates on columns */ meanX = mean(x); varX = var(x);/* mean & var of each sample */ meanY = mean(y); varY = var(y); /* compute pooled standard deviation from n1 and n2 */ poolStd = sqrt( ((&n1-1 1)*varX + (&n2-1 1)*varY)/(&n1+&n2-2 2) ); /* compute the t statistic */ t = (meanX - meanY) / (poolStd*sqrt(1 1/&n1 + 1 1/&n2)); /* 3. Construct indicator var for tests that reject H0 */ alpha = 0.05 0.05; RejectH0 = (abs(t)>quantile("t", 1 1-alpha/2 2, &n1+&n2-2 2)); /* 0 or 1 */ /* 4. Compute proportion: (# that reject H0)/NumSamples */ Prob = RejectH0[:]; print Prob; quit quit; iml; 10);/* fill matrix from N(0,10) */ 10);/* fill from Exp(1) */ 25

  26. 3. Examine Power and Sample Size 26

  27. Evaluating power of the t test 27

  28. PROC POWER proc proc power power; twosamplemeans power = . . /* missing ==> "compute this" */ meandiff= 0 0 to 2 2 by 0.1 0.1 /* delta = 0, 0.1, ..., 2 */ stddev=1 1 /* N(delta, 1) */ npergroup=10 10; /* 10 in each group */ plot x=effect markers=none; ods output Output=Power; /* output results to data set */ run run; proc proc print print data=power;run run; 28

  29. Simulated Power. 29

  30. Simulate the data. %let n1 = 10; %let n2 = 10; %let reps = 10000; %let seed=54321; data data PowerSim(drop=i); call streaminit(&seed); do Delta = 0 0 to 2 2 by 0.1 do rep = 1 1 to &reps; c = 1 1; do i = 1 1 to &n1; x1 = rand("Normal"); output; end; c = 2 2; do i = 1 1 to &n2; x1 = rand("Normal", Delta, 1 1); output; end; end; end; run run; 0.1; 30

  31. Compute statistics %ODSOff ODSOff proc proc ttest ttest data=PowerSim; by Delta rep; class c; var x1; ods output ttests=TTests(where=(method="Pooled")); run run; %ODSOn ODSOn 31

  32. Calculate percent rejected. data data Results; set TTests; RejectH0 = (Probt <= 0.05 run run; 0.05); proc proc freq by Delta; tables RejectH0 / out=SimPower(where=(RejectH0=1 1)); run run; freq data=Results noprint; proc proc print print data=simpower;run run; 32

  33. Plot simulated vs PROC POWER results. data data Combine; set SimPower Power; p = percent / 100 label p="Power"; run run; 100; proc proc sgplot sgplot data=Combine noautolegend; title "Power of the t Test"; title2 "Samples are N(0,1) and N(delta,1), n1=n2=10"; series x=MeanDiff y=Power; scatter x=Delta y=p; xaxis label="Difference in Population Means (mu2 - mu1)"; run run; title; 33

  34. Effect of Sample Size on Power 34

  35. = The null hypothesis for the t test is H : . 0 1 2 = + Assume that . 2 1 Find sample size that rejects H 80% of the time. N 0 35

  36. %let reps = 1000; %let delta=.5; %let seed=54321; data data power(drop=iw ); call streaminit(&seed); do obs = 25 25 to 100 do rep = 1 1 to &reps; do i = 1 1 to obs; grp = 1 1; x = rand("Normal");output; grp = 2 2; x = rand("Normal", &Delta, 1 1); output; end; end; end; run run; 100 by 5 5; 36

  37. %ODSOff ODSOff proc proc ttest ttest data=power; by obs rep; class grp; var x; ods output ttests=TTests(where=(method="Pooled")); run run; %ODSOn ODSOn 37

  38. data data Results; set TTests; Reject = (Probt <= 0.05 run run; proc proc print print data=results(obs=50 0.05); 50);run run; 38

  39. proc proc freq by obs; tables Reject / out=sampsize(where=(Reject)); run run; freq data=Results noprint; proc proc print print data=sampsize(obs=5 5);run run; 39

  40. proc proc power power; twosamplemeans meandiff = &delta stddev = 1 1 alpha = 0.05 npergroup =25 power = . .; plot markers=none; ods output Output=Power; run run; 0.05 25 to 100 100 by 5 5 proc proc print print data=power;run run; 40

  41. data data total; set power sampsize; pct=percent/100 n=npergroup; run run; 100; 41

  42. proc proc sgplot sgplot data=total noautolegend; title "Power of the t Test by Sample Size"; title2 "Samples are N(0,1) and N(&delta,1), n1=n2=N"; label N="Size of each sample" p="Power"; refline 0.8 0.8 / axis=y; series x=N y=Power; scatter x=obs y=pct; run run; title; 42

  43. Simulating p-values 43

  44. A 6-sided die is tossed 36 times and the number of times each size appears in recorded. 1 8 2 4 3 4 4 3 5 6 6 11 44

  45. data data tmp; value=_n_; input count @@; datalines; 8 4 4 3 6 11 ; run run; proc proc freq freq data=tmp; tables value/chisq; weight count; run run; 45

  46. %let reps=10000; proc proc iml iml; Observed = {8 8 4 4 4 4 3 3 6 6 11 k = ncol(Observed); N = sum(Observed); p = j(1 1, k, 1 1/k); 11}; NumSamples = 10000 freq = RandMultinomial(&reps, N, p); x = repeat(1 1:k, &reps); rep = repeat(T(1 1:&reps), 1 1, k); create die var {"rep" "Freq" "x"}; append; close; quit quit; 10000; proc proc print print data=die(obs=18 run run; 18); 46

  47. proc proc freq by rep; weight Freq; tables x / chisq; output out=chi2 chisq; run run; proc proc contents contents data=chi2;run freq data=die noprint; run; 47

  48. proc proc sql select count(*)/&reps from chi2 where _pchi_>=7.67 ; title "Percent > observed"; quit quit; title; sql; 7.67 48

  49. proc proc sgplot sgplot data=chi2; title "Simulated Distribution of Test Statistic under Null Hypothesis"; histogram _pchi_ / binstart=0 0 binwidth=1 1; refline 7.67 7.67 / axis=x; xaxis label="Test Statistic"; run run; title; 49

More Related Content