
Innovative Approaches to Teaching Statistics Through Student Collaboration
"Explore how undergraduate statistics courses are incorporating real applications, data science, and diverse models to enhance student learning and communication skills. Learn about team projects, final examinations, and insights for statistics students. Acknowledging key sources such as ASA Teaching Guidelines and web-based course materials, this study delves into the evolution of statistical education based on the 2014 Guidelines."
Statistical Computation Using Student Collaborative Work
Joint Statistical Meetings, August 2015
John D. Emerson
Department of Mathematics
Middlebury College, Middlebury, Vermont
Summary Outline
I. Introduction
II. Incorporating the 2014 Guidelines: Two Courses
III. Student Team Projects
IV. Team Component of Final Examination
V. Discussion and Findings for Students of Statistics
Acknowledgments
Sources
I. Introduction
- Expansion of undergraduate programs and numbers of statistics majors
- New ASA Guidelines (Nick Horton, 2014)
  1. Data science gaining prominence
  2. Use real applications
  3. More diverse models and approaches
  4. Students learn communication skills
- Courses structured to reflect these Guidelines
II. Incorporating the Guidelines in Two Introductory Courses

1. Understanding Uncertainty: Exploring Data Using Randomization
- Intensive 4-week course in January 2014
- No prerequisite, but some quantitative/technical background
- Some background from preliminary version of Tintle et al.
- Writing about technology; met college requirement
- 12 students organized in four teams of three
- Largely project based and very computer-intensive
- Eleven students presented at Research Symposium

2. Introduction to Statistical Science
- Revision of longstanding intro course (DVB)
- 12 weeks, 3 classes plus lab, 2 sections of 12
- Each section partitioned into 4 teams of 3 students
- Students from all four years; majors in sciences, environmental studies, geography
- OpenIntro web-based materials using text: Introductory Statistics with Randomization and Simulation (1st ed., Diez, Barr, and Cetinkaya-Rundel, 2014)
- Assumption that students read OpenIntro
- Daily use of R/RStudio; I provided lots of scripts as models for students
- Substantial use of simulation, sampling, permutation tests, bootstrap intervals
- Team projects prominent
- Weekly labs, 2 exams, two-part final exam
Resources Influencing Course Development
- ASA Teaching Guidelines (2014)
- Diez, Barr, Cetinkaya-Rundel. Introductory Statistics with Randomization and Simulation (OpenIntro, web-based course text, 2015)
- Tintle et al. Introduction to Statistical Investigations (2015 Preliminary Edition)
- Lock^5. Statistics: Unlocking the Power of Data (2013)
- R and RStudio
- Hesterberg. What Teachers Should Know about the Bootstrap (2014)
Course content outline
- Introduction to data frames, variable types, descriptive statistics, displays and plots
- Intro to R/RStudio and simulation with sampling
- Some discrete probability
- Study designs
- Inference for proportions and their sampling distributions
- Normal approximations and CLT
- Larger tables of counts (brief)
- At this point, first examination and then introduction of team projects

Outline (continued)
- Inference for measured data; t-distributions
- One-way ANOVA, related inference, multiple comparisons, all motivated by real data sets
- More in-depth use of simulation for inference
- Introduction to the bootstrap
- Regression and related inferences using permutation tests and bootstrap
- Re-expression of variables
- Linear models more generally
III. Student Team Projects
- Late start in week 5 let me try to balance the teams
- Teams sometimes huddled for 2-4 minutes in class to probe a particular concept, then reported out
- Team leaders organized out-of-class meetings to wrestle with projects
- At first, student collaboration fell short of goals
Illustrative projects
1. Use a permutation test to extend a test on one proportion to a two-binomial problem. (Do two clinics have the same success rate in using Assisted Reproductive Therapy to achieve pregnancies?)
2. Distinguish the basic uses of t-distributions: one-sample, paired comparison, two independent samples. (Data for percentages of voters in various cities who support a ban on large sodas. This was not a simulation-based project.)
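The idea behind project 1 can be sketched in a few lines. The course used R; this sketch is in Python, and the function name and the success counts are hypothetical placeholders rather than the clinic data the teams analyzed. It pools the outcomes, repeatedly re-deals them to the two groups, and counts how often the shuffled difference in proportions is at least as large as the observed one.

```python
import random


def perm_test_two_proportions(success_a, n_a, success_b, n_b, reps=5000, seed=1):
    """Two-sided permutation p-value for a difference in two proportions."""
    rng = random.Random(seed)
    # Pool all outcomes: 1 = success, 0 = failure.
    failures = n_a + n_b - success_a - success_b
    pooled = [1] * (success_a + success_b) + [0] * failures
    observed = success_a / n_a - success_b / n_b
    extreme = 0
    for _ in range(reps):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffling and re-dealing mimics "no difference between clinics".
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Adding one to numerator and denominator keeps the p-value above zero.
    return (extreme + 1) / (reps + 1)


# Hypothetical counts for illustration only (not the project's data).
p_value = perm_test_two_proportions(30, 100, 45, 100)
print(f"permutation p-value: {p_value:.4f}")
```

The same shuffle-and-recompute loop works for any statistic, which is what makes it a natural extension from the one-proportion case.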
Illustrative projects (continued)
3. Use the bootstrap to give a confidence interval for a difference of binomial proportions. (Gender Bias in Promotion Decisions by Manager, adapted from ISRS curricula; n1 = n2 = 24.)
   - Two basic bootstrap confidence intervals suggested by Tim Hesterberg for initial use
   - Small n's are an issue
   - Problem was a stretch for at least half the teams
4. Regression problem with re-expression. (Planets in Earth's solar system, adapted from a problem in DVB.)
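A minimal sketch of the two intervals in project 3, written in Python rather than the R used in the course. Several things here are assumptions: the function name, the use of a normal multiplier for the bootstrap-SE interval, and the counts 21/24 and 14/24, which are placeholders chosen to match n1 = n2 = 24 and should not be read as the project's data.

```python
import random
from statistics import NormalDist, stdev


def bootstrap_diff_props(x1, n1, x2, n2, reps=5000, level=0.95, seed=1):
    """Bootstrap-SE and percentile intervals for p1 - p2."""
    rng = random.Random(seed)
    group1 = [1] * x1 + [0] * (n1 - x1)
    group2 = [1] * x2 + [0] * (n2 - x2)
    observed = x1 / n1 - x2 / n2
    diffs = []
    for _ in range(reps):
        # Resample each group separately, with replacement.
        r1 = rng.choices(group1, k=n1)
        r2 = rng.choices(group2, k=n2)
        diffs.append(sum(r1) / n1 - sum(r2) / n2)
    diffs.sort()
    alpha = 1 - level
    # Interval 1: observed difference plus or minus z times the bootstrap SE.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = stdev(diffs)
    se_interval = (observed - z * se, observed + z * se)
    # Interval 2: middle `level` share of the sorted bootstrap differences.
    lo = diffs[round((alpha / 2) * reps)]
    hi = diffs[round((1 - alpha / 2) * reps) - 1]
    return observed, se_interval, (lo, hi)


# Placeholder counts for illustration only.
observed, se_ci, pct_ci = bootstrap_diff_props(21, 24, 14, 24)
print(observed, se_ci, pct_ci)
```

With groups this small the bootstrap distribution is coarse (each proportion moves in steps of 1/24), which is one way to see why small n is an issue for both intervals.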
Student survey about course changes
- In-class 17-item survey at end of course
- Likert scale: strongly agree to strongly disagree
- n = 22 students completed the survey
- Two items specifically addressed team work:
  - 15 agreed or strongly agreed that "small team work aided my understanding"; one strongly disagreed
  - 9 agreed that "the team projects were more trouble than they were worth"; 10 disagreed
- Private conversations with students suggested logistical issues
IV. Team Component of Final Exam
- Delve more deeply into computer-intensive methods for testing and confidence intervals
- Three-part project, 25% of exam grade
- Investigate role of sample size and classical assumptions in making valid inferences
- Use simulation where the right answer is known
- Price paid: not real data
1. Examine SEs and confidence interval coverage
Normal vs. Exponential (Skewed) Populations
Sample means: n = 4, 16, 64, 256, 1024

               n=4     n=16    n=64    n=256   n=1024
Normal.SE      5.01    2.50    1.24    0.620   0.311
Skewed.SE      5.01    2.53    1.26    0.632   0.316
Normal.Cover   94.76   94.85   95.08   95.10   94.75
Skewed.Cover   88.41   91.80   93.67   94.64   94.62
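A simulation along these lines can be sketched as follows. The course used R; this is a Python sketch under stated assumptions: both populations have mean 10 and standard deviation 10, the interval is the classical 95% t interval, and the function names and the hard-coded t critical values (standard table values for df = 3, 15, 63) are choices made here, not taken from the slides.

```python
import random
from statistics import stdev

# Two-sided 95% t critical values for df = n - 1 (standard table values).
T_CRIT = {4: 3.182, 16: 2.131, 64: 1.998}


def draw_normal(rng, n):
    return [rng.gauss(10, 10) for _ in range(n)]


def draw_skewed(rng, n):
    # Exponential with mean 10 (so its standard deviation is also 10).
    return [rng.expovariate(1 / 10) for _ in range(n)]


def se_and_coverage(draw, n, reps=2000, true_mean=10.0, seed=1):
    """Simulated SE of the sample mean and percent coverage of the t interval."""
    rng = random.Random(seed)
    means, hits = [], 0
    for _ in range(reps):
        sample = draw(rng, n)
        m = sum(sample) / n
        s = (sum((x - m) ** 2 for x in sample) / (n - 1)) ** 0.5
        half_width = T_CRIT[n] * s / n ** 0.5
        means.append(m)
        # Count the interval as a hit when it captures the known true mean.
        hits += (m - half_width <= true_mean <= m + half_width)
    return stdev(means), 100 * hits / reps


for n in (4, 16, 64):
    print(n, se_and_coverage(draw_normal, n), se_and_coverage(draw_skewed, n))
```

Because the true mean is known, coverage can be checked directly, which is the point of using simulated rather than real data in this part of the exam.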
2. Estimating Means and 99% Quantiles; Normal and Exponential

                  n=4     n=16    n=64    n=256   n=1024  n=4096
Normal.SE         5.07    2.49    1.23    0.622   0.309   0.154
Skewed.SE         5.04    2.49    1.23    0.617   0.313   0.152
Normal.Q99.SE     7.04    5.08    3.36    2.065   1.143   0.566
Skewed.Q99.SE     11.68   11.59   8.68    5.379   2.930   1.473
Normal.Q99.Mean   20.19   27.03   31.06   32.60   33.13   33.23
Skewed.Q99.Mean   20.40   32.34   40.84   44.08   45.18   45.48

Notes:
a. The first four rows of Table 2 give SEs of the sample means and of the 99% quantiles for both Normal and Exponential data.
b. The true 99% quantiles of the populations are 33.3 (Normal, μ = σ = 10) and 46.1 (Exponential, μ = σ = 10).
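The true quantiles in note b can be checked directly from the population definitions. This short Python check (the course itself used R) assumes a Normal population with mean 10 and standard deviation 10, and an Exponential population with mean 10, whose quantile function is -mean * ln(1 - p).

```python
import math
from statistics import NormalDist

# Normal population with mean 10 and standard deviation 10.
normal_q99 = NormalDist(mu=10, sigma=10).inv_cdf(0.99)

# Exponential population with mean 10: quantile(p) = -mean * ln(1 - p).
exponential_q99 = -10 * math.log(1 - 0.99)

print(round(normal_q99, 1), round(exponential_q99, 1))
```

Comparing these targets with the Q99.Mean rows shows that the sample 99% quantile is biased low for small n and approaches the true value only slowly.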
3. Bootstrap vs. sampling distribution for sample mean and for 99%-ile. n=256
Coverages for 95% Confidence Intervals Using Classical t, Bootstrap SE (B), and Percentile Bootstrap (PB)

                  n=16    n=64    n=256   n=1024  n=4096
Normal.Cover.t    94.92   94.90   94.84   95.42   95.31
Normal.Cover.B    94.15   94.78   94.77   95.43   95.33
Normal.Cover.PB   91.95   94.36   94.69   95.37   95.27
Skewed.Cover.t    91.63   94.17   94.57   95.24   95.36
Skewed.Cover.B    90.93   93.97   94.55   95.20   95.37
Skewed.Cover.PB   89.67   93.81   94.41   95.21   95.35

Notes:
1. Percentile bootstrap under-covers for small n
2. CLT needs n = 256 or more with exponential data
3. Bootstrap SE method generally outperforms percentile bootstrap
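The three interval types compared above can be sketched for the n = 16 column. This is a Python sketch (the course used R) with several assumptions: the bootstrap-SE interval is taken to be the sample mean plus or minus a t multiplier times the bootstrap standard error, the t critical value 2.131 is the standard table value for df = 15, and the function names and the small simulation sizes are choices made here to keep the run short.

```python
import random
from statistics import fmean, stdev

N = 16
T_CRIT_N16 = 2.131  # two-sided 95% t critical value, df = 15; valid only for n = 16


def three_intervals(sample, rng, boot_reps=400):
    """Return the classical t, bootstrap-SE (B), and percentile (PB) intervals."""
    n = len(sample)
    m = fmean(sample)
    classical_se = stdev(sample) / n ** 0.5
    # Resample with replacement and record the mean of each resample.
    boot_means = sorted(fmean(rng.choices(sample, k=n)) for _ in range(boot_reps))
    boot_se = stdev(boot_means)
    lo_idx = round(0.025 * boot_reps)
    hi_idx = round(0.975 * boot_reps) - 1
    return {
        "t": (m - T_CRIT_N16 * classical_se, m + T_CRIT_N16 * classical_se),
        "B": (m - T_CRIT_N16 * boot_se, m + T_CRIT_N16 * boot_se),
        "PB": (boot_means[lo_idx], boot_means[hi_idx]),
    }


def coverage(draw, sims=200, true_mean=10.0, seed=1):
    """Percent of simulated samples whose interval captures the true mean."""
    rng = random.Random(seed)
    hits = {"t": 0, "B": 0, "PB": 0}
    for _ in range(sims):
        sample = [draw(rng) for _ in range(N)]
        for name, (lo, hi) in three_intervals(sample, rng).items():
            hits[name] += lo <= true_mean <= hi
    return {name: 100 * count / sims for name, count in hits.items()}


normal_cov = coverage(lambda rng: rng.gauss(10, 10))
skewed_cov = coverage(lambda rng: rng.expovariate(1 / 10))
print(normal_cov, skewed_cov)
```

With only 200 simulated samples the estimated coverages are noisy (roughly plus or minus 2 to 4 percentage points), so reproducing the second decimal place of the table would need far more replications.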
V. Discussion and Findings for Students of Statistics
Observations, Positive Impressions, Potential Gains:
- A more direct and immediate understanding of randomness and, in particular, of the random variability of a statistic that we use for making inferences;
- A hands-on acquaintance with a wide variety of sampling distributions;
- A deeper appreciation of the role of traditional assumptions about the distributions of statistics and how those assumptions can fail with real-world data sets;
- An enhanced ability to distinguish the random variation that is inherent in a statistical model (often with an assumed hypothesis) from systematic departures going beyond such variation;
Observations and Positive Impressions (continued)
- A basic (though sometimes superficial) appreciation of such computer-intensive concepts as permutation tests and bootstrap distributions;
- A beginning appreciation of some of the (surprising) issues relating to sample size that have emerged from the insights given in Tim Hesterberg's paper;
- An acquaintance, though not expertise, with a statistical software environment (R/RStudio) that has become a worldwide de facto standard for managing and exploring data, providing graphics, carrying out statistical analyses, and performing statistical simulations;
Observations and Positive Impressions (continued)
- An experience with the kinds of skills that are essential in tackling challenging questions as a participant in a collaborative team project;
- A realization that most important questions do not always have unambiguous textbook or cookbook answers; ambiguities abound and statistics does involve some art as well as sound statistical science.

Although the results of my own first effort to create a substantially revised introduction to statistics were surely mixed, I remain an optimist and will continue the experiment with some adjustments and further tweaks. Most importantly, I had a lot of fun as I worked through the material arm-in-arm with my students. I know that more fun is in store.
Acknowledgments
Sincere thanks to:
- Joe Chang, who first pointed me to the OpenIntro materials on the web
- Dick De Veaux, whose intro textbook I still love
- Jay Emerson, who learned some programming from me in 1984, and who taught me more statistical computing than I could possibly absorb while visiting at Yale in Fall 2014
- My Middlebury colleagues, especially Bill Peterson, who is always a supportive and stimulating colleague in Prob/Stat
- Tim Hesterberg, whose 2014 overview of computer-intensive methods and the bootstrap surprised me and motivated a good portion of the student project work outlined above
References
- ASA Teaching Guidelines (2014)
- Diez, Barr, Cetinkaya-Rundel. Introductory Statistics with Randomization and Simulation (OpenIntro, web-based course text, 2015)
- Hesterberg. What Teachers Should Know about the Bootstrap (2014)
- Lock^5. Statistics: Unlocking the Power of Data (2013)
- R and RStudio
- Tintle et al. Introduction to Statistical Investigations (2015 Preliminary Edition)

Thank you!