
Understanding Measures of Central Tendency and Variability in Statistics
Explore the concepts of measures of central tendency and variability in statistics, including how to calculate the mean and median of a data set. Learn about box plots, visualizing proportions, distributions, and time series. Enhance your understanding of numerical measures and how to interpret them effectively.
HUDM4122 Probability and Statistical Inference January 28, 2015
ASSISTments Did everyone get an account for the ASSISTments system? Did anyone have difficulties setting up an account? First homework is due in 5 days
In the last class We looked at how to visualize Proportions (pie chart, bar graph) Distributions (relative frequency histogram) Time Series (line graph)
Today Numerical measures Box plots
Today Ch. 2 in Mendenhall, Beaver, & Beaver Measures of Centrality Measures of Variability Measures of Relative Standing
One of the things we often want to know is: What is the middle value?
What is the middle value? Pretty easy to eyeball from this graph [Figure: histogram of exam grades; x-axis: Exam Grade, bins 51-55 through 96-100; y-axis scale 0 to 18]
What is the middle value? Not quite as easy to eyeball from this graph [Figure: histogram of exam grades; x-axis: Exam Grade, bins 51-55 through 96-100; y-axis: Frequency, 0 to 18]
Two ways to think of the middle value. The mean, also called the average: add all the numbers together, then divide by the number of numbers (the count). The median: take the literal middle number.
Example: Volunteer Please Can anyone here compute the mean and median of this set of numbers? 62, 19, 44, 33, 12, 18
Example: Another Volunteer Please Can anyone here compute the mean and median of this set of numbers? 2, 4, 6, 1 Note that for the median, when there is an even number of numbers, you sort them and take the average of the two in the middle
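The two volunteer exercises can be checked with a short Python sketch. The helper names `mean` and `median` are illustrative choices, not anything from the slides; the median helper sorts first and, for an even count, averages the two middle values as the note describes.

```python
def mean(values):
    """Add all the numbers together, then divide by the count."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data.

    With an even count there is no single middle value, so the two
    values in the middle are averaged.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


first = [62, 19, 44, 33, 12, 18]
second = [2, 4, 6, 1]

for data in (first, second):
    print(data, "mean =", mean(data), "median =", median(data))
```

For the first set the sorted values are 12, 18, 19, 33, 44, 62, so the median is the average of 19 and 33. For the second set the sorted values are 1, 2, 4, 6, so the median is the average of 2 and 4.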
Mathematical Notation Mean is written $\bar{x}$ by the authors A lot of other people write it as M
Mathematical Notation Formula for the mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ This is nothing to get stressed about! Let's go through what each of the symbols means
Note M or $\bar{x}$ represents the sample mean If you have the population mean, you write it $\mu$
When will the mean and median be very different? When the data is heavily skewed or has a huge outlier The outlier case is very important
Compare mean and median Data set A: 1, 2, 3, 4, 5, 6, 7 Data set B: 1, 2, 3, 4, 5, 6, 798
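A quick way to see the effect of the outlier is to compute both measures for each data set. This sketch uses Python's standard `statistics` module; the variable names are just illustrative.

```python
import statistics

data_a = [1, 2, 3, 4, 5, 6, 7]
data_b = [1, 2, 3, 4, 5, 6, 798]

mean_a, median_a = statistics.mean(data_a), statistics.median(data_a)
mean_b, median_b = statistics.mean(data_b), statistics.median(data_b)

print("A: mean =", mean_a, "median =", median_a)
print("B: mean =", mean_b, "median =", median_b)
```

Both sets have a median of 4, because the middle value of the sorted data does not change. The mean moves from 4 in set A to 117 in set B, because the single value 798 is added into the sum.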
Mode Most frequent value 1, 2, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10 What is the mode?
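Finding the mode amounts to counting how often each value occurs. The sketch below uses `collections.Counter`; the function name `modes_of` is a made-up helper that returns every value tied for the highest count, so it also covers data with more than one mode.

```python
from collections import Counter


def modes_of(values):
    """Return all values that occur with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


data = [1, 2, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(modes_of(data))
```

In this data set the value 2 appears four times and every other value appears once, so 2 is the mode.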
Distribution with no mode [Figure: histogram; x-axis: Exam Grade, bins 31-35 through 96-100; y-axis: Frequency, 0 to 6]
Distribution with a mode [Figure: histogram; x-axis: Exam Grade, bins 31-35 through 96-100; y-axis: Frequency, 0 to 8]
Hypermodal Data [Figure: histogram; x-axis: Exam Grade, bins 31-35 through 96-100; y-axis: Frequency, 0 to 20]
Bimodal Data [Figure: histogram; x-axis: Exam Grade, bins 31-35 through 96-100; y-axis: Frequency, 0 to 20]
Also Called Bimodal Data [Figure: histogram; x-axis: Exam Grade, bins 31-35 through 96-100; y-axis: Frequency, 0 to 10]
Important note When people call data bimodal They don't actually mean that it has two equal modes They mean that there are two parts of the distribution with much higher values And a valley in between
Bimodal Data Often means there are actually two distinct groups being looked at together Mendenhall et al. talk about looking at the size of fish, when both male and female fish are included; it's a good example Bimodality messes with many of the statistical tests we'll use in the rest of the semester There are things you can do, though
These data sets have the same mean [Figure: three histograms side by side; x-axis: Number of Classes Taken Pass-Fail, 1 to 9; y-axis: Frequency, 0 to 50]
But they're very different in terms of how spread out they are [Figure: the same three histograms; x-axis: Number of Classes Taken Pass-Fail, 1 to 9; y-axis: Frequency, 0 to 50]
Usually measured by standard deviation What is the standard deviation? To get there, we need to start with some other concepts first
Deviation For any given data point $x_i$, its deviation is the difference between $x_i$ and the mean Deviation = $x_i - \bar{x}$
Compute the Deviation: Example Deviation = $x_i - \bar{x}$ Data: 10, 18, 16, 26, 22
Compute the Deviation: Volunteer 3 Deviation = $x_i - \bar{x}$ Data: 1, 2, 3, 4, 5, 6, 7
We can't just average the deviations Why not? Let's go back to the previous example and try
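Trying this on the two deviation examples shows the problem: the positive and negative deviations cancel, so their sum (and therefore their average) is zero for any data set. The sketch below uses `fractions.Fraction` only so the arithmetic is exact; the function name `deviations` is an illustrative choice.

```python
from fractions import Fraction


def deviations(values):
    """Deviation of each data point from the mean, using exact arithmetic."""
    exact = [Fraction(v) for v in values]
    mean = sum(exact) / len(exact)
    return [x - mean for x in exact]


example = [10, 18, 16, 26, 22]
volunteer = [1, 2, 3, 4, 5, 6, 7]

for data in (example, volunteer):
    devs = deviations(data)
    print(data)
    print("  deviations:", [float(d) for d in devs])
    print("  sum of deviations:", sum(devs))
```

The first data set has mean 18.4 and the second has mean 4. In both cases the deviations add up to exactly 0, which is why a plain average of deviations tells us nothing about spread.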
We could take the absolute value of each, and then average them Called the mean absolute deviation (MAD) Used in some applications; standard deviation is much more common
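As a rough sketch of that idea, the function below averages the absolute deviations from the mean. The name `mean_absolute_deviation` is illustrative, and it divides by the count n, which is one common convention; the slides do not spell out the formula.

```python
def mean_absolute_deviation(values):
    """Average distance of the data points from their mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


example = [10, 18, 16, 26, 22]
print(mean_absolute_deviation(example))
```

For this data set the absolute deviations are 8.4, 0.4, 2.4, 7.6, and 3.6, which sum to 22.4, so the result is about 4.48.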
Another way to get everything going the same direction Square the deviations $(x_i - \bar{x})^2$
Squaring the deviations Why square rather than take absolute value? Two reasons: (1) Greater penalty for being further from the mean (2) There are consequences of taking absolute value when you get into more advanced statistics (outside the scope of this class)
Once you square all the deviations, you can add them all together Sum of squared deviations $\sum (x_i - \bar{x})^2$
And then divide by the sample size minus 1 To get the variance $s^2$ $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ Why n-1 and not n?
I'm glad you asked The goal of computing the sample variance $s^2$ Is to estimate the population variance
It turns out that If you take all the possible samples from a population And you use n-1 to compute sample variance Then the average of all your sample variances Equals the population variance
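This claim can be checked by brute force on a tiny made-up population. The sketch below is only an illustration under stated assumptions: the population `[1, 2, 3, 4]` and the sample size 2 are hypothetical, samples are drawn with replacement (so every ordered pair is a possible sample), and the population variance is computed by dividing by the population size.

```python
from fractions import Fraction
from itertools import product


def variance(values, divisor_offset):
    """Sum of squared deviations divided by (n - divisor_offset)."""
    n = len(values)
    mean = Fraction(sum(values), n)
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return sum_of_squares / (n - divisor_offset)


population = [1, 2, 3, 4]  # hypothetical tiny population
sample_size = 2            # hypothetical sample size

# Population variance: divide by the full population size.
population_variance = variance(population, 0)

# Every possible ordered sample of the given size, drawn with replacement.
samples = list(product(population, repeat=sample_size))

average_with_n_minus_1 = sum(variance(s, 1) for s in samples) / len(samples)
average_with_n = sum(variance(s, 0) for s in samples) / len(samples)

print("population variance:      ", population_variance)
print("average s^2 using n - 1:  ", average_with_n_minus_1)
print("average s^2 using n:      ", average_with_n)
```

Under these assumptions the average of the n-1 sample variances matches the population variance (5/4), while dividing by n gives an average that is too small (5/8).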
The mathematical proof of this is outside the scope of this class Detailed explanations can be found in [Text] http://nebula.deanza.edu/~bloom/math10/m10divideby_nminus1.pdf [Video] https://www.khanacademy.org/math/probability/descriptive-statistics/variance_std_deviation/v/review-and-intuition-why-we-divide-by-n-1-for-the-unbiased-sample-variance
Compute the Variance: Example $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ Data: 10, 18, 16, 26, 22
Compute the Variance: Volunteer 4 $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ Data: 1, 2, 3, 4, 5, 6, 7
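The variance steps (deviation, square, sum, divide by n - 1) can be followed one at a time in a short sketch. The function name `sample_variance` is illustrative; `statistics.variance` from the standard library uses the same n - 1 divisor and is printed alongside as a cross-check.

```python
import statistics


def sample_variance(values):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


example = [10, 18, 16, 26, 22]
volunteer = [1, 2, 3, 4, 5, 6, 7]

for data in (example, volunteer):
    print(data, sample_variance(data), statistics.variance(data))
```

For the first data set the squared deviations sum to 147.2, and dividing by 4 gives a variance of about 36.8. For the second, the squared deviations sum to 28, and dividing by 6 gives about 4.67.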
Standard Deviation $s = \sqrt{s^2} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$ More commonly written SD The standard deviation takes the variance And puts it back on the original scale of the data
Compute the SD: Example $s = \sqrt{s^2} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$ Data: 10, 18, 16, 26, 22
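The last step is just a square root of the variance. The sketch below computes it directly from the formula; `sample_sd` is an illustrative name, and `statistics.stdev` (which also divides by n - 1) is printed as a cross-check.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation: square root of the n - 1 variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)


example = [10, 18, 16, 26, 22]
print(sample_sd(example), statistics.stdev(example))
```

For this data set the variance is about 36.8, so the standard deviation is about 6.07, which is back on the same scale as the original data values.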