
Comparison of Group Means in Inferential Statistics
Explore the concepts of inferential statistics, including comparison of group means, normal curves, population vs. sample, skewed distributions, interpreting skewness, and more. Learn about parameters, statistics, and the characteristics of various distributions.
Presentation Transcript
Inferential Statistics: Comparison of Group Means
The Normal Curve. Also known as the characteristic curve, bell curve, normal distribution, Gaussian distribution, or continuous probability distribution. [Figure: the normal curve, showing the percentage of the area under the curve between standard deviations (34.13%, 13.59%, 2.14%, and 0.13% on each side of the mean), with scales for standard deviations (z scores, -3 to +3), cumulative percentages, percentile equivalents, IQ scores (55 to 145), and SAT scores (sd 209, 400 to 1600).]
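Those cumulative percentages follow directly from the z scores. A minimal sketch using scipy's normal distribution (the use of scipy here is an editorial illustration, not part of the original slides):

    from scipy.stats import norm

    # Cumulative percentage of the normal curve at each whole-number z score.
    for z in range(-3, 4):
        print(z, round(norm.cdf(z) * 100, 1))
    # Prints 0.1, 2.3, 15.9, 50.0, 84.1, 97.7, 99.9 for z = -3 through +3,
    # matching the cumulative percentages on the curve above.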
Some Definitions A population is the largest group to which you wish to generalize. Summaries of the measures of the characteristics of individuals in a population are called parameters. A group selected from the population for study is called a sample. Summaries of the measures of the characteristics of individuals in a sample are called statistics.
Normal curves: 1. Most measures fall in the middle, with fewer toward the extremes (bell shaped). 2. Symmetrical. 3. The mean, median, and mode are the same.
Often distributions are not normal. Distributions where the bulk of the responses are weighted in one direction are called skewed distributions. If the bulk of the data are toward lower values (long tail toward higher values), the distribution is positively skewed. If the bulk of the data are toward higher values (long tail toward lower values), the distribution is negatively skewed. [Figure: a positively skewed distribution.]
Skewed distributions behave badly. The mean, median, and mode are not identical. The distribution is not symmetrical. The relationship between the standard deviation and the percent of the responses under the curve is not constant. [Figure: a skewed distribution with the mode, median, and mean marked at different points.]
Interpreting Skew. A positive number indicates positive skew (long tail to the right); a negative number indicates negative skew (long tail to the left). 0 = no skew; -.5 to .5 = small skew; -1.0 to -.51 or .51 to 1.0 = moderate skew; less than -1.0 or greater than 1.0 = strong skew.
Computing Skew. Example: rainfall data. In Excel, the skew of a column of data can be computed directly with the SKEW function, e.g. =SKEW(C:C). Heavily skewed data can also be transformed before analysis.
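For readers outside Excel, a minimal equivalent in Python; the rainfall values below are hypothetical stand-ins, since the slide's data set is not reproduced in this transcript:

    from scipy.stats import skew

    # Hypothetical rainfall values (the slide's actual column C is not shown here).
    rainfall = [0.0, 0.1, 0.2, 0.2, 0.4, 0.5, 0.8, 1.1, 1.9, 3.5]

    # bias=False gives the sample-adjusted skewness, which is what Excel's SKEW() reports.
    print(skew(rainfall, bias=False))   # positive: long tail toward high rainfall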
Choosing Bin Sizes: greater precision vs. noise reduction. [Figure: two EZAnalyze histograms of totalx100 for the same data, one with 20 bins (width = 24.9) and one with 7 bins (width = 78.8); skew = .55.]
Kurtosis The characteristic of a distribution to be sharper (or flatter) than expected is called kurtosis.
There are other common shapes of distributions. This would be called bi-modal.
There are other common shapes of distributions. This distribution just looks jumbled. There is no term for this.
Why Worry About the Shape? Inferential statistics are based on the assumption of a normal distribution. Populations are never perfectly normal. Inferences about populations are based on quantities that behave like percentile ranks. To make inferences about a population, some representation of the population has to be normally distributed.
Often population distributions are not normal. In order to figure out percentile rank in this distribution, it must be made normal.
The Highly Over-Simplified Central Limit Theorem: Randomly select a sample of a given size from the population you are studying and compute its mean. Do this over and over again with the same sample size. Build a histogram of the resulting mean scores. The histogram will be a normal curve with the same mean as the study population.
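A small simulation of that recipe, assuming an illustrative skewed (exponential) population and groups of 30; the numbers are editorial examples, not from the slides:

    import numpy as np

    rng = np.random.default_rng(0)

    # A deliberately non-normal "population": positively skewed exponential scores.
    population = rng.exponential(scale=10.0, size=100_000)

    # Repeatedly draw random samples of 30 and record each sample mean.
    sample_means = [rng.choice(population, size=30).mean() for _ in range(5_000)]

    print(population.mean())       # the population mean
    print(np.mean(sample_means))   # nearly the same value
    # A histogram of sample_means is close to a normal curve centered on the population mean.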
Instead of looking at individuals, what would happen if I looked at a group of 30? [Figure: the population curve with the population mean marked, and the mean of one group of 30 plotted against it.]
Frequency Distribution for Groups of 30. Computing the means of groups of 30 randomly selected from the population: theoretically, a new frequency distribution appears, representing the means of the groups of 30. [Figure: means of groups of 30 accumulating around the population mean.]
Sampling Distribution of the Mean. [Figure: the population distribution with its mean, overlaid with the much narrower sampling distribution for random groups of 30 and its mean (n = 30).]
The Second Part of the Central Limit Theorem: As the sample size used to build the sampling distribution of the mean gets larger, the standard deviation of the sampling distribution gets smaller. The standard deviation of the sampling distribution is called the standard error.
Sample Sizes. If the sample size were really small, sample group means would be distributed similarly to the population. If the sample size were really large, it would be unlikely to get sample group means very far from the population mean. As the sample size increases, the sampling distribution of the mean gets narrower. [Figure: the population distribution with progressively narrower sampling distributions for n = 30, n = 100, and n = 1000.]
Standard Error of the Mean. The standard deviation of a sampling distribution is called the standard error. The standard error is the standard deviation of the population adjusted for sample size: for any distribution, the larger the sample size, the smaller the numerical standard error. [Figure: the population distribution with sampling distributions for n = 30, n = 100, and n = 1000.]
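A worked version of "adjusted for sample size," using the usual formula standard error = sd / sqrt(n); the population standard deviation of 15 is an illustrative IQ-style value, not one taken from the slides:

    import math

    sd = 15.0                        # illustrative population standard deviation
    for n in (30, 100, 1000):
        se = sd / math.sqrt(n)       # standard error shrinks as the sample size grows
        print(n, round(se, 2))       # 30 -> 2.74, 100 -> 1.5, 1000 -> 0.47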
The standard error of the sampling distribution is a function of the sample size. [Figure: a sampling distribution for n = 30, with one standard error marked on the test scale.]
The standard error of the sampling distribution is a function of the sample size. [Figure: a sampling distribution for n = 100, with one (narrower) standard error marked on the same test scale.]
Standard Error. The relationship of the standard error to a sampling distribution is the same as that of the standard deviation to a normal distribution. [Figure: a sampling distribution (n = 30) with its mean, one standard error (34.13%), and two standard errors (13.59%) marked; 50.0% of the distribution lies on each side of the mean.]
Standard Error. Remember, this is not a distribution of scores on the test; it is a distribution of the means of randomly selected groups of 30. [Figure: the same sampling distribution (n = 30) with its mean, one standard error, and two standard errors marked.]
What do we have so far? Normal distributions are good because of the relationship of the sd to the area under the curve. When a sampling distribution is built from a population it is normally distributed. Standard error is the population standard deviation adjusted for group size. The relation of the standard error to a sampling distribution is the same as a standard deviation to a population distribution.
Where are we going with this? Gathering data on the characteristics of individuals in groups
The Problem. One of the most important uses of summaries of group characteristics is to compare groups. It should be simple: if the summary number for one group is higher or lower than that of another group, then there is a difference between the two groups. Well, maybe.
What if? If Mrs. Johnson and Mr. Smith taught the same curriculum using the same instructional style to two groups of students who were demographically similar, would you expect the summary of the measures of student learning to be exactly the same in the two classes?
What if? What if Mrs. Johnson and Mr. Smith taught differently? Since you already would not expect the measures of learning to be the same, how would you know if something other than random influences caused the differences?
Group Differences We solve this problem by asking a single question: Is the difference between the groups so big that it is really unlikely that the difference could have appeared randomly? If the difference is really unlikely to appear randomly then we say it did not appear randomly and the difference between the two groups means something.
OK, let's do this in Excel: Inquiry Science.
(from the Excel Files) Inquiry Science
EZAnalyze Results Report - Independent T-Test of group 1 and 2 on Score
                 Class 1    Class 2
Mean:            36.185     31.833
Std. Dev:        8.544      7.435
N:               27         24
Mean Difference: 4.352
T-Score:         1.929
Eta Squared:     .068
P:               .060
These are all the same question: Is the difference between the groups so big that it is really unlikely that the difference could have appeared randomly? Could the two groups have appeared randomly within the same population? What is the probability that the group mean differences could have appeared by chance? Is there sufficient evidence to reject the null hypothesis? Is the difference between the groups statistically significant?
Why Do It? We are thinking about these things so that we can convince others of the veracity of statements we make about the individuals and the groups we are describing. Otherwise it is just unsubstantiated opinion, and no one cares. A research secret: this is much harder to do qualitatively than quantitatively.
Measuring Group Means Against the Sampling Distribution. [Figure: a sampling distribution (n = 30) with its mean and one standard error marked (50.0%, 34.13%).]
Measuring Group Means Against the Sampling Distribution. At any given number of standard errors, it is possible to compute the likelihood that a higher or lower group mean could have occurred by chance (remember the relationship of z-scores to percentile rank). [Figure: a sampling distribution (n = 30); at one standard error above the mean, the probability of a higher mean is 15.87% (.16).]
Measuring Group Means Against the Sampling Distribution. At any given number of standard errors, it is possible to compute the likelihood that a higher or lower group mean could have occurred by chance (remember the relationship of z-scores to percentile rank). [Figure: the same sampling distribution (n = 30); at two standard errors above the mean, the probability of a higher mean is 2.28% (.02).]
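A sketch of where those tail probabilities come from, treating the sampling distribution as a normal curve (for small groups a t distribution is the more exact choice):

    from scipy.stats import norm

    # Probability that a randomly selected group mean lands more than
    # 1 or 2 standard errors above the sampling-distribution mean.
    print(norm.sf(1))   # ~0.1587 -> the 15.87% (.16) on the previous slide
    print(norm.sf(2))   # ~0.0228 -> the 2.28% (.02) on this slide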
Measuring Group Means Against the Sampling Distribution. Remember that each group mean in the sampling distribution represents a group that was randomly selected. [Figure: the same sampling distribution (n = 30), with the 2.28% (.02) probability of a higher mean at two standard errors above the mean.]
Probability How unlikely does the occurrence of a group mean have to be before we would say that it is so unlikely that it could not have happened by chance? It must have occurred for some other reason.
Significance. If the probability that a given group mean would occur by chance in a sampling distribution is very small, then the occurrence of that group mean is said to be significant. [Figure: the upper tail of the sampling distribution, labeled "probability of higher mean".]
Significance. To use the z-score analogy: when the percentile ranking of a group mean is really high (or really low), it is significant. It is significant because it is unlikely to occur randomly. [Figure: the upper tail of the sampling distribution, labeled "probability of higher mean".]
Significance. Most social science research declares that group means occurring by chance less than 5% of the time (.05) are significant. Most medical research uses 1% (.01) or much less.
With a z-score we used a table to look up percentile rank. Since the normal distribution is now a sampling distribution based on a specific group size, the z-table will not work: a new table for each possible group size would need to be generated to test the percentile rank of the group mean comparison. Fortunately, the computer does this for you. A probability (p) of .01 that a given score or higher could have appeared by chance (1%, or 1 in 100 times) is substantially less than .05 (5%, or 5 in 100 times [1 in 20]). [Figure: the sampling distribution with the points that occur randomly 5% of the time (.05) and 1% of the time (.01) marked.]
t-Critical. The sampling distribution of the mean is a different shape for every sample size (n). The point on the distribution where a given mean becomes significant is called t-critical. In the old days we would compute a t-score (analogous to a z-score) and then look up the significance point in a table. Now you can read the p value directly to get the same information.
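A sketch of both routes using scipy's t distribution; the 49 degrees of freedom anticipate the inquiry-science example below (27 + 24 - 2), and the two-tailed .05 criterion is the social-science convention mentioned earlier:

    from scipy.stats import t

    df = 49                          # degrees of freedom for two groups of 27 and 24

    # The old way: look up t-critical for a two-tailed .05 test and compare the t-score to it.
    print(t.ppf(0.975, df))          # ~2.01

    # The current way: read the two-tailed p value for an observed t-score directly.
    print(2 * t.sf(1.929, df))       # ~0.06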
Inquiry Science. Mary wants to know if teaching science with inquiry techniques works better than more traditional methods. She teaches a unit in her class using inquiry science methods and then compares the chapter-end test scores of her students with those of students in John's class, who have been taught the same unit with more traditional methods. What is the probability that the group mean differences could have appeared by chance?
(from the Excel Files) Inquiry Science
EZAnalyze Results Report - Independent t-Test of group 1 and 2 on Score
                 Class 1    Class 2
Mean:            36.185     31.833
Std. Dev:        8.544      7.435
N:               27         24
Mean Difference: 4.352
T-Score:         1.929
Eta Squared:     .068
P:               .060
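The report can be reproduced from its summary statistics alone; a minimal sketch assuming the usual pooled-variance independent t-test (the raw scores are not included in this transcript):

    from scipy.stats import ttest_ind_from_stats

    # Summary statistics from the EZAnalyze report above.
    result = ttest_ind_from_stats(mean1=36.185, std1=8.544, nobs1=27,
                                  mean2=31.833, std2=7.435, nobs2=24,
                                  equal_var=True)
    print(result.statistic)   # ~1.93, the reported T-Score
    print(result.pvalue)      # ~0.06, the reported P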
Inquiry Science. Most social science research declares that group means occurring by chance less than 5% of the time are significant; most medical research uses 1% or much less. Here the difference between the class means (31.83 vs. 36.19) would occur by chance 6% of the time (.06), so it falls just short of significance. [Figure: the two class means plotted against the sampling distribution, with a 6% (.06) probability of a higher mean.]
Hypothesis Testing: analysis based on normal distributions.