
Exploring Statistics: Understanding Data Analysis and Distributions
Delve into the world of statistics with this comprehensive guide covering data extraction, variable types, and distribution visualization methods. Learn about categorical and quantitative variables, explanatory and response variables, and the impact of weight change on heart disease risk in women.
Presentation Transcript
Chapter 1: Picturing Distributions with Graphs (6/10/2025)
What is Statistics? Statistics is the science that extracts information from data. It involves logic, scientific reasoning, and careful analysis. It is not merely mathematical computations!
Data Table: Each row of the data table is an observation (an individual). Each column is a variable (a characteristic being measured). A value is where a row and a column meet.
Types of Variables: Measurement Scale
  Categorical variables: named (nominal) categories. Example: SEX (male or female).
  Quantitative variables: numeric scales. Example: WEIGHT (in kilograms).
Types of Variables: Variable Function
  Explanatory variable = predictor = independent variable = X
  Response variable = outcome = dependent variable = Y
  Does X lead to Y? Example: Does smoking increase the risk of coronary heart disease (CHD)? Explanatory variable = smoking; response variable = CHD.
Illustration from Willett et al. (1995). Weight, weight change, and coronary heart disease in women. JAMA, 273(6), 461-465. Research question: Does weight change influence the risk of coronary heart disease in women? Unit of observation: individual women. The women were followed to see who developed coronary heart disease (an observational study).
Example: Willett et al. (1995)
  Explanatory variable: change in Body Mass Index (weight / height^2); QUANTITATIVE
  Response variable: coronary heart disease (yes or no); CATEGORICAL
Distributions
  Statistical distribution: how often a variable takes on specific values.
  We will describe distributions with graphs.
  Categorical variables: pie charts, bar graphs.
  Quantitative variables: stemplots, histograms, and boxplots (boxplots are introduced in Chapter 2).
Always look at the data. A few carefully drawn and interpreted graphs are often more instructive than a great pile of numbers! Always start your analysis by exploring the distributional pattern of the data.
Example (Categorical Variable): Type of Solid Waste
  Material Type                Weight (million tons)   Percent
  Food scraps                        25.9               11.2 %
  Glass                              12.8                5.5 %
  Metals                             18.0                7.8 %
  Paper, paperboard                  86.7               37.4 %
  Plastics                           24.7               10.7 %
  Rubber, leather, textiles          15.8                6.8 %
  Wood                               12.7                5.5 %
  Yard trimmings                     27.7               11.9 %
  Other                               7.5                3.2 %
  Total                             231.9              100.0 %
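The Percent column of the solid waste table can be reproduced by dividing each weight by the total. The following is a minimal sketch, not part of the original slides; the names `waste`, `TOTAL`, and `percent_shares` are illustrative. It assumes the reported total of 231.9 is the denominator (the listed weights themselves add to 231.8, presumably because of rounding).

```python
# Weights in million tons, copied from the solid waste table.
waste = {
    "Food scraps": 25.9,
    "Glass": 12.8,
    "Metals": 18.0,
    "Paper, paperboard": 86.7,
    "Plastics": 24.7,
    "Rubber, leather, textiles": 15.8,
    "Wood": 12.7,
    "Yard trimmings": 27.7,
    "Other": 7.5,
}

TOTAL = 231.9  # total as reported in the table (assumed denominator)


def percent_shares(weights, total):
    """Return each category's share of the total, in percent, rounded to 0.1."""
    return {name: round(100 * w / total, 1) for name, w in weights.items()}


shares = percent_shares(waste, TOTAL)
for name, pct in shares.items():
    print(f"{name:<28}{waste[name]:>6.1f}{pct:>8.1f} %")
```

These shares are the values a pie chart or bar chart of the table would display.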
Example (Categorical Variable, Cont.) Pie charts: use Excel or a Web applet. Bar charts: use a computer or graph paper. Note: bars do not touch. You should already be familiar with these techniques from high school. (Not emphasized in this class.)
Example (Quantitative Variable): Body Weight (lbs.), n = 53 students
  192 152 135 110 128 180 260 170 165 150
  110 120 185 165 212 119 165 210 186 100
  195 170 120 185 175 203 185 123 139 106
  180 130 155 220 140 157 150 172 175 133
  170 130 101 180 187 148 106 180 127 124
  215 125 194
Example: Creating a Histogram
  First create a frequency table.
  If the data set is small, create between 4 and 12 (or more) non-overlapping class intervals.
  Tally (count) the frequencies within each interval.
  Calculate proportions.
  Wt. interval   Frequency      %
  100-119             7       13.5
  120-139            12       23.1
  140-159             7       13.5
  160-179             8       15.4
  180-199            12       23.1
  200-219             4        7.7
  220-239             1        1.9
  240-259             0        0.0
  260-279             1        1.9
  TOTAL              52      100
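The tallying steps can be sketched in a few lines of Python. This is an illustrative sketch rather than the course's own procedure; `frequency_table` and its parameters are hypothetical names, and the interval width of 20 lbs starting at 100 is taken from the table. Note that tallying the 53 listed body weights gives 9 values in the 160-179 interval and a total of 53, whereas the slide's table shows 8 and 52, so the computed percentages differ slightly from the slide.

```python
# Body weights (lbs.) of the 53 students, copied from the data slide.
weights = [
    192, 152, 135, 110, 128, 180, 260, 170, 165, 150,
    110, 120, 185, 165, 212, 119, 165, 210, 186, 100,
    195, 170, 120, 185, 175, 203, 185, 123, 139, 106,
    180, 130, 155, 220, 140, 157, 150, 172, 175, 133,
    170, 130, 101, 180, 187, 148, 106, 180, 127, 124,
    215, 125, 194,
]


def frequency_table(values, start=100, width=20, n_bins=9):
    """Tally values into non-overlapping intervals of equal width.

    Returns a list of (lower, upper, frequency, percent) tuples.
    """
    counts = [0] * n_bins
    for v in values:
        counts[(v - start) // width] += 1  # index of the interval holding v
    n = len(values)
    return [
        (start + i * width, start + (i + 1) * width - 1, c, round(100 * c / n, 1))
        for i, c in enumerate(counts)
    ]


table = frequency_table(weights)
for lower, upper, freq, pct in table:
    print(f"{lower}-{upper}  {freq:>3}  {pct:>5.1f}%")
print("TOTAL   ", sum(row[2] for row in table))
```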
Example: Histogram
  Label the axes.
  Plot frequencies or proportions within each interval.
  Notes: Use only for quantitative data. Bars touch, i.e., are contiguous.
  [Figure: histogram of body weight. Horizontal axis: Weight (pounds), 100 to 280. Vertical axis: Number of students, 0 to 14.]
Stem-and-Leaf Plot
  Instead of using histograms, we will always use histogram-like plots called stemplots.
  If the text says to draw a histogram, do a stemplot instead.
  Use graph paper or lined paper to plot stemplots (if you use lined paper, make sure the digits line up evenly).
  Stemplot with split stem values:
    |1|4
    |1|789
    |2|2234
    |2|66789
    |3|00012344
    |3|5678
    (x10)
Stemplot data example: Body weights in a class (lbs.), n = 53
  192 152 135 110 128 180 260 170 165 150
  110 120 185 165 212 119 165 210 186 100
  195 170 120 185 175 203 185 123 139 106
  180 130 155 220 140 157 150 172 175 133
  170 130 101 180 187 148 106 180 127 124
  215 125 194
Step 1: draw stem
  Create a stem to serve as an axis.
  Stem = first one or two digits of the number.
  The current data range from 100 to 260, so the stem values run from 10 to 26.
  Start with between 4 and 16 stem-value bins.
  Use an axis multiplier (e.g., x10).
Step 2: plot leaves
  Plot the next significant digit of each value against the stem. The first three values (192, 152, 135) give the leaves 19|2, 15|2, and 13|5.
  Note: You may have to try different stem axis multipliers to get a good-looking plot (trial and error).
Example: stemplot (cont.)
  Plot all data points. Sort the leaves in ascending order when done.
    10|0166
    11|009
    12|0034578
    13|00359
    14|08
    15|00257
    16|555
    17|000255
    18|000055567
    19|245
    20|3
    21|025
    22|0
    23|
    24|
    25|
    26|0
    (x10 lbs.)
  Interpret (most important part): Shape, Location, Spread, Outliers?
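The three steps (draw the stem, plot the leaves, sort the leaves) can be sketched in Python as follows. This is a minimal illustration for whole-number data with an axis multiplier of x10, not the method prescribed by the slides; the function name `stemplot` is hypothetical.

```python
# Body weights (lbs.) of the 53 students, copied from the data slide.
weights = [
    192, 152, 135, 110, 128, 180, 260, 170, 165, 150,
    110, 120, 185, 165, 212, 119, 165, 210, 186, 100,
    195, 170, 120, 185, 175, 203, 185, 123, 139, 106,
    180, 130, 155, 220, 140, 157, 150, 172, 175, 133,
    170, 130, 101, 180, 187, 148, 106, 180, 127, 124,
    215, 125, 194,
]


def stemplot(values):
    """Return stemplot rows such as '10|0166' for whole-number data (x10)."""
    leaves = {}
    for v in values:
        stem, leaf = divmod(v, 10)  # stem = leading digits, leaf = last digit
        leaves.setdefault(stem, []).append(leaf)

    rows = []
    # Step 1: the stem axis covers every bin from min to max, even empty ones.
    for stem in range(min(leaves), max(leaves) + 1):
        # Steps 2 and 3: place the leaves on their stem, sorted ascending.
        digits = "".join(str(d) for d in sorted(leaves.get(stem, [])))
        rows.append(f"{stem}|{digits}")
    return rows


rows = stemplot(weights)
print("\n".join(rows))
```

Keeping the empty stems (23 through 25) matters: the gap is what makes 260 stand out from the rest of the data.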
Interpreting Stemplots & Histograms
  Shape: Mound-shaped? Number of peaks (modes)? Symmetrical? If asymmetrical, note the direction of skew: right tail = positive skew; left tail = negative skew.
  Central location: for now, identify the median. Location of the median: L(M) = (n+1)/2. Count to a depth of L(M) and read the value off the plot.
  Spread: for now, identify the minimum and maximum values (better methods are presented in the next chapter).
  Outliers: look for exceptions to the pattern.
Example: interpreting the body weight stemplot
    10|0166
    11|009
    12|0034578
    13|00359
    14|08
    15|00257
    16|555
    17|000255
    18|000055567
    19|245
    20|3
    21|025
    22|0
    23|
    24|
    25|
    26|0
    (x10)
  Shape: asymmetrical with a tail toward larger numbers (positive skew).
  Middle location (median): L(M) = (53+1)/2 = 27. Count to a depth of 27, which is the first leaf on stem 16. Median = 165.
  Spread: from 100 to 260.
  Outliers?: 260 appears to be an outlier.
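The depth rule L(M) = (n+1)/2 can be checked with a short sketch. The function name `median_by_depth` is illustrative. The slides only cover the case where the depth is a whole number; for an even n the sketch assumes the usual convention of averaging the two values on either side of the half-depth.

```python
# Body weights (lbs.) of the 53 students, copied from the data slide.
weights = [
    192, 152, 135, 110, 128, 180, 260, 170, 165, 150,
    110, 120, 185, 165, 212, 119, 165, 210, 186, 100,
    195, 170, 120, 185, 175, 203, 185, 123, 139, 106,
    180, 130, 155, 220, 140, 157, 150, 172, 175, 133,
    170, 130, 101, 180, 187, 148, 106, 180, 127, 124,
    215, 125, 194,
]


def median_by_depth(values):
    """Return (L(M), median) using the depth rule L(M) = (n + 1) / 2."""
    ordered = sorted(values)
    depth = (len(ordered) + 1) / 2
    if depth.is_integer():
        # Odd n: the median is the value at exactly this depth (1-based).
        return depth, ordered[int(depth) - 1]
    # Even n (assumed convention): average the two neighbouring values.
    lower = ordered[int(depth) - 1]
    upper = ordered[int(depth)]
    return depth, (lower + upper) / 2


depth, median = median_by_depth(weights)
print(f"L(M) = {depth:g}, median = {median}")
print(f"minimum = {min(weights)}, maximum = {max(weights)}")
```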
Second Example (n = 8)
  Data (coliform count): 1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42
  Try an axis multiplier of x1. Therefore: stem = ones place; leaves = tenths place. Truncate the next significant digit.
  Example: 1.47 plotted below.
    |1|4
    |2|
    |3|
    |4|
    (x1)
  Never plot the decimal point. Never plot more than one digit per leaf.
Second Example: Final Plot
  Data: 1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42
    |1|4
    |2|03
    |3|4779
    |4|4
    (x1)
Truncate, don't round! The text rounds values before it plots leaves, but we do not, in deference to the great John W. Tukey.
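The difference between truncating and rounding shows up in the coliform data. The sketch below is illustrative only; `leaf_rows` is a hypothetical helper, and the values are entered as strings so that Python's `decimal` module keeps their exact digits. Rounding is done half-up here as an assumption about how a textbook would round.

```python
from decimal import Decimal, ROUND_HALF_UP

# Coliform counts from the second example, kept as strings for exact digits.
coliform = ["1.47", "2.06", "2.36", "3.43", "3.74", "3.78", "3.94", "4.42"]


def leaf_rows(values, rounding=False):
    """Build stemplot rows (stem = ones place, leaf = tenths place, x1).

    With rounding=False the hundredths digit is truncated (the course's rule);
    with rounding=True values are first rounded half-up to one decimal place.
    """
    rows = {}
    for text in values:
        d = Decimal(text)
        if rounding:
            d = d.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
        tenths = int(d * 10)  # int() drops any remaining fraction (truncates)
        stem, leaf = divmod(tenths, 10)
        rows.setdefault(stem, []).append(leaf)
    return [
        f"|{s}|" + "".join(str(x) for x in sorted(rows.get(s, [])))
        for s in range(min(rows), max(rows) + 1)
    ]


truncated = leaf_rows(coliform)
rounded = leaf_rows(coliform, rounding=True)
print("truncated:", truncated)
print("rounded:  ", rounded)
```

With truncation, 3.78 is plotted as leaf 7 on stem 3; with rounding it would become leaf 8, so the two plots differ even for this small data set.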
Third Example (n = 25)
  Data & simple stemplot:
    |1|4789
    |2|223466789
    |3|000123445678
    (x10)
  Too squished to view the shape! Solution: split the stem values.
  Stemplot, split stem:
    |1|4
    |1|789
    |2|2234
    |2|66789
    |3|00012344
    |3|5678
    (x10)
  The first stem 1 is the bin for values between 10 and 14. The second stem 1 is the bin for values between 15 and 19.
  Notice the negative skew.
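Splitting stems can be sketched as below. The 25 values are read back off the simple stemplot (stem x10 plus leaf), which is an assumption since the slide does not list the raw data; `split_stemplot` is a hypothetical name. Leaves 0 to 4 go on the first row of a stem and leaves 5 to 9 on the second.

```python
# Values reconstructed from the simple stemplot (assumed; raw data not listed).
values = [
    14, 17, 18, 19,
    22, 22, 23, 24, 26, 26, 27, 28, 29,
    30, 30, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38,
]


def split_stemplot(values):
    """Return stemplot rows with each stem split into two bins (0-4 and 5-9)."""
    bins = {}
    for v in values:
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1  # 0 = lower half, 1 = upper half
        bins.setdefault((stem, half), []).append(leaf)

    first, last = min(bins), max(bins)
    rows = []
    for stem in range(first[0], last[0] + 1):
        for half in (0, 1):
            if first <= (stem, half) <= last:  # skip rows outside the data range
                digits = "".join(str(d) for d in sorted(bins.get((stem, half), [])))
                rows.append(f"|{stem}|{digits}")
    return rows


split_rows = split_stemplot(values)
print("\n".join(split_rows))
```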
Back-to-Back Stemplots
  Men: 180 120 184 181 320 240 166 188 171
  Women: 090 120 105 145 125 138 185 138 162
     men|   |women
    -------------------
        | 0 |9
        | 1 |
       8| 1 |
        | 2 |
        | 2 |
        | 3 |
    (x100)
  Very useful for comparing distributions.
  The first value for men (180 lbs) and the first value for women (90 lbs) are plotted. Can you complete the plot?
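One way to check a completed back-to-back plot is the sketch below. It is an illustration under stated assumptions, not the course's answer key: stems are the hundreds digit with split bins, leaves are the tens digit with the ones digit truncated, and the left-hand (men's) leaves are written so that they increase away from the stem. The names `split_bins` and `back_to_back` are hypothetical.

```python
men = [180, 120, 184, 181, 320, 240, 166, 188, 171]
women = [90, 120, 105, 145, 125, 138, 185, 138, 162]


def split_bins(values):
    """Map (stem, half) to sorted leaves: stem = hundreds, leaf = tens digit."""
    bins = {}
    for v in values:
        stem, leaf = divmod(v // 10, 10)  # v // 10 truncates the ones digit
        bins.setdefault((stem, 0 if leaf <= 4 else 1), []).append(leaf)
    return {key: sorted(leaves) for key, leaves in bins.items()}


def back_to_back(left, right):
    """Return rows of a back-to-back stemplot with split stems (x100)."""
    lbins, rbins = split_bins(left), split_bins(right)
    keys = set(lbins) | set(rbins)
    first, last = min(keys), max(keys)
    rows = []
    for stem in range(first[0], last[0] + 1):
        for half in (0, 1):
            if not first <= (stem, half) <= last:
                continue
            # Left-side leaves are reversed so they grow away from the stem.
            l = "".join(str(d) for d in reversed(lbins.get((stem, half), [])))
            r = "".join(str(d) for d in rbins.get((stem, half), []))
            rows.append(f"{l:>8}|{stem}|{r}")
    return rows


plot = back_to_back(men, women)
print("     men|s|women")
print("\n".join(plot))
```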
Interpret Histograms Just Like Stemplots
  Example: 7th Grader Vocabulary Score
  Shape: symmetrical.
  Center location: eyeball method (where the histogram balances); the counting method will be demonstrated in a separate class.
  Spread: from 2 to 12.
  Outlier: 12(?)