Statistical Methods for Comparing Averages of Measured Variables
Statistical methods such as t-tests and linear regression for comparing averages of measured variables, with insights on Gaussian distribution and confidence intervals. Understand the importance of making observations and minimizing errors in data analysis.
Presentation Transcript
Statistics Review ChE 477 Winter 2020 January 23, 2020 Dr. Harding
Pertinent Quote. Jacob Bernoulli (1713): "For even the most stupid of men, by some instinct of nature, by himself and without any instruction (which is a remarkable thing), is convinced that the more observations have been made, the less danger there is of wandering from one's goal." From Lemons, An Introduction to Stochastic Processes in Physics, Johns Hopkins University Press, pg. 13
Statistical Methods: Repeated Data Points; Comparing Averages of Measured Variables; Linear Regression (Confidence Interval, Prediction Interval); Sensitivity Analysis; Propagation of Error (other presentation)
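1. Repeated Data Points. Use a t-test based on the measured standard deviation (s):
$$\mu = \bar{x} \pm \frac{t\,s}{\sqrt{n}}, \quad \text{where } t = f(\alpha/2,\ n-1)$$
Here $\mu$ is the true mean, $\bar{x}$ is the measured mean, $s$ is the measured standard deviation, and $n$ is the number of data points. In Excel, use =T.INV(α, r) for a one-tailed test (α = 0.025 for a 95% confidence interval) or =T.INV.2T(α, r) for a two-tailed test (α = 0.05 for a 95% confidence interval), with r = n - 1.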
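The interval for repeated data points can be sketched in Python using only the standard library. This is a minimal illustration, not the course's method: the function names (`t_pdf`, `t_cdf`, `t_two_tailed`, `mean_confidence_interval`) and the sample readings are assumptions made for this example. Because the standard library has no t quantile, the critical value is approximated by integrating the Student t density with Simpson's rule and bisecting, which plays the role of Excel's T.INV.2T.

```python
import math
import statistics

def t_pdf(x, dof):
    """Student t probability density with dof degrees of freedom."""
    log_c = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
             - 0.5 * math.log(dof * math.pi))
    return math.exp(log_c - (dof + 1) / 2 * math.log1p(x * x / dof))

def t_cdf(x, dof, steps=2000):
    """CDF by Simpson's rule on [0, |x|], using symmetry about zero."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, dof) + t_pdf(a, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    half = total * h / 3
    return 0.5 + half if x >= 0 else 0.5 - half

def t_two_tailed(alpha, dof):
    """Critical t with total tail area alpha (analogous to T.INV.2T)."""
    lo, hi = 0.0, 1000.0
    target = 1 - alpha / 2
    for _ in range(60):  # bisection on the monotonic CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, dof) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_confidence_interval(data, alpha=0.05):
    """Return (mean, half-width) so that the interval is mean +/- half-width."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)          # sample standard deviation (n - 1)
    t = t_two_tailed(alpha, n - 1)      # r = n - 1 degrees of freedom
    return mean, t * s / math.sqrt(n)

readings = [10.1, 9.8, 10.3, 10.0, 9.9]  # hypothetical repeated measurements
mean, half_width = mean_confidence_interval(readings)
print(f"mean = {mean:.3f} +/- {half_width:.3f} (95% confidence)")
```

If SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, dof)` would be a simpler way to get the same critical value.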
Gaussian Distribution. 68.27% of the distribution lies within one σ of the mean, 95.45% within two σ, and 99.73% within three σ. The t-test is used when we do not have enough data points (<30).
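The three Gaussian percentages can be checked with the error function, since the fraction of a normal distribution within k standard deviations of the mean is erf(k/√2). The helper name `normal_coverage` is an assumption for this short sketch.

```python
import math

def normal_coverage(k):
    """Fraction of a Gaussian distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * normal_coverage(k):.2f}%")
# within 1 sigma: 68.27%
# within 2 sigma: 95.45%
# within 3 sigma: 99.73%
```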
2. Comparing averages of measured variables. Experiments were completed on two separate days. Day 1: $\bar{x}_1 = 40.9$, $s_{x1} = 3.27$, $n_{x1} = 7$. Day 2: $\bar{x}_2 = 37.2$, $s_{x2} = 2.67$, $n_{x2} = 9$. When comparing means at a given confidence level (e.g. 95%), is there a difference between the means?
2. Comparing averages of measured variables. New formula (a larger |T| means the means are more likely different). Step 1: compute T:
$$T = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_{x1}-1)\,s_{x1}^2 + (n_{x2}-1)\,s_{x2}^2}{n_{x1}+n_{x2}-2}\left(\dfrac{1}{n_{x1}}+\dfrac{1}{n_{x2}}\right)}}$$
For this example, T = 2.5. Step 2: compute the net r: $r = n_{x1} + n_{x2} - 2$. In Excel, use =T.INV(α, r) for a one-tailed test (α = 0.025 for a 95% confidence interval) or =T.INV.2T(α, r) for a two-tailed test (α = 0.05 for a 95% confidence interval).
2. Comparing averages of measured variables. Step 3: compute the net t from the net r. Step 4: compare |T| with t. At a given confidence level (e.g. 95%, or α = 0.05), there is a difference if $|T| > t_{\alpha/2,\,r}$ (two-tailed). Here 2.5 > 2.145, so we are 95% confident there is a difference (but not 98% confident).
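The four steps can be sketched as a short Python calculation. This is an illustration under stated assumptions: the function name `pooled_t_statistic` is made up for this example, the Day 1 and Day 2 summary statistics (means 40.9 and 37.2, standard deviations 3.27 and 2.67, sample sizes 7 and 9) are read from the slide's garbled layout, and the critical value 2.145 is taken from the slide rather than computed.

```python
import math

def pooled_t_statistic(mean1, s1, n1, mean2, s2, n2):
    """Return (T, r) for comparing two means with a pooled standard deviation."""
    r = n1 + n2 - 2                                   # net degrees of freedom
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / r
    T = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return T, r

# Steps 1 and 2: compute T and the net r from the two days' summary statistics
T, r = pooled_t_statistic(40.9, 3.27, 7, 37.2, 2.67, 9)

# Step 3: two-tailed critical t for alpha = 0.05 and r = 14, as quoted on the slide
t_crit = 2.145

# Step 4: compare |T| with t
print(f"T = {T:.2f}, r = {r}")
if abs(T) > t_crit:
    print("The means differ at the 95% confidence level.")
else:
    print("No difference shown at the 95% confidence level.")
```

Swapping in a larger critical value (for example the one for a 98% confidence level) shows how the conclusion depends on the confidence level chosen.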
3. Linear Regression. y = mx + b. Fit m and b, and get $r^2$. Find confidence intervals for m and b (m = 3.56 ± 0.02, etc.): use the standard error and t-statistic; Excel add-on (or Igor, Matlab, Python). Find confidence intervals for the line: use the standard error around the mean; these are narrow-waisted curves around the line; they depend on n. Meaning: how many ways can I draw a line through the data? Find the prediction band for the line. Meaning: where are the bounds of where the data should lie?
3. Linear Regression (Confidence Interval). [Figure: regression plots labeled "Confidence Interval" and "Prediction Band"]
Confidence Intervals and Prediction Bands. What good is the confidence interval for a line? It shows how many ways the line can fit the points and lets you state the confidence region for any predicted point; $r^2$ still helps determine how good the fit is. What good is the prediction band? It shows where the data should lie and helps identify outlying data points to consider discarding.
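To make the difference between the two bands concrete, here is a minimal sketch using the standard least-squares formulas for a straight line. The function names (`fit_line`, `band_half_width`) and the data points are assumptions invented for this example, and the critical t (2.776 for a 95% two-tailed interval with n - 2 = 4 degrees of freedom) is taken from a standard t-table rather than computed.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = m*x + b with standard errors and r^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = y_bar - m * x_bar
    sse = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    s = math.sqrt(sse / (n - 2))            # standard error of the fit
    return {
        "m": m, "b": b, "r2": 1 - sse / sst,
        "s": s, "n": n, "x_bar": x_bar, "sxx": sxx,
        "se_m": s / math.sqrt(sxx),
        "se_b": s * math.sqrt(1 / n + x_bar ** 2 / sxx),
    }

def band_half_width(fit, x0, t_crit, prediction=False):
    """Half-width at x0 of the confidence interval (or prediction band) for the line."""
    extra = 1.0 if prediction else 0.0      # prediction band adds scatter of a new point
    spread = extra + 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    return t_crit * fit["s"] * math.sqrt(spread)

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
t_crit = 2.776                              # 95% two-tailed, 4 degrees of freedom

fit = fit_line(x, y)
print(f"m = {fit['m']:.3f} +/- {t_crit * fit['se_m']:.3f}")
print(f"b = {fit['b']:.3f} +/- {t_crit * fit['se_b']:.3f}")
print(f"r^2 = {fit['r2']:.4f}")
for x0 in (1.0, 3.5, 6.0):
    ci = band_half_width(fit, x0, t_crit)
    pb = band_half_width(fit, x0, t_crit, prediction=True)
    print(f"x = {x0}: confidence +/- {ci:.3f}, prediction +/- {pb:.3f}")
```

The confidence half-width is smallest at the mean of x and grows toward the ends, which is the narrow-waisted shape; the prediction band is wider everywhere because it also includes the scatter of an individual new point.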
4. Sensitivity Analysis. Used to determine how independent variable values will impact a particular dependent variable under a given set of assumptions, i.e., how sensitive the output is to changes in one input variable while all other inputs are kept constant. Useful for the following reasons: testing the robustness of the results of a model or system in the presence of uncertainty; increased understanding of the relationships between input and output variables in a system; uncertainty reduction, through the identification of model inputs that cause significant uncertainty in the output and should therefore be the focus of attention to increase robustness (perhaps by further research).
Sensitivity Analysis (Cont.). Usefulness (Cont.): searching for errors in the model (by encountering unexpected relationships between inputs and outputs); model simplification, by fixing model inputs that have no effect on the output or by identifying and removing redundant parts of the model structure; finding regions in the space of input factors for which the model output is either maximum or minimum or meets some optimum criterion; when calibrating models with a large number of parameters, a primary sensitivity test can ease the calibration stage by focusing on the sensitive parameters (not knowing the sensitivity of parameters can result in time being uselessly spent on non-sensitive ones); identifying important connections between observations, model inputs, and predictions or forecasts, leading to the development of better models.
Sensitivity Analysis - Mechanics. First, the base case output is defined, using the average value of all input variables for a particular base case. Then the value of the output is calculated using a new value for the one input under consideration while keeping the other inputs constant. Find the percentage change in the output and the percentage change in the input. The sensitivity is calculated by dividing the percentage change in output by the percentage change in input. This process is repeated until the sensitivity figure for each of the inputs is obtained. The conclusion is that the higher the sensitivity figure, the more sensitive the output is to any change in that input, and vice versa.
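The mechanics above can be sketched as a one-at-a-time loop. The function name `sensitivities`, the 10% perturbation, and the ideal-gas example model with its base values are all assumptions chosen for illustration; any model that takes named numeric inputs with nonzero base values and a nonzero base output would work the same way.

```python
def sensitivities(model, base_inputs, change=0.10):
    """Percent change in output divided by percent change in input, one input at a time."""
    base_output = model(**base_inputs)              # base case output
    result = {}
    for name, base_value in base_inputs.items():
        perturbed = dict(base_inputs)               # keep all other inputs constant
        perturbed[name] = base_value * (1 + change)
        new_output = model(**perturbed)
        pct_output = (new_output - base_output) / base_output
        result[name] = pct_output / change
    return result

# Hypothetical model for illustration: ideal gas pressure, P = nRT/V
def ideal_gas_pressure(n, T, V):
    R = 8.314                                       # J/(mol K)
    return n * R * T / V

base = {"n": 1.0, "T": 300.0, "V": 0.025}           # made-up base case (mol, K, m^3)
result = sensitivities(ideal_gas_pressure, base)
for name, value in result.items():
    print(f"{name}: {value:+.2f}")
```

A sensitivity near +1 or -1 means the output changes roughly in proportion to that input; values near zero mark inputs that matter little for this base case.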
QUESTIONS? (Propagation of Error is in Other Presentation!)