Basic Statistical Tools Review for Psychometric Analysis

Explore the concepts of hypotheses, plausibility, the general hypothesis approach, linear modeling, and OLS in this review of basic statistical tools presented by Prof. Dr. Gavin T. L. Brown. Learn about the importance of validating observations, understanding scientific and statistical hypotheses, evaluating probabilities, and estimating models using the Ordinary Least Squares (OLS) method.

  • Statistical Tools
  • Hypotheses
  • Linear Modeling
  • OLS
  • Psychometric Analysis


Presentation Transcript


  1. Review of Basic Statistical Tools. HSE Psychometric School, August 2019. Prof. Dr. Gavin T. L. Brown, University of Auckland / Umeå University

  2. Hypotheses. We need to know whether our observations are valid. A scientific hypothesis is a suggested solution to a problem: an educated, informed, intelligent guess; an empirical proposition testable by experience; substantive, i.e., important to the real world ("I wonder if ..."). A statistical hypothesis is a statement about an unknown parameter: the mean will be 100, the correlation between variables will be zero, the variances will be equal. Statistical hypotheses are often trivial, lacking generality, and easily evaluated.

  3. Plausibility of hypotheses. Population: it rains on 300 out of 365 days per year (82% of days). Prediction: today it will rain. This is plausible because, if the sample is like the population, it is considerably more likely to rain than not; but you would be right more often if the population value had been 95%. What values of probability give you confidence in the plausibility of a hypothesis? 10%, 5%, 1%, 0.10%, ...? Convention (arbitrary) is 5%. What value is so large that the probability of it occurring by chance is so low that we can discount the null hypothesis of no effect or no difference?

  4. A general hypothesis approach. The results for the sample will be the same as the population: not zero, but equal to the population value. The results of the treatment will be the same as the control: not zero, but rather evidence that the treatment is of no additional value. The difference will be zero, not that the value itself will be zero; we don't really care whether the observed value differs from zero, since zero is uninteresting. No matter the type of hypothesis, the answer will always be probabilistic: we could be right by chance.

  5. Linear modeling. A linear model must not just solve mathematically; it must also be theoretically or conceptually explainable. An observed linear relationship begs an explanation: how could this association take place? Know your theories; know your empirical literature.

  6. Estimating a model: OLS. For any model parameter (e.g., an association between variables) we can determine how far the model is from the data by subtracting the model values from the observed values and squaring the differences so they are always positive. Even when few data points sit on the line (the model), if the distances are not large the line fits the data (a regression or trend line). This method, Ordinary Least Squares (OLS), minimises the squared deviance of the observed data from the model values; the smaller the discrepancy, the better the model fits the data. In a perfect model all data points would sit on the predicted model value, so error = 0, but this is an unrealistic expectation. Among linear unbiased estimators, the OLS solution has least variance (Gauss-Markov theorem).
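
A minimal sketch of the OLS idea in Python (the data values are made up for illustration): fit a line, compute the squared deviations, and confirm that perturbing the fitted slope only makes the sum of squares larger.

```python
import numpy as np

# Hypothetical data: 6 paired observations (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

# OLS slope and intercept via numpy's least-squares polynomial fit.
slope, intercept = np.polyfit(x, y, deg=1)

# The OLS criterion: squared deviations of observed data from model values.
residuals = y - (slope * x + intercept)
sum_sq = np.sum(residuals ** 2)   # OLS makes this as small as possible
print(f"Y = {slope:.3f}*X + {intercept:.3f}, sum of squared errors = {sum_sq:.3f}")

# Any other line gives a larger sum of squares, e.g. a slightly different slope:
alt = np.sum((y - ((slope + 0.1) * x + intercept)) ** 2)
print(f"perturbed slope: sum of squared errors = {alt:.3f} (larger)")
```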

  7. Normal Distribution. The Gaussian or Gauss-Laplace curve (aka the bell curve) is useful because of the central limit theorem: averages of samples of observations of random variables independently drawn from independent distributions converge in distribution to the normal; that is, they become normally distributed when the number of observations is sufficiently large. Many other distributions are also bell-shaped.

  8. Other distributions: almost normal? F distributions. Note that some distributions change shape according to various factors. The F distribution has 2 degrees-of-freedom parameters: d1 relates to the number of groups (k − 1) and d2 to the number of people altogether (N − k).

  9. Kurtosis. When non-normal is desirable: platykurtic or rectangular distributions of items in a discriminating test; a leptokurtic distribution for a categorical question such as "Do you love your mother? Yes-No". Non-normality may reflect reality, so don't remove it too quickly, even if statistics say to do so. High kurtosis up to 7.00 is handled well by Maximum Likelihood Estimation. High kurtosis is common with categorical or restricted-range ordinal variables.

  10. Skew. Not so dangerous, but it could affect whether you use the Median or the Mean. If people really tend to agree, skew is both normal and correct. Don't be in a hurry to remove this.

  11. Normality of Distributions. With large enough sample sizes (>30 or 40), the violation of the normality assumption should not cause major problems; this implies that we can use parametric procedures even when the data are not normally distributed. With 100s of cases, distributions don't matter because of the central limit theorem: (a) if the sample data are approximately normal, then the sampling distribution will be normal too; (b) in large samples (>30 or 40), the sampling distribution tends to be normal regardless of the shape of the data; and (c) means of random samples from any distribution will themselves have a normal distribution. Ghasemi, A., & Zahediasl, S. (2012). Normality tests for statistical analysis: A guide for non-statisticians. International Journal of Endocrinology and Metabolism, 10(2), 486-489. doi:10.5812/ijem.3505

  12. Central Limit Theorem. The power of N to create normal distributions and a small SD around the mean: the standard error (se) gets small with bigger N. Random 0s and 1s were generated, and then their means calculated for sample sizes ranging from 1 to 512. Note that as the sample size increases, the tails become thinner and the distribution becomes more concentrated around the mean. By Daniel Resende (https://github.com/resendedaniel/math/tree/master/17-central-limit-theorem)
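
A small Python sketch of the same simulation idea (not Resende's original code): generate random 0s and 1s, take means at several sample sizes, and watch the spread of the means shrink toward the theoretical se of √(0.25/n).

```python
import numpy as np

rng = np.random.default_rng(42)

# Means of random 0/1 draws for increasing sample sizes, as on the slide.
for n in [1, 8, 64, 512]:
    means = rng.integers(0, 2, size=(10_000, n)).mean(axis=1)
    # Theoretical se of the mean for p = 0.5 is sqrt(0.25/n).
    print(f"n={n:4d}: SD of sample means = {means.std():.4f}, "
          f"theory = {np.sqrt(0.25 / n):.4f}")
```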

  13. What to do. Critical values for rejecting the null hypothesis need to differ according to sample size, as follows. For small samples (n < 50): if the absolute z-score for either skewness or kurtosis is >1.96, which corresponds with an alpha level of .05, reject the null hypothesis and conclude the distribution of the sample is non-normal. For medium-sized samples (50 < n < 300): reject the null hypothesis at an absolute z-value >3.29, which corresponds with an alpha level of .05, and conclude the distribution of the sample is non-normal. For sample sizes >300: depend on the histograms and the absolute values of skewness and kurtosis without considering z-values; either an absolute skew value larger than 2 or an absolute kurtosis (proper) larger than 7 may be used as reference values for determining substantial non-normality. Kim, H.-Y. (2013). Statistical notes for clinical researchers: Assessing normal distribution (2) using skewness and kurtosis. Restorative Dentistry & Endodontics, 38(1), 52-54. doi:10.5395/rde.2013.38.1.52
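
A rough Python sketch of these rules. The large-sample standard errors √(6/n) for skew and √(24/n) for kurtosis are a simplifying assumption; the exact small-sample formulas (as used by SPSS, etc.) differ slightly.

```python
import numpy as np
from scipy import stats

def normality_check(x):
    # Sketch of the sample-size-dependent rules above (after Kim, 2013).
    # Assumption: z-values use the rough standard errors sqrt(6/n), sqrt(24/n).
    x = np.asarray(x)
    n = len(x)
    skew = stats.skew(x)
    kurt_excess = stats.kurtosis(x)        # excess kurtosis (normal = 0)
    kurt_proper = kurt_excess + 3          # "proper" kurtosis (normal = 3)
    z_skew = skew / np.sqrt(6 / n)
    z_kurt = kurt_excess / np.sqrt(24 / n)
    if n < 50:
        nonnormal = abs(z_skew) > 1.96 or abs(z_kurt) > 1.96
    elif n < 300:
        nonnormal = abs(z_skew) > 3.29 or abs(z_kurt) > 3.29
    else:
        nonnormal = abs(skew) > 2 or abs(kurt_proper) > 7
    return skew, kurt_proper, nonnormal

rng = np.random.default_rng(1)
print(normality_check(rng.exponential(size=200)))   # skewed: flags non-normal
```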

  14. Transformations Mosteller, F., & Tukey, J. W. (1977). Data Analysis and Regression: A Second Course in Statistics. Reading, MA: Addison-Wesley.

  15. Which transformation? Finding a value that allows adjustment of skew & kurtosis simultaneously: the Box-Cox transformation. Automated in the normalr package and Shiny app: https://kcha193.shinyapps.io/normalr/ Courtney, M. G. R., & Chang, K. C. (2018). normalr: An R package and Shiny app for large-scale variable normalization. Teaching Statistics, 40(2), 51-59. doi:10.1111/test.12154
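
The slide points to the normalr R package; as an illustration, here is the analogous Box-Cox step using scipy (an assumed substitute, not the tool shown on the slide):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.lognormal(mean=0.0, sigma=0.8, size=500)   # positively skewed, all positive

# Box-Cox searches for the power (lambda) that best normalises the variable.
transformed, lam = stats.boxcox(x)

print(f"estimated lambda = {lam:.3f}")
print(f"skew before = {stats.skew(x):.2f}, after = {stats.skew(transformed):.2f}")
```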

  16. Transformation value

  17. A simple model: the mean score. A mean (M) is the arithmetic average of data for a variable: M = Σ(scores)/N. Not all cases have the same score as the M (duh?), but the MEAN is a model of the centre of a distribution and is useful. Variance around the mean: distances from the mean below it would cancel those above it, so we square them so all are positive (Σd²). The Standard Deviation brings this squared deviance back to the original metric: it is the square root of the average squared deviance within a variable, SD = √(Σd²/(N − 1)).
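
A minimal numeric illustration of these formulas, with made-up scores:

```python
import numpy as np

scores = np.array([4, 7, 5, 9, 6, 8, 3])     # hypothetical scores

m = scores.mean()                            # the mean as a model of the centre
dev = scores - m                             # deviations cancel: dev.sum() ~ 0
var = (dev ** 2).sum() / (len(scores) - 1)   # squared deviations, averaged over N-1
sd = np.sqrt(var)                            # same as scores.std(ddof=1)

print(m, dev.sum(), var, sd)                 # 6.0, 0.0, 4.667, 2.16
```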

  18. Point estimate. The value we calculate from our sample is a point estimate. There is actually a range in which highly plausible values would occur if our sampling had been different (the standard error). So, to be reasonably sure of the true value, we should account for the error caused by sampling. We can then create an interval (range) that depicts how confident we are that the true value has been captured. Remember margin of error?

  19. Standard error formula. The standard error of the mean is se = SD/√n. Note: the standard error and the standard deviation of small samples tend to systematically underestimate the population values; the sample standard deviation is a biased estimator of the population standard deviation. With n = 2 the underestimate is about 25%, but for n = 6 it is only 5%. Corrections for this effect exist, but it makes more sense to have more or bigger samples. A practical result: decreasing the uncertainty in a mean estimate by a factor of two requires four times as many observations in the sample; decreasing the standard error by a factor of ten requires a hundred times as many observations.
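
A three-line check of the practical result, assuming an SD of 10:

```python
import numpy as np

sd = 10.0                        # assumed standard deviation
for n in [25, 100, 2500]:
    print(f"n={n:5d}: se = {sd / np.sqrt(n):.2f}")
# n=25 -> 2.00; quadrupling n to 100 halves se to 1.00;
# 100x the observations (n=2500) shrinks se tenfold to 0.20.
```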

  20. Confidence intervals. An interval that communicates information regarding the probable magnitude of the point estimate. When the sample mean is an estimator of the population mean and the population is normally distributed, the sample mean will be normally distributed. The 95% confidence interval (95CI) in a NORMAL distribution is M ± 1.96 se (roughly 2 standard errors), so we can be pretty sure (α = .05) that the range includes the population mean.

  21. Understanding CI. Probabilistic: if 95CI, then across 100 samplings, 95 of the intervals should include the population value. Practical: when sampling is from a normally distributed population with known standard deviation, we are CI% confident that the interval around the point estimate contains the population value. A higher-percent CI gives a wider band, meaning there is less chance of making an error but more uncertainty.

  22. Determine a CI. The CI gets smaller as N gets bigger. CI = point estimate ± (interval multiplier × standard error). For normal distributions the interval multiplier is the z-score. Imagine % boys = 54, 47, 44, 50 in 4 samples: M = 48.75, SD = 4.27, se = 2.14. So although the POINT estimate is not equal to the population value, the 90CI includes the true value.

  α     CI (1−α)   z       CI range
  .10   .90        1.645   43.72-53.78
  .05   .95        1.96    41.95-55.55
  .01   .99        2.575   36.27-61.23

  (With n = 4 this small, the printed ranges correspond to t-distribution multipliers with df = 3 rather than the z values shown; see the next slide and the sketch below.)
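
A sketch that reproduces the table's ranges from the four sample values, using t multipliers with df = 3 (scipy assumed):

```python
import numpy as np
from scipy import stats

boys = np.array([54, 47, 44, 50])              # % boys in 4 samples (from the slide)
m = boys.mean()                                # 48.75
se = boys.std(ddof=1) / np.sqrt(len(boys))     # ~2.14

for alpha in [0.10, 0.05, 0.01]:
    t_mult = stats.t.ppf(1 - alpha / 2, df=len(boys) - 1)   # df = 3
    lo, hi = m - t_mult * se, m + t_mult * se
    print(f"{100 * (1 - alpha):.0f}% CI: {lo:.2f} to {hi:.2f}")
# 90% CI: 43.72 to 53.78; 95% CI: 41.95 to 55.55; 99% CI: 36.27 to 61.23
```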

  23. For small n, the se follows the t-distribution, not the normal Gaussian bell curve; if N > 30, use the z-distribution. So the CI multiplier is taken from a t-table: http://www.sjsu.edu/faculty/gerstman/StatPrimer/t-table.pdf Use df = n − 1 = 3 when n = 4.

  24. Figure this out. What interpretation do the 95%CI error bars support? The study examined differences in responses to online and paper-based surveys.

  25. Impact of Big N: what's different?

  Group        IQ Mean (SD)   N
  Population   100 (15)       5000
  Sample       102.4 (12)     400
  F(1,5398) = 9.741, p = .002

  Group        IQ Mean (SD)   N
  Population   100 (15)       500
  Sample       102.4 (12)     40
  F(1,538) = 0.974, p = .324

  Implication for using NHST? It only indicates whether CHANCE is involved; with a big N, almost everything is STAT SIG.
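
The two F values can be reproduced from the summary statistics alone. Here is a sketch of a one-way ANOVA computed from means, SDs, and Ns (the helper function name is mine):

```python
import numpy as np
from scipy import stats

def anova_from_summary(groups):
    # groups: list of (mean, sd, n) tuples; one-way ANOVA from summary stats.
    means = np.array([g[0] for g in groups])
    sds   = np.array([g[1] for g in groups])
    ns    = np.array([g[2] for g in groups])
    grand = (ns * means).sum() / ns.sum()
    ss_between = (ns * (means - grand) ** 2).sum()
    ss_within  = ((ns - 1) * sds ** 2).sum()
    df1, df2 = len(groups) - 1, ns.sum() - len(groups)
    F = (ss_between / df1) / (ss_within / df2)
    return F, df1, df2, stats.f.sf(F, df1, df2)

for groups in ([(100, 15, 5000), (102.4, 12, 400)],
               [(100, 15, 500),  (102.4, 12, 40)]):
    F, df1, df2, p = anova_from_summary(groups)
    print(f"F({df1},{df2}) = {F:.3f}, p = {p:.3f}")
# Same difference (2.4 IQ points): significant with big N, not with small N.
```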

  26. Types of Variables. A variable may be continuous, discrete, nominal, or ordinal. In regression analysis there are DEPENDENT and INDEPENDENT variables; statistical models investigate how the former depend on the latter.

  27. Prediction, Causation, Association. Most common models (i.e., correlations and regressions) assume linear relationships (paths) exist among constructs: a straight-line relationship exists between variables and is a sufficient basis for modeling how these things inter-relate. Linearity requires a plausible causal mechanism. Linear relations can be diagrammed and statistically calculated provided enough data exist (does it work?), and then the quality of the model against the data can be estimated (it works, but is it worth keeping? does it explain much?).

  28. Covariance. A measure of the joint variability of two or more random variables; it shows how aligned the variables are. The magnitude of the covariance is not easy to interpret because it is not normalised and depends on the magnitudes of the variables. The normalised version of the covariance, the Pearson correlation coefficient, shows by its magnitude the strength of the linear relation.

  29. Covariance math. For each case, calculate the deviance from the mean on each of the pair of variables; multiply the deviances; sum these products across all cases; divide by the number of cases minus 1. Strong covariance means items elicit similar responding. Correlation is r = Sxy/(SDx × SDy).
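
The same steps in Python with two made-up variables, checked against numpy's built-ins:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # hypothetical item scores
y = np.array([1.0, 3.0, 2.0, 6.0,  5.0])

# Step-by-step covariance, exactly as the slide describes:
dev_x, dev_y = x - x.mean(), y - y.mean()
cov = (dev_x * dev_y).sum() / (len(x) - 1)

# Correlation = covariance normalised by the two standard deviations.
r = cov / (x.std(ddof=1) * y.std(ddof=1))

print(cov, r)
print(np.cov(x, y)[0, 1], np.corrcoef(x, y)[0, 1])   # same values via numpy
```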

  30. Correlations. Synchronised patterns (A ↔ B): 2 or more things behave in a similar way. How big must a correlation be to be meaningful? How do we qualitatively interpret values? When does a pattern really become visible? Weak: r < .40; moderate: .40 < r < .70; strong: r > .70.

  31. Correlation of Factors. Two things exist simultaneously and behave in a coordinated fashion. There is no explanation; they just co-exist, perhaps because of how we collected the data? Note: paths not specified = zero. No causal specification or assumption.

  32. Correlations (continued). Continuous with continuous: Pearson correlation r. Partial correlation: a measure of the strength and direction of a linear relationship between two continuous variables when a covariate is controlled. But maybe you should use multiple regression instead.

  33. Categorical correlations. Rank order with rank order: Spearman correlation (rho). Ordered categorical with ordered categorical: polychoric correlation. Binary with binary: tetrachoric correlation. (Example estimates from the slide: rho = 0.6071, se = 0.1152; rho = 0.4199, se = 0.0747.) NB: polychoric & tetrachoric correlations estimate what the correlation would be if the variables had been measured on a continuous scale.

  34. Why go further? Correlations are ultimately the raw material of fancier analyses, so many reviewers or examiners want to see them. But they don't have much explanatory power; it's just "everything is connected" and we don't know which things matter. Regression techniques at least allow one to build an argument that A causes B rather than simply describe that A is associated with B.

  35. Linearity: Regression. Changes in X cause a linear change (increase or decrease) in Y. Formula: Y = b1·X + b0. b1 = slope [the standardised beta is a proportion of a standard deviation]; b0 = intercept [starting point of the equation; represents all the unknown stuff]. Interpretations: 1. For every 1 SD change in X, you get β·SD(Y) change in Y. 2. The relationship explains β² (× 100%) of the variance in Y.
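
A small sketch with invented data showing both interpretations via scipy's linregress:

```python
import numpy as np
from scipy import stats

x = np.array([10., 12., 8., 15., 11., 9., 14.])    # hypothetical predictor
y = np.array([22., 27., 18., 33., 24., 20., 30.])  # hypothetical outcome

res = stats.linregress(x, y)
print(f"Y = {res.slope:.2f}*X + {res.intercept:.2f}")
print(f"r = {res.rvalue:.3f}; variance explained r^2 = {res.rvalue ** 2:.3f}")
# Interpretation 1: each 1-unit change in X predicts a slope-sized change in Y.
# Interpretation 2 (standardised): a 1 SD change in X predicts beta SDs change
# in Y, where beta = r in this single-predictor case.
```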

  36. Regression equations. Y = mx + b (as taught at high school). Y = b1x + b0 + e1 (regression approach). Y = λ1f + u1 (factor analysis approach). m = b1 = λ1 (different traditions; jingle-jangle).

  37. Regression concepts. A predictor (continuous) causes changes in a dependent variable (continuous). Output: amount of variance explained in the Y variable = R²; strength-of-relationship coefficients: b, unstandardised; β, standardised; constant = intercept, the value of Y when X = 0; statistical significance of the intercept and b (p < .05). Price = 8287 + 0.564(Income), or Price = 8287 + (.873×SD)Income.

  38. Sample output. Constant = intercept. The standardised Beta has M = 0, SD = 1, so it is easy to interpret. What's the missing information?

  39. Interpretive notes. R² can be inaccurate depending on sample size, so use the adjusted R². Interpretation is aided by conversion to an effect size: f² = R²/(1 − R²). The b coefficient is a raw multiplier (for each unit increase in the predictor you get b units of increase in the dependent variable). BUT if multiple predictors exist they might not all be on the same scale (e.g., IQ, age, a motivation test, etc.), so the β coefficient puts all predictors on the same scale (proportions of an SD) so their relative strengths can be evaluated.

  40. Multiple Regression. Multiple IVs predict 1 DV. Handles possible overlap among the IVs better (eliminates the problem that every univariate variable is stat sig). The sequence of adding predictors is important. Simultaneous: all at once; focused on each predictor's unique contribution. Hierarchical: analyst-specified order of introduction, sometimes in blocks, which should be logically or theoretically grouped (demographics; control variables; variables of interest). Step-wise: a data-mining technique; the machine orders the predictors from most to least predictive and removes those not adding any value. A sketch of hierarchical entry follows.
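
A sketch of hierarchical (block-wise) entry on simulated data (variable names and effect sizes are invented), using statsmodels to compare R² across blocks:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
age   = rng.normal(40, 10, n)                    # block 1: demographics
motiv = rng.normal(50, 8, n)                     # block 2: variable of interest
y     = 0.2 * age + 0.6 * motiv + rng.normal(0, 5, n)

# Hierarchical entry: fit block 1 first, then add block 2.
m1 = sm.OLS(y, sm.add_constant(np.column_stack([age]))).fit()
m2 = sm.OLS(y, sm.add_constant(np.column_stack([age, motiv]))).fit()

print(f"block 1 R^2 = {m1.rsquared:.3f}")
print(f"block 2 R^2 = {m2.rsquared:.3f}, "
      f"Delta R^2 = {m2.rsquared - m1.rsquared:.3f}")  # extra variance explained
```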

  41. Types of Linear Regression (diagram): simultaneous (all IVs enter together to predict the DV); hierarchical or block-wise (IVs enter in analyst-defined blocks, e.g., block A, then B, then C); step-wise (the machine selects and orders the IVs).

  42. Blockwise. Recommendation: report CIs. If VIF ≈ 1, multicollinearity is OK. Note: what happens to the contribution of variables as more correlated predictors are added (e.g., sex)? What is the relationship of the B 95%CI to Beta and its t-value and significance?

  43. Impact of Predictors. Proportion of variance explained by the predictors = R². With a single predictor, square the standardised beta weight to get R²; with multiple predictors, the R² value is sometimes called the squared multiple correlation (SMC). If predictors are correlated, the sum of the squared betas will not reproduce R², because of shared variance among the IVs. Remember: STAT SIG is easy if N is big; what matters is explanatory power. NB: in a one-predictor regression, the value of β is identical to the value of r in a correlation. Hierarchical: how much extra variance in the DV is achieved by adding predictors? If nothing more, the additional predictors are not needed (see the sketch below).
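
A simulation sketch (invented data) showing that Σ(β·r) reproduces R² while Σβ² does not when predictors correlate:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)           # correlated predictors
y  = 0.4 * x1 + 0.3 * x2 + rng.normal(size=n)

# Standardise everything so the regression weights are betas.
z = lambda v: (v - v.mean()) / v.std()
X, zy = np.column_stack([z(x1), z(x2)]), z(y)

betas, *_ = np.linalg.lstsq(X, zy, rcond=None)
r = np.array([np.corrcoef(X[:, j], zy)[0, 1] for j in range(2)])

r2 = 1 - np.sum((zy - X @ betas) ** 2) / np.sum(zy ** 2)
print(f"R^2 = {r2:.3f}")
print(f"sum(beta * r) = {(betas * r).sum():.3f}   # equals R^2")
print(f"sum(beta^2)   = {(betas ** 2).sum():.3f}  # differs when IVs correlate")
```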

  44. Effect Size in Regression Analysis. The standardised beta weight (β) indicates that a change of 1 SD in the IV will result in a β·SD change in the DV. The effect size for R² and SMC is the same. This allows us to compute an effect size: Σ(β·r) = R²; f² = R²/(1 − R²), with small/medium/large benchmarks. Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.
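
A tiny helper applying these formulas; the S/M/L cutoffs (.02/.15/.35) follow Cohen (1992):

```python
def cohens_f2(r2):
    # f^2 = R^2 / (1 - R^2); Cohen (1992) benchmarks:
    # small ~ .02, medium ~ .15, large ~ .35.
    f2 = r2 / (1 - r2)
    size = ("large" if f2 >= .35 else
            "medium" if f2 >= .15 else
            "small" if f2 >= .02 else "trivial")
    return f2, size

for r2 in [0.02, 0.13, 0.26]:
    print(r2, cohens_f2(r2))
```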

  45. But correlation value = regression value. X →(.70)→ Y: Regression Prediction (Causation). The value of Y is predicted by the value of X; X makes changes in Y. For every unit increase in X, there is a .70 increase in Y, and 49% of the variance in Y is explained by X. A ↔(.70)↔ B: Correlation (Association). There is a shared variance between A and B of some 49%, but we don't know which caused which, or whether something else caused the association; they just go up and down together. The difference is in the theoretical understanding of the relationship between the variables, not so much the math.

  46. Is it proof? The regression model is causal: an increase in A causes change in B. But is it proof? Unless you have a causal design and a causal mechanism, it is NOT proof. It is indicative, suggestive, and possibly the basis for further research, but mathematically & statistically it looks like causal proof. Be prepared for critique.

  47. Disentangling association from causation. Just because 2 things are related does not make one of them a cause of the other. There are positive correlations between ice cream consumption and drowning, murder, boating accidents, and shark attacks, but eating ice cream does not cause these events to take place. HOWEVER, warm weather is associated with both ice cream eating and the other events.

  48. Spurious correlations

  49. Linear models are additive. The simplest model is that change in one variable relates systematically to change in a second variable: Y = mX + b. Other factors and predictors can be added to the equation, and linear equations can be embedded in each other. We can have multiple predictors and an unknown factor (e): Yi = b0 + b1X1i + b2X2i + ... + bnXni + ei.

  50. Anything can be linear. The elements of a linear relationship can be continuous or categorical. Categorical: Analysis of Variance. Continuous: Regression or Correlation.
